[BUG] Qwen-1.8-Chat, quantized to f16 with llama.cpp, then the inference answers are garbled. Is 1.8B not yet supported in llama.cpp? #69
Comments
Impressive that you even got the conversion to succeed; the llama.cpp converter doesn't work for me at all. llama.cpp has switched to the GGUF format now, so why does the newly released qwen.cpp still convert to the GGML format? Could it support converting seamlessly to GGUF? Then the model could be used with llama.cpp, and its server could run it as well.

Yes, I tried Qwen 0.5B, 7B, and 14B. After converting to F16 GGUF with llama.cpp, the answers were all garbled.

I fine-tuned Qwen-7B on my own dataset. After merging in llamafactory, loading and inference are correct, but after converting to GGUF with llama.cpp the answers are garbled. Has anyone run into this problem and solved it?
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?

Current Behavior
Using the llama.cpp project, I first convert the model to f16:

```bash
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/
```

Then run inference:

```bash
./main -m ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf -n 512 --color -i -cml -f prompts/chat-with-qwen.txt
```

But the answers are garbled. Does the 1.8B model not support quantization with llama.cpp?

I also tried converting to an int4 quantization, and the answers come out garbled as well.
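For reference, the int4 step would be something along these lines with llama.cpp's `quantize` tool (the paths and the `q4_0` type here are illustrative, not necessarily the exact invocation used):

```bash
# Illustrative int4-style quantization of the f16 GGUF produced above
./quantize ./models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           ./models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0
```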
Expected Behavior
The model should answer normally.
Steps To Reproduce
1. Download the llama.cpp project
2. Download the Qwen-1_8B-Chat model
3. Convert the model to f16 precision
4. Quantize to an int4 version and run inference (a consolidated command sketch follows this list)
5. The inference output is garbled and unintelligible
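A minimal end-to-end sketch of these steps, assuming the llama.cpp tools of that period (`convert-hf-to-gguf.py`, `quantize`, `main`); the download method, paths, and the `q4_0` target type are illustrative assumptions:

```bash
# 1. Get and build llama.cpp (assumed build via make)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make

# 2. Download Qwen-1_8B-Chat (assumes huggingface-cli is installed)
huggingface-cli download Qwen/Qwen-1_8B-Chat --local-dir models/Qwen-1_8B-Chat

# 3. Convert the HF checkpoint to an f16 GGUF
python3 convert-hf-to-gguf.py models/Qwen-1_8B-Chat/

# 4. Quantize to an int4-style type (q4_0 as an example) and run interactive ChatML inference
./quantize models/Qwen-1_8B-Chat/ggml-model-f16.gguf \
           models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf q4_0
./main -m models/Qwen-1_8B-Chat/ggml-model-q4_0.gguf -n 512 --color -i -cml \
       -f prompts/chat-with-qwen.txt
```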
Environment

Anything else?
No response