Slow inference with Mini-InternVL-Chat-4B-V1-5 #2902
zhuchen1109
started this conversation in
General
Replies: 1 comment 1 reply
-
Try upgrading to a newer version. I believe this kernel was removed a couple of releases ago.
-
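I am running inference with Mini-InternVL-Chat-4B-V1-5 on an A800. The input is about 2200 tokens and the output is 5 tokens, and inference is very slow. Profiling with the ns tool shows that the _fwd_kernel function takes an abnormally long time. I printed the key parameters used to compute the grid: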
max_seqlen 2243
q shape [2243, 32, 96]
k shape [75, 64, 32, 96]
BLOCK_M 256
grid (9,32,1)
Corresponding code:
What is causing this to be so slow? The grid looks unreasonable to me; for internvl2-8b the grid is [62,8,1].
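For reference, the printed values are consistent with a grid of the form (ceil(max_seqlen / BLOCK_M), num_heads, batch). The sketch below only reproduces that arithmetic. The formula and the `launch_grid` helper are assumptions inferred from the numbers above, not the actual kernel launch code, and the batch size of 1 is taken from the third grid dimension.

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division, as commonly used to size a kernel launch grid."""
    return (a + b - 1) // b


def launch_grid(max_seqlen: int, num_heads: int, batch: int, block_m: int):
    # Hypothetical helper: one program per (query block, head, sequence).
    # This layout is an assumption inferred from the values printed above.
    return (cdiv(max_seqlen, block_m), num_heads, batch)


# Values reported in the post: max_seqlen=2243, 32 query heads, BLOCK_M=256.
grid = launch_grid(max_seqlen=2243, num_heads=32, batch=1, block_m=256)
num_programs = grid[0] * grid[1] * grid[2]

print(grid)          # (9, 32, 1)
print(num_programs)  # 288
```

Under this assumption, a larger BLOCK_M means fewer but heavier programs along the first grid axis, so each program processes up to 256 query rows with a head dimension of 96.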