[BugFix] [Kernel] Fix GPU SEGV occurring in fused_moe kernel #10381
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Thanks for the fix! I noted a couple of spots where I think 32-bits should be fine, WDYT?
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
@@ -116,7 +116,7 @@ def fused_moe_kernel(
    a_ptrs = a_ptr + (offs_token[:, None] // top_k * stride_am +
                      offs_k[None, :] * stride_ak)

    off_experts = tl.load(expert_ids_ptr + pid_m).to(tl.int64)
    off_experts = tl.load(expert_ids_ptr + pid_m)
@tlrmchlsmth Doing this causes SEGV when I tested.
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Head branch was pushed to by a user without write access
e5fc430 to 23e7dfe
DCO issues, need to close and reopen.
When running a large model (~500G), I encountered a GPU SEGV. It was occurring in fused_moe_kernel. It seemed to me that this could be integer overflow. Changing offsets to use tl.int64 in fused_moe.py:fused_moe_kernel fixes the issue.
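For illustration, here is a minimal host-side sketch of the suspected overflow, written in plain Python rather than Triton. The shapes (hidden, inter), the stride name stride_be, and the helper functions are hypothetical and not taken from this PR; the sketch only emulates what happens when an expert index is multiplied by a per-expert stride in 32-bit versus 64-bit signed arithmetic.

```python
import ctypes

INT32_MAX = 2**31 - 1


def offset_int32(expert_id: int, stride: int) -> int:
    """Emulate expert_id * stride in 32-bit signed arithmetic (wraps on overflow)."""
    return ctypes.c_int32(expert_id * stride).value


def offset_int64(expert_id: int, stride: int) -> int:
    """Emulate the same product in 64-bit signed arithmetic."""
    return ctypes.c_int64(expert_id * stride).value


# Hypothetical per-expert weight shape, chosen only to make the arithmetic easy to follow.
hidden = 16384
inter = 32768
stride_be = hidden * inter  # elements between consecutive experts: 2**29

for expert_id in (3, 4):
    exact = expert_id * stride_be
    print(
        f"expert {expert_id}: exact={exact}, "
        f"int32={offset_int32(expert_id, stride_be)}, "
        f"int64={offset_int64(expert_id, stride_be)}, "
        f"fits_in_int32={exact <= INT32_MAX}"
    )
```

With these assumed shapes, expert 3 still fits in 32 bits, while expert 4 wraps to a negative offset, which would turn into an out-of-bounds pointer if used for addressing. Whether this is exactly the failing computation in the kernel is an assumption; it is consistent with the observation that casting the expert offset to tl.int64 avoids the SEGV.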