[Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. #10198
base: main
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these: …
@sroy745 @LiuXiaoxuanPKU @njhill
Thanks @jeongin601, this looks like a very nice finding!
We may still want to make and use a (shallow) copy of the sampling parameters with the seed removed in case a seed is set, to avoid doing seeded sampling for the non-final tokens.
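For reference, a minimal sketch of the shallow-copy suggestion above, assuming vLLM's SamplingParams with its seed field; the exact copy mechanism used in vLLM may differ:

```python
import copy

def scoring_params(params):
    """Shallow-copy the request's SamplingParams and drop the seed,
    so scoring the non-final speculative positions is not seeded.
    (Illustrative sketch, not the exact vLLM implementation.)"""
    if params.seed is None:
        return params  # nothing to strip; reuse the original object
    scoring = copy.copy(params)
    scoring.seed = None
    return scoring
```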
@njhill, I'm curious about the reason why the seed should be removed, especially if it is used for the target model sampling and affects the output token selection when proposals are rejected.
@joennlae ah sorry, perhaps I misremembered the logic; I didn't think those sampled tokens could end up getting used. I'll check it again, but if you're right, it makes sense to skip that seed optimization.
Adding /ready to kick off the tests and verify nothing else fails from this.
@njhill: see vllm/model_executor/layers/rejection_sampler.py, lines 183 to 189 at commit 3a763ba.
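For context, the lines referenced there implement the standard speculative-decoding acceptance test; a paraphrased sketch follows (names and shapes are illustrative, not the exact vLLM code):

```python
import torch

def accept_mask(target_probs: torch.Tensor,    # [batch, k, vocab]
                draft_probs: torch.Tensor,     # [batch, k, vocab]
                draft_token_ids: torch.Tensor  # [batch, k]
                ) -> torch.Tensor:
    """Accept each proposed token with probability min(1, p_target / q_draft)."""
    # Probability each model assigns to the proposed token.
    p = target_probs.gather(-1, draft_token_ids.unsqueeze(-1)).squeeze(-1)
    q = draft_probs.gather(-1, draft_token_ids.unsqueeze(-1)).squeeze(-1)
    # Compare a uniform draw against the acceptance probability.
    u = torch.rand_like(p)
    return u < torch.clamp(p / q, max=1.0)
```

If the draft probabilities q are shaped by sampling parameters while the target probabilities p are not, the ratio p / q in this test is computed over mismatched distributions, which is exactly the inconsistency this PR addresses.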
Hi, cc @tdoublep, who made the change for respecting per-request seeds in the spec-decode worker. @tdoublep, can you PTAL and see whether this change impacts the per-request seeding logic? @jeongin601, there is one test failure in the spec_decoding tests (test_many_k[1-32-2-test_llm_kwargs3-baseline_llm_kwargs0-per_test_common_llm_kwargs0-common_llm_kwargs0]). I ran the test locally and it passes, and from the failure logs it seems transient. Can you please trigger the tests once more to see if it passes?
Thank you @sroy745, your comments helped me check this correctly. I found that this PR also corrects the seed for … I also confirmed that this section remains unchanged by this PR and is already using the correct sampling parameters. This PR cannot affect the …
What I suspect is that the number of sampling attempts made with the same seed may have changed due to this PR. This could affect the output because it causes the generator to consume a different part of the random stream. If that's the case, I believe the outcome is not incorrect, but we need to verify it.
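A small standalone illustration of that effect (not vLLM code): with a fixed seed, consuming a different number of draws before the final draw changes which value the final draw sees.

```python
import torch

g = torch.Generator().manual_seed(1234)
_ = torch.rand(3, generator=g)   # consumes 3 values from the stream
a = torch.rand(1, generator=g)   # the 4th value

g = torch.Generator().manual_seed(1234)
_ = torch.rand(5, generator=g)   # consumes 5 values instead
b = torch.rand(1, generator=g)   # the 6th value

print(a, b)  # differ: same seed, different position in the random stream
```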
FIX #9834
Problem
The current BatchExpansionTop1Scorer implements a speculative scoring mechanism that uses batch expansion to estimate the probabilities of speculative tokens under the scoring (target) model. However, in the existing setup, SequenceGroupMetadata applies default sampling parameters (top_p=1.0, temperature=1.0, repetition_penalty=1.0) when generating target probabilities. According to comments in the code, this choice was made because the sampled tokens are not used directly.
Modification
Although we do not directly sample tokens from the target model while scoring, applying consistent sampling parameters to both draft and target probabilities is essential for accurate rejection sampling. The current implementation uses draft probabilities shaped by the sampling parameters (e.g. filtered by top_p), while the target probabilities are not, and this mismatch can distort the acceptance decision. Because the unmodified target probabilities do not represent the distribution the model would actually sample from, I modified the code to apply the same sampling parameters to both draft and target probabilities for consistency in rejection sampling.
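A minimal sketch of the idea, using a helper name of my own (apply_sampling_params is illustrative, not the vLLM API): apply the identical temperature and top-p transform to both sets of logits before the rejection-sampling step.

```python
import torch

def apply_sampling_params(logits: torch.Tensor,
                          temperature: float,
                          top_p: float) -> torch.Tensor:
    """Illustrative: temperature scaling + top-p (nucleus) filtering, then renormalize."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p
    # (the top-1 token is always kept).
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)
    return probs / probs.sum(dim=-1, keepdim=True)

# With this PR's idea, both distributions get the same treatment:
# draft_probs  = apply_sampling_params(draft_logits,  temperature=0.7, top_p=0.9)
# target_probs = apply_sampling_params(target_logits, temperature=0.7, top_p=0.9)
```

Applying the same transform to both sides keeps the acceptance ratio p/q well defined over the distributions that are actually used for sampling.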
In my experiment, this change resulted in a significant difference in the acceptance rate, as shown in the figures below.
Experiment
Setting
(experiment configuration figure omitted)
As-Is
(acceptance-rate figure omitted)
To-be (applied in this PR)
(acceptance-rate figure omitted)