[v1] EngineArgs for better config handling for v1 #10382
base: main
It's pretty clean to me!
@WoosukKwon please review to see whether this format is what you want. Also, what's the current best practice for testing this in v1?
vllm/engine/arg_utils.py
Outdated
assert (
    usage_context is not None
), "usage_context must be provided for V1EngineArgs"
@WoosukKwon We need to pass usage_context because the default value depends on it, but this argument looks a bit weird to me. Do you have a better way to decide the default max_num_batched_tokens?
LGTM. cc @WoosukKwon @robertgshaw2-neuralmagic
@rickyyx could you rebase and see if the errors go away?
d3ee119 to db20919
This pull request has merge conflicts that must be resolved before it can be
db20919 to c3efa25
Test failures look related - taking a look
Test failures look unrelated
vllm/engine/arg_utils.py
Outdated
if envs.VLLM_USE_V1:
    # Overwrite EngineArgs to use EngineArgsV1
    # This has to be done before `AsyncEngineArgs` is imported.
    EngineArgs = EngineArgsV1  # type: ignore
Dynamically changing the class looks quite strange. For example, if someone wants to create both v0 engine args and v1 engine args for testing, that will not be possible under this PR.
I think you can apply the v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.
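A minimal sketch of this suggestion, assuming a heavily simplified stand-in for EngineArgs. The `_use_v1` helper, the dict-based config, and the override shown are hypothetical and not vLLM's actual code. The point is only that the flag is read inside create_engine_config() at call time, so no class needs to be swapped at import time.

```python
import os
from dataclasses import dataclass


def _use_v1() -> bool:
    # Hypothetical stand-in for envs.VLLM_USE_V1, read at call time.
    return os.environ.get("VLLM_USE_V1", "0") == "1"


@dataclass
class EngineArgs:
    # Simplified stand-in; the real class has many more fields.
    enable_prefix_caching: bool = False

    def create_engine_config(self) -> dict:
        # A plain dict stands in for the real engine config object.
        config = {"enable_prefix_caching": self.enable_prefix_caching}
        if _use_v1():
            # Illustrative v1 override applied while building the config.
            config["enable_prefix_caching"] = True
        return config
```

With this shape, a test can flip the environment variable between two calls and obtain a v0 config and a v1 config from the same class in one process.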
> for example, if someone wants to create both a v0 engine args and v1 engine args for testing, it will not be possible under this PR.
Yeah, I think one will have to reimport the file with VLLM_USE_V1=0/1 to do this. But I am not sure how niche this use case would be.
> I think you can have v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.
Yeah, I think there might be a few issues with:
- how to have different default values for v1 and v0.
- how to support additional args in v1 that are not supported by v0.
But maybe there's some way to make this possible for now. Let me try.
Remove the dynamic override of Thanks for the suggestion.
vllm/engine/arg_utils.py
Outdated
@@ -113,7 +114,7 @@ class EngineArgs:
     # NOTE(kzawora): default block size for Gaudi should be 128
     # smaller sizes still work, but very inefficiently
     block_size: int = 16 if not current_platform.is_hpu() else 128
-    enable_prefix_caching: bool = False
+    enable_prefix_caching: bool = bool(envs.VLLM_USE_V1)
I think this is also read at class-creation time, so changing the env var later will not affect the default value.
IIRC, @WoosukKwon mentioned that enable_prefix_caching will be ignored for v1, and we can ignore this argument directly. Please check whether my understanding is correct, or whether we also support disabling it in v1.
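The class-creation-time concern can be reproduced with a small standalone sketch. `DEMO_USE_V1` is a hypothetical environment variable standing in for VLLM_USE_V1, and `Args` is a stand-in class, not vLLM's EngineArgs.

```python
import os
from dataclasses import dataclass

os.environ["DEMO_USE_V1"] = "0"  # hypothetical flag, stands in for VLLM_USE_V1


@dataclass
class Args:
    # The default expression is evaluated once, when the class body runs.
    enable_prefix_caching: bool = os.environ["DEMO_USE_V1"] == "1"


os.environ["DEMO_USE_V1"] = "1"  # changed after the class was created

# The default was frozen at class creation, so the later change is not seen.
late_default = Args().enable_prefix_caching
```

Here `late_default` keeps the value computed when the class was defined, which is the behavior the comment above warns about.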
I disagree. Even if prefix caching is enabled by default, we still need a way to disable it for purposes like testing. Of course, later on we could change the flag to "disable-prefix-cache", but we shouldn't close the door on configuration.
Then we can make it None by default, and set the real default value when we create the engine args.
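One way to read this proposal, sketched with a stand-in class: None means "not set by the user", and the real default is resolved when the args object is created. The use of `__post_init__` and the `DEMO_USE_V1` variable are assumptions made for illustration; the PR may resolve the default elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Args:
    # None means "not set by the user"; resolved in __post_init__.
    enable_prefix_caching: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.enable_prefix_caching is None:
            # Hypothetical flag name, read when the instance is created.
            self.enable_prefix_caching = (
                os.environ.get("DEMO_USE_V1", "0") == "1"
            )
```

An explicit `Args(enable_prefix_caching=False)` is left untouched, so prefix caching can still be disabled for testing even when the resolved default is True.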
Test failures should be unrelated.
Hand over to @youkaichao for final review and force merge.
Can you merge main to see if these errors disappear?
This pull request has merge conflicts that must be resolved before it can be
This allows:
VLLM_USE_V1
This PR:
- create_engine_config to include usage context, which is currently needed for the v1 args update.
- _override_v1_args to override some of the EngineArgs values before creation of the engine config.
- _override_v1_configs to override the generated engine config.
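The flow described in this summary can be sketched roughly as follows. Only the three method names and the usage_context assertion come from the PR; the `use_v1` field, the dict-based config, the context string, and the numeric values are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineArgs:
    max_num_batched_tokens: Optional[int] = None
    use_v1: bool = False  # hypothetical field; the PR reads envs.VLLM_USE_V1

    def create_engine_config(self, usage_context: Optional[str] = None) -> dict:
        if self.use_v1:
            self._override_v1_args(usage_context)
        # A plain dict stands in for the real engine config object.
        config = {"max_num_batched_tokens": self.max_num_batched_tokens}
        if self.use_v1:
            self._override_v1_configs(config)
        return config

    def _override_v1_args(self, usage_context: Optional[str]) -> None:
        # Step 1: adjust argument values before the config is built.
        assert usage_context is not None, "usage_context must be provided"
        if self.max_num_batched_tokens is None:
            # Placeholder numbers, chosen only for illustration.
            is_offline = usage_context == "llm_class"
            self.max_num_batched_tokens = 8192 if is_offline else 2048

    def _override_v1_configs(self, config: dict) -> None:
        # Step 2: adjust the generated config in place.
        config["v1"] = True  # hypothetical marker key
```

The v0 path skips both hooks, so `EngineArgs().create_engine_config()` needs no usage context in this sketch.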