Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[v1] EngineArgs for better config handling for v1 #10382

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

rickyyx
Copy link
Contributor

@rickyyx rickyyx commented Nov 15, 2024

This allows:

  1. Users to use v0/v1 transparently with the simple flag change VLLM_USE_V1
  2. When VLLM_USE_V1 sets, v1-specific defaults will be automatically used if users don't provide them.
  3. This also allows easier migration in the future where we would simply replace the old EngineArgs with the V1's EngineArgs

This PRs:

  • updates create_engine_config to include usage context, which is currently needed for v1 arg's update.
  • It adds _override_v1_args to override some of the EngineArg's value before creation of engine config
  • It adds _override_v1_configs to override the generated engine config.

Copy link

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the ci/build label Nov 15, 2024
Copy link
Collaborator

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's pretty clean to me!
@WoosukKwon please review to see if this format is desired to you. Also what's the current best practice to test this in v1?

Comment on lines 1234 to 1236
assert (
usage_context is not None
), "usage_context must be provided for V1EngineArgs"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WoosukKwon We need to pass usage_context because the default value depends on it, but this argument looks a bit weird to me. Do you have a better way to decide the default max_num_batched_tokens?

vllm/engine/arg_utils.py Outdated Show resolved Hide resolved
vllm/engine/arg_utils.py Outdated Show resolved Hide resolved
vllm/engine/arg_utils.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@comaniac comaniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.buildkite/test-pipeline.yaml Show resolved Hide resolved
@comaniac
Copy link
Collaborator

@rickyyx could you rebase and see if the errors go away?

Copy link

mergify bot commented Nov 20, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rickyyx.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 20, 2024
Signed-off-by: rickyx <rickyx@anyscale.com>
@mergify mergify bot removed the needs-rebase label Nov 20, 2024
@comaniac comaniac added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 20, 2024
@rickyyx
Copy link
Contributor Author

rickyyx commented Nov 21, 2024

Test failures look related - taking a look

Signed-off-by: rickyx <rickyx@anyscale.com>
@rickyyx rickyyx changed the title [v1] V1EngineArgs for better config handling [v1] EngineArgsV1 for better config handling Nov 22, 2024
Signed-off-by: rickyx <rickyx@anyscale.com>
@rickyyx
Copy link
Contributor Author

rickyyx commented Nov 23, 2024

Test failures look unrelated

Comment on lines 1214 to 1217
if envs.VLLM_USE_V1:
# Overwrite EngineArgs to use EngineArgsV1
# This has to be done before `AsyncEngineArgs` is imported.
EngineArgs = EngineArgsV1 # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dynamically changing the class looks quite strange. for example, if someone wants to create both a v0 engine args and v1 engine args for testing, it will not be possible under this PR.

I think you can have v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.

Copy link
Contributor Author

@rickyyx rickyyx Nov 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example, if someone wants to create both a v0 engine args and v1 engine args for testing, it will not be possible under this PR.

Yeah, I think one will have to reimport the file with VLLM_USE_V1=0/1 to do this. But I am not sure how niche this usecase would be.

I think you can have v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.

Yeah, I think the there might be a few issues with:

  1. how to have different default value for v1 and v0.
  2. how to support additional args in v1 not supported by v0.

But maybe there's some way to make this possible for now. Let me try.

@rickyyx rickyyx marked this pull request as draft November 24, 2024 00:44
Signed-off-by: rickyx <rickyx@anyscale.com>
@rickyyx rickyyx changed the title [v1] EngineArgsV1 for better config handling [v1] EngineArgs for better config handling for v1 Nov 24, 2024
@rickyyx rickyyx marked this pull request as ready for review November 24, 2024 01:07
@rickyyx
Copy link
Contributor Author

rickyyx commented Nov 24, 2024

Remove the dynamic override of EngineArg class. cc @youkaichao

Thanks for the suggestion.

Signed-off-by: rickyx <rickyx@anyscale.com>
@@ -113,7 +114,7 @@ class EngineArgs:
# NOTE(kzawora): default block size for Gaudi should be 128
# smaller sizes still work, but very inefficiently
block_size: int = 16 if not current_platform.is_hpu() else 128
enable_prefix_caching: bool = False
enable_prefix_caching: bool = bool(envs.VLLM_USE_V1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is also read in class-creation time. changing the env var later will not affect the default value.

iirc, @WoosukKwon mention that enable_prefix_caching will be ignored for v1, and we can ignore this argument directly. please check if my understanding is correct, or we also support disabling it in v1.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree. Even prefix caching is enabled by default, we still need a way to disable it for testing like purposes. ofc later on we could change the flag to "disable-prefix-cache", but we shouldn't close the door of configuration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then we can make it None by default, and set the real default value when we create the engine args.

Signed-off-by: rickyx <rickyx@anyscale.com>
@rickyyx
Copy link
Contributor Author

rickyyx commented Nov 24, 2024

Test failures should be unrelated.

@comaniac
Copy link
Collaborator

Hand over to @youkaichao for final review and force merge.

@youkaichao
Copy link
Member

can you merge main to see if these errors disappear?

Copy link

mergify bot commented Nov 25, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rickyyx.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 25, 2024
Signed-off-by: rickyx <rickyx@anyscale.com>
@mergify mergify bot removed the needs-rebase label Nov 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci/build documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants