[v1] EngineArgs for better config handling for v1 #10382

rickyyx · 2024-11-15T23:13:38Z

This allows:

Users to switch between v0 and v1 transparently by changing a single flag, VLLM_USE_V1.
When VLLM_USE_V1 is set, v1-specific defaults are used automatically for any values users don't provide.
Easier migration in the future, where we would simply replace the old EngineArgs with V1's EngineArgs.

This PR:

Updates create_engine_config to take the usage context, which is currently needed to update the v1 args.
Adds _override_v1_args to override some of the EngineArgs values before the engine config is created.
Adds _override_v1_configs to override the generated engine config.
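The two override hooks described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `SketchEngineArgs`, `SketchConfig`, the `UsageContext` members, the `enable_chunked_prefill` field, and every numeric default are hypothetical stand-ins. Only the names `create_engine_config`, `_override_v1_args`, `_override_v1_configs`, `usage_context`, and `VLLM_USE_V1` come from the description.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class UsageContext(Enum):
    # Hypothetical stand-in for the usage context passed by the caller.
    LLM_CLASS = "llm_class"
    API_SERVER = "api_server"


@dataclass
class SketchConfig:
    # Hypothetical, heavily reduced engine config.
    max_num_batched_tokens: int
    enable_chunked_prefill: bool


@dataclass
class SketchEngineArgs:
    # None means "the user did not provide a value".
    max_num_batched_tokens: Optional[int] = None

    def _override_v1_args(self, usage_context: UsageContext) -> None:
        # Fill in v1-specific defaults only where the user gave nothing.
        # The numbers are placeholders, not vLLM's real defaults.
        if self.max_num_batched_tokens is None:
            defaults = {
                UsageContext.LLM_CLASS: 8192,
                UsageContext.API_SERVER: 2048,
            }
            self.max_num_batched_tokens = defaults[usage_context]

    def _override_v1_configs(self, config: SketchConfig) -> None:
        # Adjust the already-built config (placeholder behavior).
        config.enable_chunked_prefill = True

    def create_engine_config(self, usage_context: UsageContext) -> SketchConfig:
        use_v1 = os.environ.get("VLLM_USE_V1", "0") == "1"
        if use_v1:
            # Step 1: patch the args before the config is built.
            self._override_v1_args(usage_context)
        tokens = self.max_num_batched_tokens
        if tokens is None:
            tokens = 512  # placeholder v0 default
        config = SketchConfig(
            max_num_batched_tokens=tokens,
            enable_chunked_prefill=False,
        )
        if use_v1:
            # Step 2: patch the generated config.
            self._override_v1_configs(config)
        return config
```

In this sketch a value the user supplies explicitly is never overwritten; the usage context only matters when a default has to be chosen.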

github-actions · 2024-11-15T23:13:50Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

comaniac

It's pretty clean to me!
@WoosukKwon please review and see whether this format is what you want. Also, what's the current best practice for testing this in v1?

comaniac · 2024-11-15T23:16:45Z

vllm/engine/arg_utils.py

+        assert (
+            usage_context is not None
+        ), "usage_context must be provided for V1EngineArgs"


@WoosukKwon We need to pass usage_context because the default value depends on it, but this argument looks a bit weird to me. Do you have a better way to decide the default max_num_batched_tokens?

vllm/engine/arg_utils.py

comaniac

LGTM. cc @WoosukKwon @robertgshaw2-neuralmagic

.buildkite/test-pipeline.yaml

comaniac · 2024-11-20T20:02:48Z

@rickyyx could you rebase and see if the errors go away?

mergify · 2024-11-20T22:28:18Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rickyyx.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Signed-off-by: rickyx <rickyx@anyscale.com>

rickyyx · 2024-11-21T08:44:49Z

Test failures look related - taking a look

rickyyx · 2024-11-23T23:43:28Z

Test failures look unrelated

youkaichao · 2024-11-24T00:37:15Z

vllm/engine/arg_utils.py

+if envs.VLLM_USE_V1:
+    # Overwrite EngineArgs to use EngineArgsV1
+    # This has to be done before `AsyncEngineArgs` is imported.
+    EngineArgs = EngineArgsV1  # type: ignore


Dynamically changing the class looks quite strange. For example, if someone wants to create both v0 engine args and v1 engine args for testing, it will not be possible under this PR.

I think you can have the v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.

> for example, if someone wants to create both a v0 engine args and v1 engine args for testing, it will not be possible under this PR.

Yeah, I think one would have to reimport the file with VLLM_USE_V1=0/1 to do this. But I am not sure how niche this use case would be.

> I think you can have v1 config override inside the create_engine_config() function, and read envs.VLLM_USE_V1 there.

Yeah, I think there might be a few issues with:

how to have different default values for v1 and v0.

how to support additional args in v1 that are not supported by v0.

But maybe there's some way to make this possible for now. Let me try.
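The suggestion to read the flag inside create_engine_config() can be illustrated with a small sketch. Everything here is an assumption made for illustration: `DemoEngineArgs`, `config_for`, and the dictionary-shaped config are hypothetical, and the flag is read straight from `os.environ` rather than through vLLM's `envs` module. The point is only that a single class can produce both v0 and v1 configs in one process when the flag is consulted at call time.

```python
import os
from dataclasses import dataclass


@dataclass
class DemoEngineArgs:
    # Hypothetical single args class shared by v0 and v1.
    model: str = "demo-model"

    def create_engine_config(self) -> dict:
        # Read the flag when the config is built, not at import time.
        use_v1 = os.environ.get("VLLM_USE_V1", "0") == "1"
        return {"model": self.model, "engine": "v1" if use_v1 else "v0"}


def config_for(flag: str) -> dict:
    # Toggle the flag for a single call, then restore the old value.
    old = os.environ.get("VLLM_USE_V1")
    os.environ["VLLM_USE_V1"] = flag
    try:
        return DemoEngineArgs().create_engine_config()
    finally:
        if old is None:
            del os.environ["VLLM_USE_V1"]
        else:
            os.environ["VLLM_USE_V1"] = old
```

A test could then call `config_for("0")` and `config_for("1")` back to back without reimporting the module, which is not possible when the class name is rebound once at import.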

rickyyx · 2024-11-24T01:08:22Z

Removed the dynamic override of the EngineArgs class. cc @youkaichao

Thanks for the suggestion.

youkaichao · 2024-11-24T01:19:16Z

vllm/engine/arg_utils.py

@@ -113,7 +114,7 @@ class EngineArgs:
    # NOTE(kzawora): default block size for Gaudi should be 128
    # smaller sizes still work, but very inefficiently
    block_size: int = 16 if not current_platform.is_hpu() else 128
-    enable_prefix_caching: bool = False
+    enable_prefix_caching: bool = bool(envs.VLLM_USE_V1)


I think this is also read at class-creation time. Changing the env var later will not affect the default value.

iirc, @WoosukKwon mentioned that enable_prefix_caching will be ignored for v1, so we can ignore this argument directly. Please check whether my understanding is correct, or whether we also support disabling it in v1.

I disagree. Even if prefix caching is enabled by default, we still need a way to disable it for purposes like testing. Of course, later on we could change the flag to "disable-prefix-cache", but we shouldn't close the door on configuration.

Then we can make it None by default, and set the real default value when we create the engine args.
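The class-creation-time issue and the None-by-default fix can be shown with plain dataclasses. This is a sketch under stated assumptions: `EagerArgs`, `LazyArgs`, `use_v1`, and `resolved_prefix_caching` are hypothetical names, and the flag is read from `os.environ` directly instead of through vLLM's `envs` module.

```python
import os
from dataclasses import dataclass
from typing import Optional

os.environ["VLLM_USE_V1"] = "0"


def use_v1() -> bool:
    return os.environ.get("VLLM_USE_V1", "0") == "1"


@dataclass
class EagerArgs:
    # Evaluated once, while the class body runs.
    enable_prefix_caching: bool = use_v1()


@dataclass
class LazyArgs:
    # None means "not set by the user"; the real default is resolved later.
    enable_prefix_caching: Optional[bool] = None

    def resolved_prefix_caching(self) -> bool:
        if self.enable_prefix_caching is None:
            return use_v1()
        return self.enable_prefix_caching


# The flag changes after both classes have been defined.
os.environ["VLLM_USE_V1"] = "1"

eager_default = EagerArgs().enable_prefix_caching  # still the old value
lazy_default = LazyArgs().resolved_prefix_caching()  # follows the flag
explicit_off = LazyArgs(enable_prefix_caching=False).resolved_prefix_caching()
```

With the None sentinel, the default tracks the flag at resolution time, while an explicit False from the user is still respected, which keeps the option to disable prefix caching open.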

rickyyx · 2024-11-24T22:26:20Z

Test failures should be unrelated.

comaniac · 2024-11-24T23:12:29Z

Hand over to @youkaichao for final review and force merge.

youkaichao · 2024-11-25T06:34:44Z

can you merge main to see if these errors disappear?

mergify · 2024-11-25T07:41:25Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @rickyyx.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

rickyyx requested review from WoosukKwon, zhuohan123, youkaichao, alexm-neuralmagic, comaniac and njhill as code owners November 15, 2024 23:13

mergify bot added the ci/build label Nov 15, 2024

comaniac reviewed Nov 15, 2024

russellb suggested changes Nov 16, 2024

rickyyx requested review from robertgshaw2-neuralmagic and ywang96 as code owners November 16, 2024 20:23

comaniac approved these changes Nov 19, 2024

rickyyx force-pushed the pr-v1-default branch from d3ee119 to db20919 Compare November 20, 2024 22:27

rickyyx requested review from mgoin, tlrmchlsmth, DarkLight1337 and simon-mo as code owners November 20, 2024 22:27

mergify bot added the documentation and frontend labels Nov 20, 2024

mergify bot added the needs-rebase label Nov 20, 2024

new

c3efa25

Signed-off-by: rickyx <rickyx@anyscale.com>

rickyyx force-pushed the pr-v1-default branch from db20919 to c3efa25 Compare November 20, 2024 22:33

mergify bot removed the needs-rebase label Nov 20, 2024

comaniac added the ready label Nov 20, 2024

fix

376a7d2

Signed-off-by: rickyx <rickyx@anyscale.com>

rickyyx changed the title ~~[v1] V1EngineArgs for better config handling~~ [v1] EngineArgsV1 for better config handling Nov 22, 2024

up

b15d4f9

Signed-off-by: rickyx <rickyx@anyscale.com>

youkaichao reviewed Nov 24, 2024

rickyyx marked this pull request as draft November 24, 2024 00:44

comments

a102588

Signed-off-by: rickyx <rickyx@anyscale.com>

rickyyx changed the title ~~[v1] EngineArgsV1 for better config handling~~ [v1] EngineArgs for better config handling for v1 Nov 24, 2024

rickyyx marked this pull request as ready for review November 24, 2024 01:07

nits

26a6540

Signed-off-by: rickyx <rickyx@anyscale.com>

rickyyx requested a review from comaniac November 24, 2024 01:11

youkaichao reviewed Nov 24, 2024

comments

31927bb

Signed-off-by: rickyx <rickyx@anyscale.com>

youkaichao approved these changes Nov 25, 2024

mergify bot added the needs-rebase label Nov 25, 2024

merged

5191da7

Signed-off-by: rickyx <rickyx@anyscale.com>

mergify bot removed the needs-rebase label Nov 25, 2024
