
enabling multi-node serving on Gaudi ray cluster #218

Closed

Conversation

vishnumadhu365

This PR enables multi-node serving on a Ray cluster. Tested with meta-llama/Meta-Llama-3.1-405B-Instruct.
Originally implemented in SW-195705

Steps to test:

  1. Install vllm-fork on the master and worker nodes
  2. Launch the Ray cluster.
  3. Run the benchmark as below (e.g. for 2 nodes, 16 Gaudi cards)
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
export PT_HPUGRAPH_DISABLE_TENSOR_CACHE=0
export GLOO_SOCKET_IFNAME=eth0

python benchmarks/benchmark_throughput.py --backend vllm --input-len 128 --output-len 128 \
--model meta-llama/Meta-Llama-3.1-405B-Instruct --device hpu \
--dtype bfloat16 --tensor-parallel-size 16 --num-prompts 10
  4. Serve the model
vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 16 --pipeline-parallel-size 1
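
Step 2 above is not expanded in the PR description; a minimal sketch of launching the Ray cluster, assuming the default Ray port 6379 and that the head node's IP is reachable from the workers (`<head_node_ip>` is a placeholder to fill in):

```shell
# On the head (master) node -- start the Ray head process.
ray start --head --port=6379

# On each worker node -- join the cluster at the head's address.
# Replace <head_node_ip> with the head node's actual IP.
ray start --address=<head_node_ip>:6379

# Verify that all nodes (and their Gaudi accelerators) have registered.
ray status
```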

Contributors: @jiunnyeu-habana, @kzawora-intel, Jiafan Wang, Kobayashi-san, Nishant Agrawal
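
For reference, the node count implied by `--tensor-parallel-size` can be sanity-checked with a quick shell calculation. The 8-cards-per-node figure is an assumption matching the 2-node / 16-card example above; adjust it for your machines:

```shell
# Hypothetical sizing check: how many nodes does a given tensor-parallel
# size require, assuming CARDS_PER_NODE Gaudi cards per node?
CARDS_PER_NODE=8
TP_SIZE=16
# Ceiling division: round up so partial nodes still count as a full node.
NODES=$(( (TP_SIZE + CARDS_PER_NODE - 1) / CARDS_PER_NODE ))
echo "tensor-parallel-size ${TP_SIZE} -> ${NODES} node(s) at ${CARDS_PER_NODE} cards/node"
```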

@kzawora-intel kzawora-intel added the intel Issues or PRs submitted by Intel label Aug 29, 2024
@kzawora-intel

Code looks good, but please fix the formatting (format.sh is included with vllm).

@@ -115,7 +126,11 @@ def _init_workers_ray(self, placement_group: "PlacementGroup",
**worker_wrapper_kwargs)
else:
# Else, added to the list of workers.
self.workers.append(worker)
#self.workers.append(worker)


Commented-out code, please remove.

@michalkuligowski

Closing for now due to no update on the subject; let's reopen when needed.

Labels
intel Issues or PRs submitted by Intel
3 participants