
enabling multi-node serving on Gaudi ray cluster #218

Closed

Conversation

vishnumadhu365

This PR enables multi-node serving on a Ray cluster. Tested with meta-llama/Meta-Llama-3.1-405B-Instruct.
Originally implemented in SW-195705

Steps to test:

  1. Install vllm-fork on the master and worker nodes
  2. Launch the Ray cluster.
  3. Run the benchmark as below (e.g. for 2 nodes, 16 Gaudi cards)
export PT_HPU_ENABLE_LAZY_COLLECTIVES=true
export PT_HPUGRAPH_DISABLE_TENSOR_CACHE=0
export GLOO_SOCKET_IFNAME=eth0

python benchmarks/benchmark_throughput.py --backend vllm --input-len 128 --output-len 128 \
--model meta-llama/Meta-Llama-3.1-405B-Instruct --device hpu \
--dtype bfloat16 --tensor-parallel-size 16 --num-prompts 10
  4. Serve the model
vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 16 --pipeline-parallel-size 1
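
Step 2 above is not expanded in the PR description; a minimal sketch of launching the Ray cluster, assuming the default Ray port 6379 and that the head node's IP is reachable from the workers (`<head_node_ip>` is a placeholder to fill in):

```shell
# On the head (master) node -- start the Ray head process.
ray start --head --port=6379

# On each worker node -- join the cluster at the head's address.
# Replace <head_node_ip> with the head node's actual IP.
ray start --address=<head_node_ip>:6379

# Verify that all nodes (and their Gaudi accelerators) have registered.
ray status
```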

Contributors: @jiunnyeu-habana, @kzawora-intel, Jiafan Wang, Kobayashi-san, Nishant Agrawal
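
For reference, the node count implied by `--tensor-parallel-size` can be sanity-checked with a quick shell calculation. The 8-cards-per-node figure is an assumption matching the 2-node / 16-card example above; adjust it for your machines:

```shell
# Hypothetical sizing check: how many nodes does a given tensor-parallel
# size require, assuming CARDS_PER_NODE Gaudi cards per node?
CARDS_PER_NODE=8
TP_SIZE=16
# Ceiling division: round up so partial nodes still count as a full node.
NODES=$(( (TP_SIZE + CARDS_PER_NODE - 1) / CARDS_PER_NODE ))
echo "tensor-parallel-size ${TP_SIZE} -> ${NODES} node(s) at ${CARDS_PER_NODE} cards/node"
```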

@kzawora-intel kzawora-intel added the intel Issues or PRs submitted by Intel label Aug 29, 2024
@kzawora-intel

Code looks good, but please fix the formatting (format.sh is included with vllm).

@@ -115,7 +126,11 @@ def _init_workers_ray(self, placement_group: "PlacementGroup",
**worker_wrapper_kwargs)
else:
# Else, added to the list of workers.
self.workers.append(worker)
#self.workers.append(worker)


Commented-out code, please remove.

@michalkuligowski

Closing for now due to no update on the subject; let's reopen when needed.

Labels
intel Issues or PRs submitted by Intel
3 participants