Set a missing VLLM_ARG to accelerator_count.
The main Llama2 deployment instructions are pulled from this notebook: https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_pytorch_llama2_deployment.ipynb, which specifies that `tensor-parallel-size` be set to accelerator_count. I initially hardcoded it to 1, and it now needs to follow accelerator_count, which can be 8 for Llama2 70b.

PiperOrigin-RevId: 678756743
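A minimal sketch of the change described above, assuming a helper that assembles the vLLM server flags (the function and variable names here are illustrative, not the actual deployment code):

```python
def build_vllm_args(accelerator_count: int) -> list:
    """Build vLLM flags with tensor parallelism tied to the GPU count.

    Previously `--tensor-parallel-size` was hardcoded to 1; deriving it
    from accelerator_count lets the same code serve Llama2 70b on 8 GPUs.
    """
    return [
        f"--tensor-parallel-size={accelerator_count}",
    ]

# Single-GPU deployment vs an 8-GPU Llama2 70b deployment.
print(build_vllm_args(1))  # ['--tensor-parallel-size=1']
print(build_vllm_args(8))  # ['--tensor-parallel-size=8']
```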