From a2ba1c7127b96f4d14e1d79529e1f973c0fde3ee Mon Sep 17 00:00:00 2001
From: Matthias Reso <13337103+mreso@users.noreply.github.com>
Date: Fri, 23 Aug 2024 17:17:46 -0700
Subject: [PATCH] Update quickstart llm docker in serve/readme; added
 ts.llm_launcher example (#3300)

* Update quickstart llm docker readme; added ts.llm_launcher example

* fix wording
---
 README.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ae15547091..766b3f4e45 100644
--- a/README.md
+++ b/README.md
@@ -62,13 +62,24 @@ Refer to [torchserve docker](docker/README.md) for details.
 
 ### 🤖 Quick Start LLM Deployment
 
+```bash
+# Make sure to install torchserve with pip or conda as described above and login with `huggingface-cli login`
+python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
+
+# Try it out
+curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
+```
+
+### 🚢 Quick Start LLM Deployment with Docker
+
 ```bash
 #export token=
 docker build --pull . -f docker/Dockerfile.llm -t ts/llm
 
 docker run --rm -ti --shm-size 10g --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:8080 -v data:/data ts/llm --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth
 
-curl -X POST -d '{"prompt":"Hello, my name is", "max_new_tokens": 50}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model"
+# Try it out
+curl -X POST -d '{"model":"meta-llama/Meta-Llama-3-8B-Instruct", "prompt":"Hello, my name is", "max_tokens": 200}' --header "Content-Type: application/json" "http://localhost:8080/predictions/model/1.0/v1/completions"
 ```
 
 Refer to [LLM deployment](docs/llm_deployment.md) for details and other methods.
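The OpenAI-style completions call added by this patch can also be made from Python instead of curl. A minimal sketch using only the standard library, assuming a server started as above is listening on localhost:8080; `build_completion_request` is a hypothetical helper name, not part of TorchServe:

```python
import json
import urllib.request

def build_completion_request(model: str, prompt: str, max_tokens: int = 200):
    """Build the URL and JSON body for the OpenAI-style completions
    endpoint shown in the patch (hypothetical helper for illustration)."""
    url = "http://localhost:8080/predictions/model/1.0/v1/completions"
    body = json.dumps({"model": model, "prompt": prompt, "max_tokens": max_tokens})
    return url, body

if __name__ == "__main__":
    # Requires a running ts.llm_launcher / docker instance as shown above.
    url, body = build_completion_request(
        "meta-llama/Meta-Llama-3-8B-Instruct", "Hello, my name is"
    )
    req = urllib.request.Request(
        url, data=body.encode(), headers={"Content-Type": "application/json"}
    )
    print(urllib.request.urlopen(req).read().decode())
```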