
Commit

Add vllm deploy (#45)
* support qlora

* upload dummy conversation data

* delete doc and docker

* update pyproject pip install package

* continue cleaning

* delete more files

* delete a format

* add llm_deploy

* add testing scripts

* update deployment readme

* update readme and fix some bug

* finalize the inference and deployment based on vllm
lwaekfjlk authored Oct 11, 2023
1 parent 9c5205b commit 23a19f6
Showing 5 changed files with 33 additions and 2 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -4,5 +4,6 @@ We split our overall framework into multiple parts

1. Data Processing --> Output general form of sotopia train and test data
2. Together AI Finetuning --> Input the train and test data / Output model checkpoint
- 3. LLM Finetuning --> Input the train and test data / Output model checkpoint + deployed model API form
- 4. Eval --> Input model checkpoint / Output evaluation scores
+ 3. LLM Finetuning --> Input the train and test data / Output model checkpoint
+ 4. LLM Deployment --> Input LLM finetuned model checkpoint / Output deployable OpenAI-type API
+ 5. Eval --> Input model checkpoint / Output evaluation scores
5 changes: 5 additions & 0 deletions llm_deploy/README.md
@@ -0,0 +1,5 @@
We need to use an unmerged branch to support deploying a LoRA-finetuned model (the forked repo is https://github.com/troph-team/vllm.git).

Go to the vllm directory and run `pip install -e .`.

Note https://github.com/vllm-project/vllm/issues/1283: if you hit a CUDA version error, pin the PyTorch version to "== 2.0.1" in the config file.
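A minimal install sketch of the steps above (assumptions: the LoRA-deployment support lives on the fork's default branch, and installing a pinned torch before the editable install is an acceptable substitute for editing vllm's config file; adjust both to your setup):

# Hypothetical install sequence for the forked vllm with LoRA deployment support.
git clone https://github.com/troph-team/vllm.git
cd vllm
# Workaround for https://github.com/vllm-project/vllm/issues/1283: pin PyTorch to 2.0.1
# if a CUDA version error shows up (the README instead edits the config file).
pip install "torch==2.0.1"
pip install -e .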
1 change: 1 addition & 0 deletions llm_deploy/requirements.txt
@@ -0,0 +1 @@
vllm
1 change: 1 addition & 0 deletions llm_deploy/vllm_deploy.sh
@@ -0,0 +1 @@
python -m vllm.entrypoints.openai.api_server --model ../llm_ft/vicuna-7b-1.5
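Once the server above is running, the deployment can be smoke-tested against its OpenAI-compatible completions endpoint. A minimal sketch, assuming the server's default host and port (localhost:8000) and that the model name matches the --model path passed above:

# Query the OpenAI-type API exposed by the vLLM server.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "../llm_ft/vicuna-7b-1.5",
        "prompt": "Hello, my name is",
        "max_tokens": 32,
        "temperature": 0
      }'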
23 changes: 23 additions & 0 deletions llm_deploy/vllm_inference_test.py
@@ -0,0 +1,23 @@
from vllm import LLM, SamplingParams
from vllm.model_executor.adapters import lora

# Create an LLM; adjust gpu_memory_utilization based on the available GPU memory.
llm = LLM(model="../llm_ft/vicuna-7b-1.5", gpu_memory_utilization=0.5)

# Add the LoRA adapter from the finetuned checkpoint.
lora.LoRAModel.from_pretrained(llm.llm_engine.workers[0].model, "../llm_ft/vicuna_checkpoints/checkpoint-1200")

prompts = [
    "Hello, my name is",
    "The capital of France is",
    "The future of AI is",
]

sampling_params = SamplingParams(temperature=0, top_k=-1)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
