diff --git a/llm_deploy/README.md b/llm_deploy/README.md
index 677878b3..80e3f445 100644
--- a/llm_deploy/README.md
+++ b/llm_deploy/README.md
@@ -7,37 +7,36 @@
 Go to the vllm directory and run `pip install -e .`
 
 Note https://github.com/vllm-project/vllm/issues/1283: if you hit a CUDA version error, pin the PyTorch version to "== 2.0.1" in the config file.
-
-## Deploy finetuned model on babel via vLLM
+## Setting up Babel server
 
 ### Login with SSH key
-1. Add public ed25519 key to server
+Add your public ed25519 key to the server
 ```bash
 ssh-copy-id -i ~/.ssh/id_ed25519.pub <username>@<babel-login-host>
 ```
-2. Config ~~/.ssh/config
+Configure `~/.ssh/config`
 ```bash
 Host <alias>
    HostName <babel-login-host>
    User <username>
    IdentityFile ~/.ssh/id_ed25519
 ```
-3. Login babel with SSH key
+Log in to Babel with the SSH key
 ```bash
 ssh <alias>
 ```
 
-### Connecting to compute node
-1. Jump from login node to compute node
+### Connecting to a compute node
+Jump from the login node to a compute node
 ```bash
 srun --pty bash
 ```
-2. Check if you can access the /data/folder
+Check that you can access the `/data` folder
 ```bash
 cd /data/datasets/
 ```
 
-### Config environment on compute node
-1. Install miniconda
+### Configure the environment on the compute node
+Install Miniconda
 ```bash
 wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
 bash Miniconda3-latest-Linux-x86_64.sh
@@ -46,34 +45,79 @@
 conda create --name myenv
 conda activate myenv
 # conda deactivate
 ```
-2. Install vllm packages
+Install the vLLM package
 ```bash
 conda install pip
 pip install vllm
 ```
-3. Submit gpu request and open a new terminal
+Install the FastChat packages
+```bash
+conda install pip
+git clone https://github.com/lm-sys/FastChat.git
+cd FastChat
+pip3 install --upgrade pip
+pip3 install "fschat[model_worker,webui]"
+```
+Submit a GPU request and open an interactive terminal
 ```bash
 srun --gres=gpu:1 --time=1-00:00:00 --mem=80G --pty $SHELL
 conda activate myenv
 ```
-4. Useful commands for checking gpu jobs
+Some useful commands for checking GPU jobs
 ```bash
 # check slurm status
 squeue -l
 # check gpu status
 nvidia-smi
+# check gpu usage
+pip install gpustat
+watch -n 1 gpustat
 # cancel slurm jobs
 scancel job_id
 # connect to compute node directly
 ssh -J babel babel-x-xx
 ```
-### Host vLLM instance and run inference on server
-1. Start vLLM surver with model checkpoint
+### Install cuda-toolkit (optional)
+Due to a known vLLM issue (https://github.com/vllm-project/vllm/issues/1283), we need cuda-toolkit 11.7.0, which is compatible with PyTorch 2.0.1.
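+Before installing the toolkit, it helps to confirm which CUDA build your installed PyTorch expects (a quick sanity check; assumes `torch` is already installed in the active conda environment):
+```bash
+# print the torch version and the CUDA version it was built against
+python -c "import torch; print(torch.__version__, torch.version.cuda)"
+# for this setup you want to see 2.0.1 and 11.7
+```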
+Install cuda-toolkit 11.7.0 in the conda environment
+```bash
+conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
+```
+Check the cuda-toolkit version
+```bash
+nvcc -V
+```
+
+## Deploy models on Babel via FastChat API server
+Run the following commands in three separate interactive terminals:
+```bash
+python3 -m fastchat.serve.controller
+python3 -m fastchat.serve.model_worker --model-path model-checkpoint
+python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
+```
+Call the model checkpoint API
+```bash
+curl http://localhost:8000/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "model-checkpoint",
+    "prompt": "San Francisco is a",
+    "max_tokens": 7,
+    "temperature": 0
+  }'
+```
+*Sample output:*
+```JSON
+{"id":"cmpl-GGvKBiZFdFLzPq2HdtuxbC","object":"text_completion","created":1698692212,"model":"checkpoint-4525","choices":[{"index":0,"text":"city that is known for its icon","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"total_tokens":11,"completion_tokens":6}}
+```
+
+## Deploy models on Babel via vLLM API server
+Start the vLLM server with a model checkpoint
 ```bash
 python -m vllm.entrypoints.openai.api_server --model model_checkpoint/
 ```
-1. Call model checkpoint API
+Call the model checkpoint API
 ```bash
 curl http://localhost:8000/v1/models
 ```
@@ -81,12 +125,12 @@
 ```JSON
 {"object":"list","data":[{"id":"Mistral-7B-Instruct-v0.1/","object":"model","created":1697599903,"owned_by":"vllm","root":"Mistral-7B-Instruct-v0.1/","parent":null,"permission":[{"id":"modelperm-d415ecf6362a4f818090eb6428e0cac9","object":"model_permission","created":1697599903,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
 ```
-2. Inference model checkpoint API
+Run inference against the model checkpoint API
 ```bash
 curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
-    "model": "model_checkpoint/",
+    "model": "model_checkpoint",
     "prompt": "San Francisco is a",
     "max_tokens": 7,
     "temperature": 0
@@ -97,31 +141,31 @@
   }'
 ```
 *Sample output:*
 ```JSON
 {"id":"cmpl-bf7552957a8a4bd89186051c40c52de4","object":"text_completion","created":3600699,"model":"Mistral-7B-Instruct-v0.1/","choices":[{"index":0,"text":" city that is known for its icon","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}
 ```
 
-### Access deployed babel server on a local machine
-1. Construct ssh tunnel between babel login node and babel compute node with hosted model
+## Access deployed Babel server on a local machine
+Construct an SSH tunnel between the Babel login node and the compute node hosting the model
 ```bash
 ssh -N -L 7662:localhost:8000 username@babel-x-xx
 ```
 The above command creates a localhost:7662 server on the Babel login node which connects to localhost:8000 on the compute node.
-2. Construct ssh tunnel between local machine and babel login node
+Construct an SSH tunnel between your local machine and the Babel login node
 ```bash
 ssh -N -L 8001:localhost:7662 username@<babel-login-host>
 ```
 The above command creates a localhost:8001 server on your local machine which connects to localhost:7662 on the Babel login node.
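+The two tunnels can usually be collapsed into a single command by using the login node as a jump host (a convenience sketch, not Babel-specific documentation; `<babel-login-host>` and `babel-x-xx` are placeholders for your login host and compute node):
+```bash
+# forward local port 8001 directly to port 8000 on the compute node, jumping via the login node
+ssh -N -J username@<babel-login-host> -L 8001:localhost:8000 username@babel-x-xx
+```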
-3. Call hosted model on local machine
+Call the hosted model on the local machine
 ```bash
 curl http://localhost:8001/v1/models
 ```
 If the above command runs successfully, you should be able to use the REST API on your local machine.
-4. (optional) If you fail in building the ssh tunnel, you may add `-v` to the ssh command to see what went wrong.
+(Optional) If building the SSH tunnel fails, add `-v` to the ssh command to see what went wrong.
 
-### Userful resource links for babel
+## Useful resource links for Babel
 1. https://hpc.lti.cs.cmu.edu/wiki/index.php?title=BABEL#Cluster_Architecture
 2. https://hpc.lti.cs.cmu.edu/wiki/index.php?title=VSCode
 3. https://hpc.lti.cs.cmu.edu/wiki/index.php?title=Training_Material
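+
+## End-to-end check from the local machine
+With the tunnels up, you can send the same completion request used earlier through the tunnel (a sketch; it assumes the FastChat deployment above, and the "model" value should match whatever `curl http://localhost:8001/v1/models` reports):
+```bash
+curl http://localhost:8001/v1/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "model-checkpoint",
+    "prompt": "San Francisco is a",
+    "max_tokens": 7,
+    "temperature": 0
+  }'
+```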