
Commit

GPT-in-a-Box: Doc Updates (nutanix-cloud-native#54)
* GPT-in-a-Box Doc Updates:
* Replace "supported" with "validated"
* Reword parameter description to clarify that deployment metadata name is user-specified
* fix typos
lauranutanix authored Dec 20, 2023
1 parent 4eb9bb7 commit ed83ff7
Showing 10 changed files with 19 additions and 19 deletions.
2 changes: 1 addition & 1 deletion docs/gpt-in-a-box/kubernetes/v0.2/generating_mar.md
@@ -6,7 +6,7 @@ Run the following command for downloading model files and generating MAR file:
python3 $WORK_DIR/llm/generate.py [--hf_token <HUGGINGFACE_HUB_TOKEN> --repo_version <REPO_COMMIT_ID>] --model_name <MODEL_NAME> --output <NFS_LOCAL_MOUNT_LOCATION>
```

* **model_name**: Name of a [supported model](supported_models.md)
* **model_name**: Name of a [validated model](validated_models.md)
* **output**: Mount path to your nfs server to be used in the kube PV where model files and the model archive file will be stored
* **repo_version**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used)
* **hf_token**: Your HuggingFace token. Needed to download LLAMA(2) models.
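
For example, a typical invocation might look like the following; the model name `mpt_7b` and the NFS mount path `/mnt/llm` are illustrative values only:

```
python3 $WORK_DIR/llm/generate.py --model_name mpt_7b --output /mnt/llm
```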
2 changes: 1 addition & 1 deletion docs/gpt-in-a-box/kubernetes/v0.2/huggingface_model.md
@@ -1,6 +1,6 @@
# HuggingFace Model Support
!!! Note
To start the inference server for the [**Supported Models**](supported_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation.
To start the inference server for the [**Validated Models**](validated_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation.

We provide the capability to download model files from any HuggingFace repository and generate a MAR file to start an inference server using Kubeflow serving.<br />

8 changes: 4 additions & 4 deletions docs/gpt-in-a-box/kubernetes/v0.2/inference_requests.md
@@ -1,4 +1,4 @@
Kubeflow serving can be inferenced and managed through it's Inference APIs. Find out more about Kubeflow serving APIs in the official [Inference API](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#model-inference) documentation.
Kubeflow serving can be inferenced and managed through its Inference APIs. Find out more about Kubeflow serving APIs in the official [Inference API](https://kserve.github.io/website/0.8/modelserving/v1beta1/torchserve/#model-inference) documentation.

### Set HOST and PORT
The first step is to [determine the ingress IP and ports](https://kserve.github.io/website/0.8/get_started/first_isvc/#4-determine-the-ingress-ip-and-ports) and set INGRESS_HOST and INGRESS_PORT.
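
As a sketch of what this typically looks like on a cluster where KServe is exposed through an `istio-ingressgateway` LoadBalancer service (the namespace, service name, and the deployment name `llm-deploy` below are assumptions that may differ in your environment):

```
# Ingress address and port of the istio ingress gateway (assumed setup)
export INGRESS_HOST=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export INGRESS_PORT=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].port}')
# Hostname of the deployed InferenceService (replace llm-deploy with your deployment name)
export SERVICE_HOSTNAME=$(kubectl get inferenceservice llm-deploy -o jsonpath='{.status.url}' | cut -d "/" -f 3)
```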
@@ -31,15 +31,15 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http:
#### Examples:
Curl request for MPT-7B model
```
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mpt_7b/infer -d @$WORK_DIR/data/qa/sample_test1.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/mpt_7b/infer -d @$WORK_DIR/data/qa/sample_text1.json
```
Curl request for Falcon-7B model
```
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/falcon_7b/infer -d @$WORK_DIR/data/summarize/sample_test1.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/falcon_7b/infer -d @$WORK_DIR/data/summarize/sample_text1.json
```
Curl request for Llama2-7B model
```
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/llama2_7b/infer -d @$WORK_DIR/data/translate/sample_test1.json
curl -v -H "Host: ${SERVICE_HOSTNAME}" -H "Content-Type: application/json" http://${INGRESS_HOST}:${INGRESS_PORT}/v2/models/llama2_7b/infer -d @$WORK_DIR/data/translate/sample_text1.json
```

### Input data format
4 changes: 2 additions & 2 deletions docs/gpt-in-a-box/kubernetes/v0.2/inference_server.md
@@ -5,12 +5,12 @@ Run the following command for starting Kubeflow serving and running inference on
bash $WORK_DIR/llm/run.sh -n <MODEL_NAME> -g <NUM_GPUS> -f <NFS_ADDRESS_WITH_SHARE_PATH> -m <NFS_LOCAL_MOUNT_LOCATION> -e <KUBE_DEPLOYMENT_NAME> [OPTIONAL -d <INPUT_PATH> -v <REPO_COMMIT_ID> -t <HUGGINGFACE_HUB_TOKEN>]
```

* **n**: Name of a [supported model](supported_models.md)
* **n**: Name of a [validated model](validated_models.md)
* **d**: Absolute path of input data folder (Optional)
* **g**: Number of GPUs to be used for execution (set 0 to use CPU)
* **f**: NFS server address with share path information
* **m**: Mount path to your nfs server to be used in the kube PV where model files and the model archive file will be stored
* **e**: Name of the deployment metadata
* **e**: Desired name of the deployment metadata (will be created)
* **v**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used)
* **t**: Your HuggingFace token. Needed for LLAMA(2) model.
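
For instance, deploying MPT-7B on a single GPU might look like the following; the NFS address `10.10.10.10:/share`, the mount path `/mnt/llm`, and the deployment name `llm-deploy` are illustrative placeholders:

```
bash $WORK_DIR/llm/run.sh -n mpt_7b -g 1 -f 10.10.10.10:/share -m /mnt/llm -e llm-deploy
```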

docs/gpt-in-a-box/kubernetes/v0.2/supported_models.md → docs/gpt-in-a-box/kubernetes/v0.2/validated_models.md
@@ -1,8 +1,8 @@
# Supported Models for Kubernetes Version
# Validated Models for Kubernetes Version

GPT-in-a-Box currently supports a curated set of HuggingFace models Information pertaining to these models is stored in the ```llm/model_config.json``` file.
GPT-in-a-Box has been validated on a curated set of HuggingFace models. Information pertaining to these models is stored in the ```llm/model_config.json``` file.

The Supported Models are :
The Validated Models are :

| Model Name | HuggingFace Repository ID |
| --- | --- |
2 changes: 1 addition & 1 deletion docs/gpt-in-a-box/vm/v0.3/generating_mar.md
@@ -12,7 +12,7 @@ python3 $WORK_DIR/llm/generate.py [--skip_download --repo_version <REPO_VERSION>
```
Where the arguments are :

- **model_name**: Name of a [supported model](supported_models.md)
- **model_name**: Name of a [validated model](validated_models.md)
- **repo_version**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used)
- **model_path**: Absolute path of model files (should be empty if downloading)
- **mar_output**: Absolute path of export of MAR file (.mar)
2 changes: 1 addition & 1 deletion docs/gpt-in-a-box/vm/v0.3/huggingface_model.md
@@ -1,6 +1,6 @@
# HuggingFace Model Support
!!! Note
To start the inference server for the [**Supported Models**](supported_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation.
To start the inference server for the [**Validated Models**](validated_models.md), refer to the [**Deploying Inference Server**](inference_server.md) documentation.

We provide the capability to download model files from any HuggingFace repository and generate a MAR file to start an inference server using it with Torchserve.

2 changes: 1 addition & 1 deletion docs/gpt-in-a-box/vm/v0.3/inference_server.md
@@ -6,7 +6,7 @@ bash $WORK_DIR/llm/run.sh -n <MODEL_NAME> -a <MAR_EXPORT_PATH> [OPTIONAL -d <INP
```
Where the arguments are :

- **n**: Name of a [supported model](supported_models.md)
- **n**: Name of a [validated model](validated_models.md)
- **v**: Commit ID of model's HuggingFace repository (optional, if not provided default set in model_config will be used)
- **d**: Absolute path of input data folder (optional)
- **a**: Absolute path to the Model Store directory
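
As a minimal sketch (the model name and model store path below are illustrative placeholders), starting the server for MPT-7B could look like:

```
bash $WORK_DIR/llm/run.sh -n mpt_7b -a /home/ubuntu/model-store
```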
docs/gpt-in-a-box/vm/v0.3/supported_models.md → docs/gpt-in-a-box/vm/v0.3/validated_models.md
@@ -1,8 +1,8 @@
# Supported Models for Virtual Machine Version
# Validated Models for Virtual Machine Version

GPT-in-a-Box currently supports a curated set of HuggingFace models. Information pertaining to these models is stored in the ```llm/model_config.json``` file.
GPT-in-a-Box has been validated on a curated set of HuggingFace models. Information pertaining to these models is stored in the ```llm/model_config.json``` file.

The Supported Models are :
The Validated Models are :

| Model Name | HuggingFace Repository ID |
| --- | --- |
4 changes: 2 additions & 2 deletions mkdocs.yml
@@ -123,7 +123,7 @@ nav:
- "Deploy on Virtual Machine":
- "v0.3":
- "Getting Started": "gpt-in-a-box/vm/v0.3/getting_started.md"
- "Supported Models": "gpt-in-a-box/vm/v0.3/supported_models.md"
- "Validated Models": "gpt-in-a-box/vm/v0.3/validated_models.md"
- "Generating Model Archive File": "gpt-in-a-box/vm/v0.3/generating_mar.md"
- "Deploying Inference Server": "gpt-in-a-box/vm/v0.3/inference_server.md"
- "Inference Requests": "gpt-in-a-box/vm/v0.3/inference_requests.md"
@@ -142,7 +142,7 @@ nav:
- "Deploy on Kubernetes":
- "v0.2":
- "Getting Started": "gpt-in-a-box/kubernetes/v0.2/getting_started.md"
- "Supported Models": "gpt-in-a-box/kubernetes/v0.2/supported_models.md"
- "Validated Models": "gpt-in-a-box/kubernetes/v0.2/validated_models.md"
- "Generating Model Archive File": "gpt-in-a-box/kubernetes/v0.2/generating_mar.md"
- "Deploying Inference Server": "gpt-in-a-box/kubernetes/v0.2/inference_server.md"
- "Inference Requests": "gpt-in-a-box/kubernetes/v0.2/inference_requests.md"
