diff --git a/README_GAUDI.md b/README_GAUDI.md
index 9ea30a2e43f69..91bcbe49405eb 100644
--- a/README_GAUDI.md
+++ b/README_GAUDI.md
@@ -62,16 +62,16 @@ following:
 $ git clone https://github.com/HabanaAI/vllm-fork.git
 $ cd vllm-fork
 $ git checkout habana_main
-$ python setup.py develop
+$ pip install -e .
 ```
 
 Supported Features
 ==================
 
 - [Offline batched
-  inference](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference)
+  inference](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference)
 - Online inference via [OpenAI-Compatible
-  Server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server)
+  Server](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server)
 - HPU autodetection - no need to manually select device within vLLM
 - Paged KV cache with algorithms enabled for Intel Gaudi accelerators
 - Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
diff --git a/docs/source/getting_started/gaudi-installation.rst b/docs/source/getting_started/gaudi-installation.rst
index ddbac022a8d9d..b3234d10b3115 100644
--- a/docs/source/getting_started/gaudi-installation.rst
+++ b/docs/source/getting_started/gaudi-installation.rst
@@ -30,7 +30,7 @@ To verify that the Intel Gaudi software was correctly installed, run:
    $ pip list | grep neural # verify that neural_compressor is installed
 
 Refer to `Intel Gaudi Software Stack
-Verification `__
+Verification `__
 for more details.
 
 Run Docker Image
@@ -51,15 +51,6 @@ Use the following commands to run a Docker image:
 Build and Install vLLM
 ---------------------------
 
-To build and install vLLM from source, run:
-
-.. code:: console
-
-   $ git clone https://github.com/vllm-project/vllm.git
-   $ cd vllm
-   $ python setup.py develop
-
-
 Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to vLLM main repo. To install latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:
 
 .. code:: console
@@ -67,16 +58,16 @@ Currently, the latest features and performance optimizations are developed in Ga
    $ git clone https://github.com/HabanaAI/vllm-fork.git
    $ cd vllm-fork
    $ git checkout habana_main
-   $ python setup.py develop
+   $ pip install -e .
 
 
 Supported Features
 ==================
 
 - `Offline batched
-  inference <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference>`__
+  inference <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference>`__
 - Online inference via `OpenAI-Compatible
-  Server <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server>`__
+  Server <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server>`__
 - HPU autodetection - no need to manually select device within vLLM
 - Paged KV cache with algorithms enabled for Intel Gaudi accelerators
 - Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst
index 89bdc247c5e8e..8cfde76adf5fa 100644
--- a/docs/source/getting_started/quickstart.rst
+++ b/docs/source/getting_started/quickstart.rst
@@ -9,7 +9,7 @@ This guide shows how to use vLLM to:
 * build an API server for a large language model;
 * start an OpenAI-compatible API server.
 
-Be sure to complete the :ref:`installation instructions ` before continuing with this guide.
+Be sure to complete the `Gaudi installation instructions `_ before continuing with this guide.
 
 .. note::
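
For context on the "offline batched inference" feature that both files now link to, the flow looks like the minimal sketch below. This is an illustration rather than part of the patch: the model name is an arbitrary example, and per the feature list above, HPU autodetection on Gaudi means no device has to be selected manually.

```python
# Minimal offline batched inference sketch (model name is illustrative).
# On Gaudi, HPU autodetection means no explicit device selection is needed.
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The capital of France is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

llm = LLM(model="facebook/opt-125m")  # any model supported by the fork
outputs = llm.generate(prompts, sampling_params)  # batched generation

for output in outputs:
    print(f"{output.prompt!r} -> {output.outputs[0].text!r}")
```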
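Similarly, the "OpenAI-Compatible Server" link covers serving the model over an OpenAI-style REST API. A hedged sketch of querying such a server from Python follows, assuming the server was already launched per the quickstart, listens on the default port 8000, and that the `openai` client package (v1+) is installed; the model name is again illustrative.

```python
# Query a running OpenAI-compatible vLLM server (assumes default port 8000).
from openai import OpenAI

# vLLM does not require a real API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="facebook/opt-125m",  # must match the model the server was started with
    prompt="San Francisco is a",
    max_tokens=32,
)
print(completion.choices[0].text)
```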