
Commit

Revert "[Doc][BugFix] Update setup instructions and reference links (#…
Browse files Browse the repository at this point in the history
…191)"

This reverts commit 8185d76.
kzawora-intel committed Sep 24, 2024
1 parent f6ff4a7 commit a000e62
Showing 1 changed file with 13 additions and 4 deletions.
17 changes: 13 additions & 4 deletions docs/source/getting_started/gaudi-installation.rst
@@ -30,7 +30,7 @@ To verify that the Intel Gaudi software was correctly installed, run:
$ pip list | grep neural # verify that neural_compressor is installed
Refer to `Intel Gaudi Software Stack
- Verification <https://docs.habana.ai/en/latest/Installation_Guide/Platform_Upgrade_and_Unboxing.html#system-verifications-and-final-tests>`__
+ Verification <https://docs.habana.ai/en/latest/Installation_Guide/SW_Verification.html#platform-upgrade>`__
for more details.
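
(Not part of this diff: as a quick sanity check beyond the `pip list` greps above, something like the following can confirm that the Gaudi driver and the PyTorch bridge are visible. This is a sketch; `hl-smi` ships with the Gaudi driver, and the `habana_frameworks.torch` import assumes the PyTorch plugin is installed.)

.. code:: console

   $ hl-smi                  # should list the available Gaudi devices
   $ pip list | grep habana  # verify the habana-torch plugin packages are installed
   $ python -c "import habana_frameworks.torch.core as htcore; print('HPU bridge OK')"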

Run Docker Image
@@ -51,23 +51,32 @@ Use the following commands to run a Docker image:
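
(The Docker commands referenced by this hunk's context line fall outside the visible part of the diff. For reference, a typical invocation along the lines of Habana's container setup documentation looks like the sketch below; the image tag is an assumption and should match the installed driver/SynapseAI release.)

.. code:: console

   $ docker run -it --runtime=habana \
       -e HABANA_VISIBLE_DEVICES=all \
       -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
       --cap-add=sys_nice --net=host --ipc=host \
       vault.habana.ai/gaudi-docker/1.17.1/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest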
Build and Install vLLM
---------------------------

+ To build and install vLLM from source, run:
+
+ .. code:: console
+ $ git clone https://github.com/vllm-project/vllm.git
+ $ cd vllm
+ $ python setup.py develop
Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to the vLLM main repo. To install the latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:

.. code:: console
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout habana_main
- $ pip install -e .
+ $ python setup.py develop
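
(Not part of the original file: after either install path, a minimal smoke test is to import vLLM and run a tiny generation. The model name below is just the small example model used in the vLLM quickstart and is an assumption here.)

.. code:: console

   $ python -c "import vllm; print(vllm.__version__)"
   $ python -c "from vllm import LLM; llm = LLM(model='facebook/opt-125m'); print(llm.generate('Hello, my name is')[0].outputs[0].text)"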
Supported Features
==================

- `Offline batched
- inference <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference>`__
+ inference <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference>`__
- Online inference via `OpenAI-Compatible
- Server <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server>`__
+ Server <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server>`__
- HPU autodetection - no need to manually select device within vLLM
- Paged KV cache with algorithms enabled for Intel Gaudi accelerators
- Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
