[Doc][BugFix] Update setup instructions and reference links #191

Merged · 1 commit · Aug 19, 2024
6 changes: 3 additions & 3 deletions README_GAUDI.md
@@ -62,16 +62,16 @@ following:
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout habana_main
- $ python setup.py develop
+ $ pip install -e .
```

Supported Features
==================

- [Offline batched
-   inference](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference)
+   inference](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference)
- Online inference via [OpenAI-Compatible
-   Server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server)
+   Server](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server)
- HPU autodetection - no need to manually select device within vLLM
- Paged KV cache with algorithms enabled for Intel Gaudi accelerators
- Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
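A quick way to confirm that the new `pip install -e .` step worked is to query the installed package. This is a minimal sanity-check sketch, not part of this PR; the reported version depends on the checked-out commit:

```
$ pip show vllm                                     # an editable install points back at the cloned directory
$ python -c "import vllm; print(vllm.__version__)"  # import succeeds and prints the build version
```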
17 changes: 4 additions & 13 deletions docs/source/getting_started/gaudi-installation.rst
@@ -30,7 +30,7 @@ To verify that the Intel Gaudi software was correctly installed, run:
$ pip list | grep neural # verify that neural_compressor is installed

Refer to `Intel Gaudi Software Stack
-Verification <https://docs.habana.ai/en/latest/Installation_Guide/SW_Verification.html#platform-upgrade>`__
+Verification <https://docs.habana.ai/en/latest/Installation_Guide/Platform_Upgrade_and_Unboxing.html#system-verifications-and-final-tests>`__
for more details.
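
For reference, the surrounding guide verifies the Gaudi software stack with commands along these lines — a sketch drawn from the Intel Gaudi documentation; exact package names can vary between releases:

.. code:: console

   $ hl-smi                       # lists the available Gaudi accelerators
   $ pip list | grep habana       # habana-torch-plugin and related Python packages
   $ pip list | grep neural       # neural_compressor, as in the snippet above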

Run Docker Image
@@ -51,32 +51,23 @@ Use the following commands to run a Docker image:
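
The Docker commands themselves are collapsed in this diff. For orientation only, they typically take roughly this shape — a sketch in which the image tag is a placeholder that must be taken from the installation guide for your Gaudi software release:

.. code:: console

   $ # <release> and <pt-version> are placeholders, not values from this PR
   $ docker pull vault.habana.ai/gaudi-docker/<release>/ubuntu22.04/habanalabs/pytorch-installer-<pt-version>:latest
   $ docker run -it --runtime=habana -e HABANA_VISIBLE_DEVICES=all --cap-add=sys_nice \
       --net=host --ipc=host vault.habana.ai/gaudi-docker/<release>/ubuntu22.04/habanalabs/pytorch-installer-<pt-version>:latest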
Build and Install vLLM
---------------------------

- To build and install vLLM from source, run:
-
- .. code:: console
-
-    $ git clone https://github.com/vllm-project/vllm.git
-    $ cd vllm
-    $ python setup.py develop
-
-
Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__ and we periodically upstream them to the vLLM main repo. To install the latest `HabanaAI/vLLM-fork <https://github.com/HabanaAI/vllm-fork>`__, run the following:

.. code:: console

$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout habana_main
- $ python setup.py develop
+ $ pip install -e .


Supported Features
==================

- `Offline batched
-   inference <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference>`__
+   inference <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference>`__
- Online inference via `OpenAI-Compatible
-   Server <https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server>`__
+   Server <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server>`__
- HPU autodetection - no need to manually select device within vLLM
- Paged KV cache with algorithms enabled for Intel Gaudi accelerators
- Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
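One of the features listed above, online inference via the OpenAI-compatible server, can be tried once vLLM is installed. This is a sketch that assumes a Gaudi device is visible and uses facebook/opt-125m purely as an example model:

.. code:: console

   $ # any model supported on Gaudi can be substituted for the example below
   $ python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m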
2 changes: 1 addition & 1 deletion docs/source/getting_started/quickstart.rst
@@ -9,7 +9,7 @@ This guide shows how to use vLLM to:
* build an API server for a large language model;
* start an OpenAI-compatible API server.

-Be sure to complete the :ref:`installation instructions <installation>` before continuing with this guide.
+Be sure to complete the `Gaudi installation instructions <https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/gaudi-installation.rst#run-docker-image>`_ before continuing with this guide.

.. note::

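With the server from the quickstart running (started, for example, as sketched above; it listens on port 8000 by default), a completion request can be issued like this — a sketch that again assumes the facebook/opt-125m example model:

.. code:: console

   $ # adjust the model name to match whatever the server was started with
   $ curl http://localhost:8000/v1/completions \
       -H "Content-Type: application/json" \
       -d '{"model": "facebook/opt-125m", "prompt": "San Francisco is a", "max_tokens": 16}'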