From 8185d760325a7699c5c07f7cd0e28d443a36051b Mon Sep 17 00:00:00 2001
From: Mohit Deopujari
Date: Sun, 18 Aug 2024 23:30:38 -0700
Subject: [PATCH] [Doc][BugFix] Update setup instructions and reference links
 (#191)

1. Replaced the non-working setup instruction with the correct command.
2. Fixed broken links and updated references in documentation.
---
 README_GAUDI.md                                |  6 +++---
 .../getting_started/gaudi-installation.rst     | 17 ++++-------------
 docs/source/getting_started/quickstart.rst     |  2 +-
 3 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/README_GAUDI.md b/README_GAUDI.md
index 9ea30a2e43f6..91bcbe49405e 100644
--- a/README_GAUDI.md
+++ b/README_GAUDI.md
@@ -62,16 +62,16 @@ following:
 $ git clone https://github.com/HabanaAI/vllm-fork.git
 $ cd vllm-fork
 $ git checkout habana_main
-$ python setup.py develop
+$ pip install -e .
 ```

 Supported Features
 ==================

 - [Offline batched
-  inference](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#offline-batched-inference)
+  inference](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#offline-batched-inference)
 - Online inference via [OpenAI-Compatible
-  Server](https://docs.vllm.ai/en/latest/getting_started/quickstart.html#openai-compatible-server)
+  Server](https://github.com/HabanaAI/vllm-fork/blob/habana_main/docs/source/getting_started/quickstart.rst#openai-compatible-server)
 - HPU autodetection - no need to manually select device within vLLM
 - Paged KV cache with algorithms enabled for Intel Gaudi accelerators
 - Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
diff --git a/docs/source/getting_started/gaudi-installation.rst b/docs/source/getting_started/gaudi-installation.rst
index ddbac022a8d9..b3234d10b311 100644
--- a/docs/source/getting_started/gaudi-installation.rst
+++ b/docs/source/getting_started/gaudi-installation.rst
@@ -30,7 +30,7 @@ To verify that the Intel Gaudi software was correctly installed, run:
    $ pip list | grep neural # verify that neural_compressor is installed

 Refer to `Intel Gaudi Software Stack
-Verification `__
+Verification `__
 for more details.

 Run Docker Image
@@ -51,15 +51,6 @@ Use the following commands to run a Docker image:
 Build and Install vLLM
 ---------------------------

-To build and install vLLM from source, run:
-
-.. code:: console
-
-   $ git clone https://github.com/vllm-project/vllm.git
-   $ cd vllm
-   $ python setup.py develop
-
-
 Currently, the latest features and performance optimizations are developed in Gaudi's `vLLM-fork `__ and we periodically upstream them to vLLM main repo. To install latest `HabanaAI/vLLM-fork `__, run the following:

 .. code:: console
@@ -67,16 +58,16 @@ Currently, the latest features and performance optimizations are developed in Ga

    $ git clone https://github.com/HabanaAI/vllm-fork.git
    $ cd vllm-fork
    $ git checkout habana_main
-   $ python setup.py develop
+   $ pip install -e .

 Supported Features
 ==================

 - `Offline batched
-  inference `__
+  inference `__
 - Online inference via `OpenAI-Compatible
-  Server `__
+  Server `__
 - HPU autodetection - no need to manually select device within vLLM
 - Paged KV cache with algorithms enabled for Intel Gaudi accelerators
 - Custom Intel Gaudi implementations of Paged Attention, KV cache ops,
diff --git a/docs/source/getting_started/quickstart.rst b/docs/source/getting_started/quickstart.rst
index 89bdc247c5e8..8cfde76adf5f 100644
--- a/docs/source/getting_started/quickstart.rst
+++ b/docs/source/getting_started/quickstart.rst
@@ -9,7 +9,7 @@ This guide shows how to use vLLM to:

 * build an API server for a large language model;
 * start an OpenAI-compatible API server.

-Be sure to complete the :ref:`installation instructions ` before continuing with this guide.
+Be sure to complete the `Gaudi installation instructions `_ before continuing with this guide.

 .. note::
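
For anyone verifying the patch locally, the post-patch install flow reduces to the console session below. This is a sketch that assumes a host with the Intel Gaudi software stack and a working Python/pip environment already set up, as described earlier in the installation guide; the repository URL, branch, and install command are taken verbatim from the patch.

```console
$ git clone https://github.com/HabanaAI/vllm-fork.git
$ cd vllm-fork
$ git checkout habana_main
$ pip install -e .   # editable install; replaces the removed `python setup.py develop` step
```

`pip install -e .` performs the same editable install through pip's build front end, which is presumably why it succeeds where the direct `python setup.py develop` invocation was reported as non-working.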