From 90de1b2564d16463a49f198199da4dc3e9540695 Mon Sep 17 00:00:00 2001
From: Zhanghao Wu
Date: Mon, 19 Aug 2024 16:50:53 -0700
Subject: [PATCH] [Docs] Fix imgur links (#3846)

* Fix imgur links

* Remove unnecessary file

* revert
---
 docs/source/examples/interactive-development.rst |  2 +-
 llm/codellama/README.md                          |  4 ++--
 llm/falcon/README.md                             |  2 +-
 llm/gpt-2/README.md                              |  8 ++++----
 llm/llama-2/README.md                            |  2 +-
 llm/llama-3/README.md                            |  4 ++--
 llm/llama-3_1-finetuning/readme.md               |  6 +++---
 llm/lorax/README.md                              |  2 +-
 llm/vicuna-llama-2/README.md                     |  6 +++---
 llm/vllm/README.md                               |  2 +-
 10 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/source/examples/interactive-development.rst b/docs/source/examples/interactive-development.rst
index cc50f8e6ea8..40920934597 100644
--- a/docs/source/examples/interactive-development.rst
+++ b/docs/source/examples/interactive-development.rst
@@ -110,7 +110,7 @@ This is supported by simply connecting VSCode to the cluster with the cluster na

For more details, please refer to the `VSCode documentation <…>`__.

-.. image:: https://imgur.com/8mKfsET.gif
+.. image:: https://i.imgur.com/8mKfsET.gif
   :align: center
   :alt: Connect to the cluster with VSCode

diff --git a/llm/codellama/README.md b/llm/codellama/README.md
index 8e5025d22b5..f145fd062ff 100644
--- a/llm/codellama/README.md
+++ b/llm/codellama/README.md
@@ -10,14 +10,14 @@ The followings are the demos of Code Llama 70B hosted by SkyPilot Serve (aka Sky

## Demos
-<img src="https://imgur.com/…">
+<img src="https://i.imgur.com/…">
Coding Assistant: Connect to hosted Code Llama with Tabby in VScode
-<img src="https://imgur.com/…">
+<img src="https://i.imgur.com/…">
Chat: Connect to hosted Code Llama with FastChat
diff --git a/llm/falcon/README.md b/llm/falcon/README.md
index 837e93f5558..6eb480d9ea8 100644
--- a/llm/falcon/README.md
+++ b/llm/falcon/README.md
@@ -50,7 +50,7 @@ sky launch -c falcon -s falcon.yaml --no-use-spot

For reference, below is a loss graph you may expect to see, and the amount of time and the approximate cost of fine-tuning each of the models over 500 epochs (assuming a spot instance A100 GPU rate at $1.1 / hour and a A100-80GB rate of $1.61 / hour):

-<img alt="image" src="https://imgur.com/…">
+<img alt="image" src="https://i.imgur.com/…">

1. `ybelkada/falcon-7b-sharded-bf16`: 2.5 to 3 hours using 1 A100 spot GPU; total cost ≈ $3.3.
diff --git a/llm/gpt-2/README.md b/llm/gpt-2/README.md
index bc9893fec5b..10fa2cf6998 100644
--- a/llm/gpt-2/README.md
+++ b/llm/gpt-2/README.md
@@ -28,14 +28,14 @@ Run the following command to start GPT-2 (124M) training on a GPU VM with 8 A100
sky launch -c gpt2 gpt2.yaml
```

-![GPT-2 training with 8 A100 GPUs](https://imgur.com/v8SGpsF.png)
+![GPT-2 training with 8 A100 GPUs](https://i.imgur.com/v8SGpsF.png)

Or, you can train the model with a single A100, by adding `--gpus A100`:
```bash
sky launch -c gpt2 gpt2.yaml --gpus A100
```

-![GPT-2 training with a single A100](https://imgur.com/hN65g4r.png)
+![GPT-2 training with a single A100](https://i.imgur.com/hN65g4r.png)

It is also possible to speed up the training of the model on 8 H100 (2.3x more tok/s than 8x A100s):

@@ -43,7 +43,7 @@ It is also possible to speed up the training of the model on 8 H100 (2.3x more t
sky launch -c gpt2 gpt2.yaml --gpus H100:8
```

-![GPT-2 training with 8 H100](https://imgur.com/STbi80b.png)
+![GPT-2 training with 8 H100](https://i.imgur.com/STbi80b.png)

### Download logs and visualizations

@@ -54,7 +54,7 @@ scp -r gpt2:~/llm.c/log124M .

We can visualize the training progress with the notebook provided in [llm.c](https://github.com/karpathy/llm.c/blob/master/dev/vislog.ipynb). (Note: we cut off the training after 10K steps, which already achieve similar validation loss as OpenAI GPT-2 checkpoint.)
-<img src="https://imgur.com/…">
+<img src="https://i.imgur.com/…">
> Yes! We are able to reproduce the training of GPT-2 (124M) on any cloud with SkyPilot.
diff --git a/llm/llama-2/README.md b/llm/llama-2/README.md
index d8f8151572e..4f1a8f60cae 100644
--- a/llm/llama-2/README.md
+++ b/llm/llama-2/README.md
@@ -94,6 +94,6 @@ You can also host the official FAIR model without using huggingface and gradio.
   ```
3. Open http://localhost:7681 in your browser and start chatting!

-<img src="https://imgur.com/…" alt="LLaMA chatbot running on the cloud via SkyPilot">
+<img src="https://i.imgur.com/…" alt="LLaMA chatbot running on the cloud via SkyPilot">
diff --git a/llm/llama-3/README.md b/llm/llama-3/README.md
index d0c28dc93c6..ef19d94b5c0 100644
--- a/llm/llama-3/README.md
+++ b/llm/llama-3/README.md
@@ -5,7 +5,7 @@

-<img src="https://imgur.com/…" alt="Llama-3 x SkyPilot">
+<img src="https://i.imgur.com/…" alt="Llama-3 x SkyPilot">

[Llama-3](https://github.com/meta-llama/llama3) is the latest top open-source LLM from Meta. It has been released with a license that authorizes commercial use. You can deploy a private Llama-3 chatbot with SkyPilot in your own cloud with just one simple command.

@@ -248,7 +248,7 @@ To use the Gradio UI, open the URL shown in the logs:

-<img src="https://imgur.com/…" alt="Gradio UI serving Llama-3">
+<img src="https://i.imgur.com/…" alt="Gradio UI serving Llama-3">

To stop the instance:
diff --git a/llm/llama-3_1-finetuning/readme.md b/llm/llama-3_1-finetuning/readme.md
index 836f3bf1b3b..935dccde84e 100644
--- a/llm/llama-3_1-finetuning/readme.md
+++ b/llm/llama-3_1-finetuning/readme.md
@@ -135,7 +135,7 @@ sky launch -c llama31 lora.yaml \
-<img src="https://imgur.com/…">
+<img src="https://i.imgur.com/…">
Training Loss of LoRA finetuning Llama 3.1
@@ -218,10 +218,10 @@ run: |

## Appendix: Preparation
1. Request the access to [Llama 3.1 weights on huggingface](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) (Click on the blue box and follow the steps):
-![](https://imgur.com/snIQhr9.png)
+![](https://i.imgur.com/snIQhr9.png)

2. Get your [huggingface access token](https://huggingface.co/settings/tokens):
-![](https://imgur.com/3idBgHn.png)
+![](https://i.imgur.com/3idBgHn.png)

3. Add huggingface token to your environment variable:
diff --git a/llm/lorax/README.md b/llm/lorax/README.md
index 2fe548c92a8..6cc44cf1134 100644
--- a/llm/lorax/README.md
+++ b/llm/lorax/README.md
@@ -4,7 +4,7 @@

-<img src="https://imgur.com/…" alt="LoRAX">
+<img src="https://i.imgur.com/…" alt="LoRAX">

[LoRAX](https://github.com/predibase/lorax) (LoRA eXchange) is a framework that allows users to serve thousands of fine-tuned LLMs on a single GPU, dramatically reducing the cost of serving without compromising on throughput or latency. It works by dynamically loading multiple fine-tuned "adapters" (LoRAs, etc.) on top of a single base model at runtime. Concurrent requests for different adapters can be processed together in a single batch, allowing LoRAX to maintain near linear throughput scaling as the number of adapters increases.
diff --git a/llm/vicuna-llama-2/README.md b/llm/vicuna-llama-2/README.md
index 899792c299d..24caa525a56 100644
--- a/llm/vicuna-llama-2/README.md
+++ b/llm/vicuna-llama-2/README.md
@@ -1,6 +1,6 @@
# Train Your Own Vicuna on Llama-2

-![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")
+![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Result model in action, trained using this guide. From the SkyPilot and Vicuna teams.")

Meta released [Llama 2](https://ai.meta.com/llama/) two weeks ago and has made a big wave in the AI community. In our opinion, its biggest impact is that the model is now released under a [permissive license](https://github.com/facebookresearch/llama/blob/main/LICENSE) that **allows the model weights to be used commercially**[^1]. This differs from Llama 1 which cannot be used commercially.

@@ -106,7 +106,7 @@ sky launch --no-use-spot ...

-<img src="https://imgur.com/…" alt="Optimizer">
+<img src="https://i.imgur.com/…" alt="Optimizer">

**Optional**: Try out the training for the 13B model:

@@ -139,7 +139,7 @@ sky launch -c serve serve.yaml --env MODEL_CKPT=/chatbot/
```

In [serve.yaml](https://github.com/skypilot-org/skypilot/tree/master/llm/vicuna-llama-2/serve.yaml), we specified launching a Gradio server that serves the model checkpoint at `/chatbot/7b`.

-![Vicuna-Llama-2](https://imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")
+![Vicuna-Llama-2](https://i.imgur.com/McZWg6z.gif "Serving the resulting model with Gradio.")

> **Tip**: You can also switch to a cheaper accelerator, such as L4, to save costs, by adding `--gpus L4` to the above command.
diff --git a/llm/vllm/README.md b/llm/vllm/README.md
index e3a2befbecc..9fb3c0c1364 100644
--- a/llm/vllm/README.md
+++ b/llm/vllm/README.md
@@ -4,7 +4,7 @@

-<img src="https://imgur.com/…" alt="vLLM">
+<img src="https://i.imgur.com/…" alt="vLLM">

This README contains instructions to run a demo for vLLM, an open-source library for fast LLM inference and serving, which improves the throughput compared to HuggingFace by **up to 24x**.
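
All 19 replaced lines follow the same mechanical pattern: a bare `https://imgur.com/...` page link becomes a direct `https://i.imgur.com/...` CDN link. A rewrite like this can be scripted; below is a minimal sketch, assuming GNU sed and that only the `docs/` and `llm/` trees contain such links (the actual PR may have been edited by hand):

```bash
# Sketch only: rewrite imgur page links to the i.imgur.com CDN host
# in every tracked doc that contains one. Assumes GNU sed (in-place
# -i without a backup suffix) and a POSIX shell.
grep -rl 'https://imgur\.com/' docs/ llm/ \
  | xargs sed -i 's|https://imgur\.com/|https://i.imgur.com/|g'
```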