Getting started section in README for icpp_llama2
icppWorld committed Mar 7, 2024
1 parent 3ade641 commit 5dd6c5d
Showing 5 changed files with 89 additions and 48 deletions.
6 changes: 2 additions & 4 deletions README.md
@@ -7,7 +7,9 @@

*The LLMs of this repo run in its back-end canisters.*

# Getting Started

A step-by-step guide to deploy your first LLM to the internet computer is provided in [icpp_llama2/README.md](https://github.com/icppWorld/icpp_llm/blob/main/icpp_llama2/README.md).

# The Benefits of Running LLMs On-Chain

@@ -27,10 +29,6 @@ Coherent English?](https://arxiv.org/pdf/2305.07759.pdf)
Besides the ease of use and the enhanced security, running LLMs directly on-chain also facilitates a seamless integration of tokenomics, eliminating the need to juggle a complex blend of web3 and web2 components, and I believe it will lead to a new category of Generative AI-based dApps.


## Instructions

See the README in the icpp_llama2 folder


## Support

105 changes: 74 additions & 31 deletions icpp_llama2/README.md
@@ -1,8 +1,18 @@
# [karpathy/llama2.c](https://github.com/karpathy/llama2.c) for the Internet Computer

# Instructions
# Getting Started

- Install the C++ development environment for the Internet Computer ([docs](https://docs.icpp.world/installation.html)):
- Create a python environment. (We like MiniConda, but use whatever you like!)
```bash
conda create --name myllama2 python=3.11
conda activate myllama2
```
- Clone this repo and enter the icpp_llama2 folder
```bash
git clone https://github.com/icppWorld/icpp_llm.git
cd icpp_llm/icpp_llama2
```
- Install the required python packages *(icpp-pro & ic-py)*:
```bash
pip install -r requirements.txt
@@ -16,11 +26,71 @@
sh -ci "$(curl -fsSL https://internetcomputer.org/install.sh)"
```
*(Note: On Windows, just install dfx in WSL, and icpp-pro in PowerShell will know where to find it.)*


- Get a model checkpoint, as explained in [karpathy/llama2.c](https://github.com/karpathy/llama2.c):
- Deploy the smallest pre-trained model to canister `llama2_260K`:
- Start the local network:
```bash
dfx start --clean
```
- Compile & link to WebAssembly (wasm), as defined in `icpp.toml`:
```bash
icpp build-wasm
```
- Deploy the wasm to a canister on the local network:
```bash
dfx deploy llama2_260k
```
- Check the health endpoint of the `llama2_260k` canister:
```bash
dfx canister call llama2_260k health
```
- Upload the 260k parameter model & tokenizer:
```bash
python -m scripts.upload --network local --canister llama2_260K --model stories260K/stories260K.bin --tokenizer stories260K/tok512.bin
```
- Check the readiness endpoint, which indicates that the canister can be used for inference:
```bash
dfx canister call llama2_260k ready
```

- Test it with dfx (a sketch combining the deploy & test steps into one script follows this list):
- Generate a new story, 10 tokens at a time, starting with an empty prompt:
```bash
dfx canister call llama2_260k new_chat '()'
dfx canister call llama2_260k inference '(record {prompt = "" : text; steps = 10 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
dfx canister call llama2_260k inference '(record {prompt = "" : text; steps = 10 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
# etc.
```
- Generate a new story, starting with a non-empty prompt:
```bash
dfx canister call llama2_260k new_chat '()'
dfx canister call llama2_260k inference '(record {prompt = "Jenny climbed in a tree" : text; steps = 10 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
dfx canister call llama2_260k inference '(record {prompt = "" : text; steps = 10 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
# etc.
```
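
For convenience, the deploy & test steps above can be chained into a single shell script. This is just a sketch of one possible workflow, reusing only the commands listed above; it assumes the python environment is active and that you are inside the `icpp_llm/icpp_llama2` folder:

```bash
# Sketch: one possible end-to-end local workflow, using the same commands as above.
dfx start --clean --background        # start the local network in the background

icpp build-wasm                       # compile & link to wasm, as defined in icpp.toml
dfx deploy llama2_260k                # deploy the wasm to a canister on the local network
dfx canister call llama2_260k health  # verify the canister is up

# upload the 260K parameter model & tokenizer
python -m scripts.upload --network local --canister llama2_260K \
  --model stories260K/stories260K.bin --tokenizer stories260K/tok512.bin
dfx canister call llama2_260k ready   # verify the canister is ready for inference

# generate a short story, 10 tokens at a time
dfx canister call llama2_260k new_chat '()'
dfx canister call llama2_260k inference '(record {prompt = "" : text; steps = 10 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
```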

# Next steps

As you test the smallest pre-trained model, llama2_260k, you will quickly realize that it is not a very good model: the stories it generates are not comprehensible. This is simply because the model is too small. It serves only to verify that your build, deploy, and test pipeline is functional.

You will also notice that using dfx to generate stories is not very user friendly. We built a small frontend for generating stories, available as an open source project at https://github.com/icppWorld/icgpt and deployed to the IC as [ICGPT](https://icgpt.icpp.world/).

Some ideas for next challenges:
- Deploy the 15M parameter model
- Test out the 15M model at [ICGPT](https://icgpt.icpp.world/)
- Test the influence of `temperature` and `topp` on story generation (see the sketch after this list)
- Build your own frontend
- Train your own model and deploy it
- Study the efficiency of the LLM, and look for improvements
- etc.
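
For the `temperature` & `topp` experiment, a minimal sketch reusing the `llama2_260k` canister deployed above: lower `temperature` values make the sampling more deterministic, higher values make it more random, and `topp` restricts sampling to the smallest set of tokens whose cumulative probability exceeds that value.

```bash
# Sketch: compare story generation at two temperature settings,
# reusing the llama2_260k canister deployed above.
for temp in 0.1 0.9; do
  echo "=== temperature = $temp, topp = 0.9 ==="
  dfx canister call llama2_260k new_chat '()'
  dfx canister call llama2_260k inference "(record {prompt = \"Jenny climbed in a tree\" : text; steps = 20 : nat64; temperature = $temp : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})"
done
```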

Some further instructions are provided below.

## Deploy the 15M parameter pre-trained model

- You can get other model checkpoints, as explained in [karpathy/llama2.c](https://github.com/karpathy/llama2.c):

This command downloads the 15M parameter model that was trained on the TinyStories dataset (~60MB download) and stores it in a `models` folder:
For example, this command downloads the 15M parameter model that was trained on the TinyStories dataset (~60MB download) and stores it in a `models` folder:

```bash
# on Linux/Mac
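# (a sketch only; based on karpathy/llama2.c, the download is roughly the
#  following, but the exact command and URL here are assumptions)
mkdir -p models
wget -P models https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin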
@@ -45,33 +115,6 @@
![icpp_llama2_without_limits](../assets/icpp_llama2_without_limits.png)
# stories260K
The default model is `stories15M.bin`, with `tokenizer.bin`, which contains the default llama2 tokenizer using 32000 tokens.
For testing, it is nice to be able to work with a smaller model & tokenizer:
- Download the model & tokenizer from [huggingface stories260K](https://huggingface.co/karpathy/tinyllamas/tree/main/stories260K) and store them in:
- stories260K/stories260K.bin
- stories260K/tok512.bin
- stories260K/tok512.model
- Deploy the canister:
```bash
icpp build-wasm
dfx deploy
```
- Upload the model & tokenizer:
```bash
python -m scripts.upload --model stories260K/stories260K.bin --tokenizer stories260K/tok512.bin
```
- Inference is now possible with many more tokens before hitting the instruction limit, but of course, the stories are not as good:
```bash
$ dfx canister call llama2 inference '(record {prompt = "Lilly went swimming yesterday " : text; steps = 100 : nat64; temperature = 0.9 : float32; topp = 0.9 : float32; rng_seed = 0 : nat64;})'
(
variant {
ok = "Lilly went swimming yesterday order. She had a great eyes that was closed. One day, she asked her mom why the cloud was close to the pond. \n\"Mommy, I will take clothes away,\" Lila said. \"Th\n"
},
)
```
# Fine tuning
2 changes: 1 addition & 1 deletion icpp_llama2/demo.ps1
@@ -37,7 +37,7 @@ Write-Host $output -ForegroundColor Green
#######################################################################
Write-Host " "
Write-Host "--------------------------------------------------"
Write-Host "Building the wasm with wasi-sdk"
Write-Host "Building the wasm with wasi-sdk, as defined in icpp.toml"
icpp build-wasm --to-compile all
# icpp build-wasm --to-compile mine

2 changes: 1 addition & 1 deletion icpp_llama2/demo.sh
@@ -16,7 +16,7 @@ dfx start --clean --background

#######################################################################
echo "--------------------------------------------------"
echo "Building the wasm with wasi-sdk"
echo "Building the wasm with wasi-sdk, as defined in icpp.toml"
icpp build-wasm --to-compile all
# icpp build-wasm --to-compile mine

22 changes: 11 additions & 11 deletions icpp_llama2/scripts/requirements.txt
@@ -1,16 +1,16 @@
requests
pandas
pandas-stubs
jupyterlab
jupyterlab-lsp
jupyter-black
python-lsp-server[all]
# pandas
# pandas-stubs
# jupyterlab
# jupyterlab-lsp
# jupyter-black
# python-lsp-server[all]
python-dotenv
tabulate
# tabulate
black
mypy
pylint==2.13.9
matplotlib
fastparquet
openpyxl
seaborn
# matplotlib
# fastparquet
# openpyxl
# seaborn
