
Commit

Merge pull request #1222 from jasonrandrews/review
review Llama3 on Rasp Pi 5 Learning Path
jasonrandrews authored Sep 6, 2024
2 parents a72f09a + f899884 commit 510cbb0
Showing 6 changed files with 183 additions and 149 deletions.
6 changes: 2 additions & 4 deletions content/learning-paths/embedded-systems/rpi-llama3/_index.md
@@ -6,13 +6,12 @@ minutes_to_complete: 60
who_is_this_for: This is an introductory topic for anyone interested in running the Llama 3 model on a Raspberry Pi 5. It also touches on techniques for running large language models (LLMs) in an embedded environment.

learning_objectives:
- Use Docker to emulate an embedded operating system.
- Use Docker to run Raspberry Pi OS on an Arm Linux server.
- Compile a Large Language Model (LLM) using ExecuTorch.
- Deploy the Llama 3 model on an edge device.


prerequisites:
- An Arm-based machine or cloud instance.
- An Arm Linux machine or an [Arm cloud instance](/learning-paths/servers-and-cloud-computing/csp/).
- A Raspberry Pi 5.

author_primary: Annie Tallund
@@ -24,7 +23,6 @@ armips:
- Cortex-A
operatingsystems:
- Linux
- Raspberry Pi OS
tools_software_languages:
- LLM
- GenAI
10 changes: 5 additions & 5 deletions content/learning-paths/embedded-systems/rpi-llama3/_review.md
@@ -14,18 +14,18 @@ review:
question: >
What quantization scheme does Llama require to run on an embedded device such as the Raspberry Pi 5?
answers:
- 8-bit groupwise per token dynamic quantization of all the linear layers.
- 4-bit groupwise per token dynamic quantization of all the linear layers.
- No quantization is needed.
- "8-bit groupwise per token dynamic quantization of all the linear layers."
- "4-bit groupwise per token dynamic quantization of all the linear layers."
- "No quantization is needed."
correct_answer: 2
explanation: >
The 4-bit quantization scheme yields the smallest memory footprint for Llama 3 in this case.
- questions:
question: >
Dynamic quantization happens at runtime.
answers:
- False
- True
- "False"
- "True"
correct_answer: 2
explanation: >
Dynamic quantization refers to quantizing activations at runtime.
46 changes: 33 additions & 13 deletions content/learning-paths/embedded-systems/rpi-llama3/dev-env.md
@@ -5,48 +5,68 @@
### FIXED, DO NOT MODIFY
layout: learningpathall
---
The rise of Large Language Models (LLM) re-shapes the landscape of what is possible. The *transformer networks* are known for their ability to generate coherent responses to complex strings of text. A known collection of LLMs is [Llama](https://llama.meta.com/). It successfully generates text so contextually accurate that it can be indistinguishable from a real human.
The rise of Large Language Models (LLMs) reshapes the landscape of what is possible. *Transformer networks* are known for their ability to generate coherent responses to complex strings of text. A well-known collection of LLMs is [Llama](https://llama.meta.com/). It generates text so contextually accurate that it can be indistinguishable from text written by a human.

In this learning path, you will prepare an LLM for edge deployment on the Raspberry Pi 5. A Docker container emulates the edge device, which is used to build the binaries needed to deploy the model on the actual device.
In this Learning Path, you will prepare an LLM for edge deployment on the Raspberry Pi 5. A Docker container with Raspberry Pi OS is used to build the binaries needed to deploy the model on the actual device.

## Arm machine memory requirements
## Arm Linux development machine requirements

You can run the steps in this learning path on any Arm-based Linux, either a physical machine or instance in the cloud (later on referred to as _host_). Because of the size of the model, you need one with a generous amount of memory (RAM), which is needed to compile the transformer model. These instructions were tested on an AWS instance of type `m7gd.4xlarge` and `c7g.8xlarge`, with 64 GB of memory (RAM) and a disk volume of 100 GB .
You can run the steps in this Learning Path on any Arm-based Linux computer, either a physical machine or a cloud instance. Because of the size of the model, you need a generous amount of memory (RAM) to compile the transformer model. These instructions were tested on AWS instances of type `m7gd.4xlarge` and `c7g.8xlarge`, with 64 GB of RAM and a disk volume of 100 GB. If necessary, you can get by with 32 GB of RAM and 16 GB of swap space, but build times are slower; a short sketch for adding swap space follows below.

Information on launching an AWS instance is available in the [Getting Started with AWS](/learning-paths/servers-and-cloud-computing/csp/aws/) install guide.
Information on launching an AWS instance is available in [Getting Started with AWS](/learning-paths/servers-and-cloud-computing/csp/aws/).
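If you are working with 32 GB of RAM, add swap space before building. Below is a minimal sketch of checking memory and creating a 16 GB swap file; the path `/swapfile` and the size are assumptions, so adjust them for your setup.

```bash
# Check available RAM and existing swap
free -g

# Create and enable a 16 GB swap file (path and size are examples)
sudo fallocate -l 16G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

# Confirm the swap space is active
swapon --show
```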

Verify the architecture of your machine:
```bash
uname -m
```
and observe the output
and confirm the output is:
```console
aarch64
```

## Run Raspberry Pi OS in a Docker container

This example uses Docker to run an Raspberry Pi OS container in the cloud. You can go through the [Docker](/install-guides/docker/docker-engine) install guide for installation instructions.
Developing large, complex AI applications for edge hardware like the Raspberry Pi 5 is often difficult. The complexity comes from resource-intensive C++ compilation, large files, and a long list of dependencies.

A Docker container is useful to try out embedded workflows without hardware access. Raspberry Pi has Docker images available for download. For this example, a [GitHub repository](https://github.com/jasonrandrews/rpi-os-docker-image) is set up with everything you need to get started. It uses an image of the Raspberry Pi OS, which also comes with a number of useful tools such as Git, Python and the Arm toolchain. In the host machine, clone the example repository:
This example uses Docker to run Raspberry Pi OS in a container, providing the same operating system on the development machine and the edge device. A container is also useful to try out workflows without hardware access.

Make sure Docker is installed on your Arm Linux development machine. Refer to the [Docker install guide](/install-guides/docker/docker-engine) for instructions.
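Before continuing, you can confirm that Docker works with a quick smoke test:

```bash
# Verify that the Docker daemon is running and can start containers
docker run --rm hello-world
```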

Raspberry Pi has Docker images available for download, but for this example, a [GitHub repository](https://github.com/jasonrandrews/rpi-os-docker-image) is provided with everything you need to get started. It uses an image of Raspberry Pi OS, which also comes with a number of useful tools such as Git, Python, and the Arm toolchain.

On your Arm Linux development machine, clone the example repository:

```bash
git clone https://github.com/jasonrandrews/rpi-os-docker-image
cd rpi-os-docker-image
```

Run the scripts to set up the container.
Take a look at the `Dockerfile` and scripts to see how the process of creating the Docker image works.

The scripts download Raspberry Pi OS and build a container image using the Dockerfile.

Download the Raspberry Pi OS image:

```bash
./get-pi-sw.sh
```

In addition to downloading Raspberry Pi OS, the `get-pi-sw.sh` script extracts the root file system so it can be used in the container.
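As an illustration of the general technique (this is not the exact contents of `get-pi-sw.sh`), an extracted root file system can be packaged into a base image with `docker import`; the directory and image names here are hypothetical:

```bash
# Hypothetical example: stream an extracted root file system into a base image
tar -C rootfs -c . | docker import - rpi-os-base:latest
```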

Build the Docker image:

```bash
./build.sh
./run.sh
```

These scripts will download the Raspberry Pi OS image and build it using the Dockerfile. Finally, it will run the container as an interactive terminal session.
The `run.sh` script starts the container with an interactive terminal session:

```bash
./run.sh
```

You should now be in a shell named `pi@rpi`. With this, you now know how to run the Raspberry Pi OS in a cloud container.
Your shell prompt is now `pi@rpi`, and you are running Raspberry Pi OS in a container.
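To check the environment from inside the container, you can print a few version strings; the exact output depends on the image version:

```bash
# Confirm the OS release and the preinstalled tools
cat /etc/os-release
python3 --version
git --version
```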

{{% notice Note %}}
The rest of this learning path will be run in this running Docker container shell.
For the next steps, continue working at the Docker container shell prompt.
{{% /notice %}}
30 changes: 15 additions & 15 deletions content/learning-paths/embedded-systems/rpi-llama3/executorch.md
@@ -13,16 +13,18 @@ The best practice is to create an isolated Python environment in which you install

### Option 1: Create a Python virtual environment

Create a Python virtual environment using:

```bash
python -m venv executorch-venv
source executorch-venv/bin/activate
```

Your terminal displays `(executorch-venv)` to indicate that the virtual environment is active.
Your terminal displays `(executorch-venv)` in the prompt, indicating that the virtual environment is active.
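To double-check, you can confirm that `python` now resolves inside the virtual environment:

```bash
# The reported path should end in executorch-venv/bin/python
which python
```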

### Option 2: Create a Conda virtual environment

Install Miniconda on your development machine by following the [Anaconda](/install-guides/anaconda/) Install Guide.
Install Miniconda on your development machine by following the [Anaconda install guide](/install-guides/anaconda/).

Once `conda` is installed create the environment:

@@ -31,43 +33,41 @@
conda create -yn executorch-venv
conda activate executorch-venv
```

## Install clang
## Install Clang

Install Clang, which is required to build ExecuTorch:

Install clang if it is not already installed.
```bash
sudo apt install clang
sudo apt install clang -y
```

Then, make clang the default compiler for cc and c++
Then, make clang the default C/C++ compiler:

```bash
sudo update-alternatives --install /usr/bin/cc cc /usr/bin/clang 100
sudo update-alternatives --install /usr/bin/c++ c++ /usr/bin/clang++ 100
sudo update-alternatives --set cc /usr/bin/clang
sudo update-alternatives --set c++ /usr/bin/clang++
```
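You can verify that the alternatives point at Clang:

```bash
# Both commands should now report clang, not gcc
cc --version
c++ --version
```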

This will allow ExecuTorch to compile and build properly.

## Clone ExecuTorch and install the required dependencies

From within the environment, run the commands below to download the ExecuTorch repository and install the required packages. After cloning the repository, you need to update and pull the project's submodules. Finally, you run two scripts that install a few dependencies.
Continue in your Python virtual environment, and run the commands below to download the ExecuTorch repository and install the required packages.

After cloning the repository, the project's submodules are updated, and two scripts install additional dependencies.

```bash
git clone https://github.com/pytorch/executorch.git
cd executorch

git submodule sync
git submodule update --init

./install_requirements.sh --pybind xnnpack

./examples/models/llama2/install_requirements.sh
```
{{% notice Note %}}
The install_requirements for Llama 3 are the same as for Llama 2, so you can use the instructions for both models up until the very last step.

{{% notice Note %}}
You can safely ignore the following error about failing to import lm_eval when running the install_requirements.sh scripts:
`Failed to import examples.models due to lm_eval conflict`
{{% /notice %}}

If these scripts finish successfully, ExecuTorch is all set up. That means it's time to dive into the world of Llama models!
When these scripts finish successfully, ExecuTorch is all set up. That means it's time to dive into the world of Llama models!
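As a quick sanity check, assuming the package is registered with pip under the name `executorch`, you can list it from the active environment:

```bash
# Confirm the ExecuTorch Python package is visible in the environment
pip list | grep -i executorch
```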