
Simple Llama 2 Docker Container

Are you trying to get a simple Llama 2 Docker container to just work? Welp, you might have come to the right place: I needed the same thing, couldn't find it after Googling, and set this up for exactly that.

What Does This Repo Do?

This repo provides scripts for setting up and building a local Docker container that runs a FastAPI server in front of a local Llama 2 instance and exposes an API endpoint for querying the LLM directly with default settings. Once set up, you can:

  • Use the Swagger UI at http://localhost to submit queries from your browser:

Example of using Swagger UI to interact with the Docker container

  • Send a POST request to the http://localhost/submit endpoint to query Llama 2 without any UI needed:

Example of a direct POST request to interact with the Docker container
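As a sketch, a direct query from the command line might look like the following. The JSON field name (`prompt` here) is an assumption for illustration, so check the FastAPI app in this repo for the exact request schema:

```shell
# Endpoint from this README; the container serves on port 80 by default.
URL="http://localhost/submit"
# Hypothetical request body: the "prompt" field name is an assumption.
BODY='{"prompt": "Why is the sky blue?"}'
curl -s -X POST "$URL" \
  -H 'Content-Type: application/json' \
  -d "$BODY" \
  || echo "Request failed: is the container running?"
```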

Requirements

  • A machine with an NVIDIA GPU
  • Docker, Git, and Git LFS installed on your machine for the scripts to work
  • Successfully requested access to Llama 2 from Meta using your email address
    • Meta will email you when you have been granted access
  • A Hugging Face account with an email that matches the one you requested Llama 2 access with (so git can be used to download the model weights)
  • Successfully requested access to Llama 2 on Hugging Face
    • Complete the steps in the Access Llama 2 on Hugging Face section on this page
    • Hugging Face will email you when you have been granted access
    • This process can take 1 to 2 days

Recommendations

  • Set up Git over SSH for Hugging Face so you don't have to type in credentials.
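For example, once your public key is added under your Hugging Face account's SSH settings, you can confirm access like this (git@hf.co is Hugging Face's Git-over-SSH host):

```shell
# Verify Git-over-SSH access to Hugging Face; on success it prints a
# greeting with your username instead of prompting for credentials.
ssh -T git@hf.co
```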

Model Weights

This project supports the following model weights:

Want more? Submit a pull request with an update. This is GitHub; y'all know the drill.

Instructions

  • git clone this repo
  • Run setup.sh <weight> with <weight> being the model weight you want to use
    • Llama-2-7b-chat is used if a weight is not provided.
    • This script will:
      • Validate the model weight
      • Ensure git and git lfs are installed
      • Check out the Llama 2 Python Library From GitHub
      • Check out the requested model weight
    • This only needs to be done once per model weight.
  • Run start.sh <weight> with <weight> being the model weight you want to use
    • Llama-2-7b-chat is used if a weight is not provided.
    • This script will build and start the Docker container using docker compose.
  • Go to http://localhost and use the Swagger UI to send a query to the /submit endpoint, or send a POST request to /submit directly.
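Put together, a first run might look like this (a sketch assuming the default Llama-2-7b-chat weight):

```shell
# One-time setup: clone the repo, then fetch the Llama 2 library and weights.
git clone https://github.com/Zozman/simple-llama2-docker-container.git
cd simple-llama2-docker-container
./setup.sh Llama-2-7b-chat

# Build and start the container (run this each time you want it up).
./start.sh Llama-2-7b-chat
```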

From here, feel free to start hacking on what I've got here and customizing things to your needs. I've tried to comment everything to explain what's going on, but this is just a base to get you started.

License

Llama 2 resources are governed by the LLAMA 2 COMMUNITY LICENSE AGREEMENT.

The custom code in this repo is governed by the MIT License.
