Skip to content
/ serge Public
forked from serge-chat/serge

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.

License

Notifications You must be signed in to change notification settings

olirip/serge

 
 

Repository files navigation

Serge - LLaMA made easy 🦙

License Discord

Serge is a chat interface crafted with llama.cpp for running Alpaca models. No API keys, entirely self-hosted!

  • 🌐 SvelteKit frontend
  • 💾 Redis for storing chat history & parameters
  • ⚙️ FastAPI + LangChain for the API, wrapping calls to llama.cpp using the python bindings

🎥 Demo:

demo.webm

⚡️ Quick start

🐳 Docker:

docker run -d \
    --name serge \
    -v weights:/usr/src/app/weights \
    -v datadb:/data/db/ \
    -p 8008:8008 \
    ghcr.io/serge-chat/serge:latest

🐙 Docker Compose:

services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:

Then, just visit http://localhost:8008/, You can find the API documentation at http://localhost:8008/api/docs

🖥️ Windows Setup

Ensure you have Docker Desktop installed, WSL2 configured, and enough free RAM to run models.

☁️ Kubernetes & Docker Compose Setup

Instructions for setting up Serge on Kubernetes can be found in the wiki.

🧠 Supported Models

We currently support the following models:

  • Airoboros 🎈
    • Airoboros-7B
    • Airoboros-13B
    • Airoboros-30B
    • Airoboros-65B
  • Alpaca 🦙
    • Alpaca-LoRA-65B
    • GPT4-Alpaca-LoRA-30B
  • BigTrans 🗺
    • BigTrans-13B
  • Chronos 🌑
    • Chronos-13B
    • Chronos-33B
    • Chronos-Hermes-13B
  • GPT4All 🌍
    • GPT4All-13B
  • Guanaco 🦙
    • Guanaco-7B
    • Guanaco-13B
    • Guanaco-33B
    • Guanaco-65B
  • Koala 🐨
    • Koala-7B
    • Koala-13B
  • Llama 🦙
    • FinLlama-33B
    • Llama-Supercot-30B
  • Lazarus 💀
    • Lazarus-30B
  • Minotour 🐃
    • Minotaur-15B
  • Nous 🧠
    • Nous-Hermes-13B
  • OpenAssistant 🎙️
    • OpenAssistant-30B
  • Robin 🏹
    • Robin-7B
    • Robin-13B
    • Robin-33B
    • Robin-65B
  • Samantha 👩
    • Samantha-7B
    • Samantha-13B
    • Samantha-33B
  • Tulu 🎚
    • Tulu-7B
    • Tulu-13B
    • Tulu-30B
  • Vicuna 🦙
    • Stable-Vicuna-13B
    • Vicuna-CoT-7B
    • Vicuna-CoT-13B
    • Vicuna-v1.1-7B
    • Vicuna-v1.1-13B
    • VicUnlocked-30B
    • VicUnlocked-65B
    • Vicuna-v1.3-7B
    • Vicuna-v1.3-13B
  • Wizard 🧙
    • Wizard-Mega-13B
    • Wizard-Vicuna-Uncensored-7B
    • Wizard-Vicuna-Uncensored-13B
    • Wizard-Vicuna-Uncensored-30B
    • WizardLM-30B
    • WizardLM-Uncensored-7B
    • WizardLM-Uncensored-13B
    • WizardLM-Uncensored-30B

Additional weights can be added to the serge_weights volume using docker cp:

docker cp ./my_weight.bin serge:/usr/src/app/weights/

⚠️ Memory Usage

LLaMA will crash if you don't have enough available memory for the model:

Model Max RAM Required
7B 4.5GB
7B-q2_K 5.37GB
7B-q3_K_L 6.10GB
7B-q4_1 6.71GB
7B-q4_K_M 6.58GB
7B-q5_1 7.56GB
7B-q5_K_M 7.28GB
7B-q6_K 8.03GB
7B-q8_0 9.66GB
13B 12GB
13B-q2_K 8.01GB
13B-q3_K_L 9.43GB
13B-q4_1 10.64GB
13B-q4_K_M 10.37GB
13B-q5_1 12.26GB
13B-q5_K_M 11.73GB
13B-q6_K 13.18GB
13B-q8_0 16.33GB
33B 20GB
33B-q2_K 16.21GB
33B-q3_K_L 19.78GB
33B-q4_1 22.83GB
33B-q4_K_M 22.12GB
33B-q5_1 26.90GB
33B-q5_K_M 25.55GB
33B-q6_K 29.19GB
33B-q8_0 37.06GB
65B 50GB
65B-q2_K 29.95GB
65B-q3_K_L 37.15GB
65B-q4_1 43.31GB
65B-q4_K_M 41.85GB
65B-q5_1 51.47GB
65B-q5_K_M 48.74GB
65B-q6_K 56.06GB
65B-q8_0 71.87GB

💬 Support

Need help? Join our Discord

🤝 Contributing

If you discover a bug or have a feature idea, feel free to open an issue or PR.

To run Serge in development mode:

git clone https://github.com/serge-chat/serge.git
docker compose -f docker-compose.dev.yml up -d --build

About

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 50.6%
  • Svelte 40.6%
  • TypeScript 2.5%
  • Dockerfile 2.1%
  • JavaScript 1.6%
  • Shell 1.1%
  • Other 1.5%