💡 Article | 💻 HuggingFace | 📔 Colab 1,2
Welcome to CRIA, an LLM series based on Llama 2-7B.
Hint: krē-ə plural crias; a baby llama, alpaca, vicuña, or guanaco.
With ChatGPT's help, CRIA also stands for "Crafting a Rapid prototype of an Intelligent LLM App using open source resources", which encapsulates the objective of this project perfectly.
Additionally, akin to a baby llama in nature, CRIA pays homage to its foundation model, Meta's Llama 2-7B Large Language Model.
- Demonstrates instruction-tuning of a recent open-source LLM with a custom dataset on a free Colab instance (a minimal fine-tuning sketch follows this list).
- Uses FastAPI for efficient model serving and inference deployment.
- Supports real-time streaming with Server-Sent Events (SSE) for a seamless chat experience.
- Ships a modern front-end built with Next.js and Chakra UI.
- Supports both local and cloud deployment. (Coming Soon!)
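For reference, a minimal sketch of the instruction-tuning step is shown below. It assumes a QLoRA setup with Hugging Face `transformers`, `peft`, `bitsandbytes`, and `trl`; the hyperparameters, the `text` field name, and the output paths are illustrative, not necessarily the exact settings used for CRIA (exact `SFTTrainer` keyword arguments also vary across `trl` versions).

```python
# Illustrative QLoRA instruction-tuning sketch (assumed settings, not CRIA's exact config).
import torch
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from peft import LoraConfig
from trl import SFTTrainer

base_model = "NousResearch/Llama-2-7b-chat-hf"
dataset = load_dataset("mlabonne/CodeLlama-2-20k", split="train")

# Load the base model in 4-bit (NF4) so it fits on a free Colab T4 GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token

# LoRA adapters: only a small set of low-rank matrices is trained.
peft_config = LoraConfig(r=64, lora_alpha=16, lora_dropout=0.1, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",     # assumed column name in the dataset
    max_seq_length=512,
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="./results",
        per_device_train_batch_size=4,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
    ),
)
trainer.train()
trainer.model.save_pretrained("cria-llama2-7b-peft")  # saves adapter weights only
```

Saving only the adapter keeps the artifact small; the "Merged" checkpoints in the table below are presumably produced by merging such adapters back into the base model.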
Demo: Leveraging open source resources such as the Horizon AI Template
In this repository, you'll find:
Code: Dive into the technical details of our chatbot implementation, including the training process, API server implementation, the integration of Next.js for the user interface, and more.
Documentation: Detailed documentation to help you understand and replicate the CRIA setup, from model selection to deployment considerations.
Demo: Access a live demo showcasing CRIA in action.
| HuggingFace Model | Model Type | Base Model | Dataset | Colab | Status |
|---|---|---|---|---|---|
| cria-llama2-7b-v1.3, cria-llama2-7b-v1.3_peft | Merged / PEFT | NousResearch/Llama-2-7b-chat-hf | mlabonne/CodeLlama-2-20k | Latest | |
| cria-llama2-7b-v1.1, cria-llama2-7b-v1.2 | Merged / PEFT | TinyPixel/Llama-2-7B-bf16-sharded | n3rd0/DreamBook_Guanaco_Format | N.A. | Experimental |
| cria-llama2-7b-v1.0 | PEFT | TinyPixel/Llama-2-7B-bf16-sharded | Elliot4AI/dolly-15k-chinese-guanacoformat | N.A. | Experimental |
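As a rough guide, the merged checkpoints load like any causal LM, while the `_peft` variants are LoRA adapters attached to the base model at load time. The sketch below assumes `transformers` and `peft`; the `<org>/...` repo IDs are placeholders for the actual HuggingFace repos linked in the table above.

```python
# Loading sketch: merged checkpoint vs. PEFT adapter (repo IDs are placeholders).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Option 1: the merged checkpoint is a drop-in causal LM.
merged_id = "<org>/cria-llama2-7b-v1.3"          # placeholder repo ID
model = AutoModelForCausalLM.from_pretrained(
    merged_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(merged_id)

# Option 2: load the base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Llama-2-7b-chat-hf", torch_dtype=torch.float16, device_map="auto"
)
adapter_id = "<org>/cria-llama2-7b-v1.3_peft"    # placeholder repo ID
model = PeftModel.from_pretrained(base, adapter_id)
```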
Instructions for running the various components, such as the API server and the frontend interface, can be found at /docs/setup.md.
Instructions for deploying the API server and frontend to the cloud can be found at /docs/deployment.md.
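For context, a minimal serving sketch along these lines is shown below: a FastAPI endpoint that streams generated tokens over Server-Sent Events. The endpoint path, the use of `sse-starlette` for SSE responses, and `TextIteratorStreamer` for token streaming are assumptions for illustration, not necessarily what the API server in this repository uses.

```python
# Illustrative FastAPI + SSE streaming endpoint (assumed layout, not the repo's exact server).
from threading import Thread

from fastapi import FastAPI
from pydantic import BaseModel
from sse_starlette.sse import EventSourceResponse
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

app = FastAPI()
model_id = "<org>/cria-llama2-7b-v1.3"  # placeholder repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

class ChatRequest(BaseModel):
    prompt: str

@app.post("/chat")
async def chat(req: ChatRequest):
    inputs = tokenizer(req.prompt, return_tensors="pt").to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

    # Run generation in a background thread so tokens can be streamed as they arrive.
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=256)).start()

    async def event_stream():
        for token_text in streamer:
            yield {"data": token_text}   # each SSE event carries one text chunk

    return EventSourceResponse(event_stream())
```

The Next.js front-end can then consume this stream with the browser's EventSource (or fetch-based) API and render tokens as they arrive.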
CRIA v1.3 was first presented in a private session on 18 Aug 2023. The slides are publicly available here.
An overview of the project can be found at /docs/architecture.md.
Please refer to the /docs/adr/ folder for detailed information on the design decisions made so far.
The preliminary model evaluation can be found in the /docs/model-eval/ folder.
- ML Blog - Fine-Tune Your Own Llama 2 Model in a Colab Notebook
- Fine-tune Llama 2 in Google Colab.ipynb - Colaboratory
- Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA
- bnb-4bit-training.ipynb - Colaboratory
- 🐐Llama 2 Fine-Tune with QLoRA [Free Colab 👇🏽] - YouTube
- Fine-Tune Large LLMs with QLoRA (Free Colab Tutorial) - YouTube
- LLaMA2 for Multilingual Fine Tuning? - YouTube
- How to Tune Falcon-7B With QLoRA on a Single GPU - YouTube
- 🦙Llama 2 Fine-Tuning with 4-Bit QLoRA on Dolly-15k [Free Colab 🙌] - YouTube
- Fine-Tune Your Own Llama 2 Model in a Colab Notebook | Towards Data Science