This repository contains a curated list of awesome open-source projects for production large language models.
- [2024.12.27] 🔥 A new category, LLM Extraction / Parsing, has been added.
- [2024.10.26] A new category, LLM Agent Benchmarks, has been added.
- [2024.09.03] A new category, LLM Courses / Education, has been added.
- [2024.08.01] A new category, LLM Cookbook / Examples, has been added.
Newly added projects are marked with 🆕.
- data-juicer (ModelScope): A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs!
- datatrove (HuggingFace): Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.
- dolma (AllenAI): Data and tools for generating and inspecting OLMo pre-training data.
- NeMo-Curator (NVIDIA): Scalable toolkit for data curation
- dataverse (Upstage): The Universe of Data. All about data, data science, and data engineering
- EasyInstruct (ZJUNLP): An Easy-to-use Instruction Processing Framework for LLMs.
- data-prep-kit (IBM): Open source project for data preparation of LLM application builders
- dps (EleutherAI): Data processing system for polyglot
- nanoGPT (karpathy): The simplest, fastest repository for training/finetuning medium-sized GPTs.
- LLaMA-Factory: A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
- unsloth (Unsloth AI): Finetune Llama 3.2, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
- peft (HuggingFace): State-of-the-art Parameter-Efficient Fine-Tuning (PEFT).
- llama-recipes (Meta): Scripts for fine-tuning Meta Llama3 with composable FSDP & PEFT methods to cover single/multi-node GPUs.
- litgpt (LightningAI): 20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Megatron-LM (NVIDIA): Ongoing research training transformer models at scale
- trl (HuggingFace): Train transformer language models with reinforcement learning.
- LMFlow (OptimalScale): An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.
- gpt-neox (EleutherAI): An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
- torchtune (PyTorch): A Native-PyTorch Library for LLM Fine-tuning
- xtuner (InternLM): An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
- torchtitan (PyTorch): A native PyTorch Library for large model training
- nanotron (HuggingFace): Minimalistic large language model 3D-parallelism training
- evals (OpenAI): Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
- ragas (Exploding Gradients): Supercharge Your LLM Application Evaluations
- lm-evaluation-harness (EleutherAI): A framework for few-shot evaluation of language models.
- opencompass (OpenCompass): OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
- deepeval (ConfidentAI): The LLM Evaluation Framework
- simple-evals (OpenAI): This repository contains a lightweight library for evaluating language models.
- lighteval (HuggingFace): LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library datatrove and LLM training library nanotron.
- evalverse (Upstage): The Universe of Evaluation. All about the evaluation for LLMs.
- ollama (Ollama): Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.
- gpt4all (NomicAI): Chat with Local LLMs on Any Device
- llama.cpp: LLM inference in C/C++
- FastChat (LMSYS): An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- guidance (guidance-ai): A guidance language for controlling large language models.
- LiteLLM (BerriAI): Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, Groq (100+ LLMs)
- BitNet (Microsoft): Official inference framework for 1-bit LLMs
- OpenLLM (BentoML): Run any open-source LLM, such as Llama 3.1 or Gemma, as an OpenAI-compatible API endpoint in the cloud.
- text-generation-inference (HuggingFace): Large Language Model Text Generation Inference
- TensorRT-LLM (NVIDIA): TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
- SGLang (sgl-project): SGLang is a fast serving framework for large language models and vision language models.
- LMDeploy (InternLM): LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- torchchat (PyTorch): Run PyTorch LLMs locally on servers, desktop and mobile
- RouteLLM (LMSYS): A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
- LightLLM (ModelTC): LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
- langchain (LangChain): Build context-aware reasoning applications
- dify (LangGenius): Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
- MetaGPT: The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
- llama_index (LlamaIndex): LlamaIndex is a data framework for your LLM applications
- 🆕 Quivr (Quivr): Opinionated RAG for integrating GenAI in your apps. Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
- AutoGen (Microsoft): A programming framework for agentic AI
- Flowise (FlowiseAI): Drag & drop UI to build your customized LLM flow
- RAGFlow (InfiniFlow): RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
- mem0 (Mem0): The memory layer for Personalized AI
- crewAI (crewAI): Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- GraphRAG (Microsoft): A modular graph-based Retrieval-Augmented Generation (RAG) system
- haystack (Deepset): LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data.
- swarm (OpenAI): Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
- Letta (Letta): Letta (fka MemGPT) is a framework for creating stateful LLM services.
- 🆕 outlines (.TXT): Structured Text Generation (Make LLMs speak the language of every application.)
- pathway (Pathway): Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
- llmware (LLMware.ai): Unified framework for building enterprise RAG pipelines with small, specialized models
- 🆕 browser-use (Browser Use): Make websites accessible for AI agents (Browser use is the easiest way to connect your AI agents with the browser.)
- TaskingAI (TaskingAI): The open source platform for AI-native application development.
- llama-stack (Meta): Model components of the Llama Stack APIs
- AgentScope (ModelScope): Start building LLM-empowered multi-agent applications in an easier way.
- Qwen-Agent (QwenLM): Agent framework and applications built upon Qwen>=2.0, featuring Function Calling, Code Interpreter, RAG, and Chrome extension.
- llama-stack-apps (Meta): Agentic components of the Llama Stack APIs
- AutoRAG (Markr Inc.): AutoML tool for RAG
- Langroid (Langroid): Harness LLMs with Multi-Agent Programming
- AgentOps (AgentOps-AI): Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks like CrewAI, Langchain, and Autogen
- Lagent (InternLM): A lightweight framework for building LLM-based agents
- 🆕 Chonkie (Chonkie.ai): CHONK your texts with Chonkie - The no-nonsense RAG chunking library
- 🆕 MarkItDown (Microsoft): Python tool for converting files and office documents to Markdown.
- 🆕 MinerU (OpenDataLab): A high-quality tool for converting PDF to Markdown and JSON.
- 🆕 Firecrawl (Mendable AI): Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.
- 🆕 Crawl4AI (UncleCode): Crawl Smarter, Faster, Freely. For AI (LLMs, AI agents, and data pipelines).
- 🆕 Docling (IBM): Get your documents ready for gen AI
- 🆕 Unstructured (Unstructured.io): Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
- 🆕 Zerox (OmniAI): PDF to Markdown with vision models
- 🆕 PDF-Extract-Kit (OpenDataLab): A Comprehensive Toolkit for High-Quality PDF Content Extraction
- 🆕 MegaParse (Quivr): File Parser optimised for LLM Ingestion with no loss. Parse PDFs, Docx, PPTx in a format that is ideal for LLMs.
- 🆕 LlamaParse (LlamaIndex): Parse files for optimal RAG. LlamaParse is a GenAI-native document parser that can parse complex document data for any downstream LLM use case (RAG, agents).
- 🆕 GitIngest: Replace 'hub' with 'ingest' in any github url to get a prompt-friendly extract of a codebase
- 🆕 Open-Parse: Improved file parsing for LLMs
- 🆕 pdf-extract-api (Catch The Tornado): Document (PDF) extraction and parse API using state of the art modern OCRs + Ollama supported models. Anonymize documents. Remove PII. Convert any document or picture to structured JSON or Markdown
- 🆕 nv-ingest (NVIDIA): NVIDIA Ingest is an early access set of microservices for parsing hundreds of thousands of complex, messy unstructured PDFs and other enterprise documents into metadata and text to embed into retrieval systems.
- promptflow (Microsoft): Build high-quality LLM apps - from prototyping, testing to production deployment and monitoring.
- langfuse (Langfuse): Open source LLM engineering platform: Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more.
- evidently (EvidentlyAI): Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
- promptfoo (promptfoo): Test your prompts, agents, and RAGs. Redteaming, pentesting, vulnerability scanning for LLMs. Improve your app's quality and catch problems. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
- giskard (Giskard): Open-Source Evaluation & Testing for LLMs and ML models
- phoenix (ArizeAI): AI Observability & Evaluation
- Opik (Comet): Open-source end-to-end LLM Development Platform
- agenta (Agenta.ai): The all-in-one LLM developer platform: prompt management, evaluation, human feedback, and deployment all in one place.
- NeMo-Guardrails (NVIDIA): NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.
- guardrails (GuardrailsAI): Adding guardrails to large language models.
- PurpleLlama (Meta): Set of tools to assess and improve LLM security.
- llm-guard (ProtectAI): The Security Toolkit for LLM Interactions
- openai-cookbook (OpenAI): Examples and guides for using the OpenAI API
- anthropic-cookbook (Anthropic): A collection of notebooks/recipes showcasing some fun and effective ways of using Claude.
- gemini-cookbook (Google): Examples and guides for using the Gemini API.
- Phi-3CookBook (Microsoft): This is a Phi-3 book for getting started with Phi-3, a family of open AI models developed by Microsoft.
- amazon-bedrock-workshop (AWS): This is a workshop designed for Amazon Bedrock, a foundational model service.
- mistral-cookbook (Mistral): The Mistral Cookbook features examples contributed by Mistralers and our community, as well as our partners.
- gemma-cookbook (Google): A collection of guides and examples for the Gemma open models from Google.
- amazon-bedrock-samples (AWS): This repository contains examples for customers to get started using the Amazon Bedrock Service, covering all available foundational models.
- cohere-notebooks (Cohere): Code examples and Jupyter notebooks for the Cohere Platform
- upstage-cookbook (Upstage): Upstage API examples and guides
- generative-ai-for-beginners (Microsoft): 18 Lessons, Get Started Building with Generative AI
- llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
- LLMs-from-scratch: Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
- hands-on-llms: Learn about LLMs, LLMOps, and vector DBs for free by designing, training, and deploying a real-time financial advisor LLM system ~ source code + video & reading materials
- llm-zoomcamp (DataTalksClub): LLM Zoomcamp - a free online course about building a Q&A system
- llm-twin-course (DecodingML): Learn for free how to build an end-to-end production-ready LLM & RAG system using LLMOps best practices: ~ source code + 12 hands-on lessons
- SWE-bench (Princeton-NLP): SWE-bench is a benchmark for evaluating large language models on real world software issues collected from GitHub.
- MMAU (axlearn) (Apple): The Massive Multitask Agent Understanding (MMAU) benchmark is designed to evaluate the performance of large language models (LLMs) as agents across a wide variety of tasks.
- mle-bench (OpenAI): MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
- WindowsAgentArena (Microsoft): Windows Agent Arena (WAA) is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
- DevAI (agent-as-a-judge) (METAUTO.ai): DevAI, a benchmark consisting of 55 realistic AI development tasks with 365 hierarchical user requirements.
- 🆕 AIOpsLab (Microsoft): AIOpsLab is a holistic framework to enable the design, development, and evaluation of autonomous AIOps agents that, additionally, serves the purpose of building reproducible, standardized, interoperable and scalable benchmarks.
- natural-plan (Google DeepMind): Natural Plan is a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling.
This project is inspired by Awesome Production Machine Learning.