A curated list of LLMs and related studies targeted at mobile and embedded hardware
Last update: 8th November 2024
If your publication/work is not included, and you think it should be, please open an issue or reach out directly to @stevelaskaridis.
Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.
- Mobile-First LLMs
- Infrastructure / Deployment of LLMs on Device
- Benchmarking LLMs on Device
- Mobile-Specific Optimisations
- Applications
- Multimodal LLMs
- Surveys on Efficient LLMs
- Training LLMs on Device
- Mobile-Related Use-Cases
- Leaderboards
- Industry Announcements
- Related Awesome Repositories
The following table lists sub-3B models designed for on-device deployment, sorted by year; a minimal loading sketch follows the table.
Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
---|---|---|---|---|---|---|
AMD-Llama-135m | 2024 | 135M | AMD | blog | code | huggingface |
SmolLM2 | 2024 | 135M, 360M, 1.7B | Hugging Face | - | - | huggingface |
Ministral | 2024 | 3B, ... | Mistral | blog | - | huggingface |
Llama 3.2 | 2024 | 1B, 3B | Meta | blog | code | huggingface |
Gemma 2 | 2024 | 2B, ... | Google | paper, blog | code | huggingface |
Apple Intelligence Foundation LMs | 2024 | 3B | Apple | paper | - | - |
SmolLM | 2024 | 135M, 360M, 1.7B | Hugging Face | blog | - | huggingface |
Fox | 2024 | 1.6B | TensorOpera | blog | - | huggingface |
Qwen2 | 2024 | 500M, 1.5B, ... | Qwen Team | paper | code | huggingface |
OpenELM | 2024 | 270M, 450M, 1.08B, 3.04B | Apple | paper | code | huggingface |
Phi-3 | 2024 | 3.8B | Microsoft | whitepaper | code | huggingface |
OLMo | 2024 | 1B, ... | AllenAI | paper | code | huggingface |
MobileLLM | 2024 | 125M, 350M | Meta | paper | code | - |
Gemma | 2024 | 2B, ... | Google | paper, website | code, gemma.cpp | huggingface |
MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
Stable LM 2 (Zephyr) | 2024 | 1.6B | Stability AI | paper | - | huggingface |
TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
Gemini-Nano | 2024 | 1.8B, 3.25B | Google | paper | - | - |
Stable LM (Zephyr) | 2023 | 3B | Stability AI | blog | code | huggingface |
OpenLM | 2023 | 11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ... | OpenLM team | - | code | huggingface |
Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B ... | Cerebras | paper | code | huggingface |
LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 738M, 774M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
DistilBERT | 2019 | 66M | Hugging Face | paper | code | huggingface |
T5 | 2019 | 60M, 220M, 770M, 3B, ... | Google | paper | code | huggingface |
TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
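Most of the checkpoints above are published on the Hugging Face Hub and load through the standard transformers API. A minimal sketch, assuming the SmolLM2-135M repository id (check the table's HF links for exact names):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository id is an assumption; see the table's HF links for exact names.
model_id = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Small language models can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```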
This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.
- llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of the ggml tensor library (models are distributed in the GGUF format); a minimal Python sketch via its community bindings follows this list.
- LLMFarm: iOS frontend for llama.cpp
- LLM.swift: Swift library wrapping llama.cpp for iOS/macOS apps
- Sherpa: Android frontend for llama.cpp
- iAkashPaul/Portal: Wraps the example Android app with a tweaked UI, configs, and additional model support
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- MLC-LLM: MLC LLM is a machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and builds on top of Apache TVM.
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
- PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices, including wearables, embedded devices, and microcontrollers; an export sketch follows this list.
- TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
- Google MediaPipe: A suite of libraries and tools to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. Supports Android, iOS, Python, and the Web.
- Apple MLX: MLX is an array framework for machine learning research on Apple silicon, developed by Apple machine learning research. Built around lazy evaluation and a unified memory architecture; a lazy-evaluation sketch follows this list.
- MLX Swift: Swift API for MLX.
- HF Swift Transformers: Swift Package to implement a transformers-like API in Swift
- Alibaba MNN: A lightweight deep learning framework that supports on-device inference and training of deep learning models.
- llama2.c: Minimal single-file Llama 2 inference in pure C (more educational; see here for an Android port)
- tinygrad: Simple neural network framework from tinycorp and @geohot
- TinyChatEngine: On-device LLM inference library targeting Nvidia GPUs, Apple M1, and Raspberry Pi, from Song Han's group at MIT.
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone (paper, code)
- [MobiCom'24] Mobile Foundation Model as Firmware (paper, code)
- Merino: Entropy-driven Design for Generative Language Models on IoT Devices (paper)
- LLM as a System Service on Mobile Devices (paper)
- LLMCad: Fast and Scalable On-device Large Language Model Inference (paper)
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (paper)
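As referenced in the llama.cpp entry above, here is a minimal generation sketch using the community llama-cpp-python bindings; the GGUF path and sampling settings are assumptions:

```python
from llama_cpp import Llama

# Model path is an assumption; any GGUF-format checkpoint works.
llm = Llama(model_path="models/tinyllama-1.1b-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Why run LLMs on-device? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```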
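For PyTorch ExecuTorch, the general flow exports a torch.nn.Module to a .pte program that the on-device runtime loads. A sketch with a toy module, based on the documented export path (treat the exact API as an assumption and consult the ExecuTorch docs):

```python
import torch
from executorch.exir import to_edge

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# Capture the module, lower it to the Edge dialect, and serialise
# an ExecuTorch program for the on-device runtime.
exported = torch.export.export(Tiny(), (torch.randn(4),))
program = to_edge(exported).to_executorch()
with open("tiny.pte", "wb") as f:
    f.write(program.buffer)
```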
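And a small illustration of MLX's lazy evaluation, mentioned in its entry above: operations only build a graph until the result is forced (array shapes here are arbitrary):

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = (a @ a.T).sum()  # builds the computation graph; nothing runs yet
mx.eval(b)           # forces evaluation on the unified-memory device
print(b.item())
```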
This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device; a toy throughput-measurement loop follows the list.
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases (paper)
- [MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers (paper, talk, code)
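A toy end-to-end decode-throughput loop, reusing the hypothetical llama-cpp-python setup from the previous section (real suites such as MELT measure much more, e.g. prefill latency and energy):

```python
import time
from llama_cpp import Llama

# Model path is an assumption; see the sketch in the previous section.
llm = Llama(model_path="models/tinyllama-1.1b-chat.Q4_K_M.gguf", n_ctx=2048)

start = time.perf_counter()
out = llm("The quick brown fox", max_tokens=128)
elapsed = time.perf_counter() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.2f}s -> {n_generated / elapsed:.1f} tok/s")
```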
This section focuses on techniques and optimisations that target mobile-specific deployment; a toy weight-quantization sketch follows the list.
- MobileQuant: Mobile-friendly Quantization for On-device Language Models (paper, code)
- Gemma 2: Improving Open Language Models at a Practical Size (paper, code)
- Apple Intelligence Foundation Language Models (paper)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (paper, code)
- Gemma: Open Models Based on Gemini Research and Technology (paper, code)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (paper, code)
- [ICML'24] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (paper, code)
- TinyLlama: An Open-Source Small Language Model (paper, code)
- Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent (paper)
- Octopus v2: On-device language model for super agent (paper)
- Towards an On-device Agent for Text Rewriting (paper)
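As flagged in the section intro, here is a toy per-channel symmetric int8 weight-quantization round-trip. It illustrates only the general idea behind low-bit weight quantization, not any specific paper's method:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a 2-D weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize_int8(q, s)).max())
```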
This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models (paper, code)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (paper, code)
This section includes survey papers on LLM efficiency, a topic closely related to deployment on constrained devices.
- A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness (paper)
- Small Language Models: Survey, Measurements, and Insights (paper)
- On-Device Language Models: A Comprehensive Review (paper)
- A Survey of Resource-efficient LLM and Multimodal Foundation Models (paper)
- Efficient Large Language Models: A Survey (paper, code)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (paper)
- A Survey on Model Compression for Large Language Models (paper)
This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.
- [MobiCom'23] Federated Few-Shot Learning for Mobile NLP (paper, code)
- FwdLLM: Efficient FedLLM using Forward Gradient (paper, code); a toy forward-gradient sketch follows this list
- [Electronics'24] Forward Learning of Large Language Models by Consumer Devices (paper)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly (paper)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (paper, code)
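FwdLLM above builds on forward-mode (JVP-based) gradient estimates, which avoid storing activations for backpropagation. A toy sketch of the estimator on a quadratic loss using torch.func.jvp; this is an illustrative assumption about one way to implement the idea, not the paper's code:

```python
import torch
from torch.func import jvp

def loss(w: torch.Tensor) -> torch.Tensor:
    # Toy quadratic loss: L(w) = 0.5 * ||w||^2, whose true gradient is w.
    return 0.5 * (w * w).sum()

w = torch.randn(10)
g_est = torch.zeros_like(w)
n_samples = 256
for _ in range(n_samples):
    v = torch.randn_like(w)               # random tangent direction
    _, dir_deriv = jvp(loss, (w,), (v,))  # forward-mode directional derivative
    g_est += dir_deriv * v                # one-sample unbiased gradient estimate
g_est /= n_samples

cos = torch.nn.functional.cosine_similarity(g_est, w, dim=0)
print(f"cosine similarity to true gradient: {cos.item():.3f}")
```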
This section includes papers that are mobile-related but do not necessarily run on device.
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (paper)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (paper, code)
- [MobiCom'24] MobileGPT: Augmenting LLM with Human-like App Memory for Mobile Task Automation (paper)
- [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android (paper, code)
- [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control (paper, code)
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation (paper, code)
- [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences (paper)
- WWDC'24 - Apple Foundation Models
- PyTorch Executorch Alpha
- Google - LLMs On-Device with MediaPipe and TFLite
- Qualcomm - The future of AI is Hybrid
- ARM - Generative AI on mobile
If you want to read more about related topics, here are some adjacent awesome repositories to visit:
- NexaAI/Awesome-LLMs-on-device on LLMs on Device
- Hannibal046/Awesome-LLM on Large Language Models
- KennethanCeyer/awesome-llm on Large Language Models
- HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
- csarron/awesome-emdl on Embedded and Mobile Deep Learning
Contributions welcome! Read the contribution guidelines first.