A curated list of LLMs and related studies targeted at mobile and embedded hardware
Last update: 8th November 2024
If your publication/work is not included, and you think it should be, please open an issue or reach out directly to @stevelaskaridis.
Let's try to make this list as useful as possible to researchers, engineers and practitioners all around the world.
- Mobile-First LLMs
- Infrastructure / Deployment of LLMs on Device
- Benchmarking LLMs on Device
- Mobile-Specific Optimisations
- Applications
- Multimodal LLMs
- Surveys on Efficient LLMs
- Training LLMs on Device
- Mobile-Related Use-Cases
- Leaderboards
- Industry Announcements
- Related Awesome Repositories
The following table lists sub-3B models designed for on-device deployment, sorted by year; a minimal loading sketch follows the table.
Name | Year | Sizes | Primary Group/Affiliation | Publication | Code Repository | HF Repository |
---|---|---|---|---|---|---|
AMD-Llama-135m | 2024 | 135M | AMD | blog | code | huggingface |
SmolLM2 | 2024 | 135M, 360M, 1.7B | Hugging Face | - | - | huggingface |
Ministral | 2024 | 3B, ... | Mistral | blog | - | huggingface |
Llama 3.2 | 2024 | 1B, 3B | Meta | blog | code | huggingface |
Gemma 2 | 2024 | 2B, ... | Google | paper, blog | code | huggingface |
Apple Intelligence Foundation LMs | 2024 | 3B | Apple | paper | - | - |
SmolLM | 2024 | 135M, 360M, 1.7B | Hugging Face | blog | - | huggingface |
Fox | 2024 | 1.6B | TensorOpera | blog | - | huggingface |
Qwen2 | 2024 | 500M, 1.5B, ... | Qwen Team | paper | code | huggingface |
OpenELM | 2024 | 270M, 450M, 1.08B, 3.04B | Apple | paper | code | huggingface |
Phi-3 | 2024 | 3.8B | Microsoft | whitepaper | code | huggingface |
OLMo | 2024 | 1B, ... | AllenAI | paper | code | huggingface |
MobileLLM | 2024 | 125M, 350M | Meta | paper | code | - |
Gemma | 2024 | 2B, ... | Google | paper, website | code, gemma.cpp | huggingface |
MobiLlama | 2024 | 0.5B, 1B | MBZUAI | paper | code | huggingface |
Stable LM 2 (Zephyr) | 2024 | 1.6B | Stability AI | paper | - | huggingface |
TinyLlama | 2024 | 1.1B | Singapore University of Technology and Design | paper | code | huggingface |
Gemini-Nano | 2024 | 1.8B, 3.25B | Google | paper | - | - |
Stable LM (Zephyr) | 2023 | 3B | Stability AI | blog | code | huggingface |
OpenLM | 2023 | 11M, 25M, 87M, 160M, 411M, 830M, 1B, 3B, ... | OpenLM team | - | code | huggingface |
Phi-2 | 2023 | 2.7B | Microsoft | website | - | huggingface |
Phi-1.5 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
Phi-1 | 2023 | 1.3B | Microsoft | paper | - | huggingface |
RWKV | 2023 | 169M, 430M, 1.5B, 3B, ... | EleutherAI | paper | code | huggingface |
Cerebras-GPT | 2023 | 111M, 256M, 590M, 1.3B, 2.7B ... | Cerebras | paper | code | huggingface |
LaMini-LM | 2023 | 61M, 77M, 111M, 124M, 223M, 248M, 256M, 590M, 738M, 774M, 783M, 1.3B, 1.5B, ... | MBZUAI | paper | code | huggingface |
Pythia | 2023 | 70M, 160M, 410M, 1B, 1.4B, 2.8B, ... | EleutherAI | paper | code | huggingface |
OPT | 2022 | 125M, 350M, 1.3B, 2.7B, ... | Meta | paper | code | huggingface |
Galactica | 2022 | 125M, 1.3B, ... | Meta | paper | code | huggingface |
BLOOM | 2022 | 560M, 1.1B, 1.7B, 3B, ... | BigScience | paper | code | huggingface |
XGLM | 2021 | 564M, 1.7B, 2.9B, ... | Meta | paper | code | huggingface |
GPT-Neo | 2021 | 125M, 350M, 1.3B, 2.7B | EleutherAI | - | code, gpt-neox | huggingface |
MobileBERT | 2020 | 15.1M, 25.3M | CMU, Google | paper | code | huggingface |
BART | 2019 | 140M, 400M | Meta | paper | code | huggingface |
DistilBERT | 2019 | 66M | Hugging Face | paper | code | huggingface |
T5 | 2019 | 60M, 220M, 770M, 3B, ... | Google | paper | code | huggingface |
TinyBERT | 2019 | 14.5M | Huawei | paper | code | huggingface |
Megatron-LM | 2019 | 336M, 1.3B, ... | Nvidia | paper | code | - |
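Most of the checkpoints above are published on the Hugging Face Hub and load through the standard transformers API. A minimal sketch, assuming the SmolLM2-135M repository id (check the table's HF links for exact names):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The repository id is an assumption; see the table's HF links for exact names.
model_id = "HuggingFaceTB/SmolLM2-135M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Small language models can", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```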
This section showcases frameworks and contributions for supporting LLM inference on mobile and edge devices.
- llama.cpp: Inference of Meta's LLaMA model (and others) in pure C/C++. Supports various platforms and builds on top of the ggml tensor library (models are distributed in the GGUF format); a minimal Python sketch via its community bindings follows this list.
- LLMFarm: iOS frontend for llama.cpp
- LLM.swift: Swift library wrapping llama.cpp for iOS/macOS apps
- Sherpa: Android frontend for llama.cpp
- iAkashPaul/Portal: Wraps the example Android app with a tweaked UI, configs, and additional model support
- dusty-nv's llama.cpp: Containers for Jetson deployment of llama.cpp
- MLC-LLM: MLC LLM is a machine learning compiler and high-performance deployment engine for large language models. Supports various platforms and builds on top of Apache TVM.
- Android App: MLC Android app
- iOS App: MLC iOS app
- dusty-nv's MLC: Containers for Jetson deployment of MLC
- PyTorch ExecuTorch: Solution for enabling on-device inference capabilities across mobile and edge devices, including wearables, embedded devices, and microcontrollers; an export sketch follows this list.
- TorchChat: Codebase showcasing the ability to run large language models (LLMs) seamlessly across iOS and Android
- Google MediaPipe: A suite of libraries and tools to quickly apply artificial intelligence (AI) and machine learning (ML) techniques in your applications. Supports Android, iOS, Python, and the Web.
- Apple MLX: MLX is an array framework for machine learning research on Apple silicon, developed by Apple machine learning research. Built around lazy evaluation and a unified memory architecture; a lazy-evaluation sketch follows this list.
- MLX Swift: Swift API for MLX.
- HF Swift Transformers: Swift Package to implement a transformers-like API in Swift
- Alibaba MNN: A lightweight deep learning framework that supports on-device inference and training of deep learning models.
- llama2.c: Minimal single-file Llama 2 inference in pure C (more educational; see here for an Android port)
- tinygrad: Simple neural network framework from tinycorp and @geohot
- TinyChatEngine: On-device LLM inference library targeting Nvidia GPUs, Apple M1, and Raspberry Pi, from Song Han's group at MIT.
- PowerInfer-2: Fast Large Language Model Inference on a Smartphone (paper, code)
- [MobiCom'24] Mobile Foundation Model as Firmware (paper, code)
- Merino: Entropy-driven Design for Generative Language Models on IoT Devices (paper)
- LLM as a System Service on Mobile Devices (paper)
- LLMCad: Fast and Scalable On-device Large Language Model Inference (paper)
- EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models (paper)
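As referenced in the llama.cpp entry above, here is a minimal generation sketch using the community llama-cpp-python bindings; the GGUF path and sampling settings are assumptions:

```python
from llama_cpp import Llama

# Model path is an assumption; any GGUF-format checkpoint works.
llm = Llama(model_path="models/tinyllama-1.1b-chat.Q4_K_M.gguf", n_ctx=2048)

out = llm("Q: Why run LLMs on-device? A:", max_tokens=64, stop=["Q:"])
print(out["choices"][0]["text"])
```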
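For PyTorch ExecuTorch, the general flow exports a torch.nn.Module to a .pte program that the on-device runtime loads. A sketch with a toy module, based on the documented export path (treat the exact API as an assumption and consult the ExecuTorch docs):

```python
import torch
from executorch.exir import to_edge

class Tiny(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

# Capture the module, lower it to the Edge dialect, and serialise
# an ExecuTorch program for the on-device runtime.
exported = torch.export.export(Tiny(), (torch.randn(4),))
program = to_edge(exported).to_executorch()
with open("tiny.pte", "wb") as f:
    f.write(program.buffer)
```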
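And a small illustration of MLX's lazy evaluation, mentioned in its entry above: operations only build a graph until the result is forced (array shapes here are arbitrary):

```python
import mlx.core as mx

a = mx.random.normal((1024, 1024))
b = (a @ a.T).sum()  # builds the computation graph; nothing runs yet
mx.eval(b)           # forces evaluation on the unified-memory device
print(b.item())
```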
This section focuses on measurements and benchmarking efforts for assessing LLM performance when deployed on device; a toy throughput-measurement loop follows the list.
- MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases (paper)
- [MobiCom'24] MELTing point: Mobile Evaluation of Language Transformers (paper, talk, code)
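A toy end-to-end decode-throughput loop, reusing the hypothetical llama-cpp-python setup from the previous section (real suites such as MELT measure much more, e.g. prefill latency and energy):

```python
import time
from llama_cpp import Llama

# Model path is an assumption; see the sketch in the previous section.
llm = Llama(model_path="models/tinyllama-1.1b-chat.Q4_K_M.gguf", n_ctx=2048)

start = time.perf_counter()
out = llm("The quick brown fox", max_tokens=128)
elapsed = time.perf_counter() - start

n_generated = out["usage"]["completion_tokens"]
print(f"{n_generated} tokens in {elapsed:.2f}s -> {n_generated / elapsed:.1f} tok/s")
```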
This section focuses on techniques and optimisations that target mobile-specific deployment; a toy weight-quantization sketch follows the list.
- MobileQuant: Mobile-friendly Quantization for On-device Language Models (paper, code)
- Gemma 2: Improving Open Language Models at a Practical Size (paper, code)
- Apple Intelligence Foundation Language Models (paper)
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone (paper, code)
- Gemma: Open Models Based on Gemini Research and Technology (paper, code)
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT (paper, code)
- [ICML'24] MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (paper, code)
- TinyLlama: An Open-Source Small Language Model (paper, code)
- Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent (paper)
- Octopus v2: On-device language model for super agent (paper)
- Towards an On-device Agent for Text Rewriting (paper)
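As flagged in the section intro, here is a toy per-channel symmetric int8 weight-quantization round-trip. It illustrates only the general idea behind low-bit weight quantization, not any specific paper's method:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization of a 2-D weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.maximum(scale, 1e-8)  # guard against all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize_int8(q, s)).max())
```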
This section refers to multimodal LLMs, which integrate vision or other modalities in their tasks.
- TinyLLaVA: A Framework of Small-scale Large Multimodal Models (paper, code)
- MobileVLM V2: Faster and Stronger Baseline for Vision Language Model (paper, code)
This section includes survey papers on LLM efficiency, a topic closely related to deployment on constrained devices.
- A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness (paper)
- Small Language Models: Survey, Measurements, and Insights (paper)
- On-Device Language Models: A Comprehensive Review (paper)
- A Survey of Resource-efficient LLM and Multimodal Foundation Models (paper)
- Efficient Large Language Models: A Survey (paper, code)
- Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems (paper)
- A Survey on Model Compression for Large Language Models (paper)
This section refers to papers attempting to train/fine-tune LLMs on device, in a standalone or federated manner.
- [MobiCom'23] Federated Few-Shot Learning for Mobile NLP (paper, code)
- FwdLLM: Efficient FedLLM using Forward Gradient (paper, code); a toy forward-gradient sketch follows this list
- [Electronics'24] Forward Learning of Large Language Models by Consumer Devices (paper)
- Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly (paper)
- Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes (paper, code)
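FwdLLM above builds on forward-mode (JVP-based) gradient estimates, which avoid storing activations for backpropagation. A toy sketch of the estimator on a quadratic loss using torch.func.jvp; this is an illustrative assumption about one way to implement the idea, not the paper's code:

```python
import torch
from torch.func import jvp

def loss(w: torch.Tensor) -> torch.Tensor:
    # Toy quadratic loss: L(w) = 0.5 * ||w||^2, whose true gradient is w.
    return 0.5 * (w * w).sum()

w = torch.randn(10)
g_est = torch.zeros_like(w)
n_samples = 256
for _ in range(n_samples):
    v = torch.randn_like(w)               # random tangent direction
    _, dir_deriv = jvp(loss, (w,), (v,))  # forward-mode directional derivative
    g_est += dir_deriv * v                # one-sample unbiased gradient estimate
g_est /= n_samples

cos = torch.nn.functional.cosine_similarity(g_est, w, dim=0)
print(f"cosine similarity to true gradient: {cos.item():.3f}")
```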
This section includes papers that are mobile-related but do not necessarily run on device.
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs (paper)
- Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception (paper, code)
- [MobiCom'24] MobileGPT: Augmenting LLM with Human-like App Memory for Mobile Task Automation (paper)
- [MobiCom'24] AutoDroid: LLM-powered Task Automation in Android (paper, code)
- [NeurIPS'23] AndroidInTheWild: A Large-Scale Dataset For Android Device Control (paper, code)
- GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation (paper, code)
- [ACL'20] Mapping Natural Language Instructions to Mobile UI Action Sequences (paper)
- WWDC'24 - Apple Foundation Models
- PyTorch Executorch Alpha
- Google - LLMs On-Device with MediaPipe and TFLite
- Qualcomm - The future of AI is Hybrid
- ARM - Generative AI on mobile
If you want to read more about related topics, here are some adjacent awesome repositories to visit:
- NexaAI/Awesome-LLMs-on-device on LLMs on Device
- Hannibal046/Awesome-LLM on Large Language Models
- KennethanCeyer/awesome-llm on Large Language Models
- HuangOwen/Awesome-LLM-Compression on Large Language Model Compression
- csarron/awesome-emdl on Embedded and Mobile Deep Learning
Contributions welcome! Read the contribution guidelines first.