SLM Survey


A Comprehensive Survey of Small Language Models: Technology, On-Device Applications, Efficiency, Enhancements for LLMs, and Trustworthiness

This repo includes the papers discussed in our latest survey on small language models.
📖 Read the full paper here: Paper Link

News

  • 2024/11/04: The first version of our survey is on arXiv!

Reference

If our survey is useful for your research, please cite our paper:

@article{wang2024comprehensive,
  title={A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness},
  author={Wang, Fali and Zhang, Zhiwei and Zhang, Xianren and Wu, Zongyu and Mo, Tzuhao and Lu, Qiuhao and Wang, Wanjing and Li, Rui and Xu, Junjie and Tang, Xianfeng and others},
  journal={arXiv preprint arXiv:2411.03350},
  year={2024}
}

Overview of SLMs

Figure: Overview of Small Language Models

Timeline of SLMs

Figure: Timeline of Small Language Models

SLMs Paper List

Existing SLMs

| Model | #Params | Date | Paradigm | Domain | Code | HF Model | Paper/Blog |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Llama 3.2 | 1B; 3B | 2024.9 | Pre-train | Generic | Github | HF | Blog |
| Qwen 1 | 1.8B; 7B; 14B; 72B | 2023.12 | Pre-train | Generic | Github | HF | Paper |
| Qwen 1.5 | 0.5B; 1.8B; 4B; 7B; 14B; 32B; 72B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
| Qwen 2 | 0.5B; 1.5B; 7B; 57B; 72B | 2024.6 | Pre-train | Generic | Github | HF | Paper |
| Qwen 2.5 | 0.5B; 1.5B; 3B; 7B; 14B; 32B; 72B | 2024.9 | Pre-train | Generic | Github | HF | Paper |
| Gemma | 2B; 7B | 2024.2 | Pre-train | Generic | - | HF | Paper |
| Gemma 2 | 2B; 9B; 27B | 2024.7 | Pre-train | Generic | - | HF | Paper |
| H2O-Danube3 | 500M; 4B | 2024.7 | Pre-train | Generic | - | HF | Paper |
| Fox-1 | 1.6B | 2024.6 | Pre-train | Generic | - | HF | Blog |
| Rene | 1.3B | 2024.5 | Pre-train | Generic | - | HF | Paper |
| MiniCPM | 1.2B; 2.4B | 2024.4 | Pre-train | Generic | Github | HF | Paper |
| OLMo | 1B; 7B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
| TinyLlama | 1B | 2024.1 | Pre-train | Generic | Github | HF | Paper |
| Phi-1 | 1.3B | 2023.6 | Pre-train | Coding | - | HF | Paper |
| Phi-1.5 | 1.3B | 2023.9 | Pre-train | Generic | - | HF | Paper |
| Phi-2 | 2.7B | 2023.12 | Pre-train | Generic | - | HF | Paper |
| Phi-3 | 3.8B; 7B; 14B | 2024.4 | Pre-train | Generic | - | HF | Paper |
| Phi-3.5 | 3.8B; 4.2B; 6.6B | 2024.4 | Pre-train | Generic | - | HF | Paper |
| OpenELM | 270M; 450M; 1.1B; 3B | 2024.4 | Pre-train | Generic | Github | HF | Paper |
| MobiLlama | 0.5B; 0.8B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
| MobileLLM | 125M; 350M | 2024.2 | Pre-train | Generic | Github | HF | Paper |
| StableLM | 3B; 7B | 2023.4 | Pre-train | Generic | Github | HF | Paper |
| StableLM 2 | 1.6B | 2024.2 | Pre-train | Generic | Github | HF | Paper |
| Cerebras-GPT | 111M-13B | 2023.4 | Pre-train | Generic | - | HF | Paper |
| BLOOM, BLOOMZ | 560M; 1.1B; 1.7B; 3B; 7.1B; 176B | 2022.11 | Pre-train | Generic | - | HF | Paper |
| OPT | 125M; 350M; 1.3B; 2.7B; 5.7B | 2022.5 | Pre-train | Generic | - | HF | Paper |
| XGLM | 1.7B; 2.9B; 7.5B | 2021.12 | Pre-train | Generic | Github | HF | Paper |
| GPT-Neo | 125M; 350M; 1.3B; 2.7B | 2021.5 | Pre-train | Generic | Github | - | Paper |
| Megatron-gpt2 | 355M; 2.5B; 8.3B | 2019.9 | Pre-train | Generic | Github | - | Paper, Blog |
| MINITRON | 4B; 8B; 15B | 2024.7 | Pruning and Distillation | Generic | Github | HF | Paper |
| Orca 2 | 7B | 2023.11 | Distillation | Generic | - | HF | Paper |
| Dolly-v2 | 3B; 7B; 12B | 2023.4 | Instruction tuning | Generic | Github | HF | Blog |
| LaMini-LM | 61M-7B | 2023.4 | Distillation | Generic | Github | HF | Blog |
| Specialized FlanT5 | 250M; 760M; 3B | 2023.1 | Instruction Tuning | Generic (math) | Github | - | Paper |
| FlanT5 | 80M; 250M; 780M; 3B | 2022.10 | Instruction Tuning | Generic | Github | HF | Paper |
| T5 | 60M; 220M; 770M; 3B; 11B | 2019.9 | Pre-train | Generic | Github | HF | Paper |

SLM Architecture

  1. Transformer: Attention is all you need. Ashish Vaswani et al. NeurIPS 2017.
  2. Mamba 1: Mamba: Linear-time sequence modeling with selective state spaces. Albert Gu and Tri Dao. COLM 2024. [Paper].
  3. Mamba 2: Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. Tri Dao and Albert Gu. ICML 2024. [Paper] [Code]
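
For quick reference, the scaled dot-product attention at the heart of the Transformer entry above is

```math
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

where d_k is the key dimension. The quadratic cost of this operator in sequence length is what the Mamba line of work replaces with a selective state-space recurrence that scales linearly.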

Enhancement for SLM

Training from scratch

  1. MobiLlama: "MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT". Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan. arXiv 2024. [Paper] [Github] [HuggingFace]
  2. MobileLLM: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases". Zechun Liu, Changsheng Zhao, Forrest Iandola, Chen Lai, Yuandong Tian, Igor Fedorov, Yunyang Xiong, Ernie Chang, Yangyang Shi, Raghuraman Krishnamoorthi, Liangzhen Lai, Vikas Chandra. ICML 2024. [Paper] [Github] [HuggingFace]
  3. Rethinking optimization and architecture for tiny language models. Yehui Tang, Fangcheng Liu, Yunsheng Ni, Yuchuan Tian, Zheyuan Bai, Yi-Qi Hu, Sichao Liu, Shangling Jui, Kai Han, and Yunhe Wang. ICML 2024. [Paper] [Code]
  4. MindLLM: "MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications". Yizhe Yang, Huashan Sun, Jiawei Li, Runheng Liu, Yinghao Li, Yuhang Liu, Heyan Huang, Yang Gao. arXiv 2023. [Paper] [HuggingFace]

Supervised fine-tuning

  1. Direct preference optimization: Your language model is secretly a reward model. Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. NeurIPS, 2023. [Paper] [Code]
  2. Enhancing chat language models by scaling high-quality instructional conversations. Ning Ding, Yulin Chen, Bokai Xu, Yujia Qin, Zhi Zheng, Shengding Hu, Zhiyuan Liu, Maosong Sun, and Bowen Zhou. EMNLP 2023. [Paper] [Code]
  3. SlimOrca: An Open Dataset of GPT-4 Augmented FLAN Reasoning Traces, with Verification. Wing Lian, Guan Wang, Bleys Goodson, Eugene Pentland, Austin Cook, Chanvichet Vong, and "Teknium". Huggingface, 2023. [Data]
  4. Stanford Alpaca: An Instruction-following LLaMA model. Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. GitHub, 2023. [Blog] [Github] [HuggingFace]
  5. OpenChat: Advancing Open-source Language Models with Mixed-Quality Data. Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, and Yang Liu. ICLR, 2024. [Paper] [Code] [HuggingFace]
  6. Training language models to follow instructions with human feedback. Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. NeurIPS, 2022. [Paper]
  7. MobileBERT: "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices". Zhiqing Sun et al. ACL 2020. [Paper] [Github] [HuggingFace]
  8. Language models are unsupervised multitask learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. OpenAI Blog, 2019. [Paper]
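
As a quick reference for the preference-tuning entries above, DPO (item 1) fine-tunes the policy directly on preference pairs, without training a separate reward model:

```math
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
```

where y_w and y_l are the chosen and rejected responses, π_ref is the frozen SFT reference model, and β controls how far the tuned policy may drift from the reference.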

Data quality in KD

  1. TinyStories: "TinyStories: How Small Can Language Models Be and Still Speak Coherent English?". Ronen Eldan et al. 2023. [Paper] [HuggingFace]
  2. AS-ES: "AS-ES Learning: Towards Efficient CoT Learning in Small Models". Nuwa Xi et al. 2024. [Paper]
  3. Self-Amplify: "Self-AMPLIFY: Improving Small Language Models with Self Post Hoc Explanations". Milan Bhan et al. 2024. [Paper]
  4. Large Language Models Can Self-Improve. Jiaxin Huang, Shixiang Shane Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, and Jiawei Han. EMNLP 2023. [Paper]
  5. Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing. Ye Tian, Baolin Peng, Linfeng Song, Lifeng Jin, Dian Yu, Haitao Mi, and Dong Yu. NeurIPS 2024. [Paper] [Code]
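
A recurring recipe in the data-quality and self-improvement papers above is to keep only generated rationales whose final answer is verifiably correct. Below is a minimal sketch of that answer-consistency filter, assuming hypothetical `teacher_generate` and `extract_answer` helpers (not code from any cited repository):

```python
def filter_cot_samples(problems, teacher_generate, extract_answer, n_samples=4):
    """Keep only (question, rationale) pairs whose final answer matches the label.

    `problems` is an iterable of (question, gold_answer) pairs; `teacher_generate`
    samples a chain-of-thought from the teacher model; `extract_answer` parses the
    final answer out of a rationale. Both helpers are hypothetical placeholders.
    """
    kept = []
    for question, gold_answer in problems:
        for _ in range(n_samples):
            rationale = teacher_generate(question)        # sampled CoT from the teacher
            if extract_answer(rationale) == gold_answer:  # verify before keeping
                kept.append({"question": question, "rationale": rationale})
    return kept
```

The surviving pairs can then serve as supervised fine-tuning data for the student SLM.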

Distillation for SLM

  1. GKD: "On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes". Rishabh Agarwal et al. ICLR 2024. [Paper]
  2. DistiLLM: "DistiLLM: Towards Streamlined Distillation for Large Language Models". Jongwoo Ko et al. ICML 2024. [Paper] [Github]
  3. Adapt-and-Distill: "Adapt-and-Distill: Developing Small, Fast and Effective Pretrained Language Models for Domains". Yunzhi Yao et al. ACL 2021. [Paper] [Github]
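
The methods above differ mainly in where the distillation data comes from and which divergence is minimized; the common baseline they build on is a softened forward-KL loss between teacher and student token distributions. A minimal PyTorch sketch (the temperature and scaling are illustrative, not taken from any specific paper above):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Forward-KL distillation on a batch of [batch, vocab] logits."""
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean reduction plus T^2 scaling is the standard Hinton-style correction.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2
```

In practice this term is usually mixed with the ordinary next-token cross-entropy on ground-truth text.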

Quantization

  1. SmoothQuant: "SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models". Guangxuan Xiao et al. ICML 2023. [Paper] [Github]
  2. BiLLM: "BiLLM: Pushing the Limit of Post-Training Quantization for LLMs". Wei Huang et al. 2024. [Paper] [Github]
  3. LLM-QAT: "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models". Zechun Liu et al. 2023. [Paper]
  4. PB-LLM: "PB-LLM: Partially Binarized Large Language Models". Yuzhang Shang et al. 2024. [Paper] [Github]
  5. OneBit: "OneBit: Towards Extremely Low-bit Large Language Models". Yuzhuang Xu et al. NeurIPS 2024. [Paper]
  6. BitNet: "BitNet: Scaling 1-bit Transformers for Large Language Models". Hongyu Wang et al. 2023. [Paper]
  7. BitNet b1.58: "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". Shuming Ma et al. 2024. [Paper]
  8. SqueezeLLM: "SqueezeLLM: Dense-and-Sparse Quantization". Sehoon Kim et al. ICML 2024. [Paper] [Github]
  9. JSQ: "Compressing Large Language Models by Joint Sparsification and Quantization". Jinyang Guo et al. PMLR 2024. [Paper] [Github]
  10. FrameQuant: "FrameQuant: Flexible Low-Bit Quantization for Transformers". Harshavardhan Adepu et al. 2024. [Paper] [Github]
  11. LQER: "LQER: Low-Rank Quantization Error Reconstruction for LLMs". Cheng Zhang et al. ICML 2024. [Paper] [Github]
  12. I-LLM: "I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models". Xing Hu et al. 2024. [Paper] [Github]
  13. PV-Tuning: "PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression". Vladimir Malinovskii et al. 2024. [Paper]
  14. PEQA: "Memory-efficient fine-tuning of compressed large language models via sub-4-bit integer quantization". Jeonghoon Kim et al. NeurIPS 2023. [Paper]
  15. QLoRA: "QLoRA: Efficient Finetuning of Quantized LLMs". Tim Dettmers et al. NeurIPS 2023. [Paper] [Github]
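
As a baseline for the post-training methods above (which SmoothQuant, SqueezeLLM, and the 1-bit approaches push much further), here is a minimal sketch of naive per-channel symmetric INT8 round-to-nearest weight quantization, using NumPy only:

```python
import numpy as np

def quantize_int8(weight):
    """Quantize a [out_features, in_features] weight matrix per output channel."""
    scale = np.abs(weight).max(axis=1, keepdims=True) / 127.0  # one scale per row
    scale = np.where(scale == 0.0, 1.0, scale)                 # guard against all-zero rows
    q = np.clip(np.round(weight / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs reconstruction error:", np.abs(w - dequantize_int8(q, s)).max())
```

Quantization-aware approaches such as LLM-QAT, PEQA, and QLoRA instead keep a quantizer like this inside the training loop so the model can adapt to the rounding error.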

LLM techniques for SLMs

  1. Ma et al.: "Large Language Model Is Not a Good Few-shot Information Extractor, but a Good Reranker for Hard Samples!". Yubo Ma et al. EMNLP 2023. [Paper] [Github]
  2. MoQE: "Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness". Young Jin Kim et al. 2023. [Paper]
  3. SLM-RAG: "Can Small Language Models With Retrieval-Augmented Generation Replace Large Language Models When Learning Computer Science?". Suqing Liu et al. ITiCSE 2024. [Paper]

Task-specific SLM Applications

SLM in QA

  1. Alpaca: "Alpaca: A Strong, Replicable Instruction-Following Model". Rohan Taori et al. 2023. [Paper] [Github] [HuggingFace]
  2. Stable Beluga 7B: "Stable Beluga 2". Mahan et al. 2023. [HuggingFace]
  3. Fine-tuned BioGPT (Guo et al.): "Improving Small Language Models on PubMedQA via Generative Data Augmentation". Zhen Guo et al. 2023. [Paper]
  4. Financial SLMs: "Fine-tuning Smaller Language Models for Question Answering over Financial Documents". Karmvir Singh Phogat et al. 2024. [Paper]
  5. ColBERT: "ColBERT Retrieval and Ensemble Response Scoring for Language Model Question Answering". Alex Gichamba et al. IEEE 2024. [Paper]
  6. T-SAS: "Test-Time Self-Adaptive Small Language Models for Question Answering". Soyeong Jeong et al. ACL 2023. [Paper] [Github]
  7. Rationale Ranking: "Answering Unseen Questions With Smaller Language Models Using Rationale Generation and Dense Retrieval". Tim Hartill et al. 2023. [Paper]

SLM in Coding

  1. Phi-3.5-mini: "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone". Marah Abdin et al. 2024. [Paper] [HuggingFace]
  2. TinyLlama: "TinyLlama: An Open-Source Small Language Model". Peiyuan Zhang et al. 2024. [Paper] [HuggingFace]
  3. CodeLlama: "Code Llama: Open Foundation Models for Code". Baptiste Rozière et al. 2024. [Paper] [HuggingFace]
  4. CodeGemma: "CodeGemma: Open Code Models Based on Gemma". Heri Zhao et al. 2024. [Paper] [HuggingFace]

SLM in Recommendation

  1. PromptRec: "Could Small Language Models Serve as Recommenders? Towards Data-centric Cold-start Recommendations". Xuansheng Wu et al. 2024. [Paper] [Github]
  2. SLIM: "Can Small Language Models be Good Reasoners for Sequential Recommendation?". Yuling Wang et al. 2024. [Paper]
  3. BiLLP: "Large Language Models are Learnable Planners for Long-Term Recommendation". Wentao Shi et al. 2024. [Paper]
  4. ONCE: "ONCE: Boosting Content-based Recommendation with Both Open- and Closed-source Large Language Models". Qijiong Liu et al. 2023. [Paper] [Github] [HuggingFace]
  5. RecLoRA: "Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation". Jiachen Zhu et al. 2024. [Paper]

SLM in Web Search

  1. Content encoder: "Pre-training Tasks for Embedding-based Large-scale Retrieval". Wei-Cheng Chang et al. ICLR 2020. [Paper]
  2. Poly-encoders: "Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring". Samuel Humeau et al. ICLR 2020. [Paper]
  3. Twin-BERT: "TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval". Wenhao Lu et al. 2020. [Paper]
  4. H-ERNIE: "H-ERNIE: A Multi-Granularity Pre-Trained Language Model for Web Search". Xiaokai Chu et al. SIGIR 2022. [Paper]
  5. Ranker: "Passage Re-ranking with BERT". Rodrigo Nogueira et al. 2019. [Paper] [Github]
  6. Rewriter: "Query Rewriting for Retrieval-Augmented Large Language Models". Xinbei Ma et al. EMNLP 2023. [Paper] [Github]

SLM on Mobile Devices

  1. Octopus: "Octopus: On-device language model for function calling of software APIs". Wei Chen et al. 2024. [Paper]
  2. MobileAgent: "Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration". Junyang Wang et al. 2024. [Paper] [Github] [HuggingFace]
  3. Revolutionizing Mobile Interaction: "Revolutionizing Mobile Interaction: Enabling a 3 Billion Parameter GPT LLM on Mobile". Samuel Carreira et al. 2023. [Paper]
  4. AutoDroid: "AutoDroid: LLM-powered Task Automation in Android". Hao Wen et al. 2023. [Paper]
  5. On-device Agent for Text Rewriting: "Towards an On-device Agent for Text Rewriting". Yun Zhu et al. 2023. [Paper]

On-device Deployment Optimization Techniques

Memory Efficiency Optimization

  1. EDGE-LLM: "EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting". Zhongzhi Yu et al. 2024. [Paper] [Github]
  2. LLM-PQ: "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization". Juntao Zhao et al. 2024. [Paper] [Github]
  3. AWQ: "AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration". Ji Lin et al. MLSys 2024. [Paper] [Github]
  4. MobileAIBench: "MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases". Rithesh Murthy et al. 2024. [Paper] [Github]
  5. MobileLLM: "MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases". Zechun Liu et al. ICML 2024. [Paper] [Github] [HuggingFace]
  6. EdgeMoE: "EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models". Rongjie Yi et al. 2023. [Paper] [Github]
  7. GEAR: "GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM". Hao Kang et al. 2024. [Paper] [Github]
  8. DMC: "Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference". Piotr Nawrot et al. 2024. [Paper]
  9. Transformer-Lite: "Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs". Luchang Li et al. 2024. [Paper]
  10. LLMaaS: "LLM as a System Service on Mobile Devices". Wangsong Yin et al. 2024. [Paper]

Runtime Efficiency Optimization

  1. EdgeMoE: "EdgeMoE: Fast On-Device Inference of MoE-based Large Language Models". Rongjie Yi et al. 2023. [Paper] [Github]
  2. LLMCad: "LLMCad: Fast and Scalable On-device Large Language Model Inference". Daliang Xu et al. 2023. [Paper]
  3. LinguaLinked: "LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices". Junchen Zhao et al. 2023. [Paper]

SLMs enhance LLMs

SLMs for LLMs Calibration

  1. Calibrating Large Language Models Using Their Generations Only. Dennis Ulmer, Martin Gubri, Hwaran Lee, Sangdoo Yun, Seong Joon Oh. ACL 2024 Long. [pdf] [code]
  2. Pareto Optimal Learning for Estimating Large Language Model Errors. Theodore Zhao, Mu Wei, J. Samuel Preston, Hoifung Poon. ACL 2024 Long. [pdf]
  3. The Internal State of an LLM Knows When It’s Lying. Amos Azaria, Tom Mitchell. EMNLP 2023 Findings. [pdf]

SLMs for LLMs RAG

  1. Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs. Jiejun Tan, Zhicheng Dou, Yutao Zhu, Peidong Guo, Kun Fang, Ji-Rong Wen. ACL 2024 Long. [pdf] [code] [huggingface]
  2. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, Hannaneh Hajishirzi. ICLR 2024 Oral. [pdf] [huggingface] [code] [website] [model] [data]
  3. LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression. Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu. ICLR 2024 Workshop ME-FoMo Poster. [pdf]
  4. Corrective Retrieval Augmented Generation. Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling. arXiv 2024.1. [pdf] [code]
  5. Self-Knowledge Guided Retrieval Augmentation for Large Language Models. Yile Wang, Peng Li, Maosong Sun, Yang Liu. EMNLP 2023 Findings. [pdf] [code]
  6. In-Context Retrieval-Augmented Language Models. Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, Yoav Shoham. TACL 2023. [pdf] [code]
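
Several of the entries above (e.g., the slim-proxy and self-knowledge papers) share one mechanism: a small model first judges whether a query actually needs retrieval before the expensive LLM call. A minimal sketch of that gating idea, where `slm`, `llm`, and `retriever` are hypothetical interfaces rather than the cited implementations:

```python
def answer_with_adaptive_retrieval(query, slm, llm, retriever, ppl_threshold=20.0):
    """Use an SLM draft to decide whether the LLM call should be retrieval-augmented."""
    draft = slm.generate(query)                       # cheap draft answer from the proxy SLM
    if slm.perplexity(query, draft) > ppl_threshold:  # proxy is unsure -> ground the LLM
        docs = retriever.search(query, k=5)
        context = "\n\n".join(d.text for d in docs)
        return llm.generate(context + "\n\nQuestion: " + query)
    return llm.generate(query)                        # proxy is confident -> answer directly
```

The confidence signal and threshold differ across the papers above (perplexity, reflection tokens, learned critics), but the control flow is the same.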

Star History

Star History Chart