- Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models (SIGCOMM 2023) [Paper]
- THU & ByteDance
- Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]
- CityU & ByteDance & CUHK
- SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization (ATC 2023) [Paper] [Code]
- THU
- Optimizing Dynamic Neural Networks with Brainstorm (OSDI 2023) [Paper]
- SJTU & MSRA & USTC
- Mixtral-8x7B [Hugging Face] [Blog]
- Mistral AI