- Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models (SIGCOMM 2023) [Paper]
- THU & ByteDance
- Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]
- CityU & ByteDance & CUHK
- SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization (ATC 2023) [Paper] [Code]
- THU
- Optimizing Dynamic Neural Networks with Brainstorm (OSDI 2023) [Paper]
- SJTU & MSRA & USTC
- Mixtral-8x7B [Hugging Face] [Blog]
- Mistral AI