Mixture of Experts (MoE)
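For context on what the systems below optimize, here is a minimal PyTorch sketch of sparse top-k expert routing, the core MoE mechanism; the class name `TopKMoE`, layer sizes, expert count, and `top_k` value are illustrative and not taken from any listed paper.

```python
# Minimal sketch of sparse top-k MoE routing (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=16, d_ff=32, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)  # router producing per-expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)          # routing probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)    # keep top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_ids, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if token_ids.numel():
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
        return out

moe = TopKMoE()
print(moe(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```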

MoE Training

  • Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models (SIGCOMM 2023) [Paper]
    • THU & ByteDance
  • Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]
    • CityU & ByteDance & CUHK
  • SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization (ATC 2023) [Paper] [Code]
    • THU

MoE Inference

  • Accelerating Distributed MoE Training and Inference with Lina (ATC 2023) [Paper]
    • CityU & ByteDance & CUHK
  • Optimizing Dynamic Neural Networks with Brainstorm (OSDI 2023) [Paper]
    • SJTU & MSRA & USTC

Models