211220 Efficient Large Scale Language Modeling with Mixtures of Experts.md

https://arxiv.org/abs/2112.10684

Efficient Large Scale Language Modeling with Mixtures of Experts (Mikel Artetxe, Shruti Bhosale, Naman Goyal, Todor Mihaylov, Myle Ott, Sam Shleifer, Xi Victoria Lin, Jingfei Du, Srinivasan Iyer, Ramakanth Pasunuru, Giri Anantharaman, Xian Li, Shuohui Chen, Halil Akin, Mandeep Baines, Louis Martin, Xing Zhou, Punit Singh Koura, Brian O'Horo, Jeff Wang, Luke Zettlemoyer, Mona Diab, Zornitsa Kozareva, Ves Stoyanov)

The next LLM should be an MoE one.

#lm #mixture_of_experts