https://arxiv.org/abs/2104.04473

Efficient Large-Scale Language Model Training on GPU Clusters (Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia)

Scaling results on A100 clusters are starting to appear. They report 502 petaFLOP/s on 3072 GPUs for a 1-trillion-parameter model. Megatron-LM looks worth analyzing.
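As a rough sanity check on those headline numbers, here is a back-of-the-envelope sketch of the implied per-GPU throughput. The A100 peak figure (~312 TFLOP/s for BF16/FP16 tensor cores) is an assumption taken from NVIDIA's spec sheet, not from this note.

```python
# Back-of-the-envelope: per-GPU throughput implied by the reported numbers.
aggregate_flops = 502e15   # 502 petaFLOP/s aggregate, as reported for the 1T-parameter run
num_gpus = 3072
a100_peak_flops = 312e12   # assumed A100 BF16/FP16 tensor-core peak, not stated in this note

per_gpu = aggregate_flops / num_gpus        # ~163 TFLOP/s per GPU
utilization = per_gpu / a100_peak_flops     # ~0.52, i.e. roughly half of the assumed peak

print(f"{per_gpu / 1e12:.0f} TFLOP/s per GPU, {utilization:.0%} of assumed peak")
```

Sustaining roughly half of peak at this scale is what makes the parallelization scheme worth a closer look.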

#transformer #distributed_training