https://arxiv.org/abs/2104.04473

Efficient Large-Scale Language Model Training on GPU Clusters (Deepak Narayanan, Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei Zaharia)

Scaling results on A100 clusters are starting to appear. They report 502 petaFLOP/s on 3072 GPUs for a 1-trillion-parameter model. Megatron-LM looks worth analyzing.
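As a rough sanity check on those headline numbers, here is a back-of-the-envelope sketch of the implied per-GPU throughput. The A100 peak figure (~312 TFLOP/s for BF16/FP16 tensor cores) is an assumption taken from NVIDIA's spec sheet, not from this note.

```python
# Back-of-the-envelope: per-GPU throughput implied by the reported numbers.
aggregate_flops = 502e15   # 502 petaFLOP/s aggregate, as reported for the 1T-parameter run
num_gpus = 3072
a100_peak_flops = 312e12   # assumed A100 BF16/FP16 tensor-core peak, not stated in this note

per_gpu = aggregate_flops / num_gpus        # ~163 TFLOP/s per GPU
utilization = per_gpu / a100_peak_flops     # ~0.52, i.e. roughly half of the assumed peak

print(f"{per_gpu / 1e12:.0f} TFLOP/s per GPU, {utilization:.0%} of assumed peak")
```

Sustaining roughly half of peak at this scale is what makes the parallelization scheme worth a closer look.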

#transformer #distributed_training