221020 Transcending Scaling Laws with 0.1% Extra Compute.md

https://t.co/WxIDAyzLC4

Transcending Scaling Laws with 0.1% Extra Compute (Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani)

Oh... this is a remarkable result. After pretraining with a regular LM objective, briefly fine-tuning with an objective that combines prefix LM, span denoising, and related tasks (https://arxiv.org/abs/2205.05131) is reported to cut training FLOPs in half.
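
To make the two objectives concrete, here is a minimal sketch of how a prefix LM example and a span denoising example might be built from one token sequence. It is not the paper's implementation: the function names, the `<X0>`-style sentinel strings, the fixed span length, and the 50/50 mixing rate are all assumptions chosen for illustration.

```python
import random


def prefix_lm_example(tokens, split):
    """Prefix LM: the model reads tokens[:split] and predicts the rest."""
    return tokens[:split], tokens[split:]


def span_denoising_example(tokens, spans):
    """Span denoising: replace each (start, length) span with a sentinel.

    `spans` must be sorted and non-overlapping. Sentinel names such as
    "<X0>" are hypothetical placeholders, not a real vocabulary.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<X{i}>"
        inputs.extend(tokens[cursor:start])   # keep text before the span
        inputs.append(sentinel)               # mask the span in the input
        targets.append(sentinel)              # target: sentinel + masked text
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])            # keep text after the last span
    return inputs, targets


def mixed_example(tokens, rng):
    """Pick one objective at random (the 50/50 rate is an assumption)."""
    if rng.random() < 0.5:
        split = rng.randint(1, len(tokens) - 1)
        return "prefix_lm", prefix_lm_example(tokens, split)
    start = rng.randint(0, len(tokens) - 2)   # span of length 2 always fits
    return "span_denoising", span_denoising_example(tokens, [(start, 2)])


if __name__ == "__main__":
    toks = "the quick brown fox jumps over the lazy dog".split()
    print(prefix_lm_example(toks, 4))
    print(span_denoising_example(toks, [(1, 2), (6, 1)]))
    print(mixed_example(toks, random.Random(0)))
```

Both objectives turn the same sequence into an (input, target) pair, so a model pretrained with a regular LM objective can continue training on a mixture of them without a change of data format.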

#llm #mlm