Transcending Scaling Laws with 0.1% Extra Compute (Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani)

Oh... this is a remarkable result. After pretraining with a regular LM objective, briefly fine-tuning with an objective that combines prefix LM, span denoising, and similar tasks (https://arxiv.org/abs/2205.05131) can cut training FLOPs in half.
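
To make the two named objectives concrete, here is a minimal sketch of how a training example could be built for each. It is not the paper's implementation: the function names, the T5-style `<extra_id_N>` sentinels, the fixed span length of 2, and the 50/50 mixing ratio are all assumptions chosen for illustration.

```python
import random


def prefix_lm_example(tokens, split):
    """Prefix LM: the model reads tokens[:split] and predicts tokens[split:]."""
    return tokens[:split], tokens[split:]


def span_denoising_example(tokens, spans):
    """Span denoising: hide each (start, length) span behind a sentinel.

    `spans` must be sorted and non-overlapping. Returns (inputs, targets),
    where targets list each sentinel followed by the tokens it replaced.
    Sentinel naming follows the T5 convention (an assumption here).
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])   # keep the visible text
        inputs.append(sentinel)               # mark the hidden span
        targets.append(sentinel)
        targets.extend(tokens[start:start + length])
        cursor = start + length
    inputs.extend(tokens[cursor:])
    return inputs, targets


def mixed_objective_example(tokens, rng):
    """Pick one objective per example (hypothetical 50/50 mix)."""
    if rng.random() < 0.5:
        split = rng.randint(1, len(tokens) - 1)
        return ("prefix_lm",) + prefix_lm_example(tokens, split)
    start = rng.randint(0, len(tokens) - 2)
    return ("span_denoising",) + span_denoising_example(tokens, [(start, 2)])


tokens = "the quick brown fox jumps over the lazy dog".split()
kind, inputs, targets = mixed_objective_example(tokens, random.Random(0))
```

The point of the sketch is only that both objectives reuse the same raw text and differ in which part is hidden from the model, which is why they can be applied as a short extra training stage on top of an already pretrained LM.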

#llm #mlm

221020 Transcending Scaling Laws with 0.1% Extra Compute.md
