https://arxiv.org/abs/2205.10770

Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models (Kushal Tirumala, Aram H. Markosyan, Luke Zettlemoyer, Armen Aghajanyan)

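The finding is that larger models memorize more data, and memorize it faster, before overfitting sets in, and that they also forget less. So memorizing the dataset is not bad in itself, but the model should not be pushed to memorize too much... that is the impression I get.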

#llm

220522 Memorization Without Overfitting.md