https://arxiv.org/abs/2110.09456
NormFormer: Improved Transformer Pretraining with Extra Normalization (Sam Shleifer, Jason Weston, Myle Ott)
A Transformer with extra normalization and scaling factors inserted. Notably, they already tested the ReLU^2 activation from [[210917 Primer]].
#transformer
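A minimal NumPy sketch of the "extra normalization" ideas described above: a Pre-LN FFN sublayer with an additional LayerNorm after the ReLU^2 activation, plus a per-head scaling of attention head outputs. Function names (`normformer_ffn`, `head_scale`) are my own, learnable LayerNorm gain/bias are omitted for brevity, and this is an illustration of the technique rather than the paper's reference implementation.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the last dimension (learnable gain/bias omitted).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def relu_squared(x):
    # Squared ReLU activation from the Primer paper.
    return np.maximum(x, 0.0) ** 2

def head_scale(attn_heads, gamma):
    # attn_heads: (..., n_heads, head_dim); gamma: (n_heads,) learnable
    # per-head scalars applied before the attention output projection.
    return attn_heads * gamma[..., :, None]

def normformer_ffn(x, w1, w2, eps=1e-5):
    # Pre-LN FFN residual block with NormFormer's extra LayerNorm:
    # LN -> W1 -> ReLU^2 -> LN -> W2, then add back the residual.
    h = layer_norm(x, eps)
    h = relu_squared(h @ w1)
    h = layer_norm(h, eps)  # extra normalization after the activation
    return x + h @ w2
```

With `gamma` initialized to ones, `head_scale` starts as a no-op, so training begins from a vanilla Pre-LN Transformer and learns to rescale heads from there.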