https://arxiv.org/abs/2209.10655

Mega: Moving Average Equipped Gated Attention (Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, Luke Zettlemoyer)

각 타임 스텝의 임베딩들의 EMA를 계산하고 그 위에 attention과 gate를 얹은 형태네요. 추가적으로 local attention을 사용해 linear complexity를 갖는 variant도 만들었습니다. 전반적으로 gated rnn와 s4를 기반으로 삼아 만들었다는 느낌입니다.

이런 형태의 시도들이 어디까지 도달할 수 있을지 궁금하네요.

#efficient_attention

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

220921 Mega.md

220921 Mega.md

Files

220921 Mega.md

Latest commit

History

220921 Mega.md

File metadata and controls