211026 Hierarchical Transformers Are More Efficient Language Models.md

File metadata and controls

7 lines (4 loc) · 326 Bytes

https://arxiv.org/abs/2110.13711

Hierarchical Transformers Are More Efficient Language Models (Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski)

A U-Net for sequence models. I'm curious what the actual latency looks like, though.
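The U-Net analogy can be illustrated with a minimal sketch of the hourglass-style shorten/process/lengthen pattern. Everything here is an illustrative assumption, not the paper's implementation: mean pooling as the downsampler, token repetition as the upsampler, an identity stand-in for the inner transformer stack, and numpy in place of a real framework.

```python
import numpy as np

def downsample(x, k):
    # Mean-pool every k consecutive tokens: (L, d) -> (L // k, d).
    # Assumes L is divisible by k for simplicity.
    L, d = x.shape
    return x.reshape(L // k, k, d).mean(axis=1)

def upsample(x, k):
    # Repeat each pooled token k times: (L // k, d) -> (L, d).
    return np.repeat(x, k, axis=0)

def hourglass_block(x, k, inner_fn):
    # Shorten the sequence, run the expensive inner function on the
    # short sequence, lengthen it back, and add a residual connection.
    short = inner_fn(downsample(x, k))
    return x + upsample(short, k)

x = np.random.randn(16, 8)  # 16 tokens, hidden dim 8
y = hourglass_block(x, k=4, inner_fn=lambda h: h)  # shape preserved: (16, 8)
```

The efficiency argument: self-attention on the shortened sequence costs O((L/k)^2) instead of O(L^2), so most layers run at a fraction of the cost while the residual path keeps full-resolution information.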

#transformer #lm #efficient_attention