https://arxiv.org/abs/2110.13711
Hierarchical Transformers Are More Efficient Language Models (Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski)
A U-Net for sequence models. I'm curious what the real-world latency looks like, though.
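A minimal sketch of the U-Net-style shortening idea, under assumptions: the helper names are hypothetical, and plain mean-pooling plus repeat-upsampling stand in for whatever learned shortening/upsampling the paper actually uses.

```python
import numpy as np

def shorten(x, k):
    # Mean-pool groups of k adjacent tokens: (T, d) -> (T//k, d).
    T, d = x.shape
    return x.reshape(T // k, k, d).mean(axis=1)

def upsample(x, k):
    # Repeat each pooled token k times back to full length: (T//k, d) -> (T, d).
    return np.repeat(x, k, axis=0)

x = np.random.randn(16, 8)        # 16 tokens, dim 8
h = shorten(x, 4)                 # inner (expensive) layers run on 4 tokens
y = x + upsample(h, 4)            # residual merge back at full resolution
print(x.shape, h.shape, y.shape)  # (16, 8) (4, 8) (16, 8)
```

The efficiency argument is just that attention on the shortened sequence costs (T/k)^2 instead of T^2; the residual skip is what makes it U-Net-like.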
#transformer #lm #efficient_attention