https://arxiv.org/abs/2104.13840
Twins: Revisiting Spatial Attention Design in Vision Transformers (Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen)
Local attention + window-level global attention + positional encoding generator. The positional encoding generator is, once again, a pretty interesting approach.
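To make the positional encoding generator idea concrete, here is a minimal sketch in plain Python. It assumes the generator is a depthwise 3x3 convolution with zero padding, applied to the tokens reshaped into a 2D grid and added back as a residual. The function name `peg`, its arguments, and the all-ones kernel are hypothetical choices for illustration, not the paper's code.

```python
def peg(tokens, height, width, kernels):
    """Sketch of a positional encoding generator (assumed form, see lead-in).

    tokens:  list of height*width token vectors (lists of C floats), row-major.
    kernels: list of C kernels, each a 3x3 nested list (one per channel,
             i.e. a depthwise convolution).
    Returns tokens + depthwise_conv3x3(tokens), using zero padding at borders.
    """
    channels = len(tokens[0])
    assert len(tokens) == height * width
    assert len(kernels) == channels

    out = []
    for y in range(height):
        for x in range(width):
            token = tokens[y * width + x]
            new_token = []
            for c in range(channels):
                acc = 0.0
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        # Neighbors outside the grid count as zeros (zero padding).
                        if 0 <= ny < height and 0 <= nx < width:
                            weight = kernels[c][dy + 1][dx + 1]
                            acc += weight * tokens[ny * width + nx][c]
                # Residual connection: original token plus the conv output.
                new_token.append(token[c] + acc)
            out.append(new_token)
    return out


# Nine identical single-channel tokens on a 3x3 grid, with an all-ones kernel
# (hypothetical values chosen so the arithmetic is easy to follow).
tokens = [[1.0] for _ in range(9)]
kernels = [[[1.0, 1.0, 1.0] for _ in range(3)]]
encoded = peg(tokens, 3, 3, kernels)
# Corners see 4 in-bounds neighbors, edges 6, the center 9, so identical
# inputs become 5.0, 7.0 and 10.0 depending on where they sit.
```

The point of the toy example is that the zero padding alone lets identical tokens pick up position-dependent values near the border, and the encoding is computed from the input rather than stored in a fixed-size table.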
#vision_transformer #local_attention #positional_encoding