https://arxiv.org/abs/2104.01136
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference (Ben Graham, Alaaeldin El-Nouby, Hugo Touvron, Pierre Stock, Armand Joulin, Hervé Jégou, Matthijs Douze)
Yet another vision transformer + downsampling paper! Also notable for using batch norm rather than the layer norm typical of transformers.
#vision_transformer