https://arxiv.org/abs/2012.12877

Training data-efficient image transformers & distillation through attention (Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou)

Trains an image transformer on ImageNet instead of JFT-300M. Does knowledge distillation by adding an extra token dedicated to distillation. On top of that, carefully tuned augmentation and regularization settings are required. The cost-effectiveness seems much improved.
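A minimal numpy sketch of the distillation-token idea, under my reading of the paper: alongside the usual class token, a second learnable token is prepended to the patch sequence, and its output head is supervised by the teacher (hard-label variant shown). All sizes and the identity "encoder" here are toy stand-ins, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim, num_classes = 196, 192, 10  # toy sizes, not the paper's

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cross_entropy(logits, label):
    return -np.log(softmax(logits)[label])

# Patch embeddings plus the two learnable tokens
patch_tokens = rng.normal(size=(num_patches, dim))
cls_token = rng.normal(size=(1, dim))
dist_token = rng.normal(size=(1, dim))  # the extra distillation token

# Sequence fed to the transformer encoder: [CLS], [DIST], then the patches
tokens = np.concatenate([cls_token, dist_token, patch_tokens], axis=0)

encoded = tokens  # stand-in for the transformer encoder (identity for brevity)
head = rng.normal(size=(dim, num_classes))
cls_logits = encoded[0] @ head    # read out of the class token
dist_logits = encoded[1] @ head   # read out of the distillation token

y_true = 3     # ground-truth label
y_teacher = 7  # hard label predicted by the teacher (e.g. a convnet)

# Class head matches the true label; distillation head matches the teacher
loss = 0.5 * cross_entropy(cls_logits, y_true) \
     + 0.5 * cross_entropy(dist_logits, y_teacher)

# At test time the softmax outputs of the two heads are averaged
pred = 0.5 * (softmax(cls_logits) + softmax(dist_logits))
```

The point of the separate token is that the two heads can disagree: the class head follows the ground-truth labels while the distillation head follows the (convnet) teacher, and attention lets both interact with the patch tokens.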

#vision_transformer #distillation