220822 Image as a Foreign Language.md

https://arxiv.org/abs/2208.10442

Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks (Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei)

So they were preparing not only BEiT v2 but also v3. This time it is a vision-language model. They share one model across modalities and split only the FFN into per-modality experts (vision, language, vision-language) for training.
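To make the "shared model, separate FFN experts" idea concrete, here is a minimal pure-Python sketch of one such block. It is not the authors' implementation. The class name `MultiwayBlock`, the per-token routing by an expert label, the single attention head, the ReLU activation, the omitted layer normalization, and the random initialization are all assumptions chosen for illustration. The only part taken from the note is that self-attention is shared while the FFN is one of three experts (vision, language, vision-language).

```python
import math
import random


def linear(x, w):
    """Multiply a vector x (length d_in) by a matrix w (d_in x d_out)."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_matrix(rng, rows, cols):
    """Small random weights; the scale 0.1 is an arbitrary illustrative choice."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiwayBlock:
    """Hypothetical block: one shared self-attention, three FFN experts."""

    EXPERTS = ("vision", "language", "vision-language")

    def __init__(self, dim, hidden, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        # Shared across all modalities.
        self.wq, self.wk, self.wv = (init_matrix(rng, dim, dim) for _ in range(3))
        # One (w1, w2) pair per expert; only these differ by modality.
        self.ffn = {
            name: (init_matrix(rng, dim, hidden), init_matrix(rng, hidden, dim))
            for name in self.EXPERTS
        }

    def attention(self, tokens):
        """Single-head self-attention over all tokens, regardless of modality."""
        q = [linear(t, self.wq) for t in tokens]
        k = [linear(t, self.wk) for t in tokens]
        v = [linear(t, self.wv) for t in tokens]
        out = []
        for qi in q:
            scores = [
                sum(a * b for a, b in zip(qi, kj)) / math.sqrt(self.dim) for kj in k
            ]
            weights = softmax(scores)
            out.append(
                [sum(w * vj[d] for w, vj in zip(weights, v)) for d in range(self.dim)]
            )
        return out

    def feed_forward(self, token, expert):
        """Apply the FFN that belongs to the named expert."""
        w1, w2 = self.ffn[expert]
        hidden = [max(0.0, h) for h in linear(token, w1)]  # ReLU (assumed)
        return linear(hidden, w2)

    def forward(self, tokens, experts):
        """tokens: list of vectors; experts: one expert name per token."""
        attended = [
            [x + a for x, a in zip(tok, att)]
            for tok, att in zip(tokens, self.attention(tokens))
        ]
        return [
            [x + f for x, f in zip(tok, self.feed_forward(tok, name))]
            for tok, name in zip(attended, experts)
        ]


# Usage: two image-patch tokens and one text token pass through the same
# attention, then each is routed to its own expert.
block = MultiwayBlock(dim=4, hidden=8)
tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2], [-0.3, 0.4, 0.1, 0.0]]
outputs = block.forward(tokens, ["vision", "vision", "language"])
```

The attention weights see every token, so information mixes across modalities there, while the expert label only decides which FFN weights transform each token afterwards.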

And here comes yet another one of these radar charts.

#vision-language #mlm