220803 Masked Vision and Language Modeling for Multi-modal Representation Learning.md

https://arxiv.org/abs/2208.02131

Masked Vision and Language Modeling for Multi-modal Representation Learning (Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto)

Masked vision-language modeling. The idea is that masked image reconstruction takes a masked image with unmasked text, masked language modeling takes an unmasked image with masked text, and the model is trained on the two objectives jointly.
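
To make the input pairing concrete, here is a minimal sketch of how the two kinds of training examples might be built. The names `mask_items`, `build_training_pairs`, and `joint_loss`, the `[MASK]` placeholder, the masking ratios, and the equal loss weights are all assumptions for illustration and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder for a masked patch or token


def mask_items(items, ratio, rng):
    """Replace a random subset of items with MASK.

    Returns the masked copy and the sorted indices that were masked.
    """
    n_mask = max(1, round(len(items) * ratio))
    indices = sorted(rng.sample(range(len(items)), n_mask))
    chosen = set(indices)
    masked = [MASK if i in chosen else item for i, item in enumerate(items)]
    return masked, indices


def build_training_pairs(patches, tokens, image_ratio=0.6, text_ratio=0.3, seed=0):
    """Build the two inputs described in the note (ratios are assumed values)."""
    rng = random.Random(seed)
    masked_patches, patch_targets = mask_items(patches, image_ratio, rng)
    masked_tokens, token_targets = mask_items(tokens, text_ratio, rng)
    return {
        # Masked image reconstruction: masked image + unmasked text.
        "image_reconstruction": {
            "image": masked_patches,
            "text": list(tokens),
            "targets": patch_targets,
        },
        # Masked language modeling: unmasked image + masked text.
        "language_modeling": {
            "image": list(patches),
            "text": masked_tokens,
            "targets": token_targets,
        },
    }


def joint_loss(image_loss, text_loss, image_weight=1.0, text_weight=1.0):
    """Combine the two objectives; a plain weighted sum is assumed here."""
    return image_weight * image_loss + text_weight * text_loss
```

In this sketch each modality is masked only in the example where it is the reconstruction target, so the other modality is always available in full as context.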

#mlm #self_supervised