https://arxiv.org/abs/2208.02131

Masked Vision and Language Modeling for Multi-modal Representation Learning (Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto)

Masked vision-language modeling. The idea is to combine two objectives and train them jointly: masked image reconstruction, which reconstructs a masked image conditioned on the unmasked text, and masked language modeling, which predicts masked text conditioned on the unmasked image.
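A minimal plain-Python sketch of how the two training views could be built from one image-text pair. This is not the paper's implementation. It rests on several assumptions that the note does not state: image patches and text tokens are plain lists of strings, a masked position is replaced by a hypothetical `[MASK]` placeholder, the mask ratios are arbitrary example values, and the joint objective is a simple weighted sum (`joint_loss` and `mlm_weight` are hypothetical names).

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical placeholder for a masked image patch or text token (assumption).
MASK = "[MASK]"


def random_mask(n: int, ratio: float, rng: random.Random) -> List[bool]:
    """Mark round(n * ratio) of n positions, chosen uniformly at random, as masked."""
    k = round(n * ratio)
    chosen = set(rng.sample(range(n), k))
    return [i in chosen for i in range(n)]


def apply_mask(seq: Sequence[str], mask: Sequence[bool]) -> List[str]:
    """Replace every masked position with the MASK placeholder."""
    return [MASK if m else x for x, m in zip(seq, mask)]


def masked_targets(seq: Sequence[str], mask: Sequence[bool]) -> List[Tuple[int, str]]:
    """Return (position, original value) pairs the model has to reconstruct."""
    return [(i, x) for i, (x, m) in enumerate(zip(seq, mask)) if m]


def build_views(
    patches: Sequence[str],
    tokens: Sequence[str],
    image_ratio: float,
    text_ratio: float,
    rng: random.Random,
):
    """Build one view per objective from a single image-text pair."""
    img_mask = random_mask(len(patches), image_ratio, rng)
    txt_mask = random_mask(len(tokens), text_ratio, rng)
    # Masked image reconstruction: masked image + unmasked text.
    mim_view = (apply_mask(patches, img_mask), list(tokens))
    mim_targets = masked_targets(patches, img_mask)
    # Masked language modeling: unmasked image + masked text.
    mlm_view = (list(patches), apply_mask(tokens, txt_mask))
    mlm_targets = masked_targets(tokens, txt_mask)
    return (mim_view, mim_targets), (mlm_view, mlm_targets)


def joint_loss(mim_loss: float, mlm_loss: float, mlm_weight: float = 1.0) -> float:
    """Combine both objectives; the weighted sum itself is an assumption."""
    return mim_loss + mlm_weight * mlm_loss
```

For example, with `rng = random.Random(0)`, eight patches and `image_ratio=0.5`, exactly four patches are replaced in the image-reconstruction view while the text in that view stays intact; the language-modeling view keeps all eight patches and masks only text tokens.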

#mlm #self_supervised

220803 Masked Vision and Language Modeling for Multi-modal Representation Learning.md

220803 Masked Vision and Language Modeling for Multi-modal Representation Learning.md

Files

220803 Masked Vision and Language Modeling for Multi-modal Representation Learning.md

Latest commit

History

220803 Masked Vision and Language Modeling for Multi-modal Representation Learning.md

File metadata and controls