
230123 Zorro.md


https://arxiv.org/abs/2301.09595

Zorro: the masked multimodal transformer (Adrià Recasens, Jason Lin, João Carreira, Drew Jaegle, Luyu Wang, Jean-Baptiste Alayrac, Pauline Luc, Antoine Miech, Lucas Smaira, Ross Hemsley, Andrew Zisserman)

A video-audio multimodal model. The key point is that it uses masked attention to produce video-only, audio-only, and fused representations.
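To make the masking idea concrete, here is a minimal NumPy sketch, not the paper's implementation. It assumes tokens are laid out as [video | audio | fusion], that video and audio tokens may attend only within their own modality, and that fusion tokens may attend to everything. The function names, token counts, and the single-head attention without learned projections are all illustrative choices of this note.

```python
import numpy as np


def build_modality_mask(n_video, n_audio, n_fusion):
    """Boolean mask where mask[i, j] is True if query i may attend to key j.

    Assumed token order (hypothetical): video, then audio, then fusion.
    """
    n = n_video + n_audio + n_fusion
    mask = np.zeros((n, n), dtype=bool)
    v = slice(0, n_video)
    a = slice(n_video, n_video + n_audio)
    f = slice(n_video + n_audio, n)
    mask[v, v] = True  # video tokens see only video
    mask[a, a] = True  # audio tokens see only audio
    mask[f, :] = True  # fusion tokens see every token
    return mask


def masked_self_attention(x, mask):
    """Single-head self-attention without learned projections (simplification)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores = np.where(mask, scores, -np.inf)  # blocked pairs get zero weight
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x


rng = np.random.default_rng(0)
n_video, n_audio, n_fusion, dim = 4, 3, 2, 8
x = rng.normal(size=(n_video + n_audio + n_fusion, dim))
mask = build_modality_mask(n_video, n_audio, n_fusion)
out = masked_self_attention(x, mask)

# Perturb only the audio tokens and recompute.
x_perturbed = x.copy()
x_perturbed[n_video:n_video + n_audio] += 1.0
out_perturbed = masked_self_attention(x_perturbed, mask)

# The video rows are unaffected by the audio change; the fusion rows are not.
video_unchanged = np.allclose(out[:n_video], out_perturbed[:n_video])
fusion_changed = not np.allclose(out[-n_fusion:], out_perturbed[-n_fusion:])
```

Because the video rows of the mask are zero over the audio and fusion columns, the video output stays a video-only representation even though all tokens pass through the same attention layer, while the fusion rows mix both modalities.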

#multimodal #video #audio