https://arxiv.org/abs/2104.12763

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding (Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve, Nicolas Carion)

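This came out a few days ago, so most of you have probably seen it already. Like earlier visual grounding work, it is a model that finds the objects matching a text input. Its ability to pick out a pink elephant is quite fun. It looks like vision-language multimodal tasks are finally starting to take off.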

#detr #object_detection #visual_grounding

210426 MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding.md
