https://arxiv.org/abs/2104.03135

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning (Zhicheng Huang, Zhaoyang Zeng, Yupan Huang, Bei Liu, Dongmei Fu, Jianlong Fu)

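Training a vision-language model without region features. They tackle it by quantizing the visual features. In an era when transformers are going around knocking over one field after another, things like bounding boxes just aren't that cool anymore.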
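To make "quantizing visual features" concrete, here is a minimal sketch that assumes quantization means replacing each grid feature with its nearest entry in a small codebook. The function names, the toy 2-D vectors, and the fixed `dictionary` are all hypothetical illustrations, not the paper's code, and how the codebook is learned is left out.

```python
# Hypothetical illustration of feature quantization by nearest-neighbour lookup.
# All names and values here are assumptions for the sketch, not from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(features, dictionary):
    """Map each feature to the index and value of its nearest dictionary entry."""
    indices = [
        min(range(len(dictionary)), key=lambda k: squared_distance(f, dictionary[k]))
        for f in features
    ]
    quantized = [dictionary[i] for i in indices]
    return indices, quantized

# Toy codebook with three entries and four "grid features" from an image.
dictionary = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
features = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.3], [0.6, 0.6]]

indices, quantized = quantize(features, dictionary)
print(indices)
```

The resulting indices act like discrete visual tokens, which is roughly what lets grid features stand in for detector-produced regions in this kind of setup.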

#multimodal #transformer #vision-language
