https://arxiv.org/abs/2104.08541
TransVG: End-to-End Visual Grounding with Transformers (Jiajun Deng, Zhengyuan Yang, Tianlang Chen, Wengang Zhou, Houqiang Li)
Tackles visual grounding. A modern approach that combines a vision transformer, a language transformer, and a vision-language transformer.
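
Rough sketch of how the three pieces might fit together, not the paper's implementation. Assumptions: the vision and language transformers are replaced by random token lists, the vision-language transformer is reduced to one projection-free self-attention step, and the box is read out from one extra regression token placed in front of the fused sequence. The names `attention`, `ground`, `reg_token`, and `box_head` are hypothetical.

```python
import math
import random


def attention(tokens):
    """One single-head self-attention step with no learned projections.

    Stand-in for a full vision-language transformer layer.
    """
    dim = len(tokens[0])
    out = []
    for query in tokens:
        # Scaled dot-product scores of this query against every token.
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of all tokens, one output dimension at a time.
        out.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                    for d in range(dim)])
    return out


def ground(visual_tokens, text_tokens, reg_token, box_head):
    """Fuse both modalities and regress a normalized box (cx, cy, w, h)."""
    # Assumed layout: [regression token] + visual tokens + text tokens.
    fused = attention([reg_token] + visual_tokens + text_tokens)
    reg_out = fused[0]  # the regression token after seeing both modalities
    # Linear head followed by a sigmoid keeps each coordinate in (0, 1).
    return [1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, reg_out))))
            for row in box_head]


rng = random.Random(0)
DIM = 8


def rand_vec():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


visual_tokens = [rand_vec() for _ in range(6)]  # stand-in for vision transformer output
text_tokens = [rand_vec() for _ in range(4)]    # stand-in for language transformer output
reg_token = rand_vec()                          # would be a learned embedding
box_head = [rand_vec() for _ in range(4)]       # would be a learned regression head

box = ground(visual_tokens, text_tokens, reg_token, box_head)
print(box)
```

The point of the sketch is the data flow: each modality is encoded separately, the token sequences are concatenated, and a single fused pass yields box coordinates directly, with no region proposals or candidate ranking.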
#visual_grounding #object_detection