
Awesome Visual Question Answering

A curated list of resources for Visual Question Answering (VQA).

ICCV 2015

[1] Antol S, Agrawal A, Lu J, et al. VQA: Visual question answering [paper] [project]

NIPS 2015

[1] Ren M, Kiros R, Zemel R. Exploring models and data for image question answering [paper] [code]

CVPR 2016

[1] Yang Z, He X, Gao J, et al. Stacked attention networks for image question answering [paper] [code]

[2] Andreas J, Rohrbach M, Darrell T, et al. Neural module networks [paper] [code]

NIPS 2016

[1] Lu J, Yang J, Batra D, et al. Hierarchical question-image co-attention for visual question answering [paper] [code]
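
For intuition, here is a minimal PyTorch sketch of the parallel co-attention idea in the entry above: each modality attends to the other through a shared affinity matrix. The shapes, hidden size k, and initialization are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelCoAttention(nn.Module):
    """Minimal parallel co-attention sketch (shapes are assumptions)."""
    def __init__(self, d, k):
        super().__init__()
        self.Wb = nn.Parameter(0.01 * torch.randn(d, d))  # affinity weights
        self.Wq = nn.Parameter(0.01 * torch.randn(k, d))
        self.Wv = nn.Parameter(0.01 * torch.randn(k, d))
        self.wq = nn.Parameter(0.01 * torch.randn(1, k))
        self.wv = nn.Parameter(0.01 * torch.randn(1, k))

    def forward(self, Q, V):
        # Q: (batch, d, T) word features; V: (batch, d, N) region features.
        C = torch.tanh(Q.transpose(1, 2) @ self.Wb @ V)            # (batch, T, N) affinity
        Hv = torch.tanh(self.Wv @ V + (self.Wq @ Q) @ C)           # (batch, k, N)
        Hq = torch.tanh(self.Wq @ Q + (self.Wv @ V) @ C.transpose(1, 2))
        av = F.softmax(self.wv @ Hv, dim=-1)                       # attention over regions
        aq = F.softmax(self.wq @ Hq, dim=-1)                       # attention over words
        return (Q * aq).sum(-1), (V * av).sum(-1)                  # attended features, (batch, d)
```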

EMNLP 2016

[1] Fukui A, Park D H, Yang D, et al. Multimodal compact bilinear pooling for visual question answering and visual grounding [paper] [code]
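
A rough sketch of the compact bilinear pooling idea in the entry above: the outer product of the two feature vectors is approximated by Count Sketch projections combined via FFT-based circular convolution. The output dimension d, the fixed random hashes and signs, and the post-normalization follow the common recipe but are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def count_sketch(x, h, s, d):
    """Project x (batch, n) down to d dims using a fixed random hash h
    (LongTensor of bucket indices) and random signs s in {-1, +1}."""
    out = torch.zeros(x.size(0), d, device=x.device)
    out.index_add_(1, h, x * s)
    return out

def mcb(q, v, h_q, s_q, h_v, s_v, d=16000):
    """Approximate the outer product of q and v by circular convolution of
    their count sketches, computed cheaply in the frequency domain."""
    fq = torch.fft.rfft(count_sketch(q, h_q, s_q, d))
    fv = torch.fft.rfft(count_sketch(v, h_v, s_v, d))
    z = torch.fft.irfft(fq * fv, n=d)
    # Signed square root + L2 norm, a common post-processing for bilinear pooling.
    z = torch.sign(z) * torch.sqrt(z.abs() + 1e-8)
    return F.normalize(z)

# Hypothetical usage: projections are drawn once and kept fixed.
# h_q = torch.randint(0, 16000, (2048,))
# s_q = torch.randint(0, 2, (2048,)).float() * 2 - 1
```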

ECCV 2016

[1] Jabri A, Joulin A, Van Der Maaten L. Revisiting visual question answering baselines [paper]

CVIU 2016 (Computer Vision and Image Understanding)

[1] Wu Q, Teney D, Wang P, et al. Visual question answering: A survey of methods and datasets [paper]

ArXiv 2016

[1] Kim J H, On K W, Lim W, et al. Hadamard product for low-rank bilinear pooling [paper] [code]
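
The low-rank bilinear trick in the entry above admits a very short sketch: project both modalities into a shared rank-r space and fuse them with an element-wise (Hadamard) product instead of a full bilinear form. The module below is a minimal, assumed layout, not the released code.

```python
import torch
import torch.nn as nn

class LowRankBilinear(nn.Module):
    """Low-rank bilinear fusion: q^T W v is replaced by a Hadamard
    product of rank-r projections of each modality."""
    def __init__(self, q_dim, v_dim, rank, out_dim):
        super().__init__()
        self.Uq = nn.Linear(q_dim, rank)
        self.Uv = nn.Linear(v_dim, rank)
        self.P = nn.Linear(rank, out_dim)

    def forward(self, q, v):
        # The element-wise product couples each projected question
        # dimension with its visual counterpart at O(rank) cost.
        return self.P(torch.tanh(self.Uq(q)) * torch.tanh(self.Uv(v)))
```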

CVPR 2017

[1] Goyal Y, Khot T, Summers-Stay D, et al. Making the V in VQA matter: Elevating the role of image understanding in Visual Question Answering [paper]

[2] Johnson J, Hariharan B, van der Maaten L, et al. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning [paper]

[3] Ganju S, Russakovsky O, Gupta A. What's in a question: Using visual questions as a form of supervision [paper] [code]

[4] Nam H, Ha J W, Kim J. Dual attention networks for multimodal reasoning and matching [paper] [code]

IbPRIA 2017 (Iberian Conference on Pattern Recognition and Image Analysis)

[1] Bolaños M, Peris Á, Casacuberta F, et al. VIBIKNet: Visual bidirectional kernelized network for visual question answering [paper] [code]

ICCV 2017

[1] Ben-Younes H, Cadene R, Cord M, et al. MUTAN: Multimodal Tucker fusion for visual question answering [paper] [code]

[2] Zhu C, Zhao Y, Huang S, et al. Structured attentions for visual question answering [paper] [code]

[3] Hu R, Andreas J, Rohrbach M, et al. Learning to reason: End-to-end module networks for visual question answering [paper]

[4] Yu Z, Yu J, Fan J, et al. Multi-modal factorized bilinear pooling with co-attention learning for visual question answering [paper] [code]

[5] Gan C, Li Y, Li H, et al. VQS: Linking segmentations to questions and answers for supervised attention in VQA and question-focused semantic segmentation [paper]

NIPS 2017

[1] Schwartz I, Schwing A, Hazan T. High-order attention models for visual question answering [paper] [code]

[2] Ilievski I, Feng J. Multimodal learning and reasoning for visual question answering [paper]

EMNLP 2017

[1] Mahendru A, Prabhu V, Mohapatra A, et al. The promise of premise: Harnessing question premises in visual question answering [paper] [code]

CVPR 2018

[1] Agrawal A, Batra D, Parikh D, et al. Don't just assume; look and answer: Overcoming priors for visual question answering [paper] [code]

[2] Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering [paper] [code]

[3] Teney D, Anderson P, He X, et al. Tips and tricks for visual question answering: Learnings from the 2017 challenge [paper]

[4] Gordon D, Kembhavi A, Rastegari M, et al. IQA: Visual question answering in interactive environments [paper] [code]

[5] Nguyen D K, Okatani T. Improved fusion of visual and language representations by dense symmetric co-attention for visual question answering [paper] [code]

[6] Liang J, Jiang L, Cao L, et al. Focal visual-text attention for visual question answering [paper] [code]

[7] Huk Park D, Anne Hendricks L, Akata Z, et al. Multimodal explanations: Justifying decisions and pointing to the evidence [paper]

[8] Gurari D, Li Q, Stangl A J, et al. VizWiz grand challenge: Answering visual questions from blind people [paper]

[9] Mascharka D, Tran P, Soklaski R, et al. Transparency by design: Closing the gap between performance and interpretability in visual reasoning [paper]

[10] Cao Q, Liang X, Li B, et al. Visual question reasoning on general dependency tree [paper]

[11] Patro B, Namboodiri V P. Differential attention for visual question answering [paper]

[12] Su Z, Zhu C, Dong Y, et al. Learning visual knowledge memory networks for visual question answering [paper]

[13] Fan H, Zhou J. Stacked latent attention for multimodal reasoning [paper]

[14] Hu H, Chao W L, Sha F. Learning answer embeddings for visual question answering [paper]

ICLR 2018

[1] Zhang Y, Hare J, Prügel-Bennett A. Learning to count objects in natural images for visual question answering [paper] [code]

AAAI 2018

[1] Lu P, Li H, Zhang W, et al. Co-attending free-form regions and detections with multi-modal multiplicative feature embedding for visual question answering [paper] [code]

ACL 2018

[1] Mudrakarta P K, Taly A, Sundararajan M, et al. Did the model understand the question? [paper]

IEEE Transactions on Neural Networks and Learning Systems 2018

[1] Yu Z, Yu J, Xiang C, et al. Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering [paper] [code]

ECCV 2018

[1] Bai Y, Fu J, Zhao T, et al. Deep attention neural tensor network for visual question answering [paper]

[2] Yang G R, Ganichev I, Wang X J, et al. A dataset and architecture for visual reasoning with a working memory [paper]

[3] Li Q, Tao Q, Joty S, et al. VQA-E: Explaining, elaborating, and enhancing your answers for visual questions [paper]

[4] Shi Y, Furlanello T, Zha S, et al. Question type guided attention in visual question answering [paper]

[5] Malinowski M, Doersch C, Santoro A, et al. Learning visual question answering by bootstrapping hard attention [paper]

[6] Yu Y, Kim J, Kim G. A joint sequence fusion model for video question answering and retrieval [paper]

[7] Gao P, Li H, Li S, et al. Question-guided hybrid convolution for visual question answering [paper]

[8] Narasimhan M, Schwing A G. Straight to the facts: Learning knowledge base retrieval for factual visual question answering [paper]

[9] Li W, Yuan Z, Fang X, et al. Knowing Where to Look? Analysis on Attention of Visual Question Answering System [paper]

NIPS 2018

[1] Kim J H, Jun J, Zhang B T. Bilinear attention networks [paper] [code]

[2] Norcliffe-Brown W, Vafeias S, Parisot S. Learning conditioned graph structures for interpretable visual question answering [paper] [code]

[3] Deng Y, Kim Y, Chiu J, et al. Latent alignment and variational attention [paper] [code]

[4] Yi K, Wu J, Gan C, et al. Neural-symbolic VQA: Disentangling reasoning from vision and language understanding [paper] [code]

[5] Narasimhan M, Lazebnik S, Schwing A. Out of the box: Reasoning with graph convolution nets for factual visual question answering [paper]

[6] Wu C, Liu J, Wang X, et al. Chain of reasoning for visual question answering [paper]

ArXiv 2018

[1] Jiang Y, Natarajan V, Chen X, et al. Pythia v0.1: The winning entry to the VQA Challenge 2018 [paper] [code]

CVPR 2019

[1] Cadene R, Ben-younes H, Cord M, et al. MUREL: Multimodal Relational Reasoning for Visual Question Answering [paper] [code]

[2] Peng G, Li H, You H, et al. Dynamic Fusion with Intra- and Inter-Modality Attention Flow for Visual Question Answering [paper] [code]

[3] Shah M, Chen X, Rohrbach M, et al. Cycle-Consistency for Robust Visual Question Answering [paper]

[4] Marino K, Rastegari M, Farhadi A, et al. OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge [paper]

[5] Li H, Wang P, Shen C, et al. Visual Question Answering as Reading Comprehension [paper]

[6] Kim J, Ma M, Kim K, et al. Progressive Attention Memory Network for Movie Story Question Answering [paper]

[7] Manjunatha V, Saini N, Davis L S. Explicit Bias Discovery in Visual Question Answering Models [paper]

[8] Shrestha R, Kafle K, Kanan C. Answer them all! Toward universal visual question answering models [paper]

[9] Singh A, Natarajan V, Shah M, et al. Towards VQA models that can read [paper]

[10] Fan C, Zhang X, Zhang S, et al. Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering [paper]

[11] Fukui H, Hirakawa T, Yamashita T, et al. Attention branch network: Learning of attention mechanism for visual explanation [paper]

[12] Xiong P, Zhan H, Wang X, et al. Visual Query Answering by Entity-Attribute Graph Matching and Reasoning [paper]

[13] Noh H, Kim T, Mun J, et al. Transfer Learning via Unsupervised Task Discovery for Visual Question Answering [paper] [code]

[14] Tang K, Zhang H, Wu B, et al. Learning to compose dynamic tree structures for visual contexts [paper]

[15] Yu Z, Yu J, Cui Y, et al. Deep Modular Co-Attention Networks for Visual Question Answering [paper] [code]

[16] Shi J, Zhang H, Li J. Explainable and explicit visual reasoning over scene graphs [paper] [code]

ICLR 2019

[1] Zhang Y, Hare J, Prügel-Bennett A. Learning Representations of Sets through Optimized Permutations [paper] [code]

TPAMI 2019

[1] Liang J, Jiang L, Cao L, et al. Focal visual-text attention for memex question answering [paper] [code]

AAAI 2019

[1] Ben-Younes H, Cadene R, Thome N, et al. BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection [paper] [code]

ArXiv 2019

[1] Wu J, Mooney R J. Self-Critical Reasoning for Robust Visual Question Answering [paper] [code]

[2] Cadene R, Dancette C, Ben-younes H, et al. RUBi: Reducing Unimodal Biases in Visual Question Answering [paper] [code]

[3] Li L, Gan Z, Cheng Y, et al. Relation-aware Graph Attention Network for Visual Question Answering [paper]

[4] Wu Y, Sun Q, Ma J, et al. Question Guided Modular Routing Networks for Visual Question Answering [paper]

Based on the GQA dataset

CVPR 2019

[1] Hudson D A, Manning C D. GQA: A new dataset for real-world visual reasoning and compositional question answering [paper] [code]
