A list of papers about visual dialogue. Please feel free to open an issue or pull request for missing papers.
- Visual Dialog, Abhishek Das et al., CVPR 2017
- Visual Reference Resolution using Attention Memory for Visual Dialog, Paul Hongsuck Seo et al., NIPS 2017
- Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation, Nasrin Mostafazadeh et al., IJCNLP 2017
- MMD, MMD: Towards Building Large Scale Multimodal Domain-Aware Conversation Systems, Amrita Saha et al., AAAI 2018
- CLEVR-Dialog, CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog, Satwik Kottur et al., NAACL 2019
- AVSD, Audio-Visual Scene-Aware Dialog, Huda Alamri et al., CVPR 2019
- Image-Chat, Image-Chat: Engaging Grounded Conversations, Kurt Shuster et al., ACL 2020, HomePage
- Multi-Modal Open-Domain Dialogue, Kurt Shuster et al., arXiv 2020, HomePage
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images, Nyoungwoo Lee et al., ACL 2021, HomePage
- MMChat, MMChat: Multi-Modal Chat Dataset on Social Media, Yinhe Zheng et al., arXiv 2021
- OpenViDial, OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts, Yuxian Meng et al., arXiv 2021
- OpenViDial 2.0, OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts, Shuhe Wang et al., arXiv 2021
- GuessWhat, GuessWhat?! Visual object discovery through multi-modal dialogue, Harm de Vries et al., CVPR 2017
- SIMMC, SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform, Seungwhan Moon et al., DSTC9 2020, HomePage
- SIMMC 2.0, SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations, Satwik Kottur et al., DSTC10 2021, HomePage
- JDDC 2.0, The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service, arXiv 2021, HomePage
- Visual Dialog Challenge, visualdialog.org, 2020
- Audio Visual Scene-Aware Dialog (AVSD) Challenge, DSTC7 2018
- VisDial-BERT, Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline, Vishvak Murahari et al., ECCV 2020, HomePage
- VD-BERT, VD-BERT: A Unified Vision and Dialog Transformer with BERT, Yue Wang et al., EMNLP 2020, HomePage
- Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog, Zekang Li et al., AAAI 2020 DSTC8 Workshop
- MITVG, Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation, Feilong Chen et al., ACL Findings 2021
- HCIAE, Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model, Jiasen Lu et al., NIPS 2017
- Primary, Image-Question-Answer Synergistic Network for Visual Dialog, Dalu Guo et al., CVPR 2019
- ReDAN, Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog, Zhe Gan et al., ACL 2019
- DVAN, Dual Visual Attention Network for Visual Dialog, Dan Guo et al., IJCAI 2019, Code
- RvA, Recursive Visual Attention in Visual Dialog, Yulei Niu et al., CVPR 2019
- DMRM, DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog, Feilong Chen et al., AAAI 2020
- GNN-EM, Reasoning Visual Dialogs with Structural and Partial Observations, Zilong Zheng et al., CVPR 2019
- DualVD, DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue, Xiaoze Jiang et al., AAAI 2020
- FGA, Factor Graph Attention, Idan Schwartz et al., CVPR 2019
- KBGN, KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue, Xiaoze Jiang et al., ACM MM 2020
- GoG, GoG: Relation-aware Graph-over-Graph Network for Visual Dialog, Feilong Chen et al., ACL Findings 2021
- CoAtt, Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning, Qi Wu et al., CVPR 2018
- CorefNMN, Visual Coreference Resolution in Visual Dialog using Neural Module Networks, Satwik Kottur et al., ECCV 2018
- Making History Matter: History-Advantage Sequence Training for Visual Dialog, Tianhao Yang et al., ICCV 2019
- Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation, Heming Zhang et al., IJCAI 2019
- Granular Multimodal Attention Networks for Visual Dialog, Badri N. Patro et al., ICCV Workshop 2019
- Modality-Balanced Models for Visual Dialogue, Hyounghun Kim et al., AAAI 2020
- LTMI, Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs, Van-Quang Nguyen et al., ECCV 2020
- All-in-One Image-Grounded Conversational Agents, Da Ju et al., CVPR 2020
- Multi-View Attention Network for Visual Dialog, Sungjin Park et al., ACL 2020, Code
- The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents, Kurt Shuster et al., ACL 2020
- MCA, History for Visual Dialog: Do we really need it?, Shubham Agarwal et al., ACL 2020
- CAG, Iterative Context-Aware Graph Inference for Visual Dialog, Dan Guo et al., CVPR 2020
- DAM, DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue, Xiaoze Jiang et al., IJCAI 2020, Code
- Learning to Ground Visual Objects for Visual Dialog, Feilong Chen et al., EMNLP 2021
- Situated and Interactive Multimodal Conversations, Seungwhan Moon et al., COLING 2020
- Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient, Rui Zhao et al., IEEE SLT 2018
- What Should I Ask? Using Conversationally Informative Rewards for Goal-Oriented Visual Dialogue, Pushkar Shukla et al., ACL 2019
- Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog, Sang-Woo Lee et al., NeurIPS 2018
- Visual Dialogue State Tracking for Question Generation, Wei Pang et al., AAAI 2020 Oral
- Guessing State Tracking for Visual Dialogue, Wei Pang et al., ECCV 2020, Code
- A Revised Generative Evaluation of Visual Dialogue, Daniela Massiceti et al., CVPR 2020
- DialogStitch: Synthetic Deeper and Multi-Context Task-Oriented Dialogs, Satwik Kottur et al., SIGDIAL 2021, Code
- An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC), Satwik Kottur et al., SIGDIAL 2021, Video
- Joint Generation and Bi-Encoder for Situated Interactive MultiModal Conversations, Xin Huang et al., AAAI 2021 DSTC-9 Workshop
- A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer, Matteo A. Senese et al., AAAI 2021 DSTC-9 Workshop, Code
- TOM, End-to-End Task-Oriented Multimodal Dialog System with GPT-2, Younghoon Jeong et al., AAAI 2021 DSTC-9 Workshop
- Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems, Po-Nien Kung et al., AAAI 2021 DSTC-9 Workshop
- Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue, Zipeng Xu et al., ACM MM 2020
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue, Shoya Matsumori et al., CVPR 2021
- UNITER, UNITER: UNiversal Image-TExt Representation Learning, Yen-Chun Chen et al., ECCV 2020
- Kaleido-BERT, Kaleido-BERT: Vision-Language Pre-training on Fashion Domain, Mingchen Zhuge et al., CVPR 2021
- Integrating Multimodal Information in Large Pretrained Transformers, Wasifur Rahman et al., ACL 2020, Code
- Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods, Aditya Mogadala et al., JAIR 2021
- Multimodal Conversational AI: A Survey of Datasets and Approaches, Anirudh Sundar et al., arXiv, 2022
- Satwik Kottur, https://satwikkottur.github.io/