A list of papers about visual dialogue. Please feel free to open an issue or pull request for missing papers.
- Visual Dialog, Abhishek Das et al., CVPR 2017
- Visual Reference Resolution using Attention Memory for Visual Dialog, Paul Hongsuck Seo et al., NIPS 2017
- Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation, Nasrin Mostafazadeh et al., IJCNLP 2017
- MMD, MMD: Towards Building Large Scale Multimodal Domain-Aware Conversation Systems, Amrita Saha et al., AAAI 2018
- CLEVR-Dialog, CLEVR-Dialog: A Diagnostic Dataset for Multi-Round Reasoning in Visual Dialog, Satwik Kottur et al., NAACL 2019
- AVSD, Audio-Visual Scene-Aware Dialog, Huda Alamri et al., CVPR 2019
- Image-Chat, Image-Chat: Engaging Grounded Conversations, Kurt Shuster et al., ACL 2020, HomePage
- Multi-Modal Open-Domain Dialogue, Kurt Shuster et al., arXiv 2020, HomePage
- Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images, Nyoungwoo Lee et al., ACL 2021, HomePage
- MMChat, MMChat: Multi-Modal Chat Dataset on Social Media, Yinhe Zheng et al., arXiv 2021
- OpenViDial, OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts, Yuxian Meng et al., arXiv 2021
- OpenViDial 2.0, OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts, Shuhe Wang et al., arXiv 2021
- GuessWhat, GuessWhat?! Visual object discovery through multi-modal dialogue, Harm de Vries et al., CVPR 2017
- SIMMC, SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform, Seungwhan Moon et al., DSTC9 2020, HomePage
- SIMMC 2.0, SIMMC 2.0: A Task-oriented Dialog Dataset for Immersive Multimodal Conversations, Satwik Kottur et al., DSTC10 2021, HomePage
- JDDC 2.0, The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service, arXiv 2021, HomePage
- Visual Dialog Challenge, visualdialog.org, 2020
- Audio Visual Scene-Aware Dialog (AVSD) Challenge, DSTC7 2018
- VisDial-BERT, Large-scale Pretraining for Visual Dialog: A Simple State-of-the-Art Baseline, Vishvak Murahari et al., ECCV 2020, HomePage
- VD-BERT, VD-BERT: A Unified Vision and Dialog Transformer with BERT, Yue Wang et al., EMNLP 2020, HomePage
- Bridging Text and Video: A Universal Multimodal Transformer for Video-Audio Scene-Aware Dialog, Zekang Li et al., AAAI 2020 DSTC8 Workshop
- MITVG, Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation, Feilong Chen et al., ACL Findings 2021
- HCIAE, Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model, Jiasen Lu et al., NIPS 2017
- Primary, Image-Question-Answer Synergistic Network for Visual Dialog, Dalu Guo et al., CVPR 2019
- ReDAN, Multi-step Reasoning via Recurrent Dual Attention for Visual Dialog, Zhe Gan et al., ACL 2019
- DVAN, Dual Visual Attention Network for Visual Dialog, Dan Guo et al., IJCAI 2019, Code
- RvA, Recursive Visual Attention in Visual Dialog, Yulei Niu et al., CVPR 2019
- DMRM, DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog, Feilong Chen et al., AAAI 2020
- GNN-EM, Reasoning Visual Dialogs with Structural and Partial Observations, Zilong Zheng et al., CVPR 2019
- DualVD, DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue, Xiaoze Jiang et al., AAAI 2020
- FGA, Factor Graph Attention, Idan Schwartz et al., CVPR 2019
- KBGN, KBGN: Knowledge-Bridge Graph Network for Adaptive Vision-Text Reasoning in Visual Dialogue, Xiaoze Jiang et al., ACM MM 2020
- GoG, GoG: Relation-aware Graph-over-Graph Network for Visual Dialog, Feilong Chen et al., ACL Findings 2021
- CoAtt, Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning, Qi Wu et al., CVPR 2018
- CorefNMN, Visual Coreference Resolution in Visual Dialog using Neural Module Networks, Satwik Kottur et al., ECCV 2018
- Making History Matter: History-Advantage Sequence Training for Visual Dialog, Tianhao Yang et al., ICCV 2019
- Generative Visual Dialogue System via Adaptive Reasoning and Weighted Likelihood Estimation, Heming Zhang et al., IJCAI 2019
- Granular Multimodal Attention Networks for Visual Dialog, Badri N. Patro et al., ICCV Workshop 2019
- Modality-Balanced Models for Visual Dialogue, Hyounghun Kim et al., AAAI 2020
- LTMI, Efficient Attention Mechanism for Visual Dialog that can Handle All the Interactions between Multiple Inputs, Van-Quang Nguyen et al., ECCV 2020
- All-in-One Image-Grounded Conversational Agents, Da Ju et al., CVPR 2020
- Multi-View Attention Network for Visual Dialog, Sungjin Park et al., ACL 2020, Code
- The Dialogue Dodecathlon: Open-Domain Knowledge and Image Grounded Conversational Agents, Kurt Shuster et al., ACL 2020
- MCA, History for Visual Dialog: Do we really need it?, Shubham Agarwal et al., ACL 2020
- CAG, Iterative Context-Aware Graph Inference for Visual Dialog, Dan Guo et al., CVPR 2020
- DAM, DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue, Xiaoze Jiang et al., IJCAI 2020, Code
- Learning to Ground Visual Objects for Visual Dialog, Feilong Chen et al., EMNLP 2021
- Situated and Interactive Multimodal Conversations, Seungwhan Moon et al., COLING 2020
- Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient, Rui Zhao et al., IEEE SLT 2018
- What Should I Ask? Using Conversationally Informative Rewards for Goal-Oriented Visual Dialogue, Pushkar Shukla et al., ACL 2019
- Answerer in Questioner's Mind: Information Theoretic Approach to Goal-Oriented Visual Dialog, Sang-Woo Lee et al., NeurIPS 2018
- Visual Dialogue State Tracking for Question Generation, Wei Pang et al., AAAI 2020 Oral
- Guessing State Tracking for Visual Dialogue, Wei Pang et al., ECCV 2020, Code
- A Revised Generative Evaluation of Visual Dialogue, Daniela Massiceti et al., CVPR 2020
- DialogStitch: Synthetic Deeper and Multi-Context Task-Oriented Dialogs, Satwik Kottur et al., SIGDIAL 2021, Code
- An Analysis of State-of-the-Art Models for Situated Interactive MultiModal Conversations (SIMMC), Satwik Kottur et al., SIGDIAL 2021, Video
- Joint Generation and Bi-Encoder for Situated Interactive MultiModal Conversations, Xin Huang et al., AAAI 2021 DSTC-9 Workshop
- A Response Retrieval Approach for Dialogue Using a Multi-Attentive Transformer, Matteo A. Senese et al., AAAI 2021 DSTC-9 Workshop, Code
- TOM, End-to-End Task-Oriented Multimodal Dialog System with GPT-2, Younghoon Jeong et al., AAAI 2021 DSTC-9 Workshop
- Multi-Task Learning for Situated Multi-Domain End-to-End Dialogue Systems, Po-Nien Kung et al., AAAI 2021 DSTC-9 Workshop
- Answer-Driven Visual State Estimator for Goal-Oriented Visual Dialogue, Zipeng Xu et al., ACM MM 2020
- Unified Questioner Transformer for Descriptive Question Generation in Goal-Oriented Visual Dialogue, Shoya Matsumori et al., CVPR 2021
- UNITER, UNITER: UNiversal Image-TExt Representation Learning, Yen-Chun Chen et al., ECCV 2020
- Kaleido-BERT, Kaleido-BERT: Vision-Language Pre-training on Fashion Domain, Mingchen Zhuge et al., CVPR 2021
- Integrating Multimodal Information in Large Pretrained Transformers, Wasifur Rahman et al., ACL 2020, Code
- Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods, Aditya Mogadala et al., JAIR 2021
- Multimodal Conversational AI: A Survey of Datasets and Approaches, Anirudh Sundar et al., arXiv, 2022
- Satwik Kottur, https://satwikkottur.github.io/