Resource List

- [pdf]: link to the paper PDF online
- [repo]: link to the paper PDF stored in this repo
- [github]: link to the GitHub code repository
- [web]: link to the project website
Multimodal Machine Translation

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016.08 | Huang et al. | WMT'16 | Attention-based Multimodal Neural Machine Translation | [pdf] [repo] |
2016.08 | Caglayan et al. | WMT'16 | Does Multimodality Help Human and Machine for Translation and Image Captioning? | [pdf] [repo] |
2016.09 | Caglayan et al. | arXiv | Multimodal Attention for Neural Machine Translation | [pdf] [repo] |
2017.02 | Calixto et al. | arXiv | Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | [pdf] [repo] [github] |
2017.03 | Delbrouck et al. | ICLR'17 | Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation | [pdf] [repo] |
2017.05 | Chen et al. | arXiv | A Teacher-Student Framework for Zero-Resource Neural Machine Translation | [pdf] [repo] |
2017.06 | Lala et al. | PBML'17 | Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation | [pdf] [repo] |
2017.07 | Elliott et al. | arXiv | Imagination improves Multimodal Translation | [pdf] [repo] |
2017.07 | Nakayama et al. | arXiv | Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot | [pdf] [repo] |
2017.08 | Libovický et al. | ACL'17 | Attention Strategies for Multi-Source Sequence-to-Sequence Learning | [pdf] [repo] |
2017.09 | Calixto et al. | EMNLP'17 | Incorporating Global Visual Features into Attention-Based Neural Machine Translation | [pdf] [repo] |
2017.10 | Elliott et al. | WMT'17 | Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description | [pdf] [repo] |
2018.05 | Qian et al. | arXiv | Multimodal Machine Translation with Reinforcement Learning | [pdf] [repo] |
2018.08 | Zhou et al. | ACL'18 | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | [pdf] [repo] |
2018.10 | Barrault et al. | WMT'18 | Findings of the Third Shared Task on Multimodal Machine Translation | [pdf] [repo] |
2018.10 | Caglayan et al. | WMT'18 | LIUM-CVC Submissions for WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Grönroos et al. | WMT'18 | The MeMAD Submission to the WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Gwinnup et al. | WMT'18 | The AFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional | [pdf] [repo] |
2018.10 | Helcl et al. | WMT'18 | CUNI System for the WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Lala et al. | WMT'18 | Sheffield Submissions for WMT18 Multimodal Translation Shared Task | [pdf] [repo] |
2018.10 | Zheng et al. | WMT'18 | Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Translation System Report | [pdf] [repo] |
2018.10 | Delbrouck et al. | WMT'18 | UMONS Submission for WMT18 Multimodal Translation Task | [pdf] [repo] [github] |
2018.10 | Libovický et al. | WMT'18 | Input Combination Strategies for Multi-Source Transformer Decoder | [pdf] [repo] |
2018.10 | Shin et al. | WMT'18 | Multi-encoder Transformer Network for Automatic Post-Editing | [pdf] [repo] |
2019.04 | Calixto et al. | Springer | An Error Analysis for Image-based Multi-modal Neural Machine Translation | [pdf] [repo] |
2019.04 | Hirasawa et al. | arXiv | Multimodal Machine Translation with Embedding Prediction | [pdf] [repo] [github] |
2019.06 | Caglayan et al. | NAACL-HLT'19 | Probing the Need for Visual Context in Multimodal Machine Translation | [pdf] [repo] |
2019.06 | Su et al. | CVPR'19 | Unsupervised Multi-modal Neural Machine Translation | [pdf] [repo] |
2019.06 | Chen et al. | IJCAI'19 | From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots | [pdf] [repo] |
2019.07 | Calixto et al. | ACL'19 | Latent Variable Model for Multi-modal Translation | [pdf] [repo] |
2019.07 | Ive et al. | ACL'19 | Distilling Translations with Visual Awareness | [pdf] [repo] [github] |
2019.07 | Hirasawa et al. | ACL'19 | Debiasing Word Embedding Improves Multimodal Machine Translation | [pdf] [repo] |
2019.07 | Mogadala et al. | arXiv | Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods | [pdf] [repo] |
2020.02 | Yang et al. | AAAI'20 | Visual Agreement Regularized Training for Multi-Modal Machine Translation | [pdf] [repo] |
2020.07 | Huang et al. | ACL'20 | Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting | [pdf] [repo] |
2020.08 | Sulubacak et al. | Machine Translation | Multimodal Machine Translation through Visuals and Speech | [pdf] [repo] |
2021.02 | Wang et al. | AAAI'21 | Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding | [pdf] [repo] |
2021.04 | Caglayan et al. | EACL'21 | Cross-lingual Visual Pre-training for Multimodal Machine Translation | [pdf] [repo] |
2021.04 | Ive et al. | EACL'21 | Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation | [pdf] [repo] |
Other Multimodal Tasks

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2011.11 | Jia et al. | ICCV'11 | Learning Cross-modality Similarity for Multinomial Data | [pdf] [repo] |
2014.10 | Mao et al. | arXiv | Explain Images with Multimodal Recurrent Neural Networks | [pdf] [repo] |
2014.11 | Kiros et al. | arXiv | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | [pdf] [repo] |
2015.06 | Mao et al. | ICLR'15 | Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) | [pdf] [repo] [github] |
2015.09 | Ferraro et al. | EMNLP'15 | A Survey of Current Datasets for Vision and Language Research | [pdf] [repo] |
2015.12 | Ma et al. | ICCV'15 | Multimodal Convolutional Neural Networks for Matching Image and Sentence | [pdf] [repo] |
2016.06 | You et al. | CVPR'16 | Image Captioning with Semantic Attention | [pdf] [repo] |
2016.10 | Lu et al. | NIPS'16 | Hierarchical Question-Image Co-Attention for Visual Question Answering | [pdf] [repo] [github] |
2016.10 | Yang et al. | NIPS'16 | Review Networks for Caption Generation | [pdf] [repo] [github] |
2018.06 | Wang et al. | NAACL'18 | Object Counts! Bringing Explicit Detections Back into Image Captioning | [pdf] [repo] |
2018.06 | Anderson et al. | CVPR'18 | Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | [pdf] [repo] |
2018.06 | Nguyen et al. | CVPR'18 | Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering | [pdf] [repo] |
2018.10 | Zhu et al. | EMNLP'18 | MSMO: Multimodal Summarization with Multimodal Output | [pdf] [repo] |
2019.02 | Li et al. | AAAI'19 | Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering | [pdf] [repo] |
2019.06 | Qin et al. | CVPR'19 | Look Back and Predict Forward in Image Captioning | [pdf] [repo] |
2019.06 | Yu et al. | CVPR'19 | Deep Modular Co-Attention Networks for Visual Question Answering | [pdf] [repo] |
2019.09 | Kiela et al. | arXiv | Supervised Multimodal Bitransformers for Classifying Images and Text | [pdf] [repo] [github] |
2019.10 | Guo et al. | ACM-MM'19 | Aligning Linguistic Words and Visual Semantic Units for Image Captioning | [pdf] [repo] [github] |
2019.10 | Li et al. | ACM-MM'19 | Walking with MIND: Mental Imagery eNhanceD Embodied QA | [pdf] [repo] |
2019.10 | Wu et al. | ACM-MM'19 | Editing Text in the Wild | [pdf] [repo] |
2019.12 | Khademi et al. | NeurIPS'19 | Multimodal Neural Graph Memory Networks for Visual Question Answering | [pdf] [repo] |
2020.01 | Park et al. | WACV'20 | MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | [pdf] [repo] |
2020.02 | Mai et al. | AAAI'20 | Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion | [pdf] [repo] |
2020.02 | Sun et al. | AAAI'20 | Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis | [pdf] [repo] |
2020.02 | Zhang et al. | AAAI'20 | Learning Long- and Short-Term User Literal Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption | [pdf] [repo] |
2020.06 | Kim et al. | CVPR'20 | Hypergraph Attention Networks for Multimodal Learning | [pdf] [repo] |
2020.07 | Alikhani et al. | ACL'20 | Clue: Cross-modal Coherence Modeling for Caption Generation | [pdf] [repo] |
2020.07 | Chauhan et al. | ACL'20 | Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis | [pdf] [repo] |
2020.07 | Lin et al. | ACL'20 | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | [pdf] [repo] [github] |
2020.09 | Luo et al. | arXiv | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | [pdf] [repo] |
2020.11 | Cho et al. | EMNLP'20 | X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | [pdf] [repo] [web] |
2020.11 | Jin et al. | EMNLP'20 | Dual Low-Rank Multimodal Fusion | [pdf] [repo] |
2020.11 | Khan et al. | EMNLP'20 | MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering | [pdf] [repo] |
2020.11 | Tsai et al. | EMNLP'20 | Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis | [pdf] [repo] [github] |
2020.11 | Wang et al. | EMNLP'20 | MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding | [pdf] [repo] |
2021.02 | Yu et al. | AAAI'21 | Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis | [pdf] [repo] [github] |
2021.04 | Sahu et al. | EACL'21 | Adaptive Fusion Techniques for Multimodal Data | [pdf] [repo] [github] |
Text-Only NLP & Machine Translation

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016.06 | Yang et al. | NAACL-HLT'16 | Hierarchical Attention Networks for Document Classification | [pdf] [repo] |
2016.06 | Zoph et al. | arXiv | Multi-Source Neural Translation | [pdf] [repo] |
2017.12 | Vaswani et al. | NIPS'17 | Attention Is All You Need | [pdf] [repo] [github] |
2017.12 | Xia et al. | NIPS'17 | Deliberation Networks: Sequence Generation Beyond One-Pass Decoding | [pdf] [repo] [github] |
2018.04 | Yang et al. | NAACL-HLT'18 | Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets | [pdf] [repo] |
2018.09 | Wu et al. | NAACL-HLT'18 | Adversarial Neural Machine Translation | [pdf] [repo] |
2018.10 | Miculicich et al. | EMNLP'18 | Document-Level Neural Machine Translation with Hierarchical Attention Networks | [pdf] [repo] |
2019.05 | Devlin et al. | arXiv | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [pdf] [repo] [github] |
2019.05 | Zhou et al. | arXiv | Synchronous Bidirectional Neural Machine Translation | [pdf] [repo] [github] |
2019.06 | Yang et al. | arXiv | XLNet: Generalized Autoregressive Pretraining for Language Understanding | [pdf] [repo] [github] |
2019.07 | Dai et al. | ACL'19 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | [pdf] [repo] [github] |
2019.07 | Liu et al. | ACL'19 | Hierarchical Transformers for Multi-Document Summarization | [pdf] [repo] [github] |
2019.07 | Pourdamghani et al. | ACL'19 | Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation | [pdf] [repo] |
Datasets

Dataset | Authors | Paper | Links |
---|---|---|---|
Flickr30K | Young et al. | From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions | [pdf] [repo] [web] |
Flickr30K Entities | Plummer et al. | Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | [pdf] [repo] [web] [github] |
Multi30K | Elliott et al. | Multi30K: Multilingual English-German Image Descriptions | [pdf] [repo] [github] |
IAPR-TC12 | Grubinger et al. | The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems | [pdf] [repo] [web] |
VATEX | Wang et al. | VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | [pdf] [repo] [web] |
Metrics

Metric | Authors | Paper | Links |
---|---|---|---|
BLEU | Papineni et al. | BLEU: a Method for Automatic Evaluation of Machine Translation | [pdf] [repo] |
METEOR | Banerjee et al. | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments | [pdf] [repo] [web] |
METEOR 1.5 | Denkowski et al. | METEOR Universal: Language Specific Translation Evaluation for Any Target Language | [pdf] [repo] [web] |
TER | Snover et al. | A Study of Translation Edit Rate with Targeted Human Annotation | [pdf] [repo] |
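To make the metrics above concrete, here is a minimal, self-contained sketch of sentence-level BLEU (geometric mean of modified n-gram precisions with add-one smoothing and a brevity penalty) and a simplified TER computed as word-level edit distance over reference length. These are illustrative only, not the reference implementations: real TER (Snover et al.) also allows block shifts at cost 1, and for actual evaluation the official scripts (or sacreBLEU) should be used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean
    of modified n-gram precisions (add-one smoothed)."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def ter_no_shifts(reference, hypothesis):
    """Simplified TER: word-level edit distance / reference length.
    (Full TER additionally allows block shifts at cost 1.)"""
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / max(m, 1)

ref = "a man is riding a bicycle".split()
hyp = "a man rides a bicycle".split()
print("BLEU:", round(bleu(ref, hyp), 3))
print("TER (no shifts):", round(ter_no_shifts(ref, hyp), 3))
```

Both functions score a single hypothesis against a single reference; corpus-level BLEU instead accumulates n-gram counts over all sentences before taking the geometric mean.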
Tutorials & Talks

Year | Authors | Title | Links |
---|---|---|---|
2016 | Elliott et al. | Multimodal Learning and Reasoning | [pdf] [repo] |
2017 | Lucia Specia | Multimodal Machine Translation | [pdf] [repo] |
2018 | Loic Barrault | Introduction to Multimodal Machine Translation | [pdf] [repo] |
2018 | Mirella Lapata | Understanding Visual Scenes | [pdf] [repo] |