Resource List

[pdf]: link to the paper PDF online
[repo]: link to the copy of the paper PDF in this repo
[github]: github link
[web]: website link

Papers by category

Multimodal Machine Translation

Year Authors Conf. Title Links
2016.08 Huang et al. WMT'16 Attention-based Multimodal Neural Machine Translation [pdf] [repo]
2016.08 Caglayan et al. WMT'16 Does Multimodality Help Human and Machine for Translation and Image Captioning? [pdf] [repo]
2016.09 Caglayan et al. arXiv Multimodal Attention for Neural Machine Translation [pdf] [repo]
2017.02 Calixto et al. arXiv Doubly-Attentive Decoder for Multi-modal Neural Machine Translation [pdf] [repo] [github]
2017.03 Delbrouck et al. ICLR'17 Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation [pdf] [repo]
2017.05 Chen et al. arXiv A Teacher-Student Framework for Zero-Resource Neural Machine Translation [pdf] [repo]
2017.06 Lala et al. PBML'17 Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation [pdf] [repo]
2017.07 Elliott et al. arXiv Imagination improves Multimodal Translation [pdf] [repo]
2017.07 Nakayama et al. arXiv Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot [pdf] [repo]
2017.08 Libovicky et al. ACL'17 Attention Strategies for Multi-Source Sequence-to-Sequence Learning [pdf] [repo]
2017.09 Calixto et al. EMNLP'17 Incorporating Global Visual Features into Attention-Based Neural Machine Translation [pdf] [repo]
2017.10 Elliott et al. WMT'17 Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description [pdf] [repo]
2018.05 Qian et al. arXiv Multimodal Machine Translation with Reinforcement Learning [pdf] [repo]
2018.08 Zhou et al. ACL'18 A Visual Attention Grounding Neural Model for Multimodal Machine Translation [pdf] [repo]
2018.10 Barrault et al. WMT'18 Findings of the Third Shared Task on Multimodal Machine Translation [pdf] [repo]
2018.10 Caglayan et al. WMT'18 LIUM-CVC Submissions for WMT18 Multimodal Translation Task [pdf] [repo]
2018.10 Gronroos et al. WMT'18 The MeMAD Submission to the WMT18 Multimodal Translation Task [pdf] [repo]
2018.10 Gwinnup et al. WMT'18 The AFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional [pdf] [repo]
2018.10 Helcl et al. WMT'18 CUNI System for the WMT18 Multimodal Translation Task [pdf] [repo]
2018.10 Lala et al. WMT'18 Sheffield Submissions for WMT18 Multimodal Translation Shared Task [pdf] [repo]
2018.10 Zheng et al. WMT'18 Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Translation System Report [pdf] [repo]
2018.10 Delbrouck et al. WMT'18 UMONS Submission for WMT18 Multimodal Translation Task [pdf] [repo] [github]
2018.10 Libovicky et al. WMT'18 Input Combination Strategies for Multi-Source Transformer Decoder [pdf] [repo]
2018.10 Shin et al. WMT'18 Multi-encoder Transformer Network for Automatic Post-Editing [pdf] [repo]
2019.04 Calixto et al. Springer An Error Analysis for Image-based Multi-modal Neural Machine Translation [pdf] [repo]
2019.04 Hirasawa et al. arXiv Multimodal Machine Translation with Embedding Prediction [pdf] [repo] [github]
2019.06 Caglayan et al. NAACL-HLT'19 Probing the Need for Visual Context in Multimodal Machine Translation [pdf] [repo]
2019.06 Su et al. CVPR'19 Unsupervised Multi-modal Neural Machine Translation [pdf] [repo]
2019.06 Chen et al. IJCAI'19 From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots [pdf] [repo]
2019.07 Calixto et al. ACL'19 Latent Variable Model for Multi-modal Translation [pdf] [repo]
2019.07 Ive et al. ACL'19 Distilling Translations with Visual Awareness [pdf] [repo] [github]
2019.07 Hirasawa et al. ACL'19 Debiasing Word Embedding Improves Multimodal Machine Translation [pdf] [repo]
2019.07 Mogadala et al. arXiv Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods [pdf] [repo]
2020.02 Yang et al. AAAI'20 Visual Agreement Regularized Training for Multi-Modal Machine Translation [pdf] [repo]
2020.07 Huang et al. ACL'20 Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting [pdf] [repo]
2020.08 Sulubacak et al. Machine Translation Multimodal Machine Translation through Visuals and Speech [pdf] [repo]
2021.02 Wang et al. AAAI'21 Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding [pdf] [repo]
2021.04 Caglayan et al. EACL'21 Cross-lingual Visual Pre-training for Multimodal Machine Translation [pdf] [repo]
2021.04 Ive et al. EACL'21 Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation [pdf] [repo]

Multimodal Language Models

Year Authors Conf. Title Links
2011.11 Jia et al. ICCV'11 Learning Cross-modality Similarity for Multinomial Data [pdf] [repo]
2014.10 Mao et al. arXiv Explain Images with Multimodal Recurrent Neural Networks [pdf] [repo]
2014.11 Kiros et al. arXiv Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models [pdf] [repo]
2015.06 Mao et al. ICLR'15 Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) [pdf] [repo] [github]
2015.09 Ferraro et al. EMNLP'15 A Survey of Current Datasets for Vision and Language Research [pdf] [repo]
2015.12 Ma et al. ICCV'15 Multimodal Convolutional Neural Networks for Matching Image and Sentence [pdf] [repo]
2016.06 You et al. CVPR'16 Image Captioning with Semantic Attention [pdf] [repo]
2016.10 Lu et al. NIPS'16 Hierarchical Question-Image Co-Attention for Visual Question Answering [pdf] [repo] [github]
2016.10 Yang et al. NIPS'16 Review Networks for Caption Generation [pdf] [repo] [github]
2018.06 Wang et al. NAACL'18 Object Counts! Bringing Explicit Detections Back into Image Captioning [pdf] [repo]
2018.06 Anderson et al. CVPR'18 Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [pdf] [repo]
2018.06 Nguyen et al. CVPR'18 Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering [pdf] [repo]
2018.10 Zhu et al. EMNLP'18 MSMO: Multimodal Summarization with Multimodal Output [pdf] [repo]
2019.02 Li et al. AAAI'19 Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering [pdf] [repo]
2019.06 Qin et al. CVPR'19 Look Back and Predict Forward in Image Captioning [pdf] [repo]
2019.06 Yu et al. CVPR'19 Deep Modular Co-Attention Networks for Visual Question Answering [pdf] [repo]
2019.09 Kiela et al. arXiv Supervised Multimodal Bitransformers for Classifying Images and Text [pdf] [repo] [github]
2019.10 Guo et al. ACM-MM'19 Aligning Linguistic Words and Visual Semantic Units for Image Captioning [pdf] [repo] [github]
2019.10 Li et al. ACM-MM'19 Walking with MIND: Mental Imagery eNhanceD Embodied QA [pdf] [repo]
2019.10 Wu et al. ACM-MM'19 Editing Text in the Wild [pdf] [repo]
2019.12 Khademi et al. NIPS'19 Multimodal Neural Graph Memory Networks for Visual Question Answering [pdf] [repo]
2020.01 Park et al. WACV'20 MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding [pdf] [repo]
2020.02 Mai et al. AAAI'20 Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion [pdf] [repo]
2020.02 Sun et al. AAAI'20 Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis [pdf] [repo]
2020.02 Zhang et al. AAAI'20 Learning Long- and Short-Term User Literal Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption [pdf] [repo]
2020.06 Kim et al. CVPR'20 Hypergraph Attention Networks for Multimodal Learning [pdf] [repo]
2020.07 Alikhani et al. ACL'20 Clue: Cross-modal Coherence Modeling for Caption Generation [pdf] [repo]
2020.07 Chauhan et al. ACL'20 Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis [pdf] [repo]
2020.07 Lin et al. ACL'20 A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks [pdf] [repo] [github]
2020.09 Luo et al. arXiv UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation [pdf] [repo]
2020.11 Cho et al. EMNLP'20 X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers [pdf] [repo] [web]
2020.11 Jin et al. EMNLP'20 Dual Low-Rank Multimodal Fusion [pdf] [repo]
2020.11 Khan et al. EMNLP'20 MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering [pdf] [repo]
2020.11 Tsai et al. EMNLP'20 Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis [pdf] [repo] [github]
2020.11 Wang et al. EMNLP'20 MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding [pdf] [repo]
2021.02 Yu et al. AAAI'21 Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis [pdf] [repo] [github]
2021.04 Sahu et al. EACL'21 Adaptive Fusion Techniques for Multimodal Data [pdf] [repo] [github]

Neural Machine Translation

Year Authors Conf. Title Links
2016.06 Yang et al. NAACL-HLT'16 Hierarchical Attention Networks for Document Classification [pdf] [repo]
2016.06 Zoph et al. arXiv Multi-Source Neural Translation [pdf] [repo]
2017.12 Vaswani et al. NIPS'17 Attention Is All You Need [pdf] [repo] [github]
2017.12 Xia et al. NIPS'17 Deliberation Networks: Sequence Generation Beyond One-Pass Decoding [pdf] [repo] [github]
2018.04 Yang et al. NAACL-HLT'18 Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets [pdf] [repo]
2018.09 Wu et al. NAACL-HLT'18 Adversarial Neural Machine Translation [pdf] [repo]
2018.10 Miculicich et al. EMNLP'18 Document-Level Neural Machine Translation with Hierarchical Attention Networks [pdf] [repo]
2019.05 Devlin et al. arXiv BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [pdf] [repo] [github]
2019.05 Zhou et al. arXiv Synchronous Bidirectional Neural Machine Translation [pdf] [repo] [github]
2019.06 Yang et al. arXiv XLNet: Generalized Autoregressive Pretraining for Language Understanding [pdf] [repo] [github]
2019.07 Dai et al. ACL'19 Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [pdf] [repo] [github]
2019.07 Liu et al. ACL'19 Hierarchical Transformers for Multi-Document Summarization [pdf] [repo] [github]
2019.07 Pourdamghani et al. ACL'19 Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation [pdf] [repo]

Datasets

Dataset Authors Paper Links
Flickr30K Young et al. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions [pdf] [repo] [web]
Flickr30K Entities Plummer et al. Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models [pdf] [repo] [web] [github]
Multi30K Elliott et al. Multi30K: Multilingual English-German Image Descriptions [pdf] [repo] [github]
IAPR-TC12 Grubinger et al. The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems [pdf] [repo] [web]
VATEX Wang et al. VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research [pdf] [repo] [web]

Metrics

Metric Authors Paper Links
BLEU Papineni et al. BLEU: a Method for Automatic Evaluation of Machine Translation [pdf] [repo]
METEOR Banerjee et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments [pdf] [repo] [web]
METEOR 1.5 Denkowski et al. METEOR Universal: Language Specific Translation Evaluation for Any Target Language [pdf] [repo] [web]
TER Snover et al. A Study of Translation Edit Rate with Targeted Human Annotation [pdf] [repo]

Tutorials

Year Authors Title Links
2016 Elliott et al. Multimodal Learning and Reasoning [pdf] [repo]
2017 Lucia Specia Multimodal Machine Translation [pdf] [repo]
2018 Loic Barrault Introduction to Multimodal Machine Translation [pdf] [repo]
2018 Mirella Lapata Understanding Visual Scenes [pdf] [repo]