Resource List

- [pdf]: link to the paper PDF online
- [repo]: link to the paper PDF stored in this repo
- [github]: link to the GitHub code repository
- [web]: link to the project website
Multimodal Machine Translation

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016.08 | Huang et al. | WMT'16 | Attention-based Multimodal Neural Machine Translation | [pdf] [repo] |
2016.08 | Caglayan et al. | WMT'16 | Does Multimodality Help Human and Machine for Translation and Image Captioning? | [pdf] [repo] |
2016.09 | Caglayan et al. | arXiv | Multimodal Attention for Neural Machine Translation | [pdf] [repo] |
2017.02 | Calixto et al. | arXiv | Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | [pdf] [repo] [github] |
2017.03 | Delbrouck et al. | ICLR'17 | Multimodal Compact Bilinear Pooling for Multimodal Neural Machine Translation | [pdf] [repo] |
2017.05 | Chen et al. | arXiv | A Teacher-Student Framework for Zero-Resource Neural Machine Translation | [pdf] [repo] |
2017.06 | Lala et al. | PBML'17 | Unraveling the Contribution of Image Captioning and Neural Machine Translation for Multimodal Machine Translation | [pdf] [repo] |
2017.07 | Elliott et al. | arXiv | Imagination improves Multimodal Translation | [pdf] [repo] |
2017.07 | Nakayama et al. | arXiv | Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot | [pdf] [repo] |
2017.08 | Libovický et al. | ACL'17 | Attention Strategies for Multi-Source Sequence-to-Sequence Learning | [pdf] [repo] |
2017.09 | Calixto et al. | EMNLP'17 | Incorporating Global Visual Features into Attention-Based Neural Machine Translation | [pdf] [repo] |
2017.10 | Elliott et al. | WMT'17 | Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description | [pdf] [repo] |
2018.05 | Qian et al. | arXiv | Multimodal Machine Translation with Reinforcement Learning | [pdf] [repo] |
2018.08 | Zhou et al. | ACL'18 | A Visual Attention Grounding Neural Model for Multimodal Machine Translation | [pdf] [repo] |
2018.10 | Barrault et al. | WMT'18 | Findings of the Third Shared Task on Multimodal Machine Translation | [pdf] [repo] |
2018.10 | Caglayan et al. | WMT'18 | LIUM-CVC Submissions for WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Grönroos et al. | WMT'18 | The MeMAD Submission to the WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Gwinnup et al. | WMT'18 | The AFRL-Ohio State WMT18 Multimodal System: Combining Visual with Traditional | [pdf] [repo] |
2018.10 | Helcl et al. | WMT'18 | CUNI System for the WMT18 Multimodal Translation Task | [pdf] [repo] |
2018.10 | Lala et al. | WMT'18 | Sheffield Submissions for WMT18 Multimodal Translation Shared Task | [pdf] [repo] |
2018.10 | Zheng et al. | WMT'18 | Ensemble Sequence Level Training for Multimodal MT: OSU-Baidu WMT18 Multimodal Translation System Report | [pdf] [repo] |
2018.10 | Delbrouck et al. | WMT'18 | UMONS Submission for WMT18 Multimodal Translation Task | [pdf] [repo] [github] |
2018.10 | Libovický et al. | WMT'18 | Input Combination Strategies for Multi-Source Transformer Decoder | [pdf] [repo] |
2018.10 | Shin et al. | WMT'18 | Multi-encoder Transformer Network for Automatic Post-Editing | [pdf] [repo] |
2019.04 | Calixto et al. | Springer | An Error Analysis for Image-based Multi-modal Neural Machine Translation | [pdf] [repo] |
2019.04 | Hirasawa et al. | arXiv | Multimodal Machine Translation with Embedding Prediction | [pdf] [repo] [github] |
2019.06 | Caglayan et al. | NAACL-HLT'19 | Probing the Need for Visual Context in Multimodal Machine Translation | [pdf] [repo] |
2019.06 | Su et al. | CVPR'19 | Unsupervised Multi-modal Neural Machine Translation | [pdf] [repo] |
2019.06 | Chen et al. | IJCAI'19 | From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots | [pdf] [repo] |
2019.07 | Calixto et al. | ACL'19 | Latent Variable Model for Multi-modal Translation | [pdf] [repo] |
2019.07 | Ive et al. | ACL'19 | Distilling Translations with Visual Awareness | [pdf] [repo] [github] |
2019.07 | Hirasawa et al. | ACL'19 | Debiasing Word Embedding Improves Multimodal Machine Translation | [pdf] [repo] |
2019.07 | Mogadala et al. | arXiv | Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods | [pdf] [repo] |
2020.02 | Yang et al. | AAAI'20 | Visual Agreement Regularized Training for Multi-Modal Machine Translation | [pdf] [repo] |
2020.07 | Huang et al. | ACL'20 | Unsupervised Multimodal Neural Machine Translation with Pseudo Visual Pivoting | [pdf] [repo] |
2020.08 | Sulubacak et al. | Machine Translation | Multimodal Machine Translation through Visuals and Speech | [pdf] [repo] |
2021.02 | Wang et al. | AAAI'21 | Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding | [pdf] [repo] |
2021.04 | Caglayan et al. | EACL'21 | Cross-lingual Visual Pre-training for Multimodal Machine Translation | [pdf] [repo] |
2021.04 | Ive et al. | EACL'21 | Exploiting Multimodal Reinforcement Learning for Simultaneous Machine Translation | [pdf] [repo] |
Other Multimodal Tasks

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2011.11 | Jia et al. | ICCV'11 | Learning Cross-modality Similarity for Multinomial Data | [pdf] [repo] |
2014.10 | Mao et al. | arXiv | Explain Images with Multimodal Recurrent Neural Networks | [pdf] [repo] |
2014.11 | Kiros et al. | arXiv | Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models | [pdf] [repo] |
2015.06 | Mao et al. | ICLR'15 | Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) | [pdf] [repo] [github] |
2015.09 | Ferraro et al. | EMNLP'15 | A Survey of Current Datasets for Vision and Language Research | [pdf] [repo] |
2015.12 | Ma et al. | ICCV'15 | Multimodal Convolutional Neural Networks for Matching Image and Sentence | [pdf] [repo] |
2016.06 | You et al. | CVPR'16 | Image Captioning with Semantic Attention | [pdf] [repo] |
2016.10 | Lu et al. | NIPS'16 | Hierarchical Question-Image Co-Attention for Visual Question Answering | [pdf] [repo] [github] |
2016.10 | Yang et al. | NIPS'16 | Review Networks for Caption Generation | [pdf] [repo] [github] |
2018.06 | Wang et al. | NAACL'18 | Object Counts! Bringing Explicit Detections Back into Image Captioning | [pdf] [repo] |
2018.06 | Anderson et al. | CVPR'18 | Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering | [pdf] [repo] |
2018.06 | Nguyen et al. | CVPR'18 | Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering | [pdf] [repo] |
2018.10 | Zhu et al. | EMNLP'18 | MSMO: Multimodal Summarization with Multimodal Output | [pdf] [repo] |
2019.02 | Li et al. | AAAI'19 | Beyond RNNs: Positional Self-Attention with Co-Attention for Video Question Answering | [pdf] [repo] |
2019.06 | Qin et al. | CVPR'19 | Look Back and Predict Forward in Image Captioning | [pdf] [repo] |
2019.06 | Yu et al. | CVPR'19 | Deep Modular Co-Attention Networks for Visual Question Answering | [pdf] [repo] |
2019.09 | Kiela et al. | arXiv | Supervised Multimodal Bitransformers for Classifying Images and Text | [pdf] [repo] [github] |
2019.10 | Guo et al. | ACM-MM'19 | Aligning Linguistic Words and Visual Semantic Units for Image Captioning | [pdf] [repo] [github] |
2019.10 | Li et al. | ACM-MM'19 | Walking with MIND: Mental Imagery eNhanceD Embodied QA | [pdf] [repo] |
2019.10 | Wu et al. | ACM-MM'19 | Editing Text in the Wild | [pdf] [repo] |
2019.12 | Khademi et al. | NeurIPS'19 | Multimodal Neural Graph Memory Networks for Visual Question Answering | [pdf] [repo] |
2020.01 | Park et al. | WACV'20 | MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding | [pdf] [repo] |
2020.02 | Mai et al. | AAAI'20 | Modality to Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion | [pdf] [repo] |
2020.02 | Sun et al. | AAAI'20 | Learning Relationships between Text, Audio, and Video via Deep Canonical Correlation for Multimodal Language Analysis | [pdf] [repo] |
2020.02 | Zhang et al. | AAAI'20 | Learning Long- and Short-Term User Literal Preference with Multimodal Hierarchical Transformer Network for Personalized Image Caption | [pdf] [repo] |
2020.06 | Kim et al. | CVPR'20 | Hypergraph Attention Networks for Multimodal Learning | [pdf] [repo] |
2020.07 | Alikhani et al. | ACL'20 | Clue: Cross-modal Coherence Modeling for Caption Generation | [pdf] [repo] |
2020.07 | Chauhan et al. | ACL'20 | Sentiment and Emotion help Sarcasm? A Multi-task Learning Framework for Multi-Modal Sarcasm, Sentiment and Emotion Analysis | [pdf] [repo] |
2020.07 | Lin et al. | ACL'20 | A Recipe for Creating Multimodal Aligned Datasets for Sequential Tasks | [pdf] [repo] [github] |
2020.09 | Luo et al. | arXiv | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | [pdf] [repo] |
2020.11 | Cho et al. | EMNLP'20 | X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers | [pdf] [repo] [web] |
2020.11 | Jin et al. | EMNLP'20 | Dual Low-Rank Multimodal Fusion | [pdf] [repo] |
2020.11 | Khan et al. | EMNLP'20 | MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering | [pdf] [repo] |
2020.11 | Tsai et al. | EMNLP'20 | Multimodal Routing: Improving Local and Global Interpretability of Multimodal Language Analysis | [pdf] [repo] [github] |
2020.11 | Wang et al. | EMNLP'20 | MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding | [pdf] [repo] |
2021.02 | Yu et al. | AAAI'21 | Learning Modality-Specific Representations with Self-Supervised Multi-Task Learning for Multimodal Sentiment Analysis | [pdf] [repo] [github] |
2021.04 | Sahu et al. | EACL'21 | Adaptive Fusion Techniques for Multimodal Data | [pdf] [repo] [github] |
Text-Only NLP & Machine Translation

Year | Authors | Conf. | Title | Links |
---|---|---|---|---|
2016.06 | Yang et al. | NAACL-HLT'16 | Hierarchical Attention Networks for Document Classification | [pdf] [repo] |
2016.06 | Zoph et al. | arXiv | Multi-Source Neural Translation | [pdf] [repo] |
2017.12 | Vaswani et al. | NIPS'17 | Attention Is All You Need | [pdf] [repo] [github] |
2017.12 | Xia et al. | NIPS'17 | Deliberation Networks: Sequence Generation Beyond One-Pass Decoding | [pdf] [repo] [github] |
2018.04 | Yang et al. | NAACL-HLT'18 | Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets | [pdf] [repo] |
2018.09 | Wu et al. | NAACL-HLT'18 | Adversarial Neural Machine Translation | [pdf] [repo] |
2018.10 | Miculicich et al. | EMNLP'18 | Document-Level Neural Machine Translation with Hierarchical Attention Networks | [pdf] [repo] |
2019.05 | Devlin et al. | arXiv | BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | [pdf] [repo] [github] |
2019.05 | Zhou et al. | arXiv | Synchronous Bidirectional Neural Machine Translation | [pdf] [repo] [github] |
2019.06 | Yang et al. | arXiv | XLNet: Generalized Autoregressive Pretraining for Language Understanding | [pdf] [repo] [github] |
2019.07 | Dai et al. | ACL'19 | Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | [pdf] [repo] [github] |
2019.07 | Liu et al. | ACL'19 | Hierarchical Transformers for Multi-Document Summarization | [pdf] [repo] [github] |
2019.07 | Pourdamghani et al. | ACL'19 | Translating Translationese: A Two-Step Approach to Unsupervised Machine Translation | [pdf] [repo] |
Datasets

Dataset | Authors | Paper | Links |
---|---|---|---|
Flickr30K | Young et al. | From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions | [pdf] [repo] [web] |
Flickr30K Entities | Plummer et al. | Flickr30K Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | [pdf] [repo] [web] [github] |
Multi30K | Elliott et al. | Multi30K: Multilingual English-German Image Descriptions | [pdf] [repo] [github] |
IAPR-TC12 | Grubinger et al. | The IAPR TC-12 Benchmark: A New Evaluation Resource for Visual Information Systems | [pdf] [repo] [web] |
VATEX | Wang et al. | VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research | [pdf] [repo] [web] |
Metrics

Metric | Authors | Paper | Links |
---|---|---|---|
BLEU | Papineni et al. | BLEU: a Method for Automatic Evaluation of Machine Translation | [pdf] [repo] |
METEOR | Banerjee et al. | METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments | [pdf] [repo] [web] |
METEOR 1.5 | Denkowski et al. | METEOR Universal: Language Specific Translation Evaluation for Any Target Language | [pdf] [repo] [web] |
TER | Snover et al. | A Study of Translation Edit Rate with Targeted Human Annotation | [pdf] [repo] |
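To make the metrics above concrete, here is a minimal, self-contained sketch of sentence-level BLEU (geometric mean of modified n-gram precisions with add-one smoothing and a brevity penalty) and a simplified TER computed as word-level edit distance over reference length. These are illustrative only, not the reference implementations: real TER (Snover et al.) also allows block shifts at cost 1, and for actual evaluation the official scripts (or sacreBLEU) should be used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: brevity penalty times the geometric mean
    of modified n-gram precisions (add-one smoothed)."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        ref_counts = Counter(ngrams(reference, n))
        # clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(sum(hyp_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    if len(hypothesis) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(hypothesis), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

def ter_no_shifts(reference, hypothesis):
    """Simplified TER: word-level edit distance / reference length.
    (Full TER additionally allows block shifts at cost 1.)"""
    m, n = len(reference), len(hypothesis)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n] / max(m, 1)

ref = "a man is riding a bicycle".split()
hyp = "a man rides a bicycle".split()
print("BLEU:", round(bleu(ref, hyp), 3))
print("TER (no shifts):", round(ter_no_shifts(ref, hyp), 3))
```

Both functions score a single hypothesis against a single reference; corpus-level BLEU instead accumulates n-gram counts over all sentences before taking the geometric mean.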
Tutorials & Talks

Year | Authors | Title | Links |
---|---|---|---|
2016 | Elliott et al. | Multimodal Learning and Reasoning | [pdf] [repo] |
2017 | Lucia Specia | Multimodal Machine Translation | [pdf] [repo] |
2018 | Loic Barrault | Introduction to Multimodal Machine Translation | [pdf] [repo] |
2018 | Mirella Lapata | Understanding Visual Scenes | [pdf] [repo] |