PTMs

Contributed by Jishun Zhao.

Resources on pre-trained models, plus a supplementary introduction to the field. Each paper comes with a 1-3 sentence note explaining why it was selected and what its highlights are. Updated on an ongoing basis.

Paper list

Surveys

  1. Paper: Pre-trained Models for Natural Language Processing: A Survey
    Link: https://arxiv.org/pdf/2003.08271.pdf
    A comprehensive survey of pre-trained models (PTMs) for NLP, published by Xipeng Qiu and colleagues at Fudan University. In 25 pages with 205 references, it covers everything from background knowledge to representative PTMs, applications, and open research challenges, making it an excellent guide to the pre-trained language model literature.

Must-read (10 papers)

  1. context2vec: Learning Generic Context Embedding with Bidirectional LSTM. Oren Melamud, Jacob Goldberger, Ido Dagan. CoNLL 2016. [pdf] [project] (context2vec).

    Proposes an unsupervised model that uses a bidirectional LSTM to efficiently learn generic sentence-context representations from a large corpus.

  2. Attention Is All You Need. Ashish Vaswani, Noam Shazeer, et al. NeurIPS 2017. (Transformer).

    The strongest feature extractor of its day, surpassing RNN-based models; the Transformer is a key building block of BERT (a minimal attention sketch follows this list).

  3. Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. NAACL 2018. [pdf] [project] (ELMo).

    A dynamic (contextual) word-embedding model that resolves polysemy.

  4. Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Preprint. [pdf] [project] (GPT).

    Explores a semi-supervised approach to natural language understanding tasks that combines unsupervised pre-training with supervised fine-tuning.

  5. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. NAACL 2019. [pdf] [code & model].

    An epoch-making pre-trained model.

  6. ERNIE: Enhanced Language Representation with Informative Entities. Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ACL 2019. [pdf] [code & model] (ERNIE (Tsinghua)).

    Trains an enhanced language representation model on large-scale corpora combined with knowledge graphs, allowing it to exploit lexical, semantic, and knowledge information simultaneously.

  7. Defending Against Neural Fake News. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. NeurIPS 2019. [pdf] [project] (Grover).

    Discusses several NLP approaches to building robust defenses against neural fake news, including a GPT-2 detector model and Grover (AllenNLP).

  8. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith. ACL 2020. [pdf].

    An ACL 2020 best-paper honorable mention; practical tricks for adapting pretraining and fine-tuning to the target domain and task.

  9. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. ICLR 2020. [pdf].

    A model-compression approach targeting BERT.

  10. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. NeurIPS 2019. [pdf] [code & model]

    A generalized autoregressive pretraining method that overcomes BERT's limitations.

  11. Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Preprint. [pdf] [code] (GPT-2).

    Multi-task pretraining with an extremely large dataset and an extremely large model.
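
The Transformer's core operation (entry 2 above) is scaled dot-product attention. The following is a minimal PyTorch sketch of that single operation, not code from any paper's official release; the function name and toy tensor shapes are illustrative assumptions.

```python
# Minimal sketch of scaled dot-product attention ("Attention Is All You
# Need", entry 2 above). Plain PyTorch; names and shapes are illustrative.
import math

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k); returns the attended values."""
    d_k = q.size(-1)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Positions where mask == 0 get -inf, so softmax gives them ~0 weight.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Toy usage: batch 2, 4 heads, sequence length 8, head dimension 16.
q = k = v = torch.randn(2, 4, 8, 16)
print(scaled_dot_product_attention(q, k, v).shape)  # torch.Size([2, 4, 8, 16])
```

Multi-head attention simply runs several such heads in parallel over learned linear projections of the input and concatenates the results.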

Recent and important (within 2-3 years, 20 papers)

  1. RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Preprint. [pdf] [code & model]
  2. Knowledge Enhanced Contextual Word Representations. Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. EMNLP 2019. [pdf] (KnowBert)
  3. Contrastive Bidirectional Transformer for Temporal Representation Learning. Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Preprint. [pdf] (CBT)
  4. ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. Preprint. [pdf] [code]
  5. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer. ACL 2020. [pdf]
  6. FreeLB: Enhanced Adversarial Training for Language Understanding. Chen Zhu, Yu Cheng, Zhe Gan, Siqi Sun, Thomas Goldstein, Jingjing Liu. ICLR 2020. [pdf]
  7. On the Sentence Embeddings from Pre-trained Language Models. Bohan Li, Hao Zhou, Junxian He, Mingxuan Wang, Yiming Yang, Lei Li. EMNLP 2020. [pdf]
  8. Pre-training via Paraphrasing. Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, Sida Wang, Luke Zettlemoyer. NeurIPS 2020. [pdf]
  9. PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation. Wei Zhu, Xiaofeng Zhou, Keqiang Wang, Xun Luo, Xiepeng Li, Yuan Ni, Guotong Xie. The 18th BioNLP workshop. [pdf]
  10. Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. ICLR 2020. [pdf]
  11. Contrastive Distillation on Intermediate Representations for Language Model Compression. Siqi Sun, Zhe Gan, Yuwei Fang, Yu Cheng, Shuohang Wang, Jingjing Liu. EMNLP 2020. [pdf]
  12. When BERT Plays the Lottery, All Tickets Are Winning. Sai Prasanna, Anna Rogers, Anna Rumshisky. EMNLP 2020. [pdf]
  13. BERT Loses Patience: Fast and Robust Inference with Early Exit. Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei. NeurIPS 2020. [pdf]
  14. Revealing the Dark Secrets of BERT. Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. EMNLP 2019. [pdf]
  15. How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations. Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. CIKM 2019. [pdf]
  16. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. Alex Wang, Kyunghyun Cho. NeuralGen 2019. [pdf] [code]
  17. What Does BERT Look At? An Analysis of BERT's Attention. Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. BlackBoxNLP 2019. [pdf] [code] (see the attention-inspection sketch after this list)
  18. BERT Rediscovers the Classical NLP Pipeline. Ian Tenney, Dipanjan Das, Ellie Pavlick. ACL 2019. [pdf]
  19. Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. EMNLP 2019. [pdf] [code]
  20. What do you learn from context? Probing for sentence structure in contextualized word representations. Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R. Thomas McCoy, Najoung Kim, Benjamin Van Durme, Samuel R. Bowman, Dipanjan Das, and Ellie Pavlick. ICLR 2019. [pdf]
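
Several of the papers above (e.g., entries 14, 17, and 18) analyze what BERT's layers and attention heads learn. As a rough starting point for that kind of probing, here is a sketch that pulls per-layer attention maps out of a pretrained BERT via the Hugging Face transformers library; the checkpoint and example sentence are illustrative choices, not taken from any of the papers.

```python
# Sketch of inspecting BERT's attention maps, in the spirit of the analysis
# papers above (e.g., entry 17). Uses the Hugging Face transformers API.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer (12 for BERT-base), each of
# shape (batch, num_heads, seq_len, seq_len); these are the maps such papers
# visualize and categorize.
print(len(outputs.attentions), outputs.attentions[0].shape)
```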

Latest worth reading (within 1 year, no limit on the number of papers)

  1. Cross-Lingual Ability of Multilingual BERT: An Empirical Study. Karthikeyan K, Zihan Wang, Stephen Mayhew, Dan Roth. ICLR 2020. [pdf]
  2. Finding Universal Grammatical Relations in Multilingual BERT. Ethan A. Chi, John Hewitt, Christopher D. Manning. ACL 2020. [pdf]
  3. Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly. Nora Kassner, Hinrich Schütze. ACL 2020. [pdf]
  4. Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT. Zhiyong Wu, Yun Chen, Ben Kao, Qun Liu. ACL 2020. [pdf]
  5. Birds have four legs?! NumerSense: Probing Numerical Commonsense Knowledge of Pre-trained Language Models. Bill Yuchen Lin, Seyeon Lee, Rahul Khanna and Xiang Ren. EMNLP 2020. [pdf]
  6. Identifying Elements Essential for BERT’s Multilinguality. Philipp Dufter, Hinrich Schütze. EMNLP 2020. [pdf]
  7. AUTOPROMPT: Eliciting Knowledge from Language Models with Automatically Generated Prompts. Taylor Shin, Yasaman Razeghi, Robert L Logan IV, Eric Wallace, Sameer Singh. EMNLP 2020. [pdf]
  8. The Lottery Ticket Hypothesis for Pre-trained BERT Networks. Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Zhangyang Wang, Michael Carbin. NeurIPS 2020. [pdf]
  9. SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao. ACL 2020. [pdf]
  10. Do you have the right scissors? Tailoring Pre-trained Language Models via Monte-Carlo Methods. Ning Miao, Yuxuan Song, Hao Zhou, Lei Li. ACL 2020. [pdf]
  11. ExpBERT: Representation Engineering with Natural Language Explanations. Shikhar Murty, Pang Wei Koh, Percy Liang. ACL 2020. [pdf]
  12. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, Noah A. Smith. ACL 2020. [pdf] (see the continued-pretraining sketch after this list)
  13. Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less Forgetting. Sanyuan Chen, Yutai Hou, Yiming Cui, Wanxiang Che, Ting Liu, Xiangzhan Yu. EMNLP 2020. [pdf]
  14. Masking as an Efficient Alternative to Finetuning for Pretrained Language Models. Mengjie Zhao, Tao Lin, Fei Mi, Martin Jaggi, Hinrich Schütze. EMNLP 2020. [pdf]
  15. CogLTX: Applying BERT to Long Texts. Ming Ding, Chang Zhou, Hongxia Yang, Jie Tang. NeurIPS 2020. [pdf]
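
Entry 12 above ("Don't Stop Pretraining", also entry 8 of the must-read list) advocates continuing the masked-LM objective on in-domain text before task fine-tuning (domain-adaptive pretraining). Below is a minimal sketch of that recipe using the Hugging Face transformers and datasets libraries; the corpus file, checkpoint, and hyperparameters are placeholder assumptions, not values from the paper.

```python
# Sketch of domain-adaptive pretraining (DAPT): keep training BERT's masked-LM
# objective on domain text, then fine-tune on the end task. The file
# "domain_corpus.txt" (one document per line) and all hyperparameters are
# hypothetical placeholders.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Tokenize the raw domain corpus.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# The collator masks 15% of tokens on the fly, reproducing BERT's MLM setup.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-bert", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=dataset["train"],
    data_collator=collator,
)
trainer.train()  # afterwards, fine-tune the adapted checkpoint on the end task
```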

References:

  1. https://github.com/thunlp/PLMpapers
  2. https://my.oschina.net/u/4246997/blog/4480524
