Must-Read Papers on Pre-trained Language Models (PLMs) (with links to paper PDFs, code, and models)
Source: 专知 (Zhuanzhi)
Models:
Deep Contextualized Word Representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee and Luke Zettlemoyer. NAACL 2018.
Paper: https://arxiv.org/pdf/1802.05365.pdf
Project: https://allennlp.org/elmo (ELMo)
Universal Language Model Fine-tuning for Text Classification. Jeremy Howard and Sebastian Ruder. ACL 2018.
Paper: https://www.aclweb.org/anthology/P18-1031
Project: http://nlp.fast.ai/category/classification.html (ULMFiT)
Improving Language Understanding by Generative Pre-Training. Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. Preprint.
Paper: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
Project: https://openai.com/blog/language-unsupervised/ (GPT)
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. NAACL 2019.
Paper: https://arxiv.org/pdf/1810.04805.pdf
Code + models: https://github.com/google-research/bert
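BERT's masked language modeling objective (described in the paper above) selects roughly 15% of input tokens for prediction; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged. A minimal sketch of that corruption step, with a toy token list and vocabulary standing in for BERT's actual WordPiece tokenizer:

```python
import random

def mask_tokens(tokens, vocab, mask_token="[MASK]",
                select_prob=0.15, rng=None):
    """BERT-style MLM corruption: pick ~15% of positions; replace
    80% of picks with [MASK], 10% with a random vocab token, and
    leave 10% unchanged. Returns (corrupted, labels), where labels
    is None at positions the model does not need to predict."""
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < select_prob:
            labels.append(tok)  # model must recover the original token
            r = rng.random()
            if r < 0.8:
                corrupted.append(mask_token)
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))
            else:
                corrupted.append(tok)  # kept as-is, but still predicted
        else:
            labels.append(None)
            corrupted.append(tok)
    return corrupted, labels
```

Keeping 10% of selected tokens unchanged forces the model to produce good representations for every position, since it cannot tell which unmasked tokens will be scored.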
Language Models are Unsupervised Multitask Learners. Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. Preprint.
Paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
Code: https://github.com/openai/gpt-2 (GPT-2)
ERNIE: Enhanced Language Representation with Informative Entities. Zhengyan Zhang, Xu Han, Zhiyuan Liu, Xin Jiang, Maosong Sun and Qun Liu. ACL 2019.
Paper: https://www.aclweb.org/anthology/P19-1139
Code + models: https://github.com/thunlp/ERNIE (ERNIE (Tsinghua))
ERNIE: Enhanced Representation through Knowledge Integration. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Xuyi Chen, Han Zhang, Xin Tian, Danxiang Zhu, Hao Tian and Hua Wu. Preprint.
Paper: https://arxiv.org/pdf/1904.09223.pdf
Code: https://github.com/PaddlePaddle/ERNIE/tree/develop/ERNIE (ERNIE (Baidu))
Defending Against Neural Fake News. Rowan Zellers, Ari Holtzman, Hannah Rashkin, Yonatan Bisk, Ali Farhadi, Franziska Roesner, Yejin Choi. NeurIPS 2019.
Paper: https://arxiv.org/pdf/1905.12616.pdf
Project: https://rowanzellers.com/grover/ (Grover)
Cross-lingual Language Model Pretraining. Guillaume Lample, Alexis Conneau. NeurIPS 2019.
Paper: https://arxiv.org/pdf/1901.07291.pdf
Code + models: https://github.com/facebookresearch/XLM (XLM)
Multi-Task Deep Neural Networks for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. ACL 2019.
Paper: https://www.aclweb.org/anthology/P19-1441
Code + models: https://github.com/namisan/mt-dnn (MT-DNN)
MASS: Masked Sequence to Sequence Pre-training for Language Generation. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. ICML 2019.
Paper: https://arxiv.org/pdf/1905.02450.pdf
Code + models: https://github.com/microsoft/MASS
Unified Language Model Pre-training for Natural Language Understanding and Generation. Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Preprint.
Paper: https://arxiv.org/pdf/1905.03197.pdf (UniLM)
XLNet: Generalized Autoregressive Pretraining for Language Understanding. Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. NeurIPS 2019.
Paper: https://arxiv.org/pdf/1906.08237.pdf
Code + models: https://github.com/zihangdai/xlnet
RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. Preprint.
Paper: https://arxiv.org/pdf/1907.11692.pdf
Code + models: https://github.com/pytorch/fairseq
SpanBERT: Improving Pre-training by Representing and Predicting Spans. Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy. Preprint.
Paper: https://arxiv.org/pdf/1907.10529.pdf
Code + models: https://github.com/facebookresearch/SpanBERT
Knowledge Enhanced Contextual Word Representations. Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith. EMNLP 2019.
Paper: https://arxiv.org/pdf/1909.04164.pdf (KnowBert)
VisualBERT: A Simple and Performant Baseline for Vision and Language. Liunian Harold Li, Mark Yatskar, Da Yin, Cho-Jui Hsieh, Kai-Wei Chang. Preprint.
Paper: https://arxiv.org/pdf/1908.03557.pdf
Code + models: https://github.com/uclanlp/visualbert
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks. Jiasen Lu, Dhruv Batra, Devi Parikh, Stefan Lee. NeurIPS 2019.
Paper: https://arxiv.org/pdf/1908.02265.pdf
Code + models: https://github.com/jiasenlu/vilbert_beta
VideoBERT: A Joint Model for Video and Language Representation Learning. Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid. ICCV 2019.
Paper: https://arxiv.org/pdf/1904.01766.pdf
LXMERT: Learning Cross-Modality Encoder Representations from Transformers. Hao Tan, Mohit Bansal. EMNLP 2019.
Paper: https://arxiv.org/pdf/1908.07490.pdf
Code + models: https://github.com/airsplay/lxmert
VL-BERT: Pre-training of Generic Visual-Linguistic Representations. Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai. Preprint.
Paper: https://arxiv.org/pdf/1908.08530.pdf
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training. Gen Li, Nan Duan, Yuejian Fang, Ming Gong, Daxin Jiang, Ming Zhou. Preprint.
Paper: https://arxiv.org/pdf/1908.06066.pdf
K-BERT: Enabling Language Representation with Knowledge Graph. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Qi Ju, Haotang Deng, Ping Wang. Preprint.
Paper: https://arxiv.org/pdf/1909.07606.pdf
Fusion of Detected Objects in Text for Visual Question Answering. Chris Alberti, Jeffrey Ling, Michael Collins, David Reitter. EMNLP 2019.
Paper: https://arxiv.org/pdf/1908.05054.pdf (B2T2)
Contrastive Bidirectional Transformer for Temporal Representation Learning. Chen Sun, Fabien Baradel, Kevin Murphy, Cordelia Schmid. Preprint.
Paper: https://arxiv.org/pdf/1906.05743.pdf (CBT)
ERNIE 2.0: A Continual Pre-training Framework for Language Understanding. Yu Sun, Shuohuan Wang, Yukun Li, Shikun Feng, Hao Tian, Hua Wu, Haifeng Wang. Preprint.
Paper: https://arxiv.org/pdf/1907.12412v1.pdf
Code: https://github.com/PaddlePaddle/ERNIE/blob/develop/README.md
75 Languages, 1 Model: Parsing Universal Dependencies Universally. Dan Kondratyuk, Milan Straka. EMNLP 2019.
Paper: https://arxiv.org/pdf/1904.02099.pdf
Code + models: https://github.com/hyperparticle/udify (UDify)
Pre-Training with Whole Word Masking for Chinese BERT. Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. Preprint.
Paper: https://arxiv.org/pdf/1906.08101.pdf
Code + models: https://github.com/ymcui/Chinese-BERT-wwm/blob/master/README_EN.md (Chinese-BERT-wwm)
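Whole word masking, the variant studied in the Chinese-BERT-wwm work above, differs from BERT's original token-level masking in that when any WordPiece of a word is selected, all subtokens of that word are masked together. A rough sketch over "##"-style subtokens (the token list and selection rate here are illustrative, not the paper's exact procedure):

```python
import random

def whole_word_mask(subtokens, select_prob=0.15, rng=None):
    """Group '##'-prefixed WordPiece subtokens with the preceding
    token into whole words, then mask entire words at random so no
    word is ever partially masked."""
    rng = rng or random.Random()
    # Build word groups: each group is a list of subtoken indices.
    groups = []
    for i, tok in enumerate(subtokens):
        if tok.startswith("##") and groups:
            groups[-1].append(i)  # continuation piece of previous word
        else:
            groups.append([i])    # start of a new word
    masked = list(subtokens)
    for group in groups:
        if rng.random() < select_prob:
            for i in group:
                masked[i] = "[MASK]"  # mask every piece of the word
    return masked
```

For Chinese, where BERT's tokenizer splits text into single characters, the same idea requires an external word segmenter to decide which characters belong to one word.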
Knowledge Distillation and Model Compression:
TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. Preprint.
Paper: https://arxiv.org/pdf/1909.10351v1.pdf
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. Preprint.
Paper: https://arxiv.org/pdf/1903.12136.pdf
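The distillation papers in this section build on one core idea (from Hinton et al.'s knowledge distillation): train a small student to match the teacher's temperature-softened output distribution rather than only the hard labels. A minimal sketch of the soft-target loss in plain Python, with illustrative logits rather than any specific model's outputs:

```python
import math

def softmax_T(logits, T):
    """Temperature-scaled softmax; higher T flattens the distribution,
    exposing the teacher's relative preferences among wrong classes."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy between the teacher's softened distribution and
    the student's, scaled by T^2 so gradient magnitudes stay
    comparable across temperatures (as in Hinton et al.)."""
    p_t = softmax_T(teacher_logits, T)
    p_s = softmax_T(student_logits, T)
    return -T * T * sum(p * math.log(q) for p, q in zip(p_t, p_s))
```

In practice this term is combined with the ordinary hard-label cross-entropy via a weighting coefficient; the papers above differ mainly in what else (hidden states, attention maps, intermediate layers) the student is asked to imitate.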
Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. EMNLP 2019.
Paper: https://arxiv.org/pdf/1908.09355.pdf
Code: https://github.com/intersun/PKD-for-BERT-Model-Compression
Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. Preprint.
Paper: https://arxiv.org/pdf/1904.09636.pdf
PANLP at MEDIQA 2019: Pre-trained Language Models, Transfer Learning and Knowledge Distillation. Wei Zhu, Xiaofeng Zhou, Keqiang Wang, Xun Luo, Xiepeng Li, Yuan Ni, Guotong Xie. The 18th BioNLP workshop.
Paper: https://www.aclweb.org/anthology/W19-5040
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. Preprint.
Paper: https://arxiv.org/pdf/1904.09482.pdf
Code + models: https://github.com/namisan/mt-dnn
Well-Read Students Learn Better: The Impact of Student Initialization on Knowledge Distillation. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Preprint.
Paper: https://arxiv.org/pdf/1908.08962.pdf
Small and Practical BERT Models for Sequence Labeling. Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. EMNLP 2019.
Paper: https://arxiv.org/pdf/1909.00100.pdf
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Preprint.
Paper: https://arxiv.org/pdf/1909.05840.pdf
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Anonymous authors. ICLR 2020, under review.
Paper: https://openreview.net/pdf?id=H1eA7AEtvS
Analysis:
Revealing the Dark Secrets of BERT. Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. EMNLP 2019.
Paper: https://arxiv.org/abs/1908.08593
How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations. Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. CIKM 2019.
Paper: https://arxiv.org/pdf/1909.04925.pdf
Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. NeurIPS 2019.
Paper: https://arxiv.org/pdf/1905.10650.pdf
Code: https://github.com/pmichel31415/are-16-heads-really-better-than-1
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. Preprint.
Paper: https://arxiv.org/pdf/1907.11932.pdf
Code: https://github.com/jind11/TextFooler
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. Alex Wang, Kyunghyun Cho. NeuralGen 2019.
Paper: https://arxiv.org/pdf/1902.04094.pdf
Code: https://github.com/nyu-dl/bert-gen
Paper: https://www.aclweb.org/anthology/N19-1112
What Does BERT Look At? An Analysis of BERT's Attention. Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning. BlackboxNLP 2019.
Paper: https://arxiv.org/pdf/1906.04341.pdf
Code: https://github.com/clarkkev/attention-analysis
Open Sesame: Getting Inside BERT's Linguistic Knowledge. Yongjie Lin, Yi Chern Tan, Robert Frank. BlackboxNLP 2019.
Paper: https://arxiv.org/pdf/1906.01698.pdf
Code: https://github.com/yongjie-lin/bert-opensesame
Paper: https://arxiv.org/pdf/1906.04284.pdf
Paper: https://arxiv.org/pdf/1906.01539.pdf
Paper: https://www.aclweb.org/anthology/P19-1452
Paper: https://www.aclweb.org/anthology/P19-1493
Paper: https://www.aclweb.org/anthology/P19-1356
Paper: https://arxiv.org/pdf/1904.09077.pdf
Paper: https://arxiv.org/pdf/1909.00512.pdf
Probing Neural Network Comprehension of Natural Language Arguments. Timothy Niven, Hung-Yu Kao. ACL 2019.
Paper: https://www.aclweb.org/anthology/P19-1459
Code: https://github.com/IKMLab/arct2
Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP 2019.
Paper: https://arxiv.org/pdf/1908.07125.pdf
Code: https://github.com/Eric-Wallace/universal-triggers
Paper: https://arxiv.org/pdf/1909.01380.pdf
Paper: https://arxiv.org/pdf/1909.07940.pdf
Paper: https://arxiv.org/pdf/1909.02597.pdf
Code: https://github.com/alexwarstadt/data_generation
Paper: https://arxiv.org/pdf/1908.05620.pdf
Paper: https://arxiv.org/pdf/1906.02715.pdf
Paper: https://arxiv.org/pdf/1908.04211.pdf
Paper: https://arxiv.org/pdf/1908.11775.pdf
Language Models as Knowledge Bases? Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. EMNLP 2019.
Paper: https://arxiv.org/pdf/1909.01066.pdf
Code: https://github.com/facebookresearch/LAMA
Reference:
https://github.com/thunlp/PLMpapers