Exclusive | BERT Model Compression Based on Knowledge Distillation
Authors: Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu
This article introduces the "Patient Knowledge Distillation" model.
Reply "191010" to the 数据派THU official account to obtain a link to the paper.
[Figure 1]
[Figure 2]
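The core idea of Patient Knowledge Distillation is to combine Hinton-style soft-label distillation with a "patient" loss that matches the student's intermediate hidden states to selected teacher layers, rather than learning from the final logits alone. As a rough illustration only (this is not the authors' implementation; the function names and toy inputs below are invented for the sketch), the two loss terms can be written in plain Python:

```python
import math

def softmax(logits, T=1.0):
    """Softmax with temperature T; higher T produces softer distributions."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, true_label, T=2.0, alpha=0.5):
    """Hinton-style distillation: soft-target term plus hard-label cross-entropy."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # Cross-entropy against the softened teacher distribution, scaled by T^2
    # so gradient magnitudes stay comparable across temperatures.
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s)) * (T * T)
    # Standard cross-entropy on the ground-truth label (T = 1).
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * soft + (1 - alpha) * hard

def patient_loss(teacher_states, student_states):
    """Patient term: MSE between length-normalized hidden vectors
    (e.g. [CLS] states) from matched teacher/student layers."""
    total = 0.0
    for h_t, h_s in zip(teacher_states, student_states):
        n_t = math.sqrt(sum(x * x for x in h_t))
        n_s = math.sqrt(sum(x * x for x in h_s))
        total += sum((a / n_t - b / n_s) ** 2 for a, b in zip(h_t, h_s))
    return total / len(teacher_states)
```

In the paper's setup the student is "patient" in the sense that it learns from several of the teacher's intermediate layers (e.g. every k-th layer, or the last k layers), not just the output distribution; the sketch above only shows the shape of the two loss terms, with layer selection and weighting left out.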
References:
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI Blog 1.8 (2019).
Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
Yang, Zhilin, et al. "XLNet: Generalized autoregressive pretraining for language understanding." arXiv preprint arXiv:1906.08237 (2019).
Liu, Yinhan, et al. "RoBERTa: A robustly optimized BERT pretraining approach." arXiv preprint arXiv:1907.11692 (2019).
Hinton, Geoffrey, Oriol Vinyals, and Jeff Dean. "Distilling the knowledge in a neural network." arXiv preprint arXiv:1503.02531 (2015).
Siqi Sun is a Research SDE at Microsoft. He currently works on commonsense reasoning and knowledge-graph-related projects. Prior to joining Microsoft, he was a PhD student in computer science at TTI-Chicago, and before that an undergraduate in the School of Mathematics at Fudan University.
Yu Cheng is a senior researcher at Microsoft. His research covers deep learning in general, with specific interests in model compression, deep generative models, and adversarial learning. He is also interested in solving real-world problems in computer vision and natural language processing. Yu received his Ph.D. from Northwestern University in 2015 and his bachelor's degree from Tsinghua University in 2010. Before joining Microsoft, he spent three years as a Research Staff Member at IBM Research / MIT-IBM Watson AI Lab.
Zhe Gan is a senior researcher at Microsoft, working primarily on generative models, visual QA/dialog, machine reading comprehension (MRC), and natural language generation (NLG). He also has broad interests in various machine learning and NLP topics. Zhe received his PhD from Duke University in Spring 2018; before that, he received his Master's and Bachelor's degrees from Peking University in 2013 and 2010, respectively.
Jingjing (JJ) Liu is a Principal Research Manager at Microsoft, leading a research team in NLP and computer vision. Her current research interests include machine reading comprehension, commonsense reasoning, visual QA/dialog, and text-to-image generation. She received her PhD in Computer Science from MIT EECS in 2011 and also holds an MBA from the Judge Business School at the University of Cambridge. Before joining MSR, Dr. Liu was Director of Product at Mobvoi Inc. and a Research Scientist at MIT CSAIL.