Transfer Learning in NLP

This is the third part of a series of posts on how NLP modeling approaches have improved. So far we have covered traditional techniques such as Bag of Words and TF-IDF, then moved on to RNNs and LSTMs. This time we'll look at one of the pivotal shifts in how NLP tasks are approached: transfer learning!

The complete code for this tutorial is available in this Kaggle Kernel.

ULMFiT

Transfer learning is still fairly new in NLP, even though it has long been a staple of computer vision! Applying it to language models was proposed by Jeremy Howard and Sebastian Ruder, and it has changed the way we approach text data!

The core idea is two-fold: generative pre-training of a language model, followed by task-specific fine-tuning. This combination was first explored in ULMFiT (Howard & Ruder, 2018), directly motivated by the success of ImageNet pre-training in computer vision. The base model is AWD-LSTM.

A language model does exactly what the name suggests: given the words of a sentence so far, it predicts the next word. The goal is a model that captures the semantics, grammar, and characteristic structure of a language.
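To make "predict the next word" concrete, here is a minimal sketch of the simplest possible language model: a bigram counter that predicts the word most often seen after the current one. This is only an illustration, not what ULMFiT does (ULMFiT uses an AWD-LSTM); the function names and the three-sentence corpus are made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            following[current_word][next_word] += 1
    return following


def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, purely for illustration
corpus = [
    "the movie was great",
    "the movie was boring",
    "the movie was great fun",
]
model = train_bigram_model(corpus)
print(predict_next_word(model, "was"))
```

A neural language model replaces the count table with learned parameters and conditions on the whole preceding sequence rather than on one word, but the training signal is the same: the next word.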

ULMFiT follows three steps to achieve good transfer learning results on downstream text classification tasks:

1. General language model pre-training: on Wikipedia text.
2. Target task language model fine-tuning: ULMFiT proposed two training techniques for stabilizing the fine-tuning process.
3. Target task classifier fine-tuning: the pre-trained LM is augmented with two standard feed-forward layers and a softmax normalization at the end to predict a target label distribution.

Using fast.ai for NLP

fast.ai's motto, "Making neural nets uncool again", tells you a lot about their approach ;) Implementing these models is remarkably simple and intuitive, and the documentation is good enough that you can usually find a solution if you get stuck. For this and a few other reasons that I elaborate on below, I decided to try the fast.ai library, which is built on top of PyTorch, instead of Keras. Despite being used to Keras, I didn't find fast.ai difficult to navigate, and it doesn't take long to learn how to implement more advanced things either!

Besides its simplicity, fast.ai's implementation has some further advantages:

Discriminative fine-tuning is motivated by the fact that different layers of a language model capture different types of information. ULMFiT proposes tuning each layer with its own learning rate, $\{\eta^{1}, \ldots, \eta^{\ell}, \ldots, \eta^{L}\}$, where $\eta^{1}$ is the learning rate of the first layer, $\eta^{\ell}$ is that of the $\ell$-th layer, and there are $L$ layers in total.

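The parameters of the $\ell$-th layer are updated at step $t$ as $\theta^{\ell}_{t} = \theta^{\ell}_{t-1} - \eta^{\ell} \cdot \nabla_{\theta^{\ell}} J(\theta)$, where $\nabla_{\theta^{\ell}} J(\theta)$ is the gradient of the loss function with respect to $\theta^{\ell}$ and $\eta^{\ell}$ is the learning rate of the $\ell$-th layer.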
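A small sketch of how per-layer learning rates could be computed and applied. It assumes the heuristic reported in the ULMFiT paper (pick a rate for the last layer, then divide by 2.6 for each layer below it) and uses plain Python lists in place of real tensors. The function names, parameters and gradient values are all made up for illustration.

```python
def discriminative_learning_rates(last_layer_lr, num_layers, factor=2.6):
    """Learning rates for layers 1..L: the last layer gets `last_layer_lr`,
    and each earlier layer gets the rate of the layer above it divided by `factor`."""
    return [last_layer_lr / factor ** (num_layers - 1 - i) for i in range(num_layers)]


def sgd_step(layer_params, layer_grads, layer_lrs):
    """One plain SGD update in which every layer uses its own learning rate."""
    return [
        [p - lr * g for p, g in zip(params, grads)]
        for params, grads, lr in zip(layer_params, layer_grads, layer_lrs)
    ]


lrs = discriminative_learning_rates(last_layer_lr=0.01, num_layers=3)

# Made-up parameters and gradients: identical for every layer, so that the only
# difference in the update comes from the per-layer learning rate
params = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
grads = [[0.5, -0.5], [0.5, -0.5], [0.5, -0.5]]
new_params = sgd_step(params, grads, lrs)

for layer, (lr, updated) in enumerate(zip(lrs, new_params), start=1):
    print(f"layer {layer}: lr={lr:.6f}, params={updated}")
```

With identical gradients, the last layer moves the most and the first layer the least, which is the intent: general features learned in early layers are disturbed less than task-specific ones.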

Slanted triangular learning rates (STLR) refer to a special learning rate schedule that first linearly increases the learning rate and then linearly decays it. The increase stage is short, so that the model can quickly converge to a region of parameter space suitable for the task, while the decay period is long, allowing for better fine-tuning.

Figure: the learning rate increases until the 200th iteration and then slowly decays. Source: Howard & Ruder (2018), Universal Language Model Fine-tuning for Text Classification.
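The schedule can be sketched in a few lines. The formula and the default values (cut_frac=0.1, ratio=32, max_lr=0.01) follow the ULMFiT paper; the total of 2000 iterations is an assumption chosen here so that the peak lands on iteration 200, and the function name is made up.

```python
import math


def slanted_triangular_lr(t, total_steps, max_lr=0.01, cut_frac=0.1, ratio=32):
    """Learning rate at iteration `t` under a slanted triangular schedule.

    cut_frac: fraction of iterations spent increasing the learning rate
    ratio:    how much smaller the lowest learning rate is than `max_lr`
    """
    cut = math.floor(total_steps * cut_frac)  # iteration at which the rate peaks
    if t < cut:
        p = t / cut  # short linear warm-up
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # long linear decay
    return max_lr * (1 + p * (ratio - 1)) / ratio


total_steps = 2000  # assumed value for this illustration
schedule = [slanted_triangular_lr(t, total_steps) for t in range(total_steps + 1)]
peak_step = max(range(len(schedule)), key=schedule.__getitem__)
print(peak_step, schedule[0], schedule[peak_step], schedule[-1])
```

The rate starts at max_lr / ratio, climbs to max_lr during the first 10% of iterations, and then falls back to max_lr / ratio over the remaining 90%.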

Let's see how well this approach works on our dataset. I would also like to point out that all of these ideas and the code are covered in fast.ai's free official Deep Learning course.

Loading the data!

Data in fast.ai is loaded through data bunches such as TextLMDataBunch. This is very similar to ImageDataGenerator in Keras: you provide the path, labels, etc., and the method prepares the training, validation and test data for the task at hand!

Data Bunch for Language Model

from fastai.text import *  # fastai v1 API; `path` is the folder that contains train.csv
data_lm = TextLMDataBunch.from_csv(path, 'train.csv', text_cols=3, label_cols=4)

Data Bunch for Classification Task

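# Reuse the language model's vocabulary so that its fine-tuned encoder can be loaded into the classifier later
data_clas = TextClasDataBunch.from_csv(path, 'train.csv', vocab=data_lm.train_ds.vocab, bs=32, text_cols=3, label_cols=4)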

As discussed in the steps above, we start with a language model learner, which predicts the next word given a sequence of words. Intuitively, this model tries to learn what the language and its context look like. We then take this model and fine-tune it for our specific task: sentiment classification.

Step 1. Training a Language Model

learn = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
learn.fit_one_cycle(1, 1e-2)

By default, we start from a pre-trained model based on the AWD-LSTM architecture. It is built from plain LSTM units but applies several kinds of dropout, each with its own hyperparameter. The drop_mult argument scales all of these dropout values at once. I've kept it at 0.5; you can set it higher if you find that the model is overfitting.
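As a rough sketch of what a single multiplier over several dropouts means: every dropout probability in the model's configuration is multiplied by the same factor. The dictionary keys and base values below are illustrative placeholders, not a statement of the library's actual defaults, and scale_dropouts is a hypothetical helper rather than a fast.ai function.

```python
def scale_dropouts(base_dropouts, drop_mult):
    """Multiply every dropout probability by the same factor."""
    return {name: p * drop_mult for name, p in base_dropouts.items()}


# Placeholder dropout settings, for illustration only
base_dropouts = {
    "input_p": 0.25,   # dropout on the embedded input
    "hidden_p": 0.15,  # dropout between LSTM layers
    "output_p": 0.10,  # dropout on the final output
    "weight_p": 0.20,  # dropout on the recurrent weights
}

scaled = scale_dropouts(base_dropouts, drop_mult=0.5)
for name, p in scaled.items():
    print(name, round(p, 4))
```

A value below 1 weakens regularization everywhere at once, and a value above 1 strengthens it, so one number is enough to react to underfitting or overfitting.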

Discriminative Fine-Tuning

learn.unfreeze()
learn.fit_one_cycle(3, slice(1e-4, 1e-2))

learn.unfreeze() makes all the layers of AWD-LSTM trainable. The learning rate is then passed as a slice(): with slice(1e-4, 1e-2), the earliest layer group trains at 1e-4, the last layer group trains at 1e-2, and the groups in between get geometrically spaced learning rates.
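The geometric spacing can be reproduced by hand. This is a sketch of the arithmetic only, not fast.ai's own code; the function name is made up and the choice of three layer groups is an assumption for the example.

```python
def geometric_learning_rates(first_lr, last_lr, num_groups):
    """Learning rates from `first_lr` to `last_lr`, each a constant multiple of the previous one."""
    if num_groups == 1:
        return [last_lr]
    step = (last_lr / first_lr) ** (1 / (num_groups - 1))
    return [first_lr * step ** i for i in range(num_groups)]


# Three layer groups is an assumed value for this illustration
rates = geometric_learning_rates(1e-4, 1e-2, num_groups=3)
print([f"{lr:.0e}" for lr in rates])
```

With three groups, each rate is ten times the previous one, so the middle group sits at roughly 1e-3. A linear spacing would instead put it near 5e-3, much closer to the top rate.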

Slanted Triangular Learning Rates

A closely related schedule comes built in with fast.ai's fit_one_cycle() method, which implements the 1cycle policy: the learning rate is first warmed up and then annealed over the rest of training.

Gradual Unfreezing

Though I've not experimented with this here, the idea is pretty simple. At the start, only the last layers of the model are trainable and the earlier layers are kept frozen; as training goes on, we unfreeze the earlier layers one group at a time. I'll cover this in detail in the next post.
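A minimal sketch of such an unfreezing schedule, written in plain Python so that it runs without any deep learning library. The helper name, the number of layer groups, and the "one more group per stage" rule are assumptions made for illustration.

```python
def trainable_groups(num_groups, stage):
    """Which layer groups are trainable at a given stage.

    Stage 0 trains only the last group; every later stage unfreezes one more
    group, working backwards from the output towards the input.
    """
    num_trainable = min(stage + 1, num_groups)
    return [index >= num_groups - num_trainable for index in range(num_groups)]


# Four layer groups is an assumed value for this illustration
for stage in range(4):
    print(stage, trainable_groups(4, stage))
```

In a real training loop, each stage would correspond to one or more epochs, with the flags deciding which parameters receive gradient updates.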

Since we've built a language model, we can use it to predict the next few words for a given input. This tells us whether the model has begun to understand our reviews.

You can see that, with just a simple starting input, the model is able to generate realistic reviews, which suggests we are heading in the right direction.

learn.save(file=Path('language_model'))
learn.save_encoder(Path('language_model_encoder'))

We save the model and its encoder so that the encoder can be loaded later for classification.

Step 2. Classification Task using Language Model as Encoder

learn = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5).to_fp16()
learn.model_dir = Path('/kaggle/working/')
learn.load_encoder('language_model_encoder')

Let's get started with training. I'm running it in a similar manner: training only the last layer group for 1 epoch, then unfreezing the whole network and training for 3 more epochs.

learn.fit_one_cycle(1, 1e-2)
learn.unfreeze()
learn.fit_one_cycle(3, slice(1e-4, 1e-2))

Accuracy: 90%

With this alone (in just 4 epochs), we are at 90% accuracy! That's an amazing result considering how little effort we've put in: with just a few lines of code and about 10 minutes of training, we've broken through the 90% wall.

I hope this helped you get started with NLP and transfer learning. I'll catch you in the fourth post of this series, where we take this up a notch and explore transformers!

Original article: https://medium.com/analytics-vidhya/evolution-of-nlp-part-3-transfer-learning-using-ulmfit-267d0a73421e
