A Keras version of LRFinder, borrowed from the fast.ai Deep Learning course.

Preface

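The learning rate (lr) is one of the hardest global hyperparameters to tune in a neural network: if it is set too high, the loss oscillates and training struggles to converge; if it is set too low, training takes much longer. If every candidate learning rate has to be evaluated with a full training run, then n runs are needed to get results for n learning rates, which makes choosing a learning rate very expensive.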

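There are several ways to choose a good starting point for the learning rate. A simple approach is to try a few different values and see which one gives the best loss without sacrificing training speed. We can start with a large value such as 0.1, then try exponentially lower values: 0.01, 0.001, and so on. When we start training with a large learning rate, the loss does not improve and may even grow during the first few iterations. When training with a smaller learning rate, at some point the value of the loss function starts to decrease within the first few iterations. This learning rate is the maximum we can use; any higher value will not let training converge. Even this value is too high: it is not good enough for training over multiple epochs, because over time the network needs more fine-grained weight updates. A reasonable learning rate to start training with is therefore probably 1-2 orders of magnitude lower.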
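The trial-and-error search described above can be sketched on a toy problem. This is only an illustration under stated assumptions: the loss is a hypothetical one-parameter quadratic f(w) = 0.5 * curvature * w**2 rather than a real network, and `loss_decreases` and `largest_workable_lr` are made-up helper names.

```python
def loss_decreases(lr, curvature=50.0, steps=5):
    """Run a few gradient steps on the toy loss and report whether it went down."""
    w = 1.0
    first_loss = 0.5 * curvature * w * w
    for _ in range(steps):
        w -= lr * curvature * w  # gradient of 0.5 * c * w^2 is c * w
    return 0.5 * curvature * w * w < first_loss


def largest_workable_lr(candidates):
    """Try candidates from largest to smallest; return the first one that trains."""
    for lr in sorted(candidates, reverse=True):
        if loss_decreases(lr):
            return lr
    return None


max_lr = largest_workable_lr([0.1, 0.01, 0.001, 0.0001])
# Rule of thumb from the text: start 1-2 orders of magnitude below the maximum.
start_range = (max_lr / 100, max_lr / 10)
print(max_lr, start_range)
```

On a real model each call to `loss_decreases` would be a short training run, which is exactly the cost the rest of this post tries to avoid.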

The figure below illustrates why tuning the learning rate is difficult:

Wouldn't it be great if a good lr could be determined within a single training epoch?

A smarter way

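In Section 3.3 of the 2015 paper Cyclical Learning Rates for Training Neural Networks, Leslie N. Smith describes a powerful technique for choosing a range of learning rates for a neural network.
The trick is to start training a network with a low learning rate and increase the learning rate exponentially with every batch.
As shown in the figure below: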

Record the learning rate and the training loss for every batch. Then plot the loss against the learning rate. It typically looks like this:

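At first, with low learning rates, the loss improves slowly; then training accelerates until the learning rate becomes too large and the loss goes up: the training process diverges.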
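The behaviour described above (slow improvement, acceleration, then divergence) can be reproduced without a neural network. The sketch below is a toy illustration, assuming plain gradient descent on a hypothetical one-parameter loss f(w) = 0.5 * curvature * w**2; the function name `lr_range_test` and its defaults are made up for this example.

```python
import math


def lr_range_test(start_lr=1e-4, end_lr=10.0, num_batches=100, curvature=3.0):
    """Toy learning rate range test on f(w) = 0.5 * curvature * w**2."""
    # Constant factor that takes start_lr to end_lr in num_batches multiplications.
    lr_mult = (end_lr / start_lr) ** (1.0 / num_batches)
    w, lr = 1.0, start_lr
    lrs, losses = [], []
    best_loss = float("inf")
    for _ in range(num_batches):
        # One "batch": a single gradient step on the toy loss.
        w -= lr * curvature * w
        loss = 0.5 * curvature * w * w
        lrs.append(lr)
        losses.append(loss)
        # Stop once the loss is NaN or has grown well past the best value seen.
        if math.isnan(loss) or loss > 4 * best_loss:
            break
        best_loss = min(best_loss, loss)
        # Increase the learning rate exponentially for the next batch.
        lr *= lr_mult
    return lrs, losses


lrs, losses = lr_range_test()
print(len(lrs), lrs[-1], min(losses), losses[-1])
```

For this toy loss, gradient descent is only stable while lr < 2 / curvature, so the recorded loss keeps falling until the learning rate crosses that threshold and then blows up, at which point the loop stops early.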

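We need to pick the point on the graph where the loss decreases fastest. In this example, the loss function decreases quickly when the learning rate is between 0.001 and 0.01.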

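Another way to look at these numbers is to compute the rate of change of the loss (the derivative of the loss function with respect to the iteration number), then plot the rate of change on the y-axis and the learning rate on the x-axis.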

It looks too noisy, so we smooth it with a simple moving average.

This looks better. On this graph we need to find the minimum. It is close to lr=0.01.
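Reading the minimum off the plot can also be done numerically. The sketch below is a minimal illustration: the difference over a window of `sma` batches follows the same formula as the `plot_loss_change` method in the implementation, while the `steepest_lr` helper and the five hand-written data points are hypothetical.

```python
def loss_change(losses, sma=1):
    """Rate of loss change per batch, averaged over a window of `sma` batches."""
    assert sma >= 1
    derivatives = [0.0] * sma  # no estimate for the first `sma` batches
    for i in range(sma, len(losses)):
        derivatives.append((losses[i] - losses[i - sma]) / sma)
    return derivatives


def steepest_lr(lrs, losses, sma=1, n_skip_beginning=0):
    """Learning rate at which the smoothed loss falls fastest."""
    derivatives = loss_change(losses, sma)
    idx = min(range(n_skip_beginning, len(derivatives)), key=lambda i: derivatives[i])
    return lrs[idx]


# Hand-written example values, not measurements from a real run.
example_lrs = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
example_losses = [2.0, 1.9, 1.2, 1.0, 3.0]
print(steepest_lr(example_lrs, example_losses))
```

A larger `sma` trades responsiveness for smoothness; with real, noisy batch losses a window of one batch is usually too jumpy to read.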

Keras implementation

from matplotlib import pyplot as plt
import math
from keras.callbacks import LambdaCallback
import keras.backend as K


class LRFinder:
    """
    Plots the change of the loss function of a Keras model when the learning rate is exponentially increasing.
    See for details:
    https://towardsdatascience.com/estimating-optimal-learning-rate-for-a-deep-neural-network-ce32f2556ce0
    """

    def __init__(self, model):
        self.model = model
        self.losses = []
        self.lrs = []
        self.best_loss = 1e9

    def on_batch_end(self, batch, logs):
        # Log the learning rate
        lr = K.get_value(self.model.optimizer.lr)
        self.lrs.append(lr)

        # Log the loss
        loss = logs['loss']
        self.losses.append(loss)

        # Check whether the loss got too large or NaN
        if math.isnan(loss) or loss > self.best_loss * 4:
            self.model.stop_training = True
            return

        if loss < self.best_loss:
            self.best_loss = loss

        # Increase the learning rate for the next batch
        lr *= self.lr_mult
        K.set_value(self.model.optimizer.lr, lr)

    def find(self, x_train, y_train, start_lr, end_lr, batch_size=64, epochs=1):
        num_batches = epochs * x_train.shape[0] / batch_size
        self.lr_mult = (float(end_lr) / float(start_lr)) ** (float(1) / float(num_batches))

        # Save weights into a file
        self.model.save_weights('tmp.h5')

        # Remember the original learning rate
        original_lr = K.get_value(self.model.optimizer.lr)

        # Set the initial learning rate
        K.set_value(self.model.optimizer.lr, start_lr)

        callback = LambdaCallback(on_batch_end=lambda batch, logs: self.on_batch_end(batch, logs))

        self.model.fit(x_train, y_train,
                       batch_size=batch_size, epochs=epochs,
                       callbacks=[callback])

        # Restore the weights to the state before model fitting
        self.model.load_weights('tmp.h5')

        # Restore the original learning rate
        K.set_value(self.model.optimizer.lr, original_lr)

    def plot_loss(self, n_skip_beginning=10, n_skip_end=5):
        """
        Plots the loss.
        Parameters:
            n_skip_beginning - number of batches to skip on the left.
            n_skip_end - number of batches to skip on the right.
        """
        plt.ylabel("loss")
        plt.xlabel("learning rate (log scale)")
        plt.plot(self.lrs[n_skip_beginning:-n_skip_end], self.losses[n_skip_beginning:-n_skip_end])
        plt.xscale('log')

    def plot_loss_change(self, sma=1, n_skip_beginning=10, n_skip_end=5, y_lim=(-0.01, 0.01)):
        """
        Plots rate of change of the loss function.
        Parameters:
            sma - number of batches for simple moving average to smooth out the curve.
            n_skip_beginning - number of batches to skip on the left.
            n_skip_end - number of batches to skip on the right.
            y_lim - limits for the y axis.
        """
        assert sma >= 1
        derivatives = [0] * sma
        for i in range(sma, len(self.lrs)):
            derivative = (self.losses[i] - self.losses[i - sma]) / sma
            derivatives.append(derivative)

        plt.ylabel("rate of loss change")
        plt.xlabel("learning rate (log scale)")
        plt.plot(self.lrs[n_skip_beginning:-n_skip_end], derivatives[n_skip_beginning:-n_skip_end])
        plt.xscale('log')
        plt.ylim(y_lim)

The find function can be modified to work with fit_generator.

    def find(self, aug_gen, start_lr, end_lr, batch_size=600, epochs=1, num_train=10000):
        # Use a whole number of steps per epoch, and derive the total
        # batch count from it so the multiplier matches the batches actually run
        steps_per_epoch = num_train // batch_size
        num_batches = epochs * steps_per_epoch
        self.lr_mult = (float(end_lr) / float(start_lr)) ** (float(1) / float(num_batches))

        # Save weights into a file
        self.model.save_weights('tmp.h5')

        # Remember the original learning rate
        original_lr = K.get_value(self.model.optimizer.lr)

        # Set the initial learning rate
        K.set_value(self.model.optimizer.lr, start_lr)

        callback = LambdaCallback(on_batch_end=lambda batch, logs: self.on_batch_end(batch, logs))

        self.model.fit_generator(aug_gen,
                                 epochs=epochs,
                                 steps_per_epoch=steps_per_epoch,
                                 callbacks=[callback])

        # Restore the weights to the state before model fitting
        self.model.load_weights('tmp.h5')

        # Restore the original learning rate
        K.set_value(self.model.optimizer.lr, original_lr)
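The arithmetic behind find is easy to check in isolation. The sketch below is only an illustration: `schedule_params` is a hypothetical helper, and it assumes the number of steps per epoch is rounded down to a whole number of batches. It uses the same defaults as the generator variant (10000 samples, batch size 600).

```python
def schedule_params(num_train, batch_size, epochs, start_lr, end_lr):
    """Return (steps_per_epoch, num_batches, lr_mult) for an exponential sweep."""
    steps_per_epoch = num_train // batch_size  # whole batches only
    num_batches = epochs * steps_per_epoch
    # Multiplying start_lr by lr_mult num_batches times lands on end_lr.
    lr_mult = (float(end_lr) / float(start_lr)) ** (1.0 / num_batches)
    return steps_per_epoch, num_batches, lr_mult


steps, total, mult = schedule_params(10000, 600, 1, 1e-4, 1.0)
print(steps, total, mult, 1e-4 * mult ** total)
```

With only 16 batches the learning rate jumps by a factor of about 1.78 per step, which gives a coarse curve; more epochs or a smaller batch size yield a finer sweep.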

Code walkthrough

The code mainly relies on a Keras callback, LambdaCallback.
Function details:

keras.callbacks.LambdaCallback(on_epoch_begin=None, on_epoch_end=None, on_batch_begin=None, on_batch_end=None, on_train_begin=None, on_train_end=None)

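A callback for creating simple, custom callbacks on the fly.

This callback is constructed from anonymous functions that are called at the appropriate time. Note that the callbacks expect positional arguments, as follows:

on_epoch_begin and on_epoch_end expect two positional arguments: epoch, logs
on_batch_begin and on_batch_end expect two positional arguments: batch, logs
on_train_begin and on_train_end expect one positional argument: logs

Arguments

on_epoch_begin: called at the beginning of every epoch.
on_epoch_end: called at the end of every epoch.
on_batch_begin: called at the beginning of every batch.
on_batch_end: called at the end of every batch.
on_train_begin: called at the beginning of model training.
on_train_end: called at the end of model training.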

Example:

# Print the batch number at the beginning of every batch.
batch_print_callback = LambdaCallback(on_batch_begin=lambda batch, logs: print(batch))
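To make the positional-argument convention concrete, here is a small stand-in written in plain Python. It is not Keras code: `MiniLambdaCallback` and `toy_fit` are hypothetical names that only mimic how a training loop would invoke the batch hooks with `(batch, logs)`, and the loss values are made up.

```python
class MiniLambdaCallback:
    """Hypothetical stand-in for the batch hooks of a lambda-style callback."""

    def __init__(self, on_batch_begin=None, on_batch_end=None):
        def noop(batch, logs):
            return None

        self.on_batch_begin = on_batch_begin or noop
        self.on_batch_end = on_batch_end or noop


def toy_fit(num_batches, callback):
    """Fake training loop that calls the hooks with two positional arguments."""
    for batch in range(num_batches):
        callback.on_batch_begin(batch, {})
        logs = {"loss": 1.0 / (batch + 1)}  # made-up loss for illustration
        callback.on_batch_end(batch, logs)


seen = []
recorder = MiniLambdaCallback(
    on_batch_end=lambda batch, logs: seen.append((batch, logs["loss"]))
)
toy_fit(3, recorder)
print(seen)
```

LRFinder uses the same pattern: its on_batch_end method is wrapped in a lambda so that it receives the batch index and the logs dictionary after every batch.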

Below is an experiment I ran on the Kaggle Histopathologic Cancer Detection competition:

Code reference: https://github.com/surmenok/keras_lr_finder
Blog reference: https://towardsdatascience.com/estimating-optimal-learning-rate-for-a-deep-neural-network-ce32f2556ce0
