Deep Learning: Hand-Coding a Deep Neural Network (DNN) from Scratch (with numpy)
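Disclaimer
1) This article is for academic exchange only, not for commercial use, so the references for each part are not cited in detail. If any part unintentionally infringes on anyone's rights, please accept my apologies and contact me to have it removed.
2) My knowledge is limited. If you find anything wrong in the text, please point it out so we can improve together. Thank you.
3) This is a first version and may still contain errors; it will keep being corrected and extended. Suggestions are very welcome. If everyone shares a little, together we can each contribute a bit to advancing our country's research.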
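Contents
- Disclaimer
- 0. Preface
- 1. Steps for Building a Neural Network
- 2. Deep Neural Networks
- 3. Initializing Parameters
- 4. Activation Functions
- 5. Forward Propagation
- 6. Computing the Loss
- 7. Backward Propagation
- 8. Updating Parameters
- 9. Wrapping Up the Pipeline (Optional)
- Appendix_1. Data Loading
- Appendix_2. Learning Rate Settings
- References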
0. Preface
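I previously wrote a hand-coding article, "Deep Learning: Hand-Coding Neural Networks (with numpy)". In it I built a perceptron and a neural network with one hidden layer, explained the basic structure of neural networks and how signals propagate through them, and showed how to write a neural network from scratch. However, one reason neural networks and deep learning work so well is that they have many hidden layers, that is, a deep structure. A reader asked me long ago to write a numpy-based DNN. I never got around to it, so today I finally fill that gap.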
1. Steps for Building a Neural Network
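Do you still remember the steps for building a neural network (see "Deep Learning: Hand-Coding Neural Networks (with numpy)")? There are roughly six:
- Build the network
- Initialize the parameters
- Iterate the optimization
- Compute the loss
- Backpropagate
- Update the parameters
Put simply, there are three stages: build the network, initialize the parameters, and run the computation in a loop.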
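- First, decide what network architecture you are going to build (see "CNNs Explained (Packed with Practical Content)"), for example the classic AlexNet or VGGNet;
- Next, initialize the weights w and biases b (see "Deep Learning Introductory Notes (12): Weight Initialization"), for example with Xavier or He initialization;
- Finally, iterate the computation (see "Deep Learning: Hand-Coding Neural Networks (with numpy)"): forward propagation, backward propagation, and so on.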
2. Deep Neural Networks
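Articles I wrote earlier on the theory of deep neural networks:
- Deep Learning Introductory Notes (7): Deep Neural Networks
- Deep Learning Introductory Notes (8): The Principles of Deep Networks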
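Readers who need a refresher can read those first so the rest is easy to follow. Briefly: the lower layers extract features (in a CNN, via convolution and pooling), the features pass through neuron activations (and optionally dropout) to complete forward propagation, the loss function is computed, and backpropagation sends the gradients back to adjust the parameters. This repeats iteratively to optimize the network.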
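Take simple handwritten digit recognition as an example. The whole process (figure not reproduced) works like this:
- It is end-to-end, with no hand-crafted intermediate steps. The pixel values are the features: they are flattened into a vector and passed through the deep neural network, which outputs a probability for each class (compared against one-hot encoded labels).
- The network consists of an input layer, hidden layers and an output layer. Forward propagation computes each layer's output (a linear transform plus activation in a DNN; convolution and pooling in a CNN) and passes it to the next layer. Backward propagation goes through the same layers in reverse order.
- Given an image, its pixels are fed into a trained neural network (training is typically GPU-accelerated), and the output is the classification result.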
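To make the first bullet concrete, here is a minimal sketch. It assumes MNIST-style 28×28 grayscale images with integer labels 0-9; the random arrays below are hypothetical stand-ins for real data. Each image is flattened into a column vector and each label is turned into a one-hot column, matching the (features, examples) layout used by the code in this article. The helper names `images_to_columns` and `one_hot` are illustrative.

```python
import numpy as np

def images_to_columns(images):
    """Flatten (m, H, W) uint8 images into an (H*W, m) float matrix scaled to [0, 1]."""
    m = images.shape[0]
    return images.reshape(m, -1).T.astype(np.float64) / 255.0

def one_hot(labels, num_classes=10):
    """Turn integer labels of shape (m,) into a (num_classes, m) one-hot matrix."""
    Y = np.zeros((num_classes, labels.shape[0]))
    Y[labels, np.arange(labels.shape[0])] = 1.0
    return Y

# Hypothetical stand-in data: 5 random "images" and their labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)
labels = np.array([3, 0, 9, 3, 7])

X = images_to_columns(images)  # shape (784, 5): one example per column
Y = one_hot(labels)            # shape (10, 5): one one-hot column per example
print(X.shape, Y.shape)
```

Note that the network built in this article ends in a single sigmoid unit (binary classification). For ten digit classes you would instead use a 10-unit softmax output trained against a one-hot `Y` like the one above.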
3. Initializing Parameters
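The size of every layer, from the input layer to the output layer, is passed in as a list `layer_dims`. This way you do not have to write out each layer by hand no matter how many there are, which is more convenient and flexible: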
```python
import numpy as np

def initialize_parameters_deep(layer_dims):
    # Fix the random seed so results are reproducible
    np.random.seed(3)
    parameters = {}
    # Number of layers, including the input layer
    L = len(layer_dims)
    for l in range(1, L):
        # Small random weights break symmetry; biases start at zero
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        assert parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1])
        assert parameters['b' + str(l)].shape == (layer_dims[l], 1)
    return parameters
```
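The code above initializes the weights W with small random numbers and the biases b with zeros. If you want a different initialization scheme, here are numpy implementations of several common ones: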
- Truncated normal initialization
```python
def truncated_normal(mean, std, out_shape):
    """
    Parameters
    ----------
    mean : float or array_like of floats
        The mean/center of the distribution
    std : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.
    out_shape : int or tuple of ints
        Output shape. If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.

    Returns
    -------
    samples : :py:class:`ndarray <numpy.ndarray>` of shape `out_shape`
        Samples from the truncated normal distribution parameterized by `mean`
        and `std`.
    """
    samples = np.random.normal(loc=mean, scale=std, size=out_shape)
    # Redraw every sample that lies more than 2 standard deviations from the mean
    reject = np.logical_or(samples >= mean + 2 * std, samples <= mean - 2 * std)
    while any(reject.flatten()):
        resamples = np.random.normal(loc=mean, scale=std, size=reject.sum())
        samples[reject] = resamples
        reject = np.logical_or(samples >= mean + 2 * std, samples <= mean - 2 * std)
    return samples
```
- He normal initialization
```python
def he_normal(weight_shape):
    """
    Parameters
    ----------
    weight_shape : tuple
        The dimensions of the weight matrix/volume.

    Returns
    -------
    W : :py:class:`ndarray <numpy.ndarray>` of shape `weight_shape`
        The initialized weights.
    """
    # calc_fan is a numpy-ml helper (not shown in this article) returning (fan_in, fan_out)
    fan_in, fan_out = calc_fan(weight_shape)
    std = np.sqrt(2 / fan_in)
    return truncated_normal(0, std, weight_shape)
```
- Glorot normal initialization (Xavier)
```python
def glorot_normal(weight_shape, gain=1.0):
    """
    Parameters
    ----------
    weight_shape : tuple
        The dimensions of the weight matrix/volume.

    Returns
    -------
    W : :py:class:`ndarray <numpy.ndarray>` of shape `weight_shape`
        The initialized weights.
    """
    # calc_fan is a numpy-ml helper (not shown in this article) returning (fan_in, fan_out)
    fan_in, fan_out = calc_fan(weight_shape)
    std = gain * np.sqrt(2 / (fan_in + fan_out))
    return truncated_normal(0, std, weight_shape)
```
You can call any of these three initializers yourself; the rest of this article sticks with the simplest scheme.
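`he_normal` and `glorot_normal` call a helper `calc_fan` that is not defined in this article; it comes from numpy-ml. The sketch below is an assumption modeled on that library's convention:
- a 2-D weight shape is read as (fan_in, fan_out);
- a convolution kernel shape is read as (k_1, ..., in_ch, out_ch), where the leading dimensions form the receptive field.

Because `initialize_parameters_deep` stores W as (layer_dims[l], layer_dims[l-1]), that is (fan_out, fan_in), you would pass the reversed shape to these initializers and transpose the result.

```python
import numpy as np

def calc_fan(weight_shape):
    """Return (fan_in, fan_out) for a dense or convolutional weight shape.

    Assumed convention: dense weights are (fan_in, fan_out); conv kernels are
    (k_1, ..., k_d, in_ch, out_ch).
    """
    if len(weight_shape) == 2:
        fan_in, fan_out = weight_shape
    elif len(weight_shape) in (3, 4):
        in_ch, out_ch = weight_shape[-2:]
        receptive_field = int(np.prod(weight_shape[:-2]))
        fan_in, fan_out = in_ch * receptive_field, out_ch * receptive_field
    else:
        raise ValueError("Unrecognized weight dimension: {}".format(weight_shape))
    return fan_in, fan_out

# Dense layer from 784 inputs to 128 units
print(calc_fan((784, 128)))      # (784, 128)
# 3x3 conv kernel, 16 input channels, 32 output channels
print(calc_fan((3, 3, 16, 32)))  # (144, 288)
```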
A small example
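Suppose a deep neural network has an input layer of size 3, a hidden layer of size 3 and an output layer of size 3. Call the initialization function with the argument [3,3,3] and print the results: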
```python
parameters = initialize_parameters_deep([3,3,3])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
```
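The printed values depend on the random seed, but the shapes do not: W1 and W2 are (3, 3), and b1 and b2 are (3, 1).

As a hedged variation, here is a hypothetical `initialize_parameters_he`. It keeps the same (n_out, n_in) layout and replaces the fixed 0.01 scale with the He standard deviation sqrt(2 / n_in), which was designed for ReLU layers like the hidden layers used here. It uses NumPy's `default_rng` rather than the global seed.

```python
import numpy as np

def initialize_parameters_he(layer_dims, seed=3):
    """Like initialize_parameters_deep, but with He-scaled Gaussian weights."""
    rng = np.random.default_rng(seed)
    parameters = {}
    for l in range(1, len(layer_dims)):
        n_in, n_out = layer_dims[l - 1], layer_dims[l]
        # std = sqrt(2 / fan_in) keeps ReLU activations from shrinking layer by layer
        parameters["W" + str(l)] = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        parameters["b" + str(l)] = np.zeros((n_out, 1))
    return parameters

params = initialize_parameters_he([3, 3, 3])
for name, value in params.items():
    print(name, value.shape)
```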
4. Activation Functions
Besides the commonly used sigmoid, the most widely used activation function is ReLU; others include Leaky ReLU, ELU, SELU, Softplus, and so on.
Wikipedia lists many more activation functions (table not reproduced).
Here are numpy implementations of several common activation functions:
```python
from abc import ABC, abstractmethod

import numpy as np


class ActivationBase(ABC):
    def __init__(self, **kwargs):
        super().__init__()

    def __call__(self, z):
        # Treat 1-D input as a single row
        if z.ndim == 1:
            z = z.reshape(1, -1)
        return self.fn(z)

    @abstractmethod
    def fn(self, z):
        raise NotImplementedError

    @abstractmethod
    def grad(self, x, **kwargs):
        raise NotImplementedError


class Sigmoid(ActivationBase):
    def __init__(self):
        """A logistic sigmoid activation function."""
        super().__init__()

    def __str__(self):
        return "Sigmoid"

    def fn(self, z):
        r"""Evaluate the logistic sigmoid, :math:`\sigma`, on the elements of input `z`."""
        return 1 / (1 + np.exp(-z))

    def grad(self, x):
        """Evaluate the first derivative of the logistic sigmoid on the elements of `x`."""
        fn_x = self.fn(x)
        return fn_x * (1 - fn_x)

    def grad2(self, x):
        """Evaluate the second derivative of the logistic sigmoid on the elements of `x`."""
        fn_x = self.fn(x)
        return fn_x * (1 - fn_x) * (1 - 2 * fn_x)


class ReLU(ActivationBase):
    """A rectified linear activation function."""

    def __init__(self):
        super().__init__()

    def __str__(self):
        return "ReLU"

    def fn(self, z):
        """Evaluate the ReLU function on the elements of input `z`."""
        return np.clip(z, 0, np.inf)

    def grad(self, x):
        """Evaluate the first derivative of the ReLU function on the elements of input `x`."""
        return (x > 0).astype(int)

    def grad2(self, x):
        """Evaluate the second derivative of the ReLU function on the elements of input `x`."""
        return np.zeros_like(x)


class LeakyReLU(ActivationBase):
    """'Leaky' version of a rectified linear unit (ReLU)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        super().__init__()

    def __str__(self):
        return "Leaky ReLU(alpha={})".format(self.alpha)

    def fn(self, z):
        """Evaluate the leaky ReLU function on the elements of input `z`."""
        _z = z.copy()
        _z[z < 0] = _z[z < 0] * self.alpha
        return _z

    def grad(self, x):
        """Evaluate the first derivative of the leaky ReLU function on the elements of input `x`."""
        # Use a float array so the alpha slope is not truncated for integer inputs
        out = np.ones_like(x, dtype=float)
        out[x < 0] *= self.alpha
        return out

    def grad2(self, x):
        """Evaluate the second derivative of the leaky ReLU function on the elements of input `x`."""
        return np.zeros_like(x)


class ELU(ActivationBase):
    def __init__(self, alpha=1.0):
        """
        An exponential linear unit (ELU).

        Parameters
        ----------
        alpha : float
            Slope of negative segment. Default is 1.
        """
        self.alpha = alpha
        super().__init__()

    def __str__(self):
        return "ELU(alpha={})".format(self.alpha)

    def fn(self, z):
        """Evaluate the ELU activation on the elements of input `z`."""
        # z if z > 0 else alpha * (e^z - 1)
        return np.where(z > 0, z, self.alpha * (np.exp(z) - 1))

    def grad(self, x):
        """Evaluate the first derivative of the ELU activation on the elements of input `x`."""
        # 1 if x > 0 else alpha * e^x
        return np.where(x > 0, np.ones_like(x), self.alpha * np.exp(x))

    def grad2(self, x):
        """Evaluate the second derivative of the ELU activation on the elements of input `x`."""
        # 0 if x > 0 else alpha * e^x
        return np.where(x >= 0, np.zeros_like(x), self.alpha * np.exp(x))


class SELU(ActivationBase):
    """A scaled exponential linear unit (SELU)."""

    def __init__(self):
        self.alpha = 1.6732632423543772848170429916717
        self.scale = 1.0507009873554804934193349852946
        self.elu = ELU(alpha=self.alpha)
        super().__init__()

    def __str__(self):
        return "SELU"

    def fn(self, z):
        """Evaluate the SELU activation on the elements of input `z`."""
        return self.scale * self.elu.fn(z)

    def grad(self, x):
        """Evaluate the first derivative of the SELU activation on the elements of input `x`."""
        return np.where(x >= 0, np.ones_like(x) * self.scale, np.exp(x) * self.alpha * self.scale)

    def grad2(self, x):
        """Evaluate the second derivative of the SELU activation on the elements of input `x`."""
        return np.where(x > 0, np.zeros_like(x), np.exp(x) * self.alpha * self.scale)


class SoftPlus(ActivationBase):
    def __init__(self):
        """A softplus activation function."""
        super().__init__()

    def __str__(self):
        return "SoftPlus"

    def fn(self, z):
        """Evaluate the softplus activation on the elements of input `z`."""
        # log(1 + e^z), computed without overflowing for large z
        return np.logaddexp(0, z)

    def grad(self, x):
        """Evaluate the first derivative of the softplus activation on the elements of input `x`."""
        exp_x = np.exp(x)
        return exp_x / (exp_x + 1)

    def grad2(self, x):
        """Evaluate the second derivative of the softplus activation on the elements of input `x`."""
        exp_x = np.exp(x)
        return exp_x / ((exp_x + 1) ** 2)
```
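A quick way to check hand-written derivatives like the `grad` methods above is a central finite-difference comparison. The sketch below is self-contained: it re-declares sigmoid and ELU as plain functions rather than importing the classes. It compares each analytic derivative with (f(x+h) - f(x-h)) / 2h. The test points deliberately avoid x = 0, where ReLU-style functions have a kink and the comparison is not meaningful.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

def elu_grad(x, alpha=1.0):
    return np.where(x > 0, np.ones_like(x), alpha * np.exp(x))

def numerical_grad(f, x, h=1e-5):
    """Central-difference approximation of f'(x), elementwise."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = np.array([-3.0, -0.5, 0.7, 2.0])
for name, f, df in [("sigmoid", sigmoid, sigmoid_grad), ("elu", elu, elu_grad)]:
    err = np.max(np.abs(numerical_grad(f, x) - df(x)))
    print(name, err)  # both errors should be tiny
```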
5. Forward Propagation
Here we use only two activation functions, sigmoid and relu, for forward propagation. The code is as follows:
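```python
def linear_activation_forward(A_prev, W, b, activation):
    # linear_forward, sigmoid and relu come from the course helper code (not shown in this article)
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    else:
        raise ValueError("Unsupported activation: {}".format(activation))
    assert A.shape == (W.shape[0], A_prev.shape[1])
    # Keep both caches for backpropagation
    cache = (linear_cache, activation_cache)
    return A, cache
```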
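`A_prev` is the result of the previous forward step, and `W` and `b` are the weights and bias. In between, the code branches on which activation function to apply. To use a different activation function, just swap in (or add) a branch.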
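`linear_activation_forward` calls three helpers that the article does not show: `linear_forward`, `sigmoid` and `relu`. They come from the course code the article is based on. The versions below are a minimal sketch reconstructed from how they are called, not the original source. Each returns its output plus a cache for backpropagation. The linear cache `(A_prev, W, b)` is the tuple that `linear_backward` unpacks in section 7.

```python
import numpy as np

def linear_forward(A_prev, W, b):
    """Z = W·A_prev + b; cache the inputs for the backward pass."""
    Z = np.dot(W, A_prev) + b  # b of shape (n_out, 1) broadcasts over the m columns
    return Z, (A_prev, W, b)

def sigmoid(Z):
    """Elementwise sigmoid; the activation cache is Z itself."""
    A = 1 / (1 + np.exp(-Z))
    return A, Z

def relu(Z):
    """Elementwise ReLU; the activation cache is Z itself."""
    A = np.maximum(0, Z)
    return A, Z

# Shape check: 4 input features, 3 units, batch of 5 examples
A_prev = np.random.randn(4, 5)
W, b = np.random.randn(3, 4), np.zeros((3, 1))
Z, linear_cache = linear_forward(A_prev, W, b)
A, activation_cache = relu(Z)
print(Z.shape, A.shape)  # (3, 5) (3, 5)
```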
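Forward propagation for a single layer consists of a linear step followed by an activation (figure not reproduced).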
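Written out for layer l, with A^{[0]} = X and n_l units in layer l, this is the computation `linear_activation_forward` performs. Hidden layers use ReLU and the output layer L uses sigmoid, matching `L_model_forward`:

```latex
\begin{aligned}
Z^{[l]} &= W^{[l]} A^{[l-1]} + b^{[l]}, \qquad W^{[l]} \in \mathbb{R}^{n_l \times n_{l-1}},\; b^{[l]} \in \mathbb{R}^{n_l \times 1} \\
A^{[l]} &= \mathrm{ReLU}\big(Z^{[l]}\big) = \max\big(0, Z^{[l]}\big), \qquad l = 1, \dots, L-1 \\
A^{[L]} &= \sigma\big(Z^{[L]}\big) = \frac{1}{1 + e^{-Z^{[L]}}}
\end{aligned}
```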
The implementation for the whole network is as follows:
```python
def L_model_forward(X, parameters):
    caches = []
    A = X
    # Number of layers (each layer has one W and one b)
    L = len(parameters) // 2
    # [LINEAR -> RELU] * (L-1)
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    # LINEAR -> SIGMOID for the output layer
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    assert AL.shape == (1, X.shape[1])
    return AL, caches
```
6. Computing the Loss
Once forward propagation has produced the output, we use it to compute the value of the loss function.
```python
import numbers
import re
from abc import ABC, abstractmethod
from functools import partial

import numpy as np

# SGD, AdaGrad, RMSProp, Adam, OptimizerBase, glorot_uniform and he_uniform are
# defined elsewhere in numpy-ml and are not shown in this article.


def is_binary(x):
    """Return True if array `x` consists only of binary values"""
    msg = "Matrix must be binary"
    assert np.array_equal(x, x.astype(bool)), msg
    return True


def is_stochastic(X):
    """True if `X` contains probabilities and each row sums to 1"""
    msg = "Array should be stochastic along the columns"
    assert len(X[X < 0]) == len(X[X > 1]) == 0, msg
    assert np.allclose(np.sum(X, axis=1), np.ones(X.shape[0])), msg
    return True


class OptimizerInitializer(object):
    def __init__(self, param=None):
        """
        A class for initializing optimizers. Valid inputs are:
            (a) __str__ representations of `OptimizerBase` instances
            (b) `OptimizerBase` instances
            (c) Parameter dicts (e.g., as produced via the `summary` method in
                `LayerBase` instances)

        If `param` is `None`, return the SGD optimizer with default parameters.
        """
        self.param = param

    def __call__(self):
        param = self.param
        if param is None:
            opt = SGD()
        elif isinstance(param, OptimizerBase):
            opt = param
        elif isinstance(param, str):
            opt = self.init_from_str()
        elif isinstance(param, dict):
            opt = self.init_from_dict()
        return opt

    def init_from_str(self):
        r = r"([a-zA-Z]*)=([^,)]*)"
        opt_str = self.param.lower()
        kwargs = dict([(i, eval(j)) for (i, j) in re.findall(r, opt_str)])
        if "sgd" in opt_str:
            optimizer = SGD(**kwargs)
        elif "adagrad" in opt_str:
            optimizer = AdaGrad(**kwargs)
        elif "rmsprop" in opt_str:
            optimizer = RMSProp(**kwargs)
        elif "adam" in opt_str:
            optimizer = Adam(**kwargs)
        else:
            raise NotImplementedError("{}".format(opt_str))
        return optimizer

    def init_from_dict(self):
        O = self.param
        cc = O["cache"] if "cache" in O else None
        op = O["hyperparameters"] if "hyperparameters" in O else None

        if op is None:
            raise ValueError("Must have `hyperparameters` key: {}".format(O))

        if op and op["id"] == "SGD":
            optimizer = SGD().set_params(op, cc)
        elif op and op["id"] == "RMSProp":
            optimizer = RMSProp().set_params(op, cc)
        elif op and op["id"] == "AdaGrad":
            optimizer = AdaGrad().set_params(op, cc)
        elif op and op["id"] == "Adam":
            optimizer = Adam().set_params(op, cc)
        elif op:
            raise NotImplementedError("{}".format(op["id"]))
        return optimizer


class WeightInitializer(object):
    def __init__(self, act_fn_str, mode="glorot_uniform"):
        """
        A factory for weight initializers.

        Parameters
        ----------
        act_fn_str : str
            The string representation for the layer activation function
        mode : str (default: 'glorot_uniform')
            The weight initialization strategy. Valid entries are {"he_normal",
            "he_uniform", "glorot_normal", "glorot_uniform", "std_normal",
            "trunc_normal"}
        """
        if mode not in [
            "he_normal",
            "he_uniform",
            "glorot_normal",
            "glorot_uniform",
            "std_normal",
            "trunc_normal",
        ]:
            raise ValueError("Unrecognized initialization mode: {}".format(mode))

        self.mode = mode
        self.act_fn = act_fn_str

        if mode == "glorot_uniform":
            self._fn = glorot_uniform
        elif mode == "glorot_normal":
            self._fn = glorot_normal
        elif mode == "he_uniform":
            self._fn = he_uniform
        elif mode == "he_normal":
            self._fn = he_normal
        elif mode == "std_normal":
            self._fn = np.random.randn
        elif mode == "trunc_normal":
            # Bind mean=0 and std=1 positionally so the call below supplies out_shape
            self._fn = partial(truncated_normal, 0, 1)

    def __call__(self, weight_shape):
        if "glorot" in self.mode:
            gain = self._calc_glorot_gain()
            W = self._fn(weight_shape, gain)
        elif self.mode == "std_normal":
            W = self._fn(*weight_shape)
        else:
            W = self._fn(weight_shape)
        return W

    def _calc_glorot_gain(self):
        """
        Values from:
        https://pytorch.org/docs/stable/nn.html?#torch.nn.init.calculate_gain
        """
        gain = 1.0
        act_str = self.act_fn.lower()
        if act_str == "tanh":
            gain = 5.0 / 3.0
        elif act_str == "relu":
            gain = np.sqrt(2)
        elif "leaky relu" in act_str:
            r = r"leaky relu\(alpha=(.*)\)"
            alpha = re.match(r, act_str).groups()[0]
            gain = np.sqrt(2 / (1 + float(alpha) ** 2))
        return gain


class ObjectiveBase(ABC):
    def __init__(self):
        super().__init__()

    @abstractmethod
    def loss(self, y_true, y_pred):
        pass

    @abstractmethod
    def grad(self, y_true, y_pred, **kwargs):
        pass


class SquaredError(ObjectiveBase):
    def __init__(self):
        """A squared-error / `L2` loss."""
        super().__init__()

    def __call__(self, y, y_pred):
        return self.loss(y, y_pred)

    def __str__(self):
        return "SquaredError"

    @staticmethod
    def loss(y, y_pred):
        """
        Compute the squared error between `y` and `y_pred`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Ground truth values for each of `n` examples
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Predictions for the `n` examples in the batch.

        Returns
        -------
        loss : float
            The sum of the squared error across dimensions and examples.
        """
        return 0.5 * np.linalg.norm(y_pred - y) ** 2

    @staticmethod
    def grad(y, y_pred, z, act_fn):
        """
        Gradient of the squared error loss with respect to the pre-nonlinearity
        input, `z`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Ground truth values for each of `n` examples.
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Predictions for the `n` examples in the batch.
        act_fn : :doc:`Activation <numpy_ml.neural_nets.activations>` object
            The activation function for the output layer of the network.

        Returns
        -------
        grad : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            The gradient of the squared error loss with respect to `z`.
        """
        return (y_pred - y) * act_fn.grad(z)


class CrossEntropy(ObjectiveBase):
    def __init__(self):
        """A cross-entropy loss."""
        super().__init__()

    def __call__(self, y, y_pred):
        return self.loss(y, y_pred)

    def __str__(self):
        return "CrossEntropy"

    @staticmethod
    def loss(y, y_pred):
        """
        Compute the cross-entropy (log) loss.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Class labels (one-hot with `m` possible classes) for each of `n`
            examples.
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Probabilities of each of `m` classes for the `n` examples in the
            batch.

        Returns
        -------
        loss : float
            The sum of the cross-entropy across classes and examples.
        """
        is_binary(y)
        is_stochastic(y_pred)

        # prevent taking the log of 0
        eps = np.finfo(float).eps

        # each example is associated with a single class; sum the negative log
        # probability of the correct label over all samples in the batch.
        # observe that we are taking advantage of the fact that y is one-hot
        # encoded
        cross_entropy = -np.sum(y * np.log(y_pred + eps))
        return cross_entropy

    @staticmethod
    def grad(y, y_pred):
        """
        Compute the gradient of the cross entropy loss with regard to the
        softmax input, `z`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape `(n, m)`
            A one-hot encoding of the true class labels. Each row constitutes a
            training example, and each column is a different class.
        y_pred: :py:class:`ndarray <numpy.ndarray>` of shape `(n, m)`
            The network predictions for the probability of each of `m` class
            labels on each of `n` examples in a batch.

        Returns
        -------
        grad : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            The gradient of the cross-entropy loss with respect to the *input*
            to the softmax function.
        """
        is_binary(y)
        is_stochastic(y_pred)

        # derivative of xe wrt z is y_pred - y_true, hence we can just
        # subtract 1 from the probability of the correct class labels
        grad = y_pred - y

        # [optional] scale the gradients by the number of examples in the batch
        # n, m = y.shape
        # grad /= n
        return grad
```
Here we use the simplest case, the two-class (binary) cross-entropy. The function is:
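```python
def compute_cost(AL, Y):
    m = Y.shape[1]
    # Binary cross-entropy computed from AL and Y, averaged over the m examples
    cost = -np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL))) / m
    # Turn a 1x1 array into a scalar, e.g. [[17]] -> 17
    cost = np.squeeze(cost)
    assert cost.shape == ()
    return cost
```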
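A small worked example makes `compute_cost` easy to check by hand. With two examples, Y = [1, 0] and AL = [0.9, 0.2], the cost is -(ln 0.9 + ln 0.8) / 2 ≈ 0.1643.

The sketch also shows one common safeguard that the original function lacks: it clips AL away from 0 and 1 so that `np.log` never receives 0. The name `compute_cost_stable` and the eps value are illustrative choices.

```python
import numpy as np

def compute_cost_stable(AL, Y, eps=1e-12):
    """Binary cross-entropy averaged over the m columns, with AL clipped to [eps, 1 - eps]."""
    m = Y.shape[1]
    AL = np.clip(AL, eps, 1 - eps)
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(np.squeeze(cost))

Y = np.array([[1, 0]])
AL = np.array([[0.9, 0.2]])
print(round(compute_cost_stable(AL, Y), 4))  # 0.1643

# A saturated, completely wrong prediction stays finite instead of producing inf/nan
print(compute_cost_stable(np.array([[1.0, 0.0]]), np.array([[0, 1]])))
```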
7. Backward Propagation
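The key to backpropagation is differentiation with the chain rule. Fortunately, I carefully derived the derivative of the cross-entropy loss in an earlier lesson, and I also wrote "Deep Learning 100 Questions: Understanding Back Propagation in Depth". If you have read that carefully, hand-coding backpropagation should no longer be a problem.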
The linear backward function is therefore:
```python
def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    # Gradients of the cost w.r.t. W and b (averaged over m examples) and w.r.t. A_prev
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    assert dA_prev.shape == A_prev.shape
    assert dW.shape == W.shape
    assert db.shape == b.shape
    return dA_prev, dW, db
```
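`linear_backward` treats `dZ` as the gradient of the per-example losses and divides by m because the cost averages over the batch. One way to convince yourself of this is a numerical check. Take the toy cost J = (1/m) Σ G ⊙ (W A_prev + b) with a fixed random G: the per-example dZ is exactly G, so `dW` from `linear_backward` should match a finite-difference estimate. The toy cost is an assumption, chosen only because its gradient is easy to reason about.

```python
import numpy as np

def linear_backward(dZ, cache):
    # Same computation as the function above
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

rng = np.random.default_rng(0)
A_prev, W, b = rng.standard_normal((4, 6)), rng.standard_normal((3, 4)), rng.standard_normal((3, 1))
G = rng.standard_normal((3, 6))  # plays the role of dZ
m = A_prev.shape[1]

def cost(W_):
    """Toy cost J(W) = (1/m) * sum(G * Z) with Z = W_·A_prev + b."""
    return np.sum(G * (W_ @ A_prev + b)) / m

_, dW, db = linear_backward(G, (A_prev, W, b))

# Central finite differences, one weight at a time
h = 1e-6
dW_num = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += h
        W_minus[i, j] -= h
        dW_num[i, j] = (cost(W_plus) - cost(W_minus)) / (2 * h)

print(np.max(np.abs(dW - dW_num)))  # close to 0 (floating-point round-off only)
```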
Next, the backward function for the combined linear + activation step:
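```python
def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    # First backprop through the activation to get dZ, then through the linear step.
    # relu_backward and sigmoid_backward come from the course helper code (not shown here).
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    else:
        raise ValueError("Unsupported activation: {}".format(activation))
    return dA_prev, dW, db
```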
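Like the forward pass, `linear_activation_backward` relies on two helpers that are not shown: `relu_backward` and `sigmoid_backward`. The sketch below assumes the activation cache is simply Z, as returned by the forward helpers, and applies the chain rule dZ = dA ⊙ g'(Z). It takes the ReLU derivative at exactly 0 to be 0, which is a common convention.

```python
import numpy as np

def relu_backward(dA, cache):
    """dZ = dA where Z > 0, and 0 elsewhere."""
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def sigmoid_backward(dA, cache):
    """dZ = dA * s * (1 - s) with s = sigmoid(Z)."""
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.ones_like(Z)
print(relu_backward(dA, Z))  # [[0. 0. 1.]]
print(sigmoid_backward(dA, Z))
```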
Putting these together gives the backward pass for the whole network:
```python
def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)  # number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # Initialize backpropagation: derivative of the cross-entropy cost w.r.t. AL
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Output layer (SIGMOID -> LINEAR).
    # grads["dA" + str(l)] holds the gradient w.r.t. the input of layer l, i.e. A^[l-1]
    current_cache = caches[L - 1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, "sigmoid")

    # Hidden layers (RELU -> LINEAR), from layer L-1 down to layer 1
    for l in reversed(range(L - 1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, "relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
    return grads
```
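I recommend studying backpropagation carefully, especially the derivative of the cross-entropy loss, which is a showcase of differentiating composite functions. Once you can derive it yourself, the rest should pose no major problems.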
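For the sigmoid output layer, the two pieces used in `L_model_backward` (`dAL` followed by `sigmoid_backward`) collapse into a simple expression. Per example, with a = σ(z):

```latex
\begin{aligned}
\mathcal{L}(a, y) &= -\big(y \ln a + (1-y)\ln(1-a)\big) \\
\frac{\partial \mathcal{L}}{\partial a} &= -\Big(\frac{y}{a} - \frac{1-y}{1-a}\Big) \qquad \text{(this is dAL)} \\
\frac{\partial a}{\partial z} &= \sigma(z)\big(1-\sigma(z)\big) = a(1-a) \\
\frac{\partial \mathcal{L}}{\partial z} &= -\Big(\frac{y}{a} - \frac{1-y}{1-a}\Big)\, a(1-a) = -y(1-a) + (1-y)\,a = a - y
\end{aligned}
```

So dZ^{[L]} = A^{[L]} - Y. Computing it in this form also avoids dividing by AL or 1 - AL when either is close to 0.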
8. Updating Parameters
After backpropagation comes the parameter update. The function is:
```python
def update_parameters(parameters, grads, learning_rate):
    # Number of layers in the neural network
    L = len(parameters) // 2
    # Gradient-descent update rule for each parameter
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
    return parameters
```
9. Wrapping Up the Pipeline (Optional)
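At this point, all the building blocks of the DNN are done. As in the previous article, it is natural to wrap these functions up. The code is as follows: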
```python
import matplotlib.pyplot as plt
import numpy as np

def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
    costs = []
    # Parameter initialization
    parameters = initialize_parameters_deep(layers_dims)
    # Optimization loop
    for i in range(0, num_iterations):
        # Forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        AL, caches = L_model_forward(X, parameters)
        # Compute the cost
        cost = compute_cost(AL, Y)
        # Backward propagation
        grads = L_model_backward(AL, Y, caches)
        # Parameter update
        parameters = update_parameters(parameters, grads, learning_rate)
        # Record the cost every 100 iterations (and optionally print it)
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print("Cost after iteration %i: %f" % (i, cost))
    # Plot the cost curve
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return parameters
```
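With that, a deep neural network is fully built. The original code comes from Andrew Ng's deep learning course; you can look for it on GitHub, and if you really cannot find it, leave a comment.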
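Because several helpers are missing from the article, here is a compact, self-contained sketch of the whole training loop that runs as-is. It is a simplified miniature under stated assumptions, not the course code:
- it uses He-scaled initialization instead of the 0.01 scale;
- it trains on a hypothetical toy dataset (label 1 when x1 + x2 > 0) generated on the fly;
- it uses the sigmoid + cross-entropy simplification dZ = AL - Y at the output layer.

```python
import numpy as np

def init_params(layer_dims, rng):
    params = {}
    for l in range(1, len(layer_dims)):
        n_in, n_out = layer_dims[l - 1], layer_dims[l]
        params["W" + str(l)] = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
        params["b" + str(l)] = np.zeros((n_out, 1))
    return params

def forward(X, params):
    """[LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID; returns AL and per-layer caches."""
    L = len(params) // 2
    A, caches = X, []
    for l in range(1, L + 1):
        W, b = params["W" + str(l)], params["b" + str(l)]
        Z = W @ A + b
        caches.append((A, W, Z))
        A = 1 / (1 + np.exp(-Z)) if l == L else np.maximum(0, Z)
    return A, caches

def cost_fn(AL, Y, eps=1e-12):
    AL = np.clip(AL, eps, 1 - eps)
    return float(-np.mean(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)))

def backward(AL, Y, caches):
    grads, m = {}, Y.shape[1]
    dZ = AL - Y  # sigmoid + cross-entropy simplification at the output layer
    for l in reversed(range(1, len(caches) + 1)):
        A_prev, W, Z = caches[l - 1]
        grads["dW" + str(l)] = dZ @ A_prev.T / m
        grads["db" + str(l)] = np.sum(dZ, axis=1, keepdims=True) / m
        if l > 1:
            dA_prev = W.T @ dZ
            Z_prev = caches[l - 2][2]
            dZ = dA_prev * (Z_prev > 0)  # ReLU derivative of the previous layer
    return grads

def train(X, Y, layer_dims, learning_rate=0.1, num_iterations=2000, seed=0):
    rng = np.random.default_rng(seed)
    params = init_params(layer_dims, rng)
    costs = []
    for i in range(num_iterations):
        AL, caches = forward(X, params)
        if i % 100 == 0:
            costs.append(cost_fn(AL, Y))
        grads = backward(AL, Y, caches)
        for key in grads:  # "dW1" -> "W1", "db1" -> "b1", ...
            params[key[1:]] -= learning_rate * grads[key]
    return params, costs

# Hypothetical toy data: 2 features, 200 examples, label 1 when x1 + x2 > 0
rng = np.random.default_rng(42)
X = rng.standard_normal((2, 200))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

params, costs = train(X, Y, layer_dims=[2, 8, 4, 1])
print("first/last recorded cost:", costs[0], costs[-1])
accuracy = np.mean((forward(X, params)[0] > 0.5) == Y)
print("training accuracy:", accuracy)
```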
--- Divider ---
Below are numpy implementations of some other components:
Appendix_1. Data Loading
Minibatches can speed up training and also add randomness to the training process. (Figure not reproduced: loss curves, blue for minibatch and purple for full batch.)
```python
def minibatch(X, batchsize=256, shuffle=True):
    """
    Compute the minibatch indices for a training dataset.

    Parameters
    ----------
    X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, *)`
        The dataset to divide into minibatches. Assumes the first dimension
        represents the number of training examples.
    batchsize : int
        The desired size of each minibatch. Note, however, that if ``X.shape[0] %
        batchsize > 0`` then the final batch will contain fewer than batchsize
        entries. Default is 256.
    shuffle : bool
        Whether to shuffle the entries in the dataset before dividing into
        minibatches. Default is True.

    Returns
    -------
    mb_generator : generator
        A generator which yields the indices into X for each batch
    n_batches: int
        The number of batches
    """
    N = X.shape[0]
    ix = np.arange(N)
    n_batches = int(np.ceil(N / batchsize))

    if shuffle:
        np.random.shuffle(ix)

    def mb_generator():
        for i in range(n_batches):
            yield ix[i * batchsize : (i + 1) * batchsize]

    return mb_generator(), n_batches
```
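There is one detail to watch when combining `minibatch` with the DNN code above. `minibatch` treats the first axis as the example axis, while `X` in this article has shape (n_x, m), with one example per column. The minimal sketch below includes a compact copy of `minibatch` so it runs on its own. It passes `X.T` to obtain the indices and then slices columns. The loop body is only a placeholder for the forward/backward/update steps.

```python
import numpy as np

def minibatch(X, batchsize=256, shuffle=True):
    # Compact copy of the function above: yields index arrays along axis 0
    N = X.shape[0]
    ix = np.arange(N)
    n_batches = int(np.ceil(N / batchsize))
    if shuffle:
        np.random.shuffle(ix)

    def mb_generator():
        for i in range(n_batches):
            yield ix[i * batchsize : (i + 1) * batchsize]

    return mb_generator(), n_batches

np.random.seed(0)
X = np.random.randn(5, 1000)                       # (n_x, m): examples are columns
Y = (np.random.rand(1, 1000) > 0.5).astype(float)

gen, n_batches = minibatch(X.T, batchsize=256)     # transpose so axis 0 indexes examples
seen = []
for idx in gen:
    X_batch, Y_batch = X[:, idx], Y[:, idx]        # slice columns, not rows
    seen.append(idx)
    # forward / backward / update on (X_batch, Y_batch) would go here
print(n_batches, [len(i) for i in seen])           # 4 [256, 256, 256, 232]
```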
Appendix_2. Learning Rate Settings
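The learning rate determines how fast gradient descent moves. If it is too small, convergence becomes very slow; if it is too large, the iterates may oscillate around the minimum or even fail to converge.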
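(Figure not reproduced: the effect of learning-rate decay.) With a fixed learning rate, the iterates end up oscillating within a relatively large region around the optimum once they converge. With learning-rate decay, they end up oscillating within a smaller region around the optimum.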
```python
from abc import ABC, abstractmethod
from copy import deepcopy
from math import erf

import numpy as np


def gaussian_cdf(x, mean, var):
    """
    Compute the probability that a random draw from a 1D Gaussian with mean
    `mean` and variance `var` is less than or equal to `x`.
    """
    eps = np.finfo(float).eps
    x_scaled = (x - mean) / np.sqrt(var + eps)
    return (1 + erf(x_scaled / np.sqrt(2))) / 2


class SchedulerBase(ABC):
    def __init__(self):
        """Abstract base class for all Scheduler objects."""
        self.hyperparameters = {}

    def __call__(self, step=None, cur_loss=None):
        return self.learning_rate(step=step, cur_loss=cur_loss)

    def copy(self):
        """Return a copy of the current object."""
        return deepcopy(self)

    def set_params(self, hparam_dict):
        """Set the scheduler hyperparameters from a dictionary."""
        if hparam_dict is not None:
            for k, v in hparam_dict.items():
                if k in self.hyperparameters:
                    self.hyperparameters[k] = v

    @abstractmethod
    def learning_rate(self, step=None):
        raise NotImplementedError


class ConstantScheduler(SchedulerBase):
    def __init__(self, lr=0.01, **kwargs):
        """
        Returns a fixed learning rate, regardless of the current step.

        Parameters
        ----------
        lr : float
            The learning rate. Default is 0.01
        """
        super().__init__()
        self.lr = lr
        self.hyperparameters = {"id": "ConstantScheduler", "lr": self.lr}

    def __str__(self):
        return "ConstantScheduler(lr={})".format(self.lr)

    def learning_rate(self, **kwargs):
        """
        Return the current learning rate.

        Returns
        -------
        lr : float
            The learning rate
        """
        return self.lr


class ExponentialScheduler(SchedulerBase):
    def __init__(self, initial_lr=0.01, stage_length=500, staircase=False, decay=0.1, **kwargs):
        """
        An exponential learning rate scheduler.

        Parameters
        ----------
        initial_lr : float
            The learning rate at the first step. Default is 0.01.
        stage_length : int
            The length of each stage, in steps. Default is 500.
        staircase : bool
            If True, only adjusts the learning rate at the stage transitions,
            producing a step-like decay schedule. If False, adjusts the
            learning rate after each step, creating a smooth decay schedule.
            Default is False.
        decay : float
            The amount to decay the learning rate at each new stage. Default is
            0.1.
        """
        super().__init__()
        self.decay = decay
        self.staircase = staircase
        self.initial_lr = initial_lr
        self.stage_length = stage_length
        self.hyperparameters = {
            "id": "ExponentialScheduler",
            "decay": self.decay,
            "staircase": self.staircase,
            "initial_lr": self.initial_lr,
            "stage_length": self.stage_length,
        }

    def __str__(self):
        return "ExponentialScheduler(initial_lr={}, stage_length={}, staircase={}, decay={})".format(
            self.initial_lr, self.stage_length, self.staircase, self.decay
        )

    def learning_rate(self, step, **kwargs):
        """
        Return the current learning rate as a function of `step`.

        Parameters
        ----------
        step : int
            The current step number.

        Returns
        -------
        lr : float
            The learning rate for the current step.
        """
        cur_stage = step / self.stage_length
        if self.staircase:
            cur_stage = np.floor(cur_stage)
        return self.initial_lr * self.decay ** cur_stage
```
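The rule implemented by `ExponentialScheduler.learning_rate` is lr = initial_lr · decay^(step / stage_length), with the exponent floored when `staircase=True`. The small standalone function below is a hypothetical helper that mirrors that formula (it is not part of the original code) and makes the two modes easy to compare. With the defaults, step 250 gives about 0.00316 in smooth mode but still 0.01 in staircase mode. Both modes agree at the stage boundaries 500 and 1000.

```python
import numpy as np

def exponential_lr(step, initial_lr=0.01, stage_length=500, decay=0.1, staircase=False):
    """Same formula as ExponentialScheduler.learning_rate."""
    cur_stage = step / stage_length
    if staircase:
        cur_stage = np.floor(cur_stage)  # only drop the rate at stage boundaries
    return initial_lr * decay ** cur_stage

for step in [0, 250, 500, 1000]:
    smooth = exponential_lr(step)
    stepped = exponential_lr(step, staircase=True)
    print(step, round(smooth, 6), round(stepped, 6))
```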
If you want more resources, feel free to follow @我是管小亮 (a self-confessed writing perfectionist)~
For more articles (and jokes), follow the WeChat official account 「程序员管小亮」~
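References
- https://www.coursera.org/learn/machine-learning
- https://www.deeplearning.ai/
- "Deep Learning: Hand-Coding Neural Networks (with numpy)"
- "Deep Neural Networks: Principles and Practice"
- "Deep Learning Notes 3: Building a Deep Neural Network (DNN) by Hand"
- https://github.com/ddbourgin/numpy-ml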