Disclaimer

1) This article is for academic exchange only, not for commercial use, so the specific references for each part are not matched up in detail. If any part inadvertently infringes on your interests, please accept my apologies and contact me to have it removed.
2) My knowledge is limited; if anything in this article is inaccurate, please point it out so we can improve together. Thank you.
3) This is a first version; errors will still need to be corrected and material added or removed, so your feedback is very welcome. If everyone shares a little, together we can add a few bricks to the advancement of scientific research.

Table of Contents

  • Disclaimer
    • 0. Preface
    • 1. Steps to Build a Neural Network
    • 2. Deep Neural Networks
    • 3. Parameter Initialization
    • 4. Activation Functions
    • 5. Forward Propagation
    • 6. Computing the Loss
    • 7. Backward Propagation
    • 8. Parameter Updates
    • 9. Wrapping Up the Build Process (Optional)
    • Appendix_1. Data Loading
    • Appendix_2. Learning Rate Scheduling
  • References

0. Preface

In an earlier post in the hand-written-code series, 深度学习之手撕神经网络代码(基于numpy), I built a perceptron and a neural network with one hidden layer, walked through the basic structure and propagation mechanics of a neural network, and showed how to write one from scratch. But one reason neural networks and deep learning work so well is that they have many hidden layers and deep architectures. A while back a reader asked me to write a numpy-based DNN; I never got around to filling that gap, so let's do it today.

1. Steps to Build a Neural Network

Do you still remember the steps for building a neural network (深度学习之手撕神经网络代码(基于numpy))? Roughly six:

  • Define the network architecture
  • Initialize the parameters
  • Iterate and optimize
  • Compute the loss
  • Backpropagate
  • Update the parameters

More concisely, there are three parts: build the network, assign the parameters, and loop over the computation.

  • First, decide what network architecture to build (大话卷积神经网络CNN(干货满满)), e.g. the classic AlexNet, VGGNet, and so on;
  • Then initialize the weights w and biases b (深度学习入门笔记(十二):权重初始化), e.g. Xavier initialization, He initialization, and so on;
  • Finally, iterate the computation (深度学习之手撕神经网络代码(基于numpy)), i.e. forward propagation, backward propagation, and so on.

2. Deep Neural Networks

Some earlier posts on the theory of deep neural networks:

  • 深度学习入门笔记(七):深层神经网络
  • 深度学习入门笔记(八):深层网络的原理

If you need to brush up, read those first so nothing later is confusing. In short: the lower layers of the network extract features (followed, in convolutional networks, by convolution and pooling), the neurons apply activations and dropout, and that gives the forward pass; the loss function is then computed and backpropagation adjusts the parameters, iterating the optimization.

Take simple handwritten digit recognition as an example; the figure illustrates the whole pipeline:

  • End to end, with no hand-crafted intermediate steps: the pixel values themselves are the features, flattened into a vector, passed through the deep neural network, and the output is a probability for each one-hot encoded class.

  • The network consists of an input layer, hidden layers, and an output layer. Forward propagation computes each layer from the previous one (convolution and pooling in a CNN, a linear map plus activation here) and passes the result on; backward propagation does the same thing in reverse.

  • Given an image, its pixels are extracted and fed into the trained (GPU-accelerated) neural network, and the output is the classification result.

3. Parameter Initialization

The sizes of all the layers of the deep network are collected in the list layer_dims, so however many layers there are, you never have to write them out one by one, which is more convenient and flexible:

def initialize_parameters_deep(layer_dims):
    # random seed
    np.random.seed(3)
    parameters = {}
    # number of layers in the network
    L = len(layer_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * 0.01
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
        assert(parameters['W' + str(l)].shape == (layer_dims[l], layer_dims[l-1]))
        assert(parameters['b' + str(l)].shape == (layer_dims[l], 1))
    return parameters

The code above initializes the weights W with small random numbers and the biases b with zeros. If you prefer a different scheme, numpy implementations of several common initializers are listed below (note that he_normal and glorot_normal depend on a calc_fan helper, sketched after the list):

  • Truncated normal initialization
def truncated_normal(mean, std, out_shape):
    """
    Parameters
    ----------
    mean : float or array_like of floats
        The mean/center of the distribution
    std : float or array_like of floats
        Standard deviation (spread or "width") of the distribution.
    out_shape : int or tuple of ints
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.

    Returns
    -------
    samples : :py:class:`ndarray <numpy.ndarray>` of shape `out_shape`
        Samples from the truncated normal distribution parameterized by `mean`
        and `std`.
    """
    samples = np.random.normal(loc=mean, scale=std, size=out_shape)
    reject = np.logical_or(samples >= mean + 2 * std, samples <= mean - 2 * std)
    while any(reject.flatten()):
        resamples = np.random.normal(loc=mean, scale=std, size=reject.sum())
        samples[reject] = resamples
        reject = np.logical_or(samples >= mean + 2 * std, samples <= mean - 2 * std)
    return samples
  • He normal initialization
def he_normal(weight_shape):
    """
    Parameters
    ----------
    weight_shape : tuple
        The dimensions of the weight matrix/volume.

    Returns
    -------
    W : :py:class:`ndarray <numpy.ndarray>` of shape `weight_shape`
        The initialized weights.
    """
    fan_in, fan_out = calc_fan(weight_shape)
    std = np.sqrt(2 / fan_in)
    return truncated_normal(0, std, weight_shape)
  • Glorot (Xavier) normal initialization
def glorot_normal(weight_shape, gain=1.0):
    """
    Parameters
    ----------
    weight_shape : tuple
        The dimensions of the weight matrix/volume.

    Returns
    -------
    W : :py:class:`ndarray <numpy.ndarray>` of shape `weight_shape`
        The initialized weights.
    """
    fan_in, fan_out = calc_fan(weight_shape)
    std = gain * np.sqrt(2 / (fan_in + fan_out))
    return truncated_normal(0, std, weight_shape)
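The he_normal and glorot_normal helpers above call calc_fan, which the post does not show. As a reference, here is a minimal sketch of what such a helper can look like (following the numpy-ml convention that 2D shapes are dense weight matrices and 3D/4D shapes are convolution volumes); treat it as an illustrative assumption rather than the exact library code.

def calc_fan(weight_shape):
    # fan_in / fan_out for a weight matrix (2D) or a convolution volume (3D/4D)
    if len(weight_shape) == 2:
        fan_in, fan_out = weight_shape
    elif len(weight_shape) in (3, 4):
        in_ch, out_ch = weight_shape[-2:]
        kernel_size = np.prod(weight_shape[:-2])
        fan_in, fan_out = in_ch * kernel_size, out_ch * kernel_size
    else:
        raise ValueError("Unrecognized weight dimension: {}".format(weight_shape))
    return fan_in, fan_out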

Any of these three initializers can be called directly; the walkthrough below sticks with the simplest scheme.

A small example

Suppose a deep network with an input layer of size 3, one hidden layer of size 3, and an output layer of size 3. Call the parameter initialization function with the argument [3,3,3] and print the parameters it returns:

parameters = initialize_parameters_deep([3,3,3])
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))

4. Activation Functions

Besides the ever-popular sigmoid activation there is ReLU, plus Leaky ReLU, ELU, SELU, Softplus, and so on.

Wikipedia has a table listing these functions:

Here are numpy implementations of several common activation functions:

from abc import ABC, abstractmethod
import numpy as np


class ActivationBase(ABC):
    def __init__(self, **kwargs):
        super().__init__()

    def __call__(self, z):
        if z.ndim == 1:
            z = z.reshape(1, -1)
        return self.fn(z)

    @abstractmethod
    def fn(self, z):
        raise NotImplementedError

    @abstractmethod
    def grad(self, x, **kwargs):
        raise NotImplementedError


class Sigmoid(ActivationBase):
    def __init__(self):
        """A logistic sigmoid activation function."""
        super().__init__()

    def __str__(self):
        return "Sigmoid"

    def fn(self, z):
        """Evaluate the logistic sigmoid, sigma, on the elements of input `z`."""
        return 1 / (1 + np.exp(-z))

    def grad(self, x):
        """Evaluate the first derivative of the logistic sigmoid on the elements of `x`."""
        fn_x = self.fn(x)
        return fn_x * (1 - fn_x)

    def grad2(self, x):
        """Evaluate the second derivative of the logistic sigmoid on the elements of `x`."""
        fn_x = self.fn(x)
        return fn_x * (1 - fn_x) * (1 - 2 * fn_x)


class ReLU(ActivationBase):
    """A rectified linear activation function."""

    def __init__(self):
        super().__init__()

    def __str__(self):
        return "ReLU"

    def fn(self, z):
        """Evaluate the ReLU function on the elements of input `z`."""
        return np.clip(z, 0, np.inf)

    def grad(self, x):
        """Evaluate the first derivative of the ReLU function on the elements of input `x`."""
        return (x > 0).astype(int)

    def grad2(self, x):
        """Evaluate the second derivative of the ReLU function on the elements of input `x`."""
        return np.zeros_like(x)


class LeakyReLU(ActivationBase):
    """'Leaky' version of a rectified linear unit (ReLU)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha
        super().__init__()

    def __str__(self):
        return "Leaky ReLU(alpha={})".format(self.alpha)

    def fn(self, z):
        """Evaluate the leaky ReLU function on the elements of input `z`."""
        _z = z.copy()
        _z[z < 0] = _z[z < 0] * self.alpha
        return _z

    def grad(self, x):
        """Evaluate the first derivative of the leaky ReLU function on the elements of input `x`."""
        out = np.ones_like(x)
        out[x < 0] *= self.alpha
        return out

    def grad2(self, x):
        """Evaluate the second derivative of the leaky ReLU function on the elements of input `x`."""
        return np.zeros_like(x)


class ELU(ActivationBase):
    def __init__(self, alpha=1.0):
        """
        An exponential linear unit (ELU).

        Parameters
        ----------
        alpha : float
            Slope of negative segment. Default is 1.
        """
        self.alpha = alpha
        super().__init__()

    def __str__(self):
        return "ELU(alpha={})".format(self.alpha)

    def fn(self, z):
        """Evaluate the ELU activation on the elements of input `z`."""
        # z if z > 0 else alpha * (e^z - 1)
        return np.where(z > 0, z, self.alpha * (np.exp(z) - 1))

    def grad(self, x):
        """Evaluate the first derivative of the ELU activation on the elements of input `x`."""
        # 1 if x > 0 else alpha * e^(x)
        return np.where(x > 0, np.ones_like(x), self.alpha * np.exp(x))

    def grad2(self, x):
        """Evaluate the second derivative of the ELU activation on the elements of input `x`."""
        # 0 if x > 0 else alpha * e^(x)
        return np.where(x >= 0, np.zeros_like(x), self.alpha * np.exp(x))


class SELU(ActivationBase):
    """A scaled exponential linear unit (SELU)."""

    def __init__(self):
        self.alpha = 1.6732632423543772848170429916717
        self.scale = 1.0507009873554804934193349852946
        self.elu = ELU(alpha=self.alpha)
        super().__init__()

    def __str__(self):
        return "SELU"

    def fn(self, z):
        """Evaluate the SELU activation on the elements of input `z`."""
        return self.scale * self.elu.fn(z)

    def grad(self, x):
        """Evaluate the first derivative of the SELU activation on the elements of input `x`."""
        return np.where(x >= 0, np.ones_like(x) * self.scale, np.exp(x) * self.alpha * self.scale)

    def grad2(self, x):
        """Evaluate the second derivative of the SELU activation on the elements of input `x`."""
        return np.where(x > 0, np.zeros_like(x), np.exp(x) * self.alpha * self.scale)


class SoftPlus(ActivationBase):
    def __init__(self):
        """A softplus activation function."""
        super().__init__()

    def __str__(self):
        return "SoftPlus"

    def fn(self, z):
        """Evaluate the softplus activation on the elements of input `z`."""
        return np.log(np.exp(z) + 1)

    def grad(self, x):
        """Evaluate the first derivative of the softplus activation on the elements of input `x`."""
        exp_x = np.exp(x)
        return exp_x / (exp_x + 1)

    def grad2(self, x):
        """Evaluate the second derivative of the softplus activation on the elements of input `x`."""
        exp_x = np.exp(x)
        return exp_x / ((exp_x + 1) ** 2)
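For reference, here is a small hypothetical usage example of the classes above (the base class reshapes 1D inputs into a row vector before applying fn):

z = np.array([[-2.0, -0.5, 0.0, 0.5, 2.0]])

for act in [Sigmoid(), ReLU(), LeakyReLU(alpha=0.1), ELU(), SELU(), SoftPlus()]:
    print(act)            # human-readable name, e.g. "Leaky ReLU(alpha=0.1)"
    print(act(z))         # forward pass via ActivationBase.__call__ -> fn
    print(act.grad(z))    # elementwise first derivative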

5. Forward Propagation

Here we use only the sigmoid and relu activations for forward propagation; the code is as follows:

def linear_activation_forward(A_prev, W, b, activation):
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)

    assert (A.shape == (W.shape[0], A_prev.shape[1]))
    cache = (linear_cache, activation_cache)
    return A, cache

A_prev is the output of the previous forward step, W and b are the weights and biases, and the activation argument selects which activation function is applied in between. To swap in a different activation, just replace it here.
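linear_activation_forward relies on three helpers that the post does not list: linear_forward, sigmoid, and relu. A minimal sketch consistent with how they are called (each activation returns its output plus Z as the activation cache) would be:

def linear_forward(A, W, b):
    # Z = W·A + b, caching the inputs for the backward pass
    Z = np.dot(W, A) + b
    cache = (A, W, b)
    return Z, cache

def sigmoid(Z):
    A = 1 / (1 + np.exp(-Z))
    return A, Z          # Z is kept as the activation cache

def relu(Z):
    A = np.maximum(0, Z)
    return A, Z          # Z is kept as the activation cache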

The forward pass through a single layer looks like this:

The full implementation:

def L_model_forward(X, parameters):
    caches = []
    A = X
    # number of layers in the network
    L = len(parameters) // 2
    # implement [LINEAR -> RELU] * (L-1)
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    # implement LINEAR -> SIGMOID
    AL, cache = linear_activation_forward(A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    assert(AL.shape == (1, X.shape[1]))
    return AL, caches

6. Computing the Loss

Once forward propagation has produced an output, we use it to compute the value of the loss function.

from abc import ABC, abstractmethod
import numpy as np
import re
import numbers
from functools import partial

# NOTE: SGD, AdaGrad, RMSProp, Adam, OptimizerBase and the he_uniform /
# glorot_uniform initializers referenced below are defined elsewhere in numpy-ml.


def is_binary(x):
    """Return True if array `x` consists only of binary values"""
    msg = "Matrix must be binary"
    assert np.array_equal(x, x.astype(bool)), msg
    return True


def is_stochastic(X):
    """True if `X` contains probabilities that sum to 1 along the columns"""
    msg = "Array should be stochastic along the columns"
    assert len(X[X < 0]) == len(X[X > 1]) == 0, msg
    assert np.allclose(np.sum(X, axis=1), np.ones(X.shape[0])), msg
    return True


class OptimizerInitializer(object):
    def __init__(self, param=None):
        """
        A class for initializing optimizers. Valid inputs are:
            (a) __str__ representations of `OptimizerBase` instances
            (b) `OptimizerBase` instances
            (c) Parameter dicts (e.g., as produced via the `summary` method in
                `LayerBase` instances)

        If `param` is `None`, return the SGD optimizer with default parameters.
        """
        self.param = param

    def __call__(self):
        param = self.param
        if param is None:
            opt = SGD()
        elif isinstance(param, OptimizerBase):
            opt = param
        elif isinstance(param, str):
            opt = self.init_from_str()
        elif isinstance(param, dict):
            opt = self.init_from_dict()
        return opt

    def init_from_str(self):
        r = r"([a-zA-Z]*)=([^,)]*)"
        opt_str = self.param.lower()
        kwargs = dict([(i, eval(j)) for (i, j) in re.findall(r, opt_str)])
        if "sgd" in opt_str:
            optimizer = SGD(**kwargs)
        elif "adagrad" in opt_str:
            optimizer = AdaGrad(**kwargs)
        elif "rmsprop" in opt_str:
            optimizer = RMSProp(**kwargs)
        elif "adam" in opt_str:
            optimizer = Adam(**kwargs)
        else:
            raise NotImplementedError("{}".format(opt_str))
        return optimizer

    def init_from_dict(self):
        O = self.param
        cc = O["cache"] if "cache" in O else None
        op = O["hyperparameters"] if "hyperparameters" in O else None

        if op is None:
            raise ValueError("Must have `hyperparameters` key: {}".format(O))

        if op and op["id"] == "SGD":
            optimizer = SGD().set_params(op, cc)
        elif op and op["id"] == "RMSProp":
            optimizer = RMSProp().set_params(op, cc)
        elif op and op["id"] == "AdaGrad":
            optimizer = AdaGrad().set_params(op, cc)
        elif op and op["id"] == "Adam":
            optimizer = Adam().set_params(op, cc)
        elif op:
            raise NotImplementedError("{}".format(op["id"]))
        return optimizer


class WeightInitializer(object):
    def __init__(self, act_fn_str, mode="glorot_uniform"):
        """
        A factory for weight initializers.

        Parameters
        ----------
        act_fn_str : str
            The string representation for the layer activation function
        mode : str (default: 'glorot_uniform')
            The weight initialization strategy. Valid entries are
            {"he_normal", "he_uniform", "glorot_normal", "glorot_uniform",
            "std_normal", "trunc_normal"}
        """
        if mode not in [
            "he_normal",
            "he_uniform",
            "glorot_normal",
            "glorot_uniform",
            "std_normal",
            "trunc_normal",
        ]:
            raise ValueError("Unrecognized initialization mode: {}".format(mode))

        self.mode = mode
        self.act_fn = act_fn_str

        if mode == "glorot_uniform":
            self._fn = glorot_uniform
        elif mode == "glorot_normal":
            self._fn = glorot_normal
        elif mode == "he_uniform":
            self._fn = he_uniform
        elif mode == "he_normal":
            self._fn = he_normal
        elif mode == "std_normal":
            self._fn = np.random.randn
        elif mode == "trunc_normal":
            self._fn = partial(truncated_normal, mean=0, std=1)

    def __call__(self, weight_shape):
        if "glorot" in self.mode:
            gain = self._calc_glorot_gain()
            W = self._fn(weight_shape, gain)
        elif self.mode == "std_normal":
            W = self._fn(*weight_shape)
        else:
            W = self._fn(weight_shape)
        return W

    def _calc_glorot_gain(self):
        """
        Values from:
        https://pytorch.org/docs/stable/nn.html?#torch.nn.init.calculate_gain
        """
        gain = 1.0
        act_str = self.act_fn.lower()
        if act_str == "tanh":
            gain = 5.0 / 3.0
        elif act_str == "relu":
            gain = np.sqrt(2)
        elif "leaky relu" in act_str:
            r = r"leaky relu\(alpha=(.*)\)"
            alpha = re.match(r, act_str).groups()[0]
            gain = np.sqrt(2 / (1 + float(alpha) ** 2))
        return gain


class ObjectiveBase(ABC):
    def __init__(self):
        super().__init__()

    @abstractmethod
    def loss(self, y_true, y_pred):
        pass

    @abstractmethod
    def grad(self, y_true, y_pred, **kwargs):
        pass


class SquaredError(ObjectiveBase):
    def __init__(self):
        """A squared-error / `L2` loss."""
        super().__init__()

    def __call__(self, y, y_pred):
        return self.loss(y, y_pred)

    def __str__(self):
        return "SquaredError"

    @staticmethod
    def loss(y, y_pred):
        """
        Compute the squared error between `y` and `y_pred`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Ground truth values for each of `n` examples
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Predictions for the `n` examples in the batch.

        Returns
        -------
        loss : float
            The sum of the squared error across dimensions and examples.
        """
        return 0.5 * np.linalg.norm(y_pred - y) ** 2

    @staticmethod
    def grad(y, y_pred, z, act_fn):
        """
        Gradient of the squared error loss with respect to the pre-nonlinearity
        input, `z`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Ground truth values for each of `n` examples.
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Predictions for the `n` examples in the batch.
        act_fn : :doc:`Activation <numpy_ml.neural_nets.activations>` object
            The activation function for the output layer of the network.

        Returns
        -------
        grad : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            The gradient of the squared error loss with respect to `z`.
        """
        return (y_pred - y) * act_fn.grad(z)


class CrossEntropy(ObjectiveBase):
    def __init__(self):
        """A cross-entropy loss."""
        super().__init__()

    def __call__(self, y, y_pred):
        return self.loss(y, y_pred)

    def __str__(self):
        return "CrossEntropy"

    @staticmethod
    def loss(y, y_pred):
        """
        Compute the cross-entropy (log) loss.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Class labels (one-hot with `m` possible classes) for each of `n`
            examples.
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            Probabilities of each of `m` classes for the `n` examples in the
            batch.

        Returns
        -------
        loss : float
            The sum of the cross-entropy across classes and examples.
        """
        is_binary(y)
        is_stochastic(y_pred)

        # prevent taking the log of 0
        eps = np.finfo(float).eps

        # each example is associated with a single class; sum the negative log
        # probability of the correct label over all samples in the batch.
        # observe that we are taking advantage of the fact that y is one-hot
        # encoded
        cross_entropy = -np.sum(y * np.log(y_pred + eps))
        return cross_entropy

    @staticmethod
    def grad(y, y_pred):
        """
        Compute the gradient of the cross entropy loss with regard to the
        softmax input, `z`.

        Parameters
        ----------
        y : :py:class:`ndarray <numpy.ndarray>` of shape `(n, m)`
            A one-hot encoding of the true class labels. Each row constitutes a
            training example, and each column is a different class.
        y_pred : :py:class:`ndarray <numpy.ndarray>` of shape `(n, m)`
            The network predictions for the probability of each of `m` class
            labels on each of `n` examples in a batch.

        Returns
        -------
        grad : :py:class:`ndarray <numpy.ndarray>` of shape (n, m)
            The gradient of the cross-entropy loss with respect to the *input*
            to the softmax function.
        """
        is_binary(y)
        is_stochastic(y_pred)

        # derivative of xe wrt z is y_pred - y_true, hence we can just
        # subtract 1 from the probability of the correct class labels
        grad = y_pred - y

        # [optional] scale the gradients by the number of examples in the batch
        # n, m = y.shape
        # grad /= n
        return grad
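A quick sanity check of the CrossEntropy class above, with hypothetical one-hot labels (rows are examples) and row-stochastic predictions:

y      = np.array([[1, 0, 0],
                   [0, 1, 0]])
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1]])

xe = CrossEntropy()
print(xe(y, y_pred))        # -(log 0.7 + log 0.8) ≈ 0.58
print(xe.grad(y, y_pred))   # y_pred - y, the gradient w.r.t. the softmax input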

Here we use the simplest form, the binary cross-entropy built from its two terms, as the cost function:

def compute_cost(AL, Y):
    m = Y.shape[1]
    # Compute loss from AL and Y.
    cost = -np.sum(np.multiply(Y, np.log(AL)) + np.multiply(1 - Y, np.log(1 - AL))) / m
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    return cost
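As a small, hypothetical check of compute_cost with a single output unit and three examples:

AL = np.array([[0.8, 0.1, 0.6]])   # predicted probabilities
Y  = np.array([[1,   0,   1  ]])   # ground-truth labels
print(compute_cost(AL, Y))         # ≈ 0.2798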

7. Backward Propagation

The key to backpropagation is applying the chain rule. Fortunately we carefully derived the derivative of the cross-entropy last time, and there is also an earlier post, 深度学习100问之深入理解Back Propagation(反向传播); if you have read those, hand-writing the backward pass should no longer be a problem.


So the linear part of the backward pass is:

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]

    dW = np.dot(dZ, A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = np.dot(W.T, dZ)

    assert (dA_prev.shape == A_prev.shape)
    assert (dW.shape == W.shape)
    assert (db.shape == b.shape)
    return dA_prev, dW, db

Next, the linear + activation backward function:

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
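As before, relu_backward and sigmoid_backward are helpers the post does not list. A minimal sketch consistent with the caches produced in the forward pass (where the activation cache is Z) would be:

def relu_backward(dA, activation_cache):
    Z = activation_cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0               # the ReLU gradient is 0 where Z <= 0, 1 elsewhere
    return dZ

def sigmoid_backward(dA, activation_cache):
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)      # chain rule through sigma'(Z) = sigma(Z)(1 - sigma(Z))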

Putting these together gives the full backward propagation function:

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)  # number of layers
    m = AL.shape[1]
    Y = Y.reshape(AL.shape)  # after this line, Y is the same shape as AL

    # initialize the backpropagation
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    current_cache = caches[L - 1]
    grads["dA" + str(L)], grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(dAL, current_cache, "sigmoid")

    for l in reversed(range(L - 1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 2)], current_cache, "relu")
        grads["dA" + str(l + 1)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp
    return grads

I strongly recommend working through backpropagation carefully, especially the derivative of the cross-entropy, which is composite-function differentiation at its finest; once you can derive it yourself, there is basically nothing left to trip over.

8. Parameter Updates

After backpropagation comes the parameter update; the function is:

def update_parameters(parameters, grads, learning_rate):
    # number of layers in the neural network
    L = len(parameters) // 2
    # Update rule for each parameter. Use a for loop.
    for l in range(L):
        parameters["W" + str(l + 1)] = parameters["W" + str(l + 1)] - learning_rate * grads["dW" + str(l + 1)]
        parameters["b" + str(l + 1)] = parameters["b" + str(l + 1)] - learning_rate * grads["db" + str(l + 1)]
    return parameters

9. Wrapping Up the Build Process (Optional)

At this point the whole DNN has been assembled. As in the previous post, you will probably want to wrap these functions into a single routine, like this:

import matplotlib.pyplot as plt


def L_layer_model(X, Y, layers_dims, learning_rate=0.0075, num_iterations=3000, print_cost=False):
    np.random.seed(1)
    costs = []

    # parameter initialization
    parameters = initialize_parameters_deep(layers_dims)

    # optimization loop
    for i in range(0, num_iterations):
        # forward propagation: [LINEAR -> RELU]*(L-1) -> LINEAR -> SIGMOID
        AL, caches = L_model_forward(X, parameters)
        # compute the cost
        cost = compute_cost(AL, Y)
        # backward propagation
        grads = L_model_backward(AL, Y, caches)
        # parameter update
        parameters = update_parameters(parameters, grads, learning_rate)
        # record and print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
            costs.append(cost)

    # plot the cost curve
    plt.plot(np.squeeze(costs))
    plt.ylabel('cost')
    plt.xlabel('iterations (per hundreds)')
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()

    return parameters

With that, a deep neural network has been built from start to finish. The original code comes from Andrew Ng's deep learning course and can be found on GitHub; if you really cannot find it, leave a comment.
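To give an idea of how everything plugs together, here is a hypothetical call for a binary classifier on flattened 64×64×3 images. train_x of shape (12288, m) and train_y of shape (1, m) are placeholders you would load yourself, and the layer sizes are just an example:

layers_dims = [12288, 20, 7, 5, 1]   # input layer, three hidden layers, one output unit
parameters = L_layer_model(train_x, train_y, layers_dims,
                           learning_rate=0.0075, num_iterations=2500, print_cost=True)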


--- manual divider ---


Below is numpy code for some other building blocks:

Appendix_1. Data Loading

Minibatching both speeds up training and injects extra randomness into the optimization process.

Blue is minibatch, purple is full batch:

def minibatch(X, batchsize=256, shuffle=True):
    """
    Compute the minibatch indices for a training dataset.

    Parameters
    ----------
    X : :py:class:`ndarray <numpy.ndarray>` of shape `(N, *)`
        The dataset to divide into minibatches. Assumes the first dimension
        represents the number of training examples.
    batchsize : int
        The desired size of each minibatch. Note, however, that if ``X.shape[0] %
        batchsize > 0`` then the final batch will contain fewer than batchsize
        entries. Default is 256.
    shuffle : bool
        Whether to shuffle the entries in the dataset before dividing into
        minibatches. Default is True.

    Returns
    -------
    mb_generator : generator
        A generator which yields the indices into X for each batch
    n_batches : int
        The number of batches
    """
    N = X.shape[0]
    ix = np.arange(N)
    n_batches = int(np.ceil(N / batchsize))

    if shuffle:
        np.random.shuffle(ix)

    def mb_generator():
        for i in range(n_batches):
            yield ix[i * batchsize : (i + 1) * batchsize]

    return mb_generator(), n_batches
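A small, hypothetical example of how the index generator is consumed during training:

X = np.random.randn(1000, 64)                     # 1000 examples, 64 features
batch_ixs, n_batches = minibatch(X, batchsize=256, shuffle=True)
for ix in batch_ixs:
    X_batch = X[ix]                               # shape (256, 64); the last batch is (232, 64)
    # ... forward / backward / update on X_batch ...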

Appendix_2. Learning Rate Scheduling

The learning rate controls how fast the iterations descend. If it is set too small, convergence becomes very slow; if it is set too large, the gradient steps may oscillate back and forth around the minimum and may even fail to converge at all.

The effect of learning rate decay looks like this:

As the figure above shows, with a fixed learning rate the iterates end up oscillating in a fairly large region around the optimum once training converges, whereas with learning rate decay they settle into a much smaller region around the optimum.

from copy import deepcopy
from abc import ABC, abstractmethod

import numpy as np
from math import erf


def gaussian_cdf(x, mean, var):
    """
    Compute the probability that a random draw from a 1D Gaussian with mean
    `mean` and variance `var` is less than or equal to `x`.
    """
    eps = np.finfo(float).eps
    x_scaled = (x - mean) / np.sqrt(var + eps)
    return (1 + erf(x_scaled / np.sqrt(2))) / 2


class SchedulerBase(ABC):
    def __init__(self):
        """Abstract base class for all Scheduler objects."""
        self.hyperparameters = {}

    def __call__(self, step=None, cur_loss=None):
        return self.learning_rate(step=step, cur_loss=cur_loss)

    def copy(self):
        """Return a copy of the current object."""
        return deepcopy(self)

    def set_params(self, hparam_dict):
        """Set the scheduler hyperparameters from a dictionary."""
        if hparam_dict is not None:
            for k, v in hparam_dict.items():
                if k in self.hyperparameters:
                    self.hyperparameters[k] = v

    @abstractmethod
    def learning_rate(self, step=None):
        raise NotImplementedError


class ConstantScheduler(SchedulerBase):
    def __init__(self, lr=0.01, **kwargs):
        """
        Returns a fixed learning rate, regardless of the current step.

        Parameters
        ----------
        lr : float
            The learning rate. Default is 0.01
        """
        super().__init__()
        self.lr = lr
        self.hyperparameters = {"id": "ConstantScheduler", "lr": self.lr}

    def __str__(self):
        return "ConstantScheduler(lr={})".format(self.lr)

    def learning_rate(self, **kwargs):
        """
        Return the current learning rate.

        Returns
        -------
        lr : float
            The learning rate
        """
        return self.lr


class ExponentialScheduler(SchedulerBase):
    def __init__(self, initial_lr=0.01, stage_length=500, staircase=False, decay=0.1, **kwargs):
        """
        An exponential learning rate scheduler.

        Parameters
        ----------
        initial_lr : float
            The learning rate at the first step. Default is 0.01.
        stage_length : int
            The length of each stage, in steps. Default is 500.
        staircase : bool
            If True, only adjusts the learning rate at the stage transitions,
            producing a step-like decay schedule. If False, adjusts the
            learning rate after each step, creating a smooth decay schedule.
            Default is False.
        decay : float
            The amount to decay the learning rate at each new stage. Default is
            0.1.
        """
        super().__init__()
        self.decay = decay
        self.staircase = staircase
        self.initial_lr = initial_lr
        self.stage_length = stage_length
        self.hyperparameters = {
            "id": "StepScheduler",
            "decay": self.decay,
            "staircase": self.staircase,
            "initial_lr": self.initial_lr,
            "stage_length": self.stage_length,
        }

    def __str__(self):
        return "ExponentialScheduler(initial_lr={}, stage_length={}, staircase={}, decay={})".format(
            self.initial_lr, self.stage_length, self.staircase, self.decay
        )

    def learning_rate(self, step, **kwargs):
        """
        Return the current learning rate as a function of `step`.

        Parameters
        ----------
        step : int
            The current step number.

        Returns
        -------
        lr : float
            The learning rate for the current step.
        """
        cur_stage = step / self.stage_length
        if self.staircase:
            cur_stage = np.floor(cur_stage)
        return self.initial_lr * self.decay ** cur_stage
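A quick, hypothetical example of how these schedulers behave:

sched = ExponentialScheduler(initial_lr=0.01, stage_length=500, staircase=True, decay=0.5)
for step in (0, 499, 500, 1500):
    print(step, sched(step))    # 0.01, 0.01, 0.005, 0.00125

const = ConstantScheduler(lr=0.01)
print(const(step=42))           # always 0.01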

If you want more resources, feel free to follow @我是管小亮, text-formatting OCD to the MAX~

Reply 【福利】 to get the goodies I have prepared for you, covering C++, the four programming fundamentals, NLP, deep learning, and more.

For more articles (and banter), follow the WeChat official account 「程序员管小亮」~

References

  • https://www.coursera.org/learn/machine-learning
  • https://www.deeplearning.ai/
  • 深度学习之手撕神经网络代码(基于numpy)
  • 深度神经网络原理与实践
  • 深度学习笔记3:手动搭建深度神经网络(DNN)
  • https://github.com/ddbourgin/numpy-ml
