Paper reading notes: "Learning representations by back-propagating errors"

Back-propagation: the procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. Internal 'hidden' units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units.
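The simplest form of the network is layered: input layer -> intermediate layers -> output layer. The hidden units in the intermediate layers are neither inputs nor outputs, so the task does not specify their desired states. The learning procedure must decide under what circumstances the hidden units should be active.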

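Eqn. (1), $x_j = \sum_i y_i w_{ji}$, gives the total input $x_j$ to unit j as a linear function of the outputs $y_i$ of the units connected to it and the weights $w_{ji}$ on those connections. Eqn. (2), $y_j = 1/(1 + e^{-x_j})$, is the sigmoid function. Of course, Eqns. (1) and (2) are not mandatory: any input-output function with a bounded derivative will do, but combining the inputs linearly before applying the nonlinearity greatly simplifies the learning procedure.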
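As a rough illustration of the forward computation described above (a weighted sum of the lower layer's outputs followed by the sigmoid), here is a minimal Python sketch. The names `sigmoid` and `layer_forward` and the example weights are hypothetical, chosen only for this note, and biases are left out for brevity.

```python
import math

def sigmoid(x):
    """Eqn. (2): squash the total input x_j into an output y_j in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(y_prev, weights):
    """Compute the outputs of one layer from the layer below.

    y_prev  : outputs y_i of the lower layer
    weights : weights[j][i] is w_ji, the weight from unit i to unit j
    """
    outputs = []
    for w_j in weights:
        x_j = sum(w * y for w, y in zip(w_j, y_prev))  # Eqn. (1): total input
        outputs.append(sigmoid(x_j))                   # Eqn. (2): output
    return outputs

# Hypothetical layer with 2 inputs and 2 units; the numbers are illustrative only.
y_hidden = layer_forward([1.0, 0.0], [[0.5, -0.3], [0.0, 0.8]])
```

Stacking calls to `layer_forward`, one per layer, would give the full forward pass from input to output.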
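The total error:

$$E = \frac{1}{2} \sum_c \sum_j (y_{j,c} - d_{j,c})^2 \tag{3}$$

where c indexes the input-output cases, j indexes the output units, y is the actual state of an output unit and d is its desired state.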

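Gradient descent is used to reduce the loss E, so that the gap between the actual and desired outputs becomes as small as possible. The goal is a set of weights whose outputs are close to the desired ones. To find out how a single parameter, for example one weight, affects E, take the partial derivative of E with respect to that weight.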
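To make "the effect of one weight on E" concrete, the sketch below estimates ∂E/∂w numerically with a central difference. This is not the paper's method (back-propagation computes the same quantity analytically and far more cheaply); it is only a sanity-check style illustration. The single sigmoid unit, the two training cases and the function names are all assumptions made for this example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def total_error(weights, cases):
    """E = 1/2 * sum over cases of (y - d)^2 for a single sigmoid output unit."""
    error = 0.0
    for inputs, desired in cases:
        y = sigmoid(sum(w * v for w, v in zip(weights, inputs)))
        error += 0.5 * (y - desired) ** 2
    return error

def numerical_partial(weights, cases, k, h=1e-6):
    """Central-difference estimate of dE/dw_k (illustrative, not back-propagation)."""
    up = list(weights)
    up[k] += h
    down = list(weights)
    down[k] -= h
    return (total_error(up, cases) - total_error(down, cases)) / (2 * h)

# Illustrative data: (inputs, desired output) pairs.
cases = [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.0)]
weights = [0.0, 0.0]
grad0 = numerical_partial(weights, cases, 0)  # negative: increasing w_0 lowers E
grad1 = numerical_partial(weights, cases, 1)  # positive: increasing w_1 raises E
```

The sign of each partial derivative tells gradient descent which way to move that weight.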
For a given case, the partial derivatives of the error with respect to each weight are computed in two passes. We have already described the forward pass in which the units in each layer have their states determined by the input they receive from units in lower layers using equations (1) and (2). The backward pass which propagates derivatives from the top layer back to the bottom one is more complicated.
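Forward pass: the input is propagated layer by layer, using the weighted sum of Eqn. (1) and the sigmoid of Eqn. (2), until it reaches the output layer and produces the output values. Backward pass: the error derivatives at the output layer are propagated back layer by layer with the chain rule, giving ∂E/∂w for every weight, and gradient descent then uses these derivatives to update the weights.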
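Starting from Eqn. (3) and working backwards, differentiating for a single case c (the index c is dropped) gives

$$\frac{\partial E}{\partial y_j} = y_j - d_j \tag{4}$$

where $y_j$ is the actual output of unit j and $d_j$ is its desired output. Applying the chain rule and differentiating Eqn. (2) gives the derivative with respect to the total input $x_j$ of unit j:

$$\frac{\partial E}{\partial x_j} = \frac{\partial E}{\partial y_j} \, y_j (1 - y_j) \tag{5}$$

For the weight $w_{ji}$ on the connection from unit i to unit j, the derivative is

$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial x_j} \, y_i \tag{6}$$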
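The backward pass can be sketched for a tiny network with one hidden layer. The code follows the chain-rule steps numbered (4) to (6) in the paper, plus the paper's Eqn. (7), $\partial E/\partial y_i = \sum_j (\partial E/\partial x_j) \, w_{ji}$, which carries the derivative down to the layer below. The function names, the network shape and all numbers are assumptions made for illustration, and biases are omitted.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W1, W2):
    """Forward pass: returns the hidden outputs h and the final outputs y."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in W1]
    y = [sigmoid(sum(w * v for w, v in zip(row, h))) for row in W2]
    return h, y

def backward(x, d, W1, W2):
    """Backward pass for one case: returns dE/dW1 and dE/dW2."""
    h, y = forward(x, W1, W2)
    # Output layer: Eqn. (4) gives (y_j - d_j), Eqn. (5) multiplies by y_j (1 - y_j).
    dE_dx_out = [(yj - dj) * yj * (1.0 - yj) for yj, dj in zip(y, d)]
    # Eqn. (6): dE/dw_ji = dE/dx_j * y_i, with y_i the hidden outputs.
    gW2 = [[g * hi for hi in h] for g in dE_dx_out]
    # Eqn. (7): dE/dy_i = sum over j of dE/dx_j * w_ji, for each hidden unit i.
    dE_dh = [sum(dE_dx_out[j] * W2[j][i] for j in range(len(W2)))
             for i in range(len(h))]
    # Eqn. (5) again for the hidden layer, then Eqn. (6) with the inputs as y_i.
    dE_dx_hid = [g * hi * (1.0 - hi) for g, hi in zip(dE_dh, h)]
    gW1 = [[g * xi for xi in x] for g in dE_dx_hid]
    return gW1, gW2

# Hypothetical 2-2-1 network and a single training case.
W1 = [[0.1, -0.2], [0.4, 0.3]]
W2 = [[0.5, -0.6]]
gW1, gW2 = backward([1.0, 0.5], [1.0], W1, W2)
```

A common way to check such a sketch is to compare `gW1` and `gW2` against finite-difference estimates of the same derivatives.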

We accumulate ∂E/∂w over all the input-output cases before changing the weights. The simplest version of gradient descent is to change each weight by an amount proportional to the accumulated ∂E/∂w. But we use an acceleration method in which the current gradient is used to modify the velocity of the point in weight space instead of its position.

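In other words, ∂E/∂w is accumulated over all input-output cases (not "tests") before the weights are changed. Plain gradient descent changes each weight by $\Delta w = -\varepsilon \, \partial E/\partial w$ (Eqn. 8). The acceleration method in the paper instead uses $\Delta w(t) = -\varepsilon \, \partial E/\partial w(t) + \alpha \, \Delta w(t-1)$ (Eqn. 9), where t is incremented by 1 for each sweep through the whole set of cases, ε is the learning rate, and α, between 0 and 1, is an exponential decay factor that sets the relative contribution of the current and earlier gradients to the weight change.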
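The acceleration method (what is now usually called momentum) can be sketched in a few lines: the gradient changes a velocity, and the velocity changes the weights. The function name `momentum_step`, the values `epsilon=0.1` and `alpha=0.9`, and the toy error function are illustrative assumptions, not settings confirmed by these notes.

```python
def momentum_step(w, grad, velocity, epsilon=0.1, alpha=0.9):
    """One update: delta_w(t) = -epsilon * grad(t) + alpha * delta_w(t-1)."""
    new_velocity = [-epsilon * g + alpha * v for g, v in zip(grad, velocity)]
    new_w = [wi + dv for wi, dv in zip(w, new_velocity)]
    return new_w, new_velocity

# Toy problem: minimise E(w) = 1/2 * w^2, whose gradient is simply w.
w, v = [1.0], [0.0]
for _ in range(200):
    w, v = momentum_step(w, grad=w, velocity=v)
# After the loop, w[0] is close to the minimum at 0.
```

Setting `alpha=0.0` recovers plain gradient descent, where each weight moves by an amount proportional to the accumulated gradient.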
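The paper illustrates the procedure with the task of detecting whether the activity levels of a one-dimensional array of input units are symmetric about the centre point. An intermediate (hidden) layer is essential for this task: a single input unit, considered alone, says nothing about whether the whole input vector is symmetric, so simply adding up the evidence from individual inputs is not enough.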
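For reference, the training set for such a symmetry task can be generated directly. The sketch below enumerates all binary input vectors of a given length and labels each one by whether it is mirror-symmetric about its centre. The helper names and the length of 6 are illustrative choices for this note.

```python
from itertools import product

def is_mirror_symmetric(bits):
    """True if the sequence reads the same from both ends."""
    return list(bits) == list(reversed(bits))

def symmetry_dataset(n):
    """All 2**n binary vectors of length n, each paired with a 0/1 target."""
    return [(bits, 1.0 if is_mirror_symmetric(bits) else 0.0)
            for bits in product((0, 1), repeat=n)]

data = symmetry_dataset(6)  # length 6 is an illustrative choice
num_symmetric = sum(1 for _, label in data if label == 1.0)
```

With length 6 this yields 64 cases, of which only the vectors whose second half mirrors the first are positive examples.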
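Drawback of the learning procedure: the error surface may contain local minima, so gradient descent is not guaranteed to find the global minimum. The paper's response: in practice the network rarely gets stuck in local minima that are significantly worse than the global minimum. The authors only saw this behaviour in networks with just enough connections to perform the task. Adding a few more connections creates extra dimensions in weight space, and these provide paths around the barriers that cause poor local minima.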
