The weight initialization problem in feedforward neural networks (FNN)

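Personal notes written while completing a paper.
^ _ ^ Criticism and corrections are welcome.
This article studies weight initialization in fully connected multilayer neural networks, using an 8-20-30-1 MLP as the experimental subject. A neural network is a structure whose components are tightly coupled: the input data, the scaling of the input data, how the input is split into batches, the initial weights of each hidden layer, the number of nodes, the choice of activation function, the number of layers, the choice of output function, and the number of output nodes all influence one another. Among these, weight initialization is strongly tied to the choice of activation function and to the input data, and different initializations can give rise to different problems.
In what follows, "the BP algorithm" refers to the standard back-propagation algorithm.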

I-9:
D. Erdogmus, O. Fontenla-Romero, J. C. Principe, A. Alonso-Betanzos and E. Castillo, “Linear-least-squares initialization of multilayer perceptrons through backpropagation of the desired response,” in IEEE Transactions on Neural Networks, vol. 16, no. 2, pp. 325-337, March 2005, doi: 10.1109/TNN.2004.841777.

Network structure: multi-input, multi-output multilayer network (the paper explains the algorithm on a single-hidden-layer MLP)
Activation function (non-linearity):
Cost function to minimize: the MSE of each PE (processing element)
Type of initialization algorithm: LS (least squares)
Key quantities in the algorithm: the desired and the actual output values, both before and after the non-linearity, on each layer
Algorithm description:
The y output matrix of the training set is taken as the desired output of the output layer. For each y, the inverse of the activation function gives the desired value of the output before the non-linearity; then, through ...
Experimental datasets:
Evaluation metrics:
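The one step the note does describe, mapping desired outputs back through the inverse activation, can be sketched as follows. This is only an illustration under assumptions that are not in the note: a tanh output non-linearity, targets strictly inside (-1, 1), and an ordinary least-squares fit of the output layer alone. The function name `ls_output_layer` is hypothetical, and the paper's full layer-by-layer procedure is not reproduced.

```python
import numpy as np

def ls_output_layer(hidden, targets, eps=1e-6):
    """Fit output-layer weights by least squares on pre-activation targets.

    hidden:  (n_samples, n_hidden) hidden-layer outputs
    targets: (n_samples, n_out) desired outputs, assumed to lie in (-1, 1)
    """
    # Desired value *before* the tanh non-linearity: apply the inverse function.
    clipped = np.clip(targets, -1.0 + eps, 1.0 - eps)
    desired_pre = np.arctanh(clipped)
    # Append a column of ones so the bias is fitted together with the weights.
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    solution, *_ = np.linalg.lstsq(design, desired_pre, rcond=None)
    return solution[:-1], solution[-1]  # weights (n_hidden, n_out), bias (n_out,)

# Synthetic check: targets generated by a known output layer (assumed data).
rng = np.random.default_rng(0)
hidden = np.tanh(rng.normal(size=(200, 5)))
true_w = 0.5 * rng.normal(size=(5, 2))
true_b = np.array([0.1, -0.2])
targets = np.tanh(hidden @ true_w + true_b)
w, b = ls_output_layer(hidden, targets)
```

Because the targets here are noise-free, the fit recovers the generating weights; with real data the least-squares solution only gives a starting point for training.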

D-4:
Petr Dolezel, Pavel Skrabanek, Lumir Gago, Weight Initialization Possibilities for Feedforward Neural Network with Linear Saturated Activation Functions, IFAC-PapersOnLine, Volume 49, Issue 25, 2016, Pages 49-54, ISSN 2405-8963, https://doi.org/10.1016/j.ifacol.2016.12.009.

Network structure: multi-input, single-output (maps the input vector x to a scalar value y) piecewise-linear neural network
Training algorithm: Levenberg-Marquardt; loss function: MSE
No pruning
No scaling
Activation function: symmetric linear saturated activation function (property: not differentiable everywhere); output layer function: v
Experimental datasets: four function-approximation tasks (a continuous function, a discontinuous function, a simulated first-order time series, and a real dynamic system, the twin rotor aerodynamic device)
Best initialization method: Nguyen-Widrow method
Evaluation metrics: convergence speed (number of epochs) and performance (after a defined number of epochs)
The paper compares four weight initialization methods on function approximation in terms of convergence speed and accuracy, using four datasets: a continuous function, a discontinuous function, a time series, and a real dynamic system.
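For reference, a symmetric saturating linear activation can be sketched as below. The saturation limits of -1 and +1 are an assumption (the note does not state them), and the names `sat_linear` and `sat_linear_grad` are hypothetical. The sketch makes the "not differentiable everywhere" remark concrete: the slope jumps at the two kinks.

```python
import numpy as np

def sat_linear(x):
    """Identity on [-1, 1], clipped to -1 or +1 outside (assumed limits)."""
    return np.clip(x, -1.0, 1.0)

def sat_linear_grad(x):
    """Slope of sat_linear: 1 strictly inside (-1, 1), 0 elsewhere.

    The derivative does not exist at x = -1 and x = +1; returning 0 there
    is a convention chosen for this sketch.
    """
    return ((x > -1.0) & (x < 1.0)).astype(float)

x = np.array([-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0])
y = sat_linear(x)
g = sat_linear_grad(x)
```

A unit whose pre-activation starts outside [-1, 1] has zero slope, so initial weights that push many units into saturation leave a gradient-based trainer with little to work with.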

F-6:
X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010.

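Training algorithm: back-propagation; optimizer: SGD; cost function: logistic regression; activation function: softsign; output layer function: LogSoftmax
Loss function: NLLLoss
This is the paper behind the well-known Xavier weight initialization. It states its precondition explicitly: the network is assumed to operate in the linear regime. The core idea is to keep the variance of the activations unchanged in the forward pass and the variance of the gradients unchanged in the backward pass, and then to derive from these two conditions the interval from which the weights are drawn.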
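The commonly quoted form of this result draws each weight uniformly from [-limit, +limit] with limit = sqrt(6 / (fan_in + fan_out)), which gives a weight variance of 2 / (fan_in + fan_out) as a compromise between the forward and backward conditions. A minimal NumPy sketch follows; the function name `xavier_uniform` and the layer sizes are illustrative choices.

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a (fan_in, fan_out) weight matrix from U[-limit, +limit]."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = xavier_uniform(300, 200, rng)

# Variance of U[-a, a] is a^2 / 3, so the target is 2 / (fan_in + fan_out).
target_var = 2.0 / (300 + 200)
empirical_var = W.var()
```

Comparing `empirical_var` with `target_var` for a few layer sizes is a quick way to check that an implementation matches the formula.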

C-3:
K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: ICCV, 2015.

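This is the paper behind the well-known Kaiming He initialization. Working with CNNs, and after proposing the new activation function PReLU, it designs a weight initialization method that accounts for the non-linearities. The core idea is again to keep the variance of the activations unchanged in the forward pass and the variance of the gradients unchanged in the backward pass.
The effectiveness of this method shows how closely weight initialization is tied to the activation function: it is not only a matter of requiring each layer's weighted output to fall within the range where the activation function is differentiable and effective.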
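The forward-pass condition leads to zero-mean Gaussian weights with standard deviation sqrt(2 / fan_in) for ReLU, or sqrt(2 / ((1 + a^2) * fan_in)) for PReLU with negative slope a. The sketch below applies it to a fully connected stack rather than a CNN, which is a simplification made here. The function name `he_normal`, the depth, and the width are illustrative.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng, negative_slope=0.0):
    """Gaussian weights with std sqrt(2 / ((1 + a^2) * fan_in))."""
    std = np.sqrt(2.0 / ((1.0 + negative_slope ** 2) * fan_in))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 256))      # unit-variance inputs
for _ in range(10):                   # ten ReLU layers of width 256
    W = he_normal(256, 256, rng)
    x = np.maximum(x @ W, 0.0)

# With this scaling the mean squared activation should stay on the order of 1
# instead of shrinking or growing geometrically with depth.
second_moment = float(np.mean(x ** 2))
```

Replacing the factor 2 with 1 in the standard deviation roughly halves the second moment at every layer, which is the signal decay the paper sets out to avoid for rectifier networks.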

H-8:
Junfei Qiao, Sanyi Li, Wenjing Li, Mutual information based weight initialization method for sigmoidal feedforward neural networks,
Neurocomputing, Volume 207, 2016, Pages 676-683, ISSN 0925-2312,
https://doi.org/10.1016/j.neucom.2016.05.054.

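This paper uses quantities from information theory: entropy and mutual information. It computes the mutual information between each input variable and the output variable to judge how much useful information the input carries, and sets the interval of the corresponding weights accordingly: the more strongly an input is related to the output, the more useful information it carries, and the wider its weight interval.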
RMSE
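The idea of widening the weight interval for more informative inputs can be illustrated as follows. This is not the paper's formula. The histogram-based mutual information estimate, the proportional scaling rule, the `base` half-width, and the names `mutual_information` and `mi_scaled_weights` are all assumptions made for this sketch.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram estimate of I(x; y) in nats for two 1-D samples."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    mask = pxy > 0
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def mi_scaled_weights(X, y, n_hidden, rng, base=0.5):
    """First-layer weights whose interval half-width grows with I(x_j; y)."""
    mi = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    half_width = base * mi / mi.max()     # assumed rule: proportional to MI
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden)) * half_width[:, None]
    return W, mi

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = X[:, 0] + 0.1 * rng.normal(size=5000)   # only input 0 is informative
W, mi = mi_scaled_weights(X, y, n_hidden=4, rng=rng)
```

In this synthetic setup, the weights attached to the two uninformative inputs start in a much narrower interval than those attached to input 0.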

B-2:
J.Y.F. Yam, T.W.S. Chow
A weight initialization method for improving training speed in feedforward neural network
Neurocomputing, 30 (1–4) (2000), pp. 219-232

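Network: training algorithm: back-propagation; optimizer: SGD; loss function: E; activation function: sigmoid; output layer function: sigmoid
Paper B addresses function approximation problems.
Algorithm description: to ensure that the output of every layer falls within the active region of the next layer's activation function, the method first derives a set of intervals describing that active region. If the next layer uses a channel-wise activation function, the set contains only one interval, given by two scalars. The requirement that each layer's output lie in this interval is written as an inequality; using the Cauchy inequality and statistical arguments, the weight interval is then obtained from the distribution of the weights, the sum of squares of the input data, the number of channels, and s. The procedure runs from the first layer to the last.
In fact, this method belongs to the IA class of methods.

G-7: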
S.P. Adam, D.A. Karras, G.D. Magoulas, M.N. Vrahatis
Solving the linear interval tolerance problem for weight initialization of neural networks
Neural Netw., 54 (2014), pp. 17-37

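Paper G treats weight initialization as a linear interval tolerance problem. The LIT-Approach it uses is one of the IA-class methods for weight initialization; the main feature of the linear interval tolerance problem is that it takes into account the uncertainty of the parameters whose intervals are being estimated. Initialization algorithms that use the entropy of the input data can often obtain a better-fitting initialization, but at a much higher computational cost, while initialization that ignores the input data entirely does not in practice give good results. The LIT-Approach takes the input data into account through its elementary statistics (sample mean).
A distinguishing feature of method G is that different layers of the network are initialized with different methods.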

E-5:
H. H. Tan and K. H. Lim, “Vanishing Gradient Mitigation with Deep Learning Neural Network Optimization,” 2019 7th International Conference on Smart Computing & Communications (ICSCC), Sarawak, Malaysia, Malaysia, 2019, pp. 1-4, doi: 10.1109/ICSCC.2019.8843652.

J-10:
Y. Lee, S. -. Oh and M. W. Kim, “The effect of initial weights on premature saturation in back-propagation learning,” IJCNN-91-Seattle International Joint Conference on Neural Networks, Seattle, WA, USA, 1991, pp. 765-770 vol.1, doi: 10.1109/IJCNN.1991.155275.

A-1:
Nguyen, D., & Widrow, B. (1990). Improving the learning speed of two-layer neural networks by choosing initial values of the adaptive weights. In Proceedings of the international joint conference on neural networks, IJCNN’90, Ann Arbor, MI, vol. 3 (pp. 21–26).

O-11:
Nguyen and Widrow, “The truck backer-upper: an example of self-learning in neural networks,” International 1989 Joint Conference on Neural Networks, Washington, DC, USA, 1989, pp. 357-363 vol.2, doi: 10.1109/IJCNN.1989.118723.
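Truck emulator and truck controller
Emulator: 7 inputs, 7 outputs; dataset: the set of all possible transitions from a known state, given a signal, to the next state, fully simulating the truck backing-up process. Controller: 7 inputs, 1 output; dataset: the set of states and signals in the Cartesian coordinate system.
Emulator network: training algorithm: back-propagation; optimizer: steepest descent; loss function: MSE; activation function: sigmoid; output layer function: v
These two papers concern the well-known Nguyen-Widrow algorithm. In essence it also determines an interval for the weights, and the quantities used to determine it are the number of neurons in the hidden layer and the dimension of the input and weight vectors.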
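In its commonly cited form, the Nguyen-Widrow rule for a hidden layer with H neurons and n inputs (inputs assumed scaled to [-1, 1]) sets a scale factor beta = 0.7 * H^(1/n). It then normalizes each neuron's random weight vector to length beta and draws the biases uniformly from [-beta, beta]. The sketch below follows that form. The function name `nguyen_widrow` is hypothetical, and the 2-input, 21-neuron sizes are borrowed from the x1x2-21-y network mentioned in these notes.

```python
import numpy as np

def nguyen_widrow(n_inputs, n_hidden, rng):
    """Hidden-layer weights and biases in the commonly cited N-W form."""
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # Random directions first, then rescale every row to Euclidean norm beta.
    W = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs))
    W *= beta / np.linalg.norm(W, axis=1, keepdims=True)
    # Biases spread the neurons' active regions across the input range.
    b = rng.uniform(-beta, beta, size=n_hidden)
    return W, b, beta

rng = np.random.default_rng(0)
W, b, beta = nguyen_widrow(n_inputs=2, n_hidden=21, rng=rng)
row_norms = np.linalg.norm(W, axis=1)
```

The dependence on both H and n is what the note refers to: more hidden neurons or fewer input dimensions give a larger beta, so each neuron covers a narrower slice of the input space.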

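Novelty of this work:
When an interval method is used for weight initialization, a single initialization applies both endpoint values of the interval.
Take the 8 papers and, on an x1x2-21-y MLP (the N-W network), simulate different training algorithms, optimizers, loss functions, learning rates, and activation-function update schemes on the same classification problem (no) and the same function approximation problem, using one common set of visualization methods. This shows how the parameters are actually updated and how the loss decreases; for any distinctive behaviour observed along the way, a weight initialization method suited to resolving it can then be applied.
Limitations: the influence of the input data set is very large; however, the approach in this work is more general, computationally cheap, and modular.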
