Reason for Random Initialization - Neural Networks
The Symmetry Problem
If, for some layer $l$ of a neural network, all parameters $\omega^{(l)}_{i,j}$ of that layer share the same initial value, then at every iteration of gradient descent:

$$\begin{cases} \omega^{(l-1)}_{1,j} = \omega^{(l-1)}_{2,j}, & 0 \le j \le s_{l-1}, \\ \omega^{(l)}_{i,1} = \omega^{(l)}_{i,2}, & 1 \le i \le s_{l+1}, \end{cases} \qquad 2 \le l \le L - 1.$$
Taking the figure below as an example, the weights represented by any two line segments of the same color are equal.
Proof
We proceed by mathematical induction.
Assume the proposition holds before the current iteration (it holds before the first iteration because all weights are initialized to the same value).
Since $a^{(l)}_{i} = g\left( \sum\limits_{j=0}^{s_{l-1}} \omega^{(l-1)}_{i,j}\, a^{(l-1)}_{j} \right), \; 1 \le i \le s_{l}$, where $g$ is the logistic function,

it follows that $a^{(l)}_{1} = a^{(l)}_{2}$.

Since $\dfrac{\partial}{\partial \omega^{(l)}_{i,j}} J = \delta^{(l+1)}_{i}\, a^{(l)}_{j}, \; 1 \le i \le s_{l+1}$, where $J$ is the cost function,

it follows that $\dfrac{\partial}{\partial \omega^{(l)}_{i,1}} J = \dfrac{\partial}{\partial \omega^{(l)}_{i,2}} J, \; 1 \le i \le s_{l+1}. \tag{1}$

Since $\delta^{(l)}_{j} = a^{(l)}_{j} \left( 1 - a^{(l)}_{j} \right) \sum\limits_{i=1}^{s_{l+1}} \omega^{(l)}_{i,j}\, \delta^{(l+1)}_{i}$,

it follows that $\delta^{(l)}_{1} = \delta^{(l)}_{2}$.

Since $\dfrac{\partial}{\partial \omega^{(l-1)}_{i,j}} J = \delta^{(l)}_{i}\, a^{(l-1)}_{j}, \; 1 \le i \le s_{l}$,

it follows that $\dfrac{\partial}{\partial \omega^{(l-1)}_{1,j}} J = \dfrac{\partial}{\partial \omega^{(l-1)}_{2,j}} J, \; 0 \le j \le s_{l-1}. \tag{2}$

By (1) and (2), the proposition also holds before the next iteration, which completes the induction. $\blacksquare$
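The symmetry argument above can be checked numerically. The sketch below (the 2-3-1 layer sizes, learning-free single backward pass, and squared-error cost are illustrative assumptions, not from the original text) builds a tiny logistic network, computes the gradients once, and confirms that with identical initial weights the gradient rows of a layer are equal, while random initialization breaks this symmetry:

```python
import numpy as np

def g(z):
    # Logistic function, as in the proof above.
    return 1.0 / (1.0 + np.exp(-z))

def gradients(W1, W2, x, y):
    """One forward/backward pass for a 2-layer logistic network
    with squared-error cost J = 0.5 * (a2 - y)^2 (biases omitted for brevity)."""
    a1 = g(W1 @ x)                           # hidden activations a^{(2)}
    a2 = g(W2 @ a1)                          # output activation a^{(3)}
    d2 = (a2 - y) * a2 * (1 - a2)            # delta^{(3)}
    d1 = (W2.T @ d2) * a1 * (1 - a1)         # delta^{(2)}
    return np.outer(d1, x), np.outer(d2, a1) # dJ/dW1, dJ/dW2

x = np.array([0.5, -1.0])
y = 1.0

# Symmetric initialization: every weight in a layer starts at 0.1.
W1 = np.full((3, 2), 0.1)
W2 = np.full((1, 3), 0.1)
G1, G2 = gradients(W1, W2, x, y)
print(np.allclose(G1[0], G1[1]))       # rows of dJ/dW1 identical: True
print(np.isclose(G2[0, 0], G2[0, 1]))  # columns of dJ/dW2 identical: True

# Random initialization breaks the symmetry.
rng = np.random.default_rng(0)
R1, R2 = gradients(rng.normal(scale=0.1, size=(3, 2)),
                   rng.normal(scale=0.1, size=(1, 3)), x, y)
print(np.allclose(R1[0], R1[1]))       # False: hidden units now differ
```

Because the symmetric-weight gradients are identical row by row, every gradient-descent update preserves the symmetry, so all hidden units of a layer compute the same function forever; this is exactly why the initial weights must be drawn at random.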