How to compute the Hessian matrix of a neural network trained with momentum (literature survey)

According to [4], "Though results on the Hessian of individual layers were not included in this study", so it seems that each layer has its own Hessian matrix.

According to [5], the Hessian of the last layer is easy to compute, but for the next layer down it becomes much harder.
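
A small worked example of why the last layer is the easy case. The setup is my own assumption and is not taken from [5]: a single linear output unit $y = w^{T} a$ with squared loss $E = \frac{1}{2}(y - t)^{2}$, where $a$ is the activation vector feeding the last layer and $t$ is the target.

$$\frac{\partial E}{\partial w} = (y - t)\, a, \qquad \frac{\partial^{2} E}{\partial w\, \partial w^{T}} = a\, a^{T}$$

Here the Hessian depends only on the input $a$ of the last layer. For weights in an earlier layer, $a$ itself depends on those weights through the nonlinearities, so the second derivatives pick up additional chain-rule terms.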

The following theoretical treatments of the Hessian matrix may be useful, so I am noting them down here:
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
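[7] explains very clearly how the result of differentiation depends on whether the variable in the denominator is transposed, as follows: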
For $x=\left(x_{1} \dots x_{N}\right)^{T}$:

$$\frac{\partial f(x)}{\partial x}=\left(\begin{array}{c}{\frac{\partial f(x)}{\partial x_{1}}} \\ {\frac{\partial f(x)}{\partial x_{2}}} \\ {\vdots} \\ {\frac{\partial f(x)}{\partial x_{N}}}\end{array}\right)$$

$$\left(\frac{\partial f(x)}{\partial x}\right)^{T}=\frac{\partial f(x)}{\partial x^{T}}=\left(\frac{\partial f(x)}{\partial x_{1}} \quad \frac{\partial f(x)}{\partial x_{2}} \quad \ldots \quad \frac{\partial f(x)}{\partial x_{N}}\right)$$

$$\frac{\partial^{2} f(x)}{\partial x \partial x^{T}}=\left(\begin{array}{cccc}{\frac{\partial^{2} f(x)}{\partial x_{1}^{2}}} & {\frac{\partial^{2} f(x)}{\partial x_{1} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{1} \partial x_{N}}} \\ {\frac{\partial^{2} f(x)}{\partial x_{2} \partial x_{1}}} & {\frac{\partial^{2} f(x)}{\partial x_{2}^{2}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{2} \partial x_{N}}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {\frac{\partial^{2} f(x)}{\partial x_{N} \partial x_{1}}} & {\frac{\partial^{2} f(x)}{\partial x_{N} \partial x_{2}}} & {\cdots} & {\frac{\partial^{2} f(x)}{\partial x_{N}^{2}}}\end{array}\right)$$
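
The definition of the second-derivative matrix above can be checked numerically. The following is a minimal sketch and is not taken from [7]: `numerical_hessian` and the test function `f` are hypothetical names chosen for illustration, and central finite differences stand in for exact differentiation.

```python
def numerical_hessian(f, x, eps=1e-4):
    """Approximate the matrix of second partials d^2 f / (dx_i dx_j) by central differences."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                # Evaluate f at x with coordinate i moved by di and coordinate j moved by dj.
                y = list(x)
                y[i] += di
                y[j] += dj
                return f(y)
            H[i][j] = (shifted(eps, eps) - shifted(eps, -eps)
                       - shifted(-eps, eps) + shifted(-eps, -eps)) / (4 * eps * eps)
    return H


def f(x):
    # Hypothetical test function: f(x) = x1^2 * x2 + 3 * x2^2
    return x[0] ** 2 * x[1] + 3.0 * x[1] ** 2


H = numerical_hessian(f, [1.0, 2.0])
for row in H:
    print(row)
```

For this test function the analytic second partials at (1, 2) are 4, 2, 2 and 6, so the printed matrix should be close to those values and symmetric. Forming the full matrix this way costs on the order of N² function evaluations, which is why it is only practical for very small N.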

－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

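The formulas were pasted with the Mathpix Snipping Tool. This is the first time I have noticed that its screenshot-to-LaTeX conversion can be inaccurate, sigh…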
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－
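Equations (2.16)~(2.18) in [1] could not be verified.
Equations (3.1)~(3.3) use an odd symbol δ whose meaning is never explained.
The definition of $b_{ni}$ in (2.8) is strange. Comparing (2.15) with (2.12) in [1] shows that the paper is about neural networks for binary classification. I could not reach the author and eventually gave up reading it.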

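[3] uses a spring oscillator to model the oscillations of a neural network during training, and explains from two angles, the differential equation and the difference equation, why the momentum optimizer speeds up convergence.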
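
The difference-equation view can be tried out on a one-dimensional quadratic. This is a minimal sketch under my own assumptions, not code from [3]: the function name `heavy_ball`, the loss E(w) = 0.5 * k * w^2 and the parameter values are all chosen for illustration only.

```python
def heavy_ball(k, lr, mu, steps, w0=1.0):
    """Minimise E(w) = 0.5 * k * w^2 with gradient descent plus momentum.

    Update rule: v <- mu * v - lr * dE/dw,  w <- w + v.
    With mu = 0 this reduces to plain gradient descent.
    """
    w, v = w0, 0.0
    trajectory = [w]
    for _ in range(steps):
        grad = k * w              # dE/dw, the "spring force" pulling w back to 0
        v = mu * v - lr * grad    # velocity keeps a fraction mu of its previous value
        w = w + v
        trajectory.append(w)
    return trajectory


plain = heavy_ball(k=1.0, lr=0.01, mu=0.0, steps=200)
momentum = heavy_ball(k=1.0, lr=0.01, mu=0.9, steps=200)
print(abs(plain[-1]), abs(momentum[-1]))
```

With these values the momentum run overshoots zero and oscillates like an underdamped spring, yet ends much closer to the minimum than plain gradient descent after the same number of steps.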

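I contacted the author of [4]. The reply was that reproducing the results requires a large amount of Google's hardware plus dedicated scripts, so it cannot be done at home, and even the author no longer has the code at hand.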
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

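As for "Hessian-free", it means computing the product Hv rather than H itself, which avoids the enormous cost of forming H explicitly.
The purpose of computing $H^{-1}v$ is that it appears in the update step of the second-order Newton method when training a neural network.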
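
Both ideas can be sketched in a few lines. This is an illustration under my own assumptions, not code from any of the repositories listed below: the product Hv is approximated by a central difference of gradients (a stand-in for the exact R-operator), and $H^{-1}v$ is obtained by conjugate gradient using only such products. The names `grad`, `hessian_vector_product` and `conjugate_gradient` and the quadratic test function are hypothetical.

```python
def grad(w):
    # Gradient of the hypothetical test function
    # f(w) = 0.5 * (3*w0^2 + 2*w0*w1 + 2*w1^2), whose Hessian is [[3, 1], [1, 2]].
    return [3.0 * w[0] + w[1], w[0] + 2.0 * w[1]]


def hessian_vector_product(grad_fn, w, v, eps=1e-5):
    """Approximate H v by a central difference of gradients; H is never formed."""
    plus = grad_fn([wi + eps * vi for wi, vi in zip(w, v)])
    minus = grad_fn([wi - eps * vi for wi, vi in zip(w, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(hvp, b, iters=50, tol=1e-10):
    """Solve H x = b for symmetric positive definite H using only products H v."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - H x for the starting point x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(iters):
        if rs_old < tol:     # squared residual norm small enough: stop
            break
        Hp = hvp(p)
        alpha = rs_old / dot(p, Hp)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, Hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


w = [0.5, -1.0]
v = [1.0, 1.0]
Hv = hessian_vector_product(grad, w, v)
newton_dir = conjugate_gradient(lambda p: hessian_vector_product(grad, w, p), v)
print(Hv, newton_dir)
```

For this test function Hv should be close to [4, 3] and the solution of H x = v close to [0.2, 0.4]. Each Hv product costs two gradient evaluations, so the work per conjugate gradient iteration scales like a gradient computation rather than like building the full matrix.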
－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－－

################## The following GitHub links are related to Hessian-free ##################
The author of [7] has stopped replying, so I gave up on it.

https://github.com/drasmuss/hessianfree
The code here is mainly the conjugate gradient method; it simply leaves out the operations involving the Jacobian and the Hessian.

https://github.com/NithinTangellamudi/HessianFreeImplementation
The code is full of syntax errors, so I gave up on it.

☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆☆
The items below are still under investigation:
#------------------------------------------------------------------------------------------

The code in [8] accompanies the paper [9].
The Hessian-free part of the code is as follows:

    import theano.tensor as T

    # `output`, `params`, `opt_cost` and `ridge` are not defined in this excerpt;
    # they are expected to be in scope where the function is defined.
    def gauss_vect_mult(v):
        """Multiply a vector by the Gauss-Newton matrix JHJ'
        where J is the Jacobian between output and params and
        H is the Hessian between costs and output.
        H should be diagonal and positive.
        Also add the ridge.
        """
        Jv = T.Rop(output, params, v)
        HJv = T.Rop(T.grad(opt_cost, output), output, Jv)
        JHJv = T.Lop(output, params, HJv)
        if not isinstance(JHJv, list):
            JHJv = [JHJv]
        JHJv = [a + ridge * b for a, b in zip(JHJv, v)]
        return JHJv

I emailed the author to ask for the theoretical justification, but got no reply.
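
To make the three steps of the Gauss-Newton product concrete, here is a minimal sketch with small explicit matrices. It is my own illustration and not code from [8]: the function and variable names and the toy values are hypothetical, and J is stored as an ordinary list of rows instead of being obtained through `T.Rop` and `T.Lop`.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def gauss_newton_vector_product(J, H_diag, v, ridge=0.0):
    """Return (J' H J + ridge * I) v without ever forming J' H J.

    J      : Jacobian of the outputs w.r.t. the parameters (n_out x n_params)
    H_diag : diagonal of the Hessian of the cost w.r.t. the outputs
    """
    Jv = matvec(J, v)                            # parameter direction -> output direction
    HJv = [h * x for h, x in zip(H_diag, Jv)]    # apply the diagonal output Hessian
    JHJv = matvec(transpose(J), HJv)             # map back to parameter space
    return [g + ridge * vi for g, vi in zip(JHJv, v)]


# Hypothetical toy values: 2 outputs, 3 parameters.
J = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
H_diag = [2.0, 0.5]
v = [1.0, -1.0, 2.0]
Gv = gauss_newton_vector_product(J, H_diag, v, ridge=0.1)
print(Gv)
```

The three products only ever touch vectors of size n_out or n_params, which is the point of the approach: the n_params x n_params matrix is never stored. The ridge term adds a multiple of v, which keeps the effective matrix positive definite when H is positive.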
#------------------------------------------------------------------------------------------
The code in [10] is part of the paper [11].

The Hessian-free part of the code is as follows:

    import theano.tensor as T

    def gauss_newton_product(cost, p, v, s):  # this computes the product Gv = J'HJv (G is the Gauss-Newton matrix)
        Jv = T.Rop(s, p, v)
        HJv = T.grad(T.sum(T.grad(cost, s)*Jv), s, consider_constant=[Jv], disconnected_inputs='ignore')
        Gv = T.grad(T.sum(HJv*s), p, consider_constant=[HJv, Jv], disconnected_inputs='ignore')
        Gv = map(T.as_tensor_variable, Gv)  # for CudaNdarray
        return Gv

I emailed the author to ask for the theoretical justification, but got no reply.
#------------------------------------------------------------------------------------------
[12] involves meta-learning.

References:
[1] Exact Calculation of the Hessian Matrix for the Multilayer Perceptron
[2] A Fast Procedure for Re-training the Multilayer Perceptron
[3] On the Momentum Term in Gradient Descent Learning Algorithms
[4] Negative Eigenvalues of the Hessian in Deep Neural Networks
[5] Most efficient way to calculate hessian of cost function in neural network
[6] https://onlinelibrary.wiley.com/doi/pdf/10.1002/9780470173862.app4
[7] https://github.com/moonl1ght/HessianFreeOptimization/issues/1
[8] https://github.com/doomie/HessianFree
[9] Improved Preconditioner for Hessian Free Optimization
[10] https://github.com/boulanni/theano-hf
[11] Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
[12] https://github.com/ozzzp/MLHF
