Matrix Derivatives: Numerator Layout and Denominator Layout

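This question suddenly came back to me. I had just seen someone asking about a particular formula, and when I tried to work it out on my own I found I had forgotten almost all of it, so I have tidied up some notes here for future reference.
In fact, the most detailed treatment of matrix derivatives is still the Wikipedia page: http://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions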
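Matrix derivatives are written in different forms in different places. It all comes down to one question: when an $m$-dimensional vector $\mathbf{y}$ is differentiated with respect to an $n$-dimensional vector $\mathbf{x}$, should the result $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ be $m \times n$ or $n \times m$? See Wikipedia for the details.
The elements of $\mathbf{y}$ can be laid out as a column and those of $\mathbf{x}$ as a row, or the other way round, which leads to the following possibilities:
1. Numerator layout: lay out according to $\mathbf{y}$ and $\mathbf{x}^T$, giving an $m \times n$ result. Also called the Jacobian formulation.
2. Denominator layout: lay out according to $\mathbf{y}^T$ and $\mathbf{x}$, giving an $n \times m$ result. Also called the Hessian formulation.
3. A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces the same results as the numerator layout.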
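
The difference between the first two layouts can be checked numerically. The following is a minimal sketch in plain Python, not something taken from the Wikipedia page: the test function `f`, the helper names `jacobian_numerator` and `transpose`, and the step size `h` are all assumptions made for illustration. It approximates the derivative of a function with $m = 3$ outputs and $n = 2$ inputs by forward differences, stores it in numerator layout, and obtains the denominator layout by transposing.

```python
def jacobian_numerator(f, x, h=1e-6):
    """Forward-difference Jacobian in numerator layout: m rows (outputs) by n columns (inputs)."""
    n = len(x)
    fx = f(x)
    m = len(fx)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        step = list(x)
        step[j] += h                      # perturb only the j-th input
        f_step = f(step)
        for i in range(m):
            jac[i][j] = (f_step[i] - fx[i]) / h
    return jac


def transpose(a):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*a)]


def f(x):
    # Hypothetical test function with m = 3 outputs and n = 2 inputs.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]


x0 = [2.0, 3.0]
num = jacobian_numerator(f, x0)   # 3 x 2: entry [i][j] approximates dy_i/dx_j
den = transpose(num)              # 2 x 3: entry [j][i] approximates dy_i/dx_j

print(len(num), len(num[0]))      # rows and columns in numerator layout
print(len(den), len(den[0]))      # rows and columns in denominator layout
```

Both arrays hold the same partial derivatives. Only the roles of rows and columns are swapped, which is exactly the $m \times n$ versus $n \times m$ question raised above.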

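When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ (scalar by vector) and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$ (vector by scalar), we have the same issues. To be consistent, we should do one of the following:
1. If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.
2. If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.
3. In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.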
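
A small worked instance may make the two consistent choices concrete. The functions below are assumed examples, not taken from the source. Let $\mathbf{a} \in \mathbb{R}^n$ be constant and $y = \mathbf{a}^T \mathbf{x}$, so that $\frac{\partial y}{\partial x_i} = a_i$. Then

$$
\text{numerator layout: } \frac{\partial y}{\partial \mathbf{x}} = \mathbf{a}^T \;\; (1 \times n),
\qquad
\text{denominator layout: } \frac{\partial y}{\partial \mathbf{x}} = \mathbf{a} \;\; (n \times 1).
$$

For the opposite case, let $\mathbf{c} \in \mathbb{R}^m$ be constant and $\mathbf{y} = x\,\mathbf{c}$ with scalar $x$, so that $\frac{\partial y_i}{\partial x} = c_i$. Then

$$
\text{numerator layout: } \frac{\partial \mathbf{y}}{\partial x} = \mathbf{c} \;\; (m \times 1),
\qquad
\text{denominator layout: } \frac{\partial \mathbf{y}}{\partial x} = \mathbf{c}^T \;\; (1 \times m).
$$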

Not all math textbooks and papers are consistent in this respect throughout the entire paper. That is, sometimes different conventions are used in different contexts within the same paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.

Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^T$, while consistent denominator layout lays out according to $\mathbf{Y}^T$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$, and laying the result out according to $\mathbf{Y}^T$, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:
1. "Consistent numerator layout", which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^T$.
2. "Mixed layout", which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.
3. Use the notation $\frac{\partial y}{\partial \mathbf{X}^T}$, with results the same as consistent numerator layout.
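
The scalar-by-matrix case can be illustrated with $y = \operatorname{tr}(\mathbf{A}\mathbf{X})$, for which $\frac{\partial y}{\partial x_{ij}} = a_{ji}$. The sketch below is an assumed example in plain Python, and the helper names `trace_of_product`, `scalar_by_matrix` and `transpose` as well as the sample matrices are hypothetical. It computes the derivative by forward differences, laid out like $\mathbf{X}$ (mixed or denominator layout), and then transposes it to get the layout according to $\mathbf{X}^T$ (consistent numerator layout).

```python
def trace_of_product(a, x):
    """Return tr(A X) for A of shape q x p and X of shape p x q (nested lists)."""
    q, p = len(a), len(x)
    return sum(a[k][i] * x[i][k] for k in range(q) for i in range(p))


def scalar_by_matrix(f, x, h=1e-6):
    """Forward-difference dy/dX laid out like X (p x q): mixed / denominator layout."""
    p, q = len(x), len(x[0])
    base = f(x)
    grad = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            bumped = [row[:] for row in x]
            bumped[i][j] += h             # perturb only entry x_ij
            grad[i][j] = (f(bumped) - base) / h
    return grad


def transpose(m):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*m)]


# Hypothetical data: A is 3 x 2 (q x p), X is 2 x 3 (p x q).
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
X = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]

like_x = scalar_by_matrix(lambda m: trace_of_product(A, m), X)  # p x q, approximates A^T
like_xt = transpose(like_x)                                     # q x p, approximates A
```

Under these assumptions `like_xt` agrees with `A` up to rounding, so the derivative is $\mathbf{A}$ in consistent numerator layout and $\mathbf{A}^T$ in mixed layout.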

In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.

Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a $k \times 1$ column vector, then the result using the numerator layout will be in the form of a $1 \times k$ row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
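
A simpler case than the maximum likelihood problem shows the same effect. The quadratic below is an assumed illustration, not from the source. Take $f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^T \mathbf{A}\mathbf{x} - \mathbf{b}^T \mathbf{x}$ with $\mathbf{x}, \mathbf{b} \in \mathbb{R}^k$ and $\mathbf{A}$ a symmetric positive definite $k \times k$ matrix. In numerator layout the derivative is a $1 \times k$ row vector:

$$
\frac{\partial f}{\partial \mathbf{x}} = \mathbf{x}^T \mathbf{A} - \mathbf{b}^T = \mathbf{0}^T
\;\;\Longrightarrow\;\;
\mathbf{x}^T \mathbf{A} = \mathbf{b}^T
\;\;\Longrightarrow\;\;
\mathbf{A}\mathbf{x} = \mathbf{b} \quad \text{(after transposing, using } \mathbf{A}^T = \mathbf{A}\text{)}.
$$

In denominator layout the derivative is the $k \times 1$ column vector $\mathbf{A}\mathbf{x} - \mathbf{b}$, which has the same shape as $\mathbf{x}$, so no final transpose is needed.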

The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
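
A few standard identities, written in both notations, show the transposition. Here $\mathbf{A}$ is assumed constant with respect to $\mathbf{x}$, and is assumed square in the last row:

$$
\begin{array}{lll}
\text{expression} & \text{numerator layout} & \text{denominator layout} \\
\hline
\dfrac{\partial\, \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} & \mathbf{A} & \mathbf{A}^T \\[2ex]
\dfrac{\partial\, \mathbf{x}^T \mathbf{A}}{\partial \mathbf{x}} & \mathbf{A}^T & \mathbf{A} \\[2ex]
\dfrac{\partial\, \mathbf{x}^T \mathbf{A}\mathbf{x}}{\partial \mathbf{x}} & \mathbf{x}^T (\mathbf{A} + \mathbf{A}^T) & (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x}
\end{array}
$$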

=== Numerator-layout notation ===

Using numerator-layout notation, we have (Minka, Thomas P. "Old and New Matrix Algebra Useful for Statistics." December 28, 2000. http://research.microsoft.com/en-us/um/people/minka/papers/matrix/):

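$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n}
\end{bmatrix}.
$$

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x} \\
\frac{\partial y_2}{\partial x} \\
\vdots \\
\frac{\partial y_m}{\partial x}
\end{bmatrix}.
$$

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}.
$$

$$
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\
\frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}
\end{bmatrix}.
$$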

The following definitions are only provided in numerator-layout notation:

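$$
\frac{\partial \mathbf{Y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}
\end{bmatrix}.
$$

$$
d\mathbf{X} =
\begin{bmatrix}
dx_{11} & dx_{12} & \cdots & dx_{1n} \\
dx_{21} & dx_{22} & \cdots & dx_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
dx_{m1} & dx_{m2} & \cdots & dx_{mn}
\end{bmatrix}.
$$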

=== Denominator-layout notation ===

Using denominator-layout notation, we have (http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.AppD.pdf):

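$$
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_1} \\
\frac{\partial y}{\partial x_2} \\
\vdots \\
\frac{\partial y}{\partial x_n}
\end{bmatrix}.
$$

$$
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x}
\end{bmatrix}.
$$

$$
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}.
$$

$$
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\
\frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}}
\end{bmatrix}.
$$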

Original article: https://blog.csdn.net/lansatiankongxxc/article/details/44992709
