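This question came back to me when I saw someone asking about a particular formula. I tried to work it out on my own and found I had forgotten most of it, so here is a short summary for future reference.
In fact, the most detailed treatment of matrix derivatives is still the Wikipedia page: http://en.wikipedia.org/wiki/Matrix_calculus#Layout_conventions
Matrix derivatives are written differently in different places. It all comes down to one question: when an m-dimensional vector $\mathbf{y}$ is differentiated with respect to an n-dimensional vector $\mathbf{x}$, should the result $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ be m×n or n×m? See Wikipedia for the details.
The elements of $\mathbf{y}$ can be laid out as a column and those of $\mathbf{x}$ as a row, or the other way around, which leads to different possibilities:
Numerator layout: lay out according to $\mathbf{y}$ and $\mathbf{x}^T$; also called the Jacobian formulation.
Denominator layout: lay out according to $\mathbf{y}^T$ and $\mathbf{x}$; also called the Hessian formulation.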
A third possibility sometimes seen is to insist on writing the derivative as $\frac{\partial \mathbf{y}}{\partial \mathbf{x}'}$ (i.e. the derivative is taken with respect to the transpose of $\mathbf{x}$) and follow the numerator layout. This makes it possible to claim that the matrix is laid out according to both numerator and denominator. In practice this produces results the same as the numerator layout.
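
The difference between the two layouts can be checked numerically. The sketch below is only an illustration: the map `f` from R^3 to R^2 and the helper `numerator_layout_jacobian` are hypothetical names made up for this example, and the derivative is approximated by central differences rather than computed symbolically.

```python
import numpy as np

def f(x):
    # Hypothetical example map from R^3 (n = 3) to R^2 (m = 2).
    return np.array([x[0] * x[1], x[1] + x[2] ** 2])

def numerator_layout_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian with entry (i, j) = d y_i / d x_j."""
    x = np.asarray(x, dtype=float)
    m = func(x).shape[0]
    n = x.shape[0]
    jac = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Column j holds the derivative of every y_i with respect to x_j.
        jac[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return jac

x0 = np.array([1.0, 2.0, 3.0])
J_num = numerator_layout_jacobian(f, x0)  # numerator layout: m x n
J_den = J_num.T                           # denominator layout: n x m
print(J_num.shape, J_den.shape)
```

The same partial derivatives appear in both arrays; only their arrangement differs.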

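When handling the gradient $\frac{\partial y}{\partial \mathbf{x}}$ and the opposite case $\frac{\partial \mathbf{y}}{\partial x}$, we have the same issues. To be consistent, we should do one of the following:
If we choose numerator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a row vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a column vector.
If we choose denominator layout for $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, we should lay out the gradient $\frac{\partial y}{\partial \mathbf{x}}$ as a column vector, and $\frac{\partial \mathbf{y}}{\partial x}$ as a row vector.
In the third possibility above, we write $\frac{\partial y}{\partial \mathbf{x}'}$ and $\frac{\partial \mathbf{y}}{\partial x}$, and use numerator layout.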
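
As a small worked example (the constant vector $\mathbf{a} \in \mathbb{R}^n$ and the scalar $t$ are assumptions introduced here for illustration), take the scalar $y = \mathbf{a}^T\mathbf{x}$ and the vector $\mathbf{y} = t\,\mathbf{a}$. The two consistent choices give:

$$\text{numerator layout:}\qquad \frac{\partial (\mathbf{a}^T\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}^T \;\; (1 \times n), \qquad \frac{\partial (t\,\mathbf{a})}{\partial t} = \mathbf{a} \;\; (n \times 1)$$

$$\text{denominator layout:}\qquad \frac{\partial (\mathbf{a}^T\mathbf{x})}{\partial \mathbf{x}} = \mathbf{a} \;\; (n \times 1), \qquad \frac{\partial (t\,\mathbf{a})}{\partial t} = \mathbf{a}^T \;\; (1 \times n)$$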

Not all math textbooks and papers are consistent in this respect throughout the entire paper. That is, sometimes different conventions are used in different contexts within the same paper. For example, some choose denominator layout for gradients (laying them out as column vectors), but numerator layout for the vector-by-vector derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$.

Similarly, when it comes to scalar-by-matrix derivatives $\frac{\partial y}{\partial \mathbf{X}}$ and matrix-by-scalar derivatives $\frac{\partial \mathbf{Y}}{\partial x}$, consistent numerator layout lays out according to $\mathbf{Y}$ and $\mathbf{X}^T$, while consistent denominator layout lays out according to $\mathbf{Y}^T$ and $\mathbf{X}$. In practice, however, following a denominator layout for $\frac{\partial \mathbf{Y}}{\partial x}$, and laying the result out according to $\mathbf{Y}^T$, is rarely seen because it makes for ugly formulas that do not correspond to the scalar formulas. As a result, the following layouts can often be found:
Consistent numerator layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}^T$.
Mixed layout, which lays out $\frac{\partial \mathbf{Y}}{\partial x}$ according to $\mathbf{Y}$ and $\frac{\partial y}{\partial \mathbf{X}}$ according to $\mathbf{X}$.
Use the notation $\frac{\partial y}{\partial \mathbf{X}^T}$, with results the same as consistent numerator layout.
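
To make the scalar-by-matrix case concrete, here is a minimal numerical sketch. The function $y = \operatorname{tr}(\mathbf{A}\mathbf{X})$, the sizes `p` and `q`, and the random matrices are assumptions chosen for illustration. For this function the derivative laid out according to $\mathbf{X}$ (mixed or denominator layout) is $\mathbf{A}^T$, and the one laid out according to $\mathbf{X}^T$ (numerator layout) is $\mathbf{A}$.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3
A = rng.standard_normal((q, p))  # hypothetical constant matrix, q x p
X = rng.standard_normal((p, q))  # the variable matrix, p x q

def y(X):
    # Scalar function of a matrix: y = tr(A X).
    return np.trace(A @ X)

eps = 1e-6
grad_like_X = np.zeros((p, q))
for i in range(p):
    for j in range(q):
        E = np.zeros((p, q))
        E[i, j] = eps
        # Entry (i, j) is d y / d x_ij, so the result has the shape of X.
        grad_like_X[i, j] = (y(X + E) - y(X - E)) / (2 * eps)

grad_numerator = grad_like_X.T  # laid out according to X^T: q x p
print(grad_like_X.shape, grad_numerator.shape)
```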

In the following formulas, we handle the five possible combinations $\frac{\partial y}{\partial \mathbf{x}}$, $\frac{\partial \mathbf{y}}{\partial x}$, $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$, $\frac{\partial y}{\partial \mathbf{X}}$ and $\frac{\partial \mathbf{Y}}{\partial x}$ separately. We also handle cases of scalar-by-scalar derivatives that involve an intermediate vector or matrix. (This can arise, for example, if a multi-dimensional parametric curve is defined in terms of a scalar variable, and then a derivative of a scalar function of the curve is taken with respect to the scalar that parameterizes the curve.) For each of the various combinations, we give numerator-layout and denominator-layout results, except in the cases above where denominator layout rarely occurs. In cases involving matrices where it makes sense, we give numerator-layout and mixed-layout results. As noted above, cases where vector and matrix denominators are written in transpose notation are equivalent to numerator layout with the denominators written without the transpose.

Keep in mind that various authors use different combinations of numerator and denominator layouts for different types of derivatives, and there is no guarantee that an author will consistently use either numerator or denominator layout for all types. Match up the formulas below with those quoted in the source to determine the layout used for that particular type of derivative, but be careful not to assume that derivatives of other types necessarily follow the same kind of layout.

When taking derivatives with an aggregate (vector or matrix) denominator in order to find a maximum or minimum of the aggregate, it should be kept in mind that using numerator layout will produce results that are transposed with respect to the aggregate. For example, in attempting to find the maximum likelihood estimate of a multivariate normal distribution using matrix calculus, if the domain is a k×1 column vector, then the result using the numerator layout will be in the form of a 1×k row vector. Thus, either the results should be transposed at the end or the denominator layout (or mixed layout) should be used.
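
The transposition issue shows up as soon as the derivative is used in an update step. The sketch below swaps in a simpler problem than the maximum likelihood example: it minimizes the least-squares objective $\frac{1}{2}\lVert \mathbf{A}\mathbf{x} - \mathbf{b}\rVert^2$ by gradient descent. The matrices, the step size 0.1 and the iteration count are all assumptions chosen so that this toy problem converges.

```python
import numpy as np

# Hypothetical data for a small least-squares problem.
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([[1.0], [2.0], [3.0]])

def derivative_numerator_layout(x):
    # d/dx of 0.5 * ||A x - b||^2 in numerator layout: (A x - b)^T A, a 1 x k row.
    return (A @ x - b).T @ A

x = np.zeros((2, 1))  # the unknown is a k x 1 column vector
for _ in range(500):
    row = derivative_numerator_layout(x)  # shape (1, k)
    # The row must be transposed before it can be subtracted from the column x.
    x = x - 0.1 * row.T

x_reference = np.linalg.lstsq(A, b, rcond=None)[0]
print(x.ravel(), x_reference.ravel())
```

With denominator layout the derivative would already be a k×1 column, $\mathbf{A}^T(\mathbf{A}\mathbf{x} - \mathbf{b})$, and no transpose would be needed in the update.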

The results of operations will be transposed when switching between numerator-layout and denominator-layout notation.
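
Two standard identities illustrate this, with $\mathbf{A}$ a constant matrix that does not depend on $\mathbf{x}$ (a worked example added for illustration):

$$\text{numerator layout:}\qquad \frac{\partial (\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}, \qquad \frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^T(\mathbf{A} + \mathbf{A}^T)$$

$$\text{denominator layout:}\qquad \frac{\partial (\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}^T, \qquad \frac{\partial (\mathbf{x}^T\mathbf{A}\mathbf{x})}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x}$$

Each denominator-layout result is the transpose of the corresponding numerator-layout result.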

=== Numerator-layout notation ===

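Using numerator-layout notation, we have (Minka, Thomas P. "Old and New Matrix Algebra Useful for Statistics." December 28, 2000. [http://research.microsoft.com/en-us/um/people/minka/papers/matrix/]):

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$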

The following definitions are only provided in numerator-layout notation:

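$$\frac{\partial \mathbf{Y}}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}.$$

$$d\mathbf{X} = \begin{bmatrix} dx_{11} & dx_{12} & \cdots & dx_{1n} \\ dx_{21} & dx_{22} & \cdots & dx_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ dx_{m1} & dx_{m2} & \cdots & dx_{mn} \end{bmatrix}.$$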
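
A quick way to see the matrix-by-scalar definition in action is to differentiate a small matrix-valued function entry by entry. In this sketch the 2×2 rotation matrix `Y(x)` and the helper `matrix_by_scalar` are hypothetical examples, and the derivative is approximated by central differences.

```python
import numpy as np

def Y(x):
    # Hypothetical example: a 2 x 2 rotation matrix depending on the scalar x.
    return np.array([[np.cos(x), -np.sin(x)],
                     [np.sin(x),  np.cos(x)]])

def matrix_by_scalar(func, x, eps=1e-6):
    # Entry (i, j) approximates d y_ij / d x, so the result has the shape of Y.
    return (func(x + eps) - func(x - eps)) / (2 * eps)

x0 = 0.3
dY_dx = matrix_by_scalar(Y, x0)

# Analytic derivative of each entry, for comparison.
dY_dx_exact = np.array([[-np.sin(x0), -np.cos(x0)],
                        [ np.cos(x0), -np.sin(x0)]])
print(np.max(np.abs(dY_dx - dY_dx_exact)))
```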

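=== Denominator-layout notation ===

Using denominator-layout notation, we have ([http://www.colorado.edu/engineering/CAS/courses.d/IFEM.d/IFEM.AppD.pdf]):

$$\frac{\partial y}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}.$$

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}.$$

$$\frac{\partial y}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \cdots & \frac{\partial y}{\partial x_{pq}} \end{bmatrix}.$$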

Original article: https://blog.csdn.net/lansatiankongxxc/article/details/44992709
