Learning the parts of object by NMF
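These are reading notes on the Letters to Nature article "Learning the parts of objects by non-negative matrix factorization". The article gives a theoretical analysis of how NMF, viewed as a neural network, can learn the parts of an object, and reports experiments on learning the parts of faces and the semantic features of text. These notes do not cover how to solve NMF; they only present the results of matrix factorization under non-negativity constraints.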
Rachel Zhang
1. Theoretical basis and Motivation
1. Part-based representation
There is psychological and physiological evidence for parts-based representations in the brain, and certain computational theories of object recognition rely on such representations.
2. What is NMF? Why use NMF in neural networks?
NMF: non-negative matrix factorization
Difference: PCA and VQ learn holistic rather than parts-based representations; NMF differs from them through its non-negativity constraints.
Virtues in a neural network:
1. Firing rates (number of spikes in a time window) of neurons are never negative.
2. Synaptic strengths do not change sign.
2. Applied Experiments
3. Results of applying PCA, VQ and NMF
Figure 1 shows how the three methods learn to represent a face as a linear combination of basis images.
VQ: discovers a basis consisting of whole-face prototypes.
PCA: discovers a basis of 'eigenfaces', some of which resemble distorted versions of whole faces.
NMF: discovers a basis consisting of localized features that correspond better with intuitive notions of the parts of faces.
Figure 1. Basis of NMF, VQ and PCA
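In the encodings shown in the figure, red denotes negative values and gray/black denotes positive values; the intensity of the color indicates the magnitude.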
4. Matrix Factorization framework
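Why does NMF arrive at a basis so different from those of PCA and VQ? To see this, we express all three methods within one matrix factorization framework. The image database is represented by an n×m matrix V, each column of which holds the n non-negative pixel values of one of the m face images. All three methods construct an approximate factorization V ≈ WH, or
$$V_{i\mu} \approx (WH)_{i\mu} = \sum_{a=1}^{r} W_{ia} H_{a\mu}$$
The r columns of W ∈ R^{n×r} are the basis images. Each column of H ∈ R^{r×m} is an encoding and corresponds to one face in V. The rank r is chosen so that (n+m)r < nm, which makes WH a compressed form of V. The differences between VQ, PCA and NMF are discussed next.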
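To make the shapes concrete, here is a minimal sketch in plain Python (standard library only). The helper names `matmul` and `is_compressed` and the toy sizes are assumptions made for illustration; they do not come from the paper.

```python
def matmul(W, H):
    """Return the product WH of an n-by-r and an r-by-m matrix (lists of rows)."""
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)] for row in W]

def is_compressed(n, m, r):
    """True when W and H together hold fewer numbers than V: (n + m) r < n m."""
    return (n + m) * r < n * m

# Hypothetical toy sizes: n = 4 pixels, m = 6 images, r = 2 basis images.
W = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]                      # n x r, each column is a basis image
H = [[1, 0, 2, 1, 0, 3],
     [0, 1, 1, 2, 2, 0]]          # r x m, each column encodes one image

V_approx = matmul(W, H)           # n x m approximation of the image database
print(len(V_approx), len(V_approx[0]))   # 4 6
print(is_compressed(4, 6, 2))            # True:  (4 + 6) * 2 = 20 < 24
print(is_compressed(4, 3, 2))            # False: (4 + 3) * 2 = 14 >= 12
```

With only three images the same rank would not compress anything, which is why the condition on r depends on both n and m.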
5. Difference in MF framework
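In VQ, each column of H is constrained to be a unary vector (a single non-zero element, equal to one), so every face is approximated by exactly one basis image. PCA relaxes the VQ constraint: it constrains the columns of W to be orthonormal and the rows of H to be orthogonal to each other. This relaxation allows each face to be generated as a linear combination of several basis images (the eigenfaces). Although eigenfaces have a statistical interpretation as the directions of largest variance, PCA lets the entries of W and H take arbitrary sign, so many of them are hard to make sense of visually.
Unlike the unary constraint of VQ, the non-negativity constraints of NMF allow several basis images to be combined to represent one face. Because W and H are both non-negative, however, basis images can only be added (never subtracted). NMF therefore builds a parts-based representation, which matches the intuition of combining parts to form a whole. As Figure 1 shows, the NMF basis and encodings contain a large fraction of zero coefficients; in other words, both the basis and the encodings are sparse. The basis is sparse because the basis images are local: they are versions of mouths, noses and other facial parts in different positions and forms. The encodings are sparse as well, because each face combines and reuses only some of these versions of facial parts. This is very different from both the unary encoding of VQ and the fully distributed encoding of PCA.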
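The contrast between a unary code and an additive, non-negative code can be sketched on a toy example. Everything here is hypothetical: the two "parts", the four-pixel "face" and the helper names. Using the same W for both codes is a simplification, since a real VQ basis would consist of whole-face prototypes.

```python
def combine(W, h):
    """Reconstruct an image as W h, where the columns of W are basis images."""
    return [sum(w * c for w, c in zip(row, h)) for row in W]

def squared_error(v, v_hat):
    """Sum of squared pixel differences between two images."""
    return sum((a - b) ** 2 for a, b in zip(v, v_hat))

def vq_encode(W, v):
    """VQ-style unary code: select the single basis image closest to v."""
    r = len(W[0])
    errors = []
    for a in range(r):
        h = [1 if k == a else 0 for k in range(r)]
        errors.append(squared_error(v, combine(W, h)))
    best = errors.index(min(errors))
    return [1 if k == best else 0 for k in range(r)]

# Two hypothetical parts: an upper part (pixels 0-1) and a lower part (pixels 2-3).
W = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
v = [1, 1, 2, 2]                  # an image that contains both parts

h_vq = vq_encode(W, v)            # unary: exactly one basis image is used
h_parts = [1, 2]                  # non-negative: the parts are added together

err_vq = squared_error(v, combine(W, h_vq))
err_parts = squared_error(v, combine(W, h_parts))
print(h_vq, err_vq)               # [0, 1] 4
print(h_parts, err_parts)         # [1, 2] 0
```

The unary code has to pick one part and leaves the other unexplained, while the additive code rebuilds the image from both parts.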
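6. The NMF objective function
NMF maximizes the objective function
$$F = \sum_{i=1}^{n}\sum_{\mu=1}^{m}\left[V_{i\mu}\log (WH)_{i\mu} - (WH)_{i\mu}\right]$$
Here V_{iμ} is a single pixel, obtained by adding Poisson noise to (WH)_{iμ}. The objective is therefore related to the likelihood of generating the images V from the basis W and the encodings H. The update algorithm converges to a local maximum of F. See [2,3] for details of the updates.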
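As a rough illustration, and assuming the objective has the Poisson log-likelihood form used in [1], F = Σ_{i,μ} [V_{iμ} log (WH)_{iμ} − (WH)_{iμ}] with constant terms dropped, it can be evaluated in a few lines of plain Python. The function names and the tiny matrices are hypothetical.

```python
import math

def matmul(W, H):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(w * h for w, h in zip(row, col)) for col in zip(*H)] for row in W]

def objective(V, W, H):
    """F = sum over i, mu of V * log(WH) - WH; requires every entry of WH to be > 0."""
    WH = matmul(W, H)
    return sum(v * math.log(x) - x
               for v_row, x_row in zip(V, WH)
               for v, x in zip(v_row, x_row))

V = [[2.0, 1.0],
     [4.0, 2.0]]
W = [[1.0],
     [2.0]]
H_exact = [[2.0, 1.0]]            # W * H_exact reproduces V exactly
H_other = [[1.0, 1.0]]            # a worse encoding of the same data

F_exact = objective(V, W, H_exact)
F_other = objective(V, W, H_other)
print(F_exact > F_other)          # True
```

Each term v log x − x is largest at x = v, so an exact reconstruction gives the highest value of F that this V allows.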
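Figure 3. Iterative update algorithm for NMF
Figure 3 gives the update algorithm. Its monotonic convergence can be proved with an argument similar to the convergence proof of the EM algorithm. The updates preserve the non-negativity of W and H and also constrain each column of W to sum to one (they do not make the factors orthogonal).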
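Since the figure itself is not reproduced here, the following is only a sketch of multiplicative updates of this kind, written in plain Python. It assumes the divergence-based update rules of [2]. Rescaling each column of W to sum to one while scaling the matching row of H (so that WH is unchanged) is a choice made for this sketch, similar in spirit to `rescaledict` in the appendix. All names and the test matrix are hypothetical.

```python
import math
import random

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def objective(V, W, H):
    """F = sum over i, mu of V * log(WH) - WH."""
    WH = matmul(W, H)
    return sum(v * math.log(x) - x
               for v_row, x_row in zip(V, WH) for v, x in zip(v_row, x_row))

def nmf_multiplicative(V, r, iters=100, seed=0):
    """Factor a strictly positive n-by-m matrix V into non-negative W (n-by-r), H (r-by-m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(r)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(r)]
    history = [objective(V, W, H)]
    for _ in range(iters):
        # W_ia <- W_ia * [sum_mu H_a,mu * V_i,mu / (WH)_i,mu] / sum_mu H_a,mu
        WH = matmul(W, H)
        ratio = [[V[i][u] / WH[i][u] for u in range(m)] for i in range(n)]
        W = [[W[i][a] * sum(H[a][u] * ratio[i][u] for u in range(m)) / sum(H[a])
              for a in range(r)] for i in range(n)]
        # Rescale each column of W to sum to one; scale the matching row of H
        # by the same factor so that the product WH stays the same.
        for a in range(r):
            s = sum(W[i][a] for i in range(n))
            for i in range(n):
                W[i][a] /= s
            H[a] = [h * s for h in H[a]]
        # H_a,mu <- H_a,mu * [sum_i W_ia * V_i,mu / (WH)_i,mu] / sum_i W_ia
        WH = matmul(W, H)
        ratio = [[V[i][u] / WH[i][u] for u in range(m)] for i in range(n)]
        H = [[H[a][u] * sum(W[i][a] * ratio[i][u] for i in range(n))
              / sum(W[i][a] for i in range(n))
              for u in range(m)] for a in range(r)]
        history.append(objective(V, W, H))
    return W, H, history

V = [[1.0, 2.0, 3.0],
     [2.0, 4.0, 6.0],
     [3.0, 1.0, 2.0],
     [1.0, 1.0, 1.0]]
W, H, history = nmf_multiplicative(V, r=2, iters=50)
```

`history` records F before the first update and after every iteration, so it can be used to check that the objective does not decrease. Both factors stay non-negative because each update only multiplies by non-negative ratios.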
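7. Dependence between image pixels and encoding variables
Figure 4. Probabilistic hidden variables model underlying NMF
The network in Figure 4 describes how the visible variables V_i in the bottom layer are generated from the hidden variables H_a in the top layer. According to this model, V_i is generated from Σ_a W_ia H_a; in other words, the influence of H_a on V_i is represented by the weight W_ia. In the face application (right panel), the visible variables V_i are the pixels of an image (the units of the bottom layer), and each hidden variable H_a (top layer) is the encoding coefficient of the a-th parts-based basis image (middle layer, for example an eye in one particular form). For a given a, the weights W_1a … W_na make up one specific basis image, which combines with the other basis images to form a complete face image.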
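The generative reading of this model can be sketched in plain Python: each pixel is drawn from a Poisson distribution whose mean is Σ_a W_ia H_a. The sampler below uses Knuth's multiplication method, chosen only because the standard library has no Poisson sampler. The weights, the encoding and the function names are hypothetical.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def generate_pixels(W, h, rng):
    """Return the pixel means sum_a W_ia * h_a and one Poisson sample per pixel."""
    means = [sum(w * c for w, c in zip(row, h)) for row in W]
    return means, [poisson_sample(mu, rng) for mu in means]

rng = random.Random(0)
W = [[2.0, 0.0],
     [2.0, 0.0],
     [0.0, 3.0],
     [0.0, 3.0]]                  # column a holds the weights W_1a ... W_na of basis image a
h = [1.0, 2.0]                    # hidden variables H_a for one image

means, pixels = generate_pixels(W, h, rng)
print(means)                      # [2.0, 2.0, 6.0, 6.0]
```

The sampled `pixels` are non-negative integers that scatter around `means`, which is how non-negative hidden variables and weights give rise to non-negative visible variables.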
PS: [1] illustrates the versatility of NMF by applying it to the semantic analysis of text documents. Refer to [1], p. 3, left column.
3. Method Analysis
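8. Handling complex problems with NMF (key analysis)
Although NMF is fairly successful on face images and on semantic topic analysis, this does not mean it suits every kind of data. Images taken from varying viewpoints, or highly articulated objects, for example, are not a good fit. Learning parts for such complex problems requires a model with several layers of hidden variables (similar to deep learning), rather than the single layer of hidden variables used in NMF. In addition, although the non-negativity constraint makes it possible to learn a parts-based representation, it falls short when it comes to dependencies among the encoding coefficients. NMF only constrains W and H to be non-negative (this is its only prior, and nothing else is required), and it does not model how the elements of H relate to each other under that prior. For example, among the basis images, the fifth image in the first column and the last image in the fourth column overlap in one region: the eyes. When they are linearly combined to represent a face, the coefficients in h for these two basis images should be "coordinated" with each other; otherwise the overlapping region is represented twice. NMF does not go that far and, for simplicity, assumes the coefficients are unrelated. Taking the dependencies into account leads to more complex and more complete models, and a good deal of work improves NMF in this direction, including sparsity penalties such as |X|1 and |X|0.
Note that this lack of correlation holds only in the prior. Under the constraint v = Wh, or when minimizing |v - Wh|, the elements of h still influence and constrain each other. Imagine someone asks you to represent a face with W but does not show you the face yet. The face will be v = Wh, and the only requirement stated in advance is that the code h is non-negative. So you know the client does not care about correlations among the elements of h; in the prior they are independent. Then he hands you the face, and your constraint has to be satisfied: the elements of h must cooperate so that their linear combination gives v. Under the constraint (as opposed to under the prior), the elements of h are therefore correlated.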
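The "coordination" argument can be illustrated with a toy example in plain Python. The two overlapping basis images, the target image, the penalty weight `lam` and the helper names are all hypothetical; the L1 term stands in for the |X|1 kind of sparsity penalty mentioned above and is not a specific published method.

```python
def combine(W, h):
    """Reconstruct an image as W h, where the columns of W are basis images."""
    return [sum(w * c for w, c in zip(row, h)) for row in W]

def residual(v, W, h):
    """Squared reconstruction error |v - W h|^2."""
    return sum((a - b) ** 2 for a, b in zip(v, combine(W, h)))

def l1_penalized_cost(v, W, h, lam):
    """Reconstruction error plus an L1 sparsity penalty on the code h."""
    return residual(v, W, h) + lam * sum(abs(c) for c in h)

# Two basis images that overlap in the middle pixel (the shared "eye" region).
W = [[1, 0],
     [1, 1],
     [0, 1]]
v = [1, 1, 0]                     # the image only contains the first basis image

h_one = [1, 0]                    # uses one basis image
h_both = [1, 1]                   # uses both, so the overlapping pixel is counted twice

print(residual(v, W, h_one))                  # 0
print(residual(v, W, h_both))                 # 2
print(l1_penalized_cost(v, W, h_one, 0.5))    # 0.5
print(l1_penalized_cost(v, W, h_both, 0.5))   # 3.0
```

Nothing in the non-negativity prior prefers `h_one` over `h_both`; it is the fit to v (and, if used, the sparsity penalty) that ties the two coefficients together.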
9. ICA
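ICA is a variant of PCA that assumes the hidden variables are mutually independent and non-Gaussian. The bases that ICA learns from face images are holistic (similar to those of PCA). The independence assumption of ICA is ill-suited to learning parts-based representations, because many parts tend to occur together; assuming that the elements of h are independent cannot capture such more complex dependencies.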
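10. Summary of the connections to neuroscience
1). The consequence of the non-negativity constraints is that synapses are either excitatory or inhibitory, but do not change sign.
2). In neuroscience, the non-negativity of the hidden and visible variables corresponds to the physiological fact that firing rates of neurons cannot be negative.
3). One-sided constraints on neural activity and synaptic strengths in the brain may be important for developing sparsely distributed, parts-based representations for perception.
A note on the first point: when a neuron responds, an excitatory response is positive, while an inhibited one is 0 (rather than negative). If the stimulus succeeds, a spike is produced; otherwise the activity stays at its normal fluctuation and the firing rate is 0. Under the non-negativity constraint, every entry of H is therefore either > 0 or = 0 and never changes sign, which makes NMF suitable for factorizing matrices of neural firing rates.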
Appendix - Code: implemented in MATLAB
1. Gradient Descent (projected gradient). This code was written by Prof. Chih-Jen Lin of National Taiwan University.
function [W,H] = nmf(V,Winit,Hinit,tol,timelimit,maxiter)

% NMF by alternative non-negative least squares using projected gradients
% Author: Chih-Jen Lin, National Taiwan University

% W,H: output solution
% Winit,Hinit: initial solution
% tol: tolerance for a relative stopping condition
% timelimit, maxiter: limit of time and iterations

W = Winit; H = Hinit; initt = cputime;

gradW = W*(H*H') - V*H'; gradH = (W'*W)*H - W'*V;
initgrad = norm([gradW; gradH'],'fro');
fprintf('Init gradient norm %f\n', initgrad);
tolW = max(0.001,tol)*initgrad; tolH = tolW;

for iter=1:maxiter,
  % stopping condition
  projnorm = norm([gradW(gradW<0 | W>0); gradH(gradH<0 | H>0)]);
  if projnorm < tol*initgrad | cputime-initt > timelimit,
    break;
  end

  [W,gradW,iterW] = nlssubprob(V',H',W',tolW,1000); W = W'; gradW = gradW';
  if iterW==1,
    tolW = 0.1 * tolW;
  end

  [H,gradH,iterH] = nlssubprob(V,W,H,tolH,1000);
  if iterH==1,
    tolH = 0.1 * tolH;
  end

  if rem(iter,10)==0, fprintf('.'); end
end
fprintf('\nIter = %d Final proj-grad norm %f\n', iter, projnorm);

function [H,grad,iter] = nlssubprob(V,W,Hinit,tol,maxiter)

% H, grad: output solution and gradient
% iter: #iterations used
% V, W: constant matrices
% Hinit: initial solution
% tol: stopping tolerance
% maxiter: limit of iterations

H = Hinit; WtV = W'*V; WtW = W'*W;

alpha = 1; beta = 0.1;
for iter=1:maxiter,
  grad = WtW*H - WtV;
  projgrad = norm(grad(grad < 0 | H >0));
  if projgrad < tol,
    break
  end

  % search step size
  for inner_iter=1:20,
    Hn = max(H - alpha*grad, 0); d = Hn-H;
    gradd=sum(sum(grad.*d)); dQd = sum(sum((WtW*d).*d));
    suff_decr = 0.99*gradd + 0.5*dQd < 0;
    if inner_iter==1,
      decr_alpha = ~suff_decr; Hp = H;
    end
    if decr_alpha,
      if suff_decr,
        H = Hn; break;
      else
        alpha = alpha * beta;
      end
    else
      if ~suff_decr | Hp == Hn,
        H = Hp; break;
      else
        alpha = alpha/beta; Hp = Hn;
      end
    end
  end
end

if iter==maxiter,
  fprintf('Max iter in nlssubprob\n');
end
2. Multiplicative Update
The multiplicative updates correspond to the update equations mentioned earlier in these notes.
function [A, S] = nmf(X, K, type)
% NMF Nonnegative Matrix Factorization.
%
% [A, S] = NMF(X, K, TYPE) performs nonnegative matrix factorization on the
% nonnegative M-by-N matrix X using K components. The M-by-K matrix A and
% the K-by-N matrix S are computed such that a divergence between X and A*S
% is minimized while preserving element-wise nonnegativity of both
% matrices.
%
% [A, S] = NMF(X, K, 'euc') uses the Euclidean distance.
% [A, S] = NMF(X, K, 'kl') uses the Kullback-Leibler divergence.
% [A, S] = NMF(X, K, 'is') uses the Itakura-Saito divergence.
%
% Author: Steve Tjoa
% Institution: University of Maryland (Signals and Information Group)
% Created: July 1, 2009
% Last modified: July 2, 2009
%
% This code was written during the workshop on Music Information Retrieval
% at the Center for Computer Research in Music and Acoustics (CCRMA) at
% Stanford University.

% Initialize parameters.
maxiter = 100;
[M, N] = size(X);
O = ones(M, N);
c = 1; % safety parameter

% Initialize outputs.
A = rand(M, K);
S = rand(K, N);

if strcmp(type, 'euc')
    for iter=1:maxiter
        % Euclidean distance
        A = A.*(X*S' + c)./(A*(S*S') + c);
        S = S.*(A'*X + c)./((A'*A)*S + c);
        [A, S] = rescaledict(A, S);
    end
elseif strcmp(type, 'kl')
    for iter=1:maxiter
        % KL Divergence
        A = A.*((X./(A*S))*S' + c)./(O*S' + c);
        S = S.*(A'*(X./(A*S)) + c)./(A'*O + c);
        [A, S] = rescaledict(A, S);
    end
elseif strcmp(type, 'is')
    for iter=1:maxiter
        % IS Divergence
        A = A.*((X./(A*S).^2)*S' + c)./((1./(A*S))*S' + c);
        S = S.*(A'*(X./(A*S).^2) + c)./(A'*(1./(A*S)) + c);
        [A, S] = rescaledict(A, S);
    end
end
end

function [A, S] = rescaledict(A,S)
% RESCALEDICT Rescale dictionary.
%
% Author: Steve Tjoa
% Institution: University of Maryland (Signals and Information Group)
% Created: July 1, 2009
% Last modified: July 2, 2009
%
% This code was written during the workshop on Music Information Retrieval
% at the Center for Computer Research in Music and Acoustics (CCRMA) at
% Stanford University.

if nargin==2
    K = size(A,2);
    for k=1:K
        % Normalize column k of A and move its scale into row k of S,
        % so that the product A*S is unchanged.
        g = norm(A(:,k));
        A(:,k) = A(:,k)./g;
        S(k,:) = S(k,:).*g;
    end
end
end
11. References
[1]. Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999.
[2]. Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization.
[3]. Chih-Jen Lin. On the convergence of multiplicative update algorithms for non-negative matrix factorization.
[4]. P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. See also the MATLAB implementation from the paper and the author's homepage.
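These notes are still rough; comments and suggestions are very welcome.
More Machine Learning study material and discussion will continue to be posted. Please follow this blog and Sina Weibo Rachel____Zhang.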