UFLDL Tutorial: Exercise: PCA in 2D & PCA and Whitening

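Basic concepts in statistics

The most basic concepts in statistics are the mean, variance, and standard deviation of a sample. Given a set of n samples X_1, ..., X_n, where n is the number of samples, they are defined as follows.

Mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

Standard deviation:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$

Variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$

These are the unbiased sample estimates (denominator n - 1). The PCA code later in this post divides by the number of samples instead; the two differ only by a constant factor.

The mean describes the center of the sample set, and on its own it tells us little. The standard deviation describes, roughly, the average distance from the sample points to the mean, that is, the spread of the data.

In probability theory and statistics, the variance of a random variable measures its dispersion, i.e. how far the variable tends to lie from its expected value. The variance of a real random variable is its second central moment, and it also happens to be its second cumulant.

The larger the variance of a variable, the greater its uncertainty and, in information-theoretic terms, the more information it carries.

Standard deviation and variance are generally used to describe one-dimensional data.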
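The three quantities can be computed by hand in a few lines. The sketch below is in Python rather than MATLAB, uses made-up sample values, and assumes the unbiased (n - 1) form of the variance. It cross-checks the hand-written formulas against the standard library's `statistics` module.

```python
import math
import statistics

samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data
n = len(samples)

# Mean: the center of the sample set.
mean = sum(samples) / n

# Variance: sum of squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in samples) / (n - 1)

# Standard deviation: square root of the variance.
std = math.sqrt(variance)

# Cross-check against the standard library (it also uses n - 1).
assert math.isclose(mean, statistics.mean(samples))
assert math.isclose(variance, statistics.variance(samples))
assert math.isclose(std, statistics.stdev(samples))

print(mean, variance, std)
```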

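Covariance:

In probability theory and statistics, covariance measures the joint variability of two variables.

$$\mathrm{cov}(X, Y) = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$

Here n is the number of samples.

What does the value of the covariance tell us? Take two variables: how sleazy a guy is, and how popular he is with girls. A positive value means the two are positively correlated (the correlation coefficient is derived from the covariance): the sleazier the guy, the more popular he is. A negative value means they are negatively correlated: the sleazier he is, the more girls dislike him. A value of 0 means there is no linear relationship between the two. In statistical terms they are uncorrelated, which is a weaker statement than being independent.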

Properties of covariance
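For reference, these are the standard properties of covariance, written for random variables X, Y, X_1, X_2 and constants a, b. They are textbook identities, listed here as a reminder.

```latex
\begin{align}
\mathrm{cov}(X, X) &= \mathrm{var}(X) \\
\mathrm{cov}(X, Y) &= \mathrm{cov}(Y, X) \\
\mathrm{cov}(aX, bY) &= ab\,\mathrm{cov}(X, Y) \\
\mathrm{cov}(X_1 + X_2, Y) &= \mathrm{cov}(X_1, Y) + \mathrm{cov}(X_2, Y)
\end{align}
```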

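Notes

Each entry of the covariance matrix is the covariance between two components (dimensions) of the random vector X, not between two samples.

Ideally every dimension of a sample carries useful information and is uncorrelated with every other dimension, so there is no redundancy; that is, all pairwise covariances are 0.

The diagonal entries of the covariance matrix are the variances of the individual components of X.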
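A small sketch can make these notes concrete. It is written in Python with plain lists, the data is made up, and the helper name `covariance_matrix` is only for illustration. It assumes one row per dimension and one column per sample, and uses the n - 1 denominator. With 3 dimensions and 4 samples the result is 3 x 3, not 4 x 4, and its diagonal holds the per-dimension variances.

```python
def covariance_matrix(data):
    """data[i][j] is dimension i of sample j; returns a (dims x dims) matrix."""
    m = len(data[0])  # number of samples
    means = [sum(row) / m for row in data]
    centered = [[v - mu for v in row] for row, mu in zip(data, means)]
    # Entry (i, j) is the covariance between dimension i and dimension j.
    return [[sum(a * b for a, b in zip(ri, rj)) / (m - 1) for rj in centered]
            for ri in centered]


data = [
    [1.0, 2.0, 3.0, 4.0],    # dimension 1
    [2.0, 4.0, 6.0, 8.0],    # dimension 2 (= 2 * dimension 1, fully redundant)
    [1.0, -1.0, -1.0, 1.0],  # dimension 3 (uncorrelated with the other two)
]
sigma = covariance_matrix(data)

for row in sigma:
    print(row)
```

Dimensions 1 and 2 have a large covariance because one is a multiple of the other, while dimension 3 has zero covariance with both.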

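PCA

PCA serves two purposes: dimensionality reduction (which speeds up training, reduces memory usage, and so on) and data visualization.

The intuitive goal of PCA:

Find a linear transformation such that the transformed variables are pairwise uncorrelated, the energy (variance) is concentrated in a few variables, and the variables are ordered by decreasing energy. After the transformation, the trailing low-energy components can be dropped, which achieves the dimensionality reduction.

Before applying PCA, the data must be preprocessed. The first step is mean normalization: subtract from each feature dimension the mean of that dimension. The second is to rescale the dimensions to a common range, typically by dividing by the maximum value.
For natural images, however, mean normalization does not subtract the per-dimension mean. Instead, the mean of each image itself is subtracted from that image.
This is because the preprocessing for PCA depends on the application.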

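The PCA computation has to produce two things: the directions of the new basis vectors, and the values obtained by projecting the original samples onto those directions.

Because neighboring pixels are correlated, PCA can convert an input vector into an approximation of much lower dimension with very little error.

First compute the covariance matrix of the training samples (the input data has already been mean-normalized), where m is the number of samples:

$$\Sigma = \frac{1}{m}\sum_{i=1}^{m} x^{(i)} \left(x^{(i)}\right)^{T}$$

Then take the SVD of this covariance matrix. The columns of the resulting matrix U are the new direction vectors of the data: the first column is the principal direction, the second column the next most important one, and so on. U' * x rotates a sample into this basis, and keeping only the first k columns of U (written U_{1:k}) gives the reduced sample z:

$$z = U_{1:k}^{T}\, x$$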
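The steps above can be sketched for the 2-D case without any linear algebra library. This is a Python sketch with made-up points. Instead of calling an SVD routine it uses the closed-form eigendecomposition of a symmetric 2 x 2 matrix, which is an assumption that only holds in two dimensions. The names `pca_2d`, `x_rot`, `z` and `x_hat` are chosen here to mirror the exercise and are not part of any library.

```python
import math


def pca_2d(points):
    """PCA for 2-D points given as (x1, x2) pairs.

    Returns the mean, the eigenvalues (largest first) and the two unit
    eigenvectors u1, u2 of the covariance matrix.
    """
    m = len(points)
    mean = (sum(p[0] for p in points) / m, sum(p[1] for p in points) / m)
    centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]

    # Entries of sigma = (1/m) * X * X' for zero-mean X.
    sxx = sum(c[0] * c[0] for c in centered) / m
    sxy = sum(c[0] * c[1] for c in centered) / m
    syy = sum(c[1] * c[1] for c in centered) / m

    # Closed-form eigendecomposition of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    u1 = (math.cos(theta), math.sin(theta))    # principal direction
    u2 = (-math.sin(theta), math.cos(theta))   # orthogonal direction
    radius = math.hypot(0.5 * (sxx - syy), sxy)
    half_trace = 0.5 * (sxx + syy)
    return mean, (half_trace + radius, half_trace - radius), u1, u2


points = [(-2.0, -1.9), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.8), (2.0, 2.2)]
mean, eigenvalues, u1, u2 = pca_2d(points)
centered = [(p[0] - mean[0], p[1] - mean[1]) for p in points]

# Rotate: coordinates of each point in the eigenbasis (xRot = U' * x).
x_rot = [(c[0] * u1[0] + c[1] * u1[1], c[0] * u2[0] + c[1] * u2[1])
         for c in centered]

# Reduce to k = 1: keep only the coordinate along u1.
z = [r[0] for r in x_rot]

# Map the reduced data back onto the original axes (xHat = U(:,1) * z).
x_hat = [(zi * u1[0], zi * u1[1]) for zi in z]

print(eigenvalues, u1)
```

After the rotation the two coordinates are uncorrelated, and the variance along u1 equals the larger eigenvalue.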

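Notes

In supervised learning, to use PCA for dimensionality reduction, take only the x values of the training samples, compute the principal component matrix U and the reduced values z, and pair z with the original labels y to form a new training set for the classifier. At test time, reduce each new test sample with the same U (the one computed on the training set) and feed the result to the trained classifier.

PCA is not a good way to prevent overfitting. On the surface PCA reduces the number of features for the same amount of training data, so overfitting should be less likely. In practice its effect on overfitting is small, and overfitting is mainly controlled through regularization terms.

Not every ML application needs PCA for dimensionality reduction. Use it only when the raw training data does not meet your needs, for example when training is too slow, memory is too limited, or you want to visualize the data. If none of these apply, PCA is not necessarily needed.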

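Whitening

The goal of whitening is to remove the correlation between the components of the data. It is a preprocessing step for many algorithms.
For example, when training on image data, adjacent pixel values are correlated, so much of the information is redundant. Whitening can be used to decorrelate the data.
Whitened data must satisfy two conditions:

1. The correlation between different features is minimal, close to 0;
2. All features have the same variance (not necessarily 1).

The common whitening operations are PCA whitening and ZCA whitening.

PCA whitening: after the data x is transformed into z by PCA, the dimensions of z are uncorrelated, which satisfies the first condition of whitening. It then only remains to divide each dimension of z by its standard deviation, so that every dimension has variance 1 and all variances are equal. With z = U^T x and λ_i the i-th eigenvalue of the covariance matrix (the variance of z_i), the formula is:

$$x_{\mathrm{PCAwhite},i} = \frac{z_i}{\sqrt{\lambda_i}}$$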

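Combining whitening with dimensionality reduction

If you want whitened data of lower dimension than the original input, keep only the first k components of x_PCAwhite. When PCA whitening is combined with regularization, the last few components of x_PCAwhite are almost always close to 0, so discarding them does little harm.

Regularization

When implementing PCA whitening or ZCA whitening in practice, some eigenvalues λ_i are numerically close to 0, so the scaling step, which divides by sqrt(λ_i), would divide by a value close to 0. This can make the data blow up (take on very large values) or cause numerical instability. In practice the scaling is therefore implemented with a small amount of regularization: a small constant ε is added to the eigenvalues before taking the square root and the reciprocal. With z = U^T x the rotated data:

$$x_{\mathrm{PCAwhite},i} = \frac{z_i}{\sqrt{\lambda_i + \epsilon}}$$

When x takes values in the interval [-1, 1], ε ≈ 10^{-5} is a typical choice.

For images, adding ε also slightly smooths (low-pass filters) the input image. This helps remove noise introduced when the pixel values were acquired and improves the learned features.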

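ZCA whitening: the data x is first transformed into z by PCA, but without reducing the dimension, since all components are kept. This again satisfies the first condition of whitening (the features are uncorrelated). Each dimension is then rescaled to variance 1 as before, and finally the result is left-multiplied by the eigenvector matrix U.

ZCA whitening is PCA whitening followed by a rotation.
The ZCA whitening formula is:

$$x_{\mathrm{ZCAwhite}} = U\, x_{\mathrm{PCAwhite}}$$

Without regularization, both PCA whitening and ZCA whitening produce data whose covariance matrix is the identity, so every dimension has variance 1; multiplying by the orthogonal matrix U does not change this. What distinguishes ZCA whitening is that, among all rotations of the PCA-whitened data, this one keeps the result as close as possible to the original data.

The two are also typically used differently. PCA whitening is mainly used when both dimensionality reduction and decorrelation are wanted, whereas ZCA whitening is mainly used for decorrelation while keeping all dimensions of the original data.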
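The covariance of the whitened data can be worked out directly, which shows what each variant guarantees. This is a short derivation sketch under stated assumptions: zero-mean data collected column-wise in X, no regularization (ε = 0), and Σ = UΛU^T with Λ = diag(λ_1, ..., λ_n) and all λ_i > 0. The matrix notation is introduced here for illustration.

```latex
\begin{align}
\Sigma &= \frac{1}{m} X X^{T} = U \Lambda U^{T}, \qquad
X_{\mathrm{PCAwhite}} = \Lambda^{-1/2} U^{T} X, \qquad
X_{\mathrm{ZCAwhite}} = U \Lambda^{-1/2} U^{T} X \\
\frac{1}{m} X_{\mathrm{PCAwhite}} X_{\mathrm{PCAwhite}}^{T}
  &= \Lambda^{-1/2} U^{T} \Sigma\, U \Lambda^{-1/2}
   = \Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I \\
\frac{1}{m} X_{\mathrm{ZCAwhite}} X_{\mathrm{ZCAwhite}}^{T}
  &= U \left( \frac{1}{m} X_{\mathrm{PCAwhite}} X_{\mathrm{PCAwhite}}^{T} \right) U^{T}
   = U U^{T} = I
\end{align}
```

Under these assumptions both transforms give identity covariance. The extra factor U in ZCA whitening only rotates the data back towards the original axes.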

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

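Notes

1. PCA is only valid when every row of x has zero mean (i.e. every dimension has zero mean). Here x is an n*m matrix: each column is one n-dimensional sample, and each row holds one feature across the m samples.

The derivation of PCA requires the row means of x to be 0. For some natural image data x the row means are already close to 0, so PCA can be used even without zeroing them.
The data in pca_2d is not image data, but its row means also happen to be close to 0, so PCA can (just about) be used without zeroing the row means, although in theory they should be zeroed.

2. Estimating the covariance matrix as sigma = x * x' / size(x, 2) is only valid when the data has zero mean.

3. Mean normalization differs between the 2-D data and the image data

For image data (each pixel has the mean of its own image patch subtracted; each column of x is one patch):
avg = mean(x, 1); % average over the rows, giving one mean per column, so each column ends up with zero mean
x = x - repmat(avg, size(x, 1), 1);
For 2-D data (the mean is taken per row, i.e. per dimension; x has n rows and m columns, n dimensions and m samples):
mean_x = mean(x, 2); % average over the columns, giving one mean per row, so each row ends up with zero mean
x = x - repmat(mean_x, 1, size(x, 2));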
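The two conventions are easy to mix up, so here is a minimal sketch of both on the same small matrix. It is written in Python with plain lists, the data is made up, and the function names are chosen only for illustration. As in the MATLAB snippets, rows are dimensions and columns are samples.

```python
def zero_mean_per_dimension(x):
    """Subtract each row's mean (the 2-D data convention, like mean(x, 2))."""
    return [[v - sum(row) / len(row) for v in row] for row in x]


def zero_mean_per_sample(x):
    """Subtract each column's mean (the image patch convention, like mean(x, 1))."""
    n, m = len(x), len(x[0])
    col_means = [sum(x[i][j] for i in range(n)) / n for j in range(m)]
    return [[x[i][j] - col_means[j] for j in range(m)] for i in range(n)]


x = [[1.0, 2.0, 3.0],
     [3.0, 6.0, 9.0]]

per_dimension = zero_mean_per_dimension(x)  # every row now sums to 0
per_sample = zero_mean_per_sample(x)        # every column now sums to 0

print(per_dimension)
print(per_sample)
```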

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

pca_2d MATLAB implementation

close all
%%================================================================
%% Step 0: Load data
%  We have provided the code to load data from pcaData.txt into x.
%  x is a 2 * 45 matrix, where the kth column x(:,k) corresponds to
%  the kth data point.
%  You do not need to change the code below.
x = load('pcaData.txt','-ascii');
figure(1);
scatter(x(1, :), x(2, :));
title('Raw data');
%%================================================================
%% Step 1a: Implement PCA to obtain U
%  Implement PCA to obtain the rotation matrix U, which is the eigenbasis
%  sigma.
% -------------------- YOUR CODE HERE --------------------
u = zeros(size(x, 1)); % You need to compute this
[n, m] = size(x);
% x = x - repmat(mean(x, 2), 1, m); % Preprocessing: zero-mean each row (for 2-D data)
sigma = (1.0 / m) * x * x';
[u, s, v] = svd(sigma);
% --------------------------------------------------------
hold on
plot([0 u(1,1)], [0 u(2,1)]);
plot([0 u(1,2)], [0 u(2,2)]);
scatter(x(1, :), x(2, :));
hold off
%%================================================================
%% Step 1b: Compute xRot, the projection on to the eigenbasis
%  Now, compute xRot by projecting the data on to the basis defined
%  by U. Visualize the points by performing a scatter plot.
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
xRot = u' * x;
% --------------------------------------------------------
% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure(2);
scatter(xRot(1, :), xRot(2, :));
title('xRot');
%%================================================================
%% Step 2: Reduce the number of dimensions from 2 to 1.
%  Compute xRot again (this time projecting to 1 dimension).
%  Then, compute xHat by projecting the xRot back onto the original axes
%  to see the effect of dimension reduction
% -------------------- YOUR CODE HERE --------------------
k = 1; % Use k = 1 and project the data onto the first eigenbasis
xHat = zeros(size(x)); % You need to compute this
% Project onto the first k eigenvectors, then map back to the original axes,
% so that the points lie along the eigenvector direction in the original coordinates.
xHat = u(:, 1:k) * (u(:, 1:k)' * x);
% --------------------------------------------------------
figure(3);
scatter(xHat(1, :), xHat(2, :));
title('xHat');
%%================================================================
%% Step 3: PCA Whitening
%  Compute xPCAWhite and plot the results.
epsilon = 1e-5;
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = zeros(size(x)); % You need to compute this
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
% --------------------------------------------------------
figure(4);
scatter(xPCAWhite(1, :), xPCAWhite(2, :));
title('xPCAWhite');
%%================================================================
%% Step 3: ZCA Whitening
%  Compute xZCAWhite and plot the results.
% -------------------- YOUR CODE HERE --------------------
xZCAWhite = zeros(size(x)); % You need to compute this
xZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
% --------------------------------------------------------
figure(5);
scatter(xZCAWhite(1, :), xZCAWhite(2, :));
title('xZCAWhite');
%% Congratulations! When you have reached this point, you are done!
%  You can now move onto the next PCA exercise. :)

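PCA and Whitening MATLAB implementation

Step 0: Data preparation

The files downloaded from UFLDL include the dataset IMAGES_RAW, a 512*512*10 array, i.e. 10 images of 512*512 pixels.

(a) Load the data

The function sampleIMAGESRAW extracts numPatches image patches of size patchSize from IMAGES_RAW and stores them column-wise in the matrix patches, so that patches(:,i) holds all pixel values of the i-th patch.

(b) Zero-mean the data

Subtract from every pixel of each image patch the mean pixel value of that patch.

Step 1: Run PCA

This step has two parts:

(1) Perform the PCA computation. Here the data x is only rotated to obtain xRot; no principal components are discarded.

(2) Compute the covariance matrix covar of the rotated data and visualize it, to check that the rotated data is correct.

PCA guarantees that the covariance matrix of the rotated data is diagonal. If covar is correct,

its image should show a blue background with a line along the diagonal.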

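Step 2: Find the number of principal components to retain

In this step, find the number k of principal components that satisfies the condition,

that is, the smallest k such that (λ1+…+ λk)/(λ1+…+ λn) is at least some percentage, e.g. 99%.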
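The selection rule can be sketched as a running sum over the eigenvalues. The snippet is in Python, the eigenvalues are made up, and `components_to_retain` is a name chosen for illustration. It assumes the eigenvalues are already sorted in descending order, as the diagonal of s returned by svd is.

```python
from itertools import accumulate


def components_to_retain(eigenvalues, fraction=0.99):
    """Smallest k whose first k eigenvalues hold at least `fraction` of the variance."""
    total = sum(eigenvalues)
    for k, partial in enumerate(accumulate(eigenvalues), start=1):
        if partial >= fraction * total:
            return k
    return len(eigenvalues)


eigenvalues = [5.0, 3.0, 1.5, 0.45, 0.05]  # made-up, descending

k99 = components_to_retain(eigenvalues, 0.99)
k90 = components_to_retain(eigenvalues, 0.90)
print(k99, k90)
```

With these values the first three components hold 95% of the variance, so 99% needs four components while 90% needs only three. Counting the components whose cumulative fraction is still below the threshold would give one fewer than required.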

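Step 3: Reduce the dimension of the data using the number of components found

Step 2 found the number k, meaning that retaining k principal components of the data is enough.

In this step the data x is reduced to its first k principal components, giving xTidle.

To judge the quality of the reduction, the reduced data is then mapped back to the original dimension using U(:,1:k), which gives an approximate reconstruction of the original data.

The reconstructed images are displayed in a grid and compared with the original images.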

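Step 4: PCA whitening + regularization

This step has two parts:

(a) Run PCA with whitening and regularization

First, rotate the data (using the eigenvector matrix U).
Then, scale the rotated data using the eigenvalues, which whitens it.
When scaling by the eigenvalues, adjust them slightly with the parameter ε, which provides the regularization.

(b) Compute the covariance matrix of the whitened data and inspect it

If regularization is used, the diagonal entries of this covariance matrix are all less than 1.

If no regularization is used (rotation and whitening only), the diagonal entries are all 1 (in practice, ε is set to a very small number).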
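Why the diagonal drops below 1 can be seen from the scale factors alone: component i has variance λ_i before scaling and is multiplied by 1/sqrt(λ_i + ε), so its variance afterwards is λ_i / (λ_i + ε). A small Python sketch with made-up eigenvalues (the function names are illustrative only):

```python
import math


def whitening_scales(eigenvalues, epsilon):
    """Per-component scale factors 1 / sqrt(lambda_i + epsilon)."""
    return [1.0 / math.sqrt(lam + epsilon) for lam in eigenvalues]


def whitened_variances(eigenvalues, epsilon):
    """Variance of each component after scaling: lambda_i / (lambda_i + epsilon)."""
    scales = whitening_scales(eigenvalues, epsilon)
    return [lam * s * s for lam, s in zip(eigenvalues, scales)]


eigenvalues = [4.0, 1.0, 0.01, 1e-12]  # made-up, descending

regularised = whitened_variances(eigenvalues, 0.1)
unregularised_scales = whitening_scales(eigenvalues, 0.0)
regularised_scales = whitening_scales(eigenvalues, 0.1)

print(regularised)
print(unregularised_scales[-1], regularised_scales[-1])
```

With ε = 0.1 the variances start close to 1 and shrink towards 0 for the small eigenvalues. Without regularization the near-zero eigenvalue would be scaled by a factor of about a million, which is the instability that ε is meant to avoid.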

Step 5: ZCA whitening

ZCA whitening is PCA whitening followed by a rotation, i.e.

$$x_{\mathrm{ZCAwhite}} = U\, x_{\mathrm{PCAwhite}}$$

%%================================================================
%% Step 0a: Load data
%  Here we provide the code to load natural image data into x.
%  x will be a 144 * 10000 matrix, where the kth column x(:, k) corresponds to
%  the raw image data from the kth 12x12 image patch sampled.
%  You do not need to change the code below.
x = sampleIMAGESRAW(); % Read a set of image patches from IMAGES_RAW
figure('name','Raw images'); % Open a figure named 'Raw images'
randsel = randi(size(x,2),200,1); % A random selection of samples for visualization
display_network(x(:,randsel)); % Display the randomly selected patches
%%================================================================
%% Step 0b: Zero-mean the data (by row)
%  You can make use of the mean and repmat/bsxfun functions.
% -------------------- YOUR CODE HERE --------------------
% For image data: compute the mean pixel intensity of each patch (each column of x)
% and subtract it from every element of that column.
x = x - repmat(mean(x, 1), size(x, 1), 1);
% For 2-D data, subtract each row's mean instead:
% x = x - repmat(mean(x, 2), 1, size(x, 2));
%%================================================================
%% Step 1a: Implement PCA to obtain xRot
%  Implement PCA to obtain xRot, the matrix in which the data is expressed
%  with respect to the eigenbasis of sigma, which is the matrix U.
% -------------------- YOUR CODE HERE --------------------
xRot = zeros(size(x)); % You need to compute this
[n, m] = size(x);
sigma = (1.0 / m) * x * x'; % Covariance matrix of the input data
[u, s, v] = svd(sigma); % SVD of sigma; sigma is symmetric positive semidefinite, so this is its eigendecomposition
xRot = u' * x; % Rotate the data into the eigenbasis
%%================================================================
%% Step 1b: Check your implementation of PCA
%  The covariance matrix for the data expressed with respect to the basis U
%  should be a diagonal matrix with non-zero entries only along the main
%  diagonal. We will verify this here.
%  Write code to compute the covariance matrix, covar.
%  When visualised as an image, you should see a straight line across the
%  diagonal (non-zero entries) against a blue background (zero entries).
% -------------------- YOUR CODE HERE --------------------
covar = zeros(size(x, 1)); % You need to compute this
covar = (1 ./ m) * xRot * xRot'; % Covariance matrix of the rotated data
% Visualise the covariance matrix. You should see a line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);
%%================================================================
%% Step 2: Find k, the number of components to retain
%  Write code to determine k, the number of components to retain in order
%  to retain at least 99% of the variance.
% -------------------- YOUR CODE HERE --------------------
k = 0; % Set k accordingly
ss = diag(s); % Eigenvalues, in descending order
% cumsum(ss) is the vector of running totals of ss, so cumsum(ss)/sum(ss) is the
% fraction of variance retained by the first 1, 2, ... components.
% Take the smallest k whose fraction is at least 0.99.
k = find(cumsum(ss) / sum(ss) >= 0.99, 1, 'first');
% Equivalent loop over the eigenvalues of covar:
% egis = sort(eig(covar), 'descend');
% for i = 1:size(covar, 1)
%     if sum(egis(1:i)) / sum(egis) >= 0.99
%         k = i;
%         break;
%     end
% end
%%================================================================
%% Step 3: Implement PCA with dimension reduction
%  Now that you have found k, you can reduce the dimension of the data by
%  discarding the remaining dimensions. In this way, you can represent the
%  data in k dimensions instead of the original 144, which will save you
%  computational time when running learning algorithms on the reduced
%  representation.
%
%  Following the dimension reduction, invert the PCA transformation to produce
%  the matrix xHat, the dimension-reduced data with respect to the original basis.
%  Visualise the data and compare it to the raw data. You will observe that
%  there is little loss due to throwing away the principal components that
%  correspond to dimensions with low variation.
% -------------------- YOUR CODE HERE --------------------
% Reduce the dimension of the data
xTidle = u(:, 1:k)' * x;
% Reconstruct the data from the reduced representation xTidle
xHat = zeros(size(x));  % You need to compute this
xHat = u * [xTidle; zeros(n - k, m)];
% Visualise the data, and compare it to the raw data
% You should observe that the raw and processed data are of comparable quality.
% For comparison, you may wish to generate a PCA reduced image which
% retains only 90% of the variance.
figure('name',['PCA processed images ',sprintf('(%d / %d dimensions)', k, size(x, 1)),'']);
display_network(xHat(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
%%================================================================
%% Step 4a: Implement PCA with whitening and regularisation
%  Implement PCA with whitening and regularisation to produce the matrix
%  xPCAWhite.
epsilon = 0.1;
xPCAWhite = zeros(size(x));
% -------------------- YOUR CODE HERE --------------------
xPCAWhite = diag(1 ./ sqrt(diag(s) + epsilon)) * u' * x;
figure('name','PCA whitened images');
display_network(xPCAWhite(:,randsel));
%%================================================================
%% Step 4b: Check your implementation of PCA whitening
%  Check your implementation of PCA whitening with and without regularisation.
%  PCA whitening without regularisation results a covariance matrix
%  that is equal to the identity matrix. PCA whitening with regularisation
%  results in a covariance matrix with diagonal entries starting close to
%  1 and gradually becoming smaller. We will verify these properties here.
%  Write code to compute the covariance matrix, covar.
%
%  Without regularisation (set epsilon to 0 or close to 0),
%  when visualised as an image, you should see a red line across the
%  diagonal (one entries) against a blue background (zero entries).
%  With regularisation, you should see a red line that slowly turns
%  blue across the diagonal, corresponding to the one entries slowly
%  becoming smaller.
% -------------------- YOUR CODE HERE --------------------
covar = (1 ./ m) * xPCAWhite * xPCAWhite';
% Visualise the covariance matrix. You should see a red line across the
% diagonal against a blue background.
figure('name','Visualisation of covariance matrix');
imagesc(covar);
%%================================================================
%% Step 5: Implement ZCA whitening
%  Now implement ZCA whitening to produce the matrix xZCAWhite.
%  Visualise the data and compare it to the raw data. You should observe
%  that whitening results in, among other things, enhanced edges.
xZCAWhite = zeros(size(x));
% -------------------- YOUR CODE HERE --------------------
xZCAWhite = u * xPCAWhite; % ZCA whitening is PCA whitening followed by a rotation by U
% Visualise the data, and compare it to the raw data.
% You should observe that the whitened images have enhanced edges.
figure('name','ZCA whitened images');
display_network(xZCAWhite(:,randsel));
figure('name','Raw images');
display_network(x(:,randsel));
