Original article: https://www.visiondummy.com/2014/04/geometric-interpretation-covariance-matrix/

A geometric interpretation of the covariance matrix

Introduction

In this article, we provide an intuitive, geometric interpretation of the covariance matrix, by exploring the relation between linear transformations and the resulting data covariance. Most textbooks explain the shape of data based on the concept of covariance matrices. Instead, we take a backwards approach and explain the concept of covariance matrices based on the shape of data.

In a previous article, we discussed the concept of variance, and provided a derivation and proof of the well known formula to estimate the sample variance. Figure 1 was used in this article to show that the standard deviation, as the square root of the variance, provides a measure of how much the data is spread across the feature space.

Figure 1. Gaussian density function. For normally distributed data, 68% of the samples fall within the interval defined by the mean plus and minus the standard deviation.

We showed that an unbiased estimator of the sample variance can be obtained by:

$$\sigma_x^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu)^2$$
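As a quick sanity check of the $(N-1)$ denominator, the estimator can be computed directly and compared against NumPy's built-in (a minimal sketch; the sample values are made up for illustration):

```python
import numpy as np

# Made-up sample for illustration
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size
mu = x.mean()

# Unbiased sample variance: divide by (n - 1), not n
var_unbiased = np.sum((x - mu) ** 2) / (n - 1)

# np.var with ddof=1 applies the same (n - 1) correction
assert np.isclose(var_unbiased, np.var(x, ddof=1))
```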

However, variance can only be used to explain the spread of the data in the directions parallel to the axes of the feature space. Consider the 2D feature space shown by figure 2:

Figure 2. The diagonal spread of the data is captured by the covariance.

For this data, we could calculate the variance $\sigma(x,x)$ in the x-direction and the variance $\sigma(y,y)$ in the y-direction. However, the horizontal and vertical spread of the data do not explain the clear diagonal correlation. Figure 2 clearly shows that, on average, if the x-value of a data point increases, then the y-value also increases, resulting in a positive correlation. This correlation can be captured by extending the notion of variance to what is called the 'covariance' of the data:

$$\sigma(x,y) = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)$$

For 2D data, we thus obtain $\sigma(x,x)$, $\sigma(y,y)$, $\sigma(x,y)$ and $\sigma(y,x)$. These four values can be summarized in a matrix, called the covariance matrix:

$$\Sigma = \begin{bmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{bmatrix}$$

If x is positively correlated with y, y is also positively correlated with x. In other words, we can state that $\sigma(x,y) = \sigma(y,x)$. Therefore, the covariance matrix is always a symmetric matrix with the variances on its diagonal and the covariances off-diagonal. Two-dimensional normally distributed data is explained completely by its mean and its $2 \times 2$ covariance matrix. Similarly, a $3 \times 3$ covariance matrix is used to capture the spread of three-dimensional data, and an $N \times N$ covariance matrix captures the spread of N-dimensional data.
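As a concrete illustration, the covariance matrix of positively correlated 2D data can be estimated with NumPy (a minimal sketch; the data is synthetic and the coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic, positively correlated 2D data: y roughly follows x
x = rng.normal(0.0, 2.0, size=1000)
y = 0.8 * x + rng.normal(0.0, 0.5, size=1000)
D = np.vstack([x, y])          # shape (2, N): one row per feature

Sigma = np.cov(D)              # 2x2 covariance matrix (unbiased, N-1 denominator)
assert np.isclose(Sigma[0, 1], Sigma[1, 0])   # symmetric: sigma(x,y) == sigma(y,x)
assert Sigma[0, 1] > 0                        # positive correlation
```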

Figure 3 illustrates how the overall shape of the data defines the covariance matrix:

Figure 3. The covariance matrix defines the shape of the data. Diagonal spread is captured by the covariance, while axis-aligned spread is captured by the variance.

Eigendecomposition of a covariance matrix

In the next section, we will discuss how the covariance matrix can be interpreted as a linear operator that transforms white data into the data we observed. However, before diving into the technical details, it is important to gain an intuitive understanding of how eigenvectors and eigenvalues uniquely define the covariance matrix, and therefore the shape of our data.

As we saw in figure 3, the covariance matrix defines both the spread (variance), and the orientation (covariance) of our data. So, if we would like to represent the covariance matrix with a vector and its magnitude, we should simply try to find the vector that points into the direction of the largest spread of the data, and whose magnitude equals the spread (variance) in this direction.

If we define this vector as $\vec{v}$, then the projection of our data $D$ onto this vector is obtained as $\vec{v}^{\,\intercal} D$, and the variance of the projected data is $\vec{v}^{\,\intercal} \Sigma \vec{v}$, where $\Sigma$ is the covariance matrix of $D$. Since we are looking for the vector $\vec{v}$ that points into the direction of the largest variance, we should choose its components such that the variance $\vec{v}^{\,\intercal} \Sigma \vec{v}$ of the projected data is as large as possible. Maximizing any function of the form $\vec{v}^{\,\intercal} \Sigma \vec{v}$ with respect to $\vec{v}$, where $\vec{v}$ is a normalized unit vector, can be formulated as a so-called Rayleigh quotient. The maximum of such a Rayleigh quotient is obtained by setting $\vec{v}$ equal to the largest eigenvector of the matrix $\Sigma$.

In other words, the largest eigenvector of the covariance matrix always points into the direction of the largest variance of the data, and the magnitude of this vector equals the corresponding eigenvalue. The second largest eigenvector is always orthogonal to the largest eigenvector, and points into the direction of the second largest spread of the data.
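This can be checked numerically. The sketch below (NumPy; the 2×2 covariance matrix is a made-up example) scans the projected variance $\vec{v}^{\,\intercal}\Sigma\vec{v}$ over a grid of unit vectors and compares the maximum against the largest eigenvalue:

```python
import numpy as np

# A made-up 2x2 covariance matrix (symmetric, positive definite)
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# eigh returns eigenvalues in ascending order for symmetric matrices
eigvals, eigvecs = np.linalg.eigh(Sigma)
v_max = eigvecs[:, -1]          # eigenvector of the largest eigenvalue

# Scan unit vectors over a grid of directions and compute the
# projected variance v^T Sigma v for each of them
theta = np.linspace(0.0, np.pi, 181)
units = np.vstack([np.cos(theta), np.sin(theta)])
variances = np.einsum('ij,ik,kj->j', units, Sigma, units)

# The maximum projected variance matches the largest eigenvalue,
# and it is attained (up to grid resolution) along v_max
assert np.isclose(variances.max(), eigvals[-1], atol=1e-3)
assert abs(units[:, np.argmax(variances)] @ v_max) > 0.999
```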

Now let's have a look at some examples. In an earlier article we saw that a linear transformation matrix $T$ is completely defined by its eigenvectors and eigenvalues. Applied to the covariance matrix, this means that:

$$\Sigma \vec{v} = \lambda \vec{v}$$

where $\vec{v}$ is an eigenvector of $\Sigma$, and $\lambda$ is the corresponding eigenvalue.

If the covariance matrix of our data is a diagonal matrix, such that the covariances are zero, then this means that the variances must be equal to the eigenvalues $\lambda$. This is illustrated by figure 4, where the eigenvectors are shown in green and magenta, and where the eigenvalues clearly equal the variance components of the covariance matrix.

Figure 4. Eigenvectors of a covariance matrix

However, if the covariance matrix is not diagonal, such that the covariances are not zero, then the situation is a little more complicated. The eigenvalues still represent the variance magnitude in the direction of the largest spread of the data, and the variance components of the covariance matrix still represent the variance magnitude in the direction of the x-axis and y-axis. But since the data is not axis-aligned, these values are no longer the same, as shown by figure 5.

Figure 5. Eigenvalues versus variance

By comparing figure 5 with figure 4, it becomes clear that the eigenvalues represent the variance of the data along the eigenvector directions, whereas the variance components of the covariance matrix represent the spread along the axes. If there are no covariances, then both values are equal.
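The two situations can be contrasted numerically (a sketch with made-up covariance matrices):

```python
import numpy as np

# Diagonal covariance matrix: covariances are zero,
# so the eigenvalues equal the variances on the diagonal
Sigma_diag = np.diag([4.0, 1.0])
eigvals, _ = np.linalg.eigh(Sigma_diag)
assert np.allclose(np.sort(eigvals), [1.0, 4.0])

# Non-diagonal covariance matrix: the eigenvalues no longer
# equal the diagonal variances (3 and 2), although their sum
# still equals the trace
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
eigvals2, _ = np.linalg.eigh(Sigma)
assert not np.allclose(np.sort(eigvals2), [2.0, 3.0])
assert np.isclose(eigvals2.sum(), np.trace(Sigma))
```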

Covariance matrix as a linear transformation

Now let’s forget about covariance matrices for a moment. Each of the examples in figure 3 can simply be considered to be a linearly transformed instance of figure 6:

Figure 6. Data with unit covariance matrix is called white data.

Let the data shown by figure 6 be $D$, then each of the examples shown by figure 3 can be obtained by linearly transforming $D$:

$$D' = T \, D$$

where $T$ is a transformation matrix consisting of a rotation matrix $R$ and a scaling matrix $S$:

$$T = R \, S$$

These matrices are defined as:

$$R = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$$

where $\theta$ is the rotation angle, and:

$$S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$

where $s_x$ and $s_y$ are the scaling factors in the x direction and the y direction respectively.
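As a small sketch (NumPy; the angle and scale factors are arbitrary examples), these matrices can be built and the orthogonality of $R$ checked:

```python
import numpy as np

def rotation(theta):
    # Counter-clockwise rotation by angle theta
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

def scaling(sx, sy):
    # Axis-aligned scaling by factors sx and sy
    return np.array([[sx, 0.0],
                     [0.0, sy]])

R = rotation(np.pi / 4)
S = scaling(4.0, 1.0)
T = R @ S                      # T = R S: scale first, then rotate

# R is orthogonal with determinant 1: a proper rotation
assert np.allclose(R @ R.T, np.eye(2))
assert np.isclose(np.linalg.det(R), 1.0)
```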

In the following paragraphs, we will discuss the relation between the covariance matrix $\Sigma$ and the linear transformation matrix $T$.

Let's start with unscaled (scale equals 1) and unrotated data. In statistics this is often referred to as 'white data' because its samples are drawn from a standard normal distribution and therefore correspond to white (uncorrelated) noise:

Figure 7. White data is data with a unit covariance matrix.

The covariance matrix of this 'white' data equals the identity matrix, such that the variances and standard deviations equal 1 and the covariance equals zero:

$$\Sigma = \begin{bmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Now let's scale the data in the x-direction with a factor 4:

$$D' = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix} D$$

The data $D'$ now looks as follows:

Figure 8. Variance in the x-direction results in a horizontal scaling.

The covariance matrix $\Sigma'$ of $D'$ is now:

$$\Sigma' = \begin{bmatrix} 16 & 0 \\ 0 & 1 \end{bmatrix}$$

Thus, the covariance matrix $\Sigma'$ of the resulting data $D'$ is related to the linear transformation $T$ that is applied to the original data as follows: $\Sigma' = T \, T^{\intercal}$, where

$$T = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$$
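A quick numerical sanity check of this scaling step (a sketch using NumPy; the sample size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
D = rng.standard_normal((2, 100_000))   # white data: covariance ~ identity

T = np.array([[4.0, 0.0],               # scale x by 4, leave y unchanged
              [0.0, 1.0]])
D_prime = T @ D                          # D' = T D

Sigma_prime = np.cov(D_prime)
# Sigma' = T T^T = diag(16, 1), up to sampling noise
assert np.allclose(Sigma_prime, T @ T.T, atol=0.3)
```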

However, although this relation, $\Sigma' = T \, T^{\intercal}$, clearly holds when the data is scaled in the x and y direction, the question arises whether it also holds when a rotation is applied. To investigate the relation between the linear transformation matrix $T$ and the covariance matrix $\Sigma$ in the general case, we will therefore try to decompose the covariance matrix into the product of rotation and scaling matrices.

As we saw earlier, we can represent the covariance matrix by its eigenvectors and eigenvalues:

$$\Sigma \vec{v} = \lambda \vec{v}$$

where $\vec{v}$ is an eigenvector of $\Sigma$, and $\lambda$ is the corresponding eigenvalue.

This eigenvector equation holds for each eigenvector-eigenvalue pair of matrix $\Sigma$. In the 2D case, we obtain two eigenvectors and two eigenvalues. The resulting system of two equations can be represented efficiently using matrix notation:

$$\Sigma \, V = V \, L$$

where $V$ is the matrix whose columns are the eigenvectors of $\Sigma$ and $L$ is the diagonal matrix whose non-zero elements are the corresponding eigenvalues.

This means that we can represent the covariance matrix as a function of its eigenvectors and eigenvalues:

$$\Sigma = V \, L \, V^{-1}$$

This decomposition, $\Sigma = V \, L \, V^{-1}$, is called the eigendecomposition of the covariance matrix and can be obtained using a Singular Value Decomposition algorithm. Whereas the eigenvectors represent the directions of the largest variance of the data, the eigenvalues represent the magnitude of this variance in those directions. In other words, $V$ represents a rotation matrix, while $\sqrt{L}$ represents a scaling matrix. The covariance matrix can thus be decomposed further as:

$$\Sigma = R \, S \, S \, R^{-1}$$

where $R = V$ is a rotation matrix and $S = \sqrt{L}$ is a scaling matrix.

Earlier we defined a linear transformation $T = R \, S$. Since $S$ is a diagonal scaling matrix, $S = S^{\intercal}$. Furthermore, since $R$ is an orthogonal matrix, $R^{-1} = R^{\intercal}$. Therefore, $T^{\intercal} = (R \, S)^{\intercal} = S^{\intercal} R^{\intercal} = S \, R^{-1}$. The covariance matrix can thus be written as:

$$\Sigma = R \, S \, S \, R^{-1} = T \, T^{\intercal}$$
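The whole decomposition can be verified numerically. The sketch below (NumPy; the covariance matrix is a made-up example) builds $T = R \, S$ from the eigendecomposition and checks that $T \, T^{\intercal}$ recovers $\Sigma$:

```python
import numpy as np

# A made-up covariance matrix with non-zero covariance
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])

# Eigendecomposition: Sigma = V L V^{-1}
L, V = np.linalg.eigh(Sigma)
R = V                        # orthogonal eigenvector matrix (rotation, up to reflection)
S = np.diag(np.sqrt(L))      # scaling matrix: square roots of the eigenvalues

T = R @ S                    # the linear transformation T = R S
# Since S = S^T and R^{-1} = R^T, we get Sigma = R S S R^{-1} = T T^T
assert np.allclose(T @ T.T, Sigma)

# Applying T to white data reproduces the observed covariance (up to noise)
rng = np.random.default_rng(2)
D = T @ rng.standard_normal((2, 100_000))
assert np.allclose(np.cov(D), Sigma, atol=0.1)
```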

In other words, if we apply the linear transformation defined by $T = R \, S$ to the original white data shown by figure 7, we obtain the rotated and scaled data $D'$ with covariance matrix $\Sigma' = T \, T^{\intercal}$. This is illustrated by figure 10:

Figure 10. The covariance matrix represents a linear transformation of the original data.

The colored arrows in figure 10 represent the eigenvectors. The largest eigenvector, i.e. the eigenvector with the largest corresponding eigenvalue, always points in the direction of the largest variance of the data and thereby defines its orientation. Subsequent eigenvectors are always orthogonal to the largest eigenvector due to the orthogonality of rotation matrices.

Conclusion

In this article we showed that the covariance matrix of observed data is directly related to a linear transformation of white, uncorrelated data. This linear transformation is completely defined by the eigenvectors and eigenvalues of the data. While the eigenvectors represent the rotation matrix, the eigenvalues correspond to the square of the scaling factor in each dimension.
