http://blog.csdn.net/pipisorry/article/details/44705051

Machine Learning - Andrew Ng course notes

Dimensionality Reduction

{A second type of unsupervised learning problem is called dimensionality reduction. As data compression, it not only allows us to compress the data but also lets us speed up our learning algorithms.}

Purposes of Dimensionality Reduction

1. Data Compression

Projecting the 2D data onto one dimension (the green line), each original point can be approximately represented by a single number.

2. Visualization

Note: If we can have just a pair of numbers, z1 and z2, that somehow summarizes the 50 numbers, maybe we can plot these countries in R2 and use that to try to understand the feature space of different countries better; that is, we reduce the data from 50D to 2D.

Note: So you might find, for example, that the horizontal axis Z1 corresponds roughly to the overall country size, or the overall economic activity of a country, whereas the vertical axis corresponds to the per-person GDP. And you might find that, given these 50 features, these are really the 2 main dimensions of variation.

[Interactive visualization of Principal Component Analysis (PCA)]

Principal Component Analysis (PCA)

Formal Definition of PCA

 

Note:

1. PCA tries to find a lower-dimensional surface, really a line in this case, onto which to project the data, so that the sum of squares of these little blue line segments (the projection error) is minimized (see the sketch after this list).
2. As an aside, before applying PCA it is standard practice to first perform mean normalization and feature scaling, so that the features X1 and X2 have zero mean and comparable ranges of values.
3. PCA would choose something like the red line rather than the magenta line down here.
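
A small NumPy sketch (with made-up 2D data) of the quantity PCA minimizes: the sum of squared orthogonal projection errors onto a candidate direction u. The data points and the direction vectors below are illustrative only, not from the course.

```python
import numpy as np

# Hypothetical 2D data (m examples, n = 2 features)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
X = X - X.mean(axis=0)                      # mean normalization

def squared_projection_error(X, u):
    """Sum of squared orthogonal distances from the points to the line spanned by u."""
    u = u / np.linalg.norm(u)               # unit direction vector
    proj = (X @ u)[:, None] * u             # orthogonal projection of each row onto u
    return np.sum((X - proj) ** 2)

# PCA chooses the direction with the smallest value of this quantity.
print(squared_projection_error(X, np.array([1.0, 1.0])))    # near the principal direction: small
print(squared_projection_error(X, np.array([1.0, -1.0])))   # orthogonal direction: much larger
```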

Note:

1. Whether it gives me positive U1 or -U1 doesn't matter, because each of these vectors defines the same red line onto which I'm projecting my data.
2. For those of you familiar with linear algebra, the formal definition is that we find a set of vectors U1, U2, up to Uk, and project the data onto the linear subspace spanned by this set of k vectors. If you are not familiar with linear algebra, just think of it as finding k directions, instead of just one direction, onto which to project the data.

Difference Between PCA and Linear Regression

Note:

1. Linear regression tries to minimize the vertical distance between a point and the value predicted by the hypothesis, whereas PCA tries to minimize the magnitude of these blue lines, which are really the shortest orthogonal (perpendicular) distances.
2. More generally, in linear regression there is a distinguished variable y that we are trying to predict: linear regression takes all the values of X and tries to use them to predict Y (minimizing the sum of distances to the corresponding actual y values). In PCA there is no distinguished or special variable Y to predict (instead everything is projected onto a surface); we just have a list of features x1, x2, and so on up to xn, and all of these features are treated equally.

The PCA Algorithm

Data Preprocessing

s_j is some measure of the range of values of feature j; it could be the max minus the min value or, more commonly, the standard deviation of feature j.

If the features take on very different ranges of values, feature scaling is necessary.
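
A minimal sketch of this preprocessing step (mean normalization plus scaling by the standard deviation s_j); the function and variable names are illustrative.

```python
import numpy as np

def preprocess(X):
    """Mean-normalize each feature and scale it by its standard deviation (s_j)."""
    mu = X.mean(axis=0)            # per-feature mean mu_j
    sigma = X.std(axis=0)          # per-feature standard deviation s_j
    X_norm = (X - mu) / sigma      # x_j := (x_j - mu_j) / s_j
    return X_norm, mu, sigma       # keep mu and sigma to reuse on CV/test data
```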

The Principal Component Analysis (PCA) Algorithm

{The PCA algorithm constructs a new subspace u and computes the low-dimensional representation z that corresponds, in the new coordinates, to the original high-dimensional representation x.}

Note:

1. PCA needs a way to compute two things: one is these vectors u1 and u2; the other is how to compute z.

2. There is a mathematical derivation, and a mathematical proof, of what the right values of U1, U2, Z1, Z2, and so on are; that proof is very complicated.

Note:

1. Covariance matrix: the matrix Σ (Sigma); the symbol unfortunately looks like a summation symbol, but here it denotes a matrix.

2. Compute the eigenvectors of the matrix Σ. The svd function and the eig function give the same vectors (because the covariance matrix always satisfies a mathematical property called symmetric positive definiteness), although svd is a little more numerically stable, so I tend to use svd.

3. The U matrix is an n-by-n unitary matrix.
4. If we want to reduce the data from n dimensions down to k dimensions, then what we need to do is take the first k vectors (the first k columns of U).

5. These x's can be examples in our training set, cross-validation set, or test set.
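
A minimal NumPy sketch of the algorithm in these notes (vectorized covariance matrix, SVD, take the first k columns of U, project). It assumes X_norm is the already preprocessed m-by-n data matrix; the names are illustrative and not the course's Octave code.

```python
import numpy as np

def pca(X_norm, k):
    """PCA on preprocessed data X_norm (m examples x n features)."""
    m = X_norm.shape[0]
    Sigma = (X_norm.T @ X_norm) / m       # n x n covariance matrix (vectorized form)
    U, S, Vt = np.linalg.svd(Sigma)       # columns of U are the eigenvectors u1..un
    U_reduce = U[:, :k]                   # keep the first k vectors
    Z = X_norm @ U_reduce                 # row i is z(i) = U_reduce' * x(i)
    return Z, U_reduce, S
```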

Summary of the PCA Algorithm

Note:

1. The formula in the box is the vectorized form of the Sigma computation.

2. Similar to K-means, when you apply PCA you work with vectors x ∈ R^n; this is not done with x_0 = 1.
3. I did not give a mathematical proof that the U1, U2, and so on and the z's you get out of this procedure really are the choices that minimize the squared projection error; that is, I did not prove that what PCA does is find a surface or line onto which to project the data so as to minimize the squared projection error.

Reconstruction from Compressed Representation

{So, given z(i), which may be a hundred-dimensional, how do you go back to your original representation x(i), which was maybe a thousand-dimensional?}

Note: Why approximate reconstruction works: the unitary matrix U satisfies U * Uᵀ = I, so Z = Uᵀ * X implies U * Z = U * Uᵀ * X = X; keeping only the first k columns (U_reduce) turns this into the approximation x_approx = U_reduce * z ≈ x.
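
A one-line sketch of the reconstruction step, assuming Z and U_reduce come from the pca sketch above.

```python
def reconstruct(Z, U_reduce):
    """Approximate reconstruction: x_approx(i) = U_reduce * z(i), one example per row."""
    return Z @ U_reduce.T        # (m, k) @ (k, n) -> (m, n), x_approx ≈ x
```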

Choosing the Number of Principal Components

Variance Retained

Note:

1. The total variation: how far, on average, the training examples are from the origin.

2. Maybe 95% to 99% is really the most common range of values that people use.

3. For many data sets you would be surprised: in order to retain 99% of the variance, you can often reduce the dimension of the data significantly and still retain most of the variance, because in most real-life data sets many features are highly correlated, so it turns out to be possible to compress the data a lot and still retain 99% of the variance.

4. The expression above can be read as: after reconstruction, the deviation of x from its original position is at most 1%.

Implementing the Choice of k

Note:

1. That is one way to choose the smallest value of k so that 99% of the variance is retained, but this procedure seems horribly inefficient. Fortunately, when you implement PCA, the SVD actually gives us a quantity (the diagonal matrix S) that makes these things much easier to compute (the expression on the right); see the sketch after this list.

2. The expression on the right is really a measure of your squared reconstruction error; that ratio being 0.01 (i.e., 99% of variance retained) gives people a good intuitive sense of whether your implementation of PCA is finding a good approximation of your original data set.

3. SVD representation for image compression: reconMat = U[:, :numSV] * SigRecon * VT[:numSV, :]

Here numSV plays the role of k above, and SigRecon is the S matrix truncated to its first k values; this can also be used to approximately reconstruct the original image (without computing a covariance matrix).

[Peter Harrington - Machine Learning in Action - 14.6 Image compression with SVD][TopicModel - LSA (Latent Semantic Analysis) and its early SVD-based implementation]
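
A minimal NumPy sketch of the shortcut from note 1 above: using the vector S of diagonal entries returned by svd(Sigma) (as in the pca sketch earlier), pick the smallest k whose cumulative share of sum(S) reaches the target retained variance. The function name is illustrative.

```python
import numpy as np

def choose_k(S, variance_to_retain=0.99):
    """Smallest k such that sum(S[:k]) / sum(S) >= variance_to_retain.
    S: 1-D array of the diagonal entries returned by svd(Sigma)."""
    ratios = np.cumsum(S) / np.sum(S)                           # retained variance for k = 1..n
    return int(np.searchsorted(ratios, variance_to_retain) + 1)
```

This avoids re-running PCA and reconstructing the data for every candidate value of k.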

Advice for Applying PCA

PCA Can Be Used to Speed Up Learning

Note:

1. Check our labeled training set and extract just the X's, temporarily putting the Y's aside.

2. If you have a new example, say a new test example x, take that test example x and map it through the same mapping found by PCA to get the corresponding z; that z then gets fed to the hypothesis, and the hypothesis makes a prediction for your input x.

3. U_reduce is like a parameter learned by PCA, and we should be fitting our parameters only to our training set.
4. The same holds for the feature scaling parameters: the mean used for mean normalization and the scale you divide the features by to get them onto comparable ranges should also be found from the training set only.
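
A minimal sketch of the pipeline described in these notes, reusing the illustrative preprocess and pca helpers sketched earlier; X_train, y_train, X_test and the hypothesis h are hypothetical placeholders. All parameters (mu, sigma, U_reduce) are fitted on the training inputs only and then reused for cross-validation/test examples.

```python
# Fit the mapping on the training inputs only (labels y are set aside).
X_train_norm, mu, sigma = preprocess(X_train)      # mean normalization + feature scaling
Z_train, U_reduce, S = pca(X_train_norm, k=100)    # e.g. compress to 100 dimensions
# ... train the hypothesis h on (Z_train, y_train) ...

# Map a new/test example through the SAME mapping before prediction.
Z_test = ((X_test - mu) / sigma) @ U_reduce
# y_pred = h(Z_test)
```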

Summary of PCA Applications

Note: for visualization applications, we will usually choose k = 2 or k = 3, because we can only plot 2D and 3D data sets.

A Bad Use of PCA: Preventing Overfitting

Note:

1. Using fewer features to fix high variance (overfitting) [Machine Learning - X. Advice for Applying Machine Learning (Week 6)]

2. The reason this is a bad use: PCA does not use the labels y; it looks only at the inputs x(i) to find a lower-dimensional approximation to your data, so it throws away some information. It throws away, or reduces, the dimension of your data without knowing what the values of y are. It turns out that just using regularization will often give you at least as good a method for preventing overfitting, and will often just work better, because with regularization the minimization problem actually knows what the values of y are and so is less likely to throw away valuable information, whereas PCA does not make use of the labels and is more likely to throw away valuable information.
3. It is a good use of PCA to speed up your learning algorithm, but using PCA to prevent overfitting is not.
Advice for Applying PCA


The recommendation: instead of putting PCA into the algorithm from the start, first try doing whatever it is you want to do with the original x(i). Only if you have a reason to believe that doesn't work (your learning algorithm ends up running too slowly, or the memory or disk space requirement is too large) should you compress the representation with PCA.

Review

from:http://blog.csdn.net/pipisorry/article/details/44705051
