2.1 What Is Statistical Learning? 

In essence, statistical learning refers to a set of approaches for estimating $f$, the fixed but unknown function that links the predictors $X = (X_1, \ldots, X_p)$ to the response $Y$ in the model $Y = f(X) + \epsilon$.

2.1.1 Why Estimate f?

Prediction: predict $Y$ using $\hat Y = \hat f(X)$, where $\hat f$ represents our estimate for $f$ and $\hat Y$ represents the resulting prediction for $Y$.

Inference: understand the relationship between $X$ and $Y$, or more specifically, how $Y$ changes as a function of $X_1, \ldots, X_p$.

An example: in a real estate setting, one may seek to relate values of homes to inputs such as crime rate, zoning, distance from a river, air quality, schools, income level of community, size of houses, and so forth. In this case one might be interested in how the individual input variables affect the prices — that is, how much extra will a house be worth if it has a view of the river? This is an inference problem. Alternatively, one may simply be interested in predicting the value of a home given its characteristics: is this house under- or over-valued? This is a prediction problem.

2.1.2 How Do We Estimate f? 

There are two broad classes of methods: parametric and non-parametric.

Parametric: model-based. By assuming a functional form for $f$ (for example, that it is linear in $X$), it reduces the problem of estimating $f$ down to one of estimating a set of parameters.

Non-parametric: no explicit assumptions about the functional form of $f$. Instead, seek an estimate of $f$ that gets as close to the data points as possible without being too rough or wiggly.
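As a concrete illustration, here is a minimal sketch in Python (the book itself works in R; NumPy and scikit-learn are assumed here): the parametric approach reduces estimating $f$ to estimating an intercept and a slope, while the non-parametric KNN regressor imposes no functional form and simply averages nearby responses.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Simulated data: Y = f(X) + epsilon with a nonlinear true f.
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=100)

# Parametric: assume f is linear, so estimating f reduces to
# estimating two parameters (intercept and slope).
linear = LinearRegression().fit(X, y)
print("intercept, slope:", linear.intercept_, linear.coef_[0])

# Non-parametric: no assumed form for f; predictions simply
# average the responses of the 5 nearest training points.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

x_new = np.array([[2.5]])
print("linear prediction:", linear.predict(x_new)[0])
print("KNN prediction:   ", knn.predict(x_new)[0])
```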

2.1.3 The Trade-off Between Prediction Accuracy and Model Interpretability 

Example: when inference is the goal, the linear model may be a good choice since it will be quite easy to understand the relationship between $Y$ and $X_1, X_2, \ldots, X_p$. In contrast, very flexible approaches, such as the splines discussed in Chapter 7 and the boosting methods discussed in Chapter 8, can lead to such complicated estimates of $f$ that it is difficult to understand how any individual predictor is associated with the response.

2.1.4 Supervised versus Unsupervised Learning 

Supervised: we have both predictor measurements and a response measurement.

Unsupervised: we have predictor measurements but no response measurement.

2.1.5 Regression versus Classification Problems 

We tend to refer to problems with a quantitative response as regression problems, while those involving a qualitative response are often referred to as classification problems.

2.2 Assessing Model Accuracy 

In this section, we discuss some of the most important concepts that arise in selecting a statistical learning procedure for a specific data set. Remember, there is no free lunch in statistics: no one method dominates all others over all possible data sets.

2.2.1 Measuring the Quality of Fit 

In the regression setting, the most commonly used measure of quality of fit is the mean squared error (MSE), given by

$$MSE = \frac{1}{n} \sum_{i=1}^{n}(y_i - \hat f(x_i))^2$$

In practice, one can usually compute the training MSE with relative ease, but estimating test MSE is considerably more difficult because usually no test data are available.
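A brief sketch of the distinction (Python with scikit-learn, an assumption rather than the book's own tooling; a held-out split stands in for genuine test data): a maximally flexible fit drives the training MSE to zero while the test MSE stays large.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# A held-out split stands in for the usually unavailable test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# K = 1 is maximally flexible: each training point predicts itself.
model = KNeighborsRegressor(n_neighbors=1).fit(X_train, y_train)

print("training MSE:", mean_squared_error(y_train, model.predict(X_train)))  # zero
print("test MSE:    ", mean_squared_error(y_test, model.predict(X_test)))    # much larger
```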

Throughout this book, we discuss a variety of approaches that can be used in practice to estimate the level of flexibility at which the test MSE is smallest. One important method is cross-validation (Chapter 5).
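As a preview, here is a minimal cross-validation sketch under the same assumed libraries, estimating the test MSE at several flexibility levels so the smallest estimate can be spotted:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# 5-fold cross-validation estimates the test MSE at each flexibility
# level; for KNN regression, smaller K means a more flexible fit.
for k in (1, 5, 20, 50):
    scores = cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                             scoring="neg_mean_squared_error", cv=5)
    print(f"K={k:2d}  estimated test MSE = {-scores.mean():.3f}")
```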

2.2.2 The Bias-Variance Trade-Off 

The expected test MSE, for a given value $x_0$, can always be decomposed into the sum of three fundamental quantities: the variance of $\hat f(x_0)$, the squared bias of $\hat f(x_0)$, and the variance of the error term $\epsilon$:

$$E\big(y_0 - \hat f(x_0)\big)^2 = \mathrm{Var}\big(\hat f(x_0)\big) + \big[\mathrm{Bias}\big(\hat f(x_0)\big)\big]^2 + \mathrm{Var}(\epsilon)$$

Variance refers to the amount by which $\hat f$ would change if we estimated it using a different training data set.

Bias refers to the error that is introduced by approximating a real-life problem, which may be extremely complicated, by a much simpler model.

As a general rule, as we use more flexible methods, the variance will increase and the bias will decrease (bias-variance trade-off).
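On simulated data this decomposition can be checked directly, because the true $f$ is known. The sketch below (a hypothetical setup using NumPy and scikit-learn, not from the book) refits a flexible model (KNN with $K=1$) and an inflexible one ($K=50$) on many fresh training sets and estimates the variance and squared bias of $\hat f(x_0)$:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
f = np.sin               # the true f, known only because we simulate
x0 = np.array([[5.0]])   # the fixed test point of interest
sigma = 0.3              # standard deviation of the irreducible error

def fitted_values_at_x0(n_neighbors, n_sims=500, n=100):
    """Refit on n_sims fresh training sets; collect f_hat(x0) each time."""
    preds = np.empty(n_sims)
    for s in range(n_sims):
        X = rng.uniform(0, 10, size=(n, 1))
        y = f(X).ravel() + rng.normal(scale=sigma, size=n)
        preds[s] = KNeighborsRegressor(n_neighbors=n_neighbors).fit(X, y).predict(x0)[0]
    return preds

for k in (1, 50):  # flexible (K=1) versus inflexible (K=50)
    preds = fitted_values_at_x0(k)
    variance = preds.var()                        # Var(f_hat(x0))
    bias_sq = (preds.mean() - f(x0).item()) ** 2  # [Bias(f_hat(x0))]^2
    print(f"K={k:2d}  variance={variance:.4f}  bias^2={bias_sq:.4f}  "
          f"Var(eps)={sigma**2:.4f}")
```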

2.2.3 The Classification Setting 

In the classification setting, the most common measure of accuracy is the error rate, $\frac{1}{n}\sum_{i=1}^{n} I(y_i\neq \hat y_i)$.

Here $I(y_i \neq \hat y_i)$ is an indicator variable that equals 1 if $y_i \neq \hat y_i$ and zero if $y_i = \hat y_i$.
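The indicator maps directly onto a vectorized comparison; a tiny NumPy sketch (library choice is an assumption, as above):

```python
import numpy as np

y_true = np.array(["orange", "blue", "blue", "orange", "blue"])
y_hat  = np.array(["orange", "orange", "blue", "blue", "blue"])

# The boolean comparison plays the role of I(y_i != y_hat_i);
# averaging the indicators gives the error rate.
error_rate = np.mean(y_true != y_hat)
print(error_rate)  # 0.4
```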

In other words, the Bayes classifier simply assigns a test observation with predictor vector $x_0$ to the class $j$ for which

$$Pr(Y=j|X=x_0)$$

is largest. Note that this is a conditional probability: the probability that $Y = j$, given the observed predictor vector $x_0$.

Example:

The orange shaded region reflects the set of points for which Pr(Y = orange|X) is greater than 50%, while the blue shaded region indicates the set of points for which the probability is below 50%. The purple dashed line represents the points where the probability is exactly 50%. This is called the Bayes decision boundary.

In theory we would always like to predict qualitative responses using the Bayes classifier. But for real data, we do not know the conditional distribution of Y given X, and so computing the Bayes classifier is impossible.
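On simulated data, though, the conditional distribution is known by construction, so the Bayes classifier can be computed exactly. A minimal sketch under assumptions not in the text (two equally likely classes, each a one-dimensional Gaussian; SciPy assumed available for the densities):

```python
import numpy as np
from scipy.stats import norm

# Two equally likely classes whose class-conditional densities are known.
# By Bayes' theorem, Pr(Y=j | X=x) is proportional to the density of x
# under class j times the prior Pr(Y=j); with equal priors, comparing
# the densities is enough.
def bayes_classifier(x):
    p_orange = norm.pdf(x, loc=0.0, scale=1.0)  # density under "orange"
    p_blue   = norm.pdf(x, loc=2.0, scale=1.0)  # density under "blue"
    return np.where(p_orange > p_blue, "orange", "blue")

# The Bayes decision boundary sits where the posteriors are equal;
# for these two densities that is the midpoint x = 1.
print(bayes_classifier(np.array([-1.0, 0.9, 1.1, 3.0])))
# ['orange' 'orange' 'blue' 'blue']
```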

Since the Bayes classifier is unattainable for real data, we introduce the K-nearest neighbors (KNN) classifier, which estimates $Pr(Y = j|X = x_0)$ as the fraction of the $K$ training observations nearest to $x_0$ whose response equals $j$.

An example:

The choice of K has a drastic effect on the KNN classifier obtained. As K grows, the method becomes less flexible and produces a decision boundary that is close to linear. This corresponds to a low-variance but high-bias classifier.
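A hedged sketch of this effect (simulated two-class data, scikit-learn assumed as before), comparing training and test error rates as K grows:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)

# Two overlapping classes in two dimensions.
n = 400
y = rng.integers(0, 2, size=n)
X = rng.normal(loc=y[:, None] * 1.5, scale=1.0, size=(n, 2))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

for k in (1, 10, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    train_err = np.mean(knn.predict(X_tr) != y_tr)
    test_err = np.mean(knn.predict(X_te) != y_te)
    print(f"K={k:3d}  train error={train_err:.3f}  test error={test_err:.3f}")
# K=1 fits the training data perfectly; as K grows the classifier
# becomes less flexible (lower variance, higher bias).
```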

Reposted from: https://www.cnblogs.com/sheepshaker/p/6613373.html
