ISLR (7) - Non-Linear Regression Analysis

Polynomial Regression and Step Functions

Note Summary:
0. From the Linear Ideal to the Non-Linear Reality
1. Polynomial Regression
2. Step Function
3. References

0. Moving Beyond Linearity

Compared with other models, linear models are easier to describe and implement

  • they have advantages in interpretability and inference

However, standard linear regression can have significant limitations in terms of predictive power

  • Since the linearity assumption is almost always an approximation, and sometimes a poor one
  • Recall that Least Squares can be improved by Ridge Regression, LASSO, PCR..., which reduce the complexity of the linear model
    • and hence reduce the variance of the estimates

Goals Beyond Linearity:

Relax the Linearity assumption while

  • still maintaining interpretability as much as possible

Extensions of linear models

  1. Polynomial Regression(7.1)
  2. Step Function(7.2)
  3. Regression Spline(7.4)
  4. Smoothing Spline(7.5)
  5. Local Regression(7.6)
  • the approaches above model the relationship between a response $Y$ and a single predictor $X$ in a flexible way.
  6. Generalized Additive Model (GAM)(7.7)
  • with GAMs, the approaches above can be seamlessly integrated to model $Y$ as a function of several predictors $X_1, \dots, X_p$

1. Polynomial Regression

❝ Polynomial Regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power:

  • A cubic regression uses three variables, $X$, $X^2$, and $X^3$, as predictors to provide a non-linear fit to data

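Standard Linear Model to Polynomial

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad\longrightarrow\quad y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d + \epsilon_i$$

  • for large enough 「degree d」, polynomial regression produces an extremely non-linear curve
  • the coefficients $\beta_0, \beta_1, \dots, \beta_d$ are still estimated by Least Squares
  • Generally, $d \le 4$, since a large $d$ makes the polynomial curve overly flexible, so that it can take strange shapes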
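A minimal sketch of polynomial regression fitted by least squares, as described above: build the columns $1, x, x^2, \dots, x^d$ and solve an ordinary least squares problem on them. The data are synthetic and the helper names `poly_design` and `fit_poly` are made up for illustration; this is not the Wage data set.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.3 * x**3 + rng.normal(scale=0.5, size=x.size)


def poly_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, N=degree + 1, increasing=True)


def fit_poly(x, y, degree):
    """Ordinary least squares on the polynomial design matrix."""
    X = poly_design(x, degree)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat


beta_hat = fit_poly(x, y, degree=3)      # estimates of beta_0, ..., beta_3
y_fit = poly_design(x, 3) @ beta_hat     # fitted values at the training points
print(beta_hat)
```

The model is still linear in the coefficients, which is why plain least squares applies; only the predictors are non-linear in $x$.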

Wage & Age Non-Linear Relation

Fitting a degree-4 polynomial using least squares

  • the individual coefficients are not of particular interest (black box???)
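  • Let $x_0$ be a value of age; the predicted wage is:

$$\hat f(x_0) = \hat\beta_0 + \hat\beta_1 x_0 + \hat\beta_2 x_0^2 + \hat\beta_3 x_0^3 + \hat\beta_4 x_0^4$$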

「Variance」
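To compute the variance of the fit, $\text{Var}[\hat f(x_0)]$, we need:

  • Variance estimates for each of the fitted coefficients $\hat\beta_j$ from Least Squares
  • The covariances between pairs of coefficient estimates
    • Let $\hat{C}$ be the 5x5 covariance matrix of the $\hat\beta_j$
  • Let $\ell_0^T = (1, x_0, x_0^2, x_0^3, x_0^4)$; then $\text{Var}[\hat f(x_0)] = \ell_0^T \hat{C} \ell_0$, and its square root is the estimated pointwise standard error of $\hat f(x_0)$
  • At EACH reference point $x_0$, this computation is repeated to obtain the fitted curve and twice the standard error on either side of it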

The pair of dotted curves on either side of the fit are (2x) standard error curves

  • This (2x) quantity corresponds to an approximate 95% CI for normally distributed error terms
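A rough sketch of the pointwise standard error computation: estimate the 5x5 coefficient covariance matrix $\hat{C} = \hat\sigma^2 (X^T X)^{-1}$, then evaluate $\ell_0^T \hat{C} \ell_0$ with $\ell_0 = (1, x_0, \dots, x_0^4)$ at each grid point. The data are synthetic, and age is rescaled to `z` before taking powers; the rescaling is my own assumption to keep the matrix well conditioned, and it does not change the fitted curve.

```python
import numpy as np

# Synthetic stand-in for (age, wage); not the real Wage data.
rng = np.random.default_rng(1)
age = rng.uniform(18.0, 80.0, size=500)
z = (age - 50.0) / 30.0                    # rescaled predictor (assumption)
wage = 100.0 + 30.0 * z - 40.0 * z**2 + rng.normal(scale=20.0, size=age.size)

degree = 4
X = np.vander(z, N=degree + 1, increasing=True)
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)

# Estimated covariance matrix of the coefficients: sigma^2 (X^T X)^{-1}.
n, p = X.shape
resid = wage - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)
C_hat = sigma2_hat * np.linalg.inv(X.T @ X)


def fit_and_se(z0):
    """Fitted value and pointwise standard error at a single point z0."""
    l0 = z0 ** np.arange(degree + 1)       # (1, z0, z0^2, z0^3, z0^4)
    return l0 @ beta_hat, np.sqrt(l0 @ C_hat @ l0)


# Repeat at each reference point to get the curve and the +/- 2 SE band.
grid = np.linspace(z.min(), z.max(), 50)
bands = [(f0 - 2.0 * se0, f0, f0 + 2.0 * se0) for f0, se0 in map(fit_and_se, grid)]
```

Plotting the three columns of `bands` against the grid gives the fitted curve with its two dotted standard error curves.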

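「Logistic Regression」
We can treat Wage as a binary variable by splitting it into 「high/low earners」 (ISLR uses wage > 250, in thousands of dollars, as the threshold)

  • logistic regression can be fitted to predict this binary response:

$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \dots + \beta_d x_i^d)}$$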

Although the sample size is n = 3000, there are only 79 high earners

  • this results in a high variance in the estimated coefficients and therefore fairly wide confidence intervals
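A small sketch of logistic regression on polynomial features, fitted by Newton-Raphson (IRLS) in plain NumPy. Everything here is assumed for illustration: the data are synthetic, the predictor is already scaled to [-1, 1], and `fit_logistic` is a hypothetical helper rather than any library's API.

```python
import numpy as np

# Synthetic binary response (assumption); x plays the role of a rescaled age.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=2000)
true_logit = -1.0 + 1.5 * x - 2.0 * x**2
y = (rng.uniform(size=x.size) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

degree = 4
X = np.vander(x, N=degree + 1, increasing=True)   # columns 1, x, ..., x^4


def fit_logistic(X, y, n_iter=25):
    """Maximum likelihood for logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))        # current probabilities
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


beta_hat = fit_logistic(X, y)
p_hat = 1.0 / (1.0 + np.exp(-(X @ beta_hat)))     # fitted Pr(y = 1 | x)
```

With few positive cases, the information matrix `hess` gets small in the regions where they are scarce, which is one way to see why the confidence intervals widen there.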

2. Step Function

Using polynomial functions in a linear model imposes a 「global structure」 on the non-linear function of X

  • step functions avoid imposing such a global structure

❝ A Step Function cuts the range of a variable into K distinct regions to produce a qualitative variable

  • this has the effect of fitting a different constant in each bin, i.e. a piecewise constant function
  • and converts a continuous variable into an ordered categorical variable

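Create cutpoints $c_1, c_2, \dots, c_K$ in the range of $X$, and then construct $K + 1$ new variables:

$$C_0(X) = I(X < c_1), \quad C_1(X) = I(c_1 \le X < c_2), \quad \dots, \quad C_{K-1}(X) = I(c_{K-1} \le X < c_K), \quad C_K(X) = I(c_K \le X)$$

Since $X$ must be in exactly one of the $K + 1$ intervals, $C_0(X) + C_1(X) + \dots + C_K(X) = 1$

  • Use Least Squares to fit a linear model by using $C_1(X), C_2(X), \dots, C_K(X)$ as predictors:

$$y_i = \beta_0 + \beta_1 C_1(x_i) + \beta_2 C_2(x_i) + \dots + \beta_K C_K(x_i) + \epsilon_i$$

  • $\beta_0$ can be interpreted as the mean value of $Y$ for $X < c_1$
  • $\beta_j$ represents the average increase in the response for $X$ in $c_j \le X < c_{j+1}$ relative to $X < c_1$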
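A Logistic Regression Model can be fitted on the step-function predictors $C_1(X), \dots, C_K(X)$ to predict the probability of being a high earner:

$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 C_1(x_i) + \dots + \beta_K C_K(x_i))}{1 + \exp(\beta_0 + \beta_1 C_1(x_i) + \dots + \beta_K C_K(x_i))}$$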

「Disadvantages:」
Unless there are natural breakpoints in the predictors, piecewise-constant functions can 「miss the action」

  • e.g. the increase of wage with age from 20 to 30 is flattened inside a single bin

「Advantages:」
Step functions are popular in biostatistics and epidemiology

  • 5-year age groups are often used to define the bins

3. References:

  • An Introduction to Statistical Learning

    • Sections 7.1, 7.2

Next: (7) Basis Functions and Splines!
