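Introduction to Statistical Learning | Reading Notes 11 | Polynomial Regression and Step Functions
ISLR (7) - Non-Linear Regression Analysis
Polynomial Regression and Step Functions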
Note Summary:
0. From Idealized Linearity to Real-World Non-Linearity
1. Polynomial Regression
2. Step Function
3. References
0. Moving Beyond Linearity
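Compared with other models, linear models are easier to describe and implement
- they have advantages in interpretability and in inference theory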
However, standard linear regression can have significant limitations in terms of predictive power
- since the linearity assumption is almost always an approximation, and sometimes a poor one
- Recall that Least Squares can be improved by Ridge Regression, LASSO, PCR... which reduce the complexity of the linear model
- and hence reduce the variance of the estimates
Goals Beyond Linearity:
Relax the Linearity assumption while
- still maintaining interpretability as much as possible
Extensions of linear models
- Polynomial Regression(7.1)
- Step Function(7.2)
- Regression Spline(7.4)
- Smoothing Spline(7.5)
- Local Regression(7.6)
- the approaches above model the relationship between a response $Y$ and a single predictor $X$ in a flexible way.
- Generalized Additive Model (GAM)
- the approaches above can be seamlessly integrated to model a response $Y$ and several predictors $X_1, \dots, X_p$
1. Polynomial Regression
❝ Polynomial Regression extends the linear model by adding extra predictors, obtained by raising each of the original predictors to a power:
- A cubic regression uses three variables, $X$, $X^2$, and $X^3$, as predictors to provide a non-linear fit to the data
❞
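Standard Linear Model to Polynomial
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i \quad\longrightarrow\quad y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d + \epsilon_i$$
- for a large enough 「degree d」, polynomial regression produces an extremely non-linear curve
- the coefficients $\beta_0, \beta_1, \dots, \beta_d$ are still estimated by Least Squares, because the model is still linear in the predictors $x_i, x_i^2, \dots, x_i^d$
- Generally, 「$d$ is no greater than 3 or 4」, since a large $d$ makes the polynomial curve overly flexible and lets it take strange shapes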
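As a rough illustration of the point that a polynomial fit is still ordinary least squares on the columns $1, x, x^2, \dots, x^d$, here is a minimal NumPy sketch. The data are synthetic (not the Wage data), and the helper names `fit_polynomial` and `predict` are made up for this example.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not the Wage data set.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
y = 1.0 + 2.0 * x - 3.0 * x**3 + rng.normal(scale=0.1, size=x.size)


def fit_polynomial(x, y, degree):
    """Least-squares fit of y on the predictors 1, x, x^2, ..., x^degree."""
    # Design matrix whose columns are x^0, x^1, ..., x^degree.
    X = np.vander(x, N=degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def predict(beta, x0):
    """Evaluate the fitted polynomial at one or more points x0."""
    X0 = np.vander(np.atleast_1d(x0), N=beta.size, increasing=True)
    return X0 @ beta


beta_hat = fit_polynomial(x, y, degree=3)  # estimates of beta_0..beta_3
print(beta_hat)
print(predict(beta_hat, 0.5))
```

The predictor is rescaled to $[-1, 1]$ here because raw powers of a variable like age quickly become badly conditioned; that scaling is a choice of this sketch, not something the notes prescribe.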
Wage & Age Non-Linear Relation
Fitting a degree-4 polynomial using least squares
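- the individual coefficients are not of particular interest; what matters is the fitted function as a whole
- Let $x_0$ be a value of age; the predicted wage is:
$$\hat{f}(x_0) = \hat\beta_0 + \hat\beta_1 x_0 + \hat\beta_2 x_0^2 + \hat\beta_3 x_0^3 + \hat\beta_4 x_0^4$$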
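「Variance」
To compute the variance of the fit, $\mathrm{Var}[\hat f(x_0)]$, we need:
- Variance estimates for each of the fitted coefficients $\hat\beta_j$ from Least Squares
- The covariances between pairs of coefficient estimates, $\mathrm{Cov}(\hat\beta_j, \hat\beta_k)$
- Let $\hat{C}$ be the 5x5 covariance matrix of the $\hat\beta_j$
- Let $\ell_0^T = (1, x_0, x_0^2, x_0^3, x_0^4)$
- Then $\mathrm{Var}[\hat f(x_0)] = \ell_0^T \hat{C} \ell_0$, and $\sqrt{\mathrm{Var}[\hat f(x_0)]}$ is the estimated pointwise standard error of $\hat f(x_0)$
- At EACH reference point $x_0$, this computation is repeated, giving the fitted curve together with curves at twice the standard error on either side
The pair of dotted curves on either side of the fit are the (2x) standard error curves
- for normally distributed error terms, this (2x) quantity corresponds to an approximate 95% CI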
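The pointwise standard error computation can be sketched in a few lines of NumPy. This is only an illustration on synthetic data: it assumes the usual least-squares covariance estimate $\hat{C} = \hat\sigma^2 (X^T X)^{-1}$, and the names `fit_and_se`, `C_hat`, and `ell0` are chosen for this example.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not the Wage data set.
rng = np.random.default_rng(1)
n, degree = 300, 4
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(2.0 * x) + rng.normal(scale=0.2, size=n)

# Degree-4 least-squares fit on the columns 1, x, x^2, x^3, x^4.
X = np.vander(x, N=degree + 1, increasing=True)
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimated 5x5 covariance matrix of the coefficients:
# C_hat = sigma^2_hat * (X^T X)^{-1}, with sigma^2_hat = RSS / (n - 5).
residuals = y - X @ beta_hat
sigma2_hat = residuals @ residuals / (n - (degree + 1))
C_hat = sigma2_hat * np.linalg.inv(X.T @ X)


def fit_and_se(x0):
    """Return the fitted value and its pointwise standard error at x0."""
    ell0 = x0 ** np.arange(degree + 1)   # (1, x0, x0^2, x0^3, x0^4)
    fit = ell0 @ beta_hat
    se = np.sqrt(ell0 @ C_hat @ ell0)    # sqrt(ell0^T C_hat ell0)
    return fit, se


# Repeat the computation at each reference point to get the +/- 2 SE band.
grid = np.linspace(-1.0, 1.0, 50)
fits, ses = np.array([fit_and_se(x0) for x0 in grid]).T
lower, upper = fits - 2.0 * ses, fits + 2.0 * ses
```

Plotting `fits`, `lower`, and `upper` against `grid` would give a fitted curve with a band of twice the standard error on either side; the band is typically widest near the ends of the range of the data.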
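「Logistic Regression」
We can treat wage as a binary variable by splitting it into 「high/low earners」
- logistic regression can be fitted to predict this binary response (a high earner has wage > 250, with wage measured in thousands of dollars):
$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}{1 + \exp(\beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \cdots + \beta_d x_i^d)}$$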
Although the sample size is n = 3000, there are only 79 high earners,
- this results in a high variance in the estimated coefficients and therefore fairly wide confidence intervals
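A logistic regression on polynomial terms can be sketched with a few Newton (iteratively reweighted least squares) steps in NumPy. This is a hedged illustration only: the data and the "true" coefficients are synthetic, the degree is 2 rather than 4 to keep the example small, and `fit_logistic` and `predict_proba` are names invented here.

```python
import numpy as np

# Synthetic binary data: an assumption for illustration, not the Wage data set.
rng = np.random.default_rng(2)
n, degree = 2000, 2
x = rng.uniform(-1.0, 1.0, size=n)
true_beta = np.array([-1.0, 2.0, -1.5])            # hypothetical coefficients
X = np.vander(x, N=degree + 1, increasing=True)    # columns 1, x, x^2
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y = (rng.uniform(size=n) < p_true).astype(float)   # 1 = "high", 0 = "low"


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton's method."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # current fitted probabilities
        grad = X.T @ (y - p)                    # score vector
        w = p * (1.0 - p)                       # Newton weights
        hess = X.T @ (X * w[:, None])           # X^T W X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


beta_hat = fit_logistic(X, y)


def predict_proba(x0):
    """Estimated Pr(y = 1 | x0) under the fitted polynomial logistic model."""
    ell0 = x0 ** np.arange(degree + 1)
    return 1.0 / (1.0 + np.exp(-(ell0 @ beta_hat)))


print(beta_hat, predict_proba(0.5))
```

With few positive cases, as with the 79 high earners, the matrix $X^T W X$ carries little information, which is one way to see why the coefficient estimates become highly variable.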
2. Step Function
Using polynomial functions in a linear model imposes a 「global structure」 on the non-linear function of X
- use step function to avoid such global structure
❝ A step function cuts the range of a variable into K distinct regions to produce a qualitative variable
- this has the effect of fitting a piecewise constant function
- and converts a continuous variable into an ordered categorical variable
❞
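Create cutpoints $c_1, c_2, \dots, c_K$ in the range of $X$, and construct $K+1$ new indicator variables:
$$C_0(X) = I(X < c_1),\quad C_1(X) = I(c_1 \le X < c_2),\quad \dots,\quad C_{K-1}(X) = I(c_{K-1} \le X < c_K),\quad C_K(X) = I(c_K \le X)$$
Since $X$ falls into exactly one of the $K+1$ intervals, $C_0(X) + C_1(X) + \cdots + C_K(X) = 1$ for any value of $X$
- Use Least Squares to fit a linear model using $C_1(X), C_2(X), \dots, C_K(X)$ as predictors (with an intercept in the model, $C_0(X)$ is redundant):
$$y_i = \beta_0 + \beta_1 C_1(x_i) + \beta_2 C_2(x_i) + \cdots + \beta_K C_K(x_i) + \epsilon_i$$
- $\beta_0$ can be interpreted as the mean value of $Y$ for $X < c_1$
- $\beta_j$ represents the average increase in the response for $X$ in $c_j \le X < c_{j+1}$ relative to $X < c_1$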
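The least-squares step-function fit and the interpretation of its coefficients can be checked numerically with a small NumPy sketch. The data and the two cutpoints are made up for illustration; `cutpoints`, `bins`, and `bin_means` are names used only in this example.

```python
import numpy as np

# Synthetic data: an assumption for illustration, not the Wage data set.
rng = np.random.default_rng(3)
x = rng.uniform(20.0, 80.0, size=500)            # an "age"-like predictor
level = np.where(x < 35.0, 80.0, np.where(x < 65.0, 120.0, 100.0))
y = level + rng.normal(scale=5.0, size=x.size)

cutpoints = np.array([35.0, 65.0])               # hypothetical c_1, c_2 (K = 2)
K = cutpoints.size

# Bin index: 0 if x < c_1, j if c_j <= x < c_{j+1}, K if x >= c_K.
bins = np.searchsorted(cutpoints, x, side="right")

# Indicator columns C_1(x), ..., C_K(x); C_0 is left out because the
# intercept already accounts for the first bin.
C = (bins[:, None] == np.arange(1, K + 1)).astype(float)
X = np.column_stack([np.ones_like(x), C])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Sample mean of y within each bin, for comparison with the coefficients.
bin_means = np.array([y[bins == j].mean() for j in range(K + 1)])

print(beta_hat[0], bin_means[0])                 # intercept vs. first-bin mean
print(beta_hat[1:], bin_means[1:] - bin_means[0])
```

In this setup the intercept equals the sample mean of the first bin, and each remaining coefficient equals the difference between its bin mean and the first-bin mean, which is the piecewise constant fit described above.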
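Fit the Logistic Regression Model to predict the probability that an individual is a high earner (wage > 250):
$$\Pr(y_i > 250 \mid x_i) = \frac{\exp(\beta_0 + \beta_1 C_1(x_i) + \cdots + \beta_K C_K(x_i))}{1 + \exp(\beta_0 + \beta_1 C_1(x_i) + \cdots + \beta_K C_K(x_i))}$$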
「Disadvantages:」
Unless there are natural breakpoints in the predictors, piecewise-constant functions can 「miss the action」
- e.g., a single bin can miss the increasing trend of wage with age from 20 to 30
「Advantages:」
Step functions are widely used in biostatistics and epidemiology, among other disciplines
- for example, 5-year age groups are often used to define the bins
3. References:
- 《An Introduction to Statistical Learning》
- Section 7.1, 7.2
Up next: (7) Basis Functions and Splines!