R Language Study: Regression and Prediction

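The connection between correlation and regression
Correlation and regression
Two random variables that are not independent must be related in one of two ways:
Functional relationship (deterministic)
Correlation (non-deterministic)
Use a scatter plot to inspect the relationship
Strength of the correlation
Correlation coefficient
Correlation is not causation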
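As a small illustration of the correlation coefficient, the sketch below uses R's built-in `women` data set (the same data used in the example further down). The choice of `cor()` and `cor.test()` is an assumption about how one might measure the strength of the relationship; the notes themselves only name the concept.

```r
# Built-in data set: height and weight of 15 women
data(women)

# Pearson correlation coefficient, always between -1 and 1
r <- cor(women$height, women$weight)

# Test of the null hypothesis that the true correlation is zero
ct <- cor.test(women$height, women$weight)

print(r)
print(ct$p.value)
```

A value of `r` close to 1 indicates a strong positive linear association, but it says nothing about whether height causes weight.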

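Simple linear regression

Simple linear regression: the mathematical idea
  The best regression line
    Different people would draw different "best" lines by eye
    The line with the smallest residual sum of squares is defined as "best"
  Ordinary least squares (OLS)
    Minimizes the residual sum of squares
    A residual is the difference between the observed value of y and its fitted value; it estimates the random error term
  Build the model
  Interpret the model
    Gauss-Markov assumptions (LINE)
    Model tests (coefficient tests, overall equation test)
  Statistical prediction

Preconditions (LINE)
  Check them with residual analysis (the original notes treat this step as optional)
  Linearity (Linear)
  Independence (Independence)
  Normality (Normal)
  Equal variance (Equal variance)

Model checking
  Tests of the coefficients (t tests)
  Test of the equation (F test)
  Coefficient of determination R^2
    How much does the model explain?
    Ratio of the regression sum of squares to the total sum of squares
    In simple linear regression, equal to the square of the correlation coefficient

Statistical prediction
  Point prediction
  Interval prediction
    Prediction interval for an individual value
    Confidence interval for the mean response

Polynomial regression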
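To make the least-squares idea concrete, here is a minimal sketch that computes the slope, the intercept, and R^2 by hand on the built-in `women` data and cross-checks them against `lm()`. The variable names (`b0`, `b1`, `sse`, `sst`, `r2`) are illustrative and not taken from the original notes.

```r
data(women)
x <- women$height
y <- women$weight

# Closed-form OLS estimates for the line y = b0 + b1 * x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Fitted values and the sums of squares behind R^2
y_hat <- b0 + b1 * x
sse <- sum((y - y_hat)^2)      # residual sum of squares (what OLS minimizes)
sst <- sum((y - mean(y))^2)    # total sum of squares
r2  <- 1 - sse / sst           # share of the total variation explained

# Cross-check against lm() and against the squared correlation
fit <- lm(weight ~ height, data = women)
print(c(b0 = b0, b1 = b1))
print(coef(fit))
print(all.equal(r2, cor(x, y)^2))
```

The last line checks the statement that, with a single predictor, R^2 equals the square of the correlation coefficient.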
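Multiple linear regression
  Regression formula
  Solution method

R in practice

Common functions
summary()           Shows detailed results for the fitted model
coefficients()      Lists the model parameters (intercept and slopes)
confint()           Gives confidence intervals for the model parameters (95% by default)
fitted()            Lists the fitted (predicted) values of the model
residuals()         Lists the residuals of the model
anova()             Produces an analysis-of-variance table for a fitted model
vcov()              Lists the covariance matrix of the model parameters
AIC()               Prints the Akaike information criterion
plot()              Produces diagnostic plots for evaluating the fit
predict()           Uses the fitted model to predict the response for new data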
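The "solution method" for multiple regression can be sketched with the normal equations, (X'X) b = X'y. The example below is an illustration only: it uses the built-in `mtcars` data and the predictors `wt` and `hp`, which are assumptions chosen for the demo and do not appear in the original notes. Note that `lm()` itself solves the problem with a QR decomposition rather than by forming X'X, so the two routes agree only up to rounding.

```r
data(mtcars)

# Design matrix with a leading column of ones for the intercept
X <- cbind(1, mtcars$wt, mtcars$hp)
y <- mtcars$mpg

# Solve the normal equations (X'X) b = X'y
beta <- solve(crossprod(X), crossprod(X, y))

# Same model through lm(), then a few of the extractor functions listed above
fit <- lm(mpg ~ wt + hp, data = mtcars)
print(drop(beta))
print(coef(fit))
print(confint(fit))           # 95% confidence intervals by default
print(head(residuals(fit)))   # first few residuals
```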

R example:
The lm function
lm(formula, data)
formula is the model formula; data is the data set
Formula syntax
Y ~ X1 + X2 + … + Xk
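A few variations on the formula syntax are sketched below. The `mtcars` data set and the column choices are assumptions made for illustration; the operators themselves (`+`, `I()`, `.`, and `-` inside `update()`) are the ones this post relies on in its own examples.

```r
data(mtcars)

f1 <- lm(mpg ~ wt, data = mtcars)             # one predictor
f2 <- lm(mpg ~ wt + hp, data = mtcars)        # two predictors
f3 <- lm(mpg ~ wt + I(wt^2), data = mtcars)   # I() protects arithmetic inside a formula

# "." stands for every column of data other than the response
small <- mtcars[, c("mpg", "wt", "hp", "qsec")]
f4 <- lm(mpg ~ ., data = small)

# update() with "- term" refits the model without that term
f5 <- update(f4, . ~ . - qsec)

print(names(coef(f3)))
print(names(coef(f4)))
print(names(coef(f5)))
```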

    > # Simple linear regression
    > head(women,3)
      height weight
    1     58    115
    2     59    117
    3     60    120
    > plot(women$height,women$weight)
    > whlm = lm(weight~height,data = women)
    > abline(whlm,col = 'red',lwd = 2)
    > summary(whlm)

    Call:
    lm(formula = weight ~ height, data = women)

    Residuals:          # residuals
        Min      1Q  Median      3Q     Max
    -1.7333 -1.1333 -0.3833  0.7417  3.1167

    Coefficients:       # gives the regression equation y = 3.45x - 87.51667
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***   # the last column (p-value) is the key one; more asterisks mean higher significance
    height        3.45000    0.09114   37.85 1.09e-14 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 1.525 on 13 degrees of freedom
    Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903     # for R^2, read the adjusted value
    F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14      # F test of the whole model; a p-value below 0.05 (or 0.01) means the model is significant

    > (wp = predict(whlm,data.frame(height = c(67,72))))        # point prediction
           1        2
    143.6333 160.8833
    > (wp = predict(whlm,data.frame(height = c(67,72)),interval = 'prediction',level = c(0.9)))     # interval prediction (commonly used)
           fit      lwr      upr
    1 143.6333 140.8255 146.4412
    2 160.8833 157.8740 163.8927

    > # Polynomial regression
    > library(car)
    > # spread and lty.smooth come from older car releases; newer releases may expect them inside smooth = list(...)
    > scatterplot(weight~height,data = women,spread = FALSE,lty.smooth = 2,pch = 19,main = "Women Age 30-39",xlab = "Height",ylab = "Weight")
    > whlm2 = lm(weight~height+I(height^2),data = women)
    > lines(women$height,fitted(whlm2),col = 'blue',lwd = 2,type = 'l')
    > summary(whlm2)
    > (wp = predict(whlm2,data.frame(height = c(67,72)),interval = 'prediction',level = c(0.9)))
           fit      lwr      upr
    1 142.4151 141.6864 143.1437
    2 163.4029 162.5745 164.2314

    > par(mfrow = c(1,2))
    > # Simple linear regression
    > plot(women$height,women$weight)
    > whlm1 = lm(weight~height,data = women)
    > abline(whlm1,col = 'red',lwd = 2)
    > # Polynomial regression
    > plot(women$height,women$weight)
    > whlm2 = lm(weight~height+I(height^2),data = women)
    > lines(women$height,fitted(whlm2),col = 'red',lwd = 2,type = 'l')
    > par(mfrow = c(1,1))

    > # Use anova to check whether the two models differ in fit
    > anova(whlm1,whlm2)
    Analysis of Variance Table

    Model 1: weight ~ height
    Model 2: weight ~ height + I(height^2)
      Res.Df     RSS Df Sum of Sq      F    Pr(>F)
    1     13 30.2333
    2     12  1.7701  1    28.463 192.96 9.322e-09 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual analysis

    > # Draw the residual diagnostic plots
    > par(mfrow = c(2,2))
    > plot(whlm1)           # plot() produces four diagnostic plots; without par() you step through them one at a time by pressing Enter
    > par(mfrow = c(1,1))
    > library(car)
    > qqPlot(whlm1$residuals)
    [1] 15  1
    > par(mfrow = c(2,2))
    > plot(whlm2)
    > par(mfrow = c(1,1))

    > # Identify influential observations and outliers
    > influence.measures(whlm1)
    Influence measures of
    	 lm(formula = weight ~ height, data = women) :

        dfb.1_  dfb.hght    dffit cov.r   cook.d    hat inf
    1   1.0106 -9.73e-01  1.14329 0.860 5.28e-01 0.2417   *     # an asterisk flags an observation with a large influence on the fit (a candidate outlier)
    2   0.2893 -2.77e-01  0.34099 1.348 6.06e-02 0.1952
    3   0.1222 -1.16e-01  0.15310 1.362 1.26e-02 0.1560
    4   0.0123 -1.15e-02  0.01687 1.339 1.54e-04 0.1238
    5  -0.0527  4.82e-02 -0.08447 1.288 3.84e-03 0.0988
    6  -0.0789  6.91e-02 -0.16460 1.214 1.43e-02 0.0810
    7  -0.0688  5.36e-02 -0.23753 1.119 2.88e-02 0.0702
    8  -0.0212 -4.53e-16 -0.31959 1.004 4.94e-02 0.0667
    9   0.0350 -4.92e-02 -0.21800 1.140 2.45e-02 0.0702
    10  0.1203 -1.41e-01 -0.33506 1.044 5.50e-02 0.0810
    11  0.1252 -1.39e-01 -0.24336 1.193 3.07e-02 0.0988
    12  0.0854 -9.22e-02 -0.13567 1.311 9.86e-03 0.1238
    13 -0.0035  3.72e-03  0.00491 1.390 1.31e-05 0.1560
    14 -0.4406  4.64e-01  0.57152 1.179 1.59e-01 0.1952
    15 -1.3653  1.43e+00  1.67669 0.514 8.78e-01 0.2417   *
    > library(car)
    > influencePlot(whlm1)
        StudRes       Hat     CookD
    1  2.025250 0.2416667 0.5276638
    15 2.970125 0.2416667 0.8776159
    > outlierTest(whlm1)            # test for outliers
    No Studentized residuals with Bonferonni p < 0.05
    Largest |rstudent|:
       rstudent unadjusted p-value Bonferonni p
    15 2.970125           0.011698      0.17548

Multiple linear regression demo:

    > # Multiple linear regression
    > idata = read.csv("e:/insurance.csv",stringsAsFactors = TRUE)
    > str(idata)
    'data.frame':	1338 obs. of  7 variables:
     $ age     : int  19 18 28 33 32 31 46 37 37 60 ...
     $ sex     : Factor w/ 2 levels "F","M": 1 2 2 2 2 1 1 1 2 1 ...
     $ bmi     : num  27.9 33.8 33 22.7 28.9 ...
     $ children: int  0 1 3 0 0 0 1 3 2 0 ...
     $ smoker  : Factor w/ 2 levels "no","yes": 2 1 1 1 1 1 1 1 1 1 ...
     $ region  : Factor w/ 4 levels "northeast","northwest",..: 4 3 3 2 2 3 3 2 1 2 ...
     $ charges : num  16885 1726 4449 21984 3867 ...
    > # bmi: body mass index = weight (kg) / height (m)^2; 18.5 to 24.9 is the ideal range, lower is underweight, higher is overweight
    > library(corrgram)
    > corrgram(idata[,c(1,3,4,7)],lower.panel = panel.conf,upper.panel = panel.pts,diag.panel = panel.minmax)
    > library(car)
    > # spread and lty.smooth come from older car releases; newer releases may expect them inside smooth = list(...)
    > scatterplotMatrix(idata[,c(1,3,4,7)],lty.smooth = 2,spread = FALSE)
    > idata.lm = lm(charges~.,data = idata)
    > summary(idata.lm)
    > idata.lm
    > idata$age2 = idata$age^2                               # squared age term for a nonlinear age effect
    > idata$bmi30 = factor(ifelse(idata$bmi >= 30,1,0))      # indicator: 1 if bmi is 30 or above
    > head(idata)                                            # inspect the two new columns
    > idata.lm2 = lm(charges~age+age2+children+bmi+sex+bmi30*smoker+region,data = idata)
    > summary(idata.lm2)
    > idata.lm3 = update(idata.lm2,~.-age)                   # refit without the linear age term
    > summary(idata.lm3)
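The term `bmi30*smoker` in the model above expands to the two main effects plus their interaction. Because `insurance.csv` is not bundled with R, the sketch below simulates a small stand-in data set; every number in it (sample size, effect sizes, noise level) is made up for illustration and says nothing about the real insurance data.

```r
set.seed(1)
n <- 200

# Hypothetical stand-in for the insurance data: two binary factors
d <- data.frame(
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE)),
  bmi30  = factor(sample(c(0, 1), n, replace = TRUE))
)

# Simulated charges: the obesity penalty is much larger for smokers
d$charges <- 5000 +
  10000 * (d$smoker == "yes") +
  1000  * (d$bmi30 == 1) +
  15000 * (d$smoker == "yes" & d$bmi30 == 1) +
  rnorm(n, sd = 500)

# a * b is shorthand for a + b + a:b
fit <- lm(charges ~ bmi30 * smoker, data = d)
print(names(coef(fit)))
print(round(coef(fit)))
```

The coefficient named `bmi301:smokeryes` estimates the extra cost that applies only when both conditions hold, which is what a purely additive model cannot capture.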
