Introductory Statistics with R, Chapter 6: Regression and Correlation
6.1 Simple linear regression
library(ISwR)
attach(thuesen)
The following objects are masked from thuesen (pos = 3):
blood.glucose, short.velocity
The following objects are masked from thuesen (pos = 6):
blood.glucose, short.velocity
lm(short.velocity~blood.glucose)#linear model: short.velocity is described by blood.glucose
Call:
lm(formula = short.velocity ~ blood.glucose)
Coefficients:
(Intercept) blood.glucose
1.09781 0.02196
summary(lm(short.velocity~blood.glucose))#summary of the fit, with tests of the coefficients
Call:
lm(formula = short.velocity ~ blood.glucose)
Residuals:
Min 1Q Median 3Q Max
-0.40141 -0.14760 -0.02202 0.03001 0.43490
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.09781 0.11748 9.345 6.26e-09 ***
blood.glucose 0.02196 0.01045 2.101 0.0479 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.2167 on 21 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.1737, Adjusted R-squared: 0.1343
F-statistic: 4.414 on 1 and 21 DF, p-value: 0.0479
plot(blood.glucose,short.velocity)
abline(lm(short.velocity~blood.glucose))#draw the regression line
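The same fit can be obtained without attach() by passing the data frame through the data argument, which avoids the "masked" messages shown above. The following is a minimal sketch: the data frame is rebuilt by hand from the thuesen listing in this chapter so that the block runs without the ISwR package, and the names fit, coefs and ci are illustrative only.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)

# Fit the model by naming the data frame instead of attaching it
fit <- lm(short.velocity ~ blood.glucose, data = thuesen)

coefs <- coef(fit)     # intercept and slope, as in the Coefficients table
ci <- confint(fit)     # 95% confidence intervals for both coefficients
print(coefs)
print(ci)
```

With data = thuesen the variable names are looked up inside the data frame, so nothing needs to be attached or detached afterwards.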
Handling missing values
detach(thuesen)
complete.cases(thuesen)#logical vector: TRUE for rows with no missing values
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[12] TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
[23] TRUE TRUE
cc<-complete.cases(thuesen)
thuesen
blood.glucose short.velocity
1 15.3 1.76
2 10.8 1.34
3 8.1 1.27
4 19.5 1.47
5 7.2 1.27
6 5.3 1.49
7 9.3 1.31
8 11.1 1.09
9 7.5 1.18
10 12.2 1.22
11 6.7 1.25
12 5.2 1.19
13 19.0 1.95
14 15.1 1.28
15 6.7 1.52
16 8.6 NA
17 4.2 1.12
18 10.3 1.37
19 12.5 1.19
20 16.1 1.05
21 13.3 1.32
22 4.9 1.03
23 8.8 1.12
24 9.5 1.70
thuesen1<-thuesen[cc,]#drop the observations with missing values and assign the result to thuesen1
thuesen1
blood.glucose short.velocity
1 15.3 1.76
2 10.8 1.34
3 8.1 1.27
4 19.5 1.47
5 7.2 1.27
6 5.3 1.49
7 9.3 1.31
8 11.1 1.09
9 7.5 1.18
10 12.2 1.22
11 6.7 1.25
12 5.2 1.19
13 19.0 1.95
14 15.1 1.28
15 6.7 1.52
17 4.2 1.12
18 10.3 1.37
19 12.5 1.19
20 16.1 1.05
21 13.3 1.32
22 4.9 1.03
23 8.8 1.12
24 9.5 1.70
attach(thuesen1)
The following objects are masked from thuesen (pos = 3):
blood.glucose, short.velocity
The following objects are masked from thuesen (pos = 6):
blood.glucose, short.velocity
lm(short.velocity~blood.glucose)#linear model: short.velocity is described by blood.glucose
Call:
lm(formula = short.velocity ~ blood.glucose)
Coefficients:
(Intercept) blood.glucose
1.09781 0.02196
detach(thuesen1)
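Indexing with complete.cases() and calling na.omit() select the same rows, which is why the coefficients above match those from the first fit (lm() drops incomplete rows by default). A small sketch under the assumption that the data frame is rebuilt from the listing above; by.index, by.omit, same.rows and n.dropped are illustrative names.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)

# Two ways of keeping only the complete rows
by.index <- thuesen[complete.cases(thuesen), ]
by.omit <- na.omit(thuesen)

# Both keep the same row labels; exactly one row is dropped
same.rows <- identical(rownames(by.index), rownames(by.omit))
n.dropped <- nrow(thuesen) - nrow(by.omit)
print(same.rows)
print(n.dropped)
```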
6.2 Residuals and fitted values
attach(thuesen)
The following objects are masked from thuesen (pos = 3):
blood.glucose, short.velocity
The following objects are masked from thuesen (pos = 6):
blood.glucose, short.velocity
lm.velo<-lm(short.velocity~blood.glucose)#store the fitted model in lm.velo
fitted(lm.velo)#fitted values
1 2 3 4 5 6 7 8
1.433841 1.335010 1.275711 1.526084 1.255945 1.214216 1.302066 1.341599
9 10 11 12 13 14 15 16
1.262534 1.365758 1.244964 1.212020 1.515103 1.429449 1.244964 NA
17 18 19 20 21 22 23 24
1.190057 1.324029 1.372346 1.451411 1.389916 1.205431 1.291085 1.306459
resid(lm.velo)#residuals
1 2 3 4 5
0.326158532 0.004989882 -0.005711308 -0.056084062 0.014054962
6 7 8 9 10
0.275783754 0.007933665 -0.251598875 -0.082533795 -0.145757649
11 12 13 14 15
0.005036223 -0.022019994 0.434897199 -0.149448964 0.275036223
16 17 18 19 20
NA -0.070057471 0.045971143 -0.182346406 -0.401411486
21 22 23 24
-0.069916424 -0.175431237 -0.171085074 0.393541161
plot(blood.glucose,short.velocity)
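lines(blood.glucose,fitted(lm.velo))#draw the regression line through the fitted values
lines(blood.glucose[!is.na(short.velocity)],fitted(lm.velo))#fails in this session: x has 23 values, but fitted(lm.velo) has 24 (NA for row 16)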
Error in xy.coords(x, y): ‘x’ and ‘y’ lengths differ
cc<-complete.cases(thuesen)
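options(na.action = na.exclude)#pad fitted values and residuals with NA for rows that have missing data
lm.velo<-lm(short.velocity~blood.glucose)#refit; passing na.action = na.exclude to lm() directly gives the same result without the options() call
fitted(lm.velo)#fitted values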
1 2 3 4 5 6 7 8
1.433841 1.335010 1.275711 1.526084 1.255945 1.214216 1.302066 1.341599
9 10 11 12 13 14 15 16
1.262534 1.365758 1.244964 1.212020 1.515103 1.429449 1.244964 NA
17 18 19 20 21 22 23 24
1.190057 1.324029 1.372346 1.451411 1.389916 1.205431 1.291085 1.306459
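segments(blood.glucose,fitted(lm.velo),blood.glucose,short.velocity)#draw the residuals as line segments
plot(fitted(lm.velo),resid(lm.velo))#scatter plot of residuals against fitted values
qqnorm(resid(lm.velo))#Q-Q plot to check the normality of the residuals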
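The effect of the na.action setting on the length of fitted() can be checked directly. The sketch below fits the model twice, once with na.omit (the usual default) and once with na.exclude. The data frame is rebuilt from the listing in this chapter, and fit.omit and fit.excl are illustrative names.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)

# na.omit drops row 16 entirely; na.exclude drops it for fitting but keeps its slot
fit.omit <- lm(short.velocity ~ blood.glucose, data = thuesen, na.action = na.omit)
fit.excl <- lm(short.velocity ~ blood.glucose, data = thuesen, na.action = na.exclude)

len.omit <- length(fitted(fit.omit))   # one value per complete row
len.excl <- length(fitted(fit.excl))   # one value per original row, NA where data are missing
print(c(len.omit, len.excl))

# The estimated coefficients are the same either way
print(all.equal(coef(fit.omit), coef(fit.excl)))
```

With na.exclude the fitted values and residuals line up with the rows of the original data frame, so they can be plotted against blood.glucose without subsetting.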
6.3 Prediction and confidence bands
predict(lm.velo)#with no further arguments, predict returns the fitted values
1 2 3 4 5 6 7 8
1.433841 1.335010 1.275711 1.526084 1.255945 1.214216 1.302066 1.341599
9 10 11 12 13 14 15 16
1.262534 1.365758 1.244964 1.212020 1.515103 1.429449 1.244964 NA
17 18 19 20 21 22 23 24
1.190057 1.324029 1.372346 1.451411 1.389916 1.205431 1.291085 1.306459
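predict(lm.velo,int="c")#confidence limits (int="c" abbreviates interval="confidence"): fit is the expected value, lwr the lower limit, upr the upper limit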
fit lwr upr
1 1.433841 1.291371 1.576312
2 1.335010 1.240589 1.429431
3 1.275711 1.169536 1.381887
4 1.526084 1.306561 1.745607
5 1.255945 1.139367 1.372523
6 1.214216 1.069315 1.359118
7 1.302066 1.205244 1.398889
8 1.341599 1.246317 1.436881
9 1.262534 1.149694 1.375374
10 1.365758 1.263750 1.467765
11 1.244964 1.121641 1.368287
12 1.212020 1.065457 1.358583
13 1.515103 1.305352 1.724854
14 1.429449 1.290217 1.568681
15 1.244964 1.121641 1.368287
16 NA NA NA
17 1.190057 1.026217 1.353898
18 1.324029 1.230050 1.418008
19 1.372346 1.267629 1.477064
20 1.451411 1.295446 1.607377
21 1.389916 1.276444 1.503389
22 1.205431 1.053805 1.357057
23 1.291085 1.191084 1.391086
24 1.306459 1.210592 1.402326
predict(lm.velo,int="p")#prediction limits (interval="prediction")
Warning in predict.lm(lm.velo, int = "p"): predictions on current data refer to future responses
fit lwr upr
1 1.433841 0.9612137 1.906469
2 1.335010 0.8745815 1.795439
3 1.275711 0.8127292 1.738693
4 1.526084 1.0248161 2.027352
5 1.255945 0.7904672 1.721423
6 1.214216 0.7408499 1.687583
7 1.302066 0.8411393 1.762993
8 1.341599 0.8809929 1.802205
9 1.262534 0.7979780 1.727090
10 1.365758 0.9037136 1.827802
11 1.244964 0.7777510 1.712177
12 1.212020 0.7381424 1.685898
13 1.515103 1.0180367 2.012169
14 1.429449 0.9577873 1.901111
15 1.244964 0.7777510 1.712177
16 NA NA NA
17 1.190057 0.7105546 1.669560
18 1.324029 0.8636906 1.784367
19 1.372346 0.9096964 1.834996
20 1.451411 0.9745421 1.928281
21 1.389916 0.9252067 1.854626
22 1.205431 0.7299634 1.680899
23 1.291085 0.8294798 1.752690
24 1.306459 0.8457315 1.767186
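pred.frame<-data.frame(blood.glucose=4:20)#new data frame of blood.glucose values at which to predict
pp<-predict(lm.velo,int="p",newdata=pred.frame)#predicted values with prediction intervals for pred.frame, stored in pp
pc<-predict(lm.velo,int="c",newdata=pred.frame)#predicted values with confidence intervals for pred.frame, stored in pc
plot(blood.glucose,short.velocity,ylim = range(short.velocity,pp,na.rm = TRUE))#scatter plot, with the y-axis range wide enough for the prediction bands
pred.gluc<-pred.frame$blood.glucose#extract the predictor values as a plain vector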
matlines(pred.gluc,pc,lty = c(1,2,2),col="black")
matlines(pred.gluc,pp,lty=c(1,3,3),col="black")
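The difference between the two bands in the plot can also be checked numerically: the prediction interval is wider than the confidence interval at every point, and both are narrowest near the mean of blood.glucose. A minimal sketch, again with the data frame rebuilt from the listing in this chapter; fit, new, width.c and width.p are illustrative names.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)
fit <- lm(short.velocity ~ blood.glucose, data = thuesen)

# Same grid of new predictor values as pred.frame above
new <- data.frame(blood.glucose = 4:20)
pc <- predict(fit, newdata = new, interval = "confidence")
pp <- predict(fit, newdata = new, interval = "prediction")

# Width of each interval at each grid point
width.c <- pc[, "upr"] - pc[, "lwr"]
width.p <- pp[, "upr"] - pp[, "lwr"]

print(all(width.p > width.c))                 # prediction band is always the wider one
print(new$blood.glucose[which.min(width.c)])  # grid point where the confidence band is narrowest
```

The confidence band describes uncertainty about the regression line itself, while the prediction band also includes the residual scatter of a new observation, which is why it stays wide even near the centre of the data.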
6.4 Correlation
6.4.1 Pearson correlation coefficient
cor(blood.glucose,short.velocity)#returns NA because the input contains a missing value
[1] NA
cor(blood.glucose,short.velocity,use="complete.obs")#correlation of blood.glucose and short.velocity; use= controls how missing values are handled
[1] 0.4167546
cor(thuesen,use="complete.obs")#correlation matrix for all variables in the data frame
blood.glucose short.velocity
blood.glucose 1.0000000 0.4167546
short.velocity 0.4167546 1.0000000
cor.test(blood.glucose,short.velocity)#test whether the correlation differs from zero
Pearson's product-moment correlation
data: blood.glucose and short.velocity
t = 2.101, df = 21, p-value = 0.0479
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.005496682 0.707429479
sample estimates:
cor
0.4167546
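The t statistic and p-value reported by cor.test() are the same as those for the slope in section 6.1, and the squared correlation equals the Multiple R-squared of the regression. The sketch below checks both relations; the data frame is rebuilt from the listing in this chapter, and fit, r, ct and t.slope are illustrative names.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)
fit <- lm(short.velocity ~ blood.glucose, data = thuesen)

# Pearson correlation on the complete rows
r <- cor(thuesen$blood.glucose, thuesen$short.velocity, use = "complete.obs")

# r squared equals the R-squared of the simple regression
r.squared <- summary(fit)$r.squared
print(all.equal(r^2, r.squared))

# The correlation test and the slope test share one t statistic
ct <- cor.test(thuesen$blood.glucose, thuesen$short.velocity)
t.slope <- coef(summary(fit))["blood.glucose", "t value"]
print(all.equal(unname(ct$statistic), t.slope))
```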
6.4.2 Spearman rank correlation coefficient
cor.test(blood.glucose,short.velocity,method = "spearman")#Spearman rank correlation test, selected with the method argument
Warning in cor.test.default(blood.glucose, short.velocity, method =
"spearman"): Cannot compute exact p-value with ties
Spearman's rank correlation rho
data: blood.glucose and short.velocity
S = 1380.4, p-value = 0.1392
alternative hypothesis: true rho is not equal to 0
sample estimates:
rho
0.318002
6.4.3 Kendall rank correlation coefficient
cor.test(blood.glucose,short.velocity,method = "kendall")#same call as above, with method = "kendall"
Warning in cor.test.default(blood.glucose, short.velocity, method =
"kendall"): Cannot compute exact p-value with ties
Kendall's rank correlation tau
data: blood.glucose and short.velocity
z = 1.5604, p-value = 0.1187
alternative hypothesis: true tau is not equal to 0
sample estimates:
tau
0.2350616
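Spearman's rho is the Pearson correlation computed on the ranks of the observations, with tied values given their average rank. A short sketch of that equivalence, assuming the data frame rebuilt from the listing in this chapter; cc.data, rho.direct and rho.ranks are illustrative names.

```r
# Data copied from the thuesen listing in this chapter (24 rows, row 16 has an NA)
thuesen <- data.frame(
  blood.glucose = c(15.3, 10.8, 8.1, 19.5, 7.2, 5.3, 9.3, 11.1, 7.5, 12.2, 6.7, 5.2,
                    19.0, 15.1, 6.7, 8.6, 4.2, 10.3, 12.5, 16.1, 13.3, 4.9, 8.8, 9.5),
  short.velocity = c(1.76, 1.34, 1.27, 1.47, 1.27, 1.49, 1.31, 1.09, 1.18, 1.22, 1.25, 1.19,
                     1.95, 1.28, 1.52, NA, 1.12, 1.37, 1.19, 1.05, 1.32, 1.03, 1.12, 1.70)
)

# Keep only complete rows so that ranks are computed on the same 23 observations
cc.data <- thuesen[complete.cases(thuesen), ]

# Spearman correlation requested directly
rho.direct <- cor(cc.data$blood.glucose, cc.data$short.velocity, method = "spearman")

# Pearson correlation of the ranks (rank() averages ties by default)
rho.ranks <- cor(rank(cc.data$blood.glucose), rank(cc.data$short.velocity))

print(rho.direct)
print(all.equal(rho.direct, rho.ranks))
```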