R Language Study Notes

An Introduction to Statistical Learning, Chapter 2: Selected Exercises

Contents

R Language Study Notes
I. Exercise 8
- 8. This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US. The variables are
II. Exercise 9
- 9. This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.
Summary

I. Exercise 8

8. This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US. The variables are

(a) Use the read.csv() function to read the data into R. Call the loaded data college.

#college=read.csv ("College.csv", header =T,na.strings ="?")
college=read.csv ("E:/大三下学期/机器学习/jiqixuxi/data/College.csv", header =T,na.strings ="?")

(b) Look at the data using the fix() function.

fix(college)
rownames (college)=college [,1]
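
A minimal sketch of what the row-name step does, on a made-up two-row data frame (the school names and numbers are hypothetical, not taken from College.csv). It also shows the optional follow-up of dropping the name column once it has become the row labels:

```r
# Toy stand-in for College.csv; all values here are invented.
toy <- data.frame(
  X = c("School A", "School B"),
  Private = c("Yes", "No"),
  Apps = c(1200, 3400)
)
rownames(toy) <- toy[, 1]   # use the first column as row labels
toy <- toy[, -1]            # optional: drop it so it is not treated as data
print(toy)
```

Note that dropping the column shifts every later column index down by one, so any code that selects columns by position has to be adjusted accordingly.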

(c) i. Use the summary() function to produce a numerical summary of the variables in the data set.

summary(college)

ii. Use the pairs() function to produce a scatterplot matrix of the first ten columns or variables of the data.

Private=as.factor (college$Private)
college=data.frame(Private,college[,3:11]) # Private plus the next nine columns (2:11 would duplicate Private)
#pairs(college,main="Scatterplot matrix")
pairs(~Apps+Accept+Enroll+Top10perc+Top25perc+F.Undergrad+P.Undergrad+Outstate,panel = panel.smooth,data=college,main="Scatterplot matrix")

iii. Use the plot() function to produce side-by-side boxplots of Outstate versus Private.

plot(Private,college$Outstate,ylab="Outstate")

iv. Create a new qualitative variable, called Elite, by binning the Top10perc variable. Use the summary() function to see how many elite universities there are.

Elite=rep("No",nrow(college ))
Elite[college$Top10perc >50]="Yes" # no leading space, so the factor levels are exactly "No" and "Yes"
Elite=as.factor(Elite)
college=data.frame(college , Elite)
summary(college)
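
The same binning can be written in one step with ifelse(). This is only a sketch on invented Top10perc values, not the College data; the names top10 and elite are placeholders:

```r
# Hypothetical Top10perc values for five schools.
top10 <- c(23, 51, 50, 76, 12)
# Schools where more than 50% of new students come from the top 10% are "Yes".
elite <- factor(ifelse(top10 > 50, "Yes", "No"), levels = c("No", "Yes"))
print(table(elite))
```

Because the comparison is strict, a school with exactly 50 stays in the "No" group.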

Now use the plot() function to produce side-by-side boxplots of Outstate versus Elite.

plot(Elite,college$Outstate,ylab="Outstate")

v. Use the hist() function to produce some histograms with differing numbers of bins for a few of the quantitative variables.


par(mfrow=c(3,3))
college=read.csv ("E:/大三下学期/机器学习/jiqixuxi/data/College.csv", header =T,na.strings ="?")
name=colnames(college) # extract the column names
for(i in 3:19){  # frequency histograms of the 17 quantitative variables
  hist(college[,i],col =2, breaks =20,xlab=name[i],main="Frequency histogram")
}
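
The exercise asks for differing numbers of bins, while the loop uses breaks = 20 throughout. A minimal sketch of varying the bin count, using a simulated right-skewed variable rather than a College column (the variable x and the bin counts are assumptions for illustration):

```r
set.seed(1)
x <- rexp(500, rate = 1 / 1000)  # simulated right-skewed variable
par(mfrow = c(1, 3))
for (b in c(5, 20, 50)) {
  # A single number for breaks is only a suggestion; R picks "pretty" cut points.
  hist(x, breaks = b, col = 2, xlab = "x", main = paste("breaks =", b))
}
par(mfrow = c(1, 1))
```

Few bins show the overall shape, while many bins reveal local detail at the cost of more noise.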

vi. Continue exploring the data, and provide a brief summary of what you discover.
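1. Students: the numbers of applications, accepted applications, enrolled students, and undergraduates are positively and linearly correlated. The percentage of new students from the top 10% of their high-school class and the percentage from the top 25% are related nonlinearly. At a typical school only about 50% of new students come from the top 25% of their high-school class.

2. Faculty: the percentage of faculty with a PhD has a left-skewed distribution. At most schools about 80% of faculty hold a PhD, and at a small number of schools fewer than 50% do. The student/faculty ratio is about 15 at most schools (roughly 15 students per faculty member, a ratio rather than a percentage), which suggests good staffing.

3. Cost of study: room and board is about 4300, books about 500, and personal spending about 1300, so room and board is the largest of these three expenses. Out-of-state tuition at private schools is far higher than at public schools and somewhat more variable; only a very small number of public schools charge more than private schools. Elite schools (more than 50% of new students from the top 10% of their class) charge higher out-of-state tuition than non-elite schools.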
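
Observations like these can be checked numerically as well as by eye. The sketch below uses an invented six-row data frame (not the College data) to show the calls involved: tapply() for per-group medians and spread, and cor() for linear association.

```r
# Hypothetical mini data set; all values are invented for illustration.
d <- data.frame(
  Private  = factor(c("Yes", "Yes", "Yes", "No", "No", "No")),
  Outstate = c(12000, 15000, 9000, 6000, 7500, 11000),
  Apps     = c(1500, 3000, 800, 6000, 9000, 4000),
  Accept   = c(1200, 2100, 700, 4500, 6000, 3100)
)
med    <- tapply(d$Outstate, d$Private, median)  # median tuition per group
spread <- tapply(d$Outstate, d$Private, IQR)     # interquartile range per group
r      <- cor(d$Apps, d$Accept)                  # linear association
print(med)
print(spread)
print(r)
```

On the real data, the same calls on college$Outstate and college$Private would put numbers behind the boxplot comparison.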

II. Exercise 9

9. This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.

Auto=read.table ("E:/大三下学期/机器学习/jiqixuxi/data/Auto.data", header =T,na.strings ="?")
fix(Auto)
Auto=na.omit(Auto)
#dim(Auto)

(a) Which of the predictors are quantitative, and which are qualitative?

names(Auto)
summary(Auto)

cylinders and origin are qualitative variables (as is name, which is a text label); all other variables are quantitative.
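
One way to support this classification is to look at how each column is stored and how many distinct values it takes. A sketch on an invented four-row frame with Auto-like column names (the values are not from Auto.data):

```r
# Toy frame with the same kinds of columns as Auto; values are invented.
toy <- data.frame(
  mpg = c(18, 15, 36.1, 27),
  cylinders = c(8, 8, 4, 4),
  origin = c(1, 1, 3, 2),
  name = c("car a", "car b", "car c", "car d")
)
is_num   <- sapply(toy, is.numeric)                       # stored as numbers?
n_unique <- sapply(toy, function(x) length(unique(x)))    # distinct values
print(data.frame(is_num, n_unique))
```

A numeric column with only a handful of distinct values, such as a numeric code for region, is usually better treated as qualitative.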

(b) What is the range of each quantitative predictor? You can answer this using the range() function.

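len=matrix(0,8,4)
for(l in 1:8){
  len[l,1:2]=range(Auto[,l]) # range of the variable (min, max), part (b)
  len[l,3]=sd(Auto[,l]) # standard deviation, part (c)
  len[l,4]=mean(Auto[,l]) # mean, part (c)
}
name=names(Auto)[1:8] # extract the variable names
len=data.frame(name,len) # combine into one table, keeping the statistics numeric
names(len)=c("variable","min","max","sd","mean")
len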
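
The same table can be built without an explicit loop by applying each statistic column-wise with sapply(). This sketch uses the mtcars data set that ships with R as a stand-in for Auto, so the column names differ from the exercise:

```r
# mtcars ships with base R and stands in for Auto here.
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]
stats <- data.frame(
  min  = sapply(vars, min),
  max  = sapply(vars, max),
  sd   = sapply(vars, sd),
  mean = sapply(vars, mean)
)
print(round(stats, 2))
```

Each row of stats is one variable, with the variable name carried along as the row name.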

(d) Now remove the 10th through 85th observations. What is the range, mean, and standard deviation of each predictor in the subset of the data that remains?

Auto2=Auto[-(10:85),] # drop observations 10 through 85 (76 rows), not just rows 10 and 85
dim(Auto2)
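
The two negative-index forms are easy to confuse. A small check on a made-up 100-row data frame (toy is a placeholder name) shows the difference:

```r
toy <- data.frame(id = 1:100)
two_dropped   <- toy[-c(10, 85), , drop = FALSE]  # removes rows 10 and 85 only
range_dropped <- toy[-(10:85), , drop = FALSE]    # removes rows 10 through 85
print(nrow(two_dropped))    # 98
print(nrow(range_dropped))  # 24
```

The parentheses in -(10:85) matter: -10:85 would be the sequence from -10 up to 85, which mixes negative and positive indices and is an error.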

len2=matrix(0,8,4)
for(l in 1:8){
  len2[l,1:2]=range(Auto2[,l]) # range (min, max)
  len2[l,3]=sd(Auto2[,l]) # standard deviation
  len2[l,4]=mean(Auto2[,l]) # mean
}
name2=names(Auto2)[1:8]
len2=data.frame(name2,len2)
names(len2)=c("variable","min","max","sd","mean")
len2

(e) Using the full data set, investigate the predictors graphically, using scatterplots or other tools of your choice. Create some plots highlighting the relationships among the predictors. Comment on your findings.

pairs(Auto[,1:8],main="Auto's matrix scatter plot") # scatterplot matrix for a rough view of the correlations
#pairs(Auto[,1:7],main="Auto's matrix scatter plot")

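displacement is positively and linearly correlated with horsepower and weight, and negatively correlated with acceleration;
horsepower is positively and linearly correlated with weight, and negatively correlated with acceleration and year;
mpg is negatively correlated with displacement, horsepower, and weight, and positively correlated with acceleration and year.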
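
Signs read off a scatterplot matrix can be confirmed with a correlation matrix. The sketch below uses the built-in mtcars data as a stand-in, so the column names (disp, hp, wt, qsec) are not those of Auto; on the Auto data loaded above, the analogous call would be cor(Auto[,1:7]).

```r
# mtcars ships with base R; it is used here only to illustrate cor().
vars <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
cm <- cor(vars)          # pairwise Pearson correlations
print(round(cm, 2))
```

A negative entry in the mpg row means fuel economy falls as that variable rises, which is what a downward-sloping panel in the pairs() plot shows.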

(f) Suppose that we wish to predict gas mileage (mpg) on the basis of the other variables. Do your plots suggest that any of the other variables might be useful in predicting mpg? Justify your answer.

origin=as.factor(Auto$origin)
Auto3=data.frame(Auto[,1:7],origin)

library(GGally)
library(ggplot2)
ggpairs(Auto3, columns=1:8, mapping=aes(color=origin)) + ggtitle("matrix scatter plot-Auto")+theme_bw()

Auto4=data.frame(Auto[,1:7],origin)
name4=names(Auto4)
name4
fm=lm(mpg~cylinders+displacement+horsepower+weight+acceleration+year,Auto4)
summary(fm)
lm.step=step(fm,direction = 'backward')
#lm.step2=step(fm,direction = 'both')

fm=lm(mpg~weight+year,Auto4)
fm
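
For readers unfamiliar with step(): backward selection starts from the full model and removes one term at a time for as long as doing so lowers the AIC. A minimal sketch on the built-in mtcars data (the predictors chosen here are an assumption for illustration, not the Auto model):

```r
# Full model on mtcars, a stand-in for the Auto regression.
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# trace = 0 suppresses the step-by-step log.
sel <- step(full, direction = "backward", trace = 0)
print(formula(sel))
print(extractAIC(full)[2])  # AIC of the full model
print(extractAIC(sel)[2])   # AIC of the selected model
```

The selected model's AIC can never exceed that of the starting model, because step() only accepts a removal that lowers it.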

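mpg is negatively correlated with weight and positively correlated with year.
The smallest AIC reached during variable selection is 968.66; the regression coefficients of weight and year are -0.006632 and 0.757318.
The regression equation is mpg = -14.347253 - 0.006632*weight + 0.757318*year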
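
As a worked example, the fitted equation can be evaluated directly. The coefficients are copied from the equation above; the car (3000 lb, model year 76) and the function name predict_mpg are hypothetical.

```r
# Coefficients copied from the fitted regression equation.
b0       <- -14.347253
b_weight <- -0.006632
b_year   <-  0.757318
predict_mpg <- function(weight, year) b0 + b_weight * weight + b_year * year
# Hypothetical car: 3000 lb, model year 76.
print(predict_mpg(3000, 76))
```

By hand: -14.347253 - 19.896 + 57.556168 = 23.312915, so the model predicts about 23.3 mpg. With the fitted object available, predict(fm, data.frame(weight=3000, year=76)) should give nearly the same value, up to rounding of the printed coefficients.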

Summary

Everything above is my own view. My ability is limited, so mistakes are inevitable; corrections are very welcome.
