[Chapter 3 Exercise Reference Answers]
Chapter 3 Exercises

1. In Table 3.4, the null hypothesis for "TV" is that in the presence of radio
ads and newspaper ads, TV ads have no effect on sales. Similarly, the null
hypothesis for "radio" is that in the presence of TV and newspaper ads, radio
ads have no effect on sales. (And there is a similar null hypothesis for
"newspaper".) The low p-values of TV and radio suggest that the null hypotheses
are false for TV and radio. The high p-value of newspaper suggests that the null
hypothesis is true for newspaper.

2. The KNN classifier and KNN regression methods are closely related in form. However, the KNN
classifier produces a qualitative classification for Y (the most commonly occurring class among
the K nearest neighbors), whereas KNN regression predicts a quantitative value for f(X) (the
average response of the K nearest neighbors).

3. Y = 50 + 20(gpa) + 0.07(iq) + 35(gender) + 0.01(gpa * iq) - 10 (gpa * gender)

(a) Writing GPA = k_1 and IQ = k_2:
Y = 50 + 20 k_1 + 0.07 k_2 + 35 Gender + 0.01 (k_1 * k_2) - 10 (k_1 * Gender)
male (Gender = 0):   50 + 20 k_1 + 0.07 k_2 + 0.01 (k_1 * k_2)
female (Gender = 1): 50 + 20 k_1 + 0.07 k_2 + 35 + 0.01 (k_1 * k_2) - 10 k_1

The female line exceeds the male line by 35 - 10 k_1, which becomes negative once GPA > 3.5, so males earn more on average provided the GPA is high enough. => iii.

(b) Y(Gender = 1, IQ = 110, GPA = 4.0)
= 50 + 20 * 4 + 0.07 * 110 + 35 + 0.01 (4 * 110) - 10 * 4
= 137.1
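
A quick numerical check of (a) and (b) in R (the helper function name "salary" is just for illustration):

salary = function(gpa, iq, gender) {
  50 + 20*gpa + 0.07*iq + 35*gender + 0.01*gpa*iq - 10*gpa*gender
}
salary(4.0, 110, 1) - salary(4.0, 110, 0)  # -5: at GPA = 4 the male earns more
salary(4.0, 110, 1)                        # 137.1, the answer to (b)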

(c) False. We must examine the p-value of the regression coefficient to
determine if the interaction term is statistically significant or not.

4. (a) I would expect the polynomial regression to have a lower training RSS than the linear
regression, because its extra flexibility lets it follow the training observations, including
the noise from the irreducible error Var(epsilon), more closely even though the true
relationship is linear.

(b) Conversely to (a), I would expect the polynomial regression to have a higher test RSS: the
overfitting acquired during training adds variance without reducing bias, so it should
generalize worse than the linear regression.

(c) The polynomial regression has a lower training RSS than the linear fit because of its
higher flexibility: no matter what the underlying true relationship is, the more flexible model
will follow the points more closely and reduce the training RSS. An example of this behavior is
shown in Figure 2.9 of Chapter 2.

(d) There is not enough information to tell which test RSS would be lower, because the problem
states only that the true relationship is non-linear without saying how far from linear it is.
If it is closer to linear than cubic, the linear regression's test RSS could be lower than the
cubic regression's; if it is closer to cubic than linear, the cubic regression's test RSS could
be lower. This is the bias-variance trade-off: without knowing the true relationship, it is not
clear which level of flexibility will fit the test data better.

5. See 5.jpg.
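
The handwritten derivation in 5.jpg is not reproduced here; presumably it follows the standard argument for the no-intercept model of (3.38), in which the fitted value is a linear combination of the responses:

\hat{y}_i = x_i \hat{\beta} = x_i \frac{\sum_j x_j y_j}{\sum_k x_k^2} = \sum_j \frac{x_i x_j}{\sum_k x_k^2}\, y_j = \sum_j a_j y_j, \qquad a_j = \frac{x_i x_j}{\sum_k x_k^2}.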

6. The least squares line is y = B_0 + B_1 x.
From (3.4): B_0 = avg(y) - B_1 avg(x).
The point (avg(x), avg(y)) lies on the line exactly when B_0 + B_1 avg(x) - avg(y) = 0.
Substituting B_0 gives
(avg(y) - B_1 avg(x)) + B_1 avg(x) - avg(y) = 0
0 = 0,
so the least squares line always passes through (avg(x), avg(y)).
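
As a quick numerical illustration in R (made-up data; any x and y would do), predicting at x = avg(x) returns avg(y), confirming that the line passes through that point:

set.seed(1)
x.demo = rnorm(20)
y.demo = 1 + 2*x.demo + rnorm(20)
fit.demo = lm(y.demo ~ x.demo)
predict(fit.demo, data.frame(x.demo = mean(x.demo)))  # equals mean(y.demo)
mean(y.demo)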

7.

8a.
Auto = read.csv("../data/Auto.csv", header=T, na.strings="?")
Auto = na.omit(Auto)
summary(Auto)

attach(Auto)
lm.fit = lm(mpg ~ horsepower)
summary(lm.fit)

i.
Yes, there is a relationship between horsepower and mpg, as determined by testing the null
hypothesis that all regression coefficients are equal to zero. Since the F-statistic is far
larger than 1 and the p-value of the F-statistic is close to zero, we can reject the null
hypothesis and state that there is a statistically significant relationship between horsepower
and mpg.

ii.
To gauge the residual error relative to the response we compare the RSE to the mean of the
response. The mean of mpg is 23.4459 and the RSE of lm.fit is 4.906, which corresponds to a
percentage error of 20.9248%. The R^2 of lm.fit is about 0.6059, meaning 60.5948% of the
variance in mpg is explained by horsepower.
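
These numbers can be pulled straight from the fitted model object (assuming lm.fit and the attached Auto data from 8a):

mean(mpg)                          # about 23.45
summary(lm.fit)$sigma              # RSE, about 4.906
summary(lm.fit)$sigma / mean(mpg)  # percentage error, about 0.209
summary(lm.fit)$r.squared          # R^2, about 0.606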

iii.
The relationship between mpg and horsepower is negative: the more horsepower an automobile has,
the lower the fuel efficiency (mpg) the linear regression predicts for it.

iv.
predict(lm.fit, data.frame(horsepower=c(98)), interval="confidence")

predict(lm.fit, data.frame(horsepower=c(98)), interval="prediction")

8b.
plot(horsepower, mpg)
abline(lm.fit)

8c.
par(mfrow=c(2,2))
plot(lm.fit)

Based on the residual plots, there is some evidence of non-linearity.

9a.
pairs(Auto)

9b.
cor(subset(Auto, select=-name))

9c.
lm.fit1 = lm(mpg~.-name, data=Auto)
summary(lm.fit1)

i.
Yes, there is a relationship between the predictors and the response, judged by testing the
null hypothesis that all the regression coefficients are zero. The F-statistic is far from 1
(with a small p-value), indicating evidence against the null hypothesis.

ii.
Looking at the p-values associated with each predictor’s t-statistic, we see that displacement, weight,
year, and origin have a statistically significant relationship, while cylinders, horsepower, and acceleration
do not.

iii.
The regression coefficient for year, 0.7508, suggests that, with the other predictors held
fixed, mpg increases by about 0.75 for each additional model year. In other words, cars become
more fuel efficient by almost 1 mpg per year.

9d.
par(mfrow=c(2,2))
plot(lm.fit1)

The fit does not appear to be entirely accurate, because there is a discernible curve pattern
in the residual plots. From the leverage plot, point 14 appears to have high leverage, although
not a high-magnitude residual.

plot(predict(lm.fit1), rstudent(lm.fit1))

There are possible outliers, as seen in the plot of studentized residuals: some observations
have values greater than 3.
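
To list the suspected outliers explicitly, a small addition to the code above:

which(abs(rstudent(lm.fit1)) > 3)  # observations whose studentized residual exceeds 3 in absolute value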

9e.
lm.fit2 = lm(mpg~cylinders*displacement+displacement*weight)
summary(lm.fit2)

From the correlation matrix, I took the two most highly correlated predictor pairs and used
them as interaction terms. From the p-values, we can see that the interaction between
displacement and weight is statistically significant, while the interaction between cylinders
and displacement is not.
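
One way to extract the most correlated predictor pairs programmatically (a sketch; the response mpg and the name column are dropped first):

cors = cor(subset(Auto, select = -c(name, mpg)))
cors[upper.tri(cors, diag = TRUE)] = NA       # keep each pair only once
pairs.df = na.omit(as.data.frame(as.table(cors)))
head(pairs.df[order(-abs(pairs.df$Freq)), ])  # most correlated pairs first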

9f.
lm.fit3 = lm(mpg~log(weight)+sqrt(horsepower)+acceleration+I(acceleration^2))
summary(lm.fit3)

par(mfrow=c(2,2))
plot(lm.fit3)

plot(predict(lm.fit3), rstudent(lm.fit3))

From the p-values, log(weight), sqrt(horsepower), and acceleration^2 all show some degree of
statistical significance. The residual plot has less of a discernible pattern than the plot
from the regression with all linear terms. The studentized residuals display potential outliers
(> 3), and the leverage plot indicates more than three points with high leverage.

However, two problems are apparent in the plots above: 1) the residuals-vs-fitted plot
indicates heteroskedasticity (non-constant variance) in the model; 2) the Q-Q plot indicates
some non-normality of the residuals.

So a better transformation needs to be applied to the model. From the correlation matrix in 9a,
displacement, horsepower and weight show a similar nonlinear pattern against the response mpg,
and this pattern is very close to a log form. So in the next attempt we use log(mpg) as the
response variable.

The output below shows that the log transform of mpg yields a better model fit (higher R^2, more nearly normal residuals).

lm.fit2<-lm(log(mpg)~cylinders+displacement+horsepower+weight+acceleration+year+origin,data=Auto)
summary(lm.fit2)

par(mfrow=c(2,2)) 
plot(lm.fit2)

plot(predict(lm.fit2),rstudent(lm.fit2))

10a.
library(ISLR)

summary(Carseats)

attach(Carseats)
lm.fit = lm(Sales~Price+Urban+US)
summary(lm.fit)

10b.
Price
The linear regression suggests a relationship between price and sales given the low p-value of the t-
statistic. The coefficient states a negative relationship between Price and Sales: as Price increases, Sales
decreases.

UrbanYes
The linear regression suggests that there isn’t a relationship between the location of the store and the
number of sales based on the high p-value of the t-statistic.

USYes
The linear regression suggests there is a relationship between whether the store is in the US or not and the
amount of sales. The coefficient states a positive relationship between USYes and Sales: if the store is in
the US, the sales will increase by approximately 1201 units.

10c.
Sales = 13.04 - 0.05 Price - 0.02 UrbanYes + 1.20 USYes,
where UrbanYes = 1 if the store is in an urban location (and 0 otherwise) and USYes = 1 if the
store is in the US (and 0 otherwise).

10d.
Price and USYes, based on the p-values, F-statistic, and p-value of the F-statistic.

10e.
lm.fit2 = lm(Sales ~ Price + US)
summary(lm.fit2)

10f.
Based on the RSE and R^2 of the two linear regressions, they both fit the data similarly, with
the regression from (e) fitting the data slightly better.

10g.
confint(lm.fit2)

10h.
plot(predict(lm.fit2), rstudent(lm.fit2))

All studentized residuals appear to be bounded between -3 and 3, so no potential outliers are
suggested by the linear regression.

par(mfrow=c(2,2))
plot(lm.fit2)

There are a few observations that greatly exceed the average leverage (p+1)/n = 3/400 = 0.0075
on the leverage-statistic plot, which suggests that the corresponding points have high leverage.
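
A sketch of how to check this numerically (assuming lm.fit2 from (e) and the 400-observation Carseats data):

p = 2
n = nrow(Carseats)
(p + 1) / n                                  # average leverage, 3/400 = 0.0075
which(hatvalues(lm.fit2) > 2 * (p + 1) / n)  # points well above the average leverage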

11.
set.seed(1)
x = rnorm(100)
y = 2*x + rnorm(100)

11a.
lm.fit = lm(y~x+0)
summary(lm.fit)

The p-value of the t-statistic is near zero so the null hypothesis is rejected.

11b.
lm.fit = lm(x~y+0)
summary(lm.fit)

The p-value of the t-statistic is near zero so the null hypothesis is rejected.

11c.
Both results in (a) and (b) reflect the same line created in 11a. In other words, y=2x+ϵ
could also be written x=0.5∗(y−ϵ).

11d.
(sqrt(length(x)-1) * sum(x*y)) / (sqrt(sum(x*x) * sum(y*y) - (sum(x*y))^2))

This is the same as the t-statistic reported in the summary above.
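
For reference, the algebraic form of the t-statistic being computed above (for regression without an intercept) is

t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})} = \frac{\sqrt{n-1}\,\sum_i x_i y_i}{\sqrt{\sum_i x_i^2 \sum_i y_i^2 - \left(\sum_i x_i y_i\right)^2}}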

11e.
The formula for the t-statistic in (d) is symmetric in x and y, so swapping x and y leaves it unchanged: t(x, y) = t(y, x), and the regression of y onto x has the same t-statistic as the regression of x onto y.

11f.
lm.fit = lm(y~x)
lm.fit2 = lm(x~y)
summary(lm.fit)

summary(lm.fit2)

You can see the t-statistic is the same for the two linear regressions.

12a.
When the sum of the squares of the observed y-values equals the sum of the squares of the
observed x-values.
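
This follows from the two coefficient estimates under (3.38):

\hat{\beta}_{Y \sim X} = \frac{\sum_i x_i y_i}{\sum_i x_i^2}, \qquad \hat{\beta}_{X \sim Y} = \frac{\sum_i x_i y_i}{\sum_i y_i^2},

so (when \sum_i x_i y_i \neq 0) the two estimates coincide exactly when \sum_i x_i^2 = \sum_i y_i^2.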

12b.
set.seed(1)
x = rnorm(100)
y = 2*x
lm.fit = lm(y~x+0)
lm.fit2 = lm(x~y+0)
summary(lm.fit)

summary(lm.fit2)

The regression coefficients are different for each linear regression.

12c.
set.seed(1)
x <- rnorm(100)
y <- -sample(x, 100)
sum(x^2)

sum(y^2)

lm.fit <- lm(y~x+0)
lm.fit2 <- lm(x~y+0)
summary(lm.fit)

summary(lm.fit2)

The regression coefficients are the same for each linear regression. As long as
sum(x^2) = sum(y^2), the condition in 12a is satisfied. Here we have simply taken the x_i in a
different order and negated them, which leaves the sum of squares unchanged.

13a.
set.seed(1)
x = rnorm(100)

13b.
eps = rnorm(100, 0, sqrt(0.25))

13c.
y = -1 + 0.5*x + eps

y is of length 100. β0 is -1, β1 is 0.5.

13d.
plot(x, y)


I observe a linear relationship between x and y with a positive slope, and scatter around the line consistent with the variance of the eps term.

13e.
lm.fit = lm(y~x)
summary(lm.fit)

The linear regression fits a model close to the true value of the coefficients as was constructed. The model
has a large F-statistic with a near-zero p-value so the null hypothesis can be rejected.

13f.
plot(x, y)
abline(lm.fit, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)

13g.
lm.fit_sq = lm(y~x+I(x^2))
summary(lm.fit_sq)

There is evidence that the model fit has improved slightly on the training data, given the
small improvement in R^2 and RSE. However, the p-value of the t-statistic for the quadratic
term suggests that there is no relationship between y and x^2.

13h.
set.seed(1)
eps1 = rnorm(100, 0, 0.125)
x1 = rnorm(100)
y1 = -1 + 0.5*x1 + eps1
plot(x1, y1)
lm.fit1 = lm(y1~x1)
summary(lm.fit1)

abline(lm.fit1, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)

As expected with the smaller noise variance, the RSE decreases considerably and R^2 increases.

13i.
set.seed(1)
eps2 = rnorm(100, 0, 0.5)
x2 = rnorm(100)
y2 = -1 + 0.5*x2 + eps2
plot(x2, y2)
lm.fit2 = lm(y2~x2)
summary(lm.fit2)

abline(lm.fit2, lwd=3, col=2)
abline(-1, 0.5, lwd=3, col=3)
legend(-1, legend = c("model fit", "pop. regression"), col=2:3, lwd=3)

As expected with the larger noise variance, the RSE increases considerably and R^2 decreases.

13j.
confint(lm.fit)

confint(lm.fit1)

confint(lm.fit2)

All intervals seem to be centered on approximately 0.5, with the second fit’s interval being narrower than
the first fit’s interval and the last fit’s interval being wider than the first fit’s interval.
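
A compact way to compare the widths of the three slope intervals (row 2 of each confint output is the slope; the list names are just labels):

sapply(list(original = lm.fit, less.noise = lm.fit1, more.noise = lm.fit2),
       function(m) diff(confint(m)[2, ]))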

14a.
set.seed(1)
x1 = runif(100)
x2 = 0.5 * x1 + rnorm(100)/10
y = 2 + 2*x1 + 0.3*x2 + rnorm(100)

14b.
cor(x1, x2)

plot(x1, x2)

14c.
lm.fit = lm(y~x1+x2)
summary(lm.fit)

The regression coefficients are close to the true coefficients, although with high standard
errors. We can reject the null hypothesis for β1 because its p-value is below 5%. We cannot
reject the null hypothesis for β2 because its p-value is well above the typical 5% cutoff
(over 60%).

14d.
lm.fit = lm(y~x1)
summary(lm.fit)

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic
is near zero.

14e.
lm.fit = lm(y~x2)
summary(lm.fit)

Yes, we can reject the null hypothesis for the regression coefficient given the p-value for its t-statistic
is near zero.

14f.
No. Because x1 and x2 are collinear, it is hard to distinguish their individual effects when
they are included in the same regression. When they are regressed upon separately, the linear
relationship between y and each predictor shows up clearly.
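
One common way to quantify this collinearity, assuming the car package is installed (it is not used elsewhere in this chapter), is the variance inflation factor:

library(car)
vif(lm(y ~ x1 + x2))  # VIFs well above 1 reflect the overlap between x1 and x2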

14g.
x1 = c(x1, 0.1)
x2 = c(x2, 0.8)
y = c(y, 6)
lm.fit1 = lm(y~x1+x2)
summary(lm.fit1)

lm.fit2 = lm(y~x1)
summary(lm.fit2)

lm.fit3 = lm(y~x2)
summary(lm.fit3)

In the first model, adding this observation shifts x1 to statistical insignificance and shifts
x2 to statistical significance, judging by the change in p-values between the two sets of
linear regressions.

par(mfrow=c(2,2))
plot(lm.fit1)

par(mfrow=c(2,2))
plot(lm.fit2)

par(mfrow=c(2,2))
plot(lm.fit3)

In the first and third models, the point becomes a high leverage point.

plot(predict(lm.fit1), rstudent(lm.fit1))

plot(predict(lm.fit2), rstudent(lm.fit2))

plot(predict(lm.fit3), rstudent(lm.fit3))

Looking at the studentized residuals, we do not observe points far outside the |3| cutoff,
except in the second linear regression, y ~ x1.

15a.
library(MASS)

summary(Boston)

Boston$chas <- factor(Boston$chas, labels = c("N","Y"))
summary(Boston)

attach(Boston)
lm.zn = lm(crim~zn)
summary(lm.zn) # yes

lm.indus = lm(crim~indus)
summary(lm.indus) # yes

lm.chas = lm(crim~chas) 
summary(lm.chas) # no

lm.nox = lm(crim~nox)
summary(lm.nox) # yes

lm.rm = lm(crim~rm)
summary(lm.rm) # yes

lm.age = lm(crim~age)
summary(lm.age) # yes

lm.dis = lm(crim~dis)
summary(lm.dis) # yes

lm.rad = lm(crim~rad)
summary(lm.rad) # yes

lm.tax = lm(crim~tax)
summary(lm.tax) # yes

lm.ptratio = lm(crim~ptratio)
summary(lm.ptratio) # yes

lm.black = lm(crim~black)
summary(lm.black) # yes

lm.lstat = lm(crim~lstat)
summary(lm.lstat) # yes

lm.medv = lm(crim~medv)
summary(lm.medv) # yes

All predictors except chas show a statistically significant association with crim. Plot each fit with plot(lm) to examine the residuals.
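
The thirteen fits above can also be generated in a loop; a compact sketch that collects the p-value of each predictor's t-statistic:

predictors = setdiff(names(Boston), "crim")
pvals = sapply(predictors, function(v) {
  fit = lm(crim ~ ., data = Boston[, c("crim", v)])
  coef(summary(fit))[2, 4]  # p-value of the predictor's t-statistic
})
sort(pvals)  # chas is the only predictor that is not significant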

15b.
lm.all = lm(crim~., data=Boston)
summary(lm.all)

15c.
x = c(coefficients(lm.zn)[2],
      coefficients(lm.indus)[2],
      coefficients(lm.chas)[2],
      coefficients(lm.nox)[2],
      coefficients(lm.rm)[2],
      coefficients(lm.age)[2],
      coefficients(lm.dis)[2],
      coefficients(lm.rad)[2],
      coefficients(lm.tax)[2],
      coefficients(lm.ptratio)[2],
      coefficients(lm.black)[2],
      coefficients(lm.lstat)[2],
      coefficients(lm.medv)[2])
y = coefficients(lm.all)[2:14]
plot(x, y)

15d.
lm.zn = lm(crim~poly(zn,3))
summary(lm.zn) # 1, 2

lm.indus = lm(crim~poly(indus,3))
summary(lm.indus) # 1, 2, 3

# lm.chas = lm(crim~poly(chas,3)) : qualitative predictor
lm.nox = lm(crim~poly(nox,3))
summary(lm.nox) # 1, 2, 3

lm.rm = lm(crim~poly(rm,3))
summary(lm.rm) # 1, 2

lm.age = lm(crim~poly(age,3))
summary(lm.age) # 1, 2, 3

lm.dis = lm(crim~poly(dis,3))
summary(lm.dis) # 1, 2, 3

lm.rad = lm(crim~poly(rad,3))
summary(lm.rad) # 1, 2

lm.tax = lm(crim~poly(tax,3))
summary(lm.tax) # 1, 2

lm.ptratio = lm(crim~poly(ptratio,3))
summary(lm.ptratio) # 1, 2, 3

lm.black = lm(crim~poly(black,3))
summary(lm.black) # 1

lm.lstat = lm(crim~poly(lstat,3))
summary(lm.lstat) # 1, 2

lm.medv = lm(crim~poly(medv,3))
summary(lm.medv) # 1, 2, 3

[Resources]

Code and data:
https://github.com/asadoughi/stat-learning/archive/refs/heads/master.zip
https://www.statlearning.com/resources-first-edition

Exercise answers: https://blog.princehonest.com/stat-learning/

Video lectures: https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/
