Applied Predictive Modeling, Chapter 6 Linear Regression, Exercise 6.1 [PCA, optimal tuning-parameter selection and model comparison, multiple linear regression, robust regression, partial least squares, ridge regression, lasso, elastic net]
Models: multiple linear regression, robust regression, partial least squares, ridge regression, lasso, elastic net
Language: R
Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson, Chinese translation by Lin Hui et al.
Exercise:
(b) In this example the predictors are absorbance measurements at individual frequencies. Because the frequencies lie in a systematic order (850-1050 nm), the predictors are highly correlated, and the data effectively lie in a space of lower dimension than the full 100 dimensions. Use PCA to determine the effective dimension of these data. What is that number?
(c) Split the data into a training set and a test set, pre-process the data, and build models using the methods of this chapter. For models with tuning parameters, what are the optimal values of those parameters?
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
(e) Explain which model you would use to predict the fat content of a sample.
Loading the data
library(caret)  # caret provides the tecator data
data(tecator)
head(absorp)
head(endpoints)
> # load the data
> data(tecator)
> head(absorp)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071 2.62186 2.62334 2.62511 2.62722 2.62964
[2,] 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587 2.86060 2.86566 2.87093 2.87661 2.88264
[3,] 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192 2.59401 2.59627 2.59873 2.60131 2.60414
[4,] 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192 2.83392 2.83606 2.83842 2.84097 2.84374
[5,] 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746 2.79984 2.80254 2.80553 2.80890 2.81272
[6,] 3.00993 3.01540 3.02086 3.02634 3.03190 3.03756 3.04341 3.04955 3.05599 3.06274 3.06982
     [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
[1,] 2.63245 2.63565 2.63933 2.64353 2.64825 2.65350 2.65937 2.66585 2.67281 2.68008 2.68733
[2,] 2.88898 2.89577 2.90308 2.91097 2.91953 2.92873 2.93863 2.94929 2.96072 2.97272 2.98493
[3,] 2.60714 2.61029 2.61361 2.61714 2.62089 2.62486 2.62909 2.63361 2.63835 2.64330 2.64838
[4,] 2.84664 2.84975 2.85307 2.85661 2.86038 2.86437 2.86860 2.87308 2.87789 2.88301 2.88832
[5,] 2.81704 2.82184 2.82710 2.83294 2.83945 2.84664 2.85458 2.86331 2.87280 2.88291 2.89335
[6,] 3.07724 3.08511 3.09343 3.10231 3.11185 3.12205 3.13294 3.14457 3.15703 3.17038 3.18429
     [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33]
[1,] 2.69427 2.70073 2.70684 2.71281 2.71914 2.72628 2.73462 2.74416 2.75466 2.76568 2.77679
[2,] 2.99690 3.00833 3.01920 3.02990 3.04101 3.05345 3.06777 3.08416 3.10221 3.12106 3.13983
[3,] 2.65354 2.65870 2.66375 2.66880 2.67383 2.67892 2.68411 2.68937 2.69470 2.70012 2.70563
[4,] 2.89374 2.89917 2.90457 2.90991 2.91521 2.92043 2.92565 2.93082 2.93604 2.94128 2.94658
[5,] 2.90374 2.91371 2.92305 2.93187 2.94060 2.94986 2.96035 2.97241 2.98606 3.00097 3.01652
[6,] 3.19840 3.21225 3.22552 3.23827 3.25084 3.26393 3.27851 3.29514 3.31401 3.33458 3.35591
     [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
[1,] 2.78790 2.79949 2.81225 2.82706 2.84356 2.86106 2.87857 2.89497 2.90924 2.92085 2.93015
[2,] 3.15810 3.17623 3.19519 3.21584 3.23747 3.25889 3.27835 3.29384 3.30362 3.30681 3.30393
[3,] 2.71141 2.71775 2.72490 2.73344 2.74327 2.75433 2.76642 2.77931 2.79272 2.80649 2.82064
[4,] 2.95202 2.95777 2.96419 2.97159 2.98045 2.99090 3.00284 3.01611 3.03048 3.04579 3.06194
[5,] 3.03220 3.04793 3.06413 3.08153 3.10078 3.12185 3.14371 3.16510 3.18470 3.20140 3.21477
[6,] 3.37709 3.39772 3.41828 3.43974 3.46266 3.48663 3.51002 3.53087 3.54711 3.55699 3.55986
     [,45] [,46] [,47] [,48] [,49] [,50] [,51] [,52] [,53] [,54] [,55]
[1,] 2.93846 2.94771 2.96019 2.97831 3.00306 3.03506 3.07428 3.11963 3.16868 3.21771 3.26254
[2,] 3.29700 3.28925 3.28409 3.28505 3.29326 3.30923 3.33267 3.36251 3.39661 3.43188 3.46492
[3,] 2.83541 2.85121 2.86872 2.88905 2.91289 2.94088 2.97325 3.00946 3.04780 3.08554 3.11947
[4,] 3.07889 3.09686 3.11629 3.13775 3.16217 3.19068 3.22376 3.26172 3.30379 3.34793 3.39093
[5,] 3.22544 3.23505 3.24586 3.26027 3.28063 3.30889 3.34543 3.39019 3.44198 3.49800 3.55407
[6,] 3.55656 3.54937 3.54169 3.53692 3.53823 3.54760 3.56512 3.59043 3.62229 3.65830 3.69515
     [,56] [,57] [,58] [,59] [,60] [,61] [,62] [,63] [,64] [,65] [,66]
[1,] 3.29988 3.32847 3.34899 3.36342 3.37379 3.38152 3.38741 3.39164 3.39418 3.39490 3.39366
[2,] 3.49295 3.51458 3.53004 3.54067 3.54797 3.55306 3.55675 3.55921 3.56045 3.56034 3.55876
[3,] 3.14696 3.16677 3.17938 3.18631 3.18924 3.18950 3.18801 3.18498 3.18039 3.17411 3.16611
[4,] 3.42920 3.45998 3.48227 3.49687 3.50558 3.51026 3.51221 3.51215 3.51036 3.50682 3.50140
[5,] 3.60534 3.64789 3.68011 3.70272 3.71815 3.72863 3.73574 3.74059 3.74357 3.74453 3.74336
[6,] 3.72932 3.75803 3.78003 3.79560 3.80614 3.81313 3.81774 3.82079 3.82258 3.82301 3.82206
     [,67] [,68] [,69] [,70] [,71] [,72] [,73] [,74] [,75] [,76] [,77]
[1,] 3.39045 3.38541 3.37869 3.37041 3.36073 3.34979 3.33769 3.32443 3.31013 3.29487 3.27891
[2,] 3.55571 3.55132 3.54585 3.53950 3.53235 3.52442 3.51583 3.50668 3.49700 3.48683 3.47626
[3,] 3.15641 3.14512 3.13241 3.11843 3.10329 3.08714 3.07014 3.05237 3.03393 3.01504 2.99569
[4,] 3.49398 3.48457 3.47333 3.46041 3.44595 3.43005 3.41285 3.39450 3.37511 3.35482 3.33376
[5,] 3.73991 3.73418 3.72638 3.71676 3.70553 3.69289 3.67900 3.66396 3.64785 3.63085 3.61305
[6,] 3.81959 3.81557 3.81021 3.80375 3.79642 3.78835 3.77958 3.77024 3.76040 3.75005 3.73929
     [,78] [,79] [,80] [,81] [,82] [,83] [,84] [,85] [,86] [,87] [,88]
[1,] 3.26232 3.24542 3.22828 3.21080 3.19287 3.17433 3.15503 3.13475 3.11339 3.09116 3.06850
[2,] 3.46552 3.45501 3.44481 3.43477 3.42465 3.41419 3.40303 3.39082 3.37731 3.36265 3.34745
[3,] 2.97612 2.95642 2.93660 2.91667 2.89655 2.87622 2.85563 2.83474 2.81361 2.79235 2.77113
[4,] 3.31204 3.28986 3.26730 3.24442 3.22117 3.19757 3.17357 3.14915 3.12429 3.09908 3.07366
[5,] 3.59463 3.57582 3.55695 3.53796 3.51880 3.49936 3.47938 3.45869 3.43711 3.41458 3.39129
[6,] 3.72831 3.71738 3.70681 3.69664 3.68659 3.67649 3.66611 3.65503 3.64283 3.62938 3.61483
     [,89] [,90] [,91] [,92] [,93] [,94] [,95] [,96] [,97] [,98] [,99]
[1,] 3.04596 3.02393 3.00247 2.98145 2.96072 2.94013 2.91978 2.89966 2.87964 2.85960 2.83940
[2,] 3.33245 3.31818 3.30473 3.29186 3.27921 3.26655 3.25369 3.24045 3.22659 3.21181 3.19600
[3,] 2.75015 2.72956 2.70934 2.68951 2.67009 2.65112 2.63262 2.61461 2.59718 2.58034 2.56404
[4,] 3.04825 3.02308 2.99820 2.97367 2.94951 2.92576 2.90251 2.87988 2.85794 2.83672 2.81617
[5,] 3.36772 3.34450 3.32201 3.30025 3.27907 3.25831 3.23784 3.21765 3.19766 3.17770 3.15770
[6,] 3.59990 3.58535 3.57163 3.55877 3.54651 3.53442 3.52221 3.50972 3.49682 3.48325 3.46870
     [,100]
[1,] 2.81920
[2,] 3.17942
[3,] 2.54816
[4,] 2.79622
[5,] 3.13753
[6,] 3.45307
> head(endpoints)
     [,1] [,2] [,3]
[1,] 60.5 22.5 16.7
[2,] 46.0 40.1 13.5
[3,] 71.0 8.4 20.5
[4,] 72.8 5.9 20.7
[5,] 58.3 25.5 15.5
[6,] 44.0 42.7 13.7
(b) In this example the predictors are absorbance measurements at individual frequencies. Because the frequencies lie in a systematic order (850-1050 nm), the predictors are highly correlated, and the data effectively lie in a space of lower dimension than the full 100 dimensions. Use PCA to determine the effective dimension of these data. What is that number?
Principal component analysis
When applying principal component analysis, keep the following five points in mind:
- The analysis can start from either the sample covariance matrix or the correlation matrix, but the correlation matrix is used in most cases;
- To keep the variance captured by each component maximal, the components are usually left unrotated;
- Component retention (three common criteria: 1. keep components with eigenvalues greater than 1; 2. the scree plot: keep the components above the point where the curve flattens out; 3. parallel analysis: compare the eigenvalues of the real data with those of simulated data, and keep the components whose real eigenvalues exceed the simulated ones);
- In practice, a solution is considered satisfactory if no more than three to five components explain 80% of the variance;
- The component scores each have maximal variance, and the components are mutually independent (orthogonal).
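The first two retention criteria above can be sketched in a few lines of base R. This is an illustrative example on simulated data (not the book's code, and not the tecator data): one latent dimension is buried in ten noisy, highly correlated predictors, and both criteria recover it.

```r
# A single latent variable z drives ten highly correlated predictors.
set.seed(1)
n <- 100
z <- rnorm(n)
X <- sapply(1:10, function(i) z + rnorm(n, sd = 0.05))

pca    <- prcomp(X, scale. = TRUE)   # scale. = TRUE => correlation-matrix PCA
eig    <- pca$sdev^2                 # eigenvalues
cumvar <- cumsum(eig) / sum(eig)     # cumulative proportion of variance

sum(eig > 1)                # Kaiser criterion: number of components retained
which(cumvar >= 0.80)[1]    # first component reaching 80% of the variance
```

Both criteria point to a single component, mirroring the conclusion for the absorbance spectra below.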
PCA <- princomp(absorp, cor = TRUE)  # cor = TRUE: analyze the correlation matrix (entries in [-1, 1], diagonal 1)
help(princomp)
There are 100 components in total. Inspect the importance of the components (each component's standard deviation, proportion of variance explained, and cumulative proportion).
The first component alone explains 98.6% of the variance.
> summary(PCA)
Importance of components:
                          Comp.1      Comp.2      Comp.3      Comp.4       Comp.5       Comp.6
Standard deviation     9.9310721 0.984736121 0.528511377 0.338274841 8.037979e-02 5.123077e-02
Proportion of Variance 0.9862619 0.009697052 0.002793243 0.001144299 6.460911e-05 2.624591e-05
Cumulative Proportion  0.9862619 0.995958978 0.998752221 0.999896520 9.999611e-01 9.999874e-01
The standard deviation of each component is the square root of its eigenvalue, so one option is to keep the components whose eigenvalues exceed 1.
In this example, that retains only the first component.
> PCA$sdev
      Comp.1       Comp.2       Comp.3       Comp.4       Comp.5       Comp.6       Comp.7
9.931072e+00 9.847361e-01 5.285114e-01 3.382748e-01 8.037979e-02 5.123077e-02 2.680884e-02
      Comp.8       Comp.9      Comp.10      Comp.11      Comp.12      Comp.13      Comp.14
1.960880e-02 8.564232e-03 6.739417e-03 4.441898e-03 3.360852e-03 1.867188e-03 1.376574e-03
The scree plot shows that the curve flattens out from the second component onward, which again suggests keeping only the first component.
screeplot(PCA,type="lines")
Conclusion: the effective dimension of these data is 1.
(c) Split the data into a training set and a test set, pre-process the data, and build models using the methods of this chapter. For models with tuning parameters, what are the optimal values of those parameters?
Pre-processing the data
# pre-processing
summary(absorp)
summary(endpoints)

# locate predictors with missing values (this data set has none)
NAcol <- which(colSums(is.na(absorp)) > 0)

# compute the skewness of each predictor
library(e1071)
summary(apply(absorp, 2, skewness))

# convert the matrix to a data frame
absorp <- as.data.frame(absorp)

# Box-Cox transformation of each predictor
boxcox <- function(x) {
  trans <- BoxCoxTrans(x)
  predict(trans, x)
}
absorp.trans <- apply(absorp, 2, boxcox)
absorp.trans <- as.data.frame(absorp.trans)

# the skewness is reduced after the transformation
summary(apply(absorp.trans, 2, skewness))

# the response: fat content
fat <- endpoints[, 2]
> # compute the skewness of each predictor
> library(e1071)
> summary(apply(absorp, 2, skewness))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8260  0.8432  0.8946  0.9027  0.9667  0.9976
> # the skewness is reduced after the transformation
> summary(apply(absorp.trans, 2, skewness))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.005178 0.021773 0.034949 0.034739 0.046840 0.066915
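For intuition about what the two steps above measure and do, here is a hedged base-R sketch on simulated data. The helper names `skew` and `boxcox_transform` are mine, not caret's; caret's `BoxCoxTrans` additionally estimates the best power lambda from the data rather than taking it as an argument.

```r
# Sample skewness: third central moment over the cubed standard deviation.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Box-Cox power transform for a fixed lambda; lambda = 0 is the log transform.
boxcox_transform <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

set.seed(2)
x <- exp(rnorm(500))          # strongly right-skewed (log-normal) data
skew(x)                       # large positive skewness
skew(boxcox_transform(x, 0))  # after the log transform: close to zero
```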
Modeling.
Because this is a small data set (215 samples) and the goal is to choose among several models (both the optimal tuning parameters within an algorithm and the best model across algorithms), the bootstrap is used as the resampling method.
# resampling via the bootstrap: the sample is small and the goal is model selection
# set the random seed so the resampled data sets are reproducible
set.seed(100)
indx <- createResample(fat,times = 50, list = TRUE)
ctrl <- trainControl(method = "boot",number=50,index = indx)
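What `createResample` produces can be sketched in base R (illustrative, not caret's implementation): each bootstrap resample draws n row indices with replacement, so on average about 63.2% of the original samples appear in each resample, and the held-out rest serve for evaluation.

```r
set.seed(100)
n <- 215
# 50 bootstrap resamples, each a vector of n indices drawn with replacement
idx <- replicate(50, sample(n, n, replace = TRUE), simplify = FALSE)

length(idx)                                            # 50 resamples
mean(sapply(idx, function(i) length(unique(i)) / n))   # about 0.632
```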
Linear regression
# fit a linear regression
set.seed(100)
lmFit1 <- train(x = absorp.trans, y = fat,
                method = "lm",
                trControl = ctrl,
                preProc = c("center", "scale"))
lmFit1
> lmFit1
Linear Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results:

  RMSE      Rsquared   MAE
  2.706693  0.9560697  1.87457

Tuning parameter 'intercept' was held constant at a value of TRUE
Partial least squares (PLS)
#PLS
set.seed(100)
plsTune <- train(x = absorp.trans, y = fat,
                 method = "kernelpls",              # Dayal and MacGregor's first kernel PLS algorithm
                 tuneGrid = expand.grid(ncomp = 1:50),  # candidate numbers of components
                 trControl = ctrl,
                 preProc = c("center", "scale"))
plsTune
PLS tuning parameter: the optimal number of components is 20.
> plsTune
Partial Least Squares

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  ncomp  RMSE       Rsquared   MAE
   1     11.183174  0.2396493  9.088124
   2      8.686219  0.5457929  6.929797
   3      5.436814  0.8171801  4.017126
   4      4.719667  0.8639848  3.637222
   5      3.183214  0.9391108  2.432778
   6      3.113421  0.9417685  2.407983
   7      2.981929  0.9478784  2.203364
   8      2.803669  0.9537681  2.004770
   9      2.658721  0.9578675  1.851331
  10      2.486691  0.9635027  1.724633
  11      2.291300  0.9688312  1.594113
  12      2.148954  0.9725424  1.527899
  13      2.046004  0.9746878  1.462254
  14      2.019919  0.9752486  1.425622
  15      1.909752  0.9777992  1.336137
  16      1.760398  0.9808560  1.224250
  17      1.666462  0.9829127  1.171777
  18      1.590492  0.9845054  1.134067
  19      1.567033  0.9849643  1.128898
  20      1.534394  0.9855824  1.113200
  21      1.560273  0.9850988  1.119986
  22      1.566204  0.9849703  1.115952
  23      1.553964  0.9851720  1.105929
  24      1.591527  0.9845027  1.118788
  25      1.625377  0.9838303  1.135157
  26      1.658889  0.9831096  1.157611
  27      1.683492  0.9824806  1.172554
  28      1.744393  0.9811685  1.208383
  29      1.795215  0.9800030  1.244794
  30      1.848273  0.9788338  1.286987
  31      1.883307  0.9780175  1.318409
  32      1.938951  0.9767456  1.360510
  33      1.986740  0.9755708  1.392391
  34      2.023170  0.9746550  1.422097
  35      2.071566  0.9734367  1.453087
  36      2.112281  0.9724229  1.478971
  37      2.146216  0.9715039  1.500828
  38      2.175165  0.9708326  1.520763
  39      2.192173  0.9705042  1.536536
  40      2.222708  0.9697809  1.557308
  41      2.246722  0.9692350  1.575675
  42      2.256637  0.9689731  1.587785
  43      2.274497  0.9685442  1.604375
  44      2.306888  0.9677469  1.627217
  45      2.329405  0.9671802  1.644553
  46      2.359832  0.9663816  1.662377
  47      2.374701  0.9659943  1.673540
  48      2.404638  0.9652013  1.691980
  49      2.433347  0.9643316  1.709945
  50      2.449268  0.9638598  1.721255

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 20.
Robust regression
# robust regression
set.seed(100)
rlmFit <- train(x = absorp.trans, y = fat,
                method = "rlm",
                trControl = ctrl,
                preProc = c("center", "scale"))
rlmFit
The selected model has an intercept and uses the Huber psi function.
> rlmFit
Robust Linear Model

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  intercept  psi           RMSE       Rsquared   MAE
  FALSE      psi.huber     18.145830  0.9560697  17.940133
  FALSE      psi.hampel    18.145830  0.9560697  17.940133
  FALSE      psi.bisquare  18.146153  0.9560864  17.940487
   TRUE      psi.huber      3.220055  0.9385840   2.215829
   TRUE      psi.hampel    25.311035  0.3743814  17.164686
   TRUE      psi.bisquare  76.651845  0.4210313  54.553360

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.huber.
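The Huber weight function behind `psi = psi.huber` is easy to sketch in base R. Residuals smaller than the tuning constant k get full weight; larger residuals are down-weighted in proportion to their size, which is what makes the fit robust to outliers. This is an illustrative sketch (MASS::rlm implements the real iteratively reweighted fit); 1.345 is the conventional default for k.

```r
# Huber weights: w(r) = 1 for |r| <= k, k/|r| otherwise
huber_weight <- function(r, k = 1.345) pmin(1, k / abs(r))

huber_weight(c(0.5, 1.0, 3.0, 10.0))  # small residuals keep weight 1, large ones are shrunk
```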
Ridge regression
# ridge regression
# use train() to choose the best ridge penalty
# regularization parameter lambda: 15 values between 0 and 0.1
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))
set.seed(100)
ridgeTune <- train(x = absorp.trans, y = fat,
                   method = "ridge",
                   tuneGrid = ridgeGrid,
                   trControl = ctrl,
                   preProc = c("center", "scale"))
ridgeTune
Ridge tuning parameter: the regularization parameter lambda is 0, i.e. no regularization at all.
> ridgeTune
Ridge Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       RMSE      Rsquared   MAE
  0.000000000  2.706699  0.9560687  1.874579
  0.007142857  3.643592  0.9213753  2.748934
  0.014285714  4.052245  0.9031794  3.011670
  0.021428571  4.316364  0.8907687  3.175379
  0.028571429  4.511024  0.8814221  3.300525
  0.035714286  4.668275  0.8737664  3.406487
  0.042857143  4.803098  0.8670998  3.502220
  0.050000000  4.923200  0.8610413  3.589788
  0.057142857  5.032872  0.8553722  3.672677
  0.064285714  5.134671  0.8499616  3.751602
  0.071428571  5.230211  0.8447286  3.826811
  0.078571429  5.320565  0.8396221  3.898918
  0.085714286  5.406487  0.8346093  3.967995
  0.092857143  5.488526  0.8296690  4.034324
  0.100000000  5.567103  0.8247876  4.099350

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.
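Why lambda = 0 reproduces ordinary least squares: the ridge solution (X'X + lambda I)^(-1) X'y collapses to the OLS solution when lambda = 0, and its norm shrinks monotonically as lambda grows. A base-R sketch on simulated data (not the tecator data; no intercept, predictors standardized, as `preProc` does for us above):

```r
set.seed(3)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
y <- X %*% c(1, -2, 0, 0.5, 3) + rnorm(100)

# closed-form ridge coefficients: solve (X'X + lambda I) b = X'y
ridge_coef <- function(X, y, lambda)
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)

ols <- coef(lm(y ~ X - 1))                # OLS fit without intercept
max(abs(ridge_coef(X, y, 0) - ols))       # essentially zero: ridge(0) == OLS
sum(ridge_coef(X, y, 10)^2) < sum(ols^2)  # a positive lambda shrinks the coefficients
```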
Elastic net
# enet: the elastic net
# the elastic net carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso penalty: 20 values between 0.05 and 1
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15), fraction = seq(.05, 1, length = 20))
set.seed(100)
enetTune <- train(x = absorp.trans, y = fat,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl,
                  preProc = c("center", "scale"))
enetTune
Elastic net tuning parameters: ridge penalty lambda = 0 and lasso fraction = 0.0526.
> enetTune
Elasticnet

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       fraction    RMSE       Rsquared   MAE
  0.000000000  0.00000000  12.719745        NaN  10.747191
  0.000000000  0.05263158   1.505096  0.9861744   1.072324
  0.000000000  0.10526316   1.611281  0.9839983   1.133129
  0.000000000  0.15789474   1.713409  0.9818137   1.200435
  0.000000000  0.21052632   1.787748  0.9802273   1.257652
  0.000000000  0.26315789   1.844759  0.9790006   1.303476
  0.000000000  0.31578947   1.897734  0.9778433   1.344380
  0.000000000  0.36842105   1.952677  0.9766278   1.383494
  0.000000000  0.42105263   2.008177  0.9753659   1.424576
  0.000000000  0.47368421   2.060634  0.9741002   1.463339
  0.000000000  0.52631579   2.113615  0.9727729   1.502524
  0.000000000  0.57894737   2.170547  0.9713402   1.540915
  0.000000000  0.63157895   2.232797  0.9697258   1.581797
  0.000000000  0.68421053   2.294018  0.9680902   1.620818
  0.000000000  0.73684211   2.357063  0.9663678   1.660460
  0.000000000  0.78947368   2.423098  0.9645125   1.700407
  0.000000000  0.84210526   2.490069  0.9625804   1.740759
  0.000000000  0.89473684   2.560683  0.9604961   1.784247
  0.000000000  0.94736842   2.632378  0.9583456   1.828712
  0.000000000  1.00000000   2.706699  0.9560687   1.874579
  [ rows for lambda > 0 omitted here: every lambda > 0 gave a larger RMSE ]
[ reached getOption("max.print") -- omitted 100 rows ]

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.05263158 and lambda = 0.
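The lasso side of the elastic net works by soft-thresholding: coefficients are shrunk toward zero and the small ones are set exactly to zero, which is why a small `fraction` (a small share of the full L1 norm) yields a sparse model. An illustrative base-R sketch of the soft-thresholding operator, not the elasticnet package's code:

```r
# Soft-thresholding: shrink each coefficient toward 0 by t, zeroing the small ones.
soft_threshold <- function(beta, t) sign(beta) * pmax(abs(beta) - t, 0)

beta <- c(3, -0.2, 0.05, -1.5)
soft_threshold(beta, 0.3)   # c(2.7, 0, 0, -1.2): the two small coefficients are dropped
```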
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
Comparing the models:
The resamples function in caret analyzes and visualizes resampling results (the models must have been resampled through train).
For each algorithm, the object of comparison is the final model with the smallest RMSE. Since the resampling method is the bootstrap with 50 resamples, each algorithm's final model has 50 performance estimates.
# model comparison
# resamples() analyzes and visualizes the resampling results
resamp <- resamples( list(lm=lmFit1,rlm=rlmFit,pls=plsTune,ridge=ridgeTune,enet=enetTune) )
summary(resamp)
The pls and enet models have the smallest mean RMSE, i.e. the best predictive performance, while rlm has the largest mean RMSE, i.e. the worst.
> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: lm, rlm, pls, ridge, enet
Number of resamples: 50

MAE
           Min.   1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.3090300 1.6211047 1.879747 1.874570 2.024030 2.989562    0
rlm   1.4045612 1.9200873 2.183430 2.215829 2.456707 3.715148    0
pls   0.8375685 1.0323470 1.101890 1.113200 1.180808 1.440820    0
ridge 1.3089927 1.6211901 1.879508 1.874579 2.024011 2.990832    0
enet  0.8195073 0.9752678 1.076620 1.072324 1.156561 1.291045    0

RMSE
          Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.705850 2.286390 2.678425 2.706693 3.014897 4.458380    0
rlm   1.886688 2.693024 3.180843 3.220055 3.576437 5.347496    0
pls   1.068864 1.370754 1.503226 1.534394 1.646459 2.127759    0
ridge 1.705678 2.286423 2.678400 2.706699 3.014607 4.460421    0
enet  1.109904 1.346236 1.484470 1.505096 1.661772 2.002391    0

Rsquared
           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
lm    0.8969856 0.9481050 0.9575495 0.9560697 0.9703579 0.9829345    0
rlm   0.8266755 0.9283558 0.9418637 0.9385840 0.9597848 0.9779785    0
pls   0.9708559 0.9823688 0.9869358 0.9855824 0.9888411 0.9926647    0
ridge 0.8969050 0.9481031 0.9575498 0.9560687 0.9703573 0.9829340    0
enet  0.9744586 0.9833999 0.9869512 0.9861744 0.9887381 0.9930981    0
Plot the confidence interval of each model's mean RMSE.
dotplot( resamp, metric="RMSE" )
The plot shows at a glance that enet and pls perform best and about equally well, while rlm performs worst.
Use the diff function for pairwise comparisons.
summary(diff(resamp))
Upper diagonal: estimates of the difference
Lower diagonal: p-values
Judging by RMSE, and reading the differences together with the p-values: rlm is significantly worse than every other model, and enet and pls are significantly better than lm, rlm, and ridge. The difference between enet and pls themselves is not significant (p = 0.29 for RMSE).
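What diff(resamp) computes under the hood can be sketched in base R: because every model was evaluated on the same 50 bootstrap resamples, the RMSE values are paired, so a paired t-test (Bonferroni-adjusted across model pairs by caret) tests whether the mean difference is zero. Illustrative simulated RMSE vectors, not the real resampling results:

```r
set.seed(4)
# two models evaluated on the same 50 resamples => paired performance values
rmse_pls  <- rnorm(50, mean = 1.53, sd = 0.2)
rmse_enet <- rmse_pls + rnorm(50, mean = -0.03, sd = 0.15)  # slightly lower, but noisy

# paired t-test on the per-resample differences
t.test(rmse_pls, rmse_enet, paired = TRUE)$p.value
```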
> summary(diff(resamp))

Call:
summary.diff.resamples(object = diff(resamp))

p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

MAE
      lm         rlm        pls        ridge      enet
lm               -3.413e-01  7.614e-01 -8.713e-06  8.022e-01
rlm   1.605e-11              1.103e+00  3.413e-01  1.144e+00
pls   < 2.2e-16  < 2.2e-16             -7.614e-01  4.088e-02
ridge 1          1.591e-11  < 2.2e-16              8.023e-01
enet  < 2.2e-16  < 2.2e-16  1.420e-05  < 2.2e-16

RMSE
      lm         rlm        pls        ridge      enet
lm               -5.134e-01  1.172e+00 -6.918e-06  1.202e+00
rlm   2.097e-11              1.686e+00  5.134e-01  1.715e+00
pls   < 2.2e-16  < 2.2e-16             -1.172e+00  2.930e-02
ridge 1.0000     2.082e-11  < 2.2e-16              1.202e+00
enet  < 2.2e-16  < 2.2e-16  0.2861     < 2.2e-16

Rsquared
      lm         rlm        pls        ridge      enet
lm                1.749e-02 -2.951e-02  9.420e-07 -3.010e-02
rlm   5.945e-09             -4.700e-02 -1.748e-02 -4.759e-02
pls   1.224e-14  1.451e-14              2.951e-02 -5.920e-04
ridge 1.0000     5.896e-09  1.238e-14             -3.011e-02
enet  1.679e-15  5.609e-15  0.3368     1.698e-15
(e) Explain which model you would use to predict the fat content of a sample.
Choose the elastic net: it has the lowest mean RMSE and is significantly better than lm, rlm, and ridge. It performs on par with PLS, and with a lasso fraction of only 0.05 its coefficient vector is much sparser, giving a simpler model.