Models: multiple linear regression, robust regression, partial least squares (PLS), ridge regression, the lasso, and the elastic net

Language: R

Reference: Applied Predictive Modeling (2013) by Max Kuhn and Kjell Johnson (Chinese translation by Hui Lin et al.)


Exercise:

(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850–1050 nm), the predictors are highly correlated, so the data effectively occupy a lower-dimensional space rather than the full 100 dimensions. Use PCA to determine the effective dimension of these data. What is it?
(c) Split the data into a training set and a test set, pre-process the data, and build each of the models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?
(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?
(e) Explain which model you would use to predict the fat content of a sample.



Loading the data

library(caret)  # load the data
data(tecator)
head(absorp)
head(endpoints)
> # load the data
> data(tecator)
> head(absorp)
        [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]    [,9]   [,10]   [,11]
[1,] 2.61776 2.61814 2.61859 2.61912 2.61981 2.62071 2.62186 2.62334 2.62511 2.62722 2.62964
[2,] 2.83454 2.83871 2.84283 2.84705 2.85138 2.85587 2.86060 2.86566 2.87093 2.87661 2.88264
[3,] 2.58284 2.58458 2.58629 2.58808 2.58996 2.59192 2.59401 2.59627 2.59873 2.60131 2.60414
[4,] 2.82286 2.82460 2.82630 2.82814 2.83001 2.83192 2.83392 2.83606 2.83842 2.84097 2.84374
[5,] 2.78813 2.78989 2.79167 2.79350 2.79538 2.79746 2.79984 2.80254 2.80553 2.80890 2.81272
[6,] 3.00993 3.01540 3.02086 3.02634 3.03190 3.03756 3.04341 3.04955 3.05599 3.06274 3.06982
(columns [,12] through [,100] omitted here for readability)
> head(endpoints)
     [,1] [,2] [,3]
[1,] 60.5 22.5 16.7
[2,] 46.0 40.1 13.5
[3,] 71.0  8.4 20.5
[4,] 72.8  5.9 20.7
[5,] 58.3 25.5 15.5
[6,] 44.0 42.7 13.7


(b) In this example the predictors are measurements of absorbance at individual frequencies. Because the frequencies lie in a systematic order (850–1050 nm), the predictors are highly correlated, so the data effectively occupy a lower-dimensional space rather than the full 100 dimensions. Use PCA to determine the effective dimension of these data. What is it?

Principal component analysis

When applying PCA, keep five points in mind:

  1. The analysis can start from either the sample covariance matrix or the correlation matrix; the correlation matrix is used most often.
  2. To maximize the variance captured, PCA is usually performed without rotation.
  3. Component retention (three common criteria: (a) keep components with eigenvalues greater than 1; (b) a scree plot, keeping the components above the point where the curve levels off; (c) parallel analysis, comparing the eigenvalues of the real data with those of simulated data and keeping components whose real eigenvalues are larger; see the sketch after this list).
  4. In applied work, explaining 80% of the variance with no more than three to five components is considered satisfactory.
  5. The component scores have maximal variance, and the components are mutually independent (orthogonal).
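
For criterion 3, here is a minimal parallel-analysis sketch, assuming the psych package is installed (it is not used anywhere else in this post):

# Parallel analysis: keep components whose real-data eigenvalues exceed
# those obtained from random data of the same dimensions
library(psych)
fa.parallel(absorp, fa = "pc")  # fa = "pc" restricts the comparison to principal components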
PCA <- princomp(absorp, cor = TRUE)  # cor = TRUE: start from the correlation matrix
                                     # (entries lie in [-1, 1], with 1 on the diagonal)
help(princomp)

There are 100 components in total. Inspect their importance (each component's standard deviation, proportion of variance, and cumulative proportion of variance).

As shown below, the first component alone explains 98.62% of the variance.

> summary(PCA)
Importance of components:
                          Comp.1      Comp.2      Comp.3      Comp.4       Comp.5       Comp.6
Standard deviation     9.9310721 0.984736121 0.528511377 0.338274841 8.037979e-02 5.123077e-02
Proportion of Variance 0.9862619 0.009697052 0.002793243 0.001144299 6.460911e-05 2.624591e-05
Cumulative Proportion  0.9862619 0.995958978 0.998752221 0.999896520 9.999611e-01 9.999874e-01

Each component's standard deviation is the square root of the corresponding eigenvalue, so one option is to keep the components whose eigenvalues exceed 1.

By that rule, only the first component is kept in this example.

> PCA$sdev
      Comp.1       Comp.2       Comp.3       Comp.4       Comp.5       Comp.6       Comp.7
9.931072e+00 9.847361e-01 5.285114e-01 3.382748e-01 8.037979e-02 5.123077e-02 2.680884e-02
      Comp.8       Comp.9      Comp.10      Comp.11      Comp.12      Comp.13      Comp.14
1.960880e-02 8.564232e-03 6.739417e-03 4.441898e-03 3.360852e-03 1.867188e-03 1.376574e-03

The scree plot also shows the curve flattening out from the second component onward, so keeping only the first component is reasonable.

screeplot(PCA,type="lines")
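
As a quick numeric cross-check of the scree plot, the proportions of variance can be recomputed directly from the fitted princomp object:

# proportion of variance explained by each component
pve <- PCA$sdev^2 / sum(PCA$sdev^2)
round(pve[1:4], 4)          # first component alone: ~0.9863
round(cumsum(pve)[1:4], 4)  # cumulative proportion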

Conclusion: the effective dimension of these data is 1.



(c) Split the data into a training set and a test set, pre-process the data, and build each of the models described in this chapter. For the models with tuning parameters, what are the optimal values of those parameters?

Data pre-processing

# data pre-processing
summary(absorp)
summary(endpoints)
# locate columns that contain missing values
NAcol <- which(colSums(is.na(absorp)) > 0)
# no missing values in this example

# compute skewness
library(e1071)
summary(apply(absorp, 2, skewness))

# convert the matrix to a data frame
absorp <- as.data.frame(absorp)

# Box-Cox transformation, applied column by column
boxcox <- function(x){
  trans <- BoxCoxTrans(x)
  predict(trans, x)
}
absorp.trans <- apply(absorp, 2, boxcox)
absorp.trans <- as.data.frame(absorp.trans)

# skewness is reduced
summary(apply(absorp.trans, 2, skewness))

# response variable: fat, the second column of endpoints (moisture, fat, protein)
fat <- endpoints[,2]
> # compute skewness
> library(e1071)
> summary(apply(absorp, 2, skewness))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.8260  0.8432  0.8946  0.9027  0.9667  0.9976
> # skewness is reduced
> summary(apply(absorp.trans, 2, skewness))
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.005178 0.021773 0.034949 0.034739 0.046840 0.066915
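
Question (c) also asks for a train/test split, which this walkthrough skips (all models below are fit to the full data with bootstrap resampling). A minimal split sketch with caret; the names absorpTrain, absorpTest, fatTrain, and fatTest are illustrative:

set.seed(100)
inTrain <- createDataPartition(fat, p = 0.8, list = FALSE)  # stratified on the response
absorpTrain <- absorp.trans[inTrain, ]
absorpTest  <- absorp.trans[-inTrain, ]
fatTrain    <- fat[inTrain]
fatTest     <- fat[-inTrain]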

Model building.

Because this is a small sample (215 observations) and the goal is to choose among several models (both picking the optimal parameters within an algorithm and comparing across algorithms), the bootstrap is used as the resampling method.


# Resampling: the bootstrap, because the sample is small and the goal is to choose
# among several models (tuning within an algorithm and comparing across algorithms).
# Set the random seed so the resampled data sets are reproducible.
set.seed(100)
indx <- createResample(fat, times = 50, list = TRUE)
ctrl <- trainControl(method = "boot", number = 50, index = indx)

Linear regression

# ordinary linear regression
set.seed(100)
lmFit1 <- train(x = absorp.trans, y = fat, method = "lm",
                trControl = ctrl, preProc = c("center", "scale"))
lmFit1
> lmFit1
Linear Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results:

  RMSE      Rsquared   MAE
  2.706693  0.9560697  1.87457

Tuning parameter 'intercept' was held constant at a value of TRUE

Partial least squares (PLS)

# PLS
set.seed(100)
plsTune <- train(x = absorp.trans, y = fat,
                 method = "kernelpls",                 # first kernel algorithm of Dayal and MacGregor
                 tuneGrid = expand.grid(ncomp = 1:50), # number of components to tune over
                 trControl = ctrl, preProc = c("center", "scale"))
plsTune

PLS tuning parameter: the optimal number of components is 20.

> plsTune
Partial Least Squares

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  ncomp  RMSE       Rsquared   MAE
   1     11.183174  0.2396493  9.088124
   2      8.686219  0.5457929  6.929797
   3      5.436814  0.8171801  4.017126
   4      4.719667  0.8639848  3.637222
   5      3.183214  0.9391108  2.432778
  ...
  18      1.590492  0.9845054  1.134067
  19      1.567033  0.9849643  1.128898
  20      1.534394  0.9855824  1.113200
  21      1.560273  0.9850988  1.119986
  22      1.566204  0.9849703  1.115952
  ...
  50      2.449268  0.9638598  1.721255
  (intermediate rows omitted for readability)

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was ncomp = 20.
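
The RMSE profile over the number of components can be visualized with caret's plot method for train objects:

plot(plsTune)  # bootstrap RMSE versus ncomp; the minimum sits at ncomp = 20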

Robust regression

# robust regression
set.seed(100)
rlmFit <- train(x = absorp.trans, y = fat, method = "rlm",
                trControl = ctrl, preProc = c("center", "scale"))
rlmFit

The selected model has an intercept and uses the Huber psi function.

> rlmFit
Robust Linear Model

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  intercept  psi           RMSE       Rsquared   MAE
  FALSE      psi.huber     18.145830  0.9560697  17.940133
  FALSE      psi.hampel    18.145830  0.9560697  17.940133
  FALSE      psi.bisquare  18.146153  0.9560864  17.940487
   TRUE      psi.huber      3.220055  0.9385840   2.215829
   TRUE      psi.hampel    25.311035  0.3743814  17.164686
   TRUE      psi.bisquare  76.651845  0.4210313  54.553360

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were intercept = TRUE and psi = psi.huber.
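
The intercept-free fits above perform terribly, and rlm can be unstable when the predictors are as collinear as these. One hedged variant (a sketch, not part of the original analysis) is to add PCA to the pre-processing so the model is fit on orthogonal component scores:

set.seed(100)
rlmPCA <- train(x = absorp.trans, y = fat, method = "rlm",
                preProc = c("center", "scale", "pca"),  # fit on principal component scores
                trControl = ctrl)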


Ridge regression

# ridge regression
# use train() to choose the best ridge penalty
# regularization parameter lambda: 15 values from 0 to 0.1
ridgeGrid <- expand.grid(lambda = seq(0, .1, length = 15))
set.seed(100)
ridgeTune <- train(x = absorp.trans, y = fat,
                   method = "ridge",
                   tuneGrid = ridgeGrid,
                   trControl = ctrl, preProc = c("center", "scale"))
ridgeTune

Ridge tuning parameter: the regularization parameter lambda is 0, i.e., no regularization at all.

> ridgeTune
Ridge Regression

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       RMSE      Rsquared   MAE
  0.000000000  2.706699  0.9560687  1.874579
  0.007142857  3.643592  0.9213753  2.748934
  0.014285714  4.052245  0.9031794  3.011670
  0.021428571  4.316364  0.8907687  3.175379
  0.028571429  4.511024  0.8814221  3.300525
  0.035714286  4.668275  0.8737664  3.406487
  0.042857143  4.803098  0.8670998  3.502220
  0.050000000  4.923200  0.8610413  3.589788
  0.057142857  5.032872  0.8553722  3.672677
  0.064285714  5.134671  0.8499616  3.751602
  0.071428571  5.230211  0.8447286  3.826811
  0.078571429  5.320565  0.8396221  3.898918
  0.085714286  5.406487  0.8346093  3.967995
  0.092857143  5.488526  0.8296690  4.034324
  0.100000000  5.567103  0.8247876  4.099350

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was lambda = 0.

Elastic net

# enet: elastic net
# the elastic net carries both a ridge penalty and a lasso penalty
# lambda is the ridge penalty (lambda = 0 gives a pure lasso model)
# fraction is the lasso penalty: 20 values from 0.05 to 1
enetGrid <- expand.grid(lambda = seq(0, .1, length = 15),
                        fraction = seq(.05, 1, length = 20))
set.seed(100)
enetTune <- train(x = absorp.trans, y = fat,
                  method = "enet",
                  tuneGrid = enetGrid,
                  trControl = ctrl, preProc = c("center", "scale"))
enetTune

Elastic net tuning parameters: ridge penalty lambda = 0 and lasso fraction = 0.0526. With lambda = 0 the model reduces to a pure lasso fit.

> enetTune
Elasticnet

215 samples
100 predictors

Pre-processing: centered (100), scaled (100)
Resampling: Bootstrapped (50 reps)
Summary of sample sizes: 215, 215, 215, 215, 215, 215, ...
Resampling results across tuning parameters:

  lambda       fraction    RMSE       Rsquared   MAE
  0.000000000  0.00000000  12.719745        NaN  10.747191
  0.000000000  0.05263158   1.505096  0.9861744   1.072324
  0.000000000  0.10526316   1.611281  0.9839983   1.133129
  0.000000000  0.15789474   1.713409  0.9818137   1.200435
  0.000000000  0.21052632   1.787748  0.9802273   1.257652
  0.000000000  0.26315789   1.844759  0.9790006   1.303476
  ...
  0.000000000  0.94736842   2.632378  0.9583456   1.828712
  0.000000000  1.00000000   2.706699  0.9560687   1.874579
  (rows for lambda > 0 omitted for readability; every such row has a higher RMSE)
  [ reached getOption("max.print") -- omitted 100 rows ]

RMSE was used to select the optimal model using the smallest value.
The final values used for the model were fraction = 0.05263158 and lambda = 0.
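
To pull out just the resampled performance at the selected parameter combination, caret provides getTrainPerf:

getTrainPerf(enetTune)  # bootstrap RMSE, R^2, and MAE at fraction = 0.0526, lambda = 0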


(d) Which model has the best predictive ability? Is any model significantly better or worse than the others?

Comparing the models:

caret's resamples function collects and visualizes resampling results (the models must have been fit with train).

For each algorithm, the object of comparison is its final model, the one with the smallest RMSE. Because the resampling method is the bootstrap with 50 resamples, each algorithm's final model carries 50 performance estimates.

# model comparison
# resamples() collects and visualizes the resampling results
resamp <- resamples(list(lm = lmFit1, rlm = rlmFit, pls = plsTune,
                         ridge = ridgeTune, enet = enetTune))
summary(resamp)

As shown below, pls and enet have the smallest mean RMSE, so these two models predict best; rlm has the largest mean RMSE and predicts worst.

> summary(resamp)

Call:
summary.resamples(object = resamp)

Models: lm, rlm, pls, ridge, enet
Number of resamples: 50

MAE
           Min.   1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.3090300 1.6211047 1.879747 1.874570 2.024030 2.989562    0
rlm   1.4045612 1.9200873 2.183430 2.215829 2.456707 3.715148    0
pls   0.8375685 1.0323470 1.101890 1.113200 1.180808 1.440820    0
ridge 1.3089927 1.6211901 1.879508 1.874579 2.024011 2.990832    0
enet  0.8195073 0.9752678 1.076620 1.072324 1.156561 1.291045    0

RMSE
          Min.  1st Qu.   Median     Mean  3rd Qu.     Max. NA's
lm    1.705850 2.286390 2.678425 2.706693 3.014897 4.458380    0
rlm   1.886688 2.693024 3.180843 3.220055 3.576437 5.347496    0
pls   1.068864 1.370754 1.503226 1.534394 1.646459 2.127759    0
ridge 1.705678 2.286423 2.678400 2.706699 3.014607 4.460421    0
enet  1.109904 1.346236 1.484470 1.505096 1.661772 2.002391    0

Rsquared
           Min.   1st Qu.    Median      Mean   3rd Qu.      Max. NA's
lm    0.8969856 0.9481050 0.9575495 0.9560697 0.9703579 0.9829345    0
rlm   0.8266755 0.9283558 0.9418637 0.9385840 0.9597848 0.9779785    0
pls   0.9708559 0.9823688 0.9869358 0.9855824 0.9888411 0.9926647    0
ridge 0.8969050 0.9481031 0.9575498 0.9560687 0.9703573 0.9829340    0
enet  0.9744586 0.9833999 0.9869512 0.9861744 0.9887381 0.9930981    0

Plot the confidence interval of each model's mean RMSE.

dotplot( resamp, metric="RMSE" )

The plot makes it easy to see that enet and pls predict best, with nearly identical performance, while rlm predicts worst.
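
A box-and-whisker view of the same resampling distributions is also available:

bwplot(resamp, metric = "RMSE")  # one box per model over the 50 bootstrap resamples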

Use the diff function for pairwise comparisons.

summary(diff(resamp))

Upper diagonal: estimates of the pairwise differences.

Lower diagonal: p-values.

Judging by RMSE, and combining the differences with the p-values: rlm is significantly worse than every other model, and enet and pls are both significantly better than lm, ridge, and rlm. The difference between enet and pls, however, is not significant (p = 0.2861 for RMSE).

> summary(diff(resamp))

Call:
summary.diff.resamples(object = diff(resamp))

p-value adjustment: bonferroni
Upper diagonal: estimates of the difference
Lower diagonal: p-value for H0: difference = 0

MAE
      lm        rlm        pls        ridge      enet
lm              -3.413e-01  7.614e-01 -8.713e-06  8.022e-01
rlm   1.605e-11             1.103e+00  3.413e-01  1.144e+00
pls   < 2.2e-16 < 2.2e-16             -7.614e-01  4.088e-02
ridge 1         1.591e-11  < 2.2e-16              8.023e-01
enet  < 2.2e-16 < 2.2e-16  1.420e-05  < 2.2e-16

RMSE
      lm        rlm        pls        ridge      enet
lm              -5.134e-01  1.172e+00 -6.918e-06  1.202e+00
rlm   2.097e-11             1.686e+00  5.134e-01  1.715e+00
pls   < 2.2e-16 < 2.2e-16             -1.172e+00  2.930e-02
ridge 1.0000    2.082e-11  < 2.2e-16              1.202e+00
enet  < 2.2e-16 < 2.2e-16  0.2861     < 2.2e-16

Rsquared
      lm        rlm        pls        ridge      enet
lm               1.749e-02 -2.951e-02  9.420e-07 -3.010e-02
rlm   5.945e-09            -4.700e-02 -1.748e-02 -4.759e-02
pls   1.224e-14 1.451e-14              2.951e-02 -5.920e-04
ridge 1.0000    5.896e-09  1.238e-14             -3.011e-02
enet  1.679e-15 5.609e-15  0.3368     1.698e-15


(e) Explain which model you would use to predict the fat content of a sample.

Choose the elastic net: it has the lowest resampled RMSE and is significantly better than lm, ridge, and rlm. Its edge over PLS is not statistically significant, but with a lasso fraction of 0.0526 it tends to give a much sparser fit.
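
Had the models been fit on the training split sketched in part (c), the held-out error of the chosen model could be checked as follows (absorpTest and fatTest are the illustrative names from that sketch):

enetPred <- predict(enetTune, newdata = absorpTest)
postResample(pred = enetPred, obs = fatTest)  # test-set RMSE, R^2, MAE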
