R语言混合线性模型包代码演示

之前翻译了一篇博文R中的线性混合模型介绍（翻译博客）, 但是里面的示例代码显示不友好, 今天重新整理.

数据来源: MASS软件包的oats数据

oats数据, 这是一个燕麦的裂区试验数据, 主区是品种, 裂区是施肥N, 重复是B区组, 观测值是产量Y

The yield of oats from a split-plot field trial using three varieties and four levels of manurial treatment. The experiment was laid out in 6 blocks of 3 main plots, each split into 4 sub-plots. The varieties were applied to the main plots and the manurial treatments to the sub-plots.

library(MASS)
data(oats)
head(oats)

数据预览:

  B           V      N   Y
1 I     Victory 0.0cwt 111
2 I     Victory 0.2cwt 130
3 I     Victory 0.4cwt 157
4 I     Victory 0.6cwt 174
5 I Golden.rain 0.0cwt 117
6 I Golden.rain 0.2cwt 114

因为这个数据中, V即表示品种, 又表示主区, 这里我们新建两个变量, 分别代表主区和裂区, wholeplot和subplot

oats$mainplot = oats$V
oats$subplot = oats$N
head(oats)

转化后的数据为:

> head(oats)B           V      N   Y    mainplot subplot
1 I     Victory 0.0cwt 111     Victory  0.0cwt
2 I     Victory 0.2cwt 130     Victory  0.2cwt
3 I     Victory 0.4cwt 157     Victory  0.4cwt
4 I     Victory 0.6cwt 174     Victory  0.6cwt
5 I Golden.rain 0.0cwt 117 Golden.rain  0.0cwt
6 I Golden.rain 0.2cwt 114 Golden.rain  0.2cwt

模型

数据是裂区试验设计, 所以如果放在混合线性模型中:

固定因子: N + V + N:V
随机因子: block/mainplot

nlme软件包

特点:

固定因子放在~ 号后面, 在~后面定义固定因子
随机因子, random = 1|, 在|号后面定义随机因子

这是一个比较成熟的R包，是R语言安装时默认的包，它除了可以分析分层的线性混合模型，也可以处理非线性模型。在优势方面，个人认为它可以处理相对复杂的线性和非线性模型，可以定义方差协方差结构，可以在广义线性模型中定义几种分布函数和连接函数。

它的短板：

1、随机效应的定义过于呆板

2、数据量大时速度很慢

3、不能处理系谱数据

4、不能处理多变量数据。

代码:

library(nlme)
m1.nlme = lme(Y ~ V*N,random = ~ 1|B/mainplot,data = oats)
anova(m1.nlme)

结果:

> anova(m1.nlme)numDF denDF   F-value p-value
(Intercept)     1    45 245.14333  <.0001
V               2    10   1.48534  0.2724
N               3    45  37.68561  <.0001
V:N             6    45   0.30282  0.9322

lme4

特点:

固定因子和随机因子写在了一起
固定因子在~ 号后面
随机因子在括号里面(1|)的|号后面

lme4包是由Douglas Bates开发，他也是nlme包的作者之一，相对于nlme包而言，它的运行速度快一点，对于随机效应的结构也可以更复杂一点，但是它的缺点也和nlme一样：

1、不能处理协方差和相关系数结构

2、它可以与构建系数的包连接，比如mmpedigree包，但是结合比较脆弱。

3、不能处理多性状数据

library(lme4)
m1.lme4 = lmer(Y ~ V*N + (1|B/mainplot), data = oats)
anova(m1.lme4)

结果没有给出P值, 可以自己计算.

> anova(m1.lme4)
Analysis of Variance TableDf  Sum Sq Mean Sq F value
V    2   526.1   263.0  1.4853
N    3 20020.5  6673.5 37.6856
V:N  6   321.8    53.6  0.3028

asreml-R

这是商业软件, 不过非常友好.

特点:

固定因子在~ 号后面
随机因子在random=~ 后面定义

library(asreml)
m1.asreml = asreml(Y ~ V*N, random = ~ B/mainplot, data=oats)
wald(m1.asreml,denDF = "default")

结果:

$`Wald`Df denDF    F.inc           Pr
(Intercept)  1     5 245.1000 1.931825e-05
V            2    10   1.4850 2.723869e-01
N            3    45  37.6900 2.457710e-12
V:N          6    45   0.3028 9.321988e-01

MCMCglmm

MCMCglmm是基于贝叶斯的包, 也可以做混合线性模型分析:
我们看一下它的用法:

MCMCglmm(fixed, random=NULL, rcov=~units, family=“gaussian”, mev=NULL,
data,start=NULL, prior=NULL, tune=NULL, pedigree=NULL, nodes=“ALL”,
scale=TRUE, nitt=13000, thin=10, burnin=3000, pr=FALSE,
pl=FALSE, verbose=TRUE, DIC=TRUE, singular.ok=FALSE, saveX=TRUE,
saveZ=TRUE, saveXL=TRUE, slice=FALSE, ginverse=NULL, trunc=FALSE)

语法和asreml-R非常像, 这里我们将代码写为:

library(MCMCglmm)
m1.MCMC = MCMCglmm(Y ~ V*N, random = ~ B + B:mainplot, data=oats)
summary(m1.MCMC)

结果是对每个水平进行了显著性检验:

 Location effects: Y ~ V * N post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)           80.4332  59.5412 100.8704     1151  0.002 **
VMarvellous            6.0009 -16.1558  28.3933     1000  0.586
VVictory              -8.9212 -29.6295  13.2659     1000  0.426
N0.2cwt               18.0450   1.8383  34.5909     1000  0.030 *
N0.4cwt               34.3391  18.9617  49.2152     1000 <0.001 ***
N0.6cwt               44.4065  29.3491  61.0137     1000 <0.001 ***
VMarvellous:N0.2cwt    4.4074 -18.0155  26.1201     1000  0.676
VVictory:N0.2cwt       0.5882 -20.4920  24.0181     1123  0.982
VMarvellous:N0.4cwt   -3.6338 -25.6545  17.3987     1827  0.732
VVictory:N0.4cwt       5.1730 -17.3163  26.5974     1121  0.632
VMarvellous:N0.6cwt   -3.9847 -26.7962  17.7244     1193  0.708
VVictory:N0.6cwt       2.8611 -19.6938  25.8976     1000  0.788
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

使用aov分析裂区试验

m1.aov = aov(Y~V*N + Error(B/mainplot), data=oats)
summary(m1.aov)

结果:

> summary(m1.aov)Error: BDf Sum Sq Mean Sq F value Pr(>F)
Residuals  5  15875    3175               Error: B:mainplotDf Sum Sq Mean Sq F value Pr(>F)
V          2   1786   893.2   1.485  0.272
Residuals 10   6013   601.3               Error: WithinDf Sum Sq Mean Sq F value   Pr(>F)
N          3  20020    6673  37.686 2.46e-12 ***
V:N        6    322      54   0.303    0.932
Residuals 45   7969     177
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

PS, 那个MCMCglmm应该有一种方差形式的输出方法, 回头补充完整, FLAG!!!

代码汇总:

# install.packages("MASS")
library(MASS)
data(oats)
head(oats)oats$mainplot = oats$V
oats$subplot = oats$N
head(oats)# nlmelibrary(nlme)
m1.nlme = lme(Y ~ V*N,random = ~ 1|B/mainplot,data = oats)
anova(m1.nlme)# lme4library(lme4)
m1.lme4 = lmer(Y ~ V*N + (1|B/mainplot), data = oats)
anova(m1.lme4)# asreml
library(asreml)
m1.asreml = asreml(Y ~ V*N, random = ~ B/mainplot, data=oats)
wald(m1.asreml,denDF = "default")# MCMCglmm
library(MCMCglmm)
m1.MCMC = MCMCglmm(Y ~ V*N, random = ~ B + B:mainplot, data=oats)
summary(m1.MCMC)# aov
m1.aov = aov(Y~V*N + Error(B/mainplot), data=oats)
summary(m1.aov)