While reading papers recently I kept running into post hoc analysis and the Bonferroni correction. Since I didn't know what they were, I studied them and wrote up these notes for future reference.

Planned Contrasts and Post hoc Tests & Multiple Testing Correction

  • Analysis of Variance
  • Planned Contrasts and Post hoc Tests
    • Planned contrasts
      • Concept
      • Calculation steps
      • multiple planned contrasts
    • Post hoc tests
      • Concept
      • multiple post hoc tests
      • Calculation steps
  • Multiple Testing Correction
    • Hypothesis testing
    • Multiple hypothesis testing, FWER, and FDR
    • Why multiple testing correction is needed
    • FWER and FDR correction
      • Family-wise error rate (FWER): Bonferroni correction
      • False discovery rate (FDR): Benjamini–Hochberg procedure
        • False discovery rate
        • Benjamini–Hochberg procedure
  • References

Analysis of Variance

Some notes on Analysis of Variance are also recorded here: Normal Distribution & Chi-squared Distribution & t distribution & F-distribution

Below are a few explanations worth recording:

a one-way ANOVA includes one factor, whereas a two-way ANOVA includes two factors.

the term factor is used to designate a nominal variable, or in the case of an experimental design, the independent variable, that designates the groups being compared. If we have a drug trial in which we are comparing the mean pain scores of patients after receiving placebo, a low dose of the drug, or a high dose of the drug, the factor would be “drug dose.”

the term levels refers to the individual conditions or values that make up a factor. In our drug trial example, we have three levels of drug dose: placebo, low dose, and high dose.

So how is this ANOVA thing different from the t-tests we already learned? Well, in fact, you can think of it as an extension of the t-test to more than 2 groups. If you run an ANOVA on just 2 groups, the results are equivalent to the t-test. The only difference is that you get an F-value instead of a t-value.
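
To see this equivalence concretely, here is a minimal Python sketch on two made-up groups (the scores are assumptions for illustration): scipy's one-way ANOVA and independent-means t-test give matching p values, with F equal to t squared.

```python
# One-way ANOVA on two groups is equivalent to an independent-means t-test.
from scipy import stats

group_a = [5.1, 4.8, 5.5, 5.0, 4.9]   # made-up scores
group_b = [4.2, 4.0, 4.5, 4.1, 4.4]

t_stat, p_t = stats.ttest_ind(group_a, group_b)
f_stat, p_f = stats.f_oneway(group_a, group_b)

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
print(f"F = {f_stat:.3f}")                               # equals t^2 (up to rounding)
print(f"p (t-test) = {p_t:.4f}, p (ANOVA) = {p_f:.4f}")  # identical
```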

(Two illustrative figures from the original post are omitted here.)

Planned Contrasts and Post hoc Tests

Planned contrasts and post-hoc tests are commonly performed following Analysis of Variance.

This is necessary in many instances, because ANOVA compares all individual mean differences simultaneously, in one test (referred to as an omnibus test).

If we run an ANOVA hypothesis test, and the F-test comes out significant, this indicates that at least one among the mean differences is statistically significant.

However, when the factor has more than two levels, it does not indicate which means differ significantly from each other.


In this example, a significant F-test result from a one-way ANOVA with the three drug dose conditions does not tell us where the significant difference lies.
Is it between 0 and 100 mg? Or between 100 and 200 mg? Or is it only the biggest difference that is significant – 0 vs. 200 mg?

Planned contrasts and post hoc tests are additional tests to determine exactly which mean differences are significant, and which are not.

Why is it that we cannot just do 3 independent-means t-tests here? Each time we conduct a t-test we accept a certain risk of a Type I error. If we do 3 such tests, we roughly triple that risk: with α = .05 per test, the chance of at least one Type I error grows to about 1 − .95³ ≈ .14.

So first we test for omnibus significance using the overall ANOVA as detailed in the first part of this chapter.
Then, if a statistically significant difference exists among the means, we do the pairwise comparisons with an adjustment to be more conservative.

These follow-up tests are designed specifically to avoid inflating risk of Type I error.

Now, this is very important. We are only allowed to conduct these tests if the F-test result was significant.

Planned contrasts

Concept

Planned contrasts are used when researchers know in advance which groups they expect to differ.

For example, suppose from our worksheet example, we expect the pop group to differ from the classical group on our measure of working memory. We can then conduct a single comparison between these means without worrying about Type I error.

Because we hypothesized this difference before we saw the data, perhaps based on prior research studies or a strong intuitive hunch, and because there is only one comparison to be analyzed, we need not be concerned about inflated experimentwise alpha.
If multiple comparisons are planned, then we will need to adjust the significance level.

Calculation steps

Let us take a look at how to conduct a single planned contrast. The process is quite simple, as it is just a modified ANOVA analysis.

First we calculate SSB with just those two groups involved in the planned contrast. We figure out the degrees of freedom between using just the two groups.

Then, we calculate the variance between using the new SSB and degrees of freedom, and we calculate an F-test for the comparison using the new variance between and the original overall variance within.

To find out if the F-test result is significant, we can use the new degrees of freedom but the original significance level for the cutoff. (Because there is just one pairwise comparison, we can use the original significance level.)
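
To illustrate these steps, here is a minimal Python sketch with made-up scores for the three drug-dose groups (the group names and numbers are assumptions, not data from the source): it computes the between-groups sum of squares for just the two contrasted groups and tests it against the overall within-groups variance.

```python
# A minimal sketch of a single planned contrast, following the steps above.
# All data are made up for illustration.
import numpy as np
from scipy import stats

groups = {                        # the three groups from the overall ANOVA
    "placebo":   np.array([4.0, 5.0, 6.0, 5.5, 4.5]),
    "low_dose":  np.array([3.0, 3.5, 4.0, 2.5, 3.0]),
    "high_dose": np.array([2.0, 1.5, 2.5, 2.0, 1.0]),
}

# Overall within-groups variance (MS_within) from the full ANOVA.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups.values())
df_within = sum(len(g) - 1 for g in groups.values())
ms_within = ss_within / df_within

# Planned contrast: placebo vs. high_dose only.
a, b = groups["placebo"], groups["high_dose"]
grand_mean_ab = np.concatenate([a, b]).mean()
ss_between = len(a) * (a.mean() - grand_mean_ab) ** 2 + len(b) * (b.mean() - grand_mean_ab) ** 2
df_between = 2 - 1                          # two groups in the contrast
ms_between = ss_between / df_between

f_value = ms_between / ms_within            # contrast F uses the overall MS_within
alpha = 0.05                                # original significance level (one contrast)
f_crit = stats.f.ppf(1 - alpha, df_between, df_within)
print(f"F = {f_value:.2f}, critical F({df_between}, {df_within}) = {f_crit:.2f}")
```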

multiple planned contrasts

If we were to perform multiple planned contrasts, things change a little.

What if we had hypothesized in this experiment that each group would differ from every other group?

The Bonferroni correction involves adjusting the significance level to protect from the inflation of risk of Type I error.

The procedure for each comparison is the same as for a single planned contrast. The difference is that the cutoff score to determine statistical significance will use a more conservative significance level.

When we do multiple pairwise comparisons, the Bonferroni correction is to use the original significance level divided by number of planned contrasts.
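
For example, with three planned contrasts and an original significance level of .05, each contrast is tested at .05/3 ≈ .0167. A minimal sketch, reusing the degrees of freedom from the example above:

```python
# Bonferroni adjustment for multiple planned contrasts: divide the original alpha
# by the number of contrasts before looking up the cutoff F.
from scipy import stats

alpha, n_contrasts = 0.05, 3
alpha_per_contrast = alpha / n_contrasts   # ~0.0167
df_between, df_within = 1, 12              # df values carried over from the sketch above
f_crit = stats.f.ppf(1 - alpha_per_contrast, df_between, df_within)
print(f"per-contrast alpha = {alpha_per_contrast:.4f}, cutoff F = {f_crit:.2f}")
```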

Post hoc tests

Concept

What about post hoc tests?

As the name suggests, these tests come into the picture when we are doing pairwise comparisons (usually all possible combinations) after the fact to find out where the significant differences were.

These are tests that do not require that we had an a priori hypothesis ahead of data collection.

Essentially, these are an allowable and acceptable form of data-snooping.

multiple post hoc tests

This is where we must be cautious about doing so many tests – we could end up with huge risk of Type I error.

If we use the Bonferroni correction that we saw for multiple planned comparisons on more than 3 tests, the significance level would be vanishingly small.
This would make it nearly impossible to detect significant differences.

For this reason, slightly more forgiving tests such as Scheffé's correction and Dunn's or Tukey's post-hoc tests are more popular.

There are many different post-hoc tests out there, and the choice of which one researchers use is often a matter of convention in their area of research.

Calculation steps

Now we shall take a look at how to conduct post hoc tests using Scheffé’s correction.

In this example, we will test all pairwise comparisons.

The Scheffé technique involves adjusting the F-test result, rather than adjusting the significance level.

The way it works is the same as the planned contrast procedure, except for the very end.

Before we compare the F-test result to the cutoff score, we divide the F value by the overall degrees of freedom between, or the number of groups minus one.

Thus, we keep the significance level at the original level, but divide the calculated F by overall degrees of freedom between from the overall ANOVA.
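
Here is a minimal sketch of this Scheffé-style adjustment, continuing the hypothetical `groups`, `ms_within`, and `df_within` from the planned-contrast sketch above; the cutoff is assumed to be the conventional Scheffé critical value F(k−1, N−k) at the original significance level.

```python
# Scheffé-style post hoc tests as described above: compute each pairwise contrast F
# exactly as for a planned contrast, then divide it by the overall df_between
# (number of groups minus one) before comparing it to the cutoff.
from itertools import combinations
import numpy as np
from scipy import stats

k = len(groups)                                  # number of groups overall
df_between_overall = k - 1
alpha = 0.05
f_crit = stats.f.ppf(1 - alpha, df_between_overall, df_within)

for name_a, name_b in combinations(groups, 2):   # all pairwise comparisons
    a, b = groups[name_a], groups[name_b]
    mean_ab = np.concatenate([a, b]).mean()
    ss_b = len(a) * (a.mean() - mean_ab) ** 2 + len(b) * (b.mean() - mean_ab) ** 2
    f_contrast = ss_b / ms_within                # df_between = 1 for a single pair
    f_scheffe = f_contrast / df_between_overall  # the Scheffé adjustment
    verdict = "significant" if f_scheffe > f_crit else "not significant"
    print(f"{name_a} vs {name_b}: adjusted F = {f_scheffe:.2f} ({verdict})")
```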

Multiple Testing Correction

Hypothesis testing

Hypothesis testing has also been covered in these notes: Normal Distribution & Chi-squared Distribution & t distribution & F-distribution

Hypothesis testing is a method for testing statistical hypotheses. Its basic idea is the rare-event principle: an event with very small probability is essentially not expected to occur in a single trial.

The basic approach is to state a null hypothesis, denoted $H_0$, and then an alternative hypothesis of interest, denoted $H_1$ or $H_A$.

The guiding principle is that the null hypothesis is the uninteresting conclusion that matters little to the research, while the alternative hypothesis is the conclusion we are interested in and want to establish.

For example, given some physiological measurements of a person, we want to determine whether the person has a certain disease (say, pneumonia).

$H_0$: the person is healthy (not of interest); $H_1$: the person has the disease (of interest).
We compare these measurements with sample data from people already known to be healthy or ill and compute a $p$ value. With a significance level of $\alpha = 0.05$, a $p$ value below 0.05 means the observed result would be a rare event under $H_0$. Following the rare-event principle, rather than believe that such a rare event actually occurred, the more reasonable choice is to reject the null hypothesis and conclude that the person is ill; otherwise we fail to reject (i.e., retain) the null hypothesis, meaning there is not enough evidence to conclude that the person is ill.

Notes:

  1. Statistical significance: the level of risk we take on by rejecting the null hypothesis when it is actually true, also called the probability level.
  2. $p$ value: assuming the null hypothesis is true, the probability of obtaining the observed sample result or one more extreme; it is the key quantity used to judge statistical significance.
  3. Significance level $\alpha$: the probability of wrongly rejecting the null hypothesis when it is true; it can also be understood as the risk we accept when making the decision in a hypothesis test.
  4. Instead of computing a $p$ value, we can compute a test statistic and check whether it falls into the rejection region determined by the significance level. Because the test statistic is less intuitive than the $p$ value, results are usually reported in terms of $p$ values.
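
To make the decision rule concrete, here is a minimal sketch with made-up measurements for a healthy reference sample and a patient sample; the data and the choice of a two-sample t-test are purely illustrative.

```python
# A minimal sketch of the p-value decision rule on made-up data.
from scipy import stats

healthy = [36.5, 36.7, 36.6, 36.8, 36.5, 36.6]        # hypothetical reference values
patients = [37.9, 38.2, 38.0, 38.4, 37.8, 38.1]       # hypothetical patient values

t_stat, p_value = stats.ttest_ind(healthy, patients)  # two-sample t-test
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0 (evidence of a difference)")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0 (not enough evidence)")
```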

A single test has four possible outcomes, which can be laid out in the following table:

|                      | $H_0$ true                        | $H_0$ false                      |
|----------------------|-----------------------------------|----------------------------------|
| Fail to reject $H_0$ | Correct decision (TN), $1-\alpha$ | Type II error (FN), $\beta$      |
| Reject $H_0$         | Type I error (FP), $\alpha$       | Correct decision (TP), $1-\beta$ |

  • Type I error, also called an $\alpha$ error
  • Type II error, also called a $\beta$ error
  • FP: false positive, a Type I error
  • FN: false negative, a Type II error
  • TP: true positive
  • TN: true negative

A Type I error occurs when the null hypothesis is true but we reject it; the probability of making this error is denoted $\alpha$, so it is also called an $\alpha$ error (rejecting a true null).

A Type II error occurs when the null hypothesis is false but we fail to reject it; the probability of making this error is denoted $\beta$, so it is also called a $\beta$ error (retaining a false null).

Accordingly, the probability of not rejecting a true null hypothesis is $1-\alpha$, and the probability of rejecting a false null hypothesis is $1-\beta$.

Multiple hypothesis testing, FWER, and FDR

As the name suggests, multiple hypothesis testing simply means carrying out several hypothesis tests. If $m$ people need to be screened for the disease, then $m$ hypothesis tests are required. The outcomes of the $m$ tests can be summarized in the following table:

|                                     | $H_0$ true | $H_0$ false | Total |
|-------------------------------------|------------|-------------|-------|
| Declared significant (reject $H_0$) | $V$        | $S$         | $R$   |
| Not significant (fail to reject)    | $U$        | $T$         | $m-R$ |
| Total                               | $m_0$      | $m-m_0$     | $m$   |

  • $m$: the number of hypothesis tests
  • $m_0$: the number of tests whose null hypothesis is true
  • $m-m_0$: the number of tests whose alternative hypothesis is true
  • $V$: the number of false positives
  • $S$: the number of true positives
  • $U$: the number of true negatives
  • $T$: the number of false negatives
  • $R=V+S$: the number of rejected null hypotheses

If a particular test yields a $p$ value smaller than the significance level $\alpha$, we reject the null hypothesis and declare that we have found an ill person (whether or not that person is actually ill); each such case is called a discovery.

Thus $R=V+S$ is the number of discoveries, $V$ is the number of false discoveries, and $S$ is the number of true discoveries.

Let $Q$ denote the proportion of discoveries that are false, i.e. $Q=V/R=V/(V+S)$.

The FWER is defined as the probability that $V$ is at least 1: $\mathrm{FWER}=\Pr\{V\geq 1\}=1-\Pr\{V=0\}$.

The FDR is defined as the expectation of $Q$: $\mathrm{FDR}=E[Q]$.

Because $V, S, U, T$ are all random variables across the $m$ tests, the FDR must be expressed as an expectation. In addition, when $R=0$ we define $Q=0$; to cover this case, $\mathrm{FDR}=E[V/R \mid R>0]\cdot P\{R>0\}$. Informally, one can think of the FDR as $Q=V/R=V/(V+S)$.
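
To make these definitions concrete, here is a small simulation (every setting, including $m$, $m_0$, and the effect size, is made up for illustration) that estimates the FWER and the FDR of $m$ uncorrected tests:

```python
# Estimate FWER and FDR for m uncorrected z-tests, m0 of which have a true null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, m0, alpha, n_sims = 100, 90, 0.05, 2000
effect = 3.0                                   # shift for the m - m0 real effects

fwer_hits, Q_samples = 0, []
for _ in range(n_sims):
    z = rng.normal(size=m)                     # test statistics under H0
    z[m0:] += effect                           # last m - m0 tests have a real effect
    p = 2 * stats.norm.sf(np.abs(z))           # two-sided p-values
    reject = p < alpha
    V = reject[:m0].sum()                      # false discoveries
    R = reject.sum()                           # all discoveries
    fwer_hits += int(V >= 1)
    Q_samples.append(V / R if R > 0 else 0.0)  # Q = V/R, defined as 0 when R = 0

print(f"estimated FWER = {fwer_hits / n_sims:.2f}")  # near 1 with 90 true nulls
print(f"estimated FDR  = {np.mean(Q_samples):.2f}")
```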

FWER and FDR can each refer either to a concept or to a family of correction methods.

As concepts, the FWER is the probability of making at least one Type I error across the multiple tests, and the FDR is the expected proportion of discoveries that are false.

Correspondingly, there are also FWER correction methods and FDR correction methods (also called control procedures).

Both families of methods aim to control Type I errors in multiple hypothesis testing, keeping the chosen error rate below the significance level $\alpha$.

FWER correction has many implementations, the most classic being the Bonferroni correction; FDR correction also has many implementations, the most classic being the Benjamini–Hochberg procedure.

Why multiple testing correction is needed

In a single hypothesis test we draw a conclusion from the significance level $\alpha$ and the $p$ value. The significance level is usually set to 0.05 or 0.01, which guarantees that the probability of a Type I error (the risk of a wrong decision) in that single test stays below $\alpha$.

With $m$ tests, however, the picture changes. Suppose $m=100$, $\alpha=0.01$, and the tests are mutually independent. The probability of making no Type I error at all is $(1-0.01)^{100}\approx 36.6\%$, so the probability of making at least one error is as high as $P=1-(1-0.01)^{100}=1-0.366=63.4\%$.
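
A quick numeric check of this arithmetic:

```python
# Probability of at least one Type I error across m independent tests.
alpha, m = 0.01, 100
p_no_error = (1 - alpha) ** m
print(f"P(no Type I error)    = {p_no_error:.3f}")      # ~0.366
print(f"P(at least one error) = {1 - p_no_error:.3f}")  # ~0.634
```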

Here is a practical example. Suppose there is an assay for diagnosing AIDS whose validated accuracy is 99% (one false positive per 100 tests). For a single person being tested (a single test), this accuracy is sufficient. For a hospital (multiple tests), it is far from sufficient: out of every 10,000 people screened, about 100 people without the disease would be misdiagnosed as having it, which is clearly unacceptable. So in multiple testing, if no control is applied, the probability of making a Type I error rises rapidly with the number of tests.

To address the problems introduced by multiple testing, we need to apply a correction.

Both FWER and FDR corrections keep the overall Type I error of the multiple-testing procedure below the pre-specified significance level $\alpha$.

FWER control is comparatively conservative: it works mainly by reducing the number of false positives (Type I errors), but it also reduces the true discovery rate (TDR).

FDR control is a newer and more practical approach: it works with an adjusted $p$ value (the $q$ value) for each test and achieves a better trade-off, detecting as many positives as possible while keeping the false discovery rate within an acceptable range.

FWER and FDR correction

Both FWER and FDR correction have many implementations; here only the classic procedure of each family is introduced.

Setting: among $m$ hypothesis tests, denote the null hypotheses by $H_1, H_2, \ldots, H_m$ and the corresponding $p$ values by $p_1, p_2, \ldots, p_m$, with significance level $\alpha$.

Family-wise error rate (FWER): Bonferroni correction

The Bonferroni correction is a correction method used in statistics for multiple comparisons, named after the Italian mathematician Carlo Emilio Bonferroni.

Let $H_1,\ldots,H_m$ be a family of hypotheses and $p_1,\ldots,p_m$ the corresponding $p$ values, where $m$ is the total number of null hypotheses and $m_0$ the number of them that are actually true. The family-wise error rate (FWER) is the probability of rejecting at least one true null hypothesis, i.e. of making at least one Type I error. The Bonferroni correction rejects every null hypothesis with $p_i \leq \frac{\alpha}{m}$. With this rule, $\mathrm{FWER} \leq \alpha$, which follows from Boole's inequality:

$$\mathrm{FWER} = \Pr\Big\{\bigcup_{i=1}^{m_0}\Big(p_i \leq \frac{\alpha}{m}\Big)\Big\} \leq \sum_{i=1}^{m_0}\Pr\Big\{p_i \leq \frac{\alpha}{m}\Big\} \leq m_0\,\frac{\alpha}{m} \leq \alpha.$$

In this way, the overall probability of making a Type I error across the multiple tests is kept below the pre-specified significance level $\alpha$.

Moreover, this FWER correction does not require the tests to be independent of each other, nor does it place any requirement on how many null hypotheses are true.

The Bonferroni correction is a relatively conservative FWER control method and increases the probability of Type II errors.
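
As a concrete illustration, here is a minimal sketch of the Bonferroni rule on a handful of made-up $p$ values (reject $H_i$ whenever $p_i \leq \alpha/m$):

```python
# Bonferroni correction on made-up p-values: compare each p_i to alpha / m.
p_values = [0.001, 0.008, 0.020, 0.041, 0.300]
alpha = 0.05
m = len(p_values)
threshold = alpha / m

for i, p in enumerate(p_values, start=1):
    decision = "reject" if p <= threshold else "fail to reject"
    print(f"H{i}: p = {p:.3f} vs alpha/m = {threshold:.3f} -> {decision}")

# Equivalently, the Bonferroni-adjusted p-value is min(p * m, 1), compared to alpha.
print([min(p * m, 1.0) for p in p_values])
```

The same adjustment is also available in libraries, for example statsmodels' multipletests(p_values, alpha=0.05, method="bonferroni").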

False discovery rate (FDR): Benjamini–Hochberg procedure

False discovery rate

The false discovery rate (FDR) refines the way multiple hypothesis tests are evaluated.

$\mathrm{FDR}=Q_e=E[Q]$, where $E$ denotes expectation and $Q=V/R=V/(V+S)$; here $V$ is the number of incorrectly rejected null hypotheses and $R$ the total number of rejected null hypotheses. When $R=0$, the FDR is taken to be 0; written in one line, $\mathrm{FDR}=E[V/R \mid R>0]\cdot P(R>0)$.

The false discovery rate is used to correct for the errors introduced by multiple comparisons. When many null hypotheses are rejected, an FDR-controlling procedure limits the chance of wrongly rejecting null hypotheses (false positives) while still identifying a useful set of results.

Compared with FWER control (such as the Bonferroni correction, which in effect allows "not even one false positive"), FDR-controlling procedures adopt a more lenient criterion. As a result, FDR correction tolerates a somewhat higher rate of Type I errors (rejecting a null hypothesis that should be retained) in exchange for better statistical power.

Benjamini–Hochberg procedure

The Benjamini–Hochberg procedure (BH for short) works as follows. First sort all the $p$ values in ascending order and denote them $p_{(1)}, p_{(2)}, \ldots, p_{(m)}$, with corresponding null hypotheses $H_{(1)}, H_{(2)}, \ldots, H_{(m)}$.

To control the FDR at level $\alpha$, find the largest positive integer $k$ such that $p_{(k)} \leq \frac{k\,\alpha}{m}$ (Formula 1). Then reject all null hypotheses $H_{(1)}, H_{(2)}, \ldots, H_{(k)}$, i.e. those with $1 \leq i \leq k$.

In effect, BH assigns a different significance threshold to each of the sorted hypotheses.
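
Here is a minimal sketch of the BH rejection rule on made-up $p$ values; note that a hypothesis whose own $p_{(i)}$ exceeds $i\alpha/m$ can still be rejected, as long as some larger $k$ satisfies Formula 1:

```python
# Benjamini-Hochberg procedure on made-up p-values: find the largest k with
# p_(k) <= k * alpha / m and reject the k smallest p-values.
import numpy as np

p_values = np.array([0.001, 0.021, 0.022, 0.040, 0.300])
alpha = 0.05
m = len(p_values)

order = np.argsort(p_values)                   # indices sorting p ascending
p_sorted = p_values[order]
thresholds = np.arange(1, m + 1) * alpha / m   # k * alpha / m, for k = 1..m

passing = np.nonzero(p_sorted <= thresholds)[0]
if passing.size > 0:
    k = passing[-1] + 1                        # largest k satisfying Formula 1
    rejected = order[:k]                       # reject H_(1), ..., H_(k)
else:
    k, rejected = 0, np.array([], dtype=int)

# Here k = 4: the hypothesis with p = 0.021 fails its own threshold (0.02) but is
# still rejected because p_(4) = 0.040 <= 4 * alpha / m.
print(f"k = {k}, rejected hypotheses (original indices): {rejected}")
```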

Recall that the FDR is the expectation of $Q$, i.e. $\mathrm{FDR}=E[Q]$; it can be shown that, for independent tests, the BH procedure guarantees $\mathrm{FDR} \leq \frac{m_0}{m}\alpha \leq \alpha$.

This gives a statistical guarantee that the FDR does not exceed $\alpha$, i.e. that the expected proportion of false discoveries across the multiple tests stays below the pre-specified significance level $\alpha$.

Note that the validity of BH rests on the condition that the $m$ tests are mutually independent.

Finally, a word on the $q$ value: $q = \frac{p_{(k)}\, m}{k}$, which is simply a rearrangement of Formula 1; the $q$ value is commonly called the adjusted $p$ value.
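
Continuing the sketch above, the same transform gives the BH-adjusted $p$ values; standard implementations additionally take a running minimum from the largest $p$ downward (and cap at 1) so that the adjusted values are monotone, a detail beyond the simple formula here.

```python
# BH-adjusted p-values (q-values), reusing p_sorted, order, and m from the sketch
# above: q_(k) = p_(k) * m / k, then a running minimum from the largest p downward
# (capped at 1) to keep the q-values monotone.
q_sorted = p_sorted * m / np.arange(1, m + 1)
q_sorted = np.minimum.accumulate(q_sorted[::-1])[::-1]
q_sorted = np.minimum(q_sorted, 1.0)

q_values = np.empty(m)
q_values[order] = q_sorted                 # map back to the original order
print(np.round(q_values, 4))
```

For comparison (assuming statsmodels is installed), multipletests(p_values, alpha=0.05, method="fdr_bh") should produce the same rejections and adjusted values.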

References

如何通俗地理解Family-wise error rate(FWER)和False discovery rate(FDR)

多重假设检验与Bonferroni校正、FDR校正

邦费罗尼校正

假发现率

Family-wise error rate

False discovery rate

Beginner Statistics for Psychology
