Statistical Inference (1): Hypothesis Test
Personal blog: Glooow, welcome to visit!
Contents
- 1. Binary Bayesian hypothesis testing
- 1.0 Problem Setting
- 1.1 Binary Bayesian hypothesis testing
- Special cases
- 1.2 Likelihood Ratio Test
- 1.3 ROC
- 2. Non-Bayesian hypo test
- Neyman-Pearson criterion
- 3. Randomized test
- 3.1 Decision rule
- 3.2 Proposition
- 3.3 Efficient frontier
- 4. Minimax hypo testing
- 4.1 Decision rule
1. Binary Bayesian hypothesis testing
1.0 Problem Setting
- Hypothesis
  - Hypothesis space $\mathcal{H}=\{H_0, H_1\}$
  - Bayesian approach: model the valid hypothesis as a random variable $\mathsf{H}$
  - Prior $P_0 = p_\mathsf{H}(H_0),\ P_1 = p_\mathsf{H}(H_1) = 1 - P_0$
- Observation
  - Observation space $\mathcal{Y}$
  - Observation model $p_\mathsf{y|H}(\cdot|H_0),\ p_\mathsf{y|H}(\cdot|H_1)$
- Decision rule $f:\mathcal{Y}\to\mathcal{H}$
- Cost function $C:\mathcal{H}\times\mathcal{H}\to\mathbb{R}$
  - Let $C_{ij}=C(H_j,H_i)$, where the correct hypothesis is $H_j$
  - $C$ is valid if $C_{jj}<C_{ij}$
- Optimum decision rule $\hat{H}(\cdot) = \arg\min\limits_{f(\cdot)}\mathbb{E}[C(\mathsf{H},f(\mathsf{y}))]$
1.1 Binary Bayesian hypothesis testing
Theorem: The optimal Bayes’ decision takes the form
$$L(y) \triangleq \frac{p_\mathsf{y|H}(y|H_1)}{p_\mathsf{y|H}(y|H_0)} \overset{H_1}{\underset{H_0}{\gtrless}} \frac{P_0}{P_1}\cdot\frac{C_{10}-C_{00}}{C_{01}-C_{11}} \triangleq \eta$$
Proof:
$$\varphi(f) = \mathbb{E}[C(\mathsf{H}, f(\mathsf{y}))] = \mathbb{E}_{\mathsf{y}}\big[\,\mathbb{E}[C(\mathsf{H}, f(y^*)) \mid \mathsf{y}=y^*]\,\big]$$

so it suffices to minimize the inner conditional expectation separately for each $y^*$.
Given $y^*$:
- if $f(y^*)=H_0$: $\mathbb{E}[C \mid y^*] = C_{00}\,p_\mathsf{H|y}(H_0|y^*) + C_{01}\,p_\mathsf{H|y}(H_1|y^*)$
- if $f(y^*)=H_1$: $\mathbb{E}[C \mid y^*] = C_{10}\,p_\mathsf{H|y}(H_0|y^*) + C_{11}\,p_\mathsf{H|y}(H_1|y^*)$
So
$$\frac{p_\mathsf{H|y}(H_1|y^*)}{p_\mathsf{H|y}(H_0|y^*)} \overset{H_1}{\underset{H_0}{\gtrless}} \frac{C_{10}-C_{00}}{C_{01}-C_{11}}$$
Remark: in the proof, note that a Bayes test is deterministic, so for any given $y$ the probability that $f(y)=H_1$ is either 0 or 1. Hence when taking the expectation of the cost, we treat $\mathsf{H}$ as a random variable but $f(y)$ as a fixed value, and analyze the two cases separately.
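As a quick numerical check of the theorem, the sketch below (my own toy example, not from the original notes) compares the Bayes LRT with threshold $\eta$ against a brute-force rule that picks the hypothesis with the smaller expected posterior cost, for the assumed pair $H_0: y \sim N(0,1)$ vs $H_1: y \sim N(2,1)$ and made-up priors/costs:

```python
import math

# Assumed toy setting: H0: y ~ N(0,1), H1: y ~ N(2,1), with made-up priors/costs.
P0, P1 = 0.7, 0.3
C00, C11 = 0.0, 0.0
C01, C10 = 1.0, 2.0   # C01: cost of a miss, C10: cost of a false alarm
MU = 2.0

def likelihood(y, h):
    mean = 0.0 if h == 0 else MU
    return math.exp(-0.5 * (y - mean) ** 2) / math.sqrt(2 * math.pi)

def lrt_decision(y):
    # L(y) = p(y|H1)/p(y|H0), compared against eta = (P0/P1)*(C10-C00)/(C01-C11)
    eta = (P0 / P1) * (C10 - C00) / (C01 - C11)
    return 1 if likelihood(y, 1) / likelihood(y, 0) >= eta else 0

def posterior_cost_decision(y):
    # brute force: pick the hypothesis with the smaller expected posterior cost
    post0 = P0 * likelihood(y, 0)   # unnormalized posterior of H0
    post1 = P1 * likelihood(y, 1)   # unnormalized posterior of H1
    cost_if_h0 = C00 * post0 + C01 * post1
    cost_if_h1 = C10 * post0 + C11 * post1
    return 1 if cost_if_h1 <= cost_if_h0 else 0

# the two rules agree everywhere on a grid of observations
ys = [i / 10 for i in range(-40, 61)]
assert all(lrt_decision(y) == posterior_cost_decision(y) for y in ys)
```

The unnormalized posteriors suffice because the normalizer $p_\mathsf{y}(y)$ is common to both costs, which is exactly why the comparison collapses to a likelihood-ratio threshold.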
Special cases
- Maximum a posteriori (MAP)
  - $C_{00}=C_{11}=0,\ C_{01}=C_{10}=1$
  - $\hat{H}(y)=\arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{H|y}(H|y)$
- Maximum likelihood (ML)
  - $C_{00}=C_{11}=0,\ C_{01}=C_{10}=1,\ P_0=P_1=0.5$
  - $\hat{H}(y)=\arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{y|H}(y|H)$
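These two reductions are easy to verify numerically. The sketch below (my own Gaussian example, not from the notes) checks that with 0–1 costs the Bayes LRT with $\eta = P_0/P_1$ is exactly the MAP rule, and that with equal priors it collapses to ML (threshold $\eta = 1$):

```python
import math

# Assumed toy setting: H0: y ~ N(0,1), H1: y ~ N(2,1).
MU = 2.0

def lik(y, h):
    m = 0.0 if h == 0 else MU
    return math.exp(-0.5 * (y - m) ** 2) / math.sqrt(2 * math.pi)

def map_rule(y, P0):
    # argmax over H of the posterior p(H|y), proportional to p(H) p(y|H)
    return 1 if (1 - P0) * lik(y, 1) >= P0 * lik(y, 0) else 0

def lrt(y, eta):
    # decide H1 iff the likelihood ratio exceeds eta
    return 1 if lik(y, 1) / lik(y, 0) >= eta else 0

P0 = 0.7
ys = [i / 7 for i in range(-30, 45)]
# 0-1 costs: eta = P0/P1, so the Bayes LRT is exactly the MAP rule
assert all(lrt(y, P0 / (1 - P0)) == map_rule(y, P0) for y in ys)
# equal priors on top of 0-1 costs: MAP degenerates to ML (eta = 1)
assert all(lrt(y, 1.0) == map_rule(y, 0.5) for y in ys)
```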
1.2 Likelihood Ratio Test
Generally, the LRT takes the form

$$L(y) \triangleq \frac{p_\mathsf{y|H}(y|H_1)}{p_\mathsf{y|H}(y|H_0)} \overset{H_1}{\underset{H_0}{\gtrless}} \eta$$
- Bayesian formulation gives a method of calculating $\eta$
- $L(y)$ is a sufficient statistic for the decision problem
- any invertible function of $L(y)$ is also a sufficient statistic
1.3 ROC
- Detection probability $P_D = P(\hat{H}=H_1 \mid \mathsf{H}=H_1)$
- False-alarm probability $P_F = P(\hat{H}=H_1 \mid \mathsf{H}=H_0)$
Properties (important!)
- the ROC curve of the LRT is monotonically nondecreasing
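This property can be illustrated numerically. In the sketch below (my own example, not from the notes) I assume a Gaussian mean shift, $H_0: N(0,1)$ vs $H_1: N(1.5,1)$, for which the LRT reduces to thresholding $y$; sweeping the threshold traces an ROC that is nondecreasing and sits above the diagonal $P_D = P_F$:

```python
import math

def Phi(x):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

MU = 1.5  # assumed mean shift: H0: N(0,1), H1: N(MU,1)

def roc_point(gamma):
    # LRT "L(y) >= eta" reduces to "y >= gamma" for a Gaussian mean shift
    pf = 1 - Phi(gamma)        # false-alarm probability
    pd = 1 - Phi(gamma - MU)   # detection probability
    return pf, pd

points = sorted(roc_point(g / 10) for g in range(-50, 51))  # sorted by P_F
# the ROC of the LRT is monotonically nondecreasing, with P_D >= P_F
assert all(points[i][1] <= points[i + 1][1] for i in range(len(points) - 1))
assert all(pd >= pf - 1e-12 for pf, pd in points)
```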
2. Non-Bayesian hypo test
- A non-Bayesian test requires neither prior probabilities nor a cost function
Neyman-Pearson criterion
$$\max_{\hat{H}(\cdot)} P_D \quad \text{s.t.} \quad P_F \le \alpha$$
Theorem (Neyman-Pearson Lemma): the optimal solution under the NP criterion is given by an LRT, with $\eta$ determined by

$$P_F = P(L(y) \ge \eta \mid \mathsf{H}=H_0) = \alpha$$
Proof:
Intuition: among all tests with the same $P_F$, the LRT achieves the largest $P_D$. Geometrically, the region that the LRT assigns to $H_1$ is exactly where $\frac{p(y|H_1)}{p(y|H_0)}$ is as large as possible, so for a given $P_F$ the detection probability $P_D$ is maximized.
Remark: the NP-optimal solution is an LRT because
- for the same $P_F$, the LRT achieves the largest $P_D$
- as the LRT threshold $\eta$ varies, a larger $P_F$ gives a larger $P_D$, i.e., the ROC curve is monotonically nondecreasing
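The recipe above is concrete for a Gaussian mean shift. In the sketch below (my own example and parameters, not from the notes) I assume $H_0: N(0,1)$ vs $H_1: N(2,1)$, so $L(y) \ge \eta$ is equivalent to $y \ge \gamma$, and the NP threshold is fixed directly from $P(y \ge \gamma \mid H_0) = \alpha$:

```python
import math

def Phi(x):
    # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    # invert the normal CDF by bisection (good enough for this sketch)
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

MU, ALPHA = 2.0, 0.05  # assumed setting: H0: N(0,1), H1: N(MU,1)

# L(y) >= eta  <=>  y >= gamma, so fix gamma from the false-alarm constraint
gamma = Phi_inv(1 - ALPHA)
pf = 1 - Phi(gamma)
pd = 1 - Phi(gamma - MU)   # detection probability of the NP-optimal test
assert abs(pf - ALPHA) < 1e-9
assert pd > ALPHA          # the NP test beats chance: P_D > P_F
```

Note that the constraint is imposed under $H_0$ only; $P_D$ then comes out of the same threshold evaluated under $H_1$.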
3. Randomized test
3.1 Decision rule
Take two deterministic decision rules $\hat{H}'(\cdot)$ and $\hat{H}''(\cdot)$, and form a randomized decision rule $\hat{H}(\cdot)$ by time-sharing:

$$\hat{H}(\cdot)=\begin{cases}\hat{H}'(\cdot), & \text{with probability } p\\ \hat{H}''(\cdot), & \text{with probability } 1-p\end{cases}$$

- Detection probability $P_D = pP_D' + (1-p)P_D''$
- False-alarm probability $P_F = pP_F' + (1-p)P_F''$

A randomized decision rule is fully described by $p_{\hat{\mathsf{H}}|\mathsf{y}}(H_m|y)$ for $m=0,1$.
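The convex-combination formulas for $(P_F, P_D)$ can be checked by simulation. Below is a sketch under my own assumed Gaussian pair $H_0: N(0,1)$ vs $H_1: N(2,1)$, time-sharing between two threshold tests:

```python
import math
import random

def Phi(x):
    # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

MU = 2.0  # assumed: H0: N(0,1), H1: N(MU,1)

def op_point(gamma):
    # (P_F, P_D) of the deterministic rule "decide H1 iff y >= gamma"
    return 1 - Phi(gamma), 1 - Phi(gamma - MU)

pf1, pd1 = op_point(0.5)
pf2, pd2 = op_point(1.5)
p = 0.3  # time-sharing probability

# Simulate the randomized rule: use threshold 0.5 w.p. p, else 1.5
random.seed(0)
n, fa, det = 200_000, 0, 0
for _ in range(n):
    gamma = 0.5 if random.random() < p else 1.5
    fa += random.gauss(0, 1) >= gamma     # observation drawn under H0
    det += random.gauss(MU, 1) >= gamma   # observation drawn under H1
pf_mix, pd_mix = fa / n, det / n

# operating point is the convex combination of the two deterministic points
assert abs(pf_mix - (p * pf1 + (1 - p) * pf2)) < 0.01
assert abs(pd_mix - (p * pd1 + (1 - p) * pd2)) < 0.01
```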
3.2 Proposition
Bayesian case: a randomized test cannot achieve a lower Bayes' risk than the optimum LRT.
Proof: the Bayes risk of a randomized rule is

$$\varphi(\hat{\mathrm{H}}) = \mathbb{E}\left[\sum_{m=0}^{1} p_{\hat{\mathsf{H}}|\mathsf{y}}(H_m|\mathsf{y})\big(C_{m0}\,p_{\mathsf{H}|\mathsf{y}}(H_0|\mathsf{y}) + C_{m1}\,p_{\mathsf{H}|\mathsf{y}}(H_1|\mathsf{y})\big)\right]$$

For each $y$ the inner sum is linear in $p_{\hat{\mathsf{H}}|\mathsf{y}}(H_0|y)$, so the minimum is attained at 0 or 1, which degenerates to a deterministic decision.
Neyman-Pearson case:
- continuous-valued: for a given $P_F$ constraint, a randomized test cannot achieve a larger $P_D$ than the optimum LRT
- discrete-valued: for a given $P_F$ constraint, a randomized test can achieve a larger $P_D$ than the optimum LRT; furthermore, the optimum randomized test corresponds to simple time-sharing between the two nearby LRTs
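The discrete-valued case is worth a concrete sketch. Below is my own assumed example (not from the notes): $y \sim \text{Binomial}(10, \theta)$ with $H_0: \theta=0.3$ vs $H_1: \theta=0.6$. The LRT is monotone in $k$ here, so deterministic tests are thresholds on $k$ and their $P_F$ values form a discrete set; time-sharing between the two neighboring thresholds hits $\alpha$ exactly and strictly improves $P_D$:

```python
from math import comb

N, TH0, TH1 = 10, 0.3, 0.6   # assumed: y ~ Binomial(10, theta)
ALPHA = 0.05

def pmf(k, th):
    return comb(N, k) * th ** k * (1 - th) ** (N - k)

# The LRT is monotone in k, so deterministic tests are "decide H1 iff k >= t"
def pf_det(t):
    return sum(pmf(k, TH0) for k in range(t, N + 1))

def pd_det(t):
    return sum(pmf(k, TH1) for k in range(t, N + 1))

# find the two neighboring deterministic tests bracketing alpha ...
t = min(t for t in range(N + 2) if pf_det(t) <= ALPHA)
pf_hi, pf_lo = pf_det(t - 1), pf_det(t)   # pf_lo <= alpha < pf_hi
# ... and time-share between them to hit alpha exactly
p = (ALPHA - pf_lo) / (pf_hi - pf_lo)     # prob of using the looser test

pf_rand = p * pf_hi + (1 - p) * pf_lo
pd_rand = p * pd_det(t - 1) + (1 - p) * pd_det(t)

assert abs(pf_rand - ALPHA) < 1e-9
assert pd_rand > pd_det(t)   # strictly better P_D than the best deterministic test
```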
3.3 Efficient frontier
Boundary of the region of achievable $(P_D, P_F)$ operating points
- continuous-valued: ROC of LRT
- discrete-valued: LRT points and the straight line segments
Facts
- $P_D \ge P_F$
- the efficient frontier is a concave function
- $\frac{dP_D}{dP_F} = \eta$
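The slope fact $\frac{dP_D}{dP_F} = \eta$ can be checked numerically. Below is a sketch under my own assumed Gaussian pair $H_0: N(0,1)$ vs $H_1: N(1,1)$, where the LRT is $y \ge \gamma$ and the likelihood ratio at the boundary is $L(\gamma) = e^{\mu\gamma - \mu^2/2}$:

```python
import math

def Phi(x):
    # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

MU = 1.0  # assumed: H0: N(0,1), H1: N(MU,1); LRT: decide H1 iff y >= gamma

def pf(g): return 1 - Phi(g)
def pd(g): return 1 - Phi(g - MU)

gamma, h = 0.8, 1e-6
# slope of the ROC at this operating point, by finite differences
slope = (pd(gamma + h) - pd(gamma)) / (pf(gamma + h) - pf(gamma))
# the threshold eta at this point is the likelihood ratio evaluated at gamma
eta = math.exp(MU * gamma - MU * MU / 2)
assert abs(slope - eta) < 1e-4
```

Intuitively, moving the threshold trades a sliver of $P_F$ mass for a sliver of $P_D$ mass in the ratio $p(y|H_1)/p(y|H_0)$ at the boundary, which is exactly $\eta$.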
4. Minimax hypo testing
Prior: unknown; cost function: known.
4.1 Decision rule
Minimax approach:

$$\hat H(\cdot)=\arg\min_{f(\cdot)}\max_{p\in[0,1]} \varphi(f,p)$$

Optimal decision rule:

$$\hat H(\cdot)=\hat{H}_{p_*}(\cdot), \qquad p_* = \arg\max_{p\in[0,1]} \varphi(\hat H_p, p)$$

To prove this optimality, first introduce the mismatched Bayes decision rule
$$\hat{H}_q(y)=\begin{cases}H_1, & L(y) \ge \frac{1-q}{q}\cdot\frac{C_{10}-C_{00}}{C_{01}-C_{11}}\\ H_0, & \text{otherwise}\end{cases}$$
Writing out the cost, $\varphi(\hat H_q, p)$ is linear in the prior probability $p$:

$$\varphi(\hat H_q,p)=(1-p)\big[C_{00}(1-P_F(q))+C_{10}P_F(q)\big] + p\big[C_{01}(1-P_D(q))+C_{11}P_D(q)\big]$$
Lemma: Max-min inequality
$$\max_x \min_y g(x,y) \le \min_y \max_x g(x,y)$$
Theorem:
$$\min_{f(\cdot)}\max_{p\in[0,1]}\varphi(f,p)=\max_{p\in[0,1]}\min_{f(\cdot)}\varphi(f,p)$$
Proof of Lemma: let $h(x)=\min_y g(x,y)$. Then

$$\begin{aligned} h(x) &\le g(x,y), \quad \forall x, \forall y \\ \Longrightarrow \max_x h(x) &\le \max_x g(x,y), \quad \forall y \\ \Longrightarrow \max_x \min_y g(x,y) &\le \min_y \max_x g(x,y) \end{aligned}$$
Proof of Theorem: first take any $p_1, p_2 \in [0,1]$, which gives

$$\varphi(\hat H_{p_1},p_1)=\min_f \varphi(f,p_1) \le \max_p \min_f \varphi(f,p) \le \min_f \max_p \varphi(f, p) \le \max_p \varphi(\hat H_{p_2}, p)$$

Since this chain holds for arbitrary $p_1, p_2$, take $p_1=p_2=p_*=\arg\max_p \varphi(\hat H_p, p)$. To prove the theorem it then suffices to show that $\varphi(\hat H_{p_*},p_*)=\max_p \varphi(\hat H_{p_*}, p)$.
Since $\varphi(\hat H_q, p)$ is linear in $p$ (shown above), to establish this equality:
- if $p_* \in (0,1)$, it suffices that $\left.\frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p}\right|_{\text{for any } p}=0$, and the equality holds automatically
- if $p_* = 1$, it suffices that $\left.\frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p}\right|_{\text{for any } p} > 0$, so the maximum is attained at $p=1$; the case $p_*=0$ is analogous
By the lemma below, the optimal decision is the Bayes decision with $p_* = \arg\max_p \varphi(\hat H_p, p)$, where $p_*$ satisfies

$$\begin{aligned} 0 &= \frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p} \\ &= (C_{01}-C_{00}) - (C_{01}-C_{11})P_{\mathrm{D}}(p_*) - (C_{10}-C_{00})P_{\mathrm{F}}(p_*) \end{aligned}$$
Lemma:
$$\left.\frac{\mathrm{d}\varphi(\hat{H}_p, p)}{\mathrm{d}p}\right|_{p=q} = \left.\frac{\partial \varphi(\hat{H}_q, p)}{\partial p}\right|_{p=q} = \left.\frac{\partial \varphi(\hat{H}_q, p)}{\partial p}\right|_{\text{for any } p}$$
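The equalizer condition can be solved numerically. Below is a sketch under my own assumed setting (not from the notes): Gaussian pair $H_0: N(0,1)$ vs $H_1: N(2,1)$ with 0-1 costs, where the condition reduces to $1 - P_D(p_*) - P_F(p_*) = 0$, i.e. the miss probability equals the false-alarm probability:

```python
import math

def Phi(x):
    # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

MU = 2.0  # assumed: H0: N(0,1), H1: N(MU,1); 0-1 costs C00=C11=0, C01=C10=1

def pf_pd(q):
    # mismatched Bayes rule H_q: decide H1 iff L(y) >= (1-q)/q, i.e. y >= gamma
    gamma = MU / 2 + math.log((1 - q) / q) / MU
    return 1 - Phi(gamma), 1 - Phi(gamma - MU)

def equalizer_gap(q):
    # with 0-1 costs, (C01-C00)-(C01-C11)P_D-(C10-C00)P_F = 1 - P_D - P_F
    pf, pd = pf_pd(q)
    return 1 - pd - pf

# bisection for the least-favorable prior p_* (gap decreases in q)
lo, hi = 1e-6, 1 - 1e-6
for _ in range(200):
    mid = (lo + hi) / 2
    if equalizer_gap(mid) > 0:
        lo = mid
    else:
        hi = mid
p_star = (lo + hi) / 2

pf, pd = pf_pd(p_star)
assert abs((1 - pd) - pf) < 1e-9   # equal-error point: P_miss = P_F
# by symmetry of this Gaussian pair, the least-favorable prior is 1/2
assert abs(p_star - 0.5) < 1e-6
```

At $p_*$ the Bayes risk line $\varphi(\hat H_{p_*}, p)$ is flat in $p$, so an adversarial choice of prior cannot increase the risk, which is exactly the minimax guarantee.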
For the other parts of this series, see:
Statistical Inference (1): Hypothesis Test
Statistical Inference (2): Estimation Problem
Statistical Inference (3): Exponential Family
Statistical Inference (4): Information Geometry
Statistical Inference (5): EM Algorithm
Statistical Inference (6): Modeling
Statistical Inference (7): Typical Sequence
Statistical Inference (8): Model Selection
Statistical Inference (9): Graphical Models
Statistical Inference (10): Elimination Algorithm
Statistical Inference (11): Sum-product Algorithm