Paper Review: Bayesian Shrinkage towards Sharp Minimaxity

Motivation and Conclusion

Sparse normal mean model (in general $\epsilon \sim N(0, \sigma^2 I_n)$; here we set $\sigma^2 = 1$):
$$y = \theta + \epsilon, \quad \epsilon \sim N(0, I_n)$$
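
As a minimal sketch of this model (not taken from the paper), the snippet below simulates one data set with NumPy. The function name `simulate_sparse_means`, the common signal size `signal`, and the seed are assumptions made purely for illustration.

```python
import numpy as np

def simulate_sparse_means(n, s, signal=5.0, seed=0):
    """Draw y = theta + eps with eps ~ N(0, I_n) and exactly s nonzero means.

    `signal` (the common value of the nonzero entries) is an illustrative
    assumption; the model itself places no such restriction on theta.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(n)
    support = rng.choice(n, size=s, replace=False)  # positions of the nonzero means
    theta[support] = signal
    y = theta + rng.standard_normal(n)              # unit-variance Gaussian noise
    return y, theta

y, theta_star = simulate_sparse_means(n=1000, s=10)
print(np.count_nonzero(theta_star), y.shape)
```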

A general form of shrinkage prior:
$$\pi(\theta \mid \tau) = \prod_{i=1}^n \frac{1}{\tau}\pi_0\left(\frac{\theta_i}{\tau}\right), \quad \tau \sim \pi(\tau)$$

If $\pi_0$ is a scale mixture of Gaussians, the general form leads to local-global shrinkage:
$$\theta_i \sim N(0, \lambda_i^2\tau^2), \quad \lambda_i^2 \sim \pi(\lambda_i^2), \quad \tau \sim \pi(\tau)$$
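
To make the local-global form concrete, here is a hedged sketch that draws from such a prior. It assumes half-Cauchy local scales $\lambda_i$, which is the horseshoe choice; other mixing densities $\pi(\lambda_i^2)$ would give other priors. The function name and its arguments are illustrative only.

```python
import numpy as np

def sample_local_global(n, tau, seed=0):
    """Draw theta_i ~ N(0, lambda_i^2 * tau^2) for a fixed global scale tau.

    Assumption: lambda_i is half-Cauchy(0, 1), i.e. the horseshoe prior.
    """
    rng = np.random.default_rng(seed)
    lam = np.abs(rng.standard_cauchy(n))        # local scales, one per coordinate
    theta = lam * tau * rng.standard_normal(n)  # Gaussian with std lam_i * tau
    return theta, lam

theta, lam = sample_local_global(n=1000, tau=0.01)
print(np.median(np.abs(theta)), np.max(np.abs(theta)))
```

Because $\tau$ multiplies every draw, it acts as a pure scale parameter, which matches the $\frac{1}{\tau}\pi_0(\theta_i/\tau)$ form of the general prior; the heavy-tailed $\lambda_i$ still allow a few large coordinates.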

Observation: about contraction rate
Let $\theta^*$ be the true parameter, $s$ the number of nonzero entries in $\theta^*$, and $r_n$ the contraction rate.

Frequentist:
$$\min_{\hat\theta} \max_{\theta^*} \left\| \hat\theta - \theta^* \right\| = \sqrt{(2+o(1))\, s\log\frac{n}{s}}$$
Bayesian:

  1. Dirichlet-Laplace prior: $r_n \asymp \sqrt{s\log\frac{n}{s}}$ when $\left\|\theta^*\right\| \le \sqrt{s}\,\log^2\frac{n}{s}$
  2. Horseshoe prior: $r_n = M_n\sqrt{s\log\frac{n}{s}}$ as $M_n \to \infty$
  3. In general, a polynomially decaying $\pi_0$ leads to a near-optimal rate

Further questions: How does the polynomial order of the decay of $\pi_0$ affect the contraction rate? How should $\tau$ be chosen to achieve a (near-)optimal contraction rate given a polynomially decaying $\pi_0$?

Contribution of this paper:

  1. If the polynomial order $\alpha$ of $\pi_0$ is close to 1, then $r_n/\sqrt{2s\log\frac{n}{s}} \approx 1$ (Bayesian sharp minimaxity, Thm 2.1).
  2. Choosing $\tau$ requires knowledge of $s/n$, so the author proposes a Beta model for $\tau$ that avoids relying on this unknown quantity.

Questions not covered

  1. How does $r_n$ change as $\alpha_n \to 1$ with $n$?
  2. What happens at $\alpha = 1$ (where Thm 2.1 breaks down)?
  3. Beyond the contraction rate, how does $\alpha$ affect model selection?
  4. How does $\alpha$ affect the contraction rate in the linear regression setting?

Bayesian sharp minimaxity

Important conditions on model sparsity and $\pi_0$

For simplicity, $\tau$ is a deterministic value and the $\theta_i$ are mutually independent.


Remark 1: Conditions for $\tau$:

  1. $\tau^{\alpha-1} \ge (s/n)^c\sqrt{\log(n/s)}$ for some $c \in (0, 1+w/2)$. $\tau$ cannot be too small, or $\theta$ will be over-shrunk.
  2. $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{\alpha}$. $\tau$ cannot be too large, or $\theta$ will be insufficiently shrunk.
  3. $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{(1+\alpha)/2}$. This is the condition for the $L_1$ contraction rate.

These conditions imply $\alpha \in (1, 1+w/2)$, and $w$ should be as small as possible.

Remark 2: $E^*$ denotes expectation under the true parameter $\theta^*$. The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log(n/s)})$ and the $L_1$ contraction rate is at most $O(s\sqrt{\log(n/s)})$.

Remark 3: Note that $\log(n/s) \prec (n/s)^c$ for any $c > 0$ (as $n/s \to \infty$). This observation leads to Corollary 2.1, which unifies (2.1) and (2.2).


Remark 4: Corollary 2.1 indicates $\tau \asymp (s/n)^{c/(\alpha-1)}$. Select $c = \alpha + \delta$ for a very small $\delta > 0$, so a good choice would be $\tau \asymp (s/n)^{(\alpha+\delta)/(\alpha-1)}$. However, $s$ is unknown. An alternative that does not require $s$ is $\tau \asymp (1/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 2.2 considers the properties of this alternative.
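
The two choices of $\tau$ in Remark 4 can be compared numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the proportionality constant hidden in $\asymp$ is taken to be 1, and everything is computed on the log scale because the exponent $(\alpha+\delta)/(\alpha-1)$ blows up as $\alpha \to 1$.

```python
import math

def log_tau_oracle(n, s, alpha, delta):
    """log of (s/n)^((alpha+delta)/(alpha-1)); requires the unknown sparsity s."""
    if not (alpha > 1 and delta > 0 and 0 < s < n):
        raise ValueError("need alpha > 1, delta > 0 and 0 < s < n")
    return (alpha + delta) / (alpha - 1) * math.log(s / n)

def log_tau_sparsity_free(n, alpha, delta):
    """log of (1/n)^((alpha+delta)/(alpha-1)); s is replaced by 1."""
    if not (alpha > 1 and delta > 0 and n > 1):
        raise ValueError("need alpha > 1, delta > 0 and n > 1")
    return -(alpha + delta) / (alpha - 1) * math.log(n)

# Illustrative values only: n, s, delta and the alpha grid are assumptions.
for alpha in (1.5, 1.1, 1.01):
    lo = log_tau_oracle(n=1000, s=10, alpha=alpha, delta=0.01)
    lf = log_tau_sparsity_free(n=1000, alpha=alpha, delta=0.01)
    print(f"alpha={alpha}: log tau (oracle)={lo:.1f}, log tau (s-free)={lf:.1f}")
```

The sparsity-free choice is never larger than the oracle one, so it shrinks at least as aggressively, and both become extremely small as $\alpha$ approaches 1.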


Remark 5: Conditions for $\tau$:

  1. $\tau^{\alpha-1} \ge (1/n)^c\sqrt{\log(n/s)}$, i.e. $s$ is replaced with 1 in $(s/n)^c$
  2. $\tau^{\alpha-1} \prec (s/n)^{\alpha}[\log(n/s)]^{(1+\alpha)/2}$

The theoretical results indicate that the $L_2$ contraction rate is at most $O(\sqrt{s\log n})$ (sub-optimal) and the $L_1$ contraction rate is at most $O(s\sqrt{\log n})$. If $\log s \prec \log n$, the sub-optimal rate is asymptotically indistinguishable from the optimal one. If $s \asymp n^c$ with $c \in (0,1)$, the sub-optimal rate has the same order as the optimal one. If $\log s \sim \log n$, the sub-optimal rate is of strictly greater order.
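
The three regimes can be checked with a few lines of arithmetic. The sketch below compares $\sqrt{s\log n}$ with $\sqrt{s\log(n/s)}$, dropping all constants; the function names and the example values of $n$ and $s$ are assumptions for illustration. For $s = n^c$ the ratio is $1/\sqrt{1-c}$, a constant, which is why the two rates share the same order in that regime.

```python
import math

def optimal_l2_rate(n, s):
    """sqrt(s * log(n/s)), the optimal L2 rate up to constants."""
    return math.sqrt(s * math.log(n / s))

def suboptimal_l2_rate(n, s):
    """sqrt(s * log(n)), the rate obtained when s is replaced by 1 in tau."""
    return math.sqrt(s * math.log(n))

def rate_ratio(n, s):
    """Sub-optimal over optimal rate; equals sqrt(log n / log(n/s))."""
    return suboptimal_l2_rate(n, s) / optimal_l2_rate(n, s)

n = 10**6
for s in (1, 10, 10**3, 10**5, n // 2):
    print(f"s={s}: ratio={rate_ratio(n, s):.3f}")
```

The ratio stays near 1 while $s$ is tiny relative to $n$ and grows without bound as $s$ approaches a constant fraction of $n$.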

Remark 6: The theorems above are derived for a deterministic $\tau$. Now consider a prior $\pi(\tau)$. The prior should concentrate near zero, but not so quickly that it fails to assign some mass around $(s/n)^{(\alpha+\delta)/(\alpha-1)}$. Theorem 3.1 provides sufficient conditions on $\pi(\tau)$ to guarantee (2.1) and (2.2).

Remark 7: The prior mass of $\tau$ is split into three parts: around zero, from $(s/n)^{(1+w/2)/(\alpha-1)}$ to $(s/n)^{\alpha/(\alpha-1)}$, and greater than $(s/n)^{\alpha/(\alpha-1)}$. The first part carries most of the mass and the second part carries only a small amount. The third part is assumed to decay to zero.

Remark 8: A possible choice of $\pi(\tau)$ is Beta (which may be multi-modal), i.e. $\tau \sim [\mathrm{Beta}(1,n)]^c$ with $c \in \left(\frac{\alpha}{\alpha-1}, \frac{1+w/2}{\alpha-1}\right)$.
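
A draw from this prior is a Beta$(1,n)$ variate raised to the power $c$. The following is a minimal sketch, assuming NumPy and treating $w$ as a user-supplied constant (its definition comes from the paper's conditions and is not restated in these notes); the function name `sample_tau` and the example values are illustrative.

```python
import numpy as np

def sample_tau(n, alpha, w, c, size=1, seed=0):
    """Draw tau = B**c with B ~ Beta(1, n).

    c must lie in (alpha/(alpha-1), (1+w/2)/(alpha-1)), which is non-empty
    only when 1 < alpha < 1 + w/2.
    """
    if alpha <= 1:
        raise ValueError("need alpha > 1")
    lower = alpha / (alpha - 1)
    upper = (1 + w / 2) / (alpha - 1)
    if not (lower < c < upper):
        raise ValueError(f"c must lie in ({lower:.3f}, {upper:.3f})")
    rng = np.random.default_rng(seed)
    b = rng.beta(1.0, n, size=size)  # Beta(1, n) has mean 1/(n+1)
    return b ** c

# Illustrative values only.
tau = sample_tau(n=1000, alpha=1.2, w=1.0, c=7.0, size=5)
print(tau)
```

Since Beta$(1,n)$ concentrates around $1/n$, raising it to the power $c$ places most of the mass of $\tau$ near zero on a scale comparable to $(1/n)^c$ without requiring knowledge of $s$.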

Remark 9: Note that the restriction on $\theta^*$ is a technical assumption. Without this assumption, it is still possible to achieve the sub-optimal rate. See Theorem 3.2.
