Paper Review: Bayesian Regularization and Prediction

One-group Answers to Two-group Questions

Two-group questions: I think this means choosing between two alternatives, $\beta_i = 0$ or $\beta_i \ne 0$.
Two-group answers: discrete mixture priors on $\beta_i$.

Multiple Testing

$y|\beta \sim N(\beta,\sigma^2 I),\quad \beta=(\beta_1,\cdots,\beta_p)'$

Discrete mixture prior on $\beta_i$ (where $\delta_0$ is a point mass at zero):
$\beta_i \sim w\,g(\beta_i)+(1-w)\delta_0$

Marginal densities of $y$ under $\beta = 0$ and $\beta \ne 0$:
$f_0(y)=N(y|0,\sigma^2),\quad f_1(y)=\int N(y|\beta,\sigma^2)\,g(\beta)\,d\beta$

Interpret $w(y)$ as the posterior probability $P(\beta \ne 0 \mid y)$ that $y$ carries a signal:
$w(y)=P(\beta \ne 0|y)=\dfrac{P(y,\beta \ne 0)}{P(y)} = \dfrac{w f_1(y)}{w f_1(y)+(1-w)f_0(y)}$
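Plugging in a concrete $g$ makes this formula easy to check numerically. Below is a minimal sketch (not from the paper) that assumes a normal signal prior $g = N(0,\tau^2)$, so that $f_1(y)=N(y|0,\sigma^2+\tau^2)$ in closed form; the values $w=0.5$, $\sigma^2=1$, $\tau^2=25$ are illustrative:

```python
import math

def normal_pdf(y, var):
    """Density of N(0, var) evaluated at y."""
    return math.exp(-y * y / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_signal_prob(y, w=0.5, sigma2=1.0, tau2=25.0):
    """w(y) = P(beta != 0 | y) for the two-group mixture.

    f0 is the null marginal N(y | 0, sigma^2); with the assumed normal
    signal prior g = N(0, tau^2), the non-null marginal is
    f1 = N(y | 0, sigma^2 + tau^2).
    """
    f0 = normal_pdf(y, sigma2)
    f1 = normal_pdf(y, sigma2 + tau2)
    return w * f1 / (w * f1 + (1 - w) * f0)
```

As expected, $w(y)$ stays small for observations near zero and approaches 1 for large $|y|$.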

Sparse Regression


Note that

  1. Ridge regression: $\ell_2$-penalty
  2. LASSO: $\ell_1$-penalty
  3. Cauchy prior: $\beta_i \overset{iid}{\sim} C(0,\sigma_i),\ i=1,\cdots,p$
  4. Horseshoe prior (Handling Sparsity via the Horseshoe)

Loss function of penalized regression:
$l(\beta) = \left\| y - X\beta \right\|_2^2+\nu \sum_{i=1}^p \psi(\beta_i^2)$

This is equivalent to $Y \sim N(X\beta,\sigma^2 I)$ with prior $\pi(\beta_i|\nu)\propto\exp\left(-\nu\,\psi(\beta_i^2)\right)$. The posterior of $\beta$ is
$\pi(\beta|y,\sigma^2,\nu)\propto \frac{1}{\sigma} \exp\left( -\frac{ \left\| y - X\beta \right\|_2^2}{2\sigma^2} - \nu\sum_{i=1}^p \psi(\beta_i^2)\right)$

So minimizing the loss function is equivalent to finding the MAP estimate, i.e. the mode of this posterior.
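For the ridge case $\psi(\beta_i^2)=\beta_i^2$ this equivalence can be checked directly, since the penalized loss has the closed-form minimizer $(X'X+\nu I)^{-1}X'y$. The sketch below uses simulated data (all values illustrative) and verifies that the gradient of the penalized loss, which is proportional to the gradient of the log-posterior, vanishes at that point:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, nu = 50, 3, 2.0
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=n)

# Ridge: psi(b^2) = b^2, loss = ||y - X b||^2 + nu ||b||^2,
# minimized in closed form by (X'X + nu I)^{-1} X'y.
beta_ridge = np.linalg.solve(X.T @ X + nu * np.eye(p), X.T @ y)

# The gradient of the penalized loss (equivalently, a negative multiple
# of the log-posterior gradient under a normal prior on beta_i) is zero
# at beta_ridge, so the loss minimizer is exactly the MAP estimate.
grad = -2 * X.T @ (y - X @ beta_ridge) + 2 * nu * beta_ridge
```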

Global-local Shrinkage

Framework

Consider $Y \sim N(X\beta,\sigma^2 I)$ with priors
$\beta_i|\tau^2,\lambda_i^2 \sim N(0,\tau^2\lambda_i^2),\quad \lambda_i^2 \sim \pi(\lambda^2_i),\quad (\tau^2,\sigma^2) \sim \pi(\tau^2,\sigma^2)$

Joint prior:
$\pi(\beta,\Lambda,\tau^2,\sigma^2)=\left[\prod_{i=1}^p N(\beta_i|0,\tau^2\lambda_i^2)\,\pi(\lambda^2_i)\right]\pi(\tau^2,\sigma^2)$

Question: why (3)?
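To make the hierarchy concrete, the sketch below samples coefficients from one instance of the global-local prior, taking the horseshoe choice $\lambda_i \sim C^+(0,1)$ for $\pi(\lambda_i^2)$ and an illustrative fixed global scale $\tau = 0.1$ (both are assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
p, tau = 100_000, 0.1

# Local scales: half-Cauchy draws, the horseshoe choice of pi(lambda_i)
lam = np.abs(rng.standard_cauchy(p))

# Conditionally normal coefficients: beta_i | tau, lambda_i ~ N(0, tau^2 lambda_i^2)
beta = rng.normal(scale=tau * lam)

# A small global tau concentrates most draws near zero (noise squelched),
# while the heavy Cauchy tail still produces occasional large signals.
frac_tiny = np.mean(np.abs(beta) < 0.05)
frac_large = np.mean(np.abs(beta) > 3.0)
```

Roughly half the draws sit essentially at zero, yet a percent or so escape far into the tails: exactly the sparsity pattern the hierarchy is built to express.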

Transformation to an orthogonal scheme via $U$: $Z = XU$, $Z'Z=D$, $\alpha = U'\beta$, and set $\alpha|\Lambda,\tau^2,\sigma^2\sim N(0,\sigma^2\tau^2 n D^{-1}\Lambda)$ so that $\beta|\Lambda,\tau^2,\sigma^2\sim N(0,\sigma^2\tau^2 n U D^{-1}\Lambda U')$.

Question: how to understand this?

Question: how to get this?
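One way to see what the rotation does is to check it numerically. Taking $U$ from the eigendecomposition $X'X = UDU'$ (an assumption about where $U$ comes from, though it is the choice consistent with $Z'Z = D$), the columns of $Z = XU$ are orthogonal while the fitted values are unchanged:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4
X = rng.normal(size=(n, p))

# Eigendecomposition X'X = U D U' supplies the rotation U
d, U = np.linalg.eigh(X.T @ X)

# Rotated design Z = X U has orthogonal columns: Z'Z = D is diagonal
Z = X @ U
ZtZ = Z.T @ Z

# The model itself is unchanged: X beta = Z alpha with alpha = U' beta
beta = rng.normal(size=p)
alpha = U.T @ beta
```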

To squelch the noise and shrink coefficients:

  1. small $\tau$: choose $\pi(\tau^2)$ concentrated near zero, so coefficients are pulled strongly toward zero overall
  2. occasionally large $\lambda_i^2$: choose $\pi(\lambda_i^2)$ heavy-tailed, so genuine signals can escape the global shrinkage
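These two requirements can be visualized through the shrinkage weight $\kappa_i = 1/(1+\tau^2\lambda_i^2)$, since in the normal-means case the posterior mean is $E[\beta_i|y]=(1-\kappa_i)y_i$. The sketch below (assuming the horseshoe's half-Cauchy $\lambda_i$ and $\tau=\sigma=1$, under which $\kappa_i \sim \mathrm{Beta}(1/2,1/2)$) shows the characteristic U shape: mass piles up near $\kappa=1$ (noise shrunk to zero) and near $\kappa=0$ (signals left alone):

```python
import numpy as np

rng = np.random.default_rng(2)
lam = np.abs(rng.standard_cauchy(200_000))

# Shrinkage weight in the normal-means model with tau = sigma = 1:
# posterior mean is E[beta_i | y] = (1 - kappa_i) * y_i,
# with kappa_i = 1 / (1 + lambda_i^2).
kappa = 1.0 / (1.0 + lam**2)

# Half-Cauchy lambda_i implies kappa_i ~ Beta(1/2, 1/2): a U-shaped
# ("horseshoe") density with mass near 0 (no shrinkage, signal kept)
# and near 1 (full shrinkage, noise squelched).
mass_ends = np.mean((kappa < 0.1) | (kappa > 0.9))
mass_middle = np.mean((kappa > 0.4) & (kappa < 0.6))
```

Far more mass sits at the two ends than in the middle, which is where the horseshoe gets its name.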

Properties: why good performance?

Robust tail

Question: how to understand η\etaη?



Efficiency

Global Variance Component

Never choose a prior $\pi(\tau^2,\sigma^2)$ that forces $\tau^2$ away from zero.

Numerical Examples

Regularized Regression

Wavelet denoising
