Paper Review: Bayesian Regularization and Prediction
One-group Answers to Two-group questions
Two-group questions: I think this means the two alternatives $\beta_i=0$ or $\beta_i \ne 0$.
Two-group answers: discrete mixture priors on $\beta_i$.
Multiple Testing
$$y|\beta \sim N(\beta,\sigma^2 I),\quad \beta=(\beta_1,\cdots,\beta_p)'$$
Discrete mixture priors on $\beta_i$:
$$\beta_i \sim w\,g(\beta_i)+(1-w)\delta_0$$
Marginal densities of $y$ under $\beta = 0$ and $\beta \ne 0$:
$$f_0(y)=N(y|0,\sigma^2),\quad f_1(y)=\int N(y|\beta,\sigma^2)\,g(\beta)\,d\beta$$
Interpret $w(y)$ as the posterior probability $P(\beta \ne 0 \mid y)$ that $y$ is a signal:
$$w(y)=P(\beta \ne 0|y)=\frac{P(y,\beta \ne 0)}{P(y)} = \frac{wf_1(y)}{wf_1(y)+(1-w)f_0(y)}$$
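As a quick numerical illustration (my own sketch, not from the paper), take the slab $g$ to be $N(0,\psi^2)$, so that $f_1(y)=N(y\mid 0,\sigma^2+\psi^2)$ is available in closed form:

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_signal_prob(y, w=0.5, sigma=1.0, psi=3.0):
    """w(y) = P(beta != 0 | y) for the two-group model with a N(0, psi^2)
    slab, where the marginal under the slab is f1(y) = N(y | 0, sigma^2 + psi^2)."""
    f0 = normal_pdf(y, sigma ** 2)             # null component
    f1 = normal_pdf(y, sigma ** 2 + psi ** 2)  # slab marginal
    return w * f1 / (w * f1 + (1 - w) * f0)

# Small observations are classified as noise, large ones as signals.
print(posterior_signal_prob(0.1))  # well below 1/2
print(posterior_signal_prob(5.0))  # essentially 1
```

Parameter values ($w$, $\sigma$, $\psi$) are arbitrary choices for the demo; the point is that $w(y)$ climbs from near the prior weight toward 1 as $|y|$ grows.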
Sparse Regression
Note that
- Ridge regression: $l_2$ penalty
- LASSO: $l_1$ penalty
- Cauchy prior: $\beta_i \overset{iid}{\sim} C(0,\sigma_i),\ i=1,\cdots,p$
- Horseshoe prior (Handling Sparsity via the Horseshoe):
Loss function of penalized regression:
$$l(\beta) = \left\| y - X\beta \right\|_2^2+\nu \sum_{i=1}^p \psi(\beta_i^2)$$
This is equivalent to $Y \sim N(X\beta,\sigma^2 I)$ with prior $\pi(\beta_i|\nu)\propto\exp\left(-\nu\,\psi(\beta_i^2)\right)$. The posterior of $\beta$ is
$$\pi(\beta|y,\sigma^2,\nu)\propto \frac{1}{\sigma} \exp\left( -\frac{\left\| y - X\beta \right\|_2^2}{2\sigma^2} - \nu\sum_{i=1}^p \psi(\beta_i^2)\right)$$
So minimizing the penalized loss is equivalent to MAP estimation under this posterior (up to a rescaling of $\nu$ by $2\sigma^2$).
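For the ridge case $\psi(\beta_i^2)=\beta_i^2$ this equivalence can be checked directly: the penalized loss has the closed-form minimizer $(X'X + \nu I)^{-1}X'y$, which is also the posterior mode under a Gaussian prior. A minimal sketch with simulated data (variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, nu = 50, 5, 2.0
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Ridge: psi(b^2) = b^2, so the penalized loss is ||y - Xb||^2 + nu * ||b||^2.
# Its minimizer (X'X + nu I)^{-1} X'y is also the MAP under a Gaussian prior.
beta_ridge = np.linalg.solve(X.T @ X + nu * np.eye(p), X.T @ y)

def loss(b):
    r = y - X @ b
    return r @ r + nu * b @ b

# The closed-form solution beats random perturbations of itself.
for _ in range(5):
    assert loss(beta_ridge) <= loss(beta_ridge + 0.01 * rng.normal(size=p))
print("ridge MAP estimate:", np.round(beta_ridge, 3))
```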
Global-local Shrinkage
Framework
Consider $Y \sim N(X\beta,\sigma^2 I)$ with priors
$$\beta_i|\tau^2,\lambda_i^2 \sim N(0,\tau^2\lambda_i^2),\quad \lambda_i^2 \sim \pi(\lambda_i^2),\quad (\tau^2,\sigma^2) \sim \pi(\tau^2,\sigma^2)$$
Joint prior:
$$\pi(\beta,\Lambda,\tau^2,\sigma^2)=\left[\prod_{i=1}^p N(\beta_i|0,\tau^2\lambda_i^2)\,\pi(\lambda_i^2)\right]\pi(\tau^2,\sigma^2)$$
Question: why (3)?
Transformation to an orthogonal scheme via $U$: $Z = XU$, $Z'Z=D$, $\alpha = U'\beta$, and set $\alpha|\Lambda,\tau^2,\sigma^2\sim N(0,\sigma^2\tau^2 n D^{-1}\Lambda)$, so that $\beta|\Lambda,\tau^2,\sigma^2\sim N(0,\sigma^2\tau^2 n U D^{-1}\Lambda U')$.
Question: how to understand this?
Question: how to get this?
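To build intuition for what this hierarchy produces, here is a small simulation (my own sketch) drawing $\beta_i$ from a horseshoe-style hierarchy with half-Cauchy local scales $\lambda_i \sim C^+(0,1)$, against a plain Gaussian with the same global scale:

```python
import numpy as np

rng = np.random.default_rng(1)
tau, n_draws = 0.1, 100_000

# Horseshoe-style hierarchy: lambda_i ~ C+(0,1) (half-Cauchy), then
# beta_i | tau, lambda_i ~ N(0, tau^2 * lambda_i^2).
lam = np.abs(rng.standard_cauchy(n_draws))
beta_hs = rng.normal(0.0, tau * lam)

# Pure-Gaussian global shrinkage for comparison: beta_i ~ N(0, tau^2).
beta_gauss = rng.normal(0.0, tau, n_draws)

# The global-local prior puts more mass near zero AND has far heavier tails:
print("P(|beta| < 0.01):", np.mean(np.abs(beta_hs) < 0.01),
      "vs Gaussian", np.mean(np.abs(beta_gauss) < 0.01))
print("P(|beta| > 3):   ", np.mean(np.abs(beta_hs) > 3),
      "vs Gaussian", np.mean(np.abs(beta_gauss) > 3))
```

This spike-near-zero plus heavy-tail shape is exactly the behavior the next section asks for.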
To squelch the noise and shrink coefficients toward zero:
- small $\tau$: $\pi(\tau^2)$ concentrated near zero
- large $\lambda_i^2$: $\pi(\lambda_i^2)$ heavy-tailed, so true signals can escape the global shrinkage
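The interplay of the two scales is easiest to see through the conditional shrinkage factor in the normal-means case with $\sigma^2=1$, where $E[\beta_i\mid y_i,\lambda_i,\tau]=(1-\kappa_i)\,y_i$ with $\kappa_i = 1/(1+\tau^2\lambda_i^2)$ (a standard identity, sketched here in code):

```python
def shrinkage(tau, lam):
    """kappa_i = 1 / (1 + tau^2 * lambda_i^2): the fraction of y_i
    that is shrunk away in the conditional posterior mean."""
    return 1.0 / (1.0 + tau ** 2 * lam ** 2)

# Small global tau shrinks ordinary (noise-level) coordinates hard ...
print(shrinkage(tau=0.1, lam=1.0))    # kappa near 1: almost total shrinkage
# ... while a large local lambda_i lets a genuine signal escape.
print(shrinkage(tau=0.1, lam=100.0))  # kappa near 0: almost no shrinkage
```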
Properties: why good performance?
Robust tail
Question: how to understand $\eta$?
Efficiency
Global Variance Component
Never choose a prior $\pi(\tau^2,\sigma^2)$ that forces $\sigma^2$ away from zero.
Numerical Examples
Regularized Regression
Wavelet denoising