Bounding-box regression

Basics (RCNN)

This section mainly follows Appendix C of the RCNN paper.

Bbox regression is a machine-learning regression problem defined on bounding boxes.

input

$$\{P^i, G^i\}_{i=1,2,\dots,N}$$

where $P^i=(P^i_x,P^i_y,P^i_w,P^i_h)$ specifies the pixel coordinates of the center of proposal $P^i$'s bounding box together with $P^i$'s width and height in pixels.

Henceforth, we drop the superscript $i$ unless it is needed.
Each ground-truth bounding box $G$ is specified in the same way: $G=(G_x,G_y,G_w,G_h)$.

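$P$ is the proposal bounding box. Because RCNN performs its regression on top of region proposals, $P$ is naturally introduced as the initial value of the regression. It can also be seen as an extension of the traditional sliding-window detection idea.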

goal

Our goal is to learn a transformation that maps a proposed box $P$ to a ground-truth box $G$.

Given an unknown model $f: P \to G$, find $g: P \to \hat G$ such that $g \approx f$.

model

We parameterize the transformation in terms of four functions $d_x(P), d_y(P), d_w(P), d_h(P)$. The first two specify a scale-invariant translation of the center of $P$'s bounding box, while the second two specify log-space translations of the width and height of $P$'s bounding box.

After learning these functions, we can transform an input proposal $P$ into a predicted ground-truth box $\hat G$ by applying the transformation

$$
\begin{aligned}
\hat{G}_x &= P_w d_x(P) + P_x \\
\hat{G}_y &= P_h d_y(P) + P_y \\
\hat{G}_w &= P_w \exp(d_w(P)) \\
\hat{G}_h &= P_h \exp(d_h(P))
\end{aligned}
$$

where $d_*(P)=d_*(P, \varPhi(P))=w_*^T\varPhi(P)$, $\varPhi(P)$ is the feature determined by $P$, $w_*$ is the weight vector to be learned, $* \in \{x,y,w,h\}$, and $\exp(x)=e^x$.

Note that, unlike classification, which uses all the features on the feature map, bbox regression uses only the local features determined by $P$.
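As a concrete illustration of the transformation, here is a minimal NumPy sketch. The function name `decode` and the sample numbers are assumptions made for illustration; boxes are stored as `(x, y, w, h)` with `(x, y)` the box center, as in the definitions above.

```python
import numpy as np

def decode(P, d):
    """Apply offsets d = (dx, dy, dw, dh) to proposals P = (x, y, w, h)."""
    P = np.asarray(P, dtype=float)
    d = np.asarray(d, dtype=float)
    gx = P[..., 2] * d[..., 0] + P[..., 0]  # shift center x by a fraction of the width
    gy = P[..., 3] * d[..., 1] + P[..., 1]  # shift center y by a fraction of the height
    gw = P[..., 2] * np.exp(d[..., 2])      # rescale width in log-space
    gh = P[..., 3] * np.exp(d[..., 3])      # rescale height in log-space
    return np.stack([gx, gy, gw, gh], axis=-1)

# Hypothetical proposal and offsets.
P = np.array([100.0, 80.0, 40.0, 20.0])
d = np.array([0.25, -0.5, np.log(2.0), 0.0])
G_hat = decode(P, d)
print(G_hat)
```

Because the width and height pass through `exp`, the predicted box always has positive size, whatever value the regressor outputs.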

Solving the transformation for the offsets gives

$$
\begin{aligned}
d_x &= (\hat{G}_x-P_x)/P_w \\
d_y &= (\hat{G}_y-P_y)/P_h \\
d_w &= \log(\hat{G}_w/P_w) \\
d_h &= \log(\hat{G}_h/P_h)
\end{aligned}
$$

scale-invariant translation

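Feature extraction should be scale-invariant: the same object seen at different scales should yield the same $d(P)$. In RCNN the scale of $P$ changes together with the scale of the object, so the scale-invariant $d_x(P), d_y(P)$ can still produce an accurate $\hat G$.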
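A small numerical check may make the scale invariance concrete. The sketch below (helper name `offsets` and all numbers are illustrative assumptions) computes the normalized offsets between a box $P$ and a box $G$, then repeats the computation after scaling both boxes by a factor `s`. The pixel displacement changes with `s`, while the normalized offsets do not.

```python
import numpy as np

def offsets(P, G):
    """Normalized offsets from box P to box G, both given as (x, y, w, h)."""
    P = np.asarray(P, dtype=float)
    G = np.asarray(G, dtype=float)
    return np.array([
        (G[0] - P[0]) / P[2],   # center shift in units of P's width
        (G[1] - P[1]) / P[3],   # center shift in units of P's height
        np.log(G[2] / P[2]),    # log ratio of widths
        np.log(G[3] / P[3]),    # log ratio of heights
    ])

P = np.array([100.0, 80.0, 40.0, 20.0])
G = np.array([110.0, 70.0, 80.0, 20.0])
s = 3.0  # hypothetical zoom factor applied to the whole image

t_small = offsets(P, G)
t_large = offsets(s * P, s * G)
pixel_shift_small = G[:2] - P[:2]
pixel_shift_large = s * G[:2] - s * P[:2]
print(t_small, t_large)
print(pixel_shift_small, pixel_shift_large)
```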

log-space (width/height) translation

A guess: the log-space form keeps $\delta_w,\delta_h$ numerically close to $\delta_x,\delta_y$, so that all four terms contribute comparably to the loss.

optimization objective

$$
\begin{aligned}
w_* &= \operatorname*{argmin}_{\hat{w}_*} \sum_{i=1}^N L(\delta_*^i) + \lambda R(\hat{w}_*) \\
&= \operatorname*{argmin}_{\hat{w}_*} \sum_{i=1}^N L[t_*^i - d_*(P^i)] + \lambda R(\hat{w}_*) \\
&= \operatorname*{argmin}_{\hat{w}_*} \sum_{i=1}^N L[t_*^i - \hat{w}_*^T\varPhi(P^i)] + \lambda R(\hat{w}_*)
\end{aligned}
$$

where $L$ is the loss function and $R$ is the regularization function.
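One common instance of this objective, and the one used in Appendix C of the RCNN paper, is regularized least squares (ridge regression): $L$ is the squared error and $R$ is the squared L2 norm. The sketch below solves that special case in closed form for a single coordinate. The function name `fit_ridge`, the synthetic features, and the values of `lam` are assumptions for illustration only.

```python
import numpy as np

def fit_ridge(Phi, t, lam):
    """Minimize sum_i (t_i - w^T phi_i)^2 + lam * ||w||^2 in closed form."""
    n_features = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(n_features)
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 8))   # hypothetical features Phi(P^i), one row per proposal
w_true = rng.normal(size=8)
t = Phi @ w_true                  # noise-free synthetic targets for one coordinate

w_hat = fit_ridge(Phi, t, lam=1e-8)   # almost no regularization: recovers w_true
w_reg = fit_ridge(Phi, t, lam=1.0)    # stronger regularization shrinks the weights

# The gradient of the objective vanishes at the minimizer.
grad = -2.0 * Phi.T @ (t - Phi @ w_reg) + 2.0 * 1.0 * w_reg
```

In practice one such weight vector is fitted per coordinate, so four regressors are learned for $x, y, w, h$.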

The regression targets $t_*$ for the training pair $(P,G)$ are defined as

$$
\begin{aligned}
t_x &= (G_x-P_x)/P_w \\
t_y &= (G_y-P_y)/P_h \\
t_w &= \log(G_w/P_w) \\
t_h &= \log(G_h/P_h)
\end{aligned}
$$

It follows that the residuals $\delta_* = t_* - d_*(P)$ are

$$
\begin{aligned}
\delta_x &= (G_x-\hat{G}_x)/P_w \\
\delta_y &= (G_y-\hat{G}_y)/P_h \\
\delta_w &= \log(G_w/\hat{G}_w) \\
\delta_h &= \log(G_h/\hat{G}_h)
\end{aligned}
$$
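The relation between targets, predictions, and residuals can be checked numerically. In this sketch the helper names `encode` and `decode` and all sample values are assumptions; `encode` computes the targets $t_*$ and `decode` applies predicted offsets to $P$.

```python
import numpy as np

def encode(P, G):
    """Regression targets t = (tx, ty, tw, th) for a pair of (x, y, w, h) boxes."""
    return np.array([
        (G[0] - P[0]) / P[2],
        (G[1] - P[1]) / P[3],
        np.log(G[2] / P[2]),
        np.log(G[3] / P[3]),
    ])

def decode(P, d):
    """Predicted box obtained by applying offsets d to proposal P."""
    return np.array([
        P[2] * d[0] + P[0],
        P[3] * d[1] + P[1],
        P[2] * np.exp(d[2]),
        P[3] * np.exp(d[3]),
    ])

P = np.array([100.0, 80.0, 40.0, 20.0])
G = np.array([110.0, 70.0, 80.0, 20.0])
d = np.array([0.2, -0.4, 0.5, 0.1])   # hypothetical (imperfect) predictions

t = encode(P, G)
G_hat = decode(P, d)

# Residuals written directly in terms of G and G_hat.
delta = np.array([
    (G[0] - G_hat[0]) / P[2],
    (G[1] - G_hat[1]) / P[3],
    np.log(G[2] / G_hat[2]),
    np.log(G[3] / G_hat[3]),
])
print(delta, t - d)
```

A perfect prediction `d == t` decodes back to `G` exactly, which is why the targets are defined as the inverse of the transformation.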

…care must be taken when selecting which training pairs $(P,G)$ to use. Intuitively, if $P$ is far from all ground-truth boxes, then the task of transforming $P$ to a ground-truth box $G$ does not make sense.
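One way to implement this selection is by IoU overlap: Appendix C of the RCNN paper assigns each proposal to the ground-truth box with which it overlaps most and keeps the pair only if the IoU exceeds a threshold (0.6 in that paper). The sketch below follows that idea; the function names, the sample boxes, and the default threshold are illustrative assumptions and should be tuned on validation data.

```python
import numpy as np

def iou_cxcywh(a, b):
    """IoU of two boxes given as (center x, center y, width, height)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))  # overlap width, 0 if disjoint
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))  # overlap height, 0 if disjoint
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def select_pairs(proposals, gts, thresh=0.6):
    """Return (proposal index, ground-truth index) pairs usable for training."""
    pairs = []
    for i, P in enumerate(proposals):
        ious = [iou_cxcywh(P, G) for G in gts]
        j = int(np.argmax(ious))          # best-matching ground-truth box
        if ious[j] > thresh:              # drop proposals far from every G
            pairs.append((i, j))
    return pairs

gts = [(50.0, 50.0, 40.0, 40.0)]
proposals = [(52.0, 50.0, 40.0, 40.0), (200.0, 200.0, 40.0, 40.0)]
pairs = select_pairs(proposals, gts)
print(pairs)
```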

Faster RCNN

Bbox regression in the RPN is a variant of the basic bbox regression described above.

Region Proposal Network (RPN)

This architecture is naturally implemented with an n×n convolutional layer followed by two sibling 1 × 1 convolutional layers (for reg and cls, respectively).

Translation-Invariant Anchors

Multi-Scale Anchors as Regression References

$P$ is equivalent to the anchor box here, so the anchors act as the proposals/references.

$\varPhi(P)$ is the 1 x 1 x C' feature at the anchor position on the intermediate layer, and $w_*$ is a 1 x 1 x C' convolution kernel.
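The correspondence between a 1 x 1 convolution and the per-anchor dot product $w_*^T\varPhi(P)$ can be sketched in NumPy. The sizes `H, W, C, k`, the random data, and the anchor-major channel layout are assumptions for illustration, and the bias term is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C, k = 4, 5, 16, 9                  # hypothetical map size, channels C', anchors per position

features = rng.normal(size=(H, W, C))     # output of the intermediate layer
weights = rng.normal(size=(C, 4 * k))     # 1x1 reg kernels, one column per (anchor, coordinate)

# A 1x1 convolution is the same matrix product applied independently at every position.
reg_map = (features @ weights).reshape(H, W, k, 4)

# The same four offsets computed as w_*^T Phi(P) for one anchor at one position.
y, x, a = 2, 3, 5
phi = features[y, x]                      # the 1 x 1 x C' feature Phi(P)
d = np.array([weights[:, 4 * a + c] @ phi for c in range(4)])
print(reg_map[y, x, a], d)
```

Each of the `k` anchors at a position reads the same feature vector but has its own four kernels, so the regressors for different anchor shapes do not share weights.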

SSD

Bbox regression in SSD is a simplified version of Faster RCNN bbox regression.

The main differences are:

  1. SSD removes the intermediate layer and uses a 3x3 convolution in the cls/reg layers. MobileNet-SSD uses a 1x1 convolution in the cls/reg layers instead.
  2. SSD only predicts k foreground scores in the cls layer; the background is also predicted but not used.

$P$ is called a prior box or default box, but it is effectively equivalent to an anchor box.
