[paper reading] CenterNet (Objects as Points)
GitHub: Notes of Classic Detection Papers
Updated 2020.11.09: added the Use Yourself section (my own understanding and thoughts on this paper); for details see GitHub: Notes-of-Classic-Detection-Papers.
I originally wanted to host these notes on GitHub, but GitHub does not render formulas.
So I had to post them on CSDN, where the formatting is somewhat messy.
I strongly recommend downloading the source files from GitHub to read and study — that gives the best reading experience!
Of course, if you find them useful, a star would be appreciated!
| topic | motivation | technique | key element | math | use yourself | relativity |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet (Objects as Points) | Problem to Solve; Idea | CenterNet Architecture | Center Point & Anchor; Getting Ground-Truth; Model Output; Data Augmentation; Inference; TTA; Compared with SOTA; Additional Experiments | Loss Function: KeyPoint Loss $\text{L}_k$, Offset Loss $\text{L}_{off}$, Size Loss $\text{L}_{size}$ | …… | Anchor-Based; KeyPoint-Based |
Table of Contents
- [paper reading] CenterNet (Objects as Points)
- Motivation
- Problem to Solve
- Idea
- Technique
- CenterNet Architecture
- Components
- Advantage
- Key Element
- Center Point & Anchor
- Connection
- Difference
- Getting Ground-Truth
- Keypoint Ground-Truth
- Ground-Truth: Input Image ==> Output Feature Map
- Gaussian Penalty Reduction
- Size Ground-Truth
- Model Output
- Data Augmentation
- Inference
- TTA
- Compared with SOTA
- Additional Experiments
- Center Point Collision
- NMS
- Training & Testing Resolution
- Regression Loss
- Bounding Box Size Weight
- Training Schedule
- Math
- Symbol Definition
- Loss Function
- KeyPoint Loss $\text{L}_k$
- Offset Loss $\text{L}_{off}$
- Size Loss $\text{L}_{size}$
- Use Yourself
- Related work
- Anchor-Based Method
- Essence
- Two-Stage Method
- One-Stage Method
- Post-Processing (NMS)
- KeyPoint-Based Method
- Essence
- CornerNet
- ExtremeNet
- Drawback
Motivation
Problem to Solve
anchor-based methods have the following drawbacks:
wasteful & inefficient:
they must exhaustively enumerate the potential locations of each object (saturation-style detection)
need post-processing (e.g. NMS)
Idea
In essence:
object detection is cast as standard keypoint estimation
In approach:
an object is represented by the center point of its bounding box
In the concrete pipeline:
keypoint estimation finds the center point, and the other properties are regressed from it (every other property has a fixed mathematical relationship to the center point)
Technique
CenterNet Architecture
Components
Backbone
Stacked Hourglass Network
see [CornerNet](./[paper reading] CornerNet.md)
Upconvolutional Residual Network
Deep Layer Aggregation(DLA)
Task-Specific Modality
- one 3×3 Convolution
- ReLU
- one 1×1 Convolution
Advantage
simpler & faster & accurate
end-to-end differentiable
all outputs come directly from the keypoint estimation network, with no need for NMS (or any other post-processing)
peak keypoint extraction is implemented by a $3 \times 3\ \text{Max Pooling}$, which is sufficient to replace NMS
estimate additional object properties in one single forward pass
i.e., multiple object properties can be estimated in a single forward pass
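The max-pooling replacement for NMS can be sketched in a few lines. This is a minimal numpy illustration of the idea, not the official implementation (which uses a GPU 3×3 max-pooling layer); all names here are my own:

```python
import numpy as np

def pseudo_nms(heatmap, kernel=3):
    """Keep only local peaks: a cell survives iff it equals the maximum of
    its kernel x kernel neighborhood. This is CenterNet's replacement for
    IoU-based NMS. Note: plateaus of exactly equal values all survive."""
    pad = kernel // 2
    h, w = heatmap.shape
    padded = np.pad(heatmap, pad, mode="constant", constant_values=-np.inf)
    pooled = np.empty_like(heatmap)
    for y in range(h):            # stride-1 max pooling, written out explicitly
        for x in range(w):
            pooled[y, x] = padded[y:y + kernel, x:x + kernel].max()
    return np.where(pooled == heatmap, heatmap, 0.0)

heat = np.array([[0.1, 0.2, 0.1],
                 [0.2, 0.9, 0.2],
                 [0.1, 0.2, 0.1]])
peaks = pseudo_nms(heat)  # only the 0.9 at the center survives
```

Because each surviving cell is already the maximum of its neighborhood, no pairwise IoU comparison between boxes is needed afterwards.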
Key Element
Center Point & Anchor
Connection
a center point can be viewed as a single shape-agnostic anchor
Difference
a center point is assigned by location only (box overlap plays no role)
i.e., no manual foreground/background thresholds are needed
each object corresponds to exactly one center point
local peaks are extracted directly from the keypoint heatmap, so there are no duplicate detections
CenterNet uses a larger output resolution
a downsampling stride of 4 (16 is more common)
Getting Ground-Truth
see [Symbol Definition](#Symbol Definition)
Keypoint Ground-Truth
Ground-Truth: Input Image ==> Output Feature Map
- $p \in \mathcal{R}^2$: ground-truth keypoint
- $\tilde{p} = \lfloor \frac{p}{R} \rfloor$: low-resolution equivalent
The ground-truth keypoint $p$ on the image is mapped to the ground-truth keypoint $\tilde p$ on the output feature map:
$$\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$$
Gaussian Penalty Reduction
$$Y_{xyc} = \exp\left(-\frac{(x-\tilde{p}_x)^2 + (y-\tilde{p}_y)^2}{2\sigma_p^2}\right)$$
- $\sigma_p$: object size-adaptive standard deviation
If two Gaussians of the same class overlap, the element-wise maximum is taken.
keypoint heatmap:
$$\hat{Y} \in [0,1]^{\frac{W}{R} \times \frac{H}{R} \times C}$$
- $\hat{Y}_{x,y,c} = 1$ ==> keypoint
- $\hat{Y}_{x,y,c} = 0$ ==> background
Note: the center here is the geometric center of the bounding box, i.e. the center is equidistant from the left/right edges and from the top/bottom edges.
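The Gaussian splatting above can be sketched as follows. This is a minimal numpy version that evaluates the Gaussian over the whole map and merges overlapping peaks by element-wise maximum; the real implementation only writes a bounded window around each center and derives $\sigma_p$ from the box size:

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat an unnormalized Gaussian (peak value 1) at `center` = (x, y),
    merging with existing peaks of the same class by element-wise maximum."""
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - center[0]) ** 2 + (ys - center[1]) ** 2)
               / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)   # overlap rule: element-wise max
    return heatmap

heat = np.zeros((8, 8))
draw_gaussian(heat, (2, 2), sigma=1.0)
draw_gaussian(heat, (5, 5), sigma=1.0)   # a second, overlapping object
```

Taking the maximum rather than the sum keeps every ground-truth center at exactly 1, which is what the penalty-reduced focal loss expects.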
Size Ground-Truth
The bounding box is represented by 4 coordinates (for the $k$-th object, of class $c_k$):
$$(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$$
Its center is:
$$p_k = \Big( \frac{x_1^{(k)} + x_2^{(k)}}{2},\ \frac{y_1^{(k)} + y_2^{(k)}}{2} \Big)$$
The size ground-truth is:
$$s_k = \big( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \big)$$
Note: the scale is not normalized; raw pixel coordinates are used directly.
Model Output
Input & Output Resolution:
- input: 512×512
- output: 128×128
All outputs share one common fully convolutional network:
- keypoint $\hat Y$ ==> $C$ channels
- offset $\hat O$ ==> 2 channels
- size $\hat S$ ==> 2 channels
i.e., each location has $C+4$ outputs.
For each modality, the shared features are passed through:
- one 3×3 Convolution
- ReLU
- one 1×1 Convolution
Data Augmentation
random flip
random scaling (0.6 to 1.3)
cropping
color jittering
Inference
CenterNet inference is a single network forward pass.
The image is fed into the backbone (e.g. an FCN), producing 3 outputs:
keypoint $\hat Y$ ==> $C$ channels
the heatmap peaks correspond to object centers (the top-100 peaks are kept)
peak criterion: a value $\ge$ all of its 8 neighbors
offset $\hat O$ ==> 2 channels
size $\hat S$ ==> 2 channels
The bounding box is computed from keypoint $\hat Y$, offset $\hat O$, and size $\hat S$:
- $(\delta \hat x_i, \delta \hat y_i) = \hat O_{\hat x_i, \hat y_i}$: offset prediction
- $(\hat w_i, \hat h_i) = \hat S_{\hat x_i, \hat y_i}$: size prediction
The keypoint confidence is the heatmap value at the keypoint location:
$$\hat Y_{\hat x_i, \hat y_i, c}$$
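The decoding step above is simple arithmetic per peak. A sketch, with coordinates kept in output-feature-map units (multiply by the output stride $R$ to return to input resolution); the coordinate convention is my assumption:

```python
def decode_box(peak, offset, size):
    """Decode one bounding box from a heatmap peak, following
    (x+dx - w/2, y+dy - h/2, x+dx + w/2, y+dy + h/2).
    peak:   integer peak location (x, y) on the output map
    offset: predicted sub-pixel offset (dx, dy)
    size:   predicted box size (w, h)"""
    x, y = peak
    dx, dy = offset
    w, h = size
    cx, cy = x + dx, y + dy               # refined center
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

box = decode_box(peak=(10, 20), offset=(0.25, 0.5), size=(6.0, 8.0))
# box == (7.25, 16.5, 13.25, 24.5)
```

Because each peak yields exactly one box with its own confidence, no grouping or matching stage is needed.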
TTA
There are 3 TTA settings:
no augmentation
flip augmentation
flip: the outputs are averaged before decoding
flip & multi-scale (0.5, 0.75, 1, 1.25, 1.5)
multi-scale: the results are merged with NMS
Compared with SOTA
Additional Experiments
Center Point Collision
After downsampling, the center keypoints of different objects may coincide.
CenterNet reduces such center-keypoint collisions.
NMS
Adding NMS to CenterNet yields only a marginal improvement, which shows that CenterNet does not need NMS.
Training & Testing Resolution
- low resolution is fastest but least accurate
- high resolution is more accurate but slower
- keeping the original resolution is slightly more accurate than the fixed high-resolution setting, but slightly slower
Regression Loss
smooth L1 loss performs slightly worse than L1 loss
Bounding Box Size Weight
$\lambda_{size} = 0.1$ works best; AP degrades quickly when it is increased, but is robust to decreasing it
Training Schedule
longer training gives better results
Math
Symbol Definition
- $I \in R^{W \times H \times 3}$: input image
- $R$: output stride (4 in the experiments)
- $C$: number of keypoint classes
Loss Function
$$\text{L}_{det} = \text{L}_k + \lambda_{size} \text{L}_{size} + \lambda_{off} \text{L}_{off}$$
- $\lambda_{size} = 0.1$
- $\lambda_{off} = 1$
KeyPoint Loss $\text{L}_k$
penalty-reduced pixel-wise logistic regression with focal loss:
$$\text{L}_{k}=\frac{-1}{N} \sum_{x y c}\begin{cases}\left(1-\hat{Y}_{x y c}\right)^{\alpha} \log \left(\hat{Y}_{x y c}\right) & \text {if } Y_{x y c}=1 \\ \left(1-Y_{x y c}\right)^{\beta}\left(\hat{Y}_{x y c}\right)^{\alpha} \log \left(1-\hat{Y}_{x y c}\right) & \text {otherwise}\end{cases}$$
- $N$: number of keypoints in the image
- $\hat{Y}_{xyc}$: predicted keypoint confidence
- $\alpha = 2, \beta = 4$
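A minimal numpy sketch of this penalty-reduced focal loss (assuming `pred` lies strictly inside $(0,1)$; the official code works on torch tensors and clamps the prediction):

```python
import numpy as np

def keypoint_focal_loss(pred, gt, alpha=2, beta=4):
    """Penalty-reduced pixel-wise focal loss over a heatmap.
    pred, gt: arrays of shape (H, W, C); gt is the Gaussian-splatted
    ground-truth heatmap, equal to 1 exactly at keypoint locations."""
    pos = gt == 1.0
    neg = ~pos
    pos_loss = ((1 - pred[pos]) ** alpha * np.log(pred[pos])).sum()
    # (1 - Y)^beta reduces the penalty for negatives near a true center
    neg_loss = ((1 - gt[neg]) ** beta * pred[neg] ** alpha
                * np.log(1 - pred[neg])).sum()
    n = max(pos.sum(), 1)                 # N = number of keypoints
    return -(pos_loss + neg_loss) / n

gt = np.zeros((2, 2, 1)); gt[0, 0, 0] = 1.0
loss = keypoint_focal_loss(np.full((2, 2, 1), 0.5), gt)
```

The $(1-Y)^\beta$ factor is what makes the loss "penalty-reduced": background pixels inside a ground-truth Gaussian are punished less than pixels far from any center.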
Offset Loss $\text{L}_{off}$
Purpose: recover the discretization error introduced by downsampling.
- $\hat O \in \mathcal R^{\frac{W}{R} \times \frac{H}{R} \times 2}$: predicted local offset
$$\text{L}_{off}=\frac{1}{N} \sum_{p}\left|\hat{O}_{\tilde{p}}-\left(\frac{p}{R}-\tilde{p}\right)\right|$$
Note:
- computed only at keypoint locations (positives)
- all classes share the same offset prediction
Size Loss $\text{L}_{size}$
$$\text{L}_{size}=\frac{1}{N} \sum_{k=1}^{N}\left|\hat{S}_{p_{k}}-s_{k}\right|$$
- $\hat S \in \mathcal R^{\frac{W}{R} \times \frac{H}{R} \times 2}$: predicted size; $\hat{S}_{p_k}$ is its value at the $k$-th center
- $s_k = \big( x_2^{(k)} - x_1^{(k)},\ y_2^{(k)} - y_1^{(k)} \big)$
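Both L1 regression losses can be sketched together; they are evaluated only at the $N$ keypoint locations. The function names and array layout below are my own assumptions:

```python
import numpy as np

def offset_and_size_loss(pred_off, pred_size, centers_int, centers_sub, sizes):
    """L1 regression losses, evaluated at the N keypoint locations only.
    pred_off, pred_size: (H, W, 2) output maps
    centers_int: (N, 2) integer centers p~ on the output map, as (x, y)
    centers_sub: (N, 2) sub-pixel centers p/R
    sizes:       (N, 2) ground-truth sizes s_k"""
    xs, ys = centers_int[:, 0], centers_int[:, 1]
    off_gt = centers_sub - centers_int        # discretization error p/R - p~
    n = len(centers_int)
    l_off = np.abs(pred_off[ys, xs] - off_gt).sum() / n
    l_size = np.abs(pred_size[ys, xs] - sizes).sum() / n
    return l_off, l_size

pred_off = np.zeros((4, 4, 2))
pred_size = np.zeros((4, 4, 2))
l_off, l_size = offset_and_size_loss(
    pred_off, pred_size,
    centers_int=np.array([[1, 2]]),
    centers_sub=np.array([[1.5, 2.25]]),
    sizes=np.array([[2.0, 4.0]]),
)
```

Since the size targets are raw pixel values, $\lambda_{size}=0.1$ keeps this term from dominating the total loss.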
Use Yourself
……
Related work
Anchor-Based Method
Essence
detection is downgraded to classification
Two-Stage Method
Anchors are placed on the image (same as in the [One-Stage Method](#One-Stage Method)):
i.e., anchors are densely sampled on a low-resolution grid and classified as foreground/background ==> proposals
The concrete labels:
foreground:
IoU > 0.7 with any ground-truth box
background:
IoU < 0.3 with all ground-truth boxes
ignored:
IoU $\in [0.3, 0.7]$ with the ground-truth boxes
Features are then resampled for each anchor.
For example:
- RCNN: crops taken on the image
- Fast-RCNN: crops taken on the feature map
One-Stage Method
- anchors are placed on the image
- the anchor locations are classified directly
Some improvements of one-stage methods:
- anchor shape priors
- different feature resolutions (e.g. Feature Pyramid Network)
- loss re-weighting (e.g. Focal Loss)
Post-Processing (NMS)
Purpose: suppress detections of the same instance, based on IoU.
Drawback: hard to differentiate and train, so most detectors cannot be made end-to-end trainable.
KeyPoint-Based Method
Essence
detection is cast as keypoint estimation
their backbones are all keypoint estimation networks
CornerNet
detects 2 corners as keypoints to represent 1 bounding box
ExtremeNet
detects the top-most, left-most, bottom-most, right-most, and center points as keypoints
Drawback
detecting multiple keypoints for 1 object requires an extra grouping stage (which slows the algorithm down)