[paper reading] CenterNet (Objects as Points)

GitHub:Notes of Classic Detection Papers

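Update 2020.11.09: added the Use Yourself section, i.e. my own understanding of and thoughts on this paper. For details see GitHub:Notes-of-Classic-Detection-Papers

I originally wanted to post this on GitHub, but GitHub does not render formulas.
So it had to go on CSDN, where the formatting is also a bit messy.
I strongly recommend downloading the source files from GitHub to read and study; that gives the best reading experience.
And if you find it useful, a star would be appreciated.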

| topic | motivation | technique | key element | math | use yourself | relativity |
| --- | --- | --- | --- | --- | --- | --- |
| CenterNet<br>(Objects as Points) | Problem to Solve<br>Idea | CenterNet Architecture | Center Point & Anchor<br>Getting Ground-Truth<br>Model Output<br>Data Augmentation<br>Inference<br>TTA<br>Compared with SOTA<br>Additional Experiments | Loss Function<br>KeyPoint Loss $\text{L}_k$<br>Offset Loss $\text{L}_{off}$<br>Size Loss $\text{L}_{size}$ | …… | Anchor-Based<br>KeyPoint-Based |

Contents

  • [paper reading] CenterNet (Objects as Points)
    • Motivation
      • Problem to Solve
      • Idea
    • Technique
      • CenterNet Architecture
        • Components
        • Advantage
    • Key Element
      • Center Point & Anchor
        • Connection
        • Difference
      • Getting Ground-Truth
        • Keypoint Ground-Truth
          • Ground-Truth:Input Image ==> Output Feature Map
          • Gaussian Penalty Reduction
        • Size Ground-Truth
      • Model Output
      • Data Augmentation
      • Inference
      • TTA
      • Compared with SOTA
      • Additional Experiments
        • Center Point Collision
        • NMS
        • Training & Testing Resolution
        • Regression Loss
        • Bounding Box Size Weight
        • Training Schedule
    • Math
      • Symbol Definition
      • Loss Function
      • KeyPoint Loss $\text{L}_k$
      • Offset Loss $\text{L}_{off}$
      • Size Loss $\text{L}_{size}$
    • Use Yourself
    • Related work
      • Anchor-Based Method
        • Essence
        • Two-Stage Method
        • One-Stage Method
        • Post-Processing(NMS)
      • KeyPoint-Based Method
        • Essence
        • CornerNet
        • ExtremeNet
        • Drawback

Motivation

Problem to Solve

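Anchor-based methods have the following drawbacks:

  • wasteful & inefficient

    They have to enumerate potential object locations exhaustively (a near-saturated list of candidate locations for every object)

  • need post-processing (e.g. NMS)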

Idea

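  • In essence:

    Object detection is recast as standard keypoint estimation

  • In terms of approach:

    Each object is represented by the center point of its bounding box

  • In terms of the concrete pipeline:

    Keypoint estimation finds the center point, and the other properties are regressed from the center point (every other property has a well-defined mathematical relationship to the center point)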

Technique

CenterNet Architecture

Components

  • Backbone

    • Stacked Hourglass Network

      See [CornerNet](./[paper reading] RetinaNet.md) for details

    • Up-convolutional Residual Network

    • Deep Layer Aggregation(DLA)

  • Task-Specific Modality

    • one 3×3 Convolution
    • ReLU
    • one 1×1 Convolution

Advantage

  • simpler & faster & accurate

  • end-to-end differentiable

    All outputs come directly from the keypoint estimation network; no NMS (or any other post-processing) is needed

    Peak keypoint extraction is implemented with a $3 \times 3 \ \text{Max Pooling}$, which is sufficient to replace NMS

  • estimate additional object properties in one single forward pass

    Several object properties can be estimated in a single forward pass
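The peak extraction mentioned above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name `extract_peaks`, the `(H, W)` single-class layout and the 0.1 score filter in the example are assumptions made here. A location is kept when its value equals the output of a stride-1 3×3 max pooling at that location.

```python
import numpy as np

def extract_peaks(heatmap, k=100):
    """Return up to k local peaks of a (H, W) heatmap as (score, y, x) tuples.

    A location is a peak if its value equals the maximum of its 3x3
    neighborhood, which is what a stride-1 3x3 max pooling computes.
    """
    h, w = heatmap.shape
    # Pad with -inf so that border pixels only compete with real neighbors.
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # The maximum over the 9 shifted views is a 3x3 max pooling.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    pooled = np.max(windows, axis=0)
    ys, xs = np.nonzero(heatmap == pooled)
    scores = heatmap[ys, xs]
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(float(scores[i]), int(ys[i]), int(xs[i])) for i in order]

# Toy heatmap (hypothetical values): two peaks and one suppressed neighbor.
heat = np.zeros((5, 5))
heat[1, 1] = 0.9
heat[1, 2] = 0.6  # adjacent to a stronger response, so not a peak
heat[3, 4] = 0.7
# Flat zero regions also equal their local maximum, so filter by score.
peaks = [p for p in extract_peaks(heat) if p[0] > 0.1]
```

The weaker response at `(1, 2)` is dropped without any IoU computation, which is why this step can stand in for NMS.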

Key Element

Center Point & Anchor

Connection

A center point can be viewed as a single shape-agnostic anchor

Difference

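  • A center point is assigned based on location only (not on box overlap)

    i.e. no manual thresholds for foreground and background are needed

  • Each object corresponds to exactly one center point

    Local peaks are extracted directly from the keypoint heatmap, so there are no duplicate detections

  • CenterNet uses a larger output resolution

    The output stride is 4 (16 is the common choice elsewhere)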

Getting Ground-Truth

See [Symbol Definition](#Symbol Definition) for details

Keypoint Ground-Truth

Ground-Truth:Input Image ==> Output Feature Map
  • $p \in \mathcal{R}^2$ : ground-truth keypoint
  • $\widetilde{p} = \lfloor\frac pR \rfloor$ : low-resolution equivalent

The ground-truth keypoint $p$ in the image is mapped to the ground-truth keypoint $\widetilde p$ on the output feature map:
$$
\widetilde{p} = \lfloor\frac pR \rfloor
$$
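As a small worked example of this mapping (the helper name `to_low_res` and the sample coordinates are made up for illustration; the stride of 4 is the value used in the experiments):

```python
import math

def to_low_res(p, stride=4):
    """Map an image-space keypoint (x, y) to its output-map cell: floor(p / R)."""
    return (math.floor(p[0] / stride), math.floor(p[1] / stride))

# A center at (201.5, 98.0) in the input image lands in cell (50, 24).
cell = to_low_res((201.5, 98.0))
```

The fractional part discarded by the floor, here (0.375, 0.5), is exactly what the offset branch is later trained to recover.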

Gaussian Penalty Reduction

$$
Y_{x y c}=\exp \left(-\frac{\left(x-\tilde{p}_{x}\right)^{2}+\left(y-\tilde{p}_{y}\right)^{2}}{2 \sigma_{p}^{2}}\right)
$$

  • $\sigma_{p}$ : object size-adaptive standard deviation

If two Gaussians of the same class overlap, the element-wise maximum is taken

keypoint heatmap:
$$
\hat{Y}\in[0,1]^{\frac{W}{R} \times \frac HR \times C}
$$

  • $\hat Y _{x,y,c} =1$ ==> keypoint
  • $\hat Y _{x,y,c} =0$ ==> background

Note: the center here is the geometric center of the bounding box, i.e. it is equidistant from the left and right edges and from the top and bottom edges
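The ground-truth heatmap for one class can be rendered roughly as below. This is a sketch under assumptions: `draw_gaussian` is a name chosen here, and `sigma` is passed in by hand, whereas the paper derives $\sigma_p$ from the object size.

```python
import numpy as np

def draw_gaussian(heatmap, center, sigma):
    """Splat exp(-((x-cx)^2 + (y-cy)^2) / (2 sigma^2)) onto a (H, W) heatmap in place."""
    h, w = heatmap.shape
    cx, cy = center  # low-resolution center (x, y)
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    # Overlapping Gaussians of the same class keep the element-wise maximum.
    np.maximum(heatmap, g, out=heatmap)
    return heatmap

# Two nearby objects of the same class on a 128x128 output map.
heat = np.zeros((128, 128))
draw_gaussian(heat, center=(50, 24), sigma=2.0)
draw_gaussian(heat, center=(53, 24), sigma=2.0)
```

Both centers keep the value 1, and the cells between them take the larger of the two Gaussian values rather than their sum, so the target stays within [0, 1].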

Size Ground-Truth

A bounding box is given by 4 coordinates (for the $k$-th object, of class $c_k$):
$$
(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})
$$
The center is:
$$
p_k = \big( \frac{x_1^{(k)} + x_2^{(k)} }{2} , \frac{y_1^{(k)} + y_2^{(k)} }{2} \big)
$$
The size ground-truth is:
$$
s_k = \big(x_2^{(k)} - x_1^{(k)}, y_2^{(k)}-y_1^{(k)} \big)
$$

Note: the scale is not normalized; raw pixel coordinates are used directly
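The two formulas above amount to a few lines of arithmetic. The helper name `center_and_size` and the sample box are illustrative only:

```python
def center_and_size(box):
    """Return (p_k, s_k) for a box (x1, y1, x2, y2) given in raw pixel coordinates."""
    x1, y1, x2, y2 = box
    center = ((x1 + x2) / 2, (y1 + y2) / 2)
    size = (x2 - x1, y2 - y1)  # not normalized by image size
    return center, size

center, size = center_and_size((100, 40, 180, 200))
```

For this box the center is (140.0, 120.0) and the size target is (80, 160).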

Model Output

Input & Output Resolution

  • Input: 512×512
  • Output: 128×128

All outputs share a common fully-convolutional backbone network:

  • keypoint $\hat Y$ ==> $C$ channels
  • offset $\hat O$ ==> 2 channels
  • size $\hat S$ ==> 2 channels

i.e. each location has $C+4$ outputs

For each modality, the backbone features pass through:

  • one 3×3 Convolution
  • ReLU
  • one 1×1 Convolution

Data Augmentation

  • random flip

  • random scaling(0.6~1.3)

  • cropping

  • color jittering

Inference

CenterNet inference is a single network forward pass:

  1. The image is fed into the backbone (e.g. an FCN), which produces 3 outputs

    • keypoint $\hat Y$ ==> $C$ channels

      Heatmap peaks correspond to object centers (the top 100 are kept)

      Peak criterion: value $\ge$ its 8 neighbors

    • offset $\hat O$ ==> 2 channels

    • size $\hat S$ ==> 2 channels

  2. The bounding box is computed from keypoint $\hat Y$, offset $\hat O$ and size $\hat S$

    • $(\delta \hat x_i, \delta \hat y_i) = \hat O_{\hat x_i, \hat y_i}$ : offset prediction
    • $( \hat w_i, \hat h_i) = \hat S _{\hat x_i, \hat y_i}$ : size prediction
  3. The keypoint confidence is the heatmap value at the keypoint location
    $$
    \hat Y_{x_i y_i c}
    $$
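Step 2 can be sketched as follows. In the paper the box is $(\hat x_i + \delta \hat x_i - \hat w_i/2,\ \hat y_i + \delta \hat y_i - \hat h_i/2,\ \hat x_i + \delta \hat x_i + \hat w_i/2,\ \hat y_i + \delta \hat y_i + \hat h_i/2)$. The code below is an illustration under assumptions: `decode_boxes`, the `(2, H, W)` channel layout and the peak tuple format are choices made here, and sizes are taken to be in raw input pixels (as in the Size Ground-Truth note), so only the refined center is multiplied by the stride.

```python
import numpy as np

def decode_boxes(peaks, offset, size, stride=4):
    """Turn heatmap peaks into boxes in input-image pixels.

    peaks:  list of (score, cls, y, x) on the output map
    offset: (2, H, W) array, channels (dx, dy) in output-map cells
    size:   (2, H, W) array, channels (w, h), assumed to be in input pixels
    """
    detections = []
    for score, cls, y, x in peaks:
        dx, dy = offset[:, y, x]
        w, h = size[:, y, x]
        # Refine the integer peak with the offset, then go back to image scale.
        cx, cy = (x + dx) * stride, (y + dy) * stride
        box = np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
        detections.append((cls, score, box))
    return detections

# One peak of class 0 at output cell (x=50, y=24), with hypothetical outputs.
offset = np.zeros((2, 128, 128))
size = np.zeros((2, 128, 128))
offset[:, 24, 50] = (0.375, 0.5)
size[:, 24, 50] = (80.0, 160.0)
dets = decode_boxes([(0.9, 0, 24, 50)], offset, size)
```

The score attached to each box is simply the heatmap value at the peak, as in step 3.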

TTA

Three TTA settings are used:

  1. no augmentation

  2. flip augmentation

    flip: the network outputs are averaged before decoding

  3. flip & multi-scale (0.5, 0.75, 1, 1.25, 1.5)

    multi-scale: the results are merged with NMS
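Flip averaging can be illustrated on a heatmap-like output. This is only a sketch: `flip_average` and the `(C, H, W)` layout are assumptions, and the offset maps are not handled here.

```python
import numpy as np

def flip_average(out, out_flipped):
    """Average a (C, H, W) output with the output of the horizontally flipped image.

    The flipped output is mirrored back along the width axis first,
    so that both maps are aligned before averaging.
    """
    return 0.5 * (out + out_flipped[..., ::-1])

# Toy example: the flipped pass sees the same peak, mirrored and less confident.
out = np.zeros((1, 4, 6))
out[0, 2, 1] = 1.0
out_flipped = out[..., ::-1] * 0.5
avg = flip_average(out, out_flipped)
```

Decoding then runs once on `avg`, so the flip pass costs no extra post-processing.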

Compared with SOTA

Additional Experiments

Center Point Collision

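After downsampling, the center keypoints of several objects may fall on the same output location

With its stride-4 output, CenterNet keeps such center keypoint collisions rare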
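One way to check this on a dataset is to count how many objects share an output cell with another object. The sketch below is hypothetical: `count_center_collisions` and the box format are made up here, and collisions are counted per class on the assumption that each class has its own heatmap channel.

```python
from collections import Counter

def count_center_collisions(boxes, stride=4):
    """Count objects whose center cell is already taken by a same-class object.

    boxes: iterable of (cls, x1, y1, x2, y2) in input-image pixels.
    """
    cells = Counter()
    for cls, x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        cells[(cls, int(cx // stride), int(cy // stride))] += 1
    # Every object beyond the first in a cell is a collision.
    return sum(n - 1 for n in cells.values() if n > 1)

boxes = [
    (0, 8, 8, 20, 20),        # center (14, 14) -> cell (3, 3)
    (0, 10, 10, 20, 20),      # center (15, 15) -> cell (3, 3), collides
    (1, 10, 10, 20, 20),      # same cell, different class: no collision
    (0, 100, 100, 140, 140),  # center (120, 120) -> cell (30, 30)
]
collisions = count_center_collisions(boxes)
```

Raising `stride` to 16 in the same function shows how a coarser output map makes collisions more likely.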

NMS

Adding NMS to CenterNet brings only a very small improvement, which indicates that CenterNet does not need NMS

Training & Testing Resolution

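  1. Low resolution is the fastest but the least accurate
  2. High resolution improves accuracy but reduces speed
  3. Keeping the original resolution gives slightly higher accuracy than the fixed high resolution, but is slightly slower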

Regression Loss

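Smooth L1 loss performs worse than L1 loss

Bounding Box Size Weight

$\lambda_{size}$ works best at 0.1; AP degrades quickly as it increases, while the result is robust to smaller values

Training Schedule

Longer training gives better results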

Math

Symbol Definition

  • $I \in R^{W \times H \times 3}$ : image
  • $R$ : output stride, 4 in the experiments
  • $C$ : number of keypoint classes

Loss Function

$$
\text{L}_{det} = \text{L}_k + \lambda_{size} \text{L}_{size} + \lambda_{off} \text{L}_{off}
$$

  • $\lambda_{size} = 0.1$
  • $\lambda_{off} = 1$

KeyPoint Loss $\text{L}_k$

penalty-reduced pixel-wise logistic regression with focal loss

$$
\text{L}_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1-\hat{Y}_{xyc})^{\alpha} \log(\hat{Y}_{xyc}) & \text{if } Y_{xyc}=1 \\ (1-Y_{xyc})^{\beta} (\hat{Y}_{xyc})^{\alpha} \log(1-\hat{Y}_{xyc}) & \text{otherwise} \end{cases}
$$

  • $\hat{Y}_{xyc}$ : predicted keypoint confidence
  • $N$ : number of keypoints in the image
  • $\alpha =2,\beta=4$
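A NumPy sketch of this loss is given below, assuming the usual form of the penalty-reduced focal loss: positives (where $Y_{xyc}=1$) contribute $(1-\hat Y)^{\alpha}\log \hat Y$, all other locations contribute $(1-Y)^{\beta}\hat Y^{\alpha}\log(1-\hat Y)$, and the sum is normalized by the number of keypoints. The function name and the `eps` clipping are additions made here for numerical safety.

```python
import numpy as np

def keypoint_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    """Penalty-reduced focal loss for arrays of equal shape with values in [0, 1].

    gt == 1 marks keypoints; other gt values come from the Gaussian splat and
    reduce the penalty for predictions close to a keypoint.
    """
    pred = np.clip(pred, eps, 1 - eps)  # keep log() finite
    pos = gt == 1
    pos_loss = ((1 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_loss = ((1 - gt) ** beta * pred ** alpha * np.log(1 - pred))[~pos].sum()
    n = max(int(pos.sum()), 1)  # number of keypoints, at least 1
    return -(pos_loss + neg_loss) / n

gt = np.array([[1.0, 0.5], [0.0, 0.0]])
pred = np.array([[0.9, 0.5], [0.1, 0.1]])
loss = keypoint_focal_loss(pred, gt)
```

The cell with `gt = 0.5` is weighted by $(1-0.5)^4$, so a moderately confident prediction right next to a keypoint is penalized far less than the same prediction in the pure background.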

Offset Loss $\text{L}_{off}$

Purpose: recover the discretization error introduced by downsampling

  • $\hat O \in \mathcal R^{\frac{W}{R} \times \frac HR \times 2}$ : predicted local offset

Note:

  • It is computed only at keypoint locations (positives)
  • All classes share the same offset prediction
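In the paper this is an L1 loss, $\text{L}_{off} = \frac1N \sum_p \left| \hat O_{\tilde p} - \left(\frac pR - \tilde p\right) \right|$. A minimal sketch, with the function name and the `(2, H, W)` layout chosen here for illustration:

```python
import numpy as np

def offset_loss(pred_offset, keypoints, stride=4):
    """L1 offset loss, evaluated only at keypoint cells.

    pred_offset: (2, H, W) map with channels (dx, dy), shared by all classes
    keypoints:   image-space (x, y) centers
    """
    total = 0.0
    for x, y in keypoints:
        px, py = int(x // stride), int(y // stride)  # low-resolution cell
        # The target is the fraction lost by the floor operation.
        target = np.array([x / stride - px, y / stride - py])
        total += np.abs(pred_offset[:, py, px] - target).sum()
    return total / max(len(keypoints), 1)

pred_offset = np.zeros((2, 128, 128))
loss_before = offset_loss(pred_offset, [(201.5, 98.0)])
pred_offset[:, 24, 50] = (0.375, 0.5)
loss_after = offset_loss(pred_offset, [(201.5, 98.0)])
```

With an all-zero prediction the loss is 0.375 + 0.5 = 0.875; once the cell holds the true fractional offset it drops to 0.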

Size Loss $\text{L}_{size}$

  • $\hat{S}_{p_{k}}$ : predicted size at center $p_k$, read from $\hat{S} \in \mathcal R^{\frac{W}{R} \times \frac HR \times 2}$
  • $s_k = \big(x_2^{(k)} - x_1^{(k)}, y_2^{(k)}-y_1^{(k)} \big)$
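The size loss in the paper is again an L1 loss at the object centers, $\text{L}_{size} = \frac1N \sum_{k} \left| \hat S_{p_k} - s_k \right|$. The sketch below also combines the three terms with the weights from the Loss Function section; the function names and array layout are assumptions of this example.

```python
import numpy as np

def size_loss(pred_size, boxes, stride=4):
    """L1 size loss. pred_size: (2, H, W) with channels (w, h); boxes in raw pixels."""
    total = 0.0
    for x1, y1, x2, y2 in boxes:
        # The size is read at the low-resolution cell of the box center.
        px = int((x1 + x2) / 2 // stride)
        py = int((y1 + y2) / 2 // stride)
        target = np.array([x2 - x1, y2 - y1], dtype=float)
        total += np.abs(pred_size[:, py, px] - target).sum()
    return total / max(len(boxes), 1)

def detection_loss(l_k, l_size, l_off, lambda_size=0.1, lambda_off=1.0):
    """L_det = L_k + lambda_size * L_size + lambda_off * L_off."""
    return l_k + lambda_size * l_size + lambda_off * l_off

pred_size = np.zeros((2, 128, 128))
pred_size[:, 30, 35] = (78.0, 163.0)  # hypothetical prediction for the box below
l_size = size_loss(pred_size, [(100, 40, 180, 200)])
l_det = detection_loss(l_k=1.0, l_size=l_size, l_off=0.875)
```

Because sizes are in raw pixels, $\text{L}_{size}$ is numerically much larger than the other two terms, which is consistent with the small weight $\lambda_{size} = 0.1$.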

Use Yourself

……

Related work

Anchor-Based Method

Essence

Detection is reduced to classification

Two-Stage Method

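  1. Place anchors on the image (same as [One-Stage Method](#One-Stage Method))

    i.e. anchors are sampled densely on a grid over a low-resolution feature map and classified as foreground/background ==> proposals

    The labels are assigned as follows:

    • foreground

      IoU > 0.7 with any ground-truth box

    • background

      IoU < 0.3 with every ground-truth box

    • ignored

      otherwise, i.e. the highest IoU with the ground-truth boxes $\in [0.3, 0.7]$

  2. Resample features for each anchor

For example:

  • RCNN: crops are taken from the image
  • Fast-RCNN: crops are taken from the feature map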
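The labeling rule from step 1 can be written down directly. This is a plain-Python sketch with hypothetical helper names (`iou`, `label_anchor`) and boxes given as `(x1, y1, x2, y2)`:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, gt_boxes, fg_thresh=0.7, bg_thresh=0.3):
    """Label one anchor from its best IoU with the ground-truth boxes."""
    best = max((iou(anchor, g) for g in gt_boxes), default=0.0)
    if best > fg_thresh:
        return "foreground"
    if best < bg_thresh:
        return "background"
    return "ignored"

gt_boxes = [(0, 0, 10, 10)]
labels = [
    label_anchor((1, 0, 11, 10), gt_boxes),    # IoU = 90 / 110
    label_anchor((5, 0, 15, 10), gt_boxes),    # IoU = 50 / 150
    label_anchor((20, 20, 30, 30), gt_boxes),  # no overlap
]
```

These two thresholds are exactly the hand-set hyperparameters that a center-point formulation avoids.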

One-Stage Method

  1. Place anchors on the image
  2. Classify directly at each anchor position

Some improvements to one-stage methods:

  • anchor shape prior
  • different feature resolution (e.g. Feature Pyramid Network)
  • loss re-weighting (e.g. Focal Loss)

Post-Processing(NMS)

  • Purpose: suppress duplicate detections of the same instance, based on IoU

  • Drawback: hard to differentiate and train, so the vast majority of detectors are not end-to-end trainable
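For reference, a common greedy form of NMS looks like the sketch below. It is a generic NumPy illustration (function name and the 0.5 threshold are assumptions), not tied to any particular detector's implementation.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns indices of kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float))  # best score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # IoU between the current best box and all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        overlap = inter / (areas[i] + areas[rest] - inter)
        # Hard, discrete selection: overlapping boxes are simply dropped.
        order = rest[overlap <= iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

The sort and the boolean filtering are discrete decisions with no useful gradient, which is the reason this step is hard to train through.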

KeyPoint-Based Method

Essence

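Detection is converted into keypoint estimation

The backbone is always a keypoint estimation network

CornerNet

Detects 2 corners as keypoints to represent 1 bounding box

ExtremeNet

Detects the top-most, left-most, bottom-most, right-most and center points as keypoints

Drawback

Several keypoints are detected per object, which requires an additional grouping stage (and slows the algorithm down)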
