[paper reading] FCOS

GitHub:Notes of Classic Detection Papers

Update 2020.11.09: added "Use Yourself", i.e. my own understanding of and thoughts on this paper. For details see GitHub: Notes-of-Classic-Detection-Papers

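I originally wanted to put these notes on GitHub, but GitHub does not render the formulas.
So I had to post them on CSDN, although the formatting is a bit messy here.
I strongly recommend downloading the source files from GitHub: Notes-of-Classic-Detection-Papers and reading them locally; that gives the best reading experience.
And if you find them useful, a star would be appreciated.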

  • topic: FCOS
  • motivation: Idea, Contribution
  • technique: FCOS Architecture, Center-ness, Multi-Level FPN Prediction
  • key element: Prediction Head, Training Sample & Label, Model Output, Feature Pyramid, Inference, Ablation Study, FCN & Detection, FCOS vs. YOLO v1
  • math: Symbol Definition, Loss Function, Center-ness, Remap of Feature & Image
  • use yourself: ……
  • relativity: Related Work

Contents

  • [paper reading] FCOS
    • Motivation
      • Idea
      • Contribution
    • Techniques
      • FCOS Architecture
        • Advantage
      • Center-ness
        • Idea
        • Implement
      • Multi-Level FPN Prediction
    • Key Elements
      • Prediction Head
        • Classification Branch
        • Regression Branch
        • Shared Head
      • Training Sample & Label
        • Training Sample
        • Label Pos/Neg
      • Model Output
        • 4D Vector $\pmb t^*$
        • C-Dimension Vector $\pmb p$
      • Feature Pyramid
      • Inference
      • Ablation Study
        • Multi-Level FPN Prediction
        • Ambiguity Samples
        • With or Without Center-ness
      • FCN & Detection
      • FCOS vs. YOLO v1
    • Math
      • Symbol Definition
      • Loss Function
      • Center-ness
      • Remap of Feature & Image
    • Use Yourself
    • Related Work
      • Drawbacks of Anchor
      • DenseBox-Based
      • Anchor-Based Detector
      • YOLO v1
        • Idea
        • Drawbacks of Points near Center
      • CornerNet
        • Steps
        • Drawbacks of Corner

Motivation

Idea

Perform object detection by per-pixel prediction (implemented with a fully convolutional network).

Contribution

  1. Reformulates detection as per-pixel prediction

  2. Uses multi-level prediction

    • improves recall
    • resolves the ambiguity caused by overlapping bounding boxes
  3. Adds a center-ness branch

    It suppresses low-quality predicted bounding boxes

Techniques

FCOS Architecture

The default backbone is ResNet-50.

Advantage

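See [Drawbacks of Anchor](#Drawbacks of Anchor) for details.

  • Unifies detection with other FCN-solvable tasks (e.g. semantic segmentation)

    Ideas from those tasks can then be transferred to detection (idea re-use)

  • Anchor-free & proposal-free

  • Eliminates the complicated computation related to anchors (e.g. IoU)

    This gives faster training & testing and a smaller training memory footprint

  • Achieves SOTA among one-stage detectors, and can also be used to replace the RPN in two-stage detectors

  • Can be quickly extended to other vision tasks (e.g. instance segmentation, key-point detection)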

Center-ness

Center-ness is predicted for every location.

It improves performance considerably.

Idea

Locations far from the center of an object produce many low-quality predicted bounding boxes.

FCOS introduces center-ness to suppress these low-quality bounding boxes far from the center (i.e. to down-weight them).

Implement

A center-ness branch is added to predict the center-ness of each location.

At test time:

  1. Compute the score as
    $$\text{Final Score} = \text{Classification Score} \times \text{Center-ness}$$

  2. Use NMS to filter out the suppressed (down-weighted) bounding boxes
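
To make the test-time scoring concrete, here is a minimal sketch in plain Python. The two predictions and their numbers are made up for illustration; only the multiplication itself comes from the notes above.

```python
def final_score(cls_score: float, centerness: float) -> float:
    """Down-weight a classification score by the predicted center-ness."""
    return cls_score * centerness


# Two hypothetical predictions for the same object (values are illustrative only).
near_center = final_score(cls_score=0.80, centerness=0.90)
far_from_center = final_score(cls_score=0.85, centerness=0.20)

# Ranking by final score puts the well-centered prediction first, so NMS
# (which keeps the highest-scoring box) is more likely to drop the other one.
ranked = sorted(
    [("near_center", near_center), ("far_from_center", far_from_center)],
    key=lambda item: item[1],
    reverse=True,
)
```

Although the far-from-center prediction has the higher raw classification score, it ends up ranked lower once center-ness is applied.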

Multi-Level FPN Prediction

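Multi-Level FPN Prediction addresses 2 problems:

  • Best Possible Recall

    It raises the Best Possible Recall of FCOS to the SOTA level

  • Ambiguity of Ground-Truth Box Overlap

    It resolves the ambiguity caused by overlapping ground-truth boxes, bringing FCOS to the level of anchor-based detectors

    Reason: in the vast majority of cases, overlapping objects differ considerably in scale

The idea: dispatch the locations to be regressed to different feature levels according to their regress distance

Specifically:

  1. Compute the regress targets $l^*, t^*, r^*, b^*$

  2. Select the positive samples of each feature level according to its maximum regress distance
    $$m_{i-1} \le \max(l^*, t^*, r^*, b^*) \le m_i$$
    where $m_i$ is the maximum distance that feature level $i$ needs to regress
    $$\{m_2, m_3, m_4, m_5, m_6, m_7\} = \{0, 64, 128, 256, 512, \infty\}$$

    Compared with earlier pyramid-style detectors (e.g. FPN, SSD), FCOS "implicitly" assigns objects of different scales to different feature levels (small objects to shallow levels, large objects to deep levels)

    I think this can be seen as a more refined form of hand-crafted design

  3. If a location still falls into 2 ground-truth boxes (i.e. ambiguity), the smaller box is chosen for regression (a bias towards small objects)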
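
The assignment rule above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, $m_7 = \infty$ follows the FCOS paper, and a distance that sits exactly on a boundary is given to the shallower level here purely as a simplification.

```python
import math

# Range bounds m_2 .. m_7; level P_i uses the pair (m_{i-1}, m_i).
M = [0, 64, 128, 256, 512, math.inf]
LEVELS = ["P3", "P4", "P5", "P6", "P7"]


def assign_level(l, t, r, b):
    """Return the feature level whose range contains max(l*, t*, r*, b*)."""
    d = max(l, t, r, b)
    for name, lo, hi in zip(LEVELS, M[:-1], M[1:]):
        if lo <= d <= hi:
            return name  # first match wins on a shared boundary (assumption)
    return None


def pick_box(boxes):
    """Among boxes (x0, y0, x1, y1) covering one location, pick the smallest area."""
    return min(boxes, key=lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1]))
```

For example, `assign_level(10, 20, 100, 40)` sends the location to `P4` because its largest regress distance is 100, and `pick_box` resolves a remaining overlap in favour of the smaller object.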

Key Elements

Prediction Head

Classification Branch

Regression Branch

Since the regression targets are always positive, $\exp(s_i x)$ is applied on top of the regression branch (see [Shared Head](#Shared Head))

Shared Head

The head is shared across the different feature levels

Advantage

  • parameter efficient
  • improve performance

Drawback

Because of [Multi-Level FPN Prediction](#Multi-Level FPN Prediction), different feature levels have different output ranges (e.g. [0, 64] for $P_3$, [64, 128] for $P_4$)

To make identical heads usable on different feature levels:
$$\exp(x) \rightarrow \exp(s_i x)$$

  • $s_i$: a trainable scalar that automatically adjusts the base of the exponential for feature level $P_i$
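
A small numeric sketch of why the scalar helps: since $\exp(s_i x) = (e^{s_i})^x$, changing $s_i$ changes the base of the exponential, so the same raw head output can land in different distance ranges on different levels. The scalar values below are hypothetical; in FCOS they are learned.

```python
import math


def regress_distance(x: float, s_i: float) -> float:
    """Map a raw head output x to a positive distance via exp(s_i * x)."""
    return math.exp(s_i * x)


raw = 1.0  # the same raw output from the shared head
p3 = regress_distance(raw, s_i=4.0)  # hypothetical scalar for P3
p4 = regress_distance(raw, s_i=4.5)  # hypothetical scalar for P4
```

The output is always positive regardless of the sign of `x`, which matches the requirement that regression targets are positive distances.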

Training Sample & Label

Training Sample

Each location is used directly as a training sample (similar to FCN for semantic segmentation)

Label Pos/Neg

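A location $(x,y)$ is a positive sample if:

  1. location $(x,y)$ falls into a ground-truth box
  2. the class label of location $(x,y)$ == the class of that ground-truth box

FCOS trains with as many foreground samples as possible (i.e. all locations inside a ground-truth box)

This differs from anchor-based detectors, which only take anchors with a high IoU with a ground-truth box as positive samples

It also differs from [CenterNet (Object as Points)](./[paper reading] CenterNet (Object as Points).md), which only takes the geometric center as the positive sample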

Model Output

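Every location on the feature map of each level has the following outputs:

4D Vector $\pmb t^*$

$$\pmb t^* = (l^*, t^*, r^*, b^*)$$

It describes the relative offsets from the location to the 4 sides of the bounding box

Specifically, for a location $(x, y)$ associated with ground-truth box $B_i$:
$$l^* = x - x_0^{(i)}, \quad t^* = y - y_0^{(i)}, \quad r^* = x_1^{(i)} - x, \quad b^* = y_1^{(i)} - y$$

Note:

FCOS computes targets for every location inside the ground-truth box (not only the geometric center), so 4 quantities must be predicted to obtain the boundary

A method like [CenterNet (Object as Points)](./[paper reading] CenterNet (Object as Points).md), which predicts only at the geometric center, needs just 2 quantities

Note: object overlap can be handled by [Multi-Level FPN Prediction](#Multi-Level FPN Prediction). If overlap still occurs, the smaller object takes priority (the bounding box with the minimal area is chosen)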
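
As a worked example of the 4D target, the sketch below computes the distances from a location to the four sides of a box given as `(x0, y0, x1, y1)`, and uses them to decide whether the location lies inside the box. The helper names are hypothetical, and treating points exactly on the border as outside is an assumption of this sketch.

```python
def regression_target(x, y, box):
    """Distances (l*, t*, r*, b*) from location (x, y) to the box sides."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)


def is_inside(x, y, box):
    """A location is a candidate positive only if all four distances are positive."""
    return min(regression_target(x, y, box)) > 0


box = (10, 20, 110, 100)              # hypothetical ground-truth box
target = regression_target(50, 40, box)
```

Here `target` is `(40, 20, 60, 60)`: the location is 40 px from the left side, 20 px from the top, and 60 px from both the right and bottom sides.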

C-Dimension Vector $\pmb p$

The experiments use C binary classifiers rather than a single C-class classifier

Feature Pyramid

5 levels of feature maps are defined: $\{P_3, P_4, P_5, P_6, P_7\}$ (with strides $\{8, 16, 32, 64, 128\}$)

  • $\{P_3, P_4, P_5\}$: backbone feature maps $\{C_3, C_4, C_5\}$ + 1×1 convolution
  • $\{P_6, P_7\}$: obtained by applying a stride-2 convolutional layer to $P_5$ and $P_6$, respectively

Inference

  1. Feed the image into the network and, at each location of feature map $F_i$, obtain:

    • classification score $\pmb p_{x,y}$
    • regression prediction $\pmb t_{x,y}$
  2. Select the locations with $\pmb p_{x,y} > 0.05$ as positive samples

  3. Decode to obtain the coordinates of the bounding boxes
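
The inference steps can be sketched roughly as below. This is a simplified illustration, not the reference implementation: predictions are assumed to be already mapped to image coordinates, the tuple layout and function names are hypothetical, and the 0.05 threshold is the value used in the FCOS paper. NMS would follow as a separate step.

```python
def decode_box(x, y, ltrb):
    """Invert the regression target: (l, t, r, b) around (x, y) -> (x0, y0, x1, y1)."""
    l, t, r, b = ltrb
    return (x - l, y - t, x + r, y + b)


def infer(predictions, score_thresh):
    """predictions: list of (x, y, score, (l, t, r, b)); keep and decode confident ones."""
    return [
        (score, decode_box(x, y, ltrb))
        for x, y, score, ltrb in predictions
        if score > score_thresh
    ]


preds = [
    (50, 40, 0.90, (40, 20, 60, 60)),  # confident location
    (52, 40, 0.01, (5, 5, 5, 5)),      # filtered out by the threshold
]
kept = infer(preds, score_thresh=0.05)
```

Only the first prediction survives, and it decodes to the box `(10, 20, 110, 100)`.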

Ablation Study

Multi-Level FPN Prediction

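Conclusion:

  • Best Possible Recall is not a problem for FCOS
  • Multi-Level FPN Prediction improves Best Possible Recall

Ambiguity Samples

Conclusion:

  • Multi-Level FPN Prediction resolves the problem of ambiguous samples

    That is, most overlap ambiguities are assigned to different feature levels, and only very few ambiguous locations remain

With or Without Center-ness

  • Center-ness suppresses low-quality bounding boxes far from the center, and thereby improves AP substantially

  • Center-ness must have its own separate branch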

FCN & Detection

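FCN is mainly used for dense prediction

In fact, the fundamental vision tasks can all be unified into one single framework

The use of anchors actually makes the detection task deviate from the neat fully convolutional per-pixel prediction framework

FCOS vs. YOLO v1

Unlike YOLO v1, which only uses points near the center for prediction, FCOS uses all points inside the ground-truth box for prediction

The resulting low-quality bounding boxes are suppressed by center-ness

This allows FCOS to reach a recall comparable to anchor-based detectors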

Math

Symbol Definition

  • $F_i \in \mathbb{R}^{H \times W \times C}$: the feature map at layer $i$ of the backbone

  • $s$: the total stride up to that layer

  • $\{B_i\}$: the ground-truth boxes
    $$B_i = (x_0^{(i)}, y_0^{(i)}, x_1^{(i)}, y_1^{(i)}, c^{(i)}) \in \mathbb{R}^4 \times \{1, 2, \ldots, C\}$$

    • $(x_0^{(i)}, y_0^{(i)})$: top-left corner coordinate
    • $(x_1^{(i)}, y_1^{(i)})$: bottom-right corner coordinate
    • $c^{(i)}$: class of the object in the bounding box
    • $C$: number of classes

Loss Function

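$$L(\{\pmb p_{x,y}\}, \{\pmb t_{x,y}\}) = \frac{1}{N_{\text{pos}}} \sum_{x,y} L_{\text{cls}}(\pmb p_{x,y}, c^*_{x,y}) + \frac{\lambda}{N_{\text{pos}}} \sum_{x,y} \mathbb{1}_{\{c^*_{x,y} > 0\}} L_{\text{reg}}(\pmb t_{x,y}, \pmb t^*_{x,y})$$

  • $L_{\text{cls}}$: focal loss; $L_{\text{reg}}$: IoU loss (as in UnitBox); $N_{\text{pos}}$: number of positive samples

  • $\lambda = 1$

The formula above leaves out the center-ness loss, which is a binary cross entropy loss added on top of it

The loss is computed over all locations on the feature maps. Specifically:

  • Classification Loss is computed over all locations (positive & negative)

  • Regression Loss is computed over positive locations only

    $\mathbb{1}_{\{c^*_{x,y} > 0\}} = 1$ if $c^*_{x,y} > 0$, and 0 otherwise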
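
For the regression term, the FCOS paper uses the IoU loss from UnitBox. Below is a minimal sketch of that loss for a single positive location, written directly in the $(l, t, r, b)$ representation; the function names are hypothetical, and the sketch relies on both boxes containing the location (all distances positive), so the intersection is never empty.

```python
import math


def iou_ltrb(pred, target):
    """IoU of two boxes given as (l, t, r, b) distances from the same location."""
    l, t, r, b = pred
    lt, tt, rt, bt = target
    area_pred = (l + r) * (t + b)
    area_target = (lt + rt) * (tt + bt)
    # Both boxes contain the location, so they overlap around it.
    inter = (min(l, lt) + min(r, rt)) * (min(t, tt) + min(b, bt))
    union = area_pred + area_target - inter
    return inter / union


def iou_loss(pred, target):
    """UnitBox-style IoU loss: -ln(IoU). Zero when the boxes coincide."""
    return -math.log(iou_ltrb(pred, target))
```

Because the loss is a function of the whole box rather than of the four distances independently, it is insensitive to the absolute scale of the object.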

Center-ness

$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

  • center-ness reflects the normalized distance from a location to the center of its object
  • the square root is used to slow down the decay of center-ness
  • center-ness ranges over [0, 1]
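
The center-ness target is easy to check numerically. The sketch below is a direct transcription of the formula into plain Python (the function name is mine), assuming all four distances are positive, i.e. the location lies inside the box.

```python
import math


def centerness(l, t, r, b):
    """Center-ness target from the four distances to the box sides."""
    horizontal = min(l, r) / max(l, r)
    vertical = min(t, b) / max(t, b)
    return math.sqrt(horizontal * vertical)


at_center = centerness(5, 5, 5, 5)   # exactly at the center of the box
off_center = centerness(1, 1, 3, 3)  # closer to the top-left corner
```

At the exact center the value is 1, and it decays towards 0 as the location approaches any side; for the off-center example it equals 1/3.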

Remap of Feature & Image

A location $(x,y)$ on the feature map maps back to the input image as:
$$\big( \lfloor \tfrac{s}{2} \rfloor + xs, \; \lfloor \tfrac{s}{2} \rfloor + ys \big)$$
This position is near the center of the receptive field of location $(x,y)$
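
A short sketch of this mapping, with a hypothetical helper name and integer feature-map coordinates assumed:

```python
def to_image_coords(x, y, s):
    """Map feature-map location (x, y) at total stride s back to the input image."""
    return (s // 2 + x * s, s // 2 + y * s)


origin_p3 = to_image_coords(0, 0, 8)  # first location on P3 (stride 8)
other_p3 = to_image_coords(2, 3, 8)
```

With stride 8, feature location `(0, 0)` maps to image point `(4, 4)` and `(2, 3)` maps to `(20, 28)`, i.e. roughly the center of each 8×8 cell.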

Use Yourself

Related Work

Drawbacks of Anchor

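  1. Detection performance is sensitive to hyper-parameters such as the size, aspect ratio, and number of anchors

    Anchors require careful hand-crafted design

  2. A large number of anchors is needed to obtain a high recall rate

    This causes an extreme imbalance between positive and negative samples during training

  3. Anchors bring complicated computation

    e.g. computing IoU

  4. The sizes and aspect ratios of anchors are pre-defined, so they cannot cope with shape variations (especially for small objects)

    In addition, this "pre-defined" form hurts the generalization ability of the model. In other words, the designed anchors are task-specific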

DenseBox-Based

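  • Crops and resizes the image to handle bounding boxes of different sizes

    As a result, DenseBox must perform detection on an image pyramid

    This runs against the FCN philosophy of computing all convolutions only once

  • Only used in specific domains, and has difficulty handling overlapping objects

    because it is unclear which object the corresponding pixel should regress to

  • Relatively low recall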

Anchor-Based Detector

  • Origin

    sliding-window and proposal-based detectors

  • Essence of anchors

    pre-defined sliding windows (proposals) + offset regression

  • Role of anchors

    serve as the training data of the detector

  • Typical models

    • Faster-RCNN
    • SSD
    • YOLO v2

YOLO v1

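YOLO v1 is a typical anchor-free detector

Idea

YOLO v1 predicts bounding boxes using points near the center

That is: the grid cell that the object's center falls into is responsible for predicting that object's bounding box

The reason: points near the center produce higher-quality detections

Drawbacks of Points near Center

Using only the points near the center leads to low recall

For this reason, YOLO v2 went back to using anchors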

CornerNet

CornerNet is a typical anchor-free detector

Steps

  1. corner detection
  2. corner grouping
  3. post-processing

Drawbacks of Corner

The post-processing is complicated and requires an extra distance metric
