[paper reading] FCOS
GitHub: Notes of Classic Detection Papers
Update 2020.11.09: added the Use Yourself section (my own understanding and thoughts on this paper); see GitHub: Notes-of-Classic-Detection-Papers for details.
I originally wanted to publish these notes on GitHub, but GitHub does not render formulas,
so they ended up on CSDN, where the formatting is somewhat messy.
I strongly recommend downloading the source files from GitHub: Notes-of-Classic-Detection-Papers and reading them there; that gives the best reading experience!
Of course, if you find them useful, a star would be appreciated!
| topic | motivation | technique | key element | math | use yourself | relativity |
| --- | --- | --- | --- | --- | --- | --- |
| FCOS | Idea<br>Contribution | FCOS Architecture<br>Center-ness<br>Multi-Level FPN Prediction | Prediction Head<br>Training Sample & Label<br>Model Output<br>Feature Pyramid<br>Inference<br>Ablation Study<br>FCN & Detection<br>FCOS vs. YOLO v1 | Symbol Definition<br>Loss Function<br>Center-ness<br>Remap of Feature & Image | …… | Related Work |
Table of Contents
- [paper reading] FCOS
- Motivation
- Idea
- Contribution
- Techniques
- FCOS Architecture
- Advantage
- Center-ness
- Idea
- Implement
- Multi-Level FPN Prediction
- Key Elements
- Prediction Head
- Classification Branch
- Regression Branch
- Shared Head
- Training Sample & Label
- Training Sample
- Label Pos/Neg
- Model Output
- 4D Vector $\pmb{t}^*$
- C-Dimension Vector $\pmb{p}$
- Feature Pyramid
- Inference
- Ablation Study
- Multi-Level FPN Prediction
- Ambiguity Samples
- With or Without Center-ness
- FCN & Detection
- FCOS vs. YOLO v1
- Math
- Symbol Definition
- Loss Function
- Center-ness
- Remap of Feature & Image
- Use Yourself
- Related Work
- Drawbacks of Anchor
- DenseBox-Based
- Anchor-Based Detector
- YOLO v1
- Idea
- Drawbacks of Points near Center
- CornerNet
- Steps
- Drawbacks of Corner
Motivation
Idea
Performs object detection by per-pixel prediction (implemented with fully convolutional networks).
Contribution
Reformulates detection as per-pixel prediction.
Uses multi-level prediction to:
- improve recall
- resolve the ambiguity caused by overlapping bounding boxes

Center-ness branch:
suppresses low-quality predictions inside bounding boxes.
Techniques
FCOS Architecture
The default backbone is ResNet-50.
Advantage
See [Drawbacks of Anchor](#Drawbacks of Anchor) for details.
Unifies detection with FCN-solvable tasks (e.g. semantic segmentation).
Ideas from other tasks can thus be transferred to detection (idea re-use).
Anchor-free & proposal-free:
- eliminates the complex computation associated with anchors (e.g. IoU)
- achieves faster training & testing and a smaller training memory footprint
Reaches SOTA among one-stage detectors and can be used to replace RPN.
Transfers quickly to other vision tasks (e.g. instance segmentation, key-point detection).
Center-ness
Center-ness is predicted for every location.
It improves performance considerably.
Idea
Locations far from the center produce a large number of low-quality predicted bounding boxes.
FCOS introduces center-ness to suppress (i.e. down-weight) the low-quality bounding boxes far from the center.
Implement
A center-ness branch is introduced to predict the center-ness of each location.
At test time, the score is computed as:

$$\text{Final Score} = \text{Classification Score} \times \text{Center-ness}$$

NMS then filters out the suppressed bounding boxes.
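The test-time score fusion can be sketched in plain Python (the scores below are made up for illustration; this is not the paper's code):

```python
# Hypothetical per-location outputs: classification scores and center-ness.
cls_scores = [0.9, 0.8, 0.7]
centerness = [1.0, 0.25, 0.04]

# Final score = classification score * center-ness:
# predictions far from an object center are down-weighted before NMS.
final_scores = [c * ctr for c, ctr in zip(cls_scores, centerness)]
```

Because center-ness lies in [0, 1], it can only down-weight a classification score, never raise it, so low-quality boxes fall below the NMS ranking of well-centered ones.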
Multi-Level FPN Prediction
Multi-Level FPN Prediction solves two problems:

Best Possible Recall
Raises the Best Possible Recall of FCOS to the SOTA level.

Ambiguity of Ground-Truth Box Overlap
Resolves the ambiguity caused by overlapping ground-truth boxes, on par with anchor-based detectors.
Reason: in the vast majority of cases, overlapping objects differ greatly in scale.

The idea: dispatch locations to different feature levels according to their regression distances.
Specifically:
- compute the regression targets;
- filter out positive samples by each feature level's maximum regression distance:

$$m_{i-1} < \max(l^*, t^*, r^*, b^*) < m_i$$

where $m_i$ is the maximum distance that feature level $i$ is allowed to regress, with

$$\{m_2, m_3, m_4, m_5, m_6, m_7\} = \{0, 64, 128, 256, 512, \infty\}$$

Compared with the original multi-scale assignment (e.g. SSD), FCOS "implicitly" assigns objects of different scales to different feature levels (small objects to shallow levels, large objects to deep levels).
I regard this as a more refined form of hand-crafted design.
If a location falls into two ground-truth boxes (i.e. is ambiguous), the smaller box is chosen for regression (a bias toward small objects).
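The level-assignment rule above can be sketched in a few lines of Python (the thresholds follow the values listed; the level names P3..P7 and the boundary convention are assumptions of this sketch):

```python
# Maximum regression distance per level: m2..m7 (m7 = infinity).
M = [0, 64, 128, 256, 512, float("inf")]
LEVELS = ["P3", "P4", "P5", "P6", "P7"]

def assign_level(l, t, r, b):
    """Dispatch a location to the feature level whose range
    (m_{i-1}, m_i] contains max(l*, t*, r*, b*)."""
    d = max(l, t, r, b)
    for i, level in enumerate(LEVELS):
        if M[i] < d <= M[i + 1]:
            return level
    return None  # d == 0: degenerate target

print(assign_level(10, 20, 30, 40))   # P3 (max distance 40 <= 64)
print(assign_level(100, 50, 90, 30))  # P4 (max distance 100 in (64, 128])
print(assign_level(600, 20, 10, 5))   # P7 (max distance 600 > 512)
```

Since m7 is infinity, every location with a sufficiently large regression distance still gets a level, which is what keeps the recall high.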
Key Elements
Prediction Head
Classification Branch
Regression Branch
Since the regression targets are always positive, $\text{exp}(s_i x)$ is appended on top of the regression branch (see [Shared Head](#Shared Head) for details).
Shared Head
The head is shared across feature levels.
Advantages:
- parameter efficient
- improves performance
Drawback:
Because of [Multi-Level FPN Prediction](#Multi-Level FPN Prediction), different feature levels have different output ranges (e.g. [0, 64] for $P_3$, [64, 128] for $P_4$).
To let identical heads work on different feature levels:

$$\text{exp}(x) \rightarrow \text{exp}(s_i x)$$

- $s_i$: a trainable scalar that automatically adjusts the base of exp
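The effect of the per-level scalar can be illustrated numerically (a sketch, not the training code; the $s_i$ values here are made up, whereas in FCOS they are learned):

```python
import math

def regress_distance(x, s_i):
    # exp(s_i * x): s_i is a per-level trainable scalar that lets
    # an identical head emit a different output range per level.
    return math.exp(s_i * x)

# Same raw head output x, different learned scalars per level:
x = 3.0
print(regress_distance(x, 1.0))  # ~20.1  (shallow level, small distances)
print(regress_distance(x, 2.0))  # ~403.4 (deep level, large distances)
```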
Training Sample & Label
Training Sample
Locations are used directly as training samples (similar to FCN-based semantic segmentation).
Label Pos/Neg
A location $(x, y)$ is a positive sample iff:
- location $(x, y)$ falls inside a ground-truth box
- the class label of location $(x, y)$ equals the class of that ground-truth box

FCOS trains with as many foreground samples as possible (i.e. all locations inside a ground-truth box),
unlike anchor-based detectors, which take only anchors with a high IoU with a ground-truth box as positives,
and unlike [CenterNet (Object as Points)](./[paper reading] CenterNet (Object as Points).md), which treats only the geometric center as positive.
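The geometric part of the positive/negative rule is a point-in-box test, which can be sketched as follows (box format (x0, y0, x1, y1) assumed; the class-matching condition is omitted for brevity):

```python
def is_positive(x, y, box):
    """A location (x, y) is positive iff it falls inside the
    ground-truth box; every interior location is a foreground sample."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1

box = (10, 10, 50, 30)
print(is_positive(20, 20, box))  # True  (inside the box, not just its center)
print(is_positive(60, 20, box))  # False (outside)
```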
Model Output
For each location on each level's feature map, the model produces the following outputs:
4D Vector $\pmb{t}^*$

$$\pmb{t}^* = (l^*, t^*, r^*, b^*)$$

It describes the relative offsets from the location to the four sides of the bounding box.

Note:
- FCOS computes a target for every location inside a ground-truth box (not only the geometric center), so four values are needed to recover the boundary.
- [CenterNet (Object as Points)](./[paper reading] CenterNet (Object as Points).md), by contrast, predicts only at the geometric center, so two values suffice.
- Object overlap is mostly resolved by [Multi-Level FPN Prediction](#Multi-Level FPN Prediction); if a location still falls into multiple boxes, the smaller object is preferred (the bounding box with the smallest area is chosen).
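Encoding the 4D target for a location, and decoding it back into a box, can be sketched as (box format (x0, y0, x1, y1) assumed; not the paper's code):

```python
def encode(x, y, box):
    """l*, t*, r*, b*: distances from (x, y) to the four box sides."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)

def decode(x, y, t):
    """Recover the box from a location and its 4D offset vector."""
    l, tp, r, b = t
    return (x - l, y - tp, x + r, y + b)

box = (10, 20, 50, 60)
t_star = encode(30, 40, box)
print(t_star)                  # (20, 20, 20, 20)
print(decode(30, 40, t_star))  # (10, 20, 50, 60)
```

The round trip shows why four values are needed: any interior location can recover the full boundary, not just a center point.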
C-Dimension Vector $\pmb{p}$
The experiments use $C$ binary classifiers rather than a single $C$-class classifier.
Feature Pyramid
Five levels of feature maps are defined: $\{P_3, P_4, P_5, P_6, P_7\}$, with strides $\{8, 16, 32, 64, 128\}$.
- $\{P_3, P_4, P_5\}$: backbone feature maps $\{C_3, C_4, C_5\}$ + 1×1 convolution
- $\{P_6, P_7\}$: obtained by applying a stride-2 convolution to $P_5$ and $P_6$ respectively
Inference
Feed the image into the network; at each location of feature map $F_i$, obtain:
- the classification score $\pmb{p}_{x,y}$
- the regression prediction $\pmb{t}_{x,y}$

Select the locations with $p_{x,y} > 0.05$ as positive samples,
then decode them into bounding-box coordinates.
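The inference steps above can be sketched as (pure Python; the data and the default threshold value are illustrative assumptions):

```python
def infer(locations, scores, offsets, thresh=0.05):
    """Keep locations whose classification score exceeds `thresh`
    and decode their (l, t, r, b) offsets into scored boxes."""
    boxes = []
    for (x, y), p, (l, t, r, b) in zip(locations, scores, offsets):
        if p > thresh:
            boxes.append((x - l, y - t, x + r, y + b, p))
    return boxes

locs = [(30, 40), (100, 100)]
scores = [0.8, 0.01]           # second location falls below the threshold
offs = [(20, 20, 20, 20), (5, 5, 5, 5)]
print(infer(locs, scores, offs))  # [(10, 20, 50, 60, 0.8)]
```

In a full pipeline the surviving boxes would then be re-scored with center-ness and passed through NMS.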
Ablation Study
Multi-Level FPN Prediction
Conclusions:
- Best Possible Recall is not a weakness of FCOS
- Multi-Level FPN Prediction further improves the Best Possible Recall
Ambiguity Samples
Conclusions:
Multi-Level FPN Prediction solves the ambiguous-sample problem:
most overlap ambiguity is dispatched to different feature levels, and only very few ambiguous locations remain.
With or Without Center-ness
Center-ness suppresses low-quality bounding boxes far from the center, which improves AP substantially.
Center-ness must have its own separate branch.
FCN & Detection
FCN is mainly used for dense prediction.
In fact, the fundamental vision tasks can all be unified into one single framework.
The use of anchors, however, makes the detection task deviate from the neat fully convolutional per-pixel prediction framework.
FCOS vs. YOLO v1
While YOLO v1 uses only points near the center for prediction, FCOS uses all points inside the ground-truth box.
The resulting low-quality bounding boxes are suppressed by center-ness.
This lets FCOS reach a recall comparable to anchor-based detectors.
Math
Symbol Definition
$F_i \in \mathbb{R}^{H \times W \times C}$: the feature map at layer $i$ of the backbone
$s$: the total stride up to that layer
$\{B_i\}$: the ground-truth boxes

$$B_i = (x_0^{(i)}, y_0^{(i)}, x_1^{(i)}, y_1^{(i)}, c^{(i)}) \in \mathbb{R}^4 \times \{1, 2, \ldots, C\}$$

- $(x_0^{(i)}, y_0^{(i)})$: top-left corner coordinate
- $(x_1^{(i)}, y_1^{(i)})$: bottom-right corner coordinate
- $c^{(i)}$: class of the object in the bounding box
- $C$: number of classes
Loss Function
- $\lambda = 1$

The total loss additionally includes a center-ness loss (not shown above), which is binary cross-entropy.
The loss is computed over the locations of the feature map. Specifically:
- the classification loss is computed over all locations (positive & negative);
- the regression loss is computed over positive locations only.

$\mathbb{1}_{\{c_{x,y}^* > 0\}} = 1$ if $c_{x,y}^* > 0$, and $0$ otherwise.
Center-ness
$$\text{centerness}^* = \sqrt{\frac{\min(l^*, r^*)}{\max(l^*, r^*)} \times \frac{\min(t^*, b^*)}{\max(t^*, b^*)}}$$

- center-ness reflects the normalized distance from a location to the center of its object
- the square root is used to slow down the decay of center-ness
- center-ness ranges over $[0, 1]$
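The formula transcribes directly into a few lines of Python:

```python
import math

def centerness(l, t, r, b):
    """Normalized distance to the center: 1 at the exact center,
    decaying toward 0 near the box boundary (sqrt slows the decay)."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))

print(centerness(2, 2, 2, 2))  # 1.0  (exact center: l* = r* and t* = b*)
print(centerness(1, 1, 3, 3))  # ~0.333 (off-center location)
```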
Remap of Feature & Image
A location $(x, y)$ on the feature map maps back to the image as:

$$\Big( \big\lfloor \tfrac{s}{2} \big\rfloor + xs,\ \big\lfloor \tfrac{s}{2} \big\rfloor + ys \Big)$$

This position lies near the center of the receptive field of location $(x, y)$.
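The remapping can be checked with a short snippet (pure Python):

```python
def remap(x, y, s):
    """Map a feature-map location (x, y) at total stride s back to
    image coordinates, near the center of its receptive field."""
    return (s // 2 + x * s, s // 2 + y * s)

# Stride-8 feature map: location (3, 5) lands at image pixel (28, 44).
print(remap(3, 5, 8))  # (28, 44)
```

The floor(s/2) offset is what shifts each mapped point from the top-left corner of its stride cell toward the cell center.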
Use Yourself
Related Work
Drawbacks of Anchor
Detection performance is sensitive to hyper-parameters such as anchor size, aspect ratio, and number;
that is, anchors require careful manual design.
A large number of anchors is needed to reach a high recall rate,
which causes an extreme positive/negative sample imbalance during training.
Anchors come with complex computation,
e.g. computing IoU.
Anchor sizes and aspect ratios are pre-defined, so they cannot adapt to shape variations (especially for small objects).
Moreover, this pre-defined form also hurts the model's generalization; in other words, the designed anchors are task-specific.
DenseBox-Based
Crops and resizes the image to handle bounding boxes of different sizes,
so DenseBox has to run detection on an image pyramid,
which contradicts the FCN philosophy of computing all convolutions only once.
Works only in specific domains and struggles with overlapping objects,
because it cannot decide which object a given pixel should regress to.
Its recall is relatively low.
Anchor-Based Detector
Origin:
sliding-window and proposal-based detectors.
The essence of anchors:
pre-defined sliding windows (proposals) + offset regression.
The role of anchors:
training samples for the detector.
Typical models:
- Faster R-CNN
- SSD
- YOLO v2
YOLO v1
YOLO v1 is a typical anchor-free detector.
Idea
YOLO v1 uses points near the center to predict bounding boxes:
the grid cell that an object's center falls into is responsible for predicting that object's bounding box.
The rationale: points near the center can produce higher-quality detections.
Drawbacks of Points near Center
Using only the points near the center leads to low recall.
This is exactly why YOLO v2 went back to anchors.
CornerNet
CornerNet is a typical anchor-free detector.
Steps
- corner detection
- corner grouping
- post-processing
Drawbacks of Corner
Its post-processing is complex and requires an additional distance metric (for grouping corners).