1. Motivation

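This paper builds on the transfer-learning-based (fine-tuning) paradigm and improves Faster R-CNN.

It analyzes the contradictions that arise between the detector's stages and between its classification and regression tasks: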

In this paper, we look closely into the conventional Faster R-CNN and analyze its contradictions from two orthogonal perspectives, namely multi-stage (RPN vs. RCNN) and multi-task (classification vs. localization).

2. Contribution

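The paper makes two architectural contributions: the Gradient Decoupled Layer (GDL) for multi-stage decoupling and the Prototypical Calibration Block (PCB) for multi-task decoupling.

GDL acts on the backbone and decouples the layers before it from the layers after it; PCB is an offline prototype-based classifier used to boost the original classification branch.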

We look closely into the conventional Faster R-CNN and propose a simple yet effective architecture for few-shot detection, named Decoupled Faster R-CNN, which can be learned end-to-end via straightforward fine-tuning.
To deal with the data-scarce scenario, we further present two novel modules, i.e. GDL and PCB, to perform decoupling among multiple components of Faster R-CNN and boost classification performance respectively.
DeFRCN is remarkably superior to SOTAs on various benchmarks, revealing the effectiveness of our approach.

3. Method

During fine-tuning, the backbone, RPN, box classifier, and box regressor are trainable, while the RCNN feature extractor is frozen.

Problem of multi-task learning

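The authors argue that in multi-task learning the optimization objectives of the sub-networks are inconsistent.

The RPN decides where to look, while the RCNN decides what to look at.

The classification head needs translation-invariant features, whereas the localization head needs translation-covariant features.

Optimizing them jointly may therefore lead to a suboptimal solution.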

Problem of shared backbone

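The backbone receives gradients from both the RCNN and the RPN, but the two are partly at odds, so the authors argue that sharing it may degrade FSOD performance. Moreover, during the second (fine-tuning) stage of FSOD the RPN suffers more from foreground-background confusion,

which means a proposal that belongs to background in the base training phase is likely to be foreground in the novel fine-tuning phase.

As a result, gradients overfitted to the base classes may propagate to the backbone and the RCNN.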

3.1 Gradient Decoupled Layer

Perform Decoupling with GDL
Optimization with GDL
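
The notes describe GDL as a layer that separates the backbone from what follows, and the ablation in 4.2.2 settles on stop-gradient for the RPN and scale-gradient for the RCNN. Below is a minimal pure-Python sketch of that idea, assuming the layer applies an affine transform in the forward pass and multiplies the incoming gradient by a constant `lam` in the backward pass. The class name, the scalar `scale` and `shift` parameters, and the value 0.75 are illustrative assumptions and are not taken from these notes; a real implementation would live inside an autograd framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradientDecoupledLayer:
    """Toy GDL: affine forward pass, gradient damped by `lam` on the way back."""

    lam: float          # decoupling coefficient in [0, 1]; 0 means stop-gradient
    scale: float = 1.0  # affine weight (hypothetical, one scalar for all channels)
    shift: float = 0.0  # affine bias (hypothetical)

    def forward(self, features: List[float]) -> List[float]:
        # Forward: affine transform of the shared backbone features.
        return [self.scale * f + self.shift for f in features]

    def backward(self, grad_out: List[float]) -> List[float]:
        # Backward: chain rule through the affine map, then scale by lam
        # before the gradient reaches the shared backbone.
        return [self.lam * self.scale * g for g in grad_out]


def backbone_gradient(
    grad_rpn: List[float],
    grad_rcnn: List[float],
    gdl_rpn: GradientDecoupledLayer,
    gdl_rcnn: GradientDecoupledLayer,
) -> List[float]:
    """Sum the gradients that the two branches send back to the backbone."""
    back_rpn = gdl_rpn.backward(grad_rpn)
    back_rcnn = gdl_rcnn.backward(grad_rcnn)
    return [a + b for a, b in zip(back_rpn, back_rcnn)]


gdl_rpn = GradientDecoupledLayer(lam=0.0)    # stop-gradient for the RPN
gdl_rcnn = GradientDecoupledLayer(lam=0.75)  # scale-gradient for the RCNN (illustrative value)

grad = backbone_gradient([1.0, -2.0], [4.0, 8.0], gdl_rpn, gdl_rcnn)
print(grad)
```

With `lam=0.0` the RPN contributes nothing to the backbone update, while the RCNN gradient arrives scaled down, so the two branches no longer pull the shared features with equal force.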

3.2 Prototypical Calibration Block

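Motivation for PCB:

The paper observes that the few-shot classification branch produces a large share of low-quality scores, which motivates eliminating high-scored false positives and correcting low-scored positive samples.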

We notice that the under-explored few-shot classification branch generates a large amount of low-quality scores, which motivates us to eliminate high-scored false positives and remedy low-scored missing samples by introducing a Prototypical Calibration Block (PCB) for score refinement.

PCB consists of a classifier, a RoIAlign layer, and a prototype bank.

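Given the support set $S$ of an $M$-way $K$-shot task, PCB extracts the feature map of each original image and applies RoIAlign directly on the ground-truth boxes (similar to the operation in the Attention-RPN paper), which yields feature representations for the $MK$ instances. These are used to build a prototype bank $P = \{p_c\}_{c=1}^{M}$, where the prototype of each class $c$ is the mean of that class's instance features:

$$p_c = \frac{1}{|S_c|} \sum_{x \in S_c} x$$

where the subset $S_c$ contains only the instances of class $c$.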
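
The prototype bank can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: each support instance is represented as a `(label, feature)` pair whose feature is a list of floats (in the actual model it would come from RoIAlign on a ground-truth box), each prototype is taken to be the per-class mean of those features, and the function name `build_prototype_bank` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def build_prototype_bank(
    support: Sequence[Tuple[str, Sequence[float]]],
) -> Dict[str, List[float]]:
    """Average the instance features of each class into one prototype."""
    grouped: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for label, feature in support:
        grouped[label].append(feature)

    bank: Dict[str, List[float]] = {}
    for label, feats in grouped.items():
        dim = len(feats[0])
        # Element-wise mean over all instances of this class.
        bank[label] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return bank


# Toy 2-way 2-shot support set: (class label, RoI feature) pairs.
support_set = [
    ("cat", [1.0, 0.0]),
    ("cat", [3.0, 2.0]),
    ("dog", [0.0, 4.0]),
    ("dog", [2.0, 0.0]),
]
prototype_bank = build_prototype_bank(support_set)
print(prototype_bank)
```

Because the bank is computed once from the support set and never updated by gradient descent, it can be built before inference and reused for every test image.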

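Given a proposal $\hat{y} = (c_i, s_i, b_i)$ produced by the fine-tuned Faster R-CNN, where $c_i$ is the predicted label, $s_i$ the score, and $b_i$ the box, PCB first applies RoIAlign on $b_i$ to obtain the feature $x_i$, and then computes the cosine similarity between $x_i$ and the prototype $p_{c_i}$.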

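The detector score and the cosine similarity are then fused by weighted aggregation:

$$s_i^{\dagger} = \alpha \cdot s_i + (1 - \alpha) \cdot s_i^{cos}$$

where $s_i^{cos}$ is the cosine similarity between $x_i$ and $p_{c_i}$, and $\alpha$ is the trade-off weight.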
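
The calibration step can be sketched as follows, assuming the fusion is a convex combination of the detector score and the cosine similarity between the proposal feature and its class prototype. The function names, the toy feature vectors, and the default `alpha=0.5` are illustrative assumptions, not values taken from these notes.

```python
import math
from typing import Sequence


def cosine_similarity(x: Sequence[float], p: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(x, p))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in p))
    return dot / norm if norm > 0.0 else 0.0


def calibrate_score(
    score: float,
    feature: Sequence[float],
    prototype: Sequence[float],
    alpha: float = 0.5,  # trade-off weight (assumed value)
) -> float:
    """Blend the detector score with the prototype similarity."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * score + (1.0 - alpha) * cosine_similarity(feature, prototype)


# A confident detection whose feature is orthogonal to its class prototype
# is pulled down; a weak detection aligned with the prototype is pulled up.
false_positive = calibrate_score(0.9, feature=[0.0, 2.0], prototype=[3.0, 0.0])
weak_positive = calibrate_score(0.3, feature=[2.0, 0.0], prototype=[3.0, 0.0])
print(false_positive, weak_positive)
```

The toy numbers mirror the stated motivation: the high-scored false positive drops from 0.9 to about 0.45, while the low-scored true positive rises from 0.3 to about 0.65.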

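Because PCB is an offline module, it is plug-and-play and adds little overhead to training. In addition, PCB does not share parameters with the detector's classification branch.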

Furthermore, since the PCB module is offline without any further training, it can be plug-and-play and easily equipped to any other architectures to build stronger few-shot detectors.

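Overall, I see DeFRCN as a blend of meta-learning and transfer learning: it keeps the transfer-learning framework, but for classification it uses the support set to perform a weighted score fusion. It also modifies how gradients flow back through the backbone features, so as to separate, as far as possible, the translation invariance needed by classification from the translation covariance needed by regression.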

4. Experiment

4.1.1 VOC

4.1.2 COCO

4.1.3 COCO to VOC

4.2 Ablation Study

4.2.1 Effectiveness of different modules

4.2.2 Effectiveness of the degree of decoupling

This observation prompts us to perform stop-gradient for RPN and scale-gradient for RCNN in DeFRCN

4.2.3 Can GDL boost conventional detection?


[DeFRCN] Decouple Faster R-CNN for Few-Shot Object Detection(ICCV 2021)