1. Motivation

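This paper builds on the transfer-learning-based approach to few-shot detection and improves upon Faster R-CNN.

It analyzes the contradictions that arise between the classification and regression tasks: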

  • In this paper, we look closely into the conventional Faster R-CNN and analyze its contradictions from two orthogonal perspectives, namely multi-stage (RPN vs. RCNN) and multi-task (classification vs. localization).

2. Contribution

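The paper makes two architectural contributions: the Gradient Decoupled Layer (GDL), used for multi-stage decoupling, and the Prototypical Calibration Block (PCB), used for multi-task decoupling.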


  • We look closely into the conventional Faster R-CNN and propose a simple yet effective architecture for few-shot detection, named Decoupled Faster R-CNN, which can be learned end-to-end via straightforward fine-tuning.
  • To deal with the data-scarce scenario, we further present two novel modules, i.e. GDL and PCB, to perform decoupling among multiple components of Faster R-CNN and boost classification performance respectively.
  • DeFRCN is remarkably superior to SOTAs on various benchmarks, revealing the effectiveness of our approach.

3. Method

During the fine-tune stage, the Backbone, RPN, Box Classifier, and Regressor are trainable, while the RCNN is frozen.

Problem of multi-task learning


The RPN answers "where to look"; the RCNN answers "what to look at".

The classification head needs translation-invariant features, while the localization head needs translation-covariant features.

Optimizing both jointly may therefore lead to a suboptimal solution.

Problem of shared backbone

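The gradient flowing back into the Backbone comes from both the RCNN and the RPN, but the two are partly contradictory, so the authors argue this may degrade FSOD performance. Moreover, in FSOD the RPN in the second (fine-tuning) stage suffers more from foreground-background confusion. As a result, gradients overfitted to the base classes may propagate to the backbone and the RCNN.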

  • which means a proposal that belongs to background in the base training phase is likely to be foreground in the novel fine-tuning phase

3.1 Gradient Decoupled Layer

  • Perform Decoupling with GDL

  • Optimization with GDL
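
The notes give no details under these two headings. As a rough illustration, and going only by the stop-gradient / scale-gradient description quoted in Section 4.2.2, a gradient-decoupling layer can be sketched as a layer that passes features through unchanged and multiplies the gradient by a constant on the way back. The class name, the `scale` parameter, and the numeric values below are assumptions made for illustration, not the authors' implementation, and any learnable transform inside the layer is left out.

```python
class GradientDecoupledLayer:
    """Hypothetical sketch: identity forward, gradient scaled by a constant backward."""

    def __init__(self, scale):
        # scale = 0.0 behaves like stop-gradient; 0 < scale < 1 damps the gradient.
        self.scale = scale

    def forward(self, features):
        # Features pass through unchanged.
        return list(features)

    def backward(self, grad_output):
        # The gradient heading to the shared backbone is multiplied by the constant.
        return [self.scale * g for g in grad_output]


def backbone_gradient(grad_from_rpn, grad_from_rcnn, rpn_scale, rcnn_scale):
    """Sum the two scaled gradients that reach the shared backbone."""
    scaled_rpn = GradientDecoupledLayer(rpn_scale).backward(grad_from_rpn)
    scaled_rcnn = GradientDecoupledLayer(rcnn_scale).backward(grad_from_rcnn)
    return [a + b for a, b in zip(scaled_rpn, scaled_rcnn)]


# Illustrative values only: the RPN gradient is stopped, the RCNN gradient is damped.
grad = backbone_gradient([1.0, -2.0], [4.0, 8.0], rpn_scale=0.0, rcnn_scale=0.75)
print(grad)
```

With `rpn_scale=0.0` the backbone receives nothing from the RPN, so only the damped RCNN gradient updates the shared features.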

3.2 Prototypical Calibration Block


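The paper observes that the few-shot classification branch produces a large share of low-quality scores, which motivates eliminating high-scoring false positives and correcting low-scoring positives.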

  • We notice that the under-explored few-shot classification branch generates a large amount of low-quality scores, which motivates us to eliminate high-scored false positives and remedy low-scored missing samples by introducing a Prototypical Calibration Block (PCB) for score refinement.

The PCB consists of a classifier, RoIAlign, and a prototype bank.

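Given the support set $S$ of an M-way K-shot task, PCB extracts the original image feature maps and then applies RoIAlign directly with the ground-truth boxes (similar to what the Attention RPN paper does), which yields feature representations for the $MK$ instances. From these we build a prototype bank $P = \{p_c\}_{c=1}^{M}$, where the prototype for each class $c$ is given by:

where the subset $S_c$ of $S$ contains only the instances of class $c$.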
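
The formula itself did not survive in these notes. Assuming the usual class-mean prototype, which matches the description of $S_c$ above, it would read as follows (here $x$ stands for the RoIAlign feature of one support instance, a symbol introduced for this sketch):

$$
p_c = \frac{1}{|S_c|} \sum_{x \in S_c} x
$$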

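Given a proposal $\hat y = (c_i, s_i, b_i)$, which is what the original Faster R-CNN branch outputs after fine-tuning ($c$ is the label, $s$ the score, and $b$ the box), PCB first applies RoIAlign on $b_i$ to obtain the feature $x_i$, and then computes the cosine similarity between $x_i$ and $p_{c_i}$.

The two scores are then combined by weighted aggregation: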
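
The aggregation formula is missing from these notes. A plausible reconstruction, assuming a single trade-off weight $\alpha \in [0, 1]$ and writing $s_i^{cos}$ for the cosine similarity and $s_i^{\dagger}$ for the refined score (all three symbols are assumptions of this sketch), is:

$$
s_i^{cos} = \frac{x_i^{\top} p_{c_i}}{\lVert x_i \rVert \, \lVert p_{c_i} \rVert},
\qquad
s_i^{\dagger} = \alpha \cdot s_i + (1 - \alpha) \cdot s_i^{cos}
$$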


  • Furthermore, since the PCB module is offline without any further training, it can be plug-and-play and easily equipped to any other architectures to build stronger few-shot detectors.
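
The PCB steps described above (build a prototype bank from support features, compare each proposal feature to the prototype of its predicted class, fuse the two scores) can be sketched in plain Python. This is a minimal illustration under stated assumptions: features are short lists standing in for RoIAlign outputs, prototypes are class means, and the function names and the weight `alpha` are hypothetical, not taken from the paper's code.

```python
import math


def build_prototypes(support_features):
    """Average the support features of each class (assumed class-mean prototypes).

    support_features maps a class label to a list of equal-length feature vectors.
    """
    prototypes = {}
    for label, vectors in support_features.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def calibrate(label, score, feature, prototypes, alpha=0.5):
    """Fuse the detector score with the prototype similarity (alpha is an assumed weight)."""
    similarity = cosine_similarity(feature, prototypes[label])
    return alpha * score + (1.0 - alpha) * similarity


# Toy support set: two "cat" instances and one "dog" instance.
bank = build_prototypes({"cat": [[1.0, 0.0], [3.0, 0.0]], "dog": [[0.0, 2.0]]})

# A confident "cat" proposal whose feature does not resemble the cat prototype is pushed down.
lowered = calibrate("cat", 0.9, [0.0, 1.0], bank)
# A weak "dog" proposal whose feature matches the dog prototype is pulled up.
raised = calibrate("dog", 0.2, [0.0, 4.0], bank)
print(lowered, raised)
```

Because nothing here is trained, the bank can be computed once from the support set and applied to the outputs of any detector, which is the plug-and-play property the quote refers to.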

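Overall, then, I see DeFRCN as a fusion of meta-learning and transfer learning. It keeps the overall transfer-learning framework, but for classification it uses the support set to re-weight and fuse the scores. It also modifies the backward pass through the backbone features, so as to separate, as far as possible, the translation invariance needed for classification from the translation covariance needed for regression.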

4. Experiment

4.1.1 VOC

4.1.2 COCO

4.1.3 COCO to VOC

4.2 Ablation Study

4.2.1 Effectiveness of different modules

4.2.2 Effectiveness of the degree of decoupling

This observation prompts us to perform stop-gradient for RPN and scale-gradient for RCNN in DeFRCN

4.2.3 Can GDL boost conventional detection?

