1. Motivation

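The authors point out that in object detection the features of the teacher and the student differ considerably across regions, for example between foreground and background.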

In this paper, we point out that in object detection, the features of the teacher and student vary greatly in different areas, especially in the foreground and background.

If every region is distilled in the same way, the uneven differences across the feature map make the distillation less effective.

If we distill them equally, the uneven differences between feature maps will negatively affect the distillation.

The paper therefore proposes FGD, which consists of focal distillation and global distillation.

Thus, we propose Focal and Global Distillation (FGD).
Focal distillation separates the foreground and background, forcing the student to focus on the teacher’s critical pixels and channels.
Global distillation rebuilds the relation between different pixels and transfers it from teachers to students, compensating for missing global information in focal distillation.

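Figure 1 shows that the student network's attention map responds more strongly to the foreground than to the background. This indicates that distillation is also affected by the foreground-background imbalance.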

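Table 1 shows that distilling the foreground (fg) and background (bg) features together actually gives the worst result (38.9). The authors therefore design focal distillation to pick out the critical pixels and channels, and use a GC Block (GcBlock) to extract the global features.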

2. Contribution

We present that the pixels and channels that teacher and student pay attention to are quite different. If we distill the pixels and channels without distinguishing them, it will result in a trivial improvement.
We propose focal and global distillation, which enables the student not only to focus on the teacher’s critical pixels and channels, but also to learn the relation between pixels.
We verify the effectiveness of our method on various detectors via extensive experiments on the COCO [21], including one-stage, two-stage, anchor-free methods, achieving state-of-the-art performance.

3. Method

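The authors start from the ordinary feature distillation loss: the mean squared error between the teacher feature F^T and the adapted student feature f(F^S), averaged over all C x H x W entries.

Here f is an adaptation layer that reshapes F^S to the dimensions of F^T.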

This approach, however, distills every part equally and ignores the global relations between pixels.

However, such methods treat all the parts equally and lack the distillation of the global relations between different pixels.
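To make the baseline concrete, here is a minimal NumPy sketch of plain feature distillation. It assumes the adaptation layer f is a 1x1 convolution (written as a channel-mixing matrix `adapt_w`) and that the loss is averaged over all C x H x W entries; the function and variable names are made up for this note and do not come from the official code.

```python
import numpy as np

def feature_distill_loss(feat_t, feat_s, adapt_w):
    """Plain feature imitation: mean squared error over all C*H*W entries.

    feat_t : teacher feature, shape (C_t, H, W)
    feat_s : student feature, shape (C_s, H, W)
    adapt_w: weights of an assumed 1x1-conv adaptation layer f, shape (C_t, C_s)
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    adapted = np.einsum("oc,chw->ohw", adapt_w, feat_s)
    return np.mean((feat_t - adapted) ** 2)

rng = np.random.default_rng(0)
feat_t = rng.normal(size=(8, 4, 4))   # teacher: 8 channels
feat_s = rng.normal(size=(4, 4, 4))   # student: 4 channels
adapt_w = rng.normal(size=(8, 4))     # maps 4 student channels to 8
loss = feature_distill_loss(feat_t, feat_s, adapt_w)
```

Every pixel and channel contributes with the same weight here, which is exactly the property the paper criticizes.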

3.1. Focal Distillation

First, a binary mask M separates foreground from background: it is 1 for foreground pixels and 0 for background pixels.

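Next, to treat small and large ground-truth (GT) objects equally and to balance the foreground-background ratio, the authors introduce a scale mask S: a pixel inside a GT box of height H_r and width W_r gets 1/(H_r W_r), and a background pixel gets 1/N_bg, where N_bg is the number of background pixels.

If a pixel belongs to different targets, we choose the smallest box to calculate the S (an additional constraint).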
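The two masks can be sketched as follows. This is only an illustration under my own assumptions: boxes are given as `(x1, y1, x2, y2)` in feature-map pixel coordinates with inclusive ends, foreground pixels get 1/(H_r W_r) for their box and background pixels get 1/N_bg. The name `build_masks` is hypothetical.

```python
import numpy as np

def build_masks(boxes, height, width):
    """Return (M, S): binary foreground mask and scale mask for one feature map.

    boxes: list of (x1, y1, x2, y2), inclusive pixel coordinates (assumption).
    """
    fg_mask = np.zeros((height, width))
    scale = np.zeros((height, width))
    for x1, y1, x2, y2 in boxes:
        box_h, box_w = y2 - y1 + 1, x2 - x1 + 1
        fg_mask[y1:y2 + 1, x1:x2 + 1] = 1.0
        region = scale[y1:y2 + 1, x1:x2 + 1]   # a view into `scale`
        # Overlapping boxes: the smallest box has the largest 1/(H_r*W_r),
        # so taking the maximum keeps the smallest box's value.
        np.maximum(region, 1.0 / (box_h * box_w), out=region)
    n_bg = np.sum(1.0 - fg_mask)
    if n_bg > 0:
        scale[fg_mask == 0] = 1.0 / n_bg
    return fg_mask, scale

# A 4x4 box with a 2x2 box nested inside it, on a 6x6 feature map.
M, S = build_masks([(0, 0, 3, 3), (1, 1, 2, 2)], height=6, width=6)
```

In this toy case the nested pixels take the value of the 2x2 box, and the background weights sum to 1, so the background as a whole weighs about as much as a single object.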

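The spatial and channel attention are computed from the absolute feature values: G^S averages |F| over the channels, and G^C averages |F| over the spatial positions.

G^S can be read as an H x W x 1 attention map and G^C as a 1 x 1 x C attention map.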

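The attention masks are then defined as A^S(F) = H·W·softmax(G^S(F)/T) and A^C(F) = C·softmax(G^C(F)/T), where T is a temperature hyperparameter.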
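A small NumPy sketch of the attention computation, assuming G^S and G^C are channel-wise and spatial means of |F| and the masks are temperature-scaled softmaxes rescaled by H·W and C. The default `temperature=0.5` is just a placeholder value for this example.

```python
import numpy as np

def softmax(x):
    """Softmax over all entries of x (numerically stabilized)."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_masks(feat, temperature=0.5):
    """feat: (C, H, W). Returns A^S with shape (H, W) and A^C with shape (C,)."""
    c, h, w = feat.shape
    g_spatial = np.abs(feat).mean(axis=0)        # G^S: mean |F| over channels
    g_channel = np.abs(feat).mean(axis=(1, 2))   # G^C: mean |F| over positions
    a_spatial = h * w * softmax(g_spatial / temperature)
    a_channel = c * softmax(g_channel / temperature)
    return a_spatial, a_channel

rng = np.random.default_rng(0)
feat = rng.normal(size=(3, 4, 5))
a_spatial, a_channel = attention_masks(feat)
```

Because of the H·W and C factors, each mask averages to 1, so a uniform feature map would give a mask of all ones. A smaller T sharpens the mask towards the strongest pixels and channels.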

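The feature loss has two terms, one computed over the foreground and one over the background. Each term weights the squared difference between F^T and f(F^S) by the scale mask S and the attention masks A^S and A^C, and two hyperparameters balance the two terms. During training, A^S and A^C are both taken from the teacher model.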
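The weighting described above can be sketched like this. It is a rough illustration, not the official implementation: the masks are passed in precomputed, the student feature is assumed to be already adapted to the teacher's shape, and the `alpha` and `beta` defaults are placeholder values.

```python
import numpy as np

def focal_feature_loss(feat_t, feat_s_adapted, fg_mask, scale,
                       a_spatial, a_channel, alpha=1.0, beta=0.5):
    """Foreground/background-split feature loss.

    feat_t, feat_s_adapted: (C, H, W); fg_mask, scale, a_spatial: (H, W);
    a_channel: (C,). The attention masks are meant to come from the teacher.
    """
    sq_err = (feat_t - feat_s_adapted) ** 2                  # (C, H, W)
    pixel_weight = scale * a_spatial                         # (H, W)
    weighted = a_channel[:, None, None] * pixel_weight[None] * sq_err
    fg_term = np.sum(fg_mask[None] * weighted)               # foreground pixels
    bg_term = np.sum((1.0 - fg_mask)[None] * weighted)       # background pixels
    return alpha * fg_term + beta * bg_term

# Toy case: 2 channels, 2x2 map, one foreground pixel, uniform attention.
fg_mask = np.array([[1.0, 0.0], [0.0, 0.0]])
scale = np.array([[1.0, 1 / 3], [1 / 3, 1 / 3]])   # 1/(1*1) for fg, 1/N_bg for bg
ones_hw, ones_c = np.ones((2, 2)), np.ones(2)
feat_t, feat_s = np.ones((2, 2, 2)), np.zeros((2, 2, 2))
loss = focal_feature_loss(feat_t, feat_s, fg_mask, scale, ones_hw, ones_c)
```

With uniform attention and a unit error everywhere, the foreground and background terms are equal before `alpha` and `beta` are applied, which shows how the scale mask evens out the two regions.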

Attention loss:
Besides, we use attention loss L_at to force the student detector to mimic the spatial and channel attention mask of the teacher detector (L1 loss).
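A minimal sketch of the attention loss, assuming an L1 distance with mean reduction (the reduction is my assumption) and a weighting factor `gamma` with a placeholder default:

```python
import numpy as np

def attention_loss(a_spatial_t, a_channel_t, a_spatial_s, a_channel_s, gamma=1.0):
    """L1 distance between teacher and student attention masks."""
    l1_spatial = np.mean(np.abs(a_spatial_t - a_spatial_s))
    l1_channel = np.mean(np.abs(a_channel_t - a_channel_s))
    return gamma * (l1_spatial + l1_channel)

a_spatial_t = np.array([[1.0, 3.0], [0.0, 0.0]])
a_spatial_s = np.zeros((2, 2))
a_channel_t = np.array([2.0, 0.0])
a_channel_s = np.zeros(2)
l_at = attention_loss(a_spatial_t, a_channel_t, a_spatial_s, a_channel_s)
```

Unlike the feature loss, this term needs the student's own attention masks as well as the teacher's.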

3.2 Global loss

As shown in Fig. 4, we utilize GcBlock [2] to capture the global relation information in a single image and force the student detector to learn the relation from the teacher detector.
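As a rough picture of what GcBlock does, the sketch below follows the usual global-context design: a softmax-weighted pooling of all positions into one context vector, a bottleneck transform (linear, LayerNorm, ReLU, linear), and a broadcast addition back onto the feature map. The LayerNorm here has no learnable affine parameters for brevity, and using separate parameter sets for teacher and student is an assumption of this sketch. All names and the weight `lam` are hypothetical.

```python
import numpy as np

def gc_block(feat, w_k, w_v1, w_v2, eps=1e-5):
    """feat: (C, H, W); w_k: (C,); w_v1: (C_mid, C); w_v2: (C, C_mid)."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                  # (C, N) with N = H*W
    logits = w_k @ flat                            # one score per position
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                             # softmax over positions
    context = flat @ attn                          # (C,) global context vector
    hidden = w_v1 @ context                        # bottleneck
    hidden = (hidden - hidden.mean()) / np.sqrt(hidden.var() + eps)  # LayerNorm
    hidden = np.maximum(hidden, 0.0)               # ReLU
    delta = w_v2 @ hidden                          # (C,)
    return feat + delta[:, None, None]             # broadcast over H and W

def global_loss(feat_t, feat_s_adapted, params_t, params_s, lam=1.0):
    """Squared difference between teacher and student relation features."""
    rel_t = gc_block(feat_t, *params_t)
    rel_s = gc_block(feat_s_adapted, *params_s)
    return lam * np.sum((rel_t - rel_s) ** 2)

rng = np.random.default_rng(0)
C, C_MID = 4, 2
params = (rng.normal(size=C), rng.normal(size=(C_MID, C)), rng.normal(size=(C, C_MID)))
feat = rng.normal(size=(C, 3, 3))
out = gc_block(feat, *params)
```

The context vector depends on every position in the image, so matching the block outputs pushes the student towards the teacher's image-level relations rather than isolated pixels.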

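The overall loss for the student model is L = L_original + L_focal + L_global, where L_original is the detector's original training loss and L_focal = L_fea + L_at.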

4. Experiments

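The paper adopts a method (the inherit strategy) from "General Instance Distillation for Object Detection" (CVPR 2021): when the student and the teacher share the same head structure, the student model is initialized with the teacher's weights.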
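The inherit strategy can be pictured as copying matching parameters from the teacher checkpoint into the student before training. The sketch below uses plain dictionaries of NumPy arrays; restricting the copy to parameters under `neck.` and `head.` prefixes is my assumption, and the parameter names are invented for the example.

```python
import numpy as np

def inherit_from_teacher(student_params, teacher_params, prefixes=("neck.", "head.")):
    """Copy teacher tensors into the student where name and shape both match.

    Returns the list of parameter names that were inherited.
    """
    inherited = []
    for name, value in teacher_params.items():
        if not name.startswith(prefixes):
            continue                      # e.g. the backbones differ, skip them
        target = student_params.get(name)
        if target is not None and target.shape == value.shape:
            student_params[name] = value.copy()
            inherited.append(name)
    return inherited

teacher = {
    "backbone.w": np.ones((3, 3)),
    "neck.w": np.full((2, 2), 7.0),
    "head.w": np.full(2, 9.0),
}
student = {
    "backbone.w": np.zeros((2, 2)),
    "neck.w": np.zeros((2, 2)),
    "head.w": np.zeros(2),
}
copied = inherit_from_teacher(student, teacher)
```

Parameters whose shapes differ are left at their original initialization, which is why the strategy only applies when the two detectors share the same head structure.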

4.1 Main results

![Main results](https://img-blog.csdnimg.cn/02cd938c56db439ba7091968dcea5caa.png)

4.2 Ablation study

4.2.1 Sensitivity study of different losses

4.2.2 Sensitivity study of focal distillation

4.2.3 Sensitivity study of global distillation

4.2.4 Sensitivity study of T

4.2.5 Sensitivity study of hyper-parameters

[FGD] Focal and Global Knowledge Distillation for Detectors (CVPR 2022)