Libra R-CNN: Towards Balanced Learning for Object Detection

Paper information

Title: Libra R-CNN: Towards Balanced Learning for Object Detection
Authors: Jiangmiao Pang, Kai Chen, Jianping Shi, Huajun Feng, Wanli Ouyang, Dahua Lin
Affiliations: Zhejiang University, The Chinese University of Hong Kong, SenseTime Research, The University of Sydney
Venue: CVPR 2019
Date: 2019/04/04
Link: https://arxiv.org/abs/1904.02701
Code: https://github.com/open-mmlab/mmdetection (official code), https://github.com/OceanPang/Libra_R-CNN

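Background / Problem

Most current detectors (both one-stage and two-stage) follow the same training paradigm:

sampling regions
extracting features from the regions
jointly recognizing the category and refining the location with a multi-task objective function

Three key aspects of this training paradigm:

whether the selected region samples are representative
whether the extracted visual features are fully utilized
whether the designed objective function is optimal

As shown in the figure above, three kinds of imbalance that hurt performance arise during training, one for each key aspect: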
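Sample level
Motivation: hard examples can effectively improve detector performance, but randomly sampled examples are usually dominated by easy examples.
-------- Existing methods and their problems:
OHEM: selects hard samples by confidence, but it is sensitive to noisy labels and incurs considerable memory and computation cost.
Focal loss: tackles the foreground-background class imbalance of one-stage detectors. It is only suited to one-stage detectors and helps little for R-CNN, because most easy negatives are already filtered out by the two-stage procedure.
Feature level
Motivation: high-resolution low-level feature maps localize precisely but are hard to classify from, while high-level features carry semantic information that classifies accurately but mainly suits large objects. Using both the descriptive information of shallow layers and the semantic information of deep layers benefits object detection, so what is the best way to integrate the two?
-------- Existing methods and their problems:
FPN and PANet: their sequential manner makes the integrated features focus more on adjacent levels and less on the others, and the semantic information of non-adjacent levels is diluted once at every fusion step.
Objective level
Motivation:
① During training, the small gradients produced by easy samples may be overwhelmed by the large gradients produced by hard examples, which hurts model performance.
② Detection involves two tasks, classification and localization. If they are not properly balanced, one of them may be compromised, degrading overall performance.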

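Method / Contributions

Three components are proposed, each reducing one of the three levels of imbalance above:

IoU-balanced sampling: mines hard examples according to the IoU between each sample and the GT it is assigned to.
Balanced feature pyramid: a feature pyramid variant that aggregates the features of all levels at once (not only adjacent levels) and uses the result to strengthen every level.
Balanced L1 loss: outliers have large gradients while inliers have small ones, so the large gradients of outliers (samples with regression error ≥ 1) are clipped, as smooth L1 already does with a maximum of 1, and the gradients of inliers (samples with regression error < 1, i.e. accurate samples) are boosted, balancing classification, overall localization and accurate localization.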

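Performance

On COCO, AP is 2.5 points higher than FPN Faster R-CNN and 2.0 points higher than RetinaNet.
Built on FPN Faster R-CNN with the 1× schedule from Detectron, Libra R-CNN reaches 38.7 AP with a ResNet-50 backbone and 43.0 AP with ResNeXt-101-64x4d.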

Method details

IoU-balanced Sampling

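Hard negatives: in object detection the GT boxes are known, and the algorithm generates a set of proposals, some of which overlap the GT boxes heavily and some barely. Proposals whose overlap (IoU) with a GT box exceeds a threshold (usually 0.5) are labeled positive examples, those below it negative examples, and both are fed to the network for training. However, positive examples are far fewer than negative ones, so the resulting classifier is of limited quality and produces many false positives. The high-scoring false positives are the so-called hard negative examples, e.g. a false detection with confidence 0.9.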
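The IoU-based labeling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the paper or mmdetection: `box_iou` and `label_proposals` are hypothetical helper names, boxes are assumed to be in (x1, y1, x2, y2) format, and each proposal is assumed to be matched to the GT box with which it has the highest IoU.

```python
import numpy as np

def box_iou(boxes_a, boxes_b):
    """Pairwise IoU between (N, 4) and (M, 4) boxes in (x1, y1, x2, y2) format."""
    a = np.asarray(boxes_a, dtype=float)
    b = np.asarray(boxes_b, dtype=float)
    # Intersection rectangle for every (proposal, GT) pair via broadcasting.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def label_proposals(proposals, gt_boxes, pos_thr=0.5):
    """Assign each proposal to its best GT; positive if that IoU >= pos_thr."""
    ious = box_iou(proposals, gt_boxes)
    max_iou = ious.max(axis=1)          # IoU with the assigned GT
    assigned_gt = ious.argmax(axis=1)   # index of the assigned GT
    is_positive = max_iou >= pos_thr    # below the threshold -> negative
    return max_iou, assigned_gt, is_positive
```

Among the negatives returned here, the ones the classifier still scores highly are the hard negatives; random sampling tends to pick mostly the near-zero-IoU easy ones.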
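Goal: hard negative examples are the main problem, so we want to sample more of them.
Random sampling:

① The orange bars show that more than 60% of hard negative examples have an overlap greater than 0.05 with their corresponding GT box.
② The blue bars show that only slightly more than 30% of randomly sampled training examples have an overlap greater than 0.05 with their corresponding GT box.
③ The distribution of randomly sampled training examples therefore differs from the true distribution of hard negatives: there is only about one hard example for every thousand or so easy ones.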
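IoU-balanced sampling:

As the green bars in Figure 3 show, IoU-balanced sampling brings the distribution of training samples close to the true distribution of hard negatives. K (the number of IoU bins) defaults to 3, and experiments show that performance is not sensitive to K as long as negatives with higher IoU are more likely to be selected.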
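In the paper, N negatives are drawn from M candidates by splitting the IoU range evenly into K bins and taking N/K samples uniformly from each bin, so a candidate in bin k is selected with probability p_k = (N/K) · (1/M_k), where M_k is the number of candidates in that bin. A rough NumPy sketch follows. It is not the official implementation: `iou_balanced_neg_sample` is a hypothetical name, the bins are assumed to split [0, 0.5) evenly, and bins with fewer than N/K candidates are assumed to be topped up by random sampling from the leftovers (the note does not say how that case is handled).

```python
import numpy as np

def iou_balanced_neg_sample(neg_ious, num_samples, num_bins=3, neg_thr=0.5, rng=None):
    """Hypothetical helper: draw negatives so each IoU bin contributes ~N/K samples.

    neg_ious: IoU of every negative candidate with its assigned GT (all < neg_thr).
    Returns sorted indices into neg_ious.
    """
    rng = np.random.default_rng() if rng is None else rng
    neg_ious = np.asarray(neg_ious, dtype=float)
    if num_samples >= len(neg_ious):
        return np.arange(len(neg_ious))   # not enough candidates: keep them all
    per_bin = num_samples // num_bins     # N / K, rounded down
    edges = np.linspace(0.0, neg_thr, num_bins + 1)  # K equal IoU bins on [0, neg_thr)
    chosen = []
    for k in range(num_bins):
        in_bin = np.flatnonzero((neg_ious >= edges[k]) & (neg_ious < edges[k + 1]))
        take = min(per_bin, len(in_bin))
        if take > 0:
            # Uniform choice inside bin k: probability (N/K) / M_k when the bin is large enough.
            chosen.extend(rng.choice(in_bin, size=take, replace=False).tolist())
    # Assumption: fill any shortfall (sparse bins, N not divisible by K) at random.
    shortfall = num_samples - len(chosen)
    if shortfall > 0:
        remaining = np.setdiff1d(np.arange(len(neg_ious)), chosen)
        chosen.extend(rng.choice(remaining, size=shortfall, replace=False).tolist())
    return np.array(sorted(chosen))
```

For example, with 100 candidates at IoU 0.01 and 10 each at IoU 0.2 and 0.4, asking for 30 samples takes 10 from each bin, so all 20 harder negatives are kept, whereas uniform random sampling would keep about 5 of them on average.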
How positive examples are handled:
In principle IoU-balanced sampling also applies to hard positive examples, but in practice there are usually not enough sampling candidates to extend it to positives, so as an alternative the paper samples an equal number of positive samples for each GT box.
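Sampling the same number of positives per GT box can be sketched in the same style. This is an illustrative assumption rather than the official implementation: `balanced_pos_sample` is a hypothetical helper, the per-GT quota is taken as num_samples // number_of_GTs, and any shortfall is filled at random.

```python
import numpy as np

def balanced_pos_sample(assigned_gt, num_samples, rng=None):
    """Hypothetical helper: sample positives so every GT box gets the same quota.

    assigned_gt: GT index of each positive candidate.
    Returns sorted indices into assigned_gt.
    """
    rng = np.random.default_rng() if rng is None else rng
    assigned_gt = np.asarray(assigned_gt)
    if num_samples >= len(assigned_gt):
        return np.arange(len(assigned_gt))   # keep every positive
    gt_ids = np.unique(assigned_gt)
    per_gt = num_samples // len(gt_ids)      # equal quota per GT box
    chosen = []
    for g in gt_ids:
        cands = np.flatnonzero(assigned_gt == g)
        take = min(per_gt, len(cands))
        if take > 0:
            chosen.extend(rng.choice(cands, size=take, replace=False).tolist())
    # Assumption: top up at random when some GTs have fewer candidates than the quota.
    shortfall = num_samples - len(chosen)
    if shortfall > 0:
        remaining = np.setdiff1d(np.arange(len(assigned_gt)), chosen)
        chosen.extend(rng.choice(remaining, size=shortfall, replace=False).tolist())
    return np.array(sorted(chosen))
```

The point of the per-GT quota is that a large object covered by many proposals no longer dominates the positive set: with 50, 4 and 6 candidates for three GT boxes and 12 samples requested, each box contributes 4.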
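Balanced Feature Pyramid
Core idea: aggregate the features of all levels at once and use the result to strengthen each level.

① Integrate:

1 First, rescale the features of every level to the size of C4, using interpolation to enlarge smaller maps and max pooling to shrink larger ones.
2 Then take the average of the rescaled features as the integrated feature.
② Refine:
1 Refine the integrated feature with the embedded Gaussian non-local attention from the Non-local Neural Networks paper.
2 Then rescale the refined feature back to the scale of each level with the reverse procedure (interpolation and max pooling) and use it to strengthen the original features, which gives outputs of the same form as FPN.
3 Balanced Feature Pyramid is compatible with both FPN and PAFPN (PANet).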
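The integrate, refine and rescale steps above can be put together in a small NumPy sketch. Assumptions, all hypothetical: feature maps are square (C, H, W) arrays whose sizes differ by powers of two; nearest-neighbour upsampling stands in for "interpolation"; the integrated feature is the plain average C = (1/L) Σ_l C_l over the L rescaled levels; the learned 1×1 convolutions of the non-local block are replaced by weight matrices passed in by the caller; and "strengthen" is modeled as adding the rescaled feature back to each original level. Function names are illustrative only.

```python
import numpy as np

def max_pool(x, k):
    """Non-overlapping k x k max pooling on a (C, H, W) map; H and W divisible by k."""
    c, h, w = x.shape
    return x.reshape(c, h // k, k, w // k, k).max(axis=(2, 4))

def upsample_nearest(x, k):
    """Nearest-neighbour upsampling of a (C, H, W) map by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def resize_to(x, size):
    """Max-pool larger maps down and upsample smaller maps up to size x size."""
    h = x.shape[1]
    if h > size:
        return max_pool(x, h // size)
    if h < size:
        return upsample_nearest(x, size // h)
    return x

def embedded_gaussian_nonlocal(x, w_theta, w_phi, w_g, w_out):
    """Embedded Gaussian non-local block with 1x1 convs written as channel matrices."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                     # (C, N), N = H * W positions
    theta, phi, g = w_theta @ flat, w_phi @ flat, w_g @ flat
    logits = theta.T @ phi                         # (N, N) pairwise affinities
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)        # row i: weights over positions j
    y = g @ attn.T                                 # y[:, i] = sum_j attn[i, j] * g[:, j]
    return x + (w_out @ y).reshape(c, h, w)        # residual connection

def balanced_feature_pyramid(feats, weights, ref_level=2):
    """feats: [C2, C3, C4, C5] as (C, H, W) arrays; ref_level=2 picks C4's size."""
    size = feats[ref_level].shape[1]
    # Integrate: rescale every level to C4's size, then average over the L levels.
    integrated = np.mean([resize_to(f, size) for f in feats], axis=0)
    # Refine: non-local attention on the integrated feature.
    refined = embedded_gaussian_nonlocal(integrated, *weights)
    # Rescale back to every level and strengthen the original features (residual add).
    return [f + resize_to(refined, f.shape[1]) for f in feats]
```

With C2 to C5 of sizes 32, 16, 8 and 4, the outputs keep exactly those shapes, which is why the module can be dropped into an FPN-style pipeline. Because every level receives the same averaged-and-refined feature, information from non-adjacent levels is not diluted by a chain of pairwise fusions.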

Balanced L1 Loss:
Smooth L1 loss: smooth L1 loss is a localization loss introduced in Fast R-CNN:
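For reference, smooth L1 loss as defined in Fast R-CNN and its gradient, where x is the regression error. The gradient magnitude is |x| for inliers but a constant 1 for outliers, which is the asymmetry analysed in the lines that follow.

```latex
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2, & |x| < 1 \\
|x| - 0.5, & \text{otherwise}
\end{cases}
\qquad
\frac{\partial\, \mathrm{smooth}_{L_1}}{\partial x} =
\begin{cases}
x, & |x| < 1 \\
\operatorname{sign}(x), & \text{otherwise}
\end{cases}
```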

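The problem: outliers (which can be regarded as hard examples) have large gradients, while inliers (which can be regarded as easy examples) have small gradients.
Detailed analysis:
Samples with error ≥ 1 are called outliers; samples with error < 1 are called inliers.
Because regression targets are unbounded, directly raising the weight of the localization loss makes the model more sensitive to outliers.
These outliers, which can be regarded as hard examples, produce excessively large gradients that are harmful to training.
Compared with outliers, inliers can be regarded as easy examples and contribute little to the overall gradient.
Quantitatively, inliers contribute on average only 30% of the gradient per sample compared with outliers.
Effect of balanced L1 loss:

Core idea of balanced L1 loss:
Outliers have large gradients and inliers small ones. Starting from smooth L1 loss, the large gradients produced by outliers (error ≥ 1) are clipped to a maximum of 1 (dashed lines in Figure 5(a)); balanced L1 loss then boosts the gradients of inliers (error < 1, i.e. accurate samples), so that classification, overall localization and accurate localization are balanced.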
Localization loss using balanced L1 loss:
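In the paper's notation, p is the predicted class distribution, u the ground-truth class, t^u the regression output for class u, v the regression target, and λ the weight of the localization term. The multi-task loss and the localization loss are:

```latex
\begin{aligned}
L_{p,u,t^u,v} &= L_{cls}(p, u) + \lambda\,[u \ge 1]\, L_{loc}(t^u, v) \\
L_{loc} &= \sum_{i \in \{x, y, w, h\}} L_b\left(t_i^u - v_i\right)
\end{aligned}
```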

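That is, the sum of the balanced L1 losses over x, y, w and h: the balanced L1 loss of x + the balanced L1 loss of y + the balanced L1 loss of w + the balanced L1 loss of h.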
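From the equation above, the gradient of the localization loss with respect to the model parameters is proportional to the gradient of the balanced L1 loss with respect to the regression error of each coordinate (x, y, w or h), i.e.: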
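Written out as in the paper, with w denoting the network parameters and x = t_i^u − v_i the regression error of coordinate i:

```latex
\frac{\partial L_{loc}}{\partial w} \propto \frac{\partial L_b}{\partial t_i^u} \propto \frac{\partial L_b}{\partial x},
\qquad x = t_i^u - v_i
```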

Gradient design of balanced L1 loss:
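The designed gradient, given in the paper as a magnitude (the sign of x is omitted), together with the constraint that links α, b and γ:

```latex
\frac{\partial L_b}{\partial x} =
\begin{cases}
\alpha \ln\left(b|x| + 1\right), & |x| < 1 \\
\gamma, & \text{otherwise}
\end{cases}
\qquad \text{with } \alpha \ln(b + 1) = \gamma
```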

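The parameter α controls the gradient of inliers: the smaller α is, the larger the inlier gradient;
The parameter γ tunes the upper bound of regression errors, which helps balance the classification and localization tasks;
The parameter b handles the case |x| = 1: it is fixed by α ln(b + 1) = γ so that both branches agree at |x| = 1.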
Definition of balanced L1 loss:
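Integrating the gradient gives the loss as written in the paper. The constant C is not spelled out in this note; choosing C = γ/b − α makes the two branches meet at |x| = 1, since α ln(b + 1) = γ implies (α/b)(b + 1) ln(b + 1) − α = γ + γ/b − α. The paper's defaults are α = 0.5 and γ = 1.5.

```latex
L_b(x) =
\begin{cases}
\dfrac{\alpha}{b}\left(b|x| + 1\right)\ln\left(b|x| + 1\right) - \alpha|x|, & |x| < 1 \\
\gamma|x| + C, & \text{otherwise}
\end{cases}
\qquad \alpha \ln(b + 1) = \gamma
```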

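A NumPy sketch of balanced L1 loss, its gradient, and the smooth L1 gradient for comparison. Function names are hypothetical; α = 0.5 and γ = 1.5 are the defaults reported in the paper, b is derived from α ln(b + 1) = γ, and the outlier constant C = γ/b − α is this sketch's choice to keep the loss continuous at |x| = 1.

```python
import numpy as np

def balanced_l1(x, alpha=0.5, gamma=1.5):
    """Element-wise balanced L1 loss L_b(x) for regression error x."""
    ax = np.abs(np.asarray(x, dtype=float))
    b = np.exp(gamma / alpha) - 1.0            # from alpha * ln(b + 1) = gamma
    inlier = alpha / b * (b * ax + 1) * np.log(b * ax + 1) - alpha * ax
    outlier = gamma * ax + gamma / b - alpha   # C = gamma / b - alpha (continuity at |x| = 1)
    return np.where(ax < 1, inlier, outlier)

def balanced_l1_grad(x, alpha=0.5, gamma=1.5):
    """dL_b/dx: sign(x) * alpha * ln(b|x| + 1) for inliers, sign(x) * gamma for outliers."""
    x = np.asarray(x, dtype=float)
    b = np.exp(gamma / alpha) - 1.0
    mag = np.where(np.abs(x) < 1, alpha * np.log(b * np.abs(x) + 1), gamma)
    return np.sign(x) * mag

def smooth_l1_grad(x):
    """Gradient of Fast R-CNN's smooth L1 loss, for comparison."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, x, np.sign(x))

def localization_loss(t, v, alpha=0.5, gamma=1.5):
    """Sum of balanced L1 over the (x, y, w, h) offsets along the last axis."""
    return balanced_l1(np.asarray(t) - np.asarray(v), alpha, gamma).sum(axis=-1)
```

For instance, at x = 0.1 the balanced L1 gradient is about 0.53 versus 0.1 for smooth L1, which is the boosted inlier gradient the design aims for, while for |x| ≥ 1 the gradient stays at the constant γ.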
Experiments:
