A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection

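The paper first points out the problems Faster RCNN has with small-object detection and analyzes their causes. It then proposes two remedies: 1) extract candidate regions (proposals) on feature maps of different scales, and 2) upsample the feature maps used for detection.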

Reported speed: the MS-CNN achieves speeds of 10 fps on KITTI (1250×375) and 15 fps on Caltech (640×480) images.

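First, consider the problem with the RPN in Faster-RCNN.
How does the RPN extract candidate regions? It slides a fixed set of filters over a fixed set of convolutional feature maps: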
the RPN generates proposals of multiple scales by sliding a fixed set of filters over a fixed set of convolutional feature maps.

This creates an inconsistency between the sizes of objects, which are variable, and filter receptive fields, which are fixed.
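
To make the mismatch concrete, the sketch below computes the receptive field of a small stack of convolution and pooling layers using the standard recurrence (each layer adds `(kernel - 1)` times the accumulated stride). The layer list `TOY_STACK` is a hypothetical example chosen for illustration, not the network used in the paper.

```python
def receptive_field(layers):
    """Return (receptive_field, stride) in input pixels for a layer stack.

    Each layer is a (kernel_size, stride) pair. Padding shifts the field
    but does not change its size, so it is ignored here.
    """
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump   # growth is scaled by the stride so far
        jump *= stride              # distance between neighbouring outputs
    return rf, jump


# Hypothetical stack: two 3x3 convs, 2x2 pool, two 3x3 convs, 2x2 pool.
TOY_STACK = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]

rf, stride = receptive_field(TOY_STACK)
for object_height in (20, 80, 320):
    # The receptive field is the same for every object, whatever its size.
    print(object_height, rf, round(object_height / rf, 2))
```

The receptive field stays constant while the object height varies by more than an order of magnitude, which is the inconsistency the quoted sentence describes.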

For object detection, the paper proposes a unified multi-scale deep CNN, denoted the multi-scale CNN (MS-CNN).
It has two main parts: an object proposal network and an accurate detection network.
3 Multi-scale Object Proposal Network
3.1 Multi-scale Detection

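(a) Single classifier, multi-scale input images: the highest detection accuracy, but very expensive to compute.
(b) Multiple classifiers, single-scale input image: more efficient, but less accurate.
(c) A compromise between (a) and (b): a few classifiers and a few input image scales.
(d) Synthesized multi-scale feature maps, single classifier.
(e) The RCNN approach: the feature maps of candidate regions are normalized to a common size.
(f) The RPN approach: multiple anchor templates.
(g) The multi-scale strategy proposed in this paper.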
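
One way to picture proposing on feature maps of different scales is to let each output branch handle the objects closest to its own nominal size. The sketch below is only an illustration of that idea: the branch names and nominal heights in `BRANCH_SIZES` are invented for the example and are not taken from the paper, and the log-ratio rule is an assumption, not the paper's training procedure.

```python
import math

# Hypothetical branches and the object height (in pixels) each one targets.
BRANCH_SIZES = {"fine": 40, "medium": 80, "coarse": 160}


def assign_branch(box_height, branch_sizes=BRANCH_SIZES):
    """Pick the branch whose nominal size is closest in log-scale."""
    return min(
        branch_sizes,
        key=lambda name: abs(math.log(box_height / branch_sizes[name])),
    )


for height in (30, 50, 120, 300):
    print(height, assign_branch(height))
```

Comparing sizes in log-scale treats "twice as large" and "half as large" symmetrically, which is a common choice when matching objects to scale levels.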



4 Object Detection Network
The detection network uses a deconvolution layer to upsample the feature map.
To the best of our knowledge, this is the first application of deconvolution to jointly improve the speed and accuracy of an object detector.
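
As a rough illustration of how a deconvolution (transposed convolution) enlarges a feature map, here is a minimal one-dimensional sketch in plain Python. It assumes no padding and a fixed, hand-picked kernel; in a detector the kernel weights would be learned, and the function name `deconv1d` is made up for this example.

```python
def deconv1d(signal, kernel, stride=2):
    """1-D transposed convolution without padding.

    Each input value is multiplied by the kernel and added into the output
    at offset i * stride, so the output has (n - 1) * stride + k samples.
    """
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, value in enumerate(signal):
        for k, weight in enumerate(kernel):
            out[i * stride + k] += value * weight
    return out


# A [0.5, 1, 0.5] kernel with stride 2 behaves like linear interpolation.
print(deconv1d([1.0, 3.0], [0.5, 1.0, 0.5], stride=2))
```

With stride 2 the output is roughly twice as long as the input, so small objects cover more feature-map cells without enlarging the input image.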

This structure also uses context, that is, a ring surrounding the candidate region. The context region is 1.5 times larger than the object region.
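
A minimal sketch of how such a context box could be computed, assuming that "1.5 times larger" means 1.5× the width and height around the same center, and that the box is clipped to the image. Both the interpretation and the helper name `context_box` are assumptions made for this example.

```python
def context_box(box, scale=1.5, image_size=None):
    """Enlarge an (x1, y1, x2, y2) box about its center by `scale`.

    If image_size = (width, height) is given, the result is clipped to it.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    ctx = [cx - half_w, cy - half_h, cx + half_w, cy + half_h]
    if image_size is not None:
        width, height = image_size
        ctx = [
            max(0.0, ctx[0]),
            max(0.0, ctx[1]),
            min(float(width), ctx[2]),
            min(float(height), ctx[3]),
        ]
    return tuple(ctx)


# Object region of 100 x 80 pixels inside a 1250 x 375 image.
print(context_box((100, 100, 200, 180), image_size=(1250, 375)))
```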

5 Experimental Evaluation

KITTI benchmark test set


