ROI Pooling

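In object detection, the ROIs produced by the region proposal stage vary in size, while the classification network requires a fixed-size input. ROI Pooling bridges the two, which lets the network run end to end.
The figure below shows a feature map; the black box marks a generated ROI, which ROI Pooling must reduce to an output of size 2x2.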

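The ROI Pooling operation is simple and works as follows:
The box has width W = 7 and height H = 5. Splitting it into 2x2 sub-regions puts the bottom-right corner of the top-left sub-region at (x, y) = (W/2, 3 + H/2) = (3.5, 5.5); the formula implies that the top-left corner of the box sits at (0, 3). Because feature-map coordinates are integers, ROI Pooling simply drops the fractional part, giving (3, 5). In this sense ROI Pooling follows the idea of nearest-neighbor interpolation.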
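The arithmetic above can be checked with a minimal sketch. The helper name split_points is hypothetical, and the ROI's top-left corner (0, 3) is an assumption read off the formula (W/2, 3 + H/2) rather than something stated explicitly in the text.

```python
def split_points(start, length, n_bins):
    """Return n_bins + 1 integer boundaries for a span of `length` cells
    beginning at `start`.

    Fractional boundaries are truncated with int(), mirroring the
    (3.5, 5.5) -> (3, 5) step described in the text.
    """
    return [int(start + i * length / n_bins) for i in range(n_bins + 1)]


# ROI from the text: width 7, height 5, 2x2 output.
# Top-left corner (0, 3) is assumed from the formula (W/2, 3 + H/2).
x_edges = split_points(0, 7, 2)
y_edges = split_points(3, 5, 2)
print(x_edges)  # [0, 3, 7]
print(y_edges)  # [3, 5, 8]
```

Under these assumptions the four sub-regions are unequal in size: columns [0, 3) and [3, 7), rows [3, 5) and [5, 8).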

Max pooling is then applied to each sub-region to produce the output.
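As a rough illustration of the whole procedure, the sketch below pools a toy feature map in plain Python. The function name roi_max_pool and the feature-map values are made up for this example, and the ROI is assumed to be at least as large as the output in each dimension, so no sub-region is empty.

```python
def roi_max_pool(feature_map, x, y, w, h, out_size):
    """Max-pool the ROI with top-left (x, y), width w, height h
    down to an out_size x out_size grid."""
    # Integer bin edges; fractional split points are truncated with int().
    xs = [int(x + i * w / out_size) for i in range(out_size + 1)]
    ys = [int(y + j * h / out_size) for j in range(out_size + 1)]
    pooled = []
    for j in range(out_size):
        row = []
        for i in range(out_size):
            # Collect every cell that falls inside sub-region (j, i).
            cells = [feature_map[r][c]
                     for r in range(ys[j], ys[j + 1])
                     for c in range(xs[i], xs[i + 1])]
            row.append(max(cells))
        pooled.append(row)
    return pooled


# Toy 8x8 feature map (made-up values): cell (row r, col c) holds 10 * r + c.
fmap = [[10 * r + c for c in range(8)] for r in range(8)]
print(roi_max_pool(fmap, x=0, y=3, w=7, h=5, out_size=2))
# [[42, 46], [72, 76]]
```

Each output value is the maximum of one sub-region, so a 7x5 ROI and a 3x9 ROI would both come out as 2x2.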

The RoiPoolingConv layer in the Keras version of Faster R-CNN is implemented as follows:

# Imports are not shown in the original listing; these are assumed and
# target legacy standalone Keras, where K.image_dim_ordering() still exists.
from keras.engine.topology import Layer
import keras.backend as K

if K.backend() == 'tensorflow':
    import tensorflow as tf


class RoiPoolingConv(Layer):
    '''ROI pooling layer for 2D inputs.
    See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
    K. He, X. Zhang, S. Ren, J. Sun
    # Arguments
        pool_size: int
            Size of pooling region to use. pool_size = 7 will result in a 7x7 region.
        num_rois: number of regions of interest to be used
    # Input shape
        list of two tensors [X_img, X_roi] with shape:
        X_img:
        `(1, channels, rows, cols)` if dim_ordering='th'
        or 4D tensor with shape:
        `(1, rows, cols, channels)` if dim_ordering='tf'.
        X_roi:
        `(1, num_rois, 4)` list of rois, with ordering (x, y, w, h)
    # Output shape
        5D tensor with shape:
        `(1, num_rois, channels, pool_size, pool_size)`
    '''

    def __init__(self, pool_size, num_rois, **kwargs):
        self.dim_ordering = K.image_dim_ordering()
        assert self.dim_ordering in {'tf', 'th'}, 'dim_ordering must be in {tf, th}'
        self.pool_size = pool_size
        self.num_rois = num_rois
        super(RoiPoolingConv, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.dim_ordering == 'th':
            self.nb_channels = input_shape[0][1]
        elif self.dim_ordering == 'tf':
            self.nb_channels = input_shape[0][3]

    def compute_output_shape(self, input_shape):
        if self.dim_ordering == 'th':
            return None, self.num_rois, self.nb_channels, self.pool_size, self.pool_size
        else:
            return None, self.num_rois, self.pool_size, self.pool_size, self.nb_channels

    # The main functionality is implemented in call
    def call(self, x, mask=None):
        assert(len(x) == 2)
        img = x[0]
        rois = x[1]
        input_shape = K.shape(img)
        outputs = []

        # Read the coordinates of each ROI
        for roi_idx in range(self.num_rois):
            x = rois[0, roi_idx, 0]
            y = rois[0, roi_idx, 1]
            w = rois[0, roi_idx, 2]
            h = rois[0, roi_idx, 3]

            # If the output is 7x7, pool_size = 7 and w / 7 is the width
            # of each sub-rectangle the ROI is split into
            row_length = w / float(self.pool_size)
            col_length = h / float(self.pool_size)
            num_pool_regions = self.pool_size

            # NOTE: the RoiPooling implementation differs between theano and
            # tensorflow due to the lack of a resize op in theano. The theano
            # implementation is much less efficient and leads to long compile times
            if self.dim_ordering == 'th':
                for jy in range(num_pool_regions):
                    for ix in range(num_pool_regions):
                        # Coordinates of the sub-rectangle
                        x1 = x + ix * row_length
                        x2 = x1 + row_length
                        y1 = y + jy * col_length
                        y2 = y1 + col_length

                        # Cast the coordinates to int32
                        x1 = K.cast(x1, 'int32')
                        x2 = K.cast(x2, 'int32')
                        y1 = K.cast(y1, 'int32')
                        y2 = K.cast(y2, 'int32')

                        # Make sure each sub-rectangle is at least 1 cell wide and high
                        x2 = x1 + K.maximum(1, x2 - x1)
                        y2 = y1 + K.maximum(1, y2 - y1)

                        new_shape = [input_shape[0], input_shape[1],
                                     y2 - y1, x2 - x1]
                        x_crop = img[:, :, y1:y2, x1:x2]
                        xm = K.reshape(x_crop, new_shape)
                        # Max over the sub-rectangle, i.e. max pooling
                        pooled_val = K.max(xm, axis=(2, 3))
                        outputs.append(pooled_val)

            elif self.dim_ordering == 'tf':
                x = K.cast(x, 'int32')
                y = K.cast(y, 'int32')
                w = K.cast(w, 'int32')
                h = K.cast(h, 'int32')

                # Resizes the crop to pool_size x pool_size; this is an image
                # resize (bilinear by default), not a per-region max
                rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
                outputs.append(rs)

        final_output = K.concatenate(outputs, axis=0)
        final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))

        if self.dim_ordering == 'th':
            final_output = K.permute_dimensions(final_output, (0, 1, 4, 2, 3))
        else:
            final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))

        return final_output

    def get_config(self):
        config = {'pool_size': self.pool_size,
                  'num_rois': self.num_rois}
        base_config = super(RoiPoolingConv, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))
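To see what the 'th' branch does with the column coordinates, here is a plain-Python sketch of the same arithmetic along one axis. The helper name th_branch_bins is hypothetical, and int() stands in for K.cast(..., 'int32') on the assumption that coordinates are non-negative, where both truncate the same way.

```python
def th_branch_bins(x, w, pool_size):
    """Column ranges [x1, x2) for each pooling cell, following the
    arithmetic of the 'th' branch along the x axis."""
    row_length = w / float(pool_size)
    bins = []
    for ix in range(pool_size):
        x1_f = x + ix * row_length
        x2_f = x1_f + row_length
        x1 = int(x1_f)               # stands in for K.cast(x1, 'int32')
        x2 = int(x2_f)
        x2 = x1 + max(1, x2 - x1)    # guard: at least one column per cell
        bins.append((x1, x2))
    return bins


print(th_branch_bins(x=0, w=7, pool_size=2))
# [(0, 3), (3, 7)]
print(th_branch_bins(x=2, w=3, pool_size=4))
# [(2, 3), (2, 3), (3, 4), (4, 5)]
```

The second call shows why the maximum(1, ...) guard matters: when the ROI is narrower than pool_size, truncation alone would produce an empty range for the first cell, so neighboring cells end up sharing a column instead.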

In vgg.py, the classifier layer calls out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois]), so the input argument x of call in RoiPoolingConv corresponds to [base_layers, input_rois], where base_layers is the feature map.
