II. Changes to the Training Stage

The changes in this post follow the faster_rcnn_end_to_end training pipeline; I have not tried the alt_opt mode, but it should be similar. During training we call train.py, which parses train.prototxt directly, and its first step is loading the data, using the roi_data_layer wrapper mentioned in part one.
So first open layer.py under the roi_data_layer folder and look for the places that need changing. Every layer is initialized first, so start with the setup function, whose main job here is to set the blob dimensions.
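For context, the data blob setup in the stock py-faster-rcnn layer.py looks roughly like this (paraphrased; line numbers are approximate):

    idx = 0
    self._name_to_top_map['data'] = idx
    # allocate the data blob using the largest scale the config allows
    top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
                     max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
    idx += 1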
The main change is to replace the line around line 95,
top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, max(cfg.TRAIN.SCALES), cfg.TRAIN.MAX_SIZE)
with
top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3, cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE)
This pins the height and width of the future data blob to a fixed value. You could define a new parameter for this in fast_rcnn/config.py, say train_target_size, but to save effort I simply reused the existing cfg.TRAIN.MAX_SIZE.
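If you prefer a dedicated parameter, a minimal sketch of the config.py addition could look like this (TRAIN.TARGET_SIZE is a hypothetical name, not part of the original config):

    # Hypothetical addition to fast_rcnn/config.py: a dedicated edge length
    # for the fixed-size input, instead of reusing cfg.TRAIN.MAX_SIZE.
    __C.TRAIN.TARGET_SIZE = 672

    # The setup() line above would then read:
    # top[idx].reshape(cfg.TRAIN.IMS_PER_BATCH, 3,
    #                  cfg.TRAIN.TARGET_SIZE, cfg.TRAIN.TARGET_SIZE)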
But we know that changing this alone does not actually take effect. Next the layer's forward function runs, which calls _get_next_minibatch to fetch the data, so open minibatch.py in the same folder.
First comes the get_minibatch() function that layer.py ultimately calls.
Here, change line 20:
random_scale_inds = npr.randint(0, high=len(cfg.TRAIN.SCALES), size=num_images)
to
random_scale_inds = npr.randint(0, high=1, size=num_images)
Since this was an experiment, I did not use the image pyramid (the paper's experiments did not use it either), so setting high to 1 is enough. This call actually draws num_images random integers from [0, high); here both num_images (RPN only allows one image per batch) and high are 1, so the result can only be 0.
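A quick standalone sanity check of that behavior:

    import numpy.random as npr

    # randint draws from [0, high), so with high=1 the only possible value
    # is 0: the single training image always gets scale index 0.
    print npr.randint(0, high=1, size=1)   # [0]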
From the code below, this means only one image is read per training step. Next, line 29 runs:
_get_image_blob(roidb, random_scale_inds)
which calls the _get_image_blob function at line 129:

def _get_image_blob(roidb, scale_inds):
    """Builds an input blob from the images in the roidb at the specified
    scales."""
    num_images = len(roidb)
    processed_ims = []
    im_scales = []
    for i in xrange(num_images):
        im = cv2.imread(roidb[i]['image'])
        if roidb[i]['flipped']:
            im = im[:, ::-1, :]
        target_size = cfg.TRAIN.SCALES[scale_inds[i]]
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, target_size,
                                        cfg.TRAIN.MAX_SIZE)
        im_scales.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, im_scales

As you can see, this function loops over num_images images (in practice just one) and returns the rescaled image together with the corresponding scale factor. So target_size must be set to the value we want; here I use cfg.TRAIN.MAX_SIZE. This brings us to the most important piece, the image-rescaling function prep_im_for_blob(). It lives in utils/blob.py and was shown in part one:

def prep_im_for_blob(im, pixel_means, target_size, max_size):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    im_scale = float(target_size) / float(im_size_min)
    # Prevent the biggest axis from being more than MAX_SIZE
    if np.round(im_scale * im_size_max) > max_size:
        im_scale = float(max_size) / float(im_size_max)
    im = cv2.resize(im, None, None, fx=im_scale, fy=im_scale,
                    interpolation=cv2.INTER_LINEAR)
    return im, im_scale

Rewrite it in full as:

def prep_im_for_blob(im, pixel_means, im_info):
    """Mean subtract and scale an image for use in a blob."""
    im = im.astype(np.float32, copy=False)
    im -= pixel_means
    im_shape = im.shape[0:2]
    # im_info = (target_h, target_w); divide element-wise by (h, w)
    fy_scale, fx_scale = im_info / im_shape
    im = cv2.resize(im, None, None, fx=fx_scale, fy=fy_scale,
                    interpolation=cv2.INTER_LINEAR)
    im_scales = np.array([fx_scale, fy_scale])
    return im, im_scales
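A quick worked example of the new per-axis scaling (standalone; assumes a 375×500 image and a 672×672 target):

    import numpy as np

    im_info = np.array([672., 672.])   # (target_h, target_w)
    im_shape = (375, 500)              # (h, w) of the original image
    fy_scale, fx_scale = im_info / im_shape
    print fy_scale, fx_scale           # 1.792 1.344: height and width scaled independently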

The idea is simple: to make the output height and width come out equal, I scale the original image's height and width by separate factors. The call site then has to change accordingly:

        im_info = np.array([target_size, target_size], dtype=np.float32)
        im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, im_info)

Here an im_info array replaces the original two scalar arguments, making the height and width targets identical. Keep in mind that im_scale is no longer a scalar but a two-element array.
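Putting the pieces together, the modified _get_image_blob in minibatch.py would look roughly like this (a sketch; only the target_size and prep_im_for_blob lines differ from the stock version):

    def _get_image_blob(roidb, scale_inds):
        """Builds an input blob from the images in the roidb at a fixed size."""
        num_images = len(roidb)
        processed_ims = []
        im_scales = []
        for i in xrange(num_images):
            im = cv2.imread(roidb[i]['image'])
            if roidb[i]['flipped']:
                im = im[:, ::-1, :]
            # fixed square target instead of cfg.TRAIN.SCALES[scale_inds[i]]
            target_size = cfg.TRAIN.MAX_SIZE
            im_info = np.array([target_size, target_size], dtype=np.float32)
            im, im_scale = prep_im_for_blob(im, cfg.PIXEL_MEANS, im_info)
            im_scales.append(im_scale)
            processed_ims.append(im)
        # Create a blob to hold the input images
        blob = im_list_to_blob(processed_ims)
        return blob, im_scales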
Back in get_minibatch(), the code that follows is:

    if cfg.TRAIN.HAS_RPN:
        assert len(im_scales) == 1, "Single batch only"
        assert len(roidb) == 1, "Single batch only"
        # gt boxes: (x1, y1, x2, y2, cls)
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        blobs['im_info'] = np.array(
            [[im_blob.shape[2], im_blob.shape[3], im_scales[0]]],
            dtype=np.float32)
    else: # not using RPN
        # Now, build the region of interest and label blobs

Change it to:

    if cfg.TRAIN.HAS_RPN:
        assert len(im_scales) == 1, "Single batch only"
        assert len(roidb) == 1, "Single batch only"
        # gt boxes: (x1, y1, x2, y2, cls)
        gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
        gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
        # scale x and y coordinates by their own factors
        gt_boxes[:, [0, 2]] = roidb[0]['boxes'][gt_inds][:, [0, 2]] * im_scales[0][0]
        gt_boxes[:, [1, 3]] = roidb[0]['boxes'][gt_inds][:, [1, 3]] * im_scales[0][1]
        gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
        blobs['gt_boxes'] = gt_boxes
        blobs['im_info'] = np.array([[cfg.TRAIN.MAX_SIZE, cfg.TRAIN.MAX_SIZE]],
                                    dtype=np.float32)
    else: # not using RPN
        # Now, build the region of interest and label blobs

Since this is faster rcnn, cfg.TRAIN.HAS_RPN is necessarily true (it is set in the yml file). The modification scales gt_boxes separately along x and y instead of by a single uniform factor, and it also resets the im_info shape: since every image now has the same size, there is no need to record as many values. The code after the else also deals with scales, namely rescaling the rois, but that is the non-RPN path, where selective search supplies the ROIs directly; I assume the author kept the original fast rcnn functionality when upgrading to faster rcnn. We can therefore leave that branch untouched, or adapt it along the same lines as above, as in the sketch below.
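If you did want to adapt that branch as well, the per-axis version of _project_im_rois in minibatch.py could look like this (a hypothetical sketch; the stock function multiplies by a single scalar scale):

    def _project_im_rois(im_rois, im_scale_factors):
        """Project image RoIs into the rescaled training image.
        im_scale_factors is now an (fx, fy) pair instead of a scalar."""
        rois = im_rois.astype(np.float32, copy=True)
        rois[:, [0, 2]] *= im_scale_factors[0]   # x1, x2 scaled by fx
        rois[:, [1, 3]] *= im_scale_factors[1]   # y1, y2 scaled by fy
        return rois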
Now return to layer.py: since we changed the im_info format, at line 89 change top[idx].reshape(1, 3) to top[idx].reshape(1, 2) (see the sketch below). With that, the network can continue training.
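In context, the relevant setup() lines in layer.py would end up as (a sketch; line numbers approximate):

    if cfg.TRAIN.HAS_RPN:
        self._name_to_top_map['im_info'] = idx
        # im_info is now just [[target_h, target_w]]: two values, not three
        top[idx].reshape(1, 2)
        idx += 1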
When training reaches the RPN stage, the anchor_target_layer and the proposal_layer both read im_info with:
im_info = bottom[2].data[0, :]
Note that only the first row is taken. In the original faster rcnn code, different images have different im_info values, so taking only the first row would clearly be wrong for a multi-image batch; that is why the code forces ims_per_batch to 1 for faster rcnn, leaving exactly one image. As we saw in minibatch.py, im_info used to be a 1×3 array and is now 1×2, so in proposal_layer.py at line 126 the original filtering step
keep = _filter_boxes(proposals, min_size * im_info[2])
which drops proposals whose sides are shorter than the configured minimum length, can simply become
keep = _filter_boxes(proposals, min_size)
(there is some deviation, since min_size is no longer converted to the input scale, but the parameters stay simple).
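In context, that neighborhood of proposal_layer.py ends up as (a sketch after the edit):

    # 3. remove predicted boxes with either height or width < threshold
    # im_info no longer stores a scale factor, so min_size is applied
    # directly in input-image pixels:
    keep = _filter_boxes(proposals, min_size)
    proposals = proposals[keep, :]
    scores = scores[keep]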
With that, the training-stage changes work; a screenshot for the record:

You can see that every image has been resized to 672×672, and after the 16× downsampling the feature map is 42×42 (672 / 16 = 42).
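To verify those shapes programmatically rather than from a screenshot, a throwaway debug print (hypothetical, not in the repo) inside RoIDataLayer.forward() in layer.py would do:

    blobs = self._get_next_minibatch()
    # with the changes above this prints (1, 3, 672, 672) and 42
    print 'data blob shape:', blobs['data'].shape
    print 'feature map side:', blobs['data'].shape[2] / 16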
For the test stage, only one file, test.py, needs to change; the full file is attached below.

# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

"""Test a Fast R-CNN network on an imdb (image database)."""

from fast_rcnn.config import cfg, get_output_dir
from fast_rcnn.bbox_transform import clip_boxes, bbox_transform_inv
import argparse
from utils.timer import Timer
import numpy as np
import cv2
import caffe
from fast_rcnn.nms_wrapper import nms
import cPickle
from utils.blob import im_list_to_blob
import os


def _get_image_blob(im, target_size):
    """Converts an image into a network input.

    Arguments:
        im (ndarray): a color image in BGR order

    Returns:
        blob (ndarray): a data blob holding an image pyramid
        im_scale_factors (list): list of image scales (relative to im) used
            in the image pyramid
    """
    # The original multi-scale implementation, kept for reference:
    '''
    im_orig = im.astype(np.float32, copy=True)
    im_orig -= cfg.PIXEL_MEANS
    im_shape = im_orig.shape
    im_size_min = np.min(im_shape[0:2])
    im_size_max = np.max(im_shape[0:2])
    processed_ims = []
    im_scale_factors = []
    for target_size in cfg.TEST.SCALES:
        im_scale = float(target_size) / float(im_size_min)
        # Prevent the biggest axis from being more than MAX_SIZE
        if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
            im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
        im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
                        interpolation=cv2.INTER_LINEAR)
        im_scale_factors.append(im_scale)
        processed_ims.append(im)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, np.array(im_scale_factors)
    '''
    processed_ims = []
    im_scale_factors = []
    im = im.astype(np.float32, copy=False)
    im = im - cfg.PIXEL_MEANS
    im_shape = im.shape[0:2]
    # scale width and height independently to the fixed square target
    im_scale = np.hstack([float(target_size) / im_shape[1],
                          float(target_size) / im_shape[0]])
    im = cv2.resize(im, None, None, fx=float(target_size) / im_shape[1],
                    fy=float(target_size) / im_shape[0],
                    interpolation=cv2.INTER_LINEAR)
    processed_ims.append(im)
    im_scale_factors.append(im_scale)
    # Create a blob to hold the input images
    blob = im_list_to_blob(processed_ims)
    return blob, np.array(im_scale_factors)


def _get_rois_blob(im_rois, im_scale_factors):
    """Converts RoIs into network inputs.

    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        im_scale_factors (list): scale factors as returned by _get_image_blob

    Returns:
        blob (ndarray): R x 5 matrix of RoIs in the image pyramid
    """
    rois, levels = _project_im_rois(im_rois, im_scale_factors)
    rois_blob = np.hstack((levels, rois))
    return rois_blob.astype(np.float32, copy=False)


def _project_im_rois(im_rois, scales):
    """Project image RoIs into the image pyramid built by _get_image_blob.

    Arguments:
        im_rois (ndarray): R x 4 matrix of RoIs in original image coordinates
        scales (list): scale factors as returned by _get_image_blob

    Returns:
        rois (ndarray): R x 4 matrix of projected RoI coordinates
        levels (list): image pyramid levels used by each projected RoI
    """
    im_rois = im_rois.astype(np.float, copy=False)
    if len(scales) > 1:
        widths = im_rois[:, 2] - im_rois[:, 0] + 1
        heights = im_rois[:, 3] - im_rois[:, 1] + 1
        areas = widths * heights
        scaled_areas = areas[:, np.newaxis] * (scales[np.newaxis, :] ** 2)
        diff_areas = np.abs(scaled_areas - 224 * 224)
        levels = diff_areas.argmin(axis=1)[:, np.newaxis]
    else:
        levels = np.zeros((im_rois.shape[0], 1), dtype=np.int)
    # Bug fix: the original draft assigned into `rois` without creating it
    # first; each RoI is scaled by its level's (fx, fy) pair.
    level_scales = scales[levels[:, 0]]          # R x 2 array of (fx, fy)
    rois = np.zeros_like(im_rois)
    rois[:, [0, 2]] = im_rois[:, [0, 2]] * level_scales[:, [0]]  # x1, x2 by fx
    rois[:, [1, 3]] = im_rois[:, [1, 3]] * level_scales[:, [1]]  # y1, y2 by fy
    return rois, levels


def _get_blobs(im, rois, target_size):
    """Convert an image and RoIs within that image into network inputs."""
    blobs = {'data': None, 'rois': None}
    blobs['data'], im_scale_factors = _get_image_blob(im, target_size)
    if not cfg.TEST.HAS_RPN:
        blobs['rois'] = _get_rois_blob(rois, im_scale_factors)
    return blobs, im_scale_factors


def im_detect(net, im, boxes=None):
    """Detect object classes in an image given object proposals.

    Arguments:
        net (caffe.Net): Fast R-CNN network to use
        im (ndarray): color image to test (in BGR order)
        boxes (ndarray): R x 4 array of object proposals or None (for RPN)

    Returns:
        scores (ndarray): R x K array of object class scores (K includes
            background as object category 0)
        boxes (ndarray): R x (4*K) array of predicted bounding boxes
    """
    blobs, im_scales = _get_blobs(im, boxes, target_size=cfg.TEST.SCALES[0])

    # When mapping from image ROIs to feature map ROIs, there's some aliasing
    # (some distinct image ROIs get mapped to the same feature ROI).
    # Here, we identify duplicate feature ROIs, so we only compute features
    # on the unique subset.
    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        v = np.array([1, 1e3, 1e6, 1e9, 1e12])
        hashes = np.round(blobs['rois'] * cfg.DEDUP_BOXES).dot(v)
        _, index, inv_index = np.unique(hashes, return_index=True,
                                        return_inverse=True)
        blobs['rois'] = blobs['rois'][index, :]
        boxes = boxes[index, :]

    if cfg.TEST.HAS_RPN:
        # im_info is now just the fixed (height, width) of the square input
        blobs['im_info'] = np.array(
            [[cfg.TEST.SCALES[0], cfg.TEST.SCALES[0]]],
            dtype=np.float32)

    # reshape network inputs
    net.blobs['data'].reshape(*(blobs['data'].shape))
    if cfg.TEST.HAS_RPN:
        net.blobs['im_info'].reshape(*(blobs['im_info'].shape))
    else:
        net.blobs['rois'].reshape(*(blobs['rois'].shape))

    # do forward
    forward_kwargs = {'data': blobs['data'].astype(np.float32, copy=False)}
    if cfg.TEST.HAS_RPN:
        forward_kwargs['im_info'] = blobs['im_info'].astype(np.float32,
                                                            copy=False)
    else:
        forward_kwargs['rois'] = blobs['rois'].astype(np.float32, copy=False)
    blobs_out = net.forward(**forward_kwargs)

    if cfg.TEST.HAS_RPN:
        assert len(im_scales) == 1, "Only single-image batch implemented"
        rois = net.blobs['rois'].data.copy()
        # unscale back to raw image space, x and y separately
        a = rois[:, [1, 3]] / im_scales[0][0]   # x1, x2 divided by fx
        b = rois[:, [2, 4]] / im_scales[0][1]   # y1, y2 divided by fy
        boxes = np.hstack([a[:, [0]], b[:, [0]], a[:, [1]], b[:, [1]]])

    if cfg.TEST.SVM:
        # use the raw scores before softmax under the assumption they
        # were trained as linear SVMs
        scores = net.blobs['cls_score'].data
    else:
        # use softmax estimated probabilities
        scores = blobs_out['cls_prob']

    if cfg.TEST.BBOX_REG:
        # Apply bounding-box regression deltas
        box_deltas = blobs_out['bbox_pred']
        pred_boxes = bbox_transform_inv(boxes, box_deltas)
        pred_boxes = clip_boxes(pred_boxes, im.shape)
    else:
        # Simply repeat the boxes, once for each class
        pred_boxes = np.tile(boxes, (1, scores.shape[1]))

    if cfg.DEDUP_BOXES > 0 and not cfg.TEST.HAS_RPN:
        # Map scores and predictions back to the original set of boxes
        scores = scores[inv_index, :]
        pred_boxes = pred_boxes[inv_index, :]

    return scores, pred_boxes


def vis_detections(im, class_name, dets, thresh=0.3):
    """Visual debugging of detections."""
    import matplotlib.pyplot as plt
    im = im[:, :, (2, 1, 0)]
    for i in xrange(np.minimum(10, dets.shape[0])):
        bbox = dets[i, :4]
        score = dets[i, -1]
        if score > thresh:
            plt.cla()
            plt.imshow(im)
            plt.gca().add_patch(
                plt.Rectangle((bbox[0], bbox[1]),
                              bbox[2] - bbox[0],
                              bbox[3] - bbox[1], fill=False,
                              edgecolor='g', linewidth=3))
            plt.title('{}  {:.3f}'.format(class_name, score))
            plt.show()


def apply_nms(all_boxes, thresh):
    """Apply non-maximum suppression to all predicted boxes output by the
    test_net method."""
    num_classes = len(all_boxes)
    num_images = len(all_boxes[0])
    nms_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(num_classes)]
    for cls_ind in xrange(num_classes):
        for im_ind in xrange(num_images):
            dets = all_boxes[cls_ind][im_ind]
            if dets == []:
                continue
            # CPU NMS is much faster than GPU NMS when the number of boxes
            # is relative small (e.g., < 10k)
            # TODO(rbg): autotune NMS dispatch
            keep = nms(dets, thresh, force_cpu=True)
            if len(keep) == 0:
                continue
            nms_boxes[cls_ind][im_ind] = dets[keep, :].copy()
    return nms_boxes


def test_net(net, imdb, max_per_image=100, thresh=0.05, vis=False):
    """Test a Fast R-CNN network on an image database."""
    num_images = len(imdb.image_index)
    # all detections are collected into:
    #    all_boxes[cls][image] = N x 5 array of detections in
    #    (x1, y1, x2, y2, score)
    all_boxes = [[[] for _ in xrange(num_images)]
                 for _ in xrange(imdb.num_classes)]

    output_dir = get_output_dir(imdb, net)

    # timers
    _t = {'im_detect': Timer(), 'misc': Timer()}

    if not cfg.TEST.HAS_RPN:
        roidb = imdb.roidb

    for i in xrange(num_images):
        # filter out any ground truth boxes
        if cfg.TEST.HAS_RPN:
            box_proposals = None
        else:
            # The roidb may contain ground-truth rois (for example, if the
            # roidb comes from the training or val split). We only want to
            # evaluate detection on the *non*-ground-truth rois. We select
            # those the rois that have the gt_classes field set to 0, which
            # means there's no ground truth.
            box_proposals = roidb[i]['boxes'][roidb[i]['gt_classes'] == 0]

        im = cv2.imread(imdb.image_path_at(i))
        _t['im_detect'].tic()
        scores, boxes = im_detect(net, im, box_proposals)
        _t['im_detect'].toc()

        _t['misc'].tic()
        # skip j = 0, because it's the background class
        for j in xrange(1, imdb.num_classes):
            inds = np.where(scores[:, j] > thresh)[0]
            cls_scores = scores[inds, j]
            cls_boxes = boxes[inds, j*4:(j+1)*4]
            cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
                .astype(np.float32, copy=False)
            keep = nms(cls_dets, cfg.TEST.NMS)
            cls_dets = cls_dets[keep, :]
            if vis:
                vis_detections(im, imdb.classes[j], cls_dets)
            all_boxes[j][i] = cls_dets

        # Limit to max_per_image detections *over all classes*
        if max_per_image > 0:
            image_scores = np.hstack([all_boxes[j][i][:, -1]
                                      for j in xrange(1, imdb.num_classes)])
            if len(image_scores) > max_per_image:
                image_thresh = np.sort(image_scores)[-max_per_image]
                for j in xrange(1, imdb.num_classes):
                    keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
                    all_boxes[j][i] = all_boxes[j][i][keep, :]
        _t['misc'].toc()

        print 'im_detect: {:d}/{:d} {:.3f}s {:.3f}s' \
            .format(i + 1, num_images, _t['im_detect'].average_time,
                    _t['misc'].average_time)

    det_file = os.path.join(output_dir, 'detections.pkl')
    with open(det_file, 'wb') as f:
        cPickle.dump(all_boxes, f, cPickle.HIGHEST_PROTOCOL)

    print 'Evaluating detections'
    imdb.evaluate_detections(all_boxes, output_dir)
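Relative to the stock test.py, the substantive changes are confined to _get_image_blob (fixed-size resize with separate fx/fy factors), _project_im_rois (per-axis RoI projection), the two-element im_info blob in im_detect, and the per-axis unscaling of the RPN rois back to raw image coordinates; everything else is carried over unchanged.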
