Preface

This post collects the notes I made on my first pass through the code.

To make the core flow easier to follow, every branch choice has been simplified.

The layer structure differs from jwyang's implementation, because the original contains a lot of redundant code.

The goal is to build a minimal trainable model.

If you're a beginner, get it running first, then extend it yourself.

Reference source code: https://github.com/jwyang/faster-rcnn.pytorch/tree/pytorch-1.0

Environment: PyTorch 1.0.1

Part 1  Loading image info and RoI annotations

1.1 Class Myim

First we define a new class, Myim, an imdb subclass that stores the image info and the RoI info.

The class contains just 4 + 2 = 6 methods; adapt them to your actual needs.

The first 4 read all the image_names from local files, plus related processing;

the last 2 read all the annotations from local files, plus some processing.

# pre-process roi data
import os
import json           # needed by _load_my_annotation below
import numpy as np
import scipy.sparse   # needed by _load_my_annotation below
import torch
from torch.autograd import Variable
from torch.utils.data.sampler import Sampler
import torch.nn as nn
import torchvision

from datasets.imdb import imdb


class Myim(imdb):
    def __init__(self):
        imdb.__init__(self, 'Myim')  # the parent-class init must be given a name
        self._data_path = os.path.join('./data/', 'images')
        # my dataset's class labels: iron-shell lighter, black-nail lighter,
        # knife, power bank & batteries, scissors.
        # note: the parent imdb conventionally treats class 0 as '__background__',
        # so make sure your category_id values stay within num_classes
        self._classes = ('1 铁壳打火机', "2 黑钉打火机", "3 刀具", "4 电源和电池", "5 剪刀")
        # xrange in Python 2; plain range is fine in Python 3.
        # after this line we have a dict like {'pig': 0, 'dog': 1, 'cat': 2, 'bird': 3, ...}
        self._class_to_ind = dict(zip(self.classes, range(self.num_classes)))
        # the image file extension
        self._image_ext = '.jpg'
        # load the image index - essentially a list of image names (without extension)
        self._image_index = self._load_image_set_index()
        # this assigns the method itself, not its return value; subclasses may redefine it
        self._roidb_handler = self.gt_roidb
        assert os.path.exists(self._data_path), \
            'Path does not exist: {}'.format(self._data_path)

    """
    The next 4 methods expect this file layout:
    # /data/images/0.jpg ... 100.jpg
    # /data/images/pic_names.txt  (the name-list file; the code below uses 'aaa.txt')
    Adapt them to your own file structure.
    """

    def image_path_at(self, i):
        """Get the absolute path of the i-th image."""
        return self.image_path_from_index(self._image_index[i])

    def image_id_at(self, i):
        """Get the pic_id of the i-th image.
        Keep it simple: the pic_id of the i-th image is i."""
        return i

    def image_path_from_index(self, index):
        """Really "from pic_name": return the absolute path for pic_name."""
        image_path = os.path.join(self._data_path, index + self._image_ext)
        assert os.path.exists(image_path), \
            'Path does not exist: {}'.format(image_path)
        return image_path

    def _load_image_set_index(self):
        # load the names of all training files from a given file into a list
        # (os.listdir would work too......)
        image_set_file = os.path.join(self._data_path, 'aaa.txt')
        assert os.path.exists(image_set_file), \
            'Path does not exist: {}'.format(image_set_file)
        with open(image_set_file) as f:
            image_index = [x.strip() for x in f.readlines()]
        return image_index

    """
    The next two methods load the roidb.
    Remember to adjust the path code to fit your annotation files.
    """

    def gt_roidb(self):
        """Return the Ground_Truth Region of Interest database:
        one dict per image, collected into a list.
        # a pickle cache could be used here to speed up loading
        """
        gt_roidb = [self._load_my_annotation(index)
                    for index in self._image_index]
        return gt_roidb

    def _load_my_annotation(self, index):
        """Takes the name `index` of one image (don't ask me why picname gets
        called index!) and returns a dict holding all the Ground_Truth
        annotation data of that image."""
        # my annotations sit in a single json file
        json_file = os.path.join(self._data_path, 'Annotations.json')
        with open(json_file) as f:           # fixed: json.loads expects a string, not a path
            anno_dict = json.load(f)[index]  # fixed: look up the name held in `index`
        # annotations rarely carry width/height, so out of laziness (and for
        # generality) we don't fetch them here - hence the commented lines
        # width = anno_dict['width']
        # height = anno_dict['height']
        # one annotation entry = one box
        annos = anno_dict['annotations']
        num_boxes = len(annos)
        # boxes: one row per box, 4 columns meaning x, y, w, h
        boxes = np.zeros((num_boxes, 4), dtype=np.uint16)
        # gt_classes: one column per row, the class of the i-th box
        gt_classes = np.zeros((num_boxes), dtype=np.int32)
        # overlaps: each box's score on every class,
        # a num_boxes x num_classes matrix.
        # since these are our own hand-labelled ground truths,
        # the true class gets 1 and all others get 0
        overlaps = np.zeros((num_boxes, self.num_classes))
        # seg_areas here simply means the bbox area
        seg_areas = np.zeros((num_boxes), dtype=np.float32)  # fixed: was the undefined num_objs
        for ix, ann in enumerate(annos):
            bbox = ann['bbox']
            x1 = bbox[0]
            y1 = bbox[1]
            w = bbox[2]
            h = bbox[3]
            boxes[ix, :] = [x1, y1, w, h]
            cls = ann['category_id']
            gt_classes[ix] = cls
            overlaps[ix, cls] = 1.0
            seg_areas[ix] = w * h
        # this line matters: for some reason the original runs overlaps through
        # scipy into a sparse matrix, only to convert it back later
        overlaps = scipy.sparse.csr_matrix(overlaps)
        return {'boxes': boxes,
                'gt_classes': gt_classes,
                'gt_overlaps': overlaps,
                'flipped': False,
                'seg_areas': seg_areas,
                # 'width': width,
                # 'height': height,
                }
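Before moving on, a minimal usage sketch. The Annotations.json schema shown in the comment is hypothetical - it only has to match what _load_my_annotation reads:

dataset = Myim()
# assumed schema: Annotations.json maps pic_name -> annotation dict, e.g.
# {"0": {"annotations": [{"bbox": [10, 20, 50, 60], "category_id": 2}, ...]}, ...}
print(dataset.num_images)                       # == len(dataset._image_index)
roidb = dataset.gt_roidb()                      # list of dicts, one per image
print(roidb[0]['boxes'].shape)                  # (num_boxes, 4)
print(roidb[0]['gt_overlaps'].toarray().shape)  # (num_boxes, num_classes)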

1.2 Reading in the roidb and pre-training processing

This subsection is structured as 4 + 1 + 2:

the first 4 handle the get side - instantiating and fetching the roidb; they are so short they could easily be merged into one......

the middle 1 is the entry point, the interface called from the upper layers;

the last 2 handle the polish - dropping invalid entries and marking images whose aspect ratio needs trimming.

# --- corresponds to /lib/roi_data_layer/roidb.py in jwyang's project ---
import numpy as np   # needed below
import PIL.Image     # needed by prepare_roidb below

def get_imdb(name):
    # originally defined in /datasets/factory.py: returns a different imdb
    # (pascal_voc, coco, ...) depending on `name`;
    # simplified here to directly instantiate a Myim
    if name == 'myimdb':
        myimdb = Myim()
        return myimdb

def get_roidb(imdb_name):
    # the roidb entry point
    imdb = get_imdb(imdb_name)
    print('Loaded dataset `{:s}` for training'.format(imdb.name))
    # cfg.TRAIN.PROPOSAL_METHOD simplified into a single variable
    cfg_TRAIN_PROPOSAL_METHOD = 'gt'
    # set_proposal_method() is defined in the parent class imdb (/datasets/imdb.py).
    # Its effect is to set self._handler = self.gt_roidb,
    # and self.gt_roidb is defined in our Myim class above: it reads every image's
    # annotation and yields a list of dicts.
    # In effect, this wires up self.roidb.
    imdb.set_proposal_method(cfg_TRAIN_PROPOSAL_METHOD)
    # every access to self.roidb checks whether it is empty and, if so, calls
    # self._handler() to fill it; so after set_proposal_method() the roidb data is
    # as good as loaded (it gets read the moment it is first used)
    print('Set proposal method: {:s}'.format(cfg_TRAIN_PROPOSAL_METHOD))
    roidb = get_training_roidb(imdb)
    return roidb

def get_training_roidb(imdb):
    """Returns a roidb (Region of Interest database) for use in training."""
    # simplified: we skip the horizontally flipped samples
    # if cfg.TRAIN.USE_FLIPPED:
    #     print('Appending horizontally-flipped training examples...')
    #     imdb.append_flipped_images()
    #     print('done')
    print('Preparing training data...')
    # add a few fields to imdb.roidb
    prepare_roidb(imdb)
    # ratio_index = rank_roidb_ratio(imdb)
    print('done')
    return imdb.roidb

def prepare_roidb(imdb):
    """"Preparing" just means adding a few fields to each image's RoI info."""
    # as noted above, accessing self.roidb calls our self.gt_roidb() whenever it is
    # empty, so in theory self.roidb is always non-empty
    roidb = imdb.roidb
    if not (imdb.name.startswith('coco')):
        # read the sizes of every image listed in self._image_index
        sizes = [PIL.Image.open(imdb.image_path_at(i)).size
                 for i in range(imdb.num_images)]
    # oddly enough, imdb.num_images simply returns len(imdb.image_index) -
    # the function above uses num_images and this loop uses
    # len(imdb.image_index); why not pick one?
    for i in range(len(imdb.image_index)):
        # reminder: roidb is a list of dicts; item i is a dict holding the RoI
        # info of the i-th image; here we add a few fields to each dict
        roidb[i]['img_id'] = imdb.image_id_at(i)
        roidb[i]['image'] = imdb.image_path_at(i)
        # Myim's load method skipped width/height; fill them in here
        if not (imdb.name.startswith('coco')):
            roidb[i]['width'] = sizes[i][0]
            roidb[i]['height'] = sizes[i][1]
        # remember the scipy sparsification complained about earlier?
        # toarray() turns it back into a dense array,
        # shape = [num_boxes, num_classes] for roidb[i];
        # note the rows are num_boxes - image i may carry several labelled boxes
        gt_overlaps = roidb[i]['gt_overlaps'].toarray()
        # meant to be each box's max score over all classes; the training roidb
        # only holds 0s and 1s, so the max is necessarily 1.
        # axis=1 takes the max along each row; returned shape = [num_boxes, 1]
        max_overlaps = gt_overlaps.max(axis=1)
        # the class holding that max score, shape = [num_boxes, 1]
        max_classes = gt_overlaps.argmax(axis=1)
        roidb[i]['max_classes'] = max_classes
        roidb[i]['max_overlaps'] = max_overlaps
        # with a single argument, np.where returns the coordinates of the matching
        # elements as a tuple, hence the [0];
        # plainly this yields a one-row array of unknown length, depending on how
        # many boxes score 0 (possibly none, giving an empty array)
        zero_inds = np.where(max_overlaps == 0)[0]
        # sanity check: a box with max_overlaps == 0 must have max_classes == 0 (background)
        assert all(max_classes[zero_inds] == 0)
        # likewise: a box with max_overlaps > 0 must have max_classes != 0 (a real class)
        nonzero_inds = np.where(max_overlaps > 0)[0]
        assert all(max_classes[nonzero_inds] != 0)
    # and that's where it ends - baffling at first glance:
    # isn't roidb = imdb.roidb just a copy? you perform all these earth-shattering
    # operations on roidb, return nothing - what does any of it have to do with
    # imdb.roidb?
    # a quick look at the docs gives a simple answer:
    # imdb._roidb is a list of dicts, so this is a shallow copy - effectively a C++
    # reference - and everything done to roidb lands on imdb.roidb as well

"""
# The four functions above form the complete roidb preprocessing pipeline.
# Recap: get_roidb() calls get_imdb() to instantiate a Myim,
# then get_training_roidb() readies it for training, during which prepare_roidb()
# adds a few fields; once done, imdb.roidb is returned.
"""

That covers the first 4.

The middle 1 + the last 2 follow:


# We can now obtain an imdb.roidb through a single get_roidb() call.
# When there are several (>= 1) imdbs, the function below merges them and serves
# as the interface for the upper layers.
# (the multi-imdb branch also needs: import datasets.imdb)

def combined_roidb(imdb_names='myimdb', training=True):
    # simplified: the default 'myimdb' is all we need - we only use one imdb
    roidbs = [get_roidb(s) for s in imdb_names.split('+')]
    roidb = roidbs[0]
    # we only use one, so we take the else branch
    if len(roidbs) > 1:
        for r in roidbs[1:]:
            # roidb is a list; extend it with the remaining lists
            roidb.extend(r)
        tmp = get_imdb(imdb_names.split('+')[1])
        imdb = datasets.imdb.imdb(imdb_names, tmp.classes)
    else:
        imdb = get_imdb(imdb_names)
    if training:
        # drop the images that carry no bbox annotation,
        # guarding against mistakes made while hand-editing the txt files etc.
        # note: after this step, roidb no longer lines up one-to-one with self._image_index
        roidb = filter_roidb(roidb)
    # width and height were filled in above, so the aspect ratio can now be controlled
    ratio_list, ratio_index = rank_roidb_ratio(roidb)
    return imdb, roidb, ratio_list, ratio_index

def rank_roidb_ratio(roidb):
    # sort the roidb list by aspect ratio
    ratio_large = 2    # largest ratio to preserve
    ratio_small = 0.5  # smallest ratio to preserve
    # images outside the ratio bounds will need cropping
    ratio_list = []
    for i in range(len(roidb)):
        width = roidb[i]['width']
        height = roidb[i]['height']
        ratio = width / float(height)
        if ratio > ratio_large:
            roidb[i]['need_crop'] = 1
            ratio = ratio_large
        elif ratio < ratio_small:
            roidb[i]['need_crop'] = 1
            ratio = ratio_small
        else:
            roidb[i]['need_crop'] = 0
        ratio_list.append(ratio)
    ratio_list = np.array(ratio_list)
    # np.argsort returns the ascending-order indices, e.g. array(49, 53, 11, 2, 1, ...)
    ratio_index = np.argsort(ratio_list)
    # return ratio_list sorted ascending;
    # ratio_list[i] corresponds to roidb[ratio_index[i]]
    return ratio_list[ratio_index], ratio_index

def filter_roidb(roidb):
    # filter the images without bounding box.
    print('before filtering, there are %d images...' % (len(roidb)))
    i = 0
    while i < len(roidb):
        if len(roidb[i]['boxes']) == 0:
            del roidb[i]
            i -= 1
        i += 1
    print('after filtering, there are %d images...' % (len(roidb)))
    return roidb
# --- end /lib/roi_data_layer/roidb.py ---
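Putting 1.2 together, a minimal sketch of the call as it would appear in a training script:

imdb, roidb, ratio_list, ratio_index = combined_roidb('myimdb', training=True)
train_size = len(roidb)
print('{:d} roidb entries'.format(train_size))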

1.3 Building the dataset and drawing training samples with a sampler

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import random
import time
import pdb

import torch
import torch.utils.data as data
from torch.utils.data.sampler import Sampler
from PIL import Image

from model.utils.config import cfg
from roi_data_layer.minibatch import get_minibatch   # fixed: was imported twice
from model.rpn.bbox_transform import bbox_transform_inv, clip_boxes


class sampler(Sampler):
    def __init__(self, train_size, batch_size):
        self.num_data = train_size
        self.num_per_batch = int(train_size / batch_size)
        self.batch_size = batch_size
        self.range = torch.arange(0, batch_size).view(1, batch_size).long()
        self.leftover_flag = False
        if train_size % batch_size:
            self.leftover = torch.arange(self.num_per_batch * batch_size, train_size).long()
            self.leftover_flag = True

    def __iter__(self):
        # shuffle whole batches; samples inside a batch stay contiguous, so each
        # batch keeps the similar aspect ratios arranged by rank_roidb_ratio
        rand_num = torch.randperm(self.num_per_batch).view(-1, 1) * self.batch_size
        self.rand_num = rand_num.expand(self.num_per_batch, self.batch_size) + self.range
        self.rand_num_view = self.rand_num.view(-1)
        if self.leftover_flag:
            self.rand_num_view = torch.cat((self.rand_num_view, self.leftover), 0)
        return iter(self.rand_num_view)

    def __len__(self):
        return self.num_data


"""The data layer used during training to train a Fast R-CNN network."""

class roibatchLoader(data.Dataset):
    def __init__(self, roidb, ratio_list, ratio_index, batch_size, num_classes,
                 training=True, normalize=None):
        self._roidb = roidb
        self._num_classes = num_classes
        # we make the height of image consistent to trim_height, trim_width
        self.trim_height = cfg.TRAIN.TRIM_HEIGHT
        self.trim_width = cfg.TRAIN.TRIM_WIDTH
        self.max_num_box = cfg.MAX_NUM_GT_BOXES
        self.training = training
        self.normalize = normalize
        self.ratio_list = ratio_list
        self.ratio_index = ratio_index
        self.batch_size = batch_size
        self.data_size = len(self.ratio_list)

        # given the ratio_list, we want to make the ratio same for each batch.
        self.ratio_list_batch = torch.Tensor(self.data_size).zero_()
        num_batch = int(np.ceil(len(ratio_index) / batch_size))
        for i in range(num_batch):
            left_idx = i * batch_size
            right_idx = min((i + 1) * batch_size - 1, self.data_size - 1)

            if ratio_list[right_idx] < 1:
                # for ratio < 1, we preserve the leftmost in each batch.
                target_ratio = ratio_list[left_idx]
            elif ratio_list[left_idx] > 1:
                # for ratio > 1, we preserve the rightmost in each batch.
                target_ratio = ratio_list[right_idx]
            else:
                # for ratio cross 1, we make it to be 1.
                target_ratio = 1

            self.ratio_list_batch[left_idx:(right_idx + 1)] = target_ratio

    def __getitem__(self, index):
        if self.training:
            index_ratio = int(self.ratio_index[index])
        else:
            index_ratio = index

        # get the anchor index for current sample index;
        # here we set the anchor index to the last sample in this group
        minibatch_db = [self._roidb[index_ratio]]
        blobs = get_minibatch(minibatch_db, self._num_classes)
        data = torch.from_numpy(blobs['data'])
        im_info = torch.from_numpy(blobs['im_info'])
        # we need to random shuffle the bounding boxes.
        data_height, data_width = data.size(1), data.size(2)
        if self.training:
            np.random.shuffle(blobs['gt_boxes'])
            gt_boxes = torch.from_numpy(blobs['gt_boxes'])

            ########################################################
            # padding the input image to fixed size for each group #
            ########################################################

            # NOTE1: need to cope with the case where a group covers both conditions. (done)
            # NOTE2: need to consider the situation for the tail samples. (no worry)
            # NOTE3: need to implement a parallel data loader. (no worry)
            # get the index range

            # if the image needs cropping, crop it to the target size.
            ratio = self.ratio_list_batch[index]

            if self._roidb[index_ratio]['need_crop']:
                if ratio < 1:
                    # this means data_width << data_height: crop along data_height
                    min_y = int(torch.min(gt_boxes[:, 1]))
                    max_y = int(torch.max(gt_boxes[:, 3]))
                    trim_size = int(np.floor(data_width / ratio))
                    if trim_size > data_height:
                        trim_size = data_height
                    box_region = max_y - min_y + 1
                    if min_y == 0:
                        y_s = 0
                    else:
                        if (box_region - trim_size) < 0:
                            y_s_min = max(max_y - trim_size, 0)
                            y_s_max = min(min_y, data_height - trim_size)
                            if y_s_min == y_s_max:
                                y_s = y_s_min
                            else:
                                y_s = np.random.choice(range(y_s_min, y_s_max))
                        else:
                            y_s_add = int((box_region - trim_size) / 2)
                            if y_s_add == 0:
                                y_s = min_y
                            else:
                                y_s = np.random.choice(range(min_y, min_y + y_s_add))
                    # crop the image
                    data = data[:, y_s:(y_s + trim_size), :, :]
                    # shift the y coordinates of gt_boxes
                    gt_boxes[:, 1] = gt_boxes[:, 1] - float(y_s)
                    gt_boxes[:, 3] = gt_boxes[:, 3] - float(y_s)
                    # update gt bounding boxes according to the trim
                    gt_boxes[:, 1].clamp_(0, trim_size - 1)
                    gt_boxes[:, 3].clamp_(0, trim_size - 1)
                else:
                    # this means data_width >> data_height: crop along data_width
                    min_x = int(torch.min(gt_boxes[:, 0]))
                    max_x = int(torch.max(gt_boxes[:, 2]))
                    trim_size = int(np.ceil(data_height * ratio))
                    if trim_size > data_width:
                        trim_size = data_width
                    box_region = max_x - min_x + 1
                    if min_x == 0:
                        x_s = 0
                    else:
                        if (box_region - trim_size) < 0:
                            x_s_min = max(max_x - trim_size, 0)
                            x_s_max = min(min_x, data_width - trim_size)
                            if x_s_min == x_s_max:
                                x_s = x_s_min
                            else:
                                x_s = np.random.choice(range(x_s_min, x_s_max))
                        else:
                            x_s_add = int((box_region - trim_size) / 2)
                            if x_s_add == 0:
                                x_s = min_x
                            else:
                                x_s = np.random.choice(range(min_x, min_x + x_s_add))
                    # crop the image
                    data = data[:, :, x_s:(x_s + trim_size), :]
                    # shift the x coordinates of gt_boxes
                    gt_boxes[:, 0] = gt_boxes[:, 0] - float(x_s)
                    gt_boxes[:, 2] = gt_boxes[:, 2] - float(x_s)
                    # update gt bounding boxes according to the trim
                    gt_boxes[:, 0].clamp_(0, trim_size - 1)
                    gt_boxes[:, 2].clamp_(0, trim_size - 1)

            # based on the ratio, pad the image.
            if ratio < 1:
                # this means data_width < data_height
                trim_size = int(np.floor(data_width / ratio))
                padding_data = torch.FloatTensor(int(np.ceil(data_width / ratio)),
                                                 data_width, 3).zero_()
                padding_data[:data_height, :, :] = data[0]
                # update im_info
                im_info[0, 0] = padding_data.size(0)
                # print("height %d %d \n" % (index, anchor_idx))
            elif ratio > 1:
                # this means data_width > data_height
                padding_data = torch.FloatTensor(data_height,
                                                 int(np.ceil(data_height * ratio)), 3).zero_()
                padding_data[:, :data_width, :] = data[0]
                im_info[0, 1] = padding_data.size(1)
            else:
                trim_size = min(data_height, data_width)
                padding_data = torch.FloatTensor(trim_size, trim_size, 3).zero_()
                padding_data = data[0][:trim_size, :trim_size, :]
                # gt_boxes.clamp_(0, trim_size)
                gt_boxes[:, :4].clamp_(0, trim_size)
                im_info[0, 0] = trim_size
                im_info[0, 1] = trim_size

            # check the bounding boxes: drop the degenerate ones
            not_keep = (gt_boxes[:, 0] == gt_boxes[:, 2]) | (gt_boxes[:, 1] == gt_boxes[:, 3])
            keep = torch.nonzero(not_keep == 0).view(-1)

            gt_boxes_padding = torch.FloatTensor(self.max_num_box, gt_boxes.size(1)).zero_()
            if keep.numel() != 0:
                gt_boxes = gt_boxes[keep]
                num_boxes = min(gt_boxes.size(0), self.max_num_box)
                gt_boxes_padding[:num_boxes, :] = gt_boxes[:num_boxes]
            else:
                num_boxes = 0

            # permute trim_data to adapt to downstream processing
            padding_data = padding_data.permute(2, 0, 1).contiguous()
            im_info = im_info.view(3)

            return padding_data, im_info, gt_boxes_padding, num_boxes
        else:
            data = data.permute(0, 3, 1, 2).contiguous().view(3, data_height, data_width)
            im_info = im_info.view(3)
            gt_boxes = torch.FloatTensor([1, 1, 1, 1, 1])
            num_boxes = 0

            return data, im_info, gt_boxes, num_boxes

    def __len__(self):
        return len(self._roidb)
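To close Part 1, a minimal sketch wiring these pieces into a PyTorch DataLoader; batch_size and num_workers are arbitrary choices here:

imdb, roidb, ratio_list, ratio_index = combined_roidb('myimdb')
train_size = len(roidb)
batch_size = 4

my_sampler = sampler(train_size, batch_size)
dataset = roibatchLoader(roidb, ratio_list, ratio_index, batch_size,
                         imdb.num_classes, training=True)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         sampler=my_sampler, num_workers=0)
im_data, im_info, gt_boxes, num_boxes = next(iter(dataloader))
print(im_data.shape)  # (batch_size, 3, H, W)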

Part 2  Building the network: R2.py

The Faster R-CNN model splits into four parts:

(1) Base_Net, a.k.a. the backbone or feature extraction network: stacked convolutions that produce the feature map.

(2) RPN, the Region Proposal Network: proposes candidate anchor boxes from the feature map.

(3) RoI Pooling: combines the feature map with the proposal boxes and pools each RoI down to a fixed 7x7 result (a standalone sketch follows this list).

(4) The FC fully connected layers.
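A minimal sketch of step (3) in isolation. Note the hedge: the project ships its own ROIPool/ROIAlign under model.roi_layers; the snippet below substitutes torchvision.ops (available in torchvision >= 0.3), which behaves equivalently for this purpose:

import torch
import torchvision

feat = torch.randn(1, 1024, 14, 14)            # backbone feature map, stride 16
rois = torch.tensor([[0., 0., 0., 63., 63.]])  # (batch_idx, x1, y1, x2, y2) in image coords
pool = torchvision.ops.RoIPool(output_size=(7, 7), spatial_scale=1.0 / 16.0)
pooled = pool(feat, rois)
print(pooled.shape)  # torch.Size([1, 1024, 7, 7]) - one fixed 7x7 map per RoI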

The source project supports many alternative implementations - for the backbone alone you can choose vgg16, resnet50, resnet101 and more.

For simplicity, we pick only resnet101.

Once you have understood it, you can go extend the others.

2.1 ResNet101

ResNet101 is thoroughly mature territory by now; I recommend the article 《ResNet解析》 by 懒人元.

Repeat after me: (3, 4, 23, 3). Add them up, multiply by 3, and you get 99; with one input layer at the head and one output layer at the tail, that makes 101.

( ?, 4, 23, 3) ----- (3, 4, 23, 3);

(3,  ?, 23, 3) ----- (3, 4, 23, 3);

(3, 4,  ?, 3) ----- (3, 4, 23, 3);

(3, 4, 23,  ?) ----- (3, 4, 23, 3);

There - you have now mastered ResNet101.
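The same arithmetic spelled out in code:

blocks = [3, 4, 23, 3]      # Bottleneck counts per stage
print(sum(blocks) * 3 + 2)  # each Bottleneck holds 3 convs: 33*3 = 99, + conv1 + fc = 101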

Now let's read on:

# --- corresponds to /lib/model/faster_rcnn/resnet.py in jwyang's project ---
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from model.utils.config import cfg
from model.faster_rcnn.faster_rcnn import _fasterRCNN

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import math
import torch.utils.model_zoo as model_zoo
import pdb

# the 3x3 convolution is common enough to deserve its own helper
def conv3x3(in_planes, out_planes, stride=1):
    "3x3 convolution with padding"
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class BasicBlock(nn.Module):
    expansion = 1

    # defines one residual block
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        # two conv3x3 layers, each followed by bn and relu
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes)
        # as the later code shows, downsample is itself a small conv network; default None
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # if downsampling is needed (stride > 1, say 2), run the input through the
        # downsample network; its output is the shortcut branch of the residual block
        if self.downsample is not None:
            residual = self.downsample(x)
        # here residual(x) + x, added at a straight 1:1 ratio
        out += residual
        # finish with one relu activation
        out = self.relu(out)
        return out


class Bottleneck(nn.Module):
    # the expansion factor of the output channels
    expansion = 4

    # the Bottleneck cuts the parameter count and speeds computation up considerably
    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        # three conv layers below: 1x1, 3x3, 1x1; the final 1x1 fixes the
        # out_channels at planes*4. if downsampling is required, it happens in the
        # first layer via stride=stride
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, stride=stride, bias=False)  # change
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1,  # change
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * 4)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        # if downsampling is needed (stride > 1, say 2), run the input through the
        # downsample network; its output is the shortcut branch of the residual block
        if self.downsample is not None:
            residual = self.downsample(x)
        # here residual(x) + x, added at a straight 1:1 ratio
        out += residual
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    """
    # This is the ResNet body: it assembles the network brick by brick,
    # driven by the `layers` argument.
    # `block` is the residual block class - the brick itself.
    # Since we chose ResNet101 (repeat after me: layers=[3,4,23,3]),
    # block must be Bottleneck; for ResNet18/34 it is BasicBlock.
    """
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        # note this attribute: it defaults to 64, matching conv1's out_channels=64;
        # change it too if you change conv1's out_channels
        self.inplanes = 64
        """
        # ResNet101 = 1 + [3,4,23,3]*3 + 1 = 101 layers.
        # conv1 is the leading 1: it turns the input RGB image into a 64-channel
        # feature map with a 7x7 convolution, halving the size with stride=2
        """
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        # the maxpool layer is fixed too: 3x3 pooling, stride=2
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=0, ceil_mode=True)  # change
        """
        # now stack the bricks according to our [3,4,23,3];
        # block is still the residual block class we passed in, Bottleneck.
        # note that layers 2-4 all use stride=2, dividing the size by another 2^3 = 8
        """
        self.layer1 = self._make_layer(block, 64, layers[0])             # layers[0] = 3
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)  # layers[1] = 4
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)  # layers[2] = 23
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)  # layers[3] = 3
        # the author says stride=1 in layer4 is slightly better but slower; we ignore it
        # it is slightly better whereas slower to set stride = 1
        # self.layer4 = self._make_layer(block, 512, layers[3], stride=1)
        self.avgpool = nn.AvgPool2d(7)
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        # define downsample:
        # if stride > 1 we have to downsample - obviously a job for a convolution;
        # likewise, if input_channel differs from the required output_channel,
        # a 1x1 convolution brings it up to spec
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion),
            )
        # remember the Bottleneck layout? 1x1, 3x3, 1x1 - three convolutions;
        # the stride reduction happens in the first 1x1, the expansion in the last;
        # if there is a downsample branch, its output gives the residual, which is
        # added before the final relu
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        # this layer's out is the next one's in: update self.inplanes
        self.inplanes = planes * block.expansion
        # `blocks` is the count - one of the numbers in our [3,4,23,3].
        # e.g. for layer1 we receive layers[0]=3, meaning stack 3 bricks;
        # the brick with downsample is already placed above, so range() starts at 1
        # and only 2 more are added
        for i in range(1, blocks):
            # taking layer1 as the example, 3 bricks in total:
            # the first (in the append above) takes 64 in, gives 64*4 out, and
            # updates self.inplanes=64*4;
            # then the loop: the second takes self.inplanes=64*4 in, gives
            # planes*4 = 64*4 out; the third likewise, 64*4 in, 64*4 out
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        # assume an input of shape (bs, 3, H, W), bs = batch_size;
        # after conv1: a (bs, 64, H/2, W/2) feature map
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        # after maxpool: (bs, 64, H/4, W/4)
        x = self.maxpool(x)
        # after layer1: (bs, 64*expansion = 64*4, H/4, W/4)
        x = self.layer1(x)
        # after layer2: (bs, 128*expansion = 128*4, H/8, W/8)
        x = self.layer2(x)
        # after layer3: (bs, 256*expansion = 256*4, H/16, W/16)
        x = self.layer3(x)
        # after layer4: (bs, 512*expansion = 512*4, H/32, W/32)
        x = self.layer4(x)
        """
        # avgpool is configured with kernel_size=7, stride=1:
        # with input=(bs,3,224,224), layer4 leaves shape=(bs,2048,7,7),
        # and nn.AvgPool2d(7) turns it into shape=(bs,2048,1,1)
        """
        x = self.avgpool(x)
        # after x.view: shape=(bs,2048);
        # the fc layer then maps the 512 * block.expansion = 2048 features onto num_classes
        x = x.view(x.size(0), -1)
        x = self.fc(x)
        # final shape = (bs, num_classes)
        return x


# this mighty ResNet is done - give it a factory to produce it
def resnet101(num_classes=1000, pretrained=False):
    model = ResNet(Bottleneck, [3, 4, 23, 3], num_classes)
    if pretrained:
        resnet101_url = 'https://s3.amazonaws.com/pytorch/models/resnet101-5d3b4d8f.pth'
        model.load_state_dict(model_zoo.load_url(resnet101_url))
    return model
# --- end resnet101 ---

This ResNet101 is self-contained - you can drop it into other tasks as-is.
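A quick shape-only smoke test of the factory above (random weights, no download):

model = resnet101(num_classes=1000, pretrained=False)
x = torch.randn(1, 3, 224, 224)
y = model(x)
print(y.shape)  # torch.Size([1, 1000])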

2.2 The RPN network

Core: the _RPN class.

Satellites:

1) _ProposalLayer

2) _AnchorTargetLayer

All other classes involved live in the .py files under /model/rpn/.

# --- corresponds to /lib/model/rpn/rpn.py in jwyang's project ---
# the RPN network
from __future__ import absolute_import
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

from model.utils.config import cfg
from .proposal_layer import _ProposalLayer
from .anchor_target_layer import _AnchorTargetLayer
from model.utils.net_utils import _smooth_l1_loss

import numpy as np
import math
import pdb
import time


class _RPN(nn.Module):
    """ region proposal network """
    def __init__(self, din):
        super(_RPN, self).__init__()
        # the RPN's depth = the channels output by the feature extraction network
        # in front of it = 256*expansion = 1024.
        # din for 'depth in'
        self.din = din
        # hyper-parameters defined in /model/utils/config.py;
        # if cfg.* throws errors, just replace it with the values in my comments
        self.anchor_scales = cfg.ANCHOR_SCALES  # predefined: [8, 16, 32]
        self.anchor_ratios = cfg.ANCHOR_RATIOS  # [0.5, 1, 2]
        self.feat_stride = cfg.FEAT_STRIDE[0]   # [16, ]

        """
        # preprocessing layer:
        # first a 3x3 convolution over the feature map handed up by the previous stage
        """
        self.RPN_Conv = nn.Conv2d(self.din, 512, 3, 1, 1, bias=True)

        """
        # classification layer:
        # here we define a binary foreground/background classifier.
        # with the hyper-parameters above - 3 scales and 3 ratios - there are
        # 9 anchor types; times the 2 classes (fg/bg) gives 18 outputs.
        # the accounting: e.g. anchor type 1 gets probability aa of foreground and
        # bb of background, so it occupies 2 slots
        """
        self.nc_score_out = len(self.anchor_scales) * len(self.anchor_ratios) * 2
        # a 1x1 convolution whose output channels are the 18 score slots computed above
        self.RPN_cls_score = nn.Conv2d(512, self.nc_score_out, 1, 1, 0)

        """
        # regression layer:
        # anchor box regression, using a 1x1 convolution.
        # a box usually has 4 coordinates - x,y,w,h (or x1,y1,x2,y2 if you prefer,
        # still 4 either way); 9 anchor types x 4 adjustable coordinates gives 36
        # output "dimensions" (not sure what word explains it best)
        """
        self.nc_bbox_out = len(self.anchor_scales) * len(self.anchor_ratios) * 4  # 4 (coords) * 9 (anchors)
        self.RPN_bbox_pred = nn.Conv2d(512, self.nc_bbox_out, 1, 1, 0)

        """
        # proposal layer:
        # discards anchor boxes that fail the conditions after regression:
        # out of the image bounds, too small in width/height, or scoring too low
        # (via NMS, non-maximum suppression).
        # defined in /lib/model/rpn/proposal_layer.py - study it once the rest is clear
        """
        self.RPN_proposal = _ProposalLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)

        """
        # define anchor target layer.
        # how it differs from the proposal layer above:
        # the proposal layer never touches the annotations - it relies purely on
        # the binary classifier's class scores, dropping out-of-bounds and
        # low-scoring boxes;
        # the target layer combines the ground truth: instead of a binary
        # probability it computes the IoU overlap with the labelled boxes and
        # drops boxes whose IoU is too low.
        # defined in /lib/model/rpn/anchor_target_layer.py - study it once the rest is clear
        """
        self.RPN_anchor_target = _AnchorTargetLayer(self.feat_stride, self.anchor_scales, self.anchor_ratios)

        self.rpn_loss_cls = 0
        self.rpn_loss_box = 0
        # if the above is unclear, read https://senitco.github.io/2017/09/02/faster-rcnn/
        # - scroll straight down to the RPN section and come back to the code
    # --- end of _RPN.__init__() ---

    # this decorator marks a static function, callable as ClassName.func()
    # without instantiating an object; not important, you can ignore it
    @staticmethod
    def reshape(x, d):
        """
        # expects a 4-D x, shape = [ , , , ];
        # keeps shape[0] and shape[3] unchanged, replaces shape[1] with the given
        # int(d), and lets shape[2] adjust accordingly
        """
        input_shape = x.size()
        x = x.view(
            input_shape[0],
            int(d),
            int(float(input_shape[1] * input_shape[2]) / float(d)),
            input_shape[3]
        )
        return x

    def forward(self, base_feat, im_info, gt_boxes, num_boxes):
        # in our case base_feat is the feature map produced by layer3 of ResNet101,
        # shape = (bs, 256*expansion, H/16, W/16) = (bs, 1024, 14, 14)
        batch_size = base_feat.size(0)

        """
        # the RPN order: preprocess, classify, regress, propose, target layer
        """
        # self.RPN_Conv is the 3x3 convolution preprocessing base_feat;
        # result shape = (bs, 512, 14, 14)
        rpn_conv1 = F.relu(self.RPN_Conv(base_feat), inplace=True)
        # classification scores, shape = (bs, 18, 14, 14)
        rpn_cls_score = self.RPN_cls_score(rpn_conv1)
        # after the reshape helper: shape = (bs, 2, 18*14/2, 14)
        rpn_cls_score_reshape = self.reshape(rpn_cls_score, 2)
        # softmax over dim=1: normalised front vs. back probabilities
        rpn_cls_prob_reshape = F.softmax(rpn_cls_score_reshape, 1)
        # reshape back: shape = (bs, 18, 14, 14)
        rpn_cls_prob = self.reshape(rpn_cls_prob_reshape, self.nc_score_out)

        # regression - actually a convolution - giving the recommended bbox offsets
        # in the 36 dimensions; note the input is rpn_conv1, result shape = (bs, 36, 14, 14)
        rpn_bbox_pred = self.RPN_bbox_pred(rpn_conv1)

        # proposal layer; with no extra setup self.training defaults to True
        cfg_key = 'TRAIN' if self.training else 'TEST'
        # combining the regression offsets, throw away boxes that leave the image
        # or score too low; returns shape (bs, 2000, 5) - 2000 is a hyper-parameter:
        # we keep the 2000 highest-scoring proposal boxes after NMS
        rois = self.RPN_proposal((rpn_cls_prob.data, rpn_bbox_pred.data,
                                  im_info, cfg_key))

        self.rpn_loss_cls = 0
        self.rpn_loss_box = 0

        # generate training labels and compute the RPN losses
        if self.training:
            assert gt_boxes is not None
            rpn_data = self.RPN_anchor_target((rpn_cls_score.data, gt_boxes, im_info, num_boxes))

            # compute the classification loss;
            # after permute: shape = (bs, 14, 14, 18); contiguous() cleans up the
            # after-effects of permute, ignore it; after view: (bs, 1764, 2)
            rpn_cls_score = rpn_cls_score_reshape.permute(0, 2, 3, 1).contiguous().view(batch_size, -1, 2)
            rpn_label = rpn_data[0].view(batch_size, -1)

            rpn_keep = Variable(rpn_label.view(-1).ne(-1).nonzero().view(-1))
            rpn_cls_score = torch.index_select(rpn_cls_score.view(-1, 2), 0, rpn_keep)
            rpn_label = torch.index_select(rpn_label.view(-1), 0, rpn_keep.data)
            rpn_label = Variable(rpn_label.long())
            self.rpn_loss_cls = F.cross_entropy(rpn_cls_score, rpn_label)
            fg_cnt = torch.sum(rpn_label.data.ne(0))

            rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = rpn_data[1:]

            # compute bbox regression loss
            rpn_bbox_inside_weights = Variable(rpn_bbox_inside_weights)
            rpn_bbox_outside_weights = Variable(rpn_bbox_outside_weights)
            rpn_bbox_targets = Variable(rpn_bbox_targets)

            self.rpn_loss_box = _smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                                rpn_bbox_outside_weights, sigma=3, dim=[1, 2, 3])

        return rois, self.rpn_loss_cls, self.rpn_loss_box
# --- end _RPN ---


# --- _ProposalLayer ---
# (in its own file, /lib/model/rpn/proposal_layer.py on the pytorch-1.0 branch,
#  this class also imports:
#  from .generate_anchors import generate_anchors
#  from .bbox_transform import bbox_transform_inv, clip_boxes
#  from model.roi_layers import nms)
class _ProposalLayer(nn.Module):
    """
    Outputs object detection proposals by applying estimated bounding-box
    transformations to a set of regular boxes (called "anchors").
    """

    def __init__(self, feat_stride=16, scales=[8, 16, 32], ratios=[0.5, 1, 2]):
        super(_ProposalLayer, self).__init__()
        self._feat_stride = feat_stride
        # self._anchors: shape = (9, 4) - 9 anchor types, 4 coordinates each,
        # centred on (0, 0)
        self._anchors = torch.from_numpy(generate_anchors(scales=np.array(scales),
                                                          ratios=np.array(ratios))).float()
        self._num_anchors = self._anchors.size(0)

        # rois blob: holds R regions of interest, each is a 5-tuple
        # (n, x1, y1, x2, y2) specifying an image batch index n and a
        # rectangle (x1, y1, x2, y2)
        # top[0].reshape(1, 5)
        #
        # # scores blob: holds scores for R regions of interest
        # if len(top) > 1:
        #     top[1].reshape(1, 1, 1, 1)

    def forward(self, input):
        # Algorithm:
        #
        # for each (H, W) location i
        #   generate A anchor boxes centered on cell i
        #   apply predicted bbox deltas at cell i to each of the A anchors
        # clip predicted boxes to image
        # remove predicted boxes with either height or width < threshold
        # sort all (proposal, score) pairs by score from highest to lowest
        # take top pre_nms_topN proposals before NMS
        # apply NMS with threshold 0.7 to remaining proposals
        # take after_nms_topN proposals after NMS
        # return the top proposals (-> RoIs top, scores top)

        # input[0] holds the 18 class probabilities of the 9 anchor types,
        # shape = (bs, 18, 14, 14).
        # according to the author, after the chain of reshapes earlier,
        # input[0][:, 0:9, :, :] are the background probabilities and
        # input[0][:, 9:18, :, :] are the foreground probabilities -
        # honestly I haven't worked out why; if some expert knows, ping me
        # the first set of _num_anchors channels are bg probs
        # the second set are the fg probs
        scores = input[0][:, self._num_anchors:, :, :]
        # in short, scores = the foreground probability of each of the 9 anchor types.
        # bbox_deltas = the offsets of the 9 anchor types in 4 directions,
        # shape (bs, 36, 14, 14); no idea why they are called deltas
        bbox_deltas = input[1]
        # im_info carries the height/width info
        im_info = input[2]
        # cfg_key = 'TRAIN' or 'TEST'
        cfg_key = input[3]

        # predefined 12000: keep the 12000 highest-scoring boxes before NMS
        pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
        # predefined 2000: keep the 2000 highest-scoring boxes after NMS
        post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
        # predefined 0.7: the NMS threshold for discarding boxes
        nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
        # predefined 16: boxes mapped back onto the original image must have
        # height and width above this value
        min_size = cfg[cfg_key].RPN_MIN_SIZE

        batch_size = bbox_deltas.size(0)

        # height = 14, width = 14
        feat_height, feat_width = scores.size(2), scores.size(3)
        # the following maps the 14x14 grid back onto the original image by
        # multiplying with _feat_stride (x16), producing feat_width*feat_height
        # cells of size (16, 16) on the original image
        shift_x = np.arange(0, feat_width) * self._feat_stride
        shift_y = np.arange(0, feat_height) * self._feat_stride
        # after meshgrid, shift_x becomes 14 rows, each a copy of the shift_x above;
        # shift_y becomes 14 columns, each a copy of the shift_y above
        shift_x, shift_y = np.meshgrid(shift_x, shift_y)
        # ravel flattens the (14,14) matrices shift_x, shift_y into 1-D vectors of
        # shape (196,); vstack stacks them vertically into 4 rows, shape = (4, 196);
        # transpose then gives shape = (196, 4)
        shifts = torch.from_numpy(np.vstack((shift_x.ravel(), shift_y.ravel(),
                                             shift_x.ravel(), shift_y.ravel())).transpose())
        # why both type_as(scores) and float()???
        shifts = shifts.contiguous().type_as(scores).float()

        A = self._num_anchors  # A = 9
        K = shifts.size(0)     # K = feat_width * feat_height = 14*14 = 196

        self._anchors = self._anchors.type_as(scores)
        # anchors = self._anchors.view(1, A, 4) + shifts.view(1, K, 4).permute(1, 0, 2).contiguous()
        """
        # self._anchors is viewed from (9,4) to (1,9,4); shifts from (196,4) to (196,1,4).
        # more remarkably, these two different shapes can still be added.
        # I tested it - they really do add, mainly because of the two 1s:
        # broadcasting keeps treating it as higher-dim + one-lower-dim, recursively.
        #
        # self._anchors =
        # [[ -84.,  -40.,   99.,   55.],
        #  [-176.,  -88.,  191.,  103.],
        #  [-360., -184.,  375.,  199.],
        #  [ -56.,  -56.,   71.,   71.],
        #  [-120., -120.,  135.,  135.],
        #  [-248., -248.,  263.,  263.],
        #  [ -36.,  -80.,   51.,   95.],
        #  [ -80., -168.,   95.,  183.],
        #  [-168., -344.,  183.,  359.]]
        # shifts = every combination of x_center and y_center within [0, 208]
        # [[  0,   0,   0,   0],
        #  [ 16,   0,  16,   0],
        #  [ 32,   0,  32,   0],
        #  [ 48,   0,  48,   0],
        #  [ 64,   0,  64,   0],
        #  [ 80,   0,  80,   0],
        #  ......
        # the result: around each of the 14x14=196 grid points (offset (7.5, 7.5)
        # as the centre), draw the 9 types of boxes
        """
        anchors = self._anchors.view(1, A, 4) + shifts.view(K, 1, 4)  # shape = (196, 9, 4)
        # view to (1, 1764, 4), then replicate batch_size times to (bs, 1764, 4)
        anchors = anchors.view(1, K * A, 4).expand(batch_size, K * A, 4)

        # Transpose and reshape predicted bbox transformations to get them
        # into the same order as the anchors:
        # (bs,36,14,14) permute-> (bs,14,14,36)
        bbox_deltas = bbox_deltas.permute(0, 2, 3, 1).contiguous()
        # (bs,14,14,36) view-> (bs,1764,4)
        bbox_deltas = bbox_deltas.view(batch_size, -1, 4)

        # same treatment for scores:
        # (bs,9,14,14) permute-> (bs,14,14,9)
        scores = scores.permute(0, 2, 3, 1).contiguous()
        # view -> (bs, 1764)
        scores = scores.view(batch_size, -1)

        """
        # bbox_deltas holds dx, dy, dw, dh:
        # x += dx*width, y += dy*width, w *= exp(dw), h *= exp(dh)
        # (why computing offsets has to be this convoluted, I don't know);
        # this yields the offset-corrected proposal anchors, shape = [bs, 1764, 4]
        """
        proposals = bbox_transform_inv(anchors, bbox_deltas, batch_size)

        """
        # 2. clip predicted boxes to image
        # im_info is the height/width info;
        # anchors sticking out of the image are clipped back inside, e.g.
        # (-1,-1,2,2) becomes (0,0,2,2)
        """
        proposals = clip_boxes(proposals, im_info, batch_size)
        # proposals = clip_boxes_batch(proposals, im_info, batch_size)

        # (bs, 1764)
        scores_keep = scores
        proposals_keep = proposals  # [bs, 1764, 4]
        _, order = torch.sort(scores_keep, 1, True)  # True = descending, [bs, 1764]

        output = scores.new(batch_size, post_nms_topN, 5).zero_()
        for i in range(batch_size):
            # # 3. remove predicted boxes with either height or width < threshold
            # # (NOTE: convert min_size to input image scale stored in im_info[2])
            proposals_single = proposals_keep[i]  # [1764, 4]
            scores_single = scores_keep[i]        # [1764, 1]

            # # 4. sort all (proposal, score) pairs by score from highest to lowest
            # # 5. take top pre_nms_topN (e.g. 12000)
            order_single = order[i]  # [1764, 1]

            # why compare against scores_keep.numel() rather than order_single.numel()?
            if pre_nms_topN > 0 and pre_nms_topN < scores_keep.numel():
                order_single = order_single[:pre_nms_topN]

            proposals_single = proposals_single[order_single, :]     # [1764, 4]
            scores_single = scores_single[order_single].view(-1, 1)  # [1764, 1]

            # 6. apply nms (e.g. threshold = 0.7)
            # 7. take after_nms_topN (e.g. 2000)
            # 8. return the top proposals (-> RoIs top)
            keep_idx_i = nms(proposals_single, scores_single.squeeze(1), nms_thresh)
            keep_idx_i = keep_idx_i.long().view(-1)

            if post_nms_topN > 0:
                keep_idx_i = keep_idx_i[:post_nms_topN]
            proposals_single = proposals_single[keep_idx_i, :]
            scores_single = scores_single[keep_idx_i, :]

            # padding 0 at the end.
            num_proposal = proposals_single.size(0)  # 2000 boxes come out in the end
            output[i, :, 0] = i
            output[i, :num_proposal, 1:] = proposals_single
            # output[i, 1999, :] = [i, x1, y1, x2, y2]

        return output

    def backward(self, top, propagate_down, bottom):
        """This layer does not propagate gradients."""
        pass

    def reshape(self, bottom, top):
        """Reshaping happens during the call to forward."""
        pass

    def _filter_boxes(self, boxes, min_size):
        """Remove all boxes with any side smaller than min_size."""
        ws = boxes[:, :, 2] - boxes[:, :, 0] + 1
        hs = boxes[:, :, 3] - boxes[:, :, 1] + 1
        keep = ((ws >= min_size.view(-1, 1).expand_as(ws)) &
                (hs >= min_size.view(-1, 1).expand_as(hs)))
        return keep
# --- end _ProposalLayer ---
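A standalone sketch of the broadcasting trick that spreads the 9 base anchors over the K grid cells; the zeros below stand in for the real generate_anchors() output:

import torch

A, K = 9, 196
base_anchors = torch.zeros(A, 4)  # stand-in for generate_anchors()
shifts = torch.arange(K, dtype=torch.float32).view(K, 1).repeat(1, 4)
# (1, A, 4) + (K, 1, 4) broadcasts to (K, A, 4): every cell gets all 9 anchors
anchors = base_anchors.view(1, A, 4) + shifts.view(K, 1, 4)
print(anchors.shape)                     # torch.Size([196, 9, 4])
print(anchors.view(1, K * A, 4).shape)   # torch.Size([1, 1764, 4])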

Only the _ProposalLayer class is annotated above.

For the _AnchorTargetLayer class, consult anchor_target_layer.py yourself.

// referenced this RPN source walkthrough: https://hellozhaozheng.github.io/z_post/PyTorch-FasterRCNN/

2.3 The Faster_Rcnn network

After all that groundwork, we finally arrive at the most important part: the trunk network.

Here is a diagram I drew to help with understanding.

Now the code:


# --- corresponds to /lib/model/faster_rcnn/faster_rcnn.py in jwyang's project ---
"""
# Building the Faster_rcnn skeleton:
# 1) RCNN_base - the backbone we spoke of; name only, implemented by subclasses.
# 2) RCNN_rpn - two parts: the _RPN class written above, and _ProposalTargetLayer.
# 3) RCNN_roi_pool / RCNN_roi_align
# 4) _head_to_tail - name only, implemented by subclasses.
# 5) RCNN_bbox_pred - name only, implemented by subclasses.
# 6) RCNN_cls_score - name only, implemented by subclasses.
# 7) _init_modules - name only, implemented by subclasses, since we can't know
#    which backbone a subclass will adopt.
# 8) _init_weights - initialises the network parameters; since the parts all have
#    agreed names, the parameters can be fetched here by name.
# That closes the skeleton; the model's forward computation also lives here.
"""
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import torchvision.models as models
import numpy as np

from model.utils.config import cfg
from model.rpn.rpn import _RPN
from model.roi_layers import ROIAlign, ROIPool
# from model.roi_pooling.modules.roi_pool import _RoIPooling
# from model.roi_align.modules.roi_align import RoIAlignAvg
from model.rpn.proposal_target_layer_cascade import _ProposalTargetLayer
import time
import pdb
from model.utils.net_utils import _smooth_l1_loss, _crop_pool_layer, _affine_grid_gen, _affine_theta


# _fasterRCNN inherits directly from nn.Module and depends on no other model
class _fasterRCNN(nn.Module):
    """ faster RCNN """
    def __init__(self, classes, class_agnostic):
        # having inherited nn.Module, we pay our respects to the ancestors as usual
        super(_fasterRCNN, self).__init__()
        self.classes = classes
        self.n_classes = len(classes)
        # class_agnostic controls how the bbox is regressed; its counterpart is
        # class_specific. Easy to grasp: agnostic means regress the bbox until it
        # covers something (class != 0), never mind what; specific means it must be
        # regressed onto the determined class.
        # class_agnostic is generally recommended: (1) simpler model (and code),
        # (2) fewer parameters, less memory, faster, and (3) it doesn't have a big
        # impact on the performance.
        self.class_agnostic = class_agnostic
        # loss
        self.RCNN_loss_cls = 0
        self.RCNN_loss_bbox = 0

        # define rpn
        self.RCNN_rpn = _RPN(self.dout_base_model)
        self.RCNN_proposal_target = _ProposalTargetLayer(self.n_classes)

        # self.RCNN_roi_pool = _RoIPooling(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
        # self.RCNN_roi_align = RoIAlignAvg(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
        self.RCNN_roi_pool = ROIPool((cfg.POOLING_SIZE, cfg.POOLING_SIZE), 1.0/16.0)
        self.RCNN_roi_align = ROIAlign((cfg.POOLING_SIZE, cfg.POOLING_SIZE), 1.0/16.0, 0)

    def forward(self, im_data, im_info, gt_boxes, num_boxes):
        batch_size = im_data.size(0)

        im_info = im_info.data
        gt_boxes = gt_boxes.data
        num_boxes = num_boxes.data

        # 1. RCNN_base only declares the name; the definition comes from the subclass.
        #    Here we use layers 1-3 of ResNet101, producing the base feature map,
        #    shape = (bs, 256*expansion, H/16, W/16) = (bs, 1024, 14, 14)
        base_feat = self.RCNN_base(im_data)

        # 2.1 the RPN; rois.size() = (bs, 2000, 5): the top 2000 boxes after NMS,
        #     each (i, x1, y1, x2, y2)
        rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(base_feat, im_info, gt_boxes, num_boxes)

        # if it is training phase, then use ground truth bboxes for refining
        if self.training:
            # 2.2 RCNN_proposal_target
            roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes)
            rois, rois_label, rois_target, rois_inside_ws, rois_outside_ws = roi_data

            rois_label = Variable(rois_label.view(-1).long())
            rois_target = Variable(rois_target.view(-1, rois_target.size(2)))
            rois_inside_ws = Variable(rois_inside_ws.view(-1, rois_inside_ws.size(2)))
            rois_outside_ws = Variable(rois_outside_ws.view(-1, rois_outside_ws.size(2)))
        else:
            rois_label = None
            rois_target = None
            rois_inside_ws = None
            rois_outside_ws = None
            rpn_loss_cls = 0
            rpn_loss_bbox = 0

        rois = Variable(rois)
        # do roi pooling based on predicted rois
        # 3. the RoI pooling layer
        if cfg.POOLING_MODE == 'align':
            pooled_feat = self.RCNN_roi_align(base_feat, rois.view(-1, 5))
        elif cfg.POOLING_MODE == 'pool':
            pooled_feat = self.RCNN_roi_pool(base_feat, rois.view(-1, 5))

        # 4. _head_to_tail: feed the pooled features into the top model.
        #    Here the top model is ResNet layer4, defined in the subclass; it first
        #    returns (bs, 512*expansion, /2, /2), and after mean(3), mean(2) we get
        #    pooled_feat with shape = (bs, 512*4)
        pooled_feat = self._head_to_tail(pooled_feat)

        # 5. RCNN_bbox_pred computes the bbox offsets from pooled_feat.
        #    Defined in the subclass; in our case it is a fully connected layer.
        #    With class_agnostic on, it maps (bs, 512*4) to (bs, 4)
        bbox_pred = self.RCNN_bbox_pred(pooled_feat)
        if self.training and not self.class_agnostic:
            # select the corresponding columns according to roi labels
            bbox_pred_view = bbox_pred.view(bbox_pred.size(0), int(bbox_pred.size(1) / 4), 4)
            bbox_pred_select = torch.gather(bbox_pred_view, 1, rois_label.view(rois_label.size(0), 1, 1).expand(rois_label.size(0), 1, 4))
            bbox_pred = bbox_pred_select.squeeze(1)

        # 6. compute the classification probabilities from pooled_feat.
        #    RCNN_cls_score here is a fully connected layer mapping the 512*4
        #    features onto the n_classes dimensions, giving shape = (bs, n_classes)
        cls_score = self.RCNN_cls_score(pooled_feat)
        cls_prob = F.softmax(cls_score, 1)

        RCNN_loss_cls = 0
        RCNN_loss_bbox = 0

        if self.training:
            # classification loss
            RCNN_loss_cls = F.cross_entropy(cls_score, rois_label)
            # bounding box regression L1 loss
            RCNN_loss_bbox = _smooth_l1_loss(bbox_pred, rois_target, rois_inside_ws, rois_outside_ws)

        cls_prob = cls_prob.view(batch_size, rois.size(1), -1)
        bbox_pred = bbox_pred.view(batch_size, rois.size(1), -1)

        return rois, cls_prob, bbox_pred, rpn_loss_cls, rpn_loss_bbox, RCNN_loss_cls, RCNN_loss_bbox, rois_label

    def _init_weights(self):
        def normal_init(m, mean, stddev, truncated=False):
            """weight initalizer: truncated normal and random normal."""
            # x is a parameter
            if truncated:
                m.weight.data.normal_().fmod_(2).mul_(stddev).add_(mean)  # not a perfect approximation
            else:
                m.weight.data.normal_(mean, stddev)
                m.bias.data.zero_()

        normal_init(self.RCNN_rpn.RPN_Conv, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_rpn.RPN_cls_score, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_rpn.RPN_bbox_pred, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_cls_score, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_bbox_pred, 0, 0.001, cfg.TRAIN.TRUNCATED)

    def create_architecture(self):
        self._init_modules()
        self._init_weights()
# --- end faster rcnn ---

With the trunk network in place, we only need a subclass of _fasterRCNN that fills in the parts that were declared by name but left undefined.

This part is fairly simple - pure building-block assembly.

# --- a subclass inheriting faster_rcnn ---
# the parts to fill in:
# 1) RCNN_base - the backbone we spoke of; name only, implemented here.
# 2) _head_to_tail - further convolves the pooled features into 7x7 feature maps.
# 3) RCNN_bbox_pred - linear mapping #1 of the pair: the bbox offsets.
# 4) RCNN_cls_score - linear mapping #2 of the pair: the bbox class probabilities.
# 5) _init_modules - name only, implemented here, since the parent can't know
#    which backbone the subclass adopts.

class final_resnet(_fasterRCNN):
    def __init__(self, classes, num_layers=101, pretrained=False, class_agnostic=False):
        self.model_path = 'data/pretrained_model/resnet101_caffe.pth'
        self.dout_base_model = 1024
        self.pretrained = pretrained
        self.class_agnostic = class_agnostic

        _fasterRCNN.__init__(self, classes, class_agnostic)

    def _init_modules(self):
        # remember this appearing in the resnet module?
        # the factory returns a [3,4,23,3] resnet101 assembled to spec
        resnet = resnet101()

        if self.pretrained == True:
            print("Loading pretrained weights from %s" % (self.model_path))
            state_dict = torch.load(self.model_path)
            resnet.load_state_dict({k: v for k, v in state_dict.items() if k in resnet.state_dict()})

        # start assembling the network
        # 1. RCNN_base = resnet's input layers plus layer1-3
        self.RCNN_base = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu,
                                       resnet.maxpool, resnet.layer1, resnet.layer2, resnet.layer3)
        # 2. RCNN_top = resnet's layer4
        self.RCNN_top = nn.Sequential(resnet.layer4)
        # 3 & 4: the two fully connected layers
        self.RCNN_cls_score = nn.Linear(2048, self.n_classes)
        if self.class_agnostic:
            self.RCNN_bbox_pred = nn.Linear(2048, 4)
        else:
            self.RCNN_bbox_pred = nn.Linear(2048, 4 * self.n_classes)

        # Fix blocks
        for p in self.RCNN_base[0].parameters(): p.requires_grad = False
        for p in self.RCNN_base[1].parameters(): p.requires_grad = False

        assert (0 <= cfg.RESNET.FIXED_BLOCKS < 4)
        if cfg.RESNET.FIXED_BLOCKS >= 3:
            for p in self.RCNN_base[6].parameters(): p.requires_grad = False
        if cfg.RESNET.FIXED_BLOCKS >= 2:
            for p in self.RCNN_base[5].parameters(): p.requires_grad = False
        if cfg.RESNET.FIXED_BLOCKS >= 1:
            for p in self.RCNN_base[4].parameters(): p.requires_grad = False

        def set_bn_fix(m):
            classname = m.__class__.__name__
            if classname.find('BatchNorm') != -1:
                for p in m.parameters(): p.requires_grad = False

        self.RCNN_base.apply(set_bn_fix)
        self.RCNN_top.apply(set_bn_fix)

    def train(self, mode=True):
        # Override train so that the training mode is set as we want
        nn.Module.train(self, mode)
        if mode:
            # Set fixed blocks to be in eval mode
            self.RCNN_base.eval()
            self.RCNN_base[5].train()
            self.RCNN_base[6].train()

            def set_bn_eval(m):
                classname = m.__class__.__name__
                if classname.find('BatchNorm') != -1:
                    m.eval()

            self.RCNN_base.apply(set_bn_eval)
            self.RCNN_top.apply(set_bn_eval)

    def _head_to_tail(self, pool5):
        fc7 = self.RCNN_top(pool5).mean(3).mean(2)
        return fc7
# --- end final_resnet ---
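A minimal sketch of bringing the model up; imdb comes from combined_roidb() in Part 1, and the .cuda() call is optional:

fasterRCNN = final_resnet(imdb.classes, pretrained=True, class_agnostic=True)
fasterRCNN.create_architecture()  # calls _init_modules() and then _init_weights()
fasterRCNN = fasterRCNN.cuda()    # only if a GPU is available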

That completes the three subsections on the network model.

Part 3, training, is not hard any more - you can work it out just by reading demo.py.
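For the impatient, a minimal training-loop sketch assembled from the pieces above; the optimizer and learning rate are illustrative, not tuned, and the tensors are assumed to live on the same device as the model:

import torch.optim as optim

fasterRCNN.train()
optimizer = optim.SGD([p for p in fasterRCNN.parameters() if p.requires_grad],
                      lr=0.001, momentum=0.9)

for epoch in range(1, 7):
    for im_data, im_info, gt_boxes, num_boxes in dataloader:
        rois, cls_prob, bbox_pred, \
            rpn_loss_cls, rpn_loss_box, \
            RCNN_loss_cls, RCNN_loss_bbox, rois_label = \
            fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
        # the total loss = RPN losses + head losses
        loss = rpn_loss_cls.mean() + rpn_loss_box.mean() \
            + RCNN_loss_cls.mean() + RCNN_loss_bbox.mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print('epoch %d done, last loss %.4f' % (epoch, loss.item()))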

Read the two articles below, mainly for the big picture: which four parts there are, and what each is responsible for.

On a first read the details won't sink in; you still have to dig into the code to understand what each part actually does.

//《Faster-rcnn详解》https://blog.csdn.net/WZZ18191171661/article/details/79439212

//《Faster R-CNN论文及源码解读》https://senitco.github.io/2017/09/02/faster-rcnn/

// this fellow's annotations are also quite good - there are two posts, worth a look:

//https://blog.csdn.net/WYXHAHAHA123/article/details/86099768

//https://blog.csdn.net/WYXHAHAHA123/article/details/86251919
