FasterRCNN Explained

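1.2.2 FasterRCNN
- 1 Model
  - 1.1 Backbone network: VGG16 or ResNet50
  - 1.2 RPN generates proposals
  - 1.3 RCNN performs classification and regression
- 2 Prediction
  - 2.1 Prediction pipeline
- 3 Training
  - 3.1 Training pipeline
  - 3.2 Generating labels
  - 3.3 Loss functions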

1.2.2 FasterRCNN

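FasterRCNN builds on FastRCNN and makes the whole detector trainable end to end. The algorithm has three parts: a backbone network that extracts features, an RPN that generates proposals, and an RCNN head that performs classification and regression.

Advantages of FasterRCNN:

High detection accuracy.
The RPN generates the candidate boxes from prior boxes (anchors).
General-purpose and robust.
Plenty of room for optimization.

Disadvantages of FasterRCNN:

The backbone extracts only a single feature map.
NMS is unfriendly to occluded objects and may cause missed detections.
RoI Pooling rounds coordinates twice, which loses precision.
The fully connected layers at the end of the network make the parameter count large.
Detection is relatively slow.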

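1 Model

1.1 Backbone network: VGG16 or ResNet50

input[600,800,3] (height, width, channels) --> VGGNet (4 downsamplings) / ResNet50 --> output[37,50,512] (VGG16; the ResNet50 backbone shown below outputs 1024 channels)

(1) VGG16 code flow
VGG16 uses convolution and pooling as its basic building blocks. An input of [600,600,3] is downsampled 4 times by the four max-pooling layers (600 -> 300 -> 150 -> 75 -> 37) to give base_layers[37,37,512].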

'''
base_layers = VGG16(inputs)
x --> Conv2D*2 + MaxPooling2D --> Conv2D*2 + MaxPooling2D --> Conv2D*3 + MaxPooling2D --> Conv2D*3 + MaxPooling2D --> Conv2D*3
'''
from keras.layers import Conv2D, MaxPooling2D

def VGG16(inputs):
    # Block 1: 600,600,3 -> 300,300,64
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv1')(inputs)
    x = Conv2D(64, (3, 3), activation='relu', padding='same', name='block1_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block1_pool')(x)

    # Block 2: 300,300,64 -> 150,150,128
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv1')(x)
    x = Conv2D(128, (3, 3), activation='relu', padding='same', name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)

    # Block 3: 150,150,128 -> 75,75,256
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)

    # Block 4: 75,75,256 -> 37,37,512
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)

    # Block 5 (no pooling): 37,37,512 -> 37,37,512
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    return x

(2) ResNet50 code flow
ResNet50 uses convolution and pooling as its basic building blocks. An input of [600,600,3] is downsampled 4 times to give base_layers[38,38,1024]. ResNet50 has two basic structures: conv_block and identity_block.

'''
base_layers = ResNet50(inputs)
conv_block: bottleneck + residual connection (with downsampling)
identity_block: bottleneck + residual connection (no downsampling)
ResNet50: ZCBAM (ZeroPadding2D, Conv2D, BatchNormalization, Activation, MaxPooling2D) --> conv_block + identity_block*2 --> conv_block + identity_block*3 --> conv_block + identity_block*5
'''
# Keras 2.x style imports
from keras import layers
from keras.initializers import random_normal
from keras.layers import (Activation, BatchNormalization, Conv2D,
                          MaxPooling2D, ZeroPadding2D)

def identity_block(input_tensor, kernel_size, filters, stage, block):
    filters1, filters2, filters3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    # Residual connection: add the unchanged input
    x = layers.add([x, input_tensor])
    x = Activation('relu')(x)
    return x

def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    filters1, filters2, filters3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'

    x = Conv2D(filters1, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2a')(input_tensor)
    x = BatchNormalization(name=bn_name_base + '2a')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters2, kernel_size, padding='same', kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2b')(x)
    x = BatchNormalization(name=bn_name_base + '2b')(x)
    x = Activation('relu')(x)

    x = Conv2D(filters3, (1, 1), kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '2c')(x)
    x = BatchNormalization(name=bn_name_base + '2c')(x)

    # Residual connection: the shortcut is projected (and downsampled) by a 1x1 convolution
    shortcut = Conv2D(filters3, (1, 1), strides=strides, kernel_initializer=random_normal(stddev=0.02), name=conv_name_base + '1')(input_tensor)
    shortcut = BatchNormalization(name=bn_name_base + '1')(shortcut)

    x = layers.add([x, shortcut])
    x = Activation('relu')(x)
    return x

def ResNet50(inputs):
    # Assume the input image is 600,600,3
    # 600,600,3 -> 300,300,64
    x = ZeroPadding2D((3, 3))(inputs)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)

    # 300,300,64 -> 150,150,64
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)

    # 150,150,64 -> 150,150,256
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    # 150,150,256 -> 75,75,512
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    # 75,75,512 -> 38,38,1024
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')

    # The result is a shared feature layer of shape 38,38,1024
    return x
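The two backbones give slightly different feature-map sizes for the same 600-pixel side (37 for VGG16, 38 for ResNet50) because their layers round differently. The arithmetic can be sketched as follows, assuming the layer settings shown in the code above; the helper names are made up for illustration and are not part of the original code.

```python
def vgg16_output_length(n):
    """Side length after VGG16's four 2x2, stride-2 max-pooling layers."""
    for _ in range(4):
        n = n // 2                 # pooling without padding rounds down
    return n


def resnet50_output_length(n):
    """Side length after the ResNet50 layers used as the shared backbone."""
    n = (n + 2 * 3 - 7) // 2 + 1   # ZeroPadding2D(3) + 7x7 conv, stride 2
    n = (n + 1) // 2               # 3x3 max pooling, stride 2, padding='same' (rounds up)
    for _ in range(2):             # stage-3 and stage-4 conv_block: 1x1 conv, stride 2
        n = (n - 1) // 2 + 1
    return n


print(vgg16_output_length(600), resnet50_output_length(600))  # 37 38
print(vgg16_output_length(800), resnet50_output_length(800))  # 50 50
```

With 9 anchors per cell, a 38 x 38 ResNet50 feature map therefore carries 38 * 38 * 9 = 12996 anchors.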

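1.2 RPN generates proposals

Generate 9 anchors at every point of the feature map.
Apply separate convolutions to the feature map to get the confidences and the predicted offsets.
Match the anchors against the labels to get positive and negative samples, then compute the loss on the predicted offsets and confidences.
Use the predicted scores and offsets to select and decode the boxes that become the proposals.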

'''
rpn = get_rpn(base_layers, num_anchors)
base_layers[38,38,1024] --> Conv2D((3, 3),512) --> Conv2D((1, 1),num_anchors) + Conv2D((1, 1),num_anchors * 4) --> Reshape --> [x_class, x_regr]
'''
def get_rpn(base_layers, num_anchors):
    # Merge features with a 3x3 convolution with 512 channels
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer=random_normal(stddev=0.02), name='rpn_conv1')(base_layers)

    # Adjust the channel count with 1x1 convolutions to get the predictions
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer=random_normal(stddev=0.02), name='rpn_out_regress')(x)

    x_class = Reshape((-1, 1), name="classification")(x_class)
    x_regr = Reshape((-1, 4), name="regression")(x_regr)
    return [x_class, x_regr]

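1.3 RCNN performs classification and regression

Run the proposals from the previous step through the classifier head to get the class scores and the offsets of the predicted boxes.
Split the proposals into positive and negative samples by their IoU with the ground truth.
Compute the loss from the positive and negative samples.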

'''
classifier = get_vgg_classifier(base_layers, roi_input, 7, num_classes)
base_layers, input_rois --> RoiPoolingConv(roi_size) --> vgg_classifier_layers --> TimeDistributed(Dense) + TimeDistributed(Dense) --> out_class, out_regr
'''
def get_vgg_classifier(base_layers, input_rois, roi_size=7, num_classes=21):
    # batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
    out_roi_pool = RoiPoolingConv(roi_size)([base_layers, input_rois])

    # batch_size, num_rois, 7, 7, 512 -> batch_size, num_rois, 4096
    out = vgg_classifier_layers(out_roi_pool)

    # batch_size, num_rois, 4096 -> batch_size, num_rois, num_classes
    out_class = TimeDistributed(Dense(num_classes, activation='softmax', kernel_initializer=random_normal(stddev=0.02)), name='dense_class_{}'.format(num_classes))(out)

    # batch_size, num_rois, 4096 -> batch_size, num_rois, 4 * (num_classes-1)
    out_regr = TimeDistributed(Dense(4 * (num_classes-1), activation='linear', kernel_initializer=random_normal(stddev=0.02)), name='dense_regress_{}'.format(num_classes))(out)
    return [out_class, out_regr]

RoIPooling: batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512

'''
# batch_size, 37, 37, 512 -> batch_size, num_rois, 7, 7, 512
# roi_input   = Input(shape=(None, 4))
# Crop the proposals produced by the RPN out of the feature map, then pool each crop.
'''

out = vgg_classifier_layers(out_roi_pool)

'''
def vgg_classifier_layers(x):
    # batch_size, num_rois, 7, 7, 512 -> batch_size, num_rois, 25088 -> batch_size, num_rois, 4096
    x = TimeDistributed(Flatten(name='flatten'))(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc1')(x)
    x = TimeDistributed(Dense(4096, activation='relu'), name='fc2')(x)
    return x
'''

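2 Prediction

2.1 Prediction pipeline

r_image = frcnn.detect_image(image)

Preprocess the input image.
Run the RPN to get its predictions and base_layer.
Generate the prior boxes (anchors) and decode the RPN predictions against them to get the proposals.
Feed the proposals to the classifier network to get its predictions.
Decode the proposals with the classifier predictions to get the final predicted boxes.
Draw the results on the image.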

(1) The RPN predicts confidences and offsets

'''
preds = self.model_rpn.predict(photo)
1. model_rpn.predict: photo[1,600,600,3] --> model_rpn --> preds[x_class, x_regr, base_layers]
2. model_rpn: input[1,600,600,3] --> ResNet50(inputs) --> base_layers + num_anchors=9 --> get_rpn(base_layers, num_anchors) --> rpn
3. ResNet50: (inputs[1,600,600,3]) --> ZCBAM (None, 150, 150, 64) --> conv_block + identity_block*2 (None, 150, 150, 256) --> conv_block + identity_block*3 (None, 75, 75, 512) --> conv_block + identity_block*5 --> base_layers (None, 38, 38, 1024)
4. get_rpn: base_layers (None, 38, 38, 1024) --> Conv2D(512) (None, 38, 38, 512) --> Conv2D(num_anchors) (None, 38, 38, 9), Conv2D(num_anchors * 4) (None, 38, 38, 36) --> x_class (None, 12996, 1), x_regr (None, 12996, 4), base_layers (None, 38, 38, 1024)
'''

(1.1) ResNet50

'''
base_layers = ResNet50(inputs) # [1,600,600,3] --> (None, 38, 38, 1024)
'''
def ResNet50(inputs):
    # Input: 600*600*3
    img_input = inputs
    x = ZeroPadding2D((3, 3))(img_input)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1')(x)                  # 300*300*64
    x = BatchNormalization(name='bn_conv1')(x)
    x = Activation('relu')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)              # 150*150*64

    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))  # 150*150*256
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')

    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')                # 75*75*512
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')

    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')               # 38*38*1024
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
    x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
    return x   # (None, 38, 38, 1024)

(1.2) get_rpn

'''
rpn = get_rpn(base_layers, num_anchors) # (None, 38, 38, 1024) --> x_class(None, 12996, 1) , x_regr(None, 12996, 4)
'''
def get_rpn(base_layers, num_anchors):
    # 1. Apply a 3*3 convolution to base_layers
    x = Conv2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)

    # 2. Get the confidence and the box coordinates for every anchor
    x_class = Conv2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
    x_regr = Conv2D(num_anchors * 4, (1, 1), activation='linear', kernel_initializer='zero', name='rpn_out_regress')(x)

    x_class = Reshape((-1, 1), name="classification")(x_class)  # close to 1 when the anchor contains an object
    x_regr = Reshape((-1, 4), name="regression")(x_regr)
    # x_class (None, 12996, 1), x_regr (None, 12996, 4), base_layers (None, 38, 38, 1024)
    return [x_class, x_regr, base_layers]

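(2) Generate the prior boxes (anchors)

Generate the box side lengths.
Combine the side lengths with the grid centers so that every feature-map cell gets its boxes.
Scale the boxes into the range 0 to 1.

'''
anchors = get_anchors(self.get_img_output_length(width,height),width,height)
self.get_img_output_length(width,height): computes the feature-map size from the input image size. 600 --> [300,150,75,38]
[height:600,width:600]
'''
# Generate the prior boxes
def get_anchors(shape, width, height):
    ''' Get the prior boxes. shape: featuremap.shape; width, height: image size '''
    # 1. Generate the box side lengths
    anchors = generate_anchors()
    # 2. Combine the side lengths with the centers: one set of boxes per cell
    network_anchors = shift(shape, anchors)
    # 3. Scale the boxes into the range 0 to 1
    network_anchors[:, 0] = network_anchors[:, 0] / width
    network_anchors[:, 1] = network_anchors[:, 1] / height
    network_anchors[:, 2] = network_anchors[:, 2] / width
    network_anchors[:, 3] = network_anchors[:, 3] / height
    network_anchors = np.clip(network_anchors, 0, 1)  # clamp box coordinates to 0..1
    return network_anchors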

(2.1) Generate the box side lengths

Determine the widths and heights from sizes and ratios:
array([[ 128., 256., 512., 128., 256., 512., 256., 512., 1024.],
       [ 128., 256., 512., 256., 512., 1024., 128., 256., 512.]])
Halve the widths and heights so that each box is stored as corner offsets (-w/2, -h/2, w/2, h/2) around the origin. Adding a center point later then gives the top-left and bottom-right corners directly.

'''
anchors = generate_anchors()
'''
def generate_anchors(sizes=None, ratios=None):
    ''' Generate the nine base prior boxes of different sizes:
    [[ -64.,  -64.,   64.,   64.],
     [-128., -128.,  128.,  128.],
     [-256., -256.,  256.,  256.],
     [ -64., -128.,   64.,  128.],
     [-128., -256.,  128.,  256.],
     [-256., -512.,  256.,  512.],
     [-128.,  -64.,  128.,   64.],
     [-256., -128.,  256.,  128.],
     [-512., -256.,  512.,  256.]]
    '''
    if sizes is None:
        sizes = config.anchor_box_scales    # [128, 256, 512]
    if ratios is None:
        ratios = config.anchor_box_ratios   # [[1, 1], [1, 2], [2, 1]]

    # Number of boxes
    num_anchors = len(sizes) * len(ratios)  # 3*3

    # Columns 2 and 3 first hold width and height
    anchors = np.zeros((num_anchors, 4))    # [9,4]
    anchors[:, 2:] = np.tile(sizes, (2, len(ratios))).T  # tile sizes into shape (9, 2)
    for i in range(len(ratios)):
        anchors[3*i:3*i+3, 2] = anchors[3*i:3*i+3, 2] * ratios[i][0]
        anchors[3*i:3*i+3, 3] = anchors[3*i:3*i+3, 3] * ratios[i][1]

    # Convert (0, 0, w, h) into (-w/2, -h/2, w/2, h/2)
    anchors[:, 0::2] -= np.tile(anchors[:, 2] * 0.5, (2, 1)).T
    anchors[:, 1::2] -= np.tile(anchors[:, 3] * 0.5, (2, 1)).T
    return anchors

(2.2) Combine the side lengths with the centers: one set of boxes per cell

Divide the original image into a 38*38 grid of cells, following the width and height of the feature map.
Determine the center coordinates of every cell.
Compute the top-left and bottom-right corners of the prior boxes from the center coordinates and the widths and heights.

'''
network_anchors = shift(shape,anchors)
'''
import keras
import numpy as np

def shift(shape, anchors, stride=config.rpn_stride):
    ''' Generate the grid centers (rpn_stride = 16) and build the boxes from side lengths and centers.
    base_layers (None, 38, 38, 1024)
    [0,1,2,3,....37]
    [0.5,1.5,2.5,....,37.5]
    [0.5,1.5,2.5,....,37.5]*stride
    '''
    shift_x = (np.arange(0, shape[0], dtype=keras.backend.floatx()) + 0.5) * stride
    shift_y = (np.arange(0, shape[1], dtype=keras.backend.floatx()) + 0.5) * stride

    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    shift_x = np.reshape(shift_x, [-1])  # (1444,)
    shift_y = np.reshape(shift_y, [-1])  # (1444,)

    shifts = np.stack([shift_x, shift_y, shift_x, shift_y], axis=0)  # (4, 1444)
    shifts = np.transpose(shifts)              # (1444, 4)
    number_of_anchors = np.shape(anchors)[0]   # 9
    k = np.shape(shifts)[0]

    # The next two steps give the top-left and bottom-right corners: [1,9,4] + [1444,1,4]
    shifted_anchors = np.reshape(anchors, [1, number_of_anchors, 4]) + np.array(np.reshape(shifts, [k, 1, 4]), keras.backend.floatx())
    shifted_anchors = np.reshape(shifted_anchors, [k * number_of_anchors, 4])
    return shifted_anchors  # (12996, 4)
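To see the anchor construction end to end without Keras or the config object, the two steps can be condensed into a NumPy-only sketch. The default sizes, ratios and stride below are the values quoted in the comments above; the simplified loop in generate_anchors is an illustrative rewrite, not the original implementation.

```python
import numpy as np


def generate_anchors(sizes=(128, 256, 512), ratios=((1, 1), (1, 2), (2, 1))):
    """Return 9 base anchors as (x1, y1, x2, y2) offsets around the origin."""
    anchors = []
    for rw, rh in ratios:          # ratio-major, size-minor, as in the listing above
        for s in sizes:
            w, h = s * rw, s * rh
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors, dtype=np.float32)


def shift(shape, anchors, stride=16):
    """Place every base anchor at the center of every feature-map cell."""
    xs = (np.arange(shape[0], dtype=np.float32) + 0.5) * stride
    ys = (np.arange(shape[1], dtype=np.float32) + 0.5) * stride
    xs, ys = np.meshgrid(xs, ys)
    centers = np.stack([xs.ravel(), ys.ravel(), xs.ravel(), ys.ravel()], axis=1)
    # (cells, 1, 4) + (1, 9, 4) broadcasts to (cells, 9, 4)
    boxes = centers[:, None, :] + anchors[None, :, :]
    return boxes.reshape(-1, 4)


base = generate_anchors()
all_anchors = shift((38, 38), base)
print(all_anchors.shape)  # (12996, 4)
print(all_anchors[0])     # 128x128 anchor centered on the first cell center (8, 8)
```

Dividing the x columns by the image width and the y columns by the image height, then clipping to 0..1, gives the normalized anchors that get_anchors returns.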

(3) Decode the predictions and filter with NMS to get the proposals

Decode.
Process the decoded boxes: keep those scoring above confidence_threshold, then apply IoU-based non-maximum suppression.
Sort by confidence and keep the keep_top_k most confident boxes.

'''
rpn_results = self.bbox_util.detection_out(preds,anchors,1,confidence_threshold=0.8)
preds: x_class(None, 12996, 1), x_regr(None, 12996, 4)
anchors(12996, 4)
'''
def detection_out(self, predictions, mbox_priorbox, num_classes, keep_top_k=300, confidence_threshold=0.5):
    ''' Decode the RPN box predictions against the prior boxes, then apply NMS '''
    mbox_conf = predictions[0]  # confidences
    mbox_loc = predictions[1]   # regression outputs of the network
    results = []
    # Process each image
    for i in range(len(mbox_loc)):
        results.append([])
        # 1. Decode: get the top-left and bottom-right corners of the predicted boxes
        decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
        # 2. Process the decoded boxes
        for c in range(num_classes):
            c_confs = mbox_conf[i, :, c]
            c_confs_m = c_confs > confidence_threshold
            if len(c_confs[c_confs_m]) > 0:
                # Keep the boxes scoring above confidence_threshold
                boxes_to_process = decode_bbox[c_confs_m]
                confs_to_process = c_confs[c_confs_m]
                # IoU-based non-maximum suppression
                feed_dict = {self.boxes: boxes_to_process,
                             self.scores: confs_to_process}
                idx = self.sess.run(self.nms, feed_dict=feed_dict)
                # Keep the boxes that survive NMS
                good_boxes = boxes_to_process[idx]
                confs = confs_to_process[idx][:, None]
                # Stack label, confidence and box position
                labels = c * np.ones((len(idx), 1))
                c_pred = np.concatenate((labels, confs, good_boxes), axis=1)
                # Append to the results of this image
                results[-1].extend(c_pred)
        if len(results[-1]) > 0:
            # Sort by confidence, highest first
            results[-1] = np.array(results[-1])
            argsort = np.argsort(results[-1][:, 1])[::-1]
            results[-1] = results[-1][argsort]
            # Keep the keep_top_k most confident boxes
            results[-1] = results[-1][:keep_top_k]
    # results holds the high-confidence boxes, decoded from the priors and the RPN predictions
    return results

(3.1) Decode to get the top-left and bottom-right corners of the predicted boxes

Get the width, height and center of each prior box.
Plug those values and the predicted offsets into the decoding formula to get the center, width and height of each predicted box.
Clamp the predicted box coordinates to the range 0 to 1.

'''
decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox)
'''
def decode_boxes(self, mbox_loc, mbox_priorbox):
    # 1.1 Width and height of the prior boxes
    prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
    prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]
    # 1.2 Centers of the prior boxes
    prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
    prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])

    # 2. Center of the predicted box: offset from the prior center along x and y
    decode_bbox_center_x = mbox_loc[:, 0] * prior_width / 4
    decode_bbox_center_x += prior_center_x
    decode_bbox_center_y = mbox_loc[:, 1] * prior_height / 4
    decode_bbox_center_y += prior_center_y

    # Width and height of the predicted box
    decode_bbox_width = np.exp(mbox_loc[:, 2] / 4)
    decode_bbox_width *= prior_width
    decode_bbox_height = np.exp(mbox_loc[:, 3] / 4)
    decode_bbox_height *= prior_height

    # Top-left and bottom-right corners of the predicted box
    decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
    decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
    decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
    decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

    # Stack the corners into (N, 4)
    decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                  decode_bbox_ymin[:, None],
                                  decode_bbox_xmax[:, None],
                                  decode_bbox_ymax[:, None]), axis=-1)
    # Clamp to the range 0..1
    decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
    return decode_bbox
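A small worked example makes the decoding formula concrete. The sketch below decodes a single box and uses the same division by 4 as decode_boxes; the function name decode_box and the sample numbers are illustrative assumptions.

```python
import numpy as np


def decode_box(loc, prior, scale=4.0):
    """Decode one (tx, ty, tw, th) offset against one prior (x1, y1, x2, y2)."""
    pw, ph = prior[2] - prior[0], prior[3] - prior[1]
    pcx, pcy = 0.5 * (prior[0] + prior[2]), 0.5 * (prior[1] + prior[3])
    cx = loc[0] * pw / scale + pcx     # shift the center by a fraction of the prior width
    cy = loc[1] * ph / scale + pcy
    w = np.exp(loc[2] / scale) * pw    # scale the size exponentially
    h = np.exp(loc[3] / scale) * ph
    box = np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.clip(box, 0.0, 1.0)


prior = np.array([0.2, 0.2, 0.4, 0.4])
same = decode_box(np.zeros(4), prior)                       # zero offsets return the prior itself
moved = decode_box(np.array([1.0, 0.0, 0.0, 0.0]), prior)   # tx = 1 moves the box right by width / 4
print(same, moved)
```

With a prior of width 0.2, an offset of tx = 1 moves the center by 0.2 / 4 = 0.05, so the box becomes approximately [0.25, 0.2, 0.45, 0.4].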

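(3.2) NMS
The previous step decodes the predicted boxes. For each class, the boxes whose confidence exceeds the threshold are selected, and NMS then removes the duplicates among them.

Sort the boxes of the same class by confidence, from high to low.
Compute the IoU of every other box with the first box. Keep a box if its IoU is below the threshold, otherwise delete it.
Repeat the steps above on the remaining boxes until no boxes are left to examine.

'''
self.nms = tf.image.non_max_suppression(self.boxes, self.scores,self._top_k,iou_threshold=self._nms_thresh)
'''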
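The three steps above can be sketched in plain NumPy. This is a minimal greedy NMS for illustration only; the original code delegates to tf.image.non_max_suppression, and the default threshold used here is an arbitrary assumption.

```python
import numpy as np


def nms(boxes, scores, iou_threshold=0.7):
    """Greedy NMS over (x1, y1, x2, y2) boxes; returns kept indices, best first."""
    order = np.argsort(scores)[::-1]   # step 1: sort by confidence, high to low
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:              # step 3: repeat until nothing is left
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # step 2: IoU of the best box with every remaining box
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]   # drop boxes that overlap too much
    return keep


boxes = np.array([[0.0, 0.0, 1.0, 1.0],
                  [0.05, 0.05, 1.0, 1.0],   # near-duplicate of the first box
                  [2.0, 2.0, 3.0, 3.0]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores, iou_threshold=0.5))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.9, so it is suppressed; the third box does not overlap and survives.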

(4) P_cls (class and confidence) and P_regr

Cropping. Iterate over the proposals, crop the matching region from the feature map, and resize every crop to a common size, giving [1,32,14,14,1024].

'''
[P_cls, P_regr] = self.model_classifier.predict([base_layer,ROIs])
'''
def get_classifier(base_layers, input_rois, num_rois, nb_classes=21, trainable=False):
    ''' roi --> cls + reg '''
    pooling_regions = 14
    input_shape = (num_rois, 14, 14, 1024)
    # base_layers[38,38,1024], input_rois[num_prior,4], num_prior=32, out_roi_pool.shape[1,32,14,14,1024]
    # 1. RoI pooling: base_layers[38,38,1024], input_rois[-1,4], pooling_regions=14, num_rois=32
    out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])  # input_rois are the proposals
    # 2. Extract features from the crops, then classify and regress
    out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
    out = TimeDistributed(Flatten())(out)
    out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)
    out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)
    return [out_class, out_regr]  # 21, 20*4

(4.1) Cropping

'''
out_roi_pool = RoiPoolingConv(pooling_regions, num_rois)([base_layers, input_rois])
'''
def call(self, x, mask=None):
    assert(len(x) == 2)
    img = x[0]   # feature map
    rois = x[1]  # proposals
    outputs = []
    # Iterate over the proposals
    for roi_idx in range(self.num_rois):
        # x, y: top-left corner; w, h: width and height
        x = rois[0, roi_idx, 0]
        y = rois[0, roi_idx, 1]
        w = rois[0, roi_idx, 2]
        h = rois[0, roi_idx, 3]
        x = K.cast(x, 'int32')
        y = K.cast(y, 'int32')
        w = K.cast(w, 'int32')
        h = K.cast(h, 'int32')
        # Crop from the feature map and resize to pool_size x pool_size (TF 1.x API)
        rs = tf.image.resize_images(img[:, y:y+h, x:x+w, :], (self.pool_size, self.pool_size))
        outputs.append(rs)
    final_output = K.concatenate(outputs, axis=0)
    final_output = K.reshape(final_output, (1, self.num_rois, self.pool_size, self.pool_size, self.nb_channels))
    final_output = K.permute_dimensions(final_output, (0, 1, 2, 3, 4))
    return final_output  # [1,32,14,14,1024]

(4.3) Extract features from the crops with convolutions

'''
out = classifier_layers(out_roi_pool, input_shape=input_shape, trainable=True)
[1,32,14,14,1024] --> (None, 32, 1, 1, 2048)
'''
def classifier_layers(x, input_shape=(32, 14, 14, 1024), trainable=False):
    x = conv_block_td(x, 3, [512, 512, 2048], stage=5, block='a', input_shape=input_shape, strides=(2, 2), trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='b', trainable=trainable)
    x = identity_block_td(x, 3, [512, 512, 2048], stage=5, block='c', trainable=trainable)
    x = TimeDistributed(AveragePooling2D((7, 7)), name='avg_pool')(x)  # applied to each RoI along the second dimension
    return x   # (None, 32, 1, 1, 2048)

(4.4) Classification and regression

'''
(None, 32, 1, 1, 2048) --> (None, 32, 2048) --> (None, 32, 21) + (None, 32, 80)
'''
out = TimeDistributed(Flatten())(out)  # (None, 32, 2048)
out_class = TimeDistributed(Dense(nb_classes, activation='softmax', kernel_initializer='zero'), name='dense_class_{}'.format(nb_classes))(out)   # (None, 32, 21)
out_regr = TimeDistributed(Dense(4 * (nb_classes-1), activation='linear', kernel_initializer='zero'), name='dense_regress_{}'.format(nb_classes))(out)  # (None, 32, 80)

3 Training

3.1 Training pipeline

Load the model.
Build the dataset.
Set the parameters.
Train the model.

3.2 Generating labels

Data augmentation.
Get the prior boxes.
Match the ground-truth boxes to the prior boxes to generate the labels.
Determine the positive and negative samples.

'''
gen = Generator(bbox_util, lines, NUM_CLASSES, solid=True)
rpn_train = gen.generate()
'''
def generate(self):
    while True:
        shuffle(self.train_lines)
        lines = self.train_lines
        for annotation_line in lines:
            # Data augmentation
            img, y = self.get_random_data(annotation_line)
            height, width, _ = np.shape(img)
            if len(y) == 0:
                continue

            # Normalize the ground-truth boxes to 0..1 and skip degenerate ones
            boxes = np.array(y[:, :4], dtype=np.float32)
            boxes[:, 0] = boxes[:, 0] / width
            boxes[:, 1] = boxes[:, 1] / height
            boxes[:, 2] = boxes[:, 2] / width
            boxes[:, 3] = boxes[:, 3] / height
            box_heights = boxes[:, 3] - boxes[:, 1]
            box_widths = boxes[:, 2] - boxes[:, 0]
            if (box_heights <= 0).any() or (box_widths <= 0).any():
                continue
            y[:, :4] = boxes[:, :4]

            # Get the prior boxes
            anchors = get_anchors(get_img_output_length(width, height), width, height)
            # Match each ground-truth box to prior boxes and compute the targets those priors should predict
            assignment = self.bbox_util.assign_boxes(y, anchors)

            num_regions = 256
            classification = assignment[:, 4]
            regression = assignment[:, :]

            # Keep at most num_regions/2 positives. The writes go through integer
            # positions: classification[mask][val_locs] = -1 would only change a temporary copy.
            pos_idx = np.flatnonzero(classification > 0)
            num_pos = len(pos_idx)
            if num_pos > num_regions / 2:
                val_locs = random.sample(range(num_pos), int(num_pos - num_regions / 2))
                classification[pos_idx[val_locs]] = -1
                regression[pos_idx[val_locs], -1] = -1
                num_pos = int(num_regions / 2)

            # Keep as many negatives as positives when the total exceeds num_regions
            neg_idx = np.flatnonzero(classification == 0)
            num_neg = len(neg_idx)
            if num_neg + num_pos > num_regions:
                val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
                classification[neg_idx[val_locs]] = -1

            classification = np.reshape(classification, [-1, 1])
            regression = np.reshape(regression, [-1, 5])

            tmp_inp = np.array(img)
            tmp_targets = [np.expand_dims(np.array(classification, dtype=np.float32), 0),
                           np.expand_dims(np.array(regression, dtype=np.float32), 0)]
            yield preprocess_input(np.expand_dims(tmp_inp, 0)), tmp_targets, np.expand_dims(y, 0)

(1) Data augmentation

Resize the image to a new width and height (800, 600).
Flip.
Adjust hue, brightness and saturation.
Correct the boxes.

'''
img,y=self.get_random_data(annotation_line)
'''
def get_random_data(self, annotation_line, random=True, jitter=.1, hue=.1, sat=1.1, val=1.1, proc_img=True):
    ''' Random preprocessing for real-time data augmentation '''
    line = annotation_line.split()
    image = Image.open(line[0])
    iw, ih = image.size
    if self.solid:
        w, h = self.solid_shape
    else:
        w, h = get_new_img_size(iw, ih)
    box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])

    # resize image
    new_ar = w/h * rand(1-jitter, 1+jitter)/rand(1-jitter, 1+jitter)
    scale = rand(.25, 2)
    if new_ar < 1:
        nh = int(scale*h)
        nw = int(nh*new_ar)
    else:
        nw = int(scale*w)
        nh = int(nw/new_ar)
    image = image.resize((nw, nh), Image.BICUBIC)

    # place image on a gray canvas
    dx = int(rand(0, w-nw))
    dy = int(rand(0, h-nh))
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image = new_image

    # flip image or not
    flip = rand() < .5
    if flip:
        image = image.transpose(Image.FLIP_LEFT_RIGHT)

    # distort image in HSV space
    hue = rand(-hue, hue)
    sat = rand(1, sat) if rand() < .5 else 1/rand(1, sat)
    val = rand(1, val) if rand() < .5 else 1/rand(1, val)
    x = rgb_to_hsv(np.array(image)/255.)
    x[..., 0] += hue
    x[..., 0][x[..., 0] > 1] -= 1
    x[..., 0][x[..., 0] < 0] += 1
    x[..., 1] *= sat
    x[..., 2] *= val
    x[x > 1] = 1
    x[x < 0] = 0
    image_data = hsv_to_rgb(x)*255  # numpy array, 0 to 255

    # correct boxes
    box_data = np.zeros((len(box), 5))
    if len(box) > 0:
        np.random.shuffle(box)
        box[:, [0, 2]] = box[:, [0, 2]]*nw/iw + dx
        box[:, [1, 3]] = box[:, [1, 3]]*nh/ih + dy
        # flip
        if flip:
            box[:, [0, 2]] = w - box[:, [2, 0]]
        # clip to the canvas and filter
        box[:, 0:2][box[:, 0:2] < 0] = 0
        box[:, 2][box[:, 2] > w] = w
        box[:, 3][box[:, 3] > h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        box = box[np.logical_and(box_w > 1, box_h > 1)]  # discard invalid box
        box_data = np.zeros((len(box), 5))
        box_data[:len(box)] = box
    if len(box) == 0:
        return image_data, []

    if (box_data[:, :4] > 0).any():
        return image_data, box_data
    else:
        return image_data, []

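(2) Get the prior boxes

Generate the box side lengths.
Combine the side lengths with the grid centers so that every feature-map cell gets its boxes.
Scale the boxes into the range 0 to 1.

'''
anchors = get_anchors(get_img_output_length(width,height),width,height)
'''
def get_anchors(shape, width, height):
    ''' Get the prior boxes. shape: featuremap.shape; width, height: image size '''
    # 1. Generate the box side lengths
    anchors = generate_anchors()
    # 2. Combine the side lengths with the centers: one set of boxes per cell
    network_anchors = shift(shape, anchors)
    # 3. Scale the boxes into the range 0 to 1
    network_anchors[:, 0] = network_anchors[:, 0] / width
    network_anchors[:, 1] = network_anchors[:, 1] / height
    network_anchors[:, 2] = network_anchors[:, 2] / width
    network_anchors[:, 3] = network_anchors[:, 3] / height
    network_anchors = np.clip(network_anchors, 0, 1)
    return network_anchors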

(3) Match the ground-truth boxes to the prior boxes to generate the labels

Determine the boxes to ignore.
Find the positive samples, making sure every qualifying prior box is responsible for exactly one ground-truth box.

'''
assignment = self.bbox_util.assign_boxes(y,anchors)
'''
def assign_boxes(self, boxes, anchors):
    self.num_priors = len(anchors)
    self.priors = anchors
    assignment = np.zeros((self.num_priors, 4 + 1))
    assignment[:, 4] = 0.0
    if len(boxes) == 0:
        return assignment

    # 1. Determine the boxes to ignore: compute the IoU for every ground-truth box
    ingored_boxes = np.apply_along_axis(self.ignore_box, 1, boxes[:, :4])
    # (n, num_priors, 1)
    ingored_boxes = ingored_boxes.reshape(-1, self.num_priors, 1)
    # Highest overlap per prior box: (num_priors)
    ignore_iou = ingored_boxes[:, :, 0].max(axis=0)
    # (num_priors)
    ignore_iou_mask = ignore_iou > 0
    assignment[:, 4][ignore_iou_mask] = -1

    # 2. Find the positive samples; each qualifying prior box handles one ground-truth box
    # Encoded values and IoU for every ground-truth box: (n, num_priors, 5)
    encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])
    encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)

    # For every prior box, the highest IoU and the index of the ground-truth box that gives it
    # (num_priors)
    best_iou = encoded_boxes[:, :, -1].max(axis=0)
    # (num_priors)
    best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)
    # (num_priors)
    best_iou_mask = best_iou > 0
    # Which ground-truth box each matched prior box belongs to
    best_iou_idx = best_iou_idx[best_iou_mask]
    assign_num = len(best_iou_idx)

    # Keep only the prior boxes that have a ground-truth box, with the targets they should predict
    encoded_boxes = encoded_boxes[:, best_iou_mask, :]
    assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx, np.arange(assign_num), :4]
    # Column 4 is the anchor state: 1 marks a positive sample
    assignment[:, 4][best_iou_mask] = 1
    # assign_boxes gives the predictions the network should produce for this input image
    return assignment
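The matching relies on the IoU between each ground-truth box and every prior box, computed inside ignore_box and encode_box (not shown in this article). The sketch below illustrates the idea with a standalone IoU helper and a three-way labeling. The thresholds 0.3 and 0.7 and both function names are assumptions made for illustration, not values taken from the original code.

```python
import numpy as np


def iou_one_to_many(box, anchors):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of anchors."""
    x1 = np.maximum(box[0], anchors[:, 0])
    y1 = np.maximum(box[1], anchors[:, 1])
    x2 = np.minimum(box[2], anchors[:, 2])
    y2 = np.minimum(box[3], anchors[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_anchors = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_box + area_anchors - inter)


def label_anchors(gt_boxes, anchors, neg_thresh=0.3, pos_thresh=0.7):
    """Return one state per anchor: 1 positive, 0 negative, -1 ignored."""
    ious = np.stack([iou_one_to_many(gt, anchors) for gt in gt_boxes])  # (n_gt, N)
    best = ious.max(axis=0)            # best overlap of each anchor with any ground truth
    state = np.zeros(len(anchors))
    state[best > neg_thresh] = -1      # ambiguous overlap: ignore
    state[best >= pos_thresh] = 1      # strong overlap: positive
    return state


anchors = np.array([[0.0, 0.0, 0.5, 0.5],
                    [0.0, 0.0, 0.5, 1.0],
                    [0.6, 0.6, 1.0, 1.0]])
gt = np.array([[0.0, 0.0, 0.5, 0.6]])
print(label_anchors(gt, anchors))  # [ 1. -1.  0.]
```

The first anchor overlaps the ground truth with IoU 0.83 (positive), the second with IoU 0.6 (ignored), and the third not at all (negative), mirroring the 1 / -1 / 0 states stored in column 4 of assignment.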

(4) Determine the positive and negative samples

Positive and negative samples together total at most 256.
If there are more than 128 positive samples, randomly pick (num_pos - 128) of them and mark them as ignored (-1), so that at most 128 positives remain.
If the number of negatives plus positives exceeds 256, randomly pick (num_neg - num_pos) negatives and mark them as ignored, leaving as many negatives as positives.

num_regions = 256
classification = assignment[:, 4]
regression = assignment[:, :]

# Write through integer positions: classification[mask][val_locs] = -1 would only change a temporary copy
pos_idx = np.flatnonzero(classification > 0)
num_pos = len(pos_idx)
if num_pos > num_regions / 2:
    val_locs = random.sample(range(num_pos), int(num_pos - num_regions / 2))
    classification[pos_idx[val_locs]] = -1
    regression[pos_idx[val_locs], -1] = -1
    num_pos = int(num_regions / 2)

neg_idx = np.flatnonzero(classification == 0)
num_neg = len(neg_idx)
if num_neg + num_pos > num_regions:
    val_locs = random.sample(range(num_neg), int(num_neg - num_pos))
    classification[neg_idx[val_locs]] = -1

classification = np.reshape(classification, [-1, 1])
regression = np.reshape(regression, [-1, 5])

tmp_inp = np.array(img)
tmp_targets = [np.expand_dims(np.array(classification, dtype=np.float32), 0),
               np.expand_dims(np.array(regression, dtype=np.float32), 0)]
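The sampling rule can be checked in isolation with a small NumPy sketch. The function name sample_anchor_states and the use of np.random.default_rng are illustrative choices rather than part of the original generator. One detail worth noting: NumPy boolean indexing returns a copy, so the sketch writes through integer indices to make sure the array itself is updated.

```python
import numpy as np


def sample_anchor_states(state, num_regions=256, rng=None):
    """Cap positives at num_regions/2 and keep as many negatives as positives.

    state: 1-D array with 1 (positive), 0 (negative), -1 (ignored).
    Returns a new array in which surplus samples are set to -1.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = state.copy()
    max_pos = num_regions // 2

    pos_idx = np.flatnonzero(state == 1)
    if len(pos_idx) > max_pos:
        drop = rng.choice(pos_idx, size=len(pos_idx) - max_pos, replace=False)
        state[drop] = -1               # integer indexing writes into the array itself
    num_pos = int((state == 1).sum())

    neg_idx = np.flatnonzero(state == 0)
    if len(neg_idx) + num_pos > num_regions:
        drop = rng.choice(neg_idx, size=len(neg_idx) - num_pos, replace=False)
        state[drop] = -1
    return state


state = np.concatenate([np.ones(200), np.zeros(1000)])
sampled = sample_anchor_states(state, rng=np.random.default_rng(0))
print((sampled == 1).sum(), (sampled == 0).sum(), (sampled == -1).sum())  # 128 128 944
```

Anchors marked -1 contribute to neither the classification loss nor the regression loss, so each image trains on a balanced set of at most 256 anchors.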

3.3 Loss functions

(1) The smooth_l1 loss function

Find the positive samples.
Plug them into the formula.

'''
smooth_l1()
'''
import keras
import tensorflow as tf

def smooth_l1(sigma=1.0):
    sigma_squared = sigma ** 2

    def _smooth_l1(y_true, y_pred):
        # y_true [batch_size, num_anchor, 4+1]
        # y_pred [batch_size, num_anchor, 4]
        regression = y_pred
        regression_target = y_true[:, :, :-1]
        anchor_state = y_true[:, :, -1]

        # Find the positive samples
        indices = tf.where(keras.backend.equal(anchor_state, 1))
        regression = tf.gather_nd(regression, indices)
        regression_target = tf.gather_nd(regression_target, indices)

        # Compute the smooth L1 loss
        # f(x) = 0.5 * (sigma * x)^2          if |x| < 1 / sigma / sigma
        #        |x| - 0.5 / sigma / sigma    otherwise
        regression_diff = regression - regression_target
        regression_diff = keras.backend.abs(regression_diff)
        regression_loss = tf.where(
            keras.backend.less(regression_diff, 1.0 / sigma_squared),
            0.5 * sigma_squared * keras.backend.pow(regression_diff, 2),
            regression_diff - 0.5 / sigma_squared
        )

        # Normalize by the number of positive samples (at least 1)
        normalizer = keras.backend.maximum(1, keras.backend.shape(indices)[0])
        normalizer = keras.backend.cast(normalizer, dtype=keras.backend.floatx())
        loss = keras.backend.sum(regression_loss) / normalizer
        return loss

    return _smooth_l1
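The piecewise formula in the comment is easy to verify numerically. The NumPy version below is a sketch for checking values by hand; the name smooth_l1_np is made up for this example and it operates on raw differences rather than on Keras tensors.

```python
import numpy as np


def smooth_l1_np(diff, sigma=1.0):
    """Element-wise smooth L1 of a regression error."""
    sigma_sq = sigma ** 2
    x = np.abs(diff)
    return np.where(x < 1.0 / sigma_sq,
                    0.5 * sigma_sq * x ** 2,   # quadratic near zero
                    x - 0.5 / sigma_sq)        # linear for large errors


print(smooth_l1_np(np.array([0.5, -2.0])))  # [0.125 1.5  ]
```

With sigma = 1 the two branches meet at |x| = 1, where both evaluate to 0.5, so the loss is continuous and large errors grow only linearly.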

(2) The cls_loss() loss function

Find the prior boxes that contain an object.
Find the prior boxes that are actually background.
Compute the cross-entropy of the positive and the negative samples separately.

'''
def cls_loss(ratio=3):
    def _cls_loss(y_true, y_pred):
        # y_true [batch_size, num_anchor, num_classes+1]
        # y_pred [batch_size, num_anchor, num_classes]
        labels = y_true
        anchor_state = y_true[:, :, -1]  # -1: ignore, 0: background, 1: object
        classification = y_pred

        # Find the prior boxes that contain an object
        indices_for_object = tf.where(keras.backend.equal(anchor_state, 1))
        labels_for_object = tf.gather_nd(labels, indices_for_object)
        classification_for_object = tf.gather_nd(classification, indices_for_object)
        cls_loss_for_object = keras.backend.binary_crossentropy(labels_for_object, classification_for_object)

        # Find the prior boxes that are actually background
        indices_for_back = tf.where(keras.backend.equal(anchor_state, 0))
        labels_for_back = tf.gather_nd(labels, indices_for_back)
        classification_for_back = tf.gather_nd(classification, indices_for_back)
        cls_loss_for_back = keras.backend.binary_crossentropy(labels_for_back, classification_for_back)

        # Normalizers: the number of positive and of negative samples (at least 1 each)
        normalizer_pos = tf.where(keras.backend.equal(anchor_state, 1))
        normalizer_pos = keras.backend.cast(keras.backend.shape(normalizer_pos)[0], keras.backend.floatx())
        normalizer_pos = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_pos)

        normalizer_neg = tf.where(keras.backend.equal(anchor_state, 0))
        normalizer_neg = keras.backend.cast(keras.backend.shape(normalizer_neg)[0], keras.backend.floatx())
        normalizer_neg = keras.backend.maximum(keras.backend.cast_to_floatx(1.0), normalizer_neg)

        # Divide each loss by its sample count; the background term is weighted by ratio
        cls_loss_for_object = keras.backend.sum(cls_loss_for_object) / normalizer_pos
        cls_loss_for_back = ratio * keras.backend.sum(cls_loss_for_back) / normalizer_neg

        # Total loss
        loss = cls_loss_for_object + cls_loss_for_back
        return loss
    return _cls_loss
'''
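To make the masking and the two normalizers concrete, here is a NumPy sketch of the same computation for a single image with one objectness score per anchor. The function name rpn_cls_loss_np and the eps clipping are assumptions added for the example.

```python
import numpy as np


def rpn_cls_loss_np(anchor_state, pred, ratio=3.0, eps=1e-7):
    """Binary cross-entropy over positives and negatives, each normalized separately."""
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    pos = anchor_state == 1
    neg = anchor_state == 0                # anchors with state -1 are skipped entirely
    loss_pos = -np.log(pred[pos]).sum() / max(1.0, pos.sum())
    loss_neg = -np.log(1.0 - pred[neg]).sum() / max(1.0, neg.sum())
    return loss_pos + ratio * loss_neg


state = np.array([1, 0, -1])
pred = np.array([0.9, 0.2, 0.5])
loss = rpn_cls_loss_np(state, pred)
print(loss)
```

Here the positive anchor contributes -log(0.9), the negative anchor contributes 3 * -log(0.8), and the ignored anchor contributes nothing, whatever its score.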
