Mask RCNN Source Code Walkthrough

Preface
Dataset
Data Loading
Model Construction
- Model inputs
- Model outputs
- resnet101
- RPN network
- ProposalLayer
- DetectionTargetLayer
- fpn_classifier_graph
  - Limitations of ROI Pooling
  - ROI Align
  - Backpropagation in ROI Align
  - Code implementation
- Header
  - fpn_classifier_graph
  - build_fpn_mask_graph
- Model losses
  - rpn_class_loss_graph
  - rpn_bbox_loss_graph
  - mrcnn_class_loss_graph
  - mrcnn_bbox_loss_graph
  - mrcnn_mask_loss_graph
Model Training
Model Evaluation
- mIoU
- PixelAccuracy

Preface

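In an earlier blog post (Mask RCNN Overview and Building Instance Segmentation) I wrote about instance segmentation and Mask-RCNN. I have recently been studying Mask-RCNN at work, so I read through the code. I still don't fully understand most of it, but I did learn a little, and this post records what I learned. For more on Mask RCNN, see the MaskRCNN source code analysis, or search Baidu or Google for the keywords "Mask RCNN" and "source code analysis". There are plenty of related articles, so I won't list them all here. My Mask-RCNN code is on my GitHub: Mask RCNN. If you like it, a star would be much appreciated. Thank you!
The code structure is shown below:

[Figure: code structure of the repository]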

Dataset

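The instance segmentation dataset folder has three parts: img, mask, and yaml. They hold the original images, the labels (mask images), and the per-image instance yml files, respectively. The yml file needs some explanation. It records the class name of each object instance in the original image. For example, if an image contains three people whose label ids are 1, 2, and 3, the corresponding yml file is: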

person
person
person

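Put simply, instance id k in the mask corresponds to the k-th class name in the yml file. For details on building the dataset, see: Making an Image Segmentation Dataset.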

Data Loading

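The data loading module lives mainly in the CustomerDataset.py and dataset. files under utils. CustomerDataset.py contains the CustomerDataset class.

The main data-loading optimization is preprocessing the masks into .npz files. Loading data during training took a very long time, so each epoch was very slow on a large dataset. For example, with a dataset of just over 500 images, training ran only three or four epochs in a whole night. Most of this time goes into mask preprocessing.

The red box in the figure below marks the mask processing. Instance segmentation segments each object instance separately, so mask processing extracts every instance into its own channel, as circled in yellow. For example, if an image is $512\times512$ and contains 3 instances, the processed mask has shape $[512, 512, 3]$:

- $[:, :, 0]$ is the first instance.
- $[:, :, 1]$ is the second instance.
- $[:, :, 2]$ is the third instance.

The preprocessing code uses three nested loops. It iterates over every instance in the mask and, for each instance, scans the whole mask image to find that instance's pixels. This takes $num\_obj \times image\_size$ iterations. Each training image contains dozens of instances, so preprocessing is very slow.

I therefore save the masks as npz files in advance and just load those files during training. The optimized mask preprocessing code is in generateMaskNpz.py.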
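The author's generateMaskNpz.py is not reproduced here. As a rough illustration of why the loops can go, the sketch below does two things:

- It splits a label image into per-instance channels with a single NumPy broadcast comparison.
- It caches the result with np.savez_compressed.

It assumes, as in the dataset section, that instance k is stored as pixel value k and that the k-th yml entry is its class name. `split_instances` and the `class_to_id` mapping are hypothetical names.

```python
import io

import numpy as np


def split_instances(label_mask, class_names, class_to_id):
    """Turn an (H, W) label image with instance ids 1..N into an (H, W, N)
    boolean stack plus one class id per instance.

    label_mask: integer array, 0 = background, k = instance k.
    class_names: names read from the yml file; entry k-1 belongs to instance k.
    class_to_id: hypothetical mapping from class name to integer class id.
    """
    num_obj = len(class_names)
    ids = np.arange(1, num_obj + 1)
    # One broadcast comparison replaces the per-instance, per-pixel loops.
    masks = label_mask[..., None] == ids[None, None, :]
    class_ids = np.array([class_to_id[name] for name in class_names], dtype=np.int32)
    return masks, class_ids


label = np.array([[0, 1, 1],
                  [2, 2, 0],
                  [3, 0, 0]], dtype=np.uint8)
masks, class_ids = split_instances(label, ["person"] * 3, {"person": 1})

# Cache once, then load the .npz during training instead of recomputing.
buf = io.BytesIO()  # stands in for a file path such as "<image name>.npz"
np.savez_compressed(buf, mask=masks, class_ids=class_ids)
buf.seek(0)
cached = np.load(buf)
```

With the cache in place, the per-image cost during training is a single file read instead of $num\_obj \times image\_size$ comparisons in Python loops.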

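The CustomerDataset class inherits from the Dataset class. Dataset has the attributes image_id, image_info, class_info, and source_class_ids. It also provides basic methods, such as loading an image and loading a mask. Here are the attributes of Dataset: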

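image_id: the image id: 1, 2, 3, and so on.
image_info: a dictionary with the image's id, source, and path. The source is the NAME field from the config file and doesn't seem to matter much.
class_info: label information, including the id and the name.
source_class_ids: the class ids, including the background.
In the debug console we can inspect these attributes:

[Figure: Dataset attributes in the debug console]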

With CustomerDataset covered, let's look at how the training set is loaded. It comes down to these lines:

dataset_train = CustomerDataset()
dataset_train.load_shapes(config.NAME,len(train_imglist), config.CLASSES, img_floder, mask_floder, train_imglist, yaml_floder)
dataset_train.prepare()
train_generator = data_generator(dataset_train, config, shuffle=True,batch_size=config.BATCHSIZE)

Once the data is ready, we build the generator with the data_generator function. Before looking at what the generator does, we need to be clear about the network's inputs:

images: [batch, H, W, C]
image_meta: [batch, (meta data)] image details.
rpn_match: [batch, N] the match status of each anchor (1 = positive, -1 = negative, 0 = neutral)
rpn_bbox: [batch, N, (dy, dx, log(dh), log(dw))] the deltas the RPN should predict.
gt_class_ids: [batch, MAX_GT_INSTANCES] class IDs
gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)]
gt_masks: [batch, height, width, MAX_GT_INSTANCES].

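The code in the figure below does the same job as the process_true_box function in the YOLOv3 source: it computes the ground-truth anchors for each feature map. The config file sets the downsampling strides to [4, 8, 16, 32, 64], so anchors have to be computed for every one of these feature maps. This also means the image fed into the network must have a height and width that are multiples of 64.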
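Here is a small sketch of that size rule. Each pyramid level's feature map is the input size divided by its stride. With strides up to 64, the height and width must be divisible by 64. `backbone_shapes` is a hypothetical helper, not the project's code.

```python
import math


def backbone_shapes(image_shape, strides=(4, 8, 16, 32, 64)):
    """(height, width) of the feature map at each pyramid level."""
    h, w = image_shape[:2]
    # Every stride-2 stage halves the size; with a largest stride of 64,
    # H and W must be divisible by 64 for every level to divide evenly.
    if h % 64 or w % 64:
        raise ValueError("image height and width must be multiples of 64")
    return [(int(math.ceil(h / s)), int(math.ceil(w / s))) for s in strides]


print(backbone_shapes((512, 512, 3)))
# [(128, 128), (64, 64), (32, 32), (16, 16), (8, 8)]
```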

Once the ground-truth anchors are computed for the feature maps at every scale, the RPN targets are built. The code is:

import numpy as np
# utils is the project's utility module (provides compute_overlaps)


def build_rpn_targets(image_shape, anchors, gt_class_ids, gt_boxes, config):
    # 1 = foreground (positive), -1 = background (negative), 0 = ignored (neutral)
    rpn_match = np.zeros([anchors.shape[0]], dtype=np.int32)
    # Deltas obtained by encoding the ground-truth boxes against the anchors
    rpn_bbox = np.zeros((config.RPN_TRAIN_ANCHORS_PER_IMAGE, 4))

    '''
    iscrowd=0 means a single object whose outline is stored as a polygon.
    iscrowd=1 means several objects that were not separated, whose outline is
    stored as an RLE encoding. For example, if an image shows three people, one
    standing alone and two hugging (too close to be annotated separately), the
    lone person gets iscrowd=0 with a polygon segmentation, while the other two
    share one annotation whose segmentation is a single RLE encoding.
    Here crowd boxes are marked with negative class ids.
    '''
    crowd_ix = np.where(gt_class_ids < 0)[0]
    if crowd_ix.shape[0] > 0:
        non_crowd_ix = np.where(gt_class_ids > 0)[0]
        crowd_boxes = gt_boxes[crowd_ix]
        gt_class_ids = gt_class_ids[non_crowd_ix]
        gt_boxes = gt_boxes[non_crowd_ix]
        crowd_overlaps = utils.compute_overlaps(anchors, crowd_boxes)
        crowd_iou_max = np.amax(crowd_overlaps, axis=1)
        no_crowd_bool = (crowd_iou_max < 0.001)
    else:
        no_crowd_bool = np.ones([anchors.shape[0]], dtype=bool)

    # IoU between anchors and ground-truth boxes: [num_anchors, num_gt_boxes]
    overlaps = utils.compute_overlaps(anchors, gt_boxes)

    # 1. Anchors with IoU < 0.3 (and not overlapping a crowd box) are negatives
    anchor_iou_argmax = np.argmax(overlaps, axis=1)
    anchor_iou_max = overlaps[np.arange(overlaps.shape[0]), anchor_iou_argmax]
    rpn_match[(anchor_iou_max < 0.3) & (no_crowd_bool)] = -1
    # 2. For each ground-truth box, the anchor(s) with the highest IoU are positives
    gt_iou_argmax = np.argwhere(overlaps == np.max(overlaps, axis=0))[:, 0]
    rpn_match[gt_iou_argmax] = 1
    # 3. Anchors with IoU >= 0.7 are positives
    rpn_match[anchor_iou_max >= 0.7] = 1

    # Balance positives and negatives
    ids = np.where(rpn_match == 1)[0]
    # Positives may fill at most half of RPN_TRAIN_ANCHORS_PER_IMAGE; reset the rest to neutral
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE // 2)
    if extra > 0:
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0
    ids = np.where(rpn_match == -1)[0]
    # Negatives fill the remainder so the total is RPN_TRAIN_ANCHORS_PER_IMAGE
    extra = len(ids) - (config.RPN_TRAIN_ANCHORS_PER_IMAGE - np.sum(rpn_match == 1))
    if extra > 0:
        # Reset the extra ones to neutral
        ids = np.random.choice(ids, extra, replace=False)
        rpn_match[ids] = 0

    # Encode the positive anchors (the ones that actually contain an object)
    ids = np.where(rpn_match == 1)[0]
    ix = 0
    for i, a in zip(ids, anchors[ids]):
        gt = gt_boxes[anchor_iou_argmax[i]]
        # Center, height and width of the ground-truth box
        gt_h = gt[2] - gt[0]
        gt_w = gt[3] - gt[1]
        gt_center_y = gt[0] + 0.5 * gt_h
        gt_center_x = gt[1] + 0.5 * gt_w
        # Center, height and width of the anchor
        a_h = a[2] - a[0]
        a_w = a[3] - a[1]
        a_center_y = a[0] + 0.5 * a_h
        a_center_x = a[1] + 0.5 * a_w
        # Encode
        rpn_bbox[ix] = [
            (gt_center_y - a_center_y) / np.maximum(a_h, 1),
            (gt_center_x - a_center_x) / np.maximum(a_w, 1),
            np.log(np.maximum(gt_h / np.maximum(a_h, 1), 1e-5)),
            np.log(np.maximum(gt_w / np.maximum(a_w, 1), 1e-5)),
        ]
        # Rescale by the standard deviation
        rpn_bbox[ix] /= config.RPN_BBOX_STD_DEV
        ix += 1
    return rpn_match, rpn_bbox
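build_rpn_targets relies on utils.compute_overlaps, whose source isn't shown here. Below is a vectorized NumPy sketch of an IoU matrix with the same [num_anchors, num_gt_boxes] layout. It is an illustration, not the project's utility, and it assumes every box has a positive area.

```python
import numpy as np


def compute_overlaps(boxes1, boxes2):
    """IoU matrix [len(boxes1), len(boxes2)] for boxes given as (y1, x1, y2, x2)."""
    # Broadcast every box in boxes1 against every box in boxes2.
    y1 = np.maximum(boxes1[:, None, 0], boxes2[None, :, 0])
    x1 = np.maximum(boxes1[:, None, 1], boxes2[None, :, 1])
    y2 = np.minimum(boxes1[:, None, 2], boxes2[None, :, 2])
    x2 = np.minimum(boxes1[:, None, 3], boxes2[None, :, 3])
    # Negative extents mean no overlap, so clamp them to zero.
    inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    union = area1[:, None] + area2[None, :] - inter
    return inter / union


anchors = np.array([[0, 0, 10, 10], [5, 5, 15, 15]], dtype=np.float32)
gt = np.array([[0, 0, 10, 10]], dtype=np.float32)
iou = compute_overlaps(anchors, gt)  # IoU 1.0 for the first anchor, 25/175 for the second
```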

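Model Construction

The rough model structure is shown in the figure above. The backbone is ResNet101, and its feature maps are fed into the RPN to produce proposals. After RoIAlign, the proposals pass through three branches. The classification branch gives the object class, the box-regression branch gives the coordinates, and the mask branch gives the mask. Before looking at these components, let's get familiar with the model's inputs and outputs.

Here is the code for the model's inputs and outputs: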

inputs = [input_image, input_image_meta,
          input_rpn_match, input_rpn_bbox, input_gt_class_ids, input_gt_boxes, input_gt_masks]
if not config.USE_RPN_ROIS:
    inputs.append(input_rois)
outputs = [rpn_class_logits, rpn_class, rpn_bbox,mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask,rpn_rois, output_rois,rpn_class_loss, rpn_bbox_loss, class_loss, bbox_loss, mask_loss]
model = KM.Model(inputs, outputs, name='mask_rcnn')
return model

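Model inputs

input_image: the input image
input_image_meta: information about the input image
input_rpn_match: RPN ground truth, the match status of each anchor
input_rpn_bbox: RPN ground truth, the target box deltas of each anchor
input_gt_class_ids: the class of each ground-truth box
input_gt_boxes: the positions of the ground-truth boxes
input_gt_masks: the segmentation masks of the ground-truth boxes

Model outputs

rpn_class_logits, rpn_class, rpn_bbox: outputs of the RPN
mrcnn_class_logits, mrcnn_class, mrcnn_bbox, mrcnn_mask: outputs of the mrcnn heads
rpn_rois: the 2000 ROIs kept after filtering the RPN output
output_rois: the ROIs sampled by DetectionTargetLayer for training the heads
rpn_class_loss, rpn_bbox_loss: RPN losses
class_loss, bbox_loss, mask_loss: losses of the mrcnn heads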

resnet101

ResNet isn't covered in detail here. It outputs five feature maps: [C1, C2, C3, C4, C5].

from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     MaxPooling2D, ZeroPadding2D)
# conv_block and identity_block are the project's ResNet building blocks


def get_resnet(input_image, stage5=False, train_bn=True):
    # Stage 1
    x = ZeroPadding2D((3, 3))(input_image)
    x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=True)(x)
    x = BatchNormalization(name='bn_conv1')(x, training=train_bn)
    x = Activation('relu')(x)
    # Height/4, Width/4, 64
    C1 = x = MaxPooling2D((3, 3), strides=(2, 2), padding="same")(x)
    # Stage 2
    x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1), train_bn=train_bn)
    x = identity_block(x, 3, [64, 64, 256], stage=2, block='b', train_bn=train_bn)
    # Height/4, Width/4, 256
    C2 = x = identity_block(x, 3, [64, 64, 256], stage=2, block='c', train_bn=train_bn)
    # Stage 3
    x = conv_block(x, 3, [128, 128, 512], stage=3, block='a', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='b', train_bn=train_bn)
    x = identity_block(x, 3, [128, 128, 512], stage=3, block='c', train_bn=train_bn)
    # Height/8, Width/8, 512
    C3 = x = identity_block(x, 3, [128, 128, 512], stage=3, block='d', train_bn=train_bn)
    # Stage 4 (ResNet101: one conv_block followed by 22 identity_blocks)
    x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a', train_bn=train_bn)
    block_count = 22
    for i in range(block_count):
        x = identity_block(x, 3, [256, 256, 1024], stage=4, block=chr(98 + i), train_bn=train_bn)
    # Height/16, Width/16, 1024
    C4 = x
    # Stage 5
    if stage5:
        x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a', train_bn=train_bn)
        x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b', train_bn=train_bn)
        # Height/32, Width/32, 2048
        C5 = x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c', train_bn=train_bn)
    else:
        C5 = None
    return [C1, C2, C3, C4, C5]

After [C1, C2, C3, C4, C5] are obtained, they are combined into a feature pyramid to produce the final feature maps:

from tensorflow.keras import layers as KL

# Build the feature pyramid
# P5: downsampled 5 times
# Height/32, Width/32, 256
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c5p5')(C5)
# P4: downsampled 4 times
# Height/16, Width/16, 256
P4 = KL.Add(name="fpn_p4add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p5upsampled")(P5),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c4p4')(C4)])
# P3: downsampled 3 times
# Height/8, Width/8, 256
P3 = KL.Add(name="fpn_p3add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p4upsampled")(P4),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c3p3')(C3)])
# P2: downsampled 2 times
# Height/4, Width/4, 256
P2 = KL.Add(name="fpn_p2add")([
    KL.UpSampling2D(size=(2, 2), name="fpn_p3upsampled")(P3),
    KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (1, 1), name='fpn_c2p2')(C2)])
# Apply a 256-channel 3x3 conv to each level; P2, P3, P4 and P5 now have the same channel count
# Height/4, Width/4, 256
P2 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p2")(P2)
# Height/8, Width/8, 256
P3 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p3")(P3)
# Height/16, Width/16, 256
P4 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p4")(P4)
# Height/32, Width/32, 256
P5 = KL.Conv2D(config.TOP_DOWN_PYRAMID_SIZE, (3, 3), padding="SAME", name="fpn_p5")(P5)
# The RPN also uses a P6 level to generate proposals
# Height/64, Width/64, 256
P6 = KL.MaxPooling2D(pool_size=(1, 1), strides=2, name="fpn_p6")(P5)
# P2, P3, P4, P5, P6 are used to generate proposals
rpn_feature_maps = [P2, P3, P4, P5, P6]
# P2, P3, P4, P5 are used by the classifier and mask heads
mrcnn_feature_maps = [P2, P3, P4, P5]

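After mrcnn_feature_maps is built, get_anchors generates all the anchors on the feature pyramid. On each pyramid feature map, one set of anchors is centered on every pixel, using the anchor size from the config file as width and height. Each feature map is smaller than the input image by a known factor. Scaling by that factor maps the anchors back onto the input image, giving their coordinates there.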
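Below is a minimal NumPy sketch of anchor generation for a single pyramid level. It assumes one anchor size per level and the three aspect ratios [0.5, 1, 2], which matches the 3 anchors per location used by the RPN later. `generate_level_anchors` is a hypothetical name. get_anchors does this for every pyramid level, and that loop is omitted here.

```python
import numpy as np


def generate_level_anchors(scale, ratios, feature_shape, feature_stride):
    """Anchors (y1, x1, y2, x2) in input-image pixels for one pyramid level.

    scale: anchor side length in pixels (e.g. 32 for the stride-4 level).
    ratios: width/height ratios, e.g. [0.5, 1, 2] -> 3 anchors per pixel.
    """
    ratios = np.asarray(ratios, dtype=np.float64)
    # Keep the area at scale**2 while changing the aspect ratio.
    heights = scale / np.sqrt(ratios)
    widths = scale * np.sqrt(ratios)
    # Feature-map pixel (i, j) maps back to (i * stride, j * stride) in the image.
    ys = np.arange(feature_shape[0]) * feature_stride
    xs = np.arange(feature_shape[1]) * feature_stride
    cy, cx = np.meshgrid(ys, xs, indexing="ij")
    # Pair every center with every (height, width) combination.
    cy = cy.reshape(-1, 1)
    cx = cx.reshape(-1, 1)
    y1 = cy - 0.5 * heights
    x1 = cx - 0.5 * widths
    y2 = cy + 0.5 * heights
    x2 = cx + 0.5 * widths
    return np.stack([y1, x1, y2, x2], axis=-1).reshape(-1, 4)


anchors = generate_level_anchors(32, [0.5, 1, 2], (128, 128), 4)
print(anchors.shape)  # (49152, 4): 128 * 128 pixels * 3 ratios
```

The anchors are ordered pixel by pixel, with the three ratios of one pixel stored next to each other. This is what makes the later reshape of the RPN output to $w\times h\times3$ anchors line up.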

RPN network

The author's docstring explains the RPN:

"""Builds a Keras model of the Region Proposal Network.
It wraps the RPN graph so it can be used multiple times with shared
weights.

anchors_per_location: number of anchors per pixel in the feature map
anchor_stride: Controls the density of anchors. Typically 1 (anchors for
               every pixel in the feature map), or 2 (every other pixel).
depth: Depth of the backbone feature map.

Returns a Keras Model object. The model outputs, when called, are:
rpn_class_logits: [batch, H * W * anchors_per_location, 2] Anchor classifier logits (before softmax)
rpn_probs: [batch, H * W * anchors_per_location, 2] Anchor classifier probabilities.
rpn_bbox: [batch, H * W * anchors_per_location, (dy, dx, log(dh), log(dw))] Deltas to be
          applied to anchors.
"""

One point to note in rpn_graph: rpn_class_logits holds the anchor classifier logits before softmax. rpn_probs is the softmax of these logits, and the RPN classification loss is computed from the logits directly.

from tensorflow.keras import layers as KL


def rpn_graph(feature_map, anchors_per_location, anchor_stride):
    # Shared 3x3 conv with 512 channels
    shared = KL.Conv2D(512, (3, 3), padding='same', activation='relu',
                       strides=anchor_stride,
                       name='rpn_conv_shared')(feature_map)
    # Classification: 2 scores (background/foreground) per anchor
    x = KL.Conv2D(2 * anchors_per_location, (1, 1), padding='valid',
                  activation='linear', name='rpn_class_raw')(shared)
    # [batch_size, num_anchors, 2]: class scores of each anchor
    rpn_class_logits = KL.Reshape([-1, 2])(x)
    rpn_probs = KL.Activation("softmax", name="rpn_class_xxx")(rpn_class_logits)
    # Regression: 4 deltas per anchor
    x = KL.Conv2D(anchors_per_location * 4, (1, 1), padding="valid",
                  activation='linear', name='rpn_bbox_pred')(shared)
    # [batch_size, num_anchors, 4]: refinement deltas of each anchor
    rpn_bbox = KL.Reshape([-1, 4])(x)
    return [rpn_class_logits, rpn_probs, rpn_bbox]

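rpn_graph() first applies a shared 3×3 convolution to the incoming feature map, which produces 512 channels. It then branches into classification and regression:

In the classification branch, a $1\times1$ convolution produces $2\times3$ channels: 2 scores for each of the 3 anchors at every location. These are reshaped to $[N, w\times h\times3, 2]$. N is the batch size, $w\times h\times3$ is the total number of anchors on this feature map, and the 2 channels hold the negative (background) and positive (foreground) scores. rpn_class_logits is used later for the RPN classification loss, and rpn_probs holds the negative/positive confidences.
In the regression branch, a $1\times1$ convolution produces $4\times3$ channels, which are reshaped to $[N, w\times h\times3, 4]$. N and $w\times h\times3$ mean the same as above, and the 4 values are the box deltas (dy, dx, log(dh), log(dw)) for each anchor. rpn_bbox is used later for the RPN regression loss.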

ProposalLayer

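This layer combines the anchors generated earlier with the RPN outputs. It first sorts the anchors by foreground probability and keeps the highest-scoring ones. It then takes the corresponding anchors and refines them once with the RPN's regression deltas. Finally, non-maximum suppression produces the final proposals. Its main steps are:

Keep the top 6000 anchors by RPN score (PRE_NMS_LIMIT).
Refine the anchors with rpn_bbox.
Handle refined boxes that extend beyond the image. The code clips them to the image boundary rather than discarding them.
Apply non-maximum suppression to obtain the final proposals.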

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers as KL
# utils.batch_slice, apply_box_deltas_graph and clip_boxes_graph are project helpers


class ProposalLayer(KL.Layer):
    def __init__(self, proposal_count, nms_threshold, config=None, **kwargs):
        super(ProposalLayer, self).__init__(**kwargs)
        self.config = config
        self.proposal_count = proposal_count
        self.nms_threshold = nms_threshold

    # inputs: [rpn_class, rpn_bbox, anchors]
    def call(self, inputs):
        # Foreground score of each anchor: [batch, num_rois]
        scores = inputs[0][:, :, 1]
        # Refinement deltas of each anchor: [batch, num_rois, 4]
        deltas = inputs[1]
        # Undo the std-dev scaling ([0.1 0.1 0.2 0.2])
        deltas = deltas * np.reshape(self.config.RPN_BBOX_STD_DEV, [1, 1, 4])
        # Anchors
        anchors = inputs[2]
        # Keep the top-scoring boxes (PRE_NMS_LIMIT, 6000 here)
        pre_nms_limit = tf.minimum(self.config.PRE_NMS_LIMIT, tf.shape(anchors)[1])
        # Indices of those boxes
        ix = tf.nn.top_k(scores, pre_nms_limit, sorted=True,
                         name="top_anchors").indices
        # Their scores
        scores = utils.batch_slice([scores, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Their deltas
        deltas = utils.batch_slice([deltas, ix], lambda x, y: tf.gather(x, y),
                                   self.config.IMAGES_PER_GPU)
        # Their anchors
        pre_nms_anchors = utils.batch_slice([anchors, ix], lambda a, x: tf.gather(a, x),
                                            self.config.IMAGES_PER_GPU,
                                            names=["pre_nms_anchors"])
        # Decode the anchors: [batch, N, (y1, x1, y2, x2)]
        boxes = utils.batch_slice([pre_nms_anchors, deltas],
                                  lambda x, y: apply_box_deltas_graph(x, y),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors"])
        # Clip to the image (normalized coordinates): [batch, N, (y1, x1, y2, x2)]
        window = np.array([0, 0, 1, 1], dtype=np.float32)
        boxes = utils.batch_slice(boxes,
                                  lambda x: clip_boxes_graph(x, window),
                                  self.config.IMAGES_PER_GPU,
                                  names=["refined_anchors_clipped"])

        # Non-maximum suppression
        def nms(boxes, scores):
            indices = tf.image.non_max_suppression(
                boxes, scores, self.proposal_count,
                self.nms_threshold, name="rpn_non_max_suppression")
            proposals = tf.gather(boxes, indices)
            # Pad with zeros if fewer than proposal_count boxes survive
            padding = tf.maximum(self.proposal_count - tf.shape(proposals)[0], 0)
            proposals = tf.pad(proposals, [(0, padding), (0, 0)])
            return proposals
        proposals = utils.batch_slice([boxes, scores], nms,
                                      self.config.IMAGES_PER_GPU)

        if not tf.executing_eagerly():
            # Infer the static output shape
            out_shape = self.compute_output_shape(None)
            proposals.set_shape(out_shape)
        return proposals

    def compute_output_shape(self, input_shape):
        return (None, self.proposal_count, 4)
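To see ProposalLayer's decode, clip and NMS steps outside the TensorFlow graph, here is a plain-NumPy sketch. It makes three assumptions:

- `apply_box_deltas`, `clip_boxes` and `nms` are hypothetical stand-ins, not the project's apply_box_deltas_graph, clip_boxes_graph or tf.image.non_max_suppression.
- The deltas have already been multiplied by RPN_BBOX_STD_DEV.
- The greedy NMS loop does not pad its result to proposal_count.

```python
import numpy as np


def apply_box_deltas(boxes, deltas):
    """Inverse of the encoding in build_rpn_targets; boxes are (y1, x1, y2, x2)."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    cy = boxes[:, 0] + 0.5 * h + deltas[:, 0] * h  # shift the center
    cx = boxes[:, 1] + 0.5 * w + deltas[:, 1] * w
    h = h * np.exp(deltas[:, 2])                   # rescale height and width
    w = w * np.exp(deltas[:, 3])
    return np.stack([cy - 0.5 * h, cx - 0.5 * w, cy + 0.5 * h, cx + 0.5 * w], axis=1)


def clip_boxes(boxes, window):
    """Clip boxes to window = (wy1, wx1, wy2, wx2)."""
    wy1, wx1, wy2, wx2 = window
    return np.stack([np.clip(boxes[:, 0], wy1, wy2), np.clip(boxes[:, 1], wx1, wx2),
                     np.clip(boxes[:, 2], wy1, wy2), np.clip(boxes[:, 3], wx1, wx2)], axis=1)


def nms(boxes, scores, max_out, iou_threshold):
    """Greedy non-maximum suppression; returns kept indices, best first."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0 and len(keep) < max_out:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # IoU of the current best box with every remaining box
        y1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        x1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        y2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        x2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(y2 - y1, 0) * np.maximum(x2 - x1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        # Drop boxes that overlap the kept box too much
        order = rest[iou <= iou_threshold]
    return np.array(keep, dtype=np.int64)


anchors = np.array([[0.1, 0.1, 0.3, 0.3], [0.12, 0.1, 0.32, 0.3], [0.6, 0.6, 0.9, 0.9]])
deltas = np.zeros((3, 4))  # zero deltas leave the anchors unchanged
scores = np.array([0.9, 0.8, 0.7])
boxes = clip_boxes(apply_box_deltas(anchors, deltas), (0.0, 0.0, 1.0, 1.0))
keep = nms(boxes, scores, max_out=2, iou_threshold=0.7)  # box 1 overlaps box 0 (IoU about 0.82)
```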

DetectionTargetLayer

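DetectionTargetLayer further filters the 2000 ROIs from ProposalLayer down to the ROIs used for training. Its inputs are target_rois, input_gt_class_ids, gt_boxes, and input_gt_masks, where target_rois is the output of ProposalLayer.

For every ROI in target_rois, the layer computes the IoU with each ground-truth box in gt_boxes. If the largest IoU is at least 0.5, the ROI is a positive sample. Negatives are ROIs whose largest IoU is below 0.5 and that barely overlap any crowd box. After selecting positives and negatives, the layer also keeps the two balanced, and the balance can be set in the config file.
For each positive ROI, the layer finds the closest ground-truth box and computes the deltas between the two. It also resizes that box's mask to 28×28. These are the targets used later by the classification and mask heads.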
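The positive/negative selection can be sketched in NumPy as below. A few details are assumptions for illustration, not read from this project's detection_targets_graph:

- the budget of 200 ROIs per image,
- the 0.33 positive ratio,
- the way the negative count is derived from the positive count,
- leaving out the crowd-box check.

`sample_rois` is a hypothetical name.

```python
import numpy as np


def sample_rois(roi_gt_iou, train_rois_per_image=200, positive_ratio=0.33, rng=None):
    """Pick positive/negative ROI indices from an IoU matrix [num_rois, num_gt].

    Positives: max IoU >= 0.5. Negatives: max IoU < 0.5 (crowd handling omitted).
    """
    rng = np.random.default_rng() if rng is None else rng
    roi_iou_max = roi_gt_iou.max(axis=1)
    positive = np.where(roi_iou_max >= 0.5)[0]
    negative = np.where(roi_iou_max < 0.5)[0]
    # Cap the positives at positive_ratio of the per-image budget
    pos_count = min(len(positive), int(train_rois_per_image * positive_ratio))
    positive = rng.permutation(positive)[:pos_count]
    # Take enough negatives to keep the positive:negative proportion
    neg_count = int(pos_count / positive_ratio) - pos_count
    negative = rng.permutation(negative)[:neg_count]
    # Each positive ROI is matched to the ground-truth box it overlaps most
    gt_assignment = roi_gt_iou[positive].argmax(axis=1)
    return positive, negative, gt_assignment


iou = np.array([[0.8, 0.1], [0.2, 0.6], [0.1, 0.3], [0.0, 0.05]])
pos, neg, assign = sample_rois(iou, rng=np.random.default_rng(0))
```

The matched ground-truth box of each positive ROI is what its box-delta and 28×28 mask targets are computed from.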

import tensorflow as tf
from tensorflow.keras import layers as KL
# utils.batch_slice and detection_targets_graph are project helpers


class DetectionTargetLayer(KL.Layer):
    """Find the ground-truth targets of the proposals.

    Inputs:
    proposals: [batch, N, (y1, x1, y2, x2)] proposal boxes
    gt_class_ids: [batch, MAX_GT_INSTANCES] class of each ground-truth box
    gt_boxes: [batch, MAX_GT_INSTANCES, (y1, x1, y2, x2)] positions of the ground-truth boxes
    gt_masks: [batch, height, width, MAX_GT_INSTANCES] segmentation masks of the ground-truth boxes

    Returns:
    rois: [batch, TRAIN_ROIS_PER_IMAGE, (y1, x1, y2, x2)] proposals selected for training
    target_class_ids: [batch, TRAIN_ROIS_PER_IMAGE] class of each proposal
    target_deltas: [batch, TRAIN_ROIS_PER_IMAGE, (dy, dx, log(dh), log(dw))] refinement each proposal should get
    target_mask: [batch, TRAIN_ROIS_PER_IMAGE, height, width] segmentation mask of each proposal
    """

    def __init__(self, config, **kwargs):
        super(DetectionTargetLayer, self).__init__(**kwargs)
        self.config = config

    def call(self, inputs):
        proposals = inputs[0]
        gt_class_ids = inputs[1]
        gt_boxes = inputs[2]
        gt_masks = inputs[3]
        # Build (encode) the training targets for each image in the batch
        names = ["rois", "target_class_ids", "target_bbox", "target_mask"]
        outputs = utils.batch_slice(
            [proposals, gt_class_ids, gt_boxes, gt_masks],
            lambda w, x, y, z: detection_targets_graph(w, x, y, z, self.config),
            self.config.IMAGES_PER_GPU, names=names)
        return outputs

    def compute_output_shape(self, input_shape):
        return [
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # rois
            (None, self.config.TRAIN_ROIS_PER_IMAGE),  # class_ids
            (None, self.config.TRAIN_ROIS_PER_IMAGE, 4),  # deltas
            (None, self.config.TRAIN_ROIS_PER_IMAGE, self.config.MASK_SHAPE[0],
             self.config.MASK_SHAPE[1])  # masks
        ]

    def compute_mask(self, inputs, mask=None):
        return [None, None, None, None]

fpn_classifier_graph

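This builds the classifier head. It refines and classifies the proposal boxes to produce the final predicted boxes. RoIAlign and RoIPooling need a word here. The following is excerpted from: Detailed Explanation of the Principles and Implementation Details of ROI Align.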

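Limitations of ROI Pooling

In common two-stage detection frameworks (such as Fast-RCNN, Faster-RCNN, and RFCN), ROI Pooling takes the region of the feature map given by a proposal's coordinates and pools it into a fixed-size feature map. The subsequent classification and box regression then run on that fixed-size map. The proposal positions are usually regressed by the model, so they are generally floating-point numbers, while the pooled feature map must have a fixed size. ROI Pooling therefore quantizes twice:

Quantize the proposal boundaries to integer coordinates.
Divide the quantized region evenly into $k\times k$ cells (bins) and quantize the boundary of each cell.

After these two quantizations, the proposal already deviates somewhat from the originally regressed position, and this deviation hurts detection and segmentation accuracy. The paper calls this the "misalignment" problem.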

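Let's look at this misalignment with a concrete example: the Faster-RCNN detector shown in the figure below.

- Input: an $800\times800$ image containing a $665\times665$ bounding box around a dog.
- Backbone: the feature map stride is 32, so the side lengths of both the image and the box become 1/32 of the input.
- First quantization: 800 divides evenly by 32, giving 25. But 665/32 = 20.78 has a fractional part, so ROI Pooling truncates it to 20.
- Second quantization: the features inside the box must be pooled to $7\times7$, so the box is split evenly into $7\times7$ rectangular regions. Each region's side is 20/7 = 2.86, again fractional, so ROI Pooling truncates it to 2.

After these two quantizations the candidate region has clearly drifted (shown in green in the figure). Worse, a deviation of 0.1 pixel on this feature map is 3.2 pixels in the original image. The 0.78-pixel deviation from the first quantization alone is therefore about 25 pixels in the original image ($0.78\times32\approx25$), which is far too large to ignore.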
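The arithmetic of the dog example can be reproduced in a few lines of Python. Here int() truncation stands in for ROI Pooling's rounding down.

```python
stride = 32
box_side = 665
pool = 7

feat_side = box_side / stride          # 20.78 on the feature map
q1 = int(feat_side)                    # first quantization -> 20
bin_side = q1 / pool                   # 2.86 per bin
q2 = int(bin_side)                     # second quantization -> 2

err_feat = feat_side - q1              # ~0.78 feature-map pixels lost
print(round(feat_side, 2), q1, round(bin_side, 2), q2)  # 20.78 20 2.86 2
print(round(err_feat * stride, 2))                       # 25.0 input-image pixels
```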

ROI Align

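To fix these shortcomings of ROI Pooling, the authors proposed ROI Align. The idea is simple: remove the quantization and use bilinear interpolation to get feature values at floating-point coordinates. This turns the whole feature-aggregation process into a continuous operation. Note that ROI Align does not simply fill in points along the proposal's boundary and pool them. It uses a more elegant procedure:

Iterate over each proposal, keeping its floating-point boundary unquantized.
Divide the proposal into $k\times k$ cells, again without quantizing the cell boundaries.
In each cell, compute four fixed sampling positions, get their values by bilinear interpolation, and then apply max pooling.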

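A note on step 3: the fixed positions are placed by a fixed rule inside each rectangular cell (bin). With 1 sampling point, it is the cell's center. With 4 sampling points, the cell is split evenly into four small squares and their centers are used. These sampling coordinates are usually floating-point, so interpolation is needed to get their values. In experiments, the authors found that 4 sampling points work best, and even a single point performs almost as well.

ROI Align actually visits fewer sampling points than ROI Pooling, yet it performs better, mainly because it solves the misalignment problem. In my own experiments, ROI Align helped less on VOC2007 than on COCO. The reason is that COCO has more small objects, and small objects suffer more from misalignment. The same 0.5-pixel deviation is negligible for a large object but matters much more for a small one.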
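Below is a single-channel NumPy sketch of the three steps above: floating-point bins, 4 regularly spaced sampling points per bin, bilinear interpolation, then max pooling. It assumes every sampling point lies inside the feature map. `bilinear` and `roi_align` are hypothetical names. This is not how the project computes it: PyramidROIAlign below uses tf.image.crop_and_resize instead.

```python
import numpy as np


def bilinear(feature, y, x):
    """Sample a 2-D feature map at a floating-point (y, x)."""
    h, w = feature.shape
    y0 = int(np.floor(y))
    x0 = int(np.floor(x))
    y1 = min(y0 + 1, h - 1)
    x1 = min(x0 + 1, w - 1)
    dy = y - y0
    dx = x - x0
    return ((1 - dy) * (1 - dx) * feature[y0, x0] + (1 - dy) * dx * feature[y0, x1]
            + dy * (1 - dx) * feature[y1, x0] + dy * dx * feature[y1, x1])


def roi_align(feature, box, k=2, samples_per_side=2):
    """ROI Align for one ROI on one channel; box = (y1, x1, y2, x2) in feature coords.

    The box and bin edges stay floating point. Each bin takes samples_per_side**2
    regularly spaced points (4 by default) and max-pools their interpolated values.
    """
    y1, x1, y2, x2 = box
    bin_h = (y2 - y1) / k
    bin_w = (x2 - x1) / k
    out = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            vals = []
            for sy in range(samples_per_side):
                for sx in range(samples_per_side):
                    # Center of each sub-cell inside bin (i, j)
                    y = y1 + i * bin_h + (sy + 0.5) * bin_h / samples_per_side
                    x = x1 + j * bin_w + (sx + 0.5) * bin_w / samples_per_side
                    vals.append(bilinear(feature, y, x))
            out[i, j] = max(vals)
    return out


feature = np.arange(16, dtype=float).reshape(4, 4)  # value = 4 * y + x
pooled = roi_align(feature, (0.5, 0.5, 2.5, 2.5))   # [[6.25, 7.25], [10.25, 11.25]]
```

The feature map in the example is linear (value = 4y + x), so bilinear interpolation reproduces it exactly and the pooled values can be checked by hand.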

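Backpropagation in ROI Align

The backward pass of standard ROI Pooling is:

$$\frac{\partial L}{\partial x_i} = \sum_r \sum_j \big[i = i^*(r,j)\big]\frac{\partial L}{\partial y_{rj}}$$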

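Here the symbols mean:

- $x_i$ is a pixel on the feature map before pooling.
- $y_{rj}$ is the j-th point of the r-th proposal after pooling.
- $i^*(r,j)$ is the source of $y_{rj}$'s value: the coordinate of the point whose value was picked as the maximum during max pooling.

The formula shows that $x_i$ receives gradient only if some pooled point took its value from $x_i$, i.e. only when $i = i^*(r,j)$.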

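ROI Align's backward pass follows ROI Pooling's with a small change. In ROI Align, $x_{i^*(r,j)}$ is a floating-point coordinate: the sampling point computed in the forward pass. On the feature map before pooling, every point whose horizontal and vertical distances to $x_{i^*(r,j)}$ are both less than 1 should receive the gradient passed back from the corresponding $y_{rj}$. The ROI Align backward formula is therefore:

$$\frac{\partial L}{\partial x_i} = \sum_r \sum_j \big[d(i, i^*(r,j)) < 1\big](1-\Delta h)(1-\Delta w)\frac{\partial L}{\partial y_{rj}}$$

Here $d(\cdot,\cdot)$ is the distance between two points, with both coordinate differences required to be below 1. $\Delta h$ and $\Delta w$ are the vertical and horizontal differences between $x_i$ and $x_{i^*(r,j)}$. They act as the bilinear interpolation coefficients that multiply the original gradient.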
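The backward rule for a single sampling point can be sketched in NumPy. The gradient of $y_{rj}$ flows only to the integer neighbors within distance 1 of the sampling point, each scaled by its bilinear weight $(1-\Delta h)(1-\Delta w)$. `bilinear_backward` is a hypothetical name. Frameworks derive this automatically from the forward interpolation.

```python
import numpy as np


def bilinear_backward(grad_out, y, x, feature_shape):
    """Scatter the gradient of one sampled value back onto the feature map.

    Only the (up to) four integer neighbors whose vertical and horizontal
    distances to (y, x) are both below 1 receive gradient, each weighted
    by (1 - dh) * (1 - dw).
    """
    grad = np.zeros(feature_shape)
    h, w = feature_shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            dh, dw = abs(y - yi), abs(x - xi)
            if 0 <= yi < h and 0 <= xi < w and dh < 1 and dw < 1:
                grad[yi, xi] += (1 - dh) * (1 - dw) * grad_out
    return grad


g = bilinear_backward(1.0, 1.25, 2.5, (4, 4))
# g[1, 2] = g[1, 3] = 0.375 and g[2, 2] = g[2, 3] = 0.125; the weights sum to 1
```

Summing such contributions over all proposals r and output points j gives the double sum in the backward formula. With max pooling, only the selected sampling point of each bin passes gradient on.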

Code implementation

import tensorflow as tf
from tensorflow.keras import layers as KL
# log2_graph and parse_image_meta_graph are project helpers


class PyramidROIAlign(KL.Layer):
    def __init__(self, pool_shape, **kwargs):
        super(PyramidROIAlign, self).__init__(**kwargs)
        self.pool_shape = tuple(pool_shape)

    def call(self, inputs):
        # Proposal boxes
        boxes = inputs[0]
        # image_meta holds the necessary per-image information
        image_meta = inputs[1]
        # All feature levels: [batch, height, width, channels]
        feature_maps = inputs[2:]

        y1, x1, y2, x2 = tf.split(boxes, 4, axis=2)
        h = y2 - y1
        w = x2 - x1
        # Size of the input image
        image_shape = parse_image_meta_graph(image_meta)['image_shape'][0]
        # Assign each box to a pyramid level based on its size
        image_area = tf.cast(image_shape[0] * image_shape[1], tf.float32)
        roi_level = log2_graph(tf.sqrt(h * w) / (224.0 / tf.sqrt(image_area)))
        roi_level = tf.minimum(5, tf.maximum(
            2, 4 + tf.cast(tf.round(roi_level), tf.int32)))
        # [batch_size, box_num]
        roi_level = tf.squeeze(roi_level, 2)

        # Loop through levels P2 to P5 and apply ROI Align to each
        pooled = []
        box_to_level = []
        for i, level in enumerate(range(2, 6)):
            # Boxes assigned to this level
            ix = tf.compat.v1.where(tf.equal(roi_level, level))
            level_boxes = tf.gather_nd(boxes, ix)
            box_to_level.append(ix)
            # Image in the batch that each box belongs to
            box_indices = tf.cast(ix[:, 0], tf.int32)
            # Do not backpropagate into the box coordinates
            level_boxes = tf.stop_gradient(level_boxes)
            box_indices = tf.stop_gradient(box_indices)
            # Result: [batch * num_boxes, pool_height, pool_width, channels]
            pooled.append(tf.image.crop_and_resize(
                feature_maps[i], level_boxes, box_indices, self.pool_shape,
                method="bilinear"))
        pooled = tf.concat(pooled, axis=0)

        # Record the original position of every pooled box
        box_to_level = tf.concat(box_to_level, axis=0)
        box_range = tf.expand_dims(tf.range(tf.shape(box_to_level)[0]), 1)
        box_to_level = tf.concat([tf.cast(box_to_level, tf.int32), box_range],
                                 axis=1)
        # box_to_level[:, 0]: image index in the batch
        # box_to_level[:, 1]: box index within that image
        sorting_tensor = box_to_level[:, 0] * 100000 + box_to_level[:, 1]
        # Sort so the boxes of each image are grouped back in their original order
        ix = tf.nn.top_k(sorting_tensor, k=tf.shape(box_to_level)[0]).indices[::-1]
        # Map the sorted order to positions in pooled
        ix = tf.gather(box_to_level[:, 2], ix)
        pooled = tf.gather(pooled, ix)

        # Reshape back to [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
        shape = tf.concat([tf.shape(boxes)[:2], tf.shape(pooled)[1:]], axis=0)
        pooled = tf.reshape(pooled, shape)
        return pooled

    def compute_output_shape(self, input_shape):
        return input_shape[0][:2] + self.pool_shape + (input_shape[2][-1], )
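The level assignment in PyramidROIAlign computes $k = \lfloor 4 + \log_2(\sqrt{wh}/224) \rceil$, clipped to [2, 5], with w and h in input-image pixels. Here is a NumPy restatement for checking which level a box lands on. `roi_level` is a hypothetical helper mirroring those lines, and the boxes are normalized (y1, x1, y2, x2) as in the layer.

```python
import numpy as np


def roi_level(boxes, image_shape, k0=4, canonical=224.0):
    """Pyramid level (2..5) for normalized (y1, x1, y2, x2) boxes."""
    h = boxes[:, 2] - boxes[:, 0]
    w = boxes[:, 3] - boxes[:, 1]
    image_area = float(image_shape[0] * image_shape[1])
    # sqrt(h * w) * sqrt(image_area) is the box size in pixels
    level = np.log2(np.sqrt(h * w) / (canonical / np.sqrt(image_area)))
    return np.clip(k0 + np.round(level).astype(int), 2, 5)


boxes = np.array([[0.0, 0.0, 224 / 1024, 224 / 1024],  # 224 x 224 px -> P4
                  [0.0, 0.0, 56 / 1024, 56 / 1024],    # 56 x 56 px   -> P2
                  [0.0, 0.0, 1.0, 1.0]])               # 1024 px      -> P5 (clipped)
print(roi_level(boxes, (1024, 1024)))  # [4 2 5]
```

A 224-pixel box maps to P4. Halving its side moves it one level down, to a finer feature map.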

Header

fpn_classifier_graph

This network is the classification and box-regression head at the end of Mask R-CNN. A mask branch runs in parallel with it.
Inputs:

rois: [batch, num_rois, (y1, x1, y2, x2)] Proposal boxes in normalized coordinates.
feature_maps: List of feature maps from different layers of the pyramid, [P2, P3, P4, P5]. Each has a different resolution.
image_meta: [batch, (meta data)] Image details. See compose_image_meta(). Its length here is 1+3+3+4+1+80=92.
pool_size: The width of the square feature map generated from ROI Pooling.
num_classes: number of classes, which determines the depth of the results.
train_bn: Boolean. Train or freeze Batch Norm layers.
fc_layers_size: Size of the 2 FC layers.

Outputs:

logits: [batch, num_rois, NUM_CLASSES] classifier logits (before softmax)
probs: [batch, num_rois, NUM_CLASSES] classifier probabilities
bbox_deltas: [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))] Deltas to apply to proposal boxes

from tensorflow.keras import backend as K
from tensorflow.keras import layers as KL


def fpn_classifier_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True,
                         fc_layers_size=1024):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, POOL_SIZE, POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_classifier")([rois, image_meta] + feature_maps)
    # Shape: [batch, num_rois, 1, 1, fc_layers_size]
    # Together with conv2 below, this acts as two fully connected layers
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (pool_size, pool_size), padding="valid"),
                           name="mrcnn_class_conv1")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(), name='mrcnn_class_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    # Shape: [batch, num_rois, 1, 1, fc_layers_size]
    x = KL.TimeDistributed(KL.Conv2D(fc_layers_size, (1, 1)),
                           name="mrcnn_class_conv2")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(), name='mrcnn_class_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    # Shape: [batch, num_rois, fc_layers_size]
    shared = KL.Lambda(lambda x: K.squeeze(K.squeeze(x, 3), 2),
                       name="pool_squeeze")(x)

    # Classifier head: class of the object inside each proposal
    mrcnn_class_logits = KL.TimeDistributed(KL.Dense(num_classes),
                                            name='mrcnn_class_logits')(shared)
    mrcnn_probs = KL.TimeDistributed(KL.Activation("softmax"),
                                     name="mrcnn_class")(mrcnn_class_logits)

    # BBox head: per-class refinement of each proposal
    # [batch, num_rois, NUM_CLASSES * (dy, dx, log(dh), log(dw))]
    x = KL.TimeDistributed(KL.Dense(num_classes * 4, activation='linear'),
                           name='mrcnn_bbox_fc')(shared)
    # Reshape to [batch, num_rois, NUM_CLASSES, (dy, dx, log(dh), log(dw))]
    s = K.int_shape(x)
    if s[1] is None:
        mrcnn_bbox = KL.Reshape((-1, num_classes, 4), name="mrcnn_bbox")(x)
    else:
        mrcnn_bbox = KL.Reshape((s[1], num_classes, 4), name="mrcnn_bbox")(x)
    return mrcnn_class_logits, mrcnn_probs, mrcnn_bbox

build_fpn_mask_graph

It is essentially an FCN (fully convolutional network): https://zhuanlan.zhihu.com/p/30195134

from tensorflow.keras import layers as KL


def build_fpn_mask_graph(rois, feature_maps, image_meta,
                         pool_size, num_classes, train_bn=True):
    # ROI Align: crop each proposal from the feature pyramid
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = PyramidROIAlign([pool_size, pool_size],
                        name="roi_align_mask")([rois, image_meta] + feature_maps)

    # Four 3x3 conv + BN + ReLU blocks
    # Shape: [batch, num_rois, MASK_POOL_SIZE, MASK_POOL_SIZE, channels]
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv1")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(),
                           name='mrcnn_mask_bn1')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv2")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(),
                           name='mrcnn_mask_bn2')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv3")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(),
                           name='mrcnn_mask_bn3')(x, training=train_bn)
    x = KL.Activation('relu')(x)
    x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"),
                           name="mrcnn_mask_conv4")(x)
    x = KL.TimeDistributed(KL.BatchNormalization(),
                           name='mrcnn_mask_bn4')(x, training=train_bn)
    x = KL.Activation('relu')(x)

    # Transposed conv doubles the resolution
    # Shape: [batch, num_rois, 2xMASK_POOL_SIZE, 2xMASK_POOL_SIZE, channels]
    x = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"),
                           name="mrcnn_mask_deconv")(x)
    # A 1x1 conv then sets the channel count to num_classes: one mask per class
    x = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1, activation="sigmoid"),
                           name="mrcnn_mask")(x)
    return x
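The mask head outputs a small soft mask per class (28×28 when MASK_POOL_SIZE is 14). To put one back into the image, it has to be resized to its box and thresholded. The sketch below shows that step under clear assumptions:

- `paste_mask` is hypothetical, not code from this project.
- The box is in integer pixels.
- The threshold is 0.5.
- It uses nearest-neighbor resizing to stay dependency-free. Real code would more likely resize bilinearly.

```python
import numpy as np


def paste_mask(mask, box, image_shape, threshold=0.5):
    """Resize a small soft mask (e.g. a 28x28 sigmoid output) into its box,
    binarize it, and return a full-image boolean mask.

    box: (y1, x1, y2, x2) in integer image pixels.
    """
    y1, x1, y2, x2 = box
    bh, bw = y2 - y1, x2 - x1
    # Map every pixel inside the box back to a cell of the small mask
    rows = np.arange(bh) * mask.shape[0] // bh
    cols = np.arange(bw) * mask.shape[1] // bw
    resized = mask[rows[:, None], cols[None, :]]
    full = np.zeros(image_shape[:2], dtype=bool)
    full[y1:y2, x1:x2] = resized >= threshold
    return full


small = np.zeros((28, 28))
small[:, 14:] = 0.9  # the right half of the box is foreground
full = paste_mask(small, (10, 20, 66, 76), (100, 100))
```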

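Model losses

Mask RCNN has five loss functions in total. Two belong to the RPN, two to the mrcnn head, and one to the mask branch.
The first four are the same as in Faster R-CNN. For the mask loss, the mask branch outputs a $K\times m^2$-dimensional result for each RoI: K binary masks (one per class), each of resolution $m\times m$. $L_{mask}$ is the average binary cross-entropy loss. For an RoI of class k, $L_{mask}$ only looks at the k-th mask; the other mask outputs do not contribute to the loss. With this definition the network produces a mask for every class, and the classes do not compete with each other.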
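This per-class selection can be sketched in NumPy: for each positive RoI, only the predicted mask channel of its ground-truth class enters the binary cross-entropy. `mask_loss` is a hypothetical restatement of what mrcnn_mask_loss_graph does, with the epsilon clipping added to keep the logarithms finite.

```python
import numpy as np


def mask_loss(target_masks, target_class_ids, pred_masks, eps=1e-7):
    """Average binary cross-entropy over positive ROIs, using only the
    predicted mask of each ROI's ground-truth class.

    target_masks: [num_rois, m, m] with values in {0, 1}
    target_class_ids: [num_rois]; 0 means background / negative ROI
    pred_masks: [num_rois, m, m, num_classes] sigmoid outputs
    """
    positive = np.where(target_class_ids > 0)[0]
    if positive.size == 0:
        return 0.0
    # Channel k = class id for each positive ROI; the other channels are ignored
    y_pred = pred_masks[positive, :, :, target_class_ids[positive]]
    y_true = target_masks[positive]
    y_pred = np.clip(y_pred, eps, 1 - eps)
    bce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(bce.mean())


pred = np.full((3, 2, 2, 3), 0.1)
pred[0, :, :, 2] = 0.9  # ROI 0 has class 2
pred[2, :, :, 1] = 0.9  # ROI 2 has class 1; ROI 1 is negative and ignored
loss = mask_loss(np.ones((3, 2, 2)), np.array([2, 0, 1]), pred)  # equals -log(0.9)
```

The 0.1 values in the channels of the other classes never enter the loss, which is exactly the "no competition between classes" property.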

rpn_class_loss_graph

import tensorflow as tf
from tensorflow.keras import backend as K


def rpn_class_loss_graph(rpn_match, rpn_class_logits):
    """RPN anchor classification loss."""
    # Remove the last dimension: [batch, anchors, 1] -> [batch, anchors]
    rpn_match = tf.squeeze(rpn_match, -1)
    # 1 for positive anchors, 0 for negatives
    anchor_class = K.cast(K.equal(rpn_match, 1), tf.int32)
    # Anchors that are not neutral (neutral anchors are ignored)
    indices = tf.where(K.not_equal(rpn_match, 0))
    # Gather the predictions and targets of those anchors
    rpn_class_logits = tf.gather_nd(rpn_class_logits, indices)
    anchor_class = tf.gather_nd(anchor_class, indices)
    # Cross-entropy between them
    loss = K.sparse_categorical_crossentropy(target=anchor_class,
                                             output=rpn_class_logits,
                                             from_logits=True)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    loss = K.switch(tf.math.is_nan(loss), tf.constant([0.0]), loss)
    return loss

rpn_bbox_loss_graph

import tensorflow as tf
from tensorflow.keras import backend as K
# batch_pack_graph and smooth_l1_loss are project helpers


def rpn_bbox_loss_graph(config, target_bbox, rpn_match, rpn_bbox):
    """RPN bounding-box regression loss."""
    # Remove the last dimension: [batch, anchors, 1] -> [batch, anchors]
    rpn_match = K.squeeze(rpn_match, -1)
    # Only positive anchors contribute
    indices = tf.where(K.equal(rpn_match, 1))
    # Predicted deltas of the positive anchors
    rpn_bbox = tf.gather_nd(rpn_bbox, indices)
    # Trim target_bbox to the same length as rpn_bbox
    batch_counts = K.sum(K.cast(K.equal(rpn_match, 1), tf.int32), axis=1)
    target_bbox = batch_pack_graph(target_bbox, batch_counts,
                                   config.IMAGES_PER_GPU)
    # Smooth-L1 loss
    loss = smooth_l1_loss(target_bbox, rpn_bbox)
    loss = K.switch(tf.size(loss) > 0, K.mean(loss), tf.constant(0.0))
    loss = K.switch(tf.math.is_nan(loss), tf.constant([0.0]), loss)
    return loss
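smooth_l1_loss is called here but not shown. Below is a NumPy sketch of the standard element-wise Smooth-L1 with its switch point at |d| = 1, stated as an assumption about what the helper computes.

```python
import numpy as np


def smooth_l1(y_true, y_pred):
    """Element-wise Smooth-L1: 0.5 * d**2 if |d| < 1, else |d| - 0.5."""
    diff = np.abs(y_true - y_pred)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)


values = smooth_l1(np.zeros(3), np.array([0.5, -1.0, 3.0]))  # 0.125, 0.5 and 2.5
```

The quadratic part keeps gradients small near the target, and the linear part stops large deltas from dominating the loss.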

mrcnn_class_loss_graph

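import tensorflow as tf


def mrcnn_class_loss_graph(target_class_ids, pred_class_logits,
                           active_class_ids):
    """Classification loss of the classifier head."""
    # Targets
    target_class_ids = tf.cast(target_class_ids, 'int64')
    # Predicted class of each ROI
    pred_class_ids = tf.argmax(pred_class_logits, axis=2)
    # 1 if the predicted class is active for this image's dataset, else 0
    pred_active = tf.gather(active_class_ids[0], pred_class_ids)
    # Cross-entropy between targets and predictions
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=target_class_ids, logits=pred_class_logits)
    # Drop the losses of predictions for inactive classes
    loss = loss * pred_active
    # Average over the remaining predictions
    loss = tf.reduce_sum(loss) / tf.maximum(tf.reduce_sum(pred_active), 1)
    return loss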

mrcnn_bbox_loss_graph

import tensorflow as tf
from tensorflow.keras import backend as K
# smooth_l1_loss is a project helper


def mrcnn_bbox_loss_graph(target_bbox, target_class_ids, pred_bbox):
    """Box regression loss of the classifier head."""
    # Merge the batch and ROI dimensions
    target_class_ids = K.reshape(target_class_ids, (-1,))
    target_bbox = K.reshape(target_bbox, (-1, 4))
    pred_bbox = K.reshape(pred_bbox, (-1, K.int_shape(pred_bbox)[2], 4))
    # Only positive ROIs are used for training
    positive_roi_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_roi_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_roi_ix), tf.int64)
    indices = tf.stack([positive_roi_ix, positive_roi_class_ids], axis=1)
    # Targets, and the predicted deltas of each ROI's ground-truth class
    target_bbox = tf.gather(target_bbox, positive_roi_ix)
    pred_bbox = tf.gather_nd(pred_bbox, indices)
    # Smooth-L1 loss
    loss = K.switch(tf.size(target_bbox) > 0,
                    smooth_l1_loss(y_true=target_bbox, y_pred=pred_bbox),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

mrcnn_mask_loss_graph

import tensorflow as tf
from tensorflow.keras import backend as K


def mrcnn_mask_loss_graph(target_masks, target_class_ids, pred_masks):
    """Binary cross-entropy loss of the mask head."""
    target_class_ids = K.reshape(target_class_ids, (-1,))
    # Targets: [N, height, width]
    mask_shape = tf.shape(target_masks)
    target_masks = K.reshape(target_masks, (-1, mask_shape[2], mask_shape[3]))
    # Predictions: [N, height, width, num_classes]
    pred_shape = tf.shape(pred_masks)
    pred_masks = K.reshape(pred_masks,
                           (-1, pred_shape[2], pred_shape[3], pred_shape[4]))
    # Permute to [N, num_classes, height, width]
    pred_masks = tf.transpose(pred_masks, [0, 3, 1, 2])
    # Only positive ROIs contribute
    positive_ix = tf.where(target_class_ids > 0)[:, 0]
    positive_class_ids = tf.cast(
        tf.gather(target_class_ids, positive_ix), tf.int64)
    indices = tf.stack([positive_ix, positive_class_ids], axis=1)
    # Targets, and the predicted mask of each ROI's ground-truth class
    y_true = tf.gather(target_masks, positive_ix)
    y_pred = tf.gather_nd(pred_masks, indices)
    # Per-pixel binary cross-entropy: [num_positive, height, width]
    loss = K.switch(tf.size(y_true) > 0,
                    K.binary_crossentropy(target=y_true, output=y_pred),
                    tf.constant(0.0))
    loss = K.mean(loss)
    return loss

Model Training

Building the model is the hardest part. Once it is built, you can set up training: callbacks, the optimizer, and so on.

callbacks

tensorboard = tf.keras.callbacks.TensorBoard(log_dir=MODEL_DIR,histogram_freq=0, write_graph=True, write_images=False)
model_ckp= tf.keras.callbacks.ModelCheckpoint(os.path.join(MODEL_DIR, "building_new.h5"),verbose=0, save_weights_only=True)
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=1)
learning_rate_reduce = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.3, patience=3, verbose=1)
callbacks = [tensorboard, model_ckp, early_stop, learning_rate_reduce]

Optimizer

optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate, clipnorm=config.GRADIENT_CLIP_NORM)

Model Evaluation

Model evaluation is in evaluate.py. It uses mIoU (mean intersection over union across classes) and Mean Pixel Accuracy.

mIoU

import numpy as np


def IoU_calculate(pred, target, n_classes):
    ious = []
    # Ignore IoU for the background class
    for item in range(1, n_classes):
        pred_inds = pred == item
        target_inds = target == item
        intersection = (pred_inds[target_inds]).sum()
        union = pred_inds.sum() + target_inds.sum() - intersection
        if union == 0:
            # If there is no ground truth, do not include in evaluation
            ious.append(float('nan'))
        else:
            ious.append(float(intersection) / float(max(union, 1)))
    return ious


# NumPy version
def all_iou(a, b, n):
    '''
    a: ground truth, shape: h*w
    b: prediction, shape: h*w
    n: number of classes
    '''
    # Keep only pixels whose ground-truth label is a valid class id (0..n-1)
    k = (a >= 0) & (a < n)
    return np.bincount(n * a[k].astype(int) + b[k].astype(int),
                       minlength=n ** 2).reshape(n, n)


def per_class_iou(hist):
    '''Compute the IoU of each class separately.'''
    # Diagonal (true positives) / (row sum + column sum - diagonal)
    return np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))


def mIoU_metric(pred, target, n_classes):
    hist = np.zeros((n_classes, n_classes))
    # Compute the hist (confusion) matrix of the image and accumulate it
    hist += all_iou(target.flatten(), pred.flatten(), n_classes)
    # IoU of each class
    mIoUs = per_class_iou(hist)
    for ind_class in range(n_classes):
        print(str(round(mIoUs[ind_class] * 100, 2)))
    print('--->mIoU: ' + str(round(np.nanmean(mIoUs) * 100, 2)))
    return mIoUs
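To make the confusion-matrix arithmetic concrete, here is a hand-checkable example with 3 classes. `confusion` is a standalone restatement of the bincount trick, with rows as ground truth and columns as prediction. It counts class 0 like any other class.

```python
import numpy as np


def confusion(gt, pred, n):
    """n x n confusion matrix: rows = ground truth, columns = prediction."""
    k = (gt >= 0) & (gt < n)
    return np.bincount(n * gt[k].astype(int) + pred[k].astype(int),
                       minlength=n ** 2).reshape(n, n)


gt = np.array([[0, 0, 1],
               [1, 1, 2],
               [2, 2, 2]])
pred = np.array([[0, 1, 1],
                 [1, 1, 2],
                 [2, 2, 0]])
hist = confusion(gt.flatten(), pred.flatten(), 3)
# hist = [[1, 1, 0],
#         [0, 3, 0],
#         [1, 0, 3]]
iou = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))  # [1/3, 3/4, 3/4]
miou = np.nanmean(iou)                                             # 11/18, about 0.611
pixel_acc = np.diag(hist).sum() / hist.sum()                       # 7/9, about 0.778
```

Class 0 has one correct pixel, one pixel predicted as class 1, and one class-2 pixel predicted as class 0. That gives 1 / (2 + 2 - 1) = 1/3, which shows how false positives and false negatives both shrink the per-class IoU.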

PixelAccuracy

import numpy as np


class Evaluator(object):
    def __init__(self, num_class) -> None:
        super().__init__()
        self.num_class = num_class
        self.confusion_matrix = np.zeros((self.num_class, ) * 2)

    def Pixel_Accuracy(self):
        # Correctly classified pixels / all pixels
        Acc = np.diag(self.confusion_matrix).sum() / self.confusion_matrix.sum()
        return Acc

    def Pixel_Accuracy_Class(self):
        # Accuracy of each class, averaged over classes (mean pixel accuracy)
        Acc = np.diag(self.confusion_matrix) / self.confusion_matrix.sum(axis=1)
        Acc = np.nanmean(Acc)
        return Acc

    def Mean_Intersection_over_Union(self):
        MIoU = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))
        MIoU = np.nanmean(MIoU)
        return MIoU

    def Frequency_Weighted_Intersection_over_Union(self):
        freq = np.sum(self.confusion_matrix, axis=1) / np.sum(self.confusion_matrix)
        iu = np.diag(self.confusion_matrix) / (
            np.sum(self.confusion_matrix, axis=1) + np.sum(self.confusion_matrix, axis=0) -
            np.diag(self.confusion_matrix))
        FWIoU = (freq[freq > 0] * iu[freq > 0]).sum()
        return FWIoU

    def _generate_matrix(self, gt_image, pre_image):
        # Rows are ground-truth classes, columns are predicted classes
        mask = (gt_image >= 0) & (gt_image < self.num_class)
        label = self.num_class * gt_image[mask].astype('int') + pre_image[mask].astype('int')
        count = np.bincount(label, minlength=self.num_class ** 2)
        confusion_matrix = count.reshape(self.num_class, self.num_class)
        return confusion_matrix

    def add_batch(self, gt_image, pre_image):
        assert gt_image.shape == pre_image.shape
        self.confusion_matrix += self._generate_matrix(gt_image, pre_image)

    def reset(self):
        self.confusion_matrix = np.zeros((self.num_class,) * 2)
