Main source file paths:
Overall architecture: detectron2/detectron2/modeling/meta_arch/rcnn.py
Default config: detectron2/detectron2/config/defaults.py
RPN_head：detectron2/detectron2/modeling/proposal_generator/rpn.py
Mask_head: detectron2/detectron2/modeling/roi_heads/mask_head.py
fast_rcnn(box_loss) : detectron2/detectron2/modeling/roi_heads/fast_rcnn.py
pooler: detectron2/detectron2/modeling/poolers.py
roi_head: detectron2/detectron2/modeling/roi_heads/roi_heads.py
sampling label: detectron2/detectron2/modeling/sampling.py
match: detectron2/detectron2/modeling/matcher.py

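Summary:

This Mask/Faster R-CNN code walkthrough dragged on for a year and is finally more or less done. I got lost in the theory of RPN and RoI several times before, and, being lazy, I felt detectron2 had split the Faster R-CNN code into too many scattered pieces.
When I came back a year later to the code of this ICCV 2017 best paper, much of it was no longer so hard. I had already met most of the pitfalls in the code of later papers (for example the deltas offsets and the pooler's RoIAlign operation in Sparse R-CNN, the FPN backbone in FCOS, the NMS step, and the outputs of the box branch and mask branch). This semester I went from fumbling through environment setup just to start training, when the code felt too complex to digest, to stepping through the outputs with pdb breakpoints, to having now read three kinds of detectors: anchor-free, anchor-based, and end-to-end without NMS. That is a kind of growth too, and I believe that after reading complex detection code, other codebases will be easier to get used to.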

1. FPN Backbone

An extra p6 output level is added.

    backbone = FPN(
        bottom_up=bottom_up,
        in_features=in_features,
        out_channels=out_channels,
        norm=cfg.MODEL.FPN.NORM,
        top_block=LastLevelMaxPool(),  # adds the extra p6 level
        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,
    )

2. RPN_head

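For the RPN loss, each anchor is labeled by IoU as ignore, positive, or negative.
The RPN loss samples positives and negatives at a 1:1 ratio, up to 128 boxes each.
The proposals passed to the ROI head are selected by sorting the objectness branch scores in descending order.
For Mask R-CNN with FPN:
pairwise_iou and the matcher decide which anchors in the RPN count as positive, negative, or ignored.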

    # forward rpn
    if self.proposal_generator is not None:
        proposals, proposal_losses = self.proposal_generator(images, features, gt_instances)

    (proposal_generator): RPN(
      (rpn_head): StandardRPNHead(
        (conv): Conv2d(
          256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
          (activation): ReLU()
        )
        (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
        (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
      )
      (anchor_generator): DefaultAnchorGenerator(
        (cell_anchors): BufferList()
      )
    )

# Important config entries
_C.MODEL.RPN.IOU_THRESHOLDS = [0.3, 0.7]
_C.MODEL.RPN.IOU_LABELS = [0, -1, 1]
_C.MODEL.RPN.BATCH_SIZE_PER_IMAGE = 256
_C.MODEL.RPN.POSITIVE_FRACTION = 0.5
# NMS settings
# The values below apply when only the C5 level is used
_C.MODEL.RPN.PRE_NMS_TOPK_TRAIN = 12000
_C.MODEL.RPN.PRE_NMS_TOPK_TEST = 6000
# With a 5-level FPN-style structure:
# during training, 2000 proposals before NMS and 1000 after NMS
POST_NMS_TOPK_TRAIN: 2000 # 1000
POST_NMS_TOPK_TEST:  2000 # 1000

# With FPN, each RPN level outputs anchor_deltas [bs, 4x3, hi, wi] and objectness [bs, 3, hi, wi]. Here 4x3 means 3 anchor aspect ratios per location, each with 4 box dims.
features = [features[f] for f in self.in_features]
# anchors are generated on every FPN level, in input-image coordinates
anchors = self.anchor_generator(features)  # with FPN, len(anchors) = 5, anchors[0].tensor.size() = [184 * 288 * 3 ratios, 4]
pred_objectness_logits, pred_anchor_deltas = self.rpn_head(features)  # forward StandardRPNHead
# eg. pred_anchor_deltas[0].size(), the largest FPN level
# torch.Size([bs, 12, 184, 288])
# eg. pred_objectness_logits[0].size()
# torch.Size([4, 3, 184, 288])
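
To make the shapes above concrete, here is a small sketch that computes the per-level RPN output shapes. It assumes a padded input of 736 x 1152 (picked only so that the p2 map is 184 x 288 as in the example), FPN strides 4 to 64, and 3 anchors per location. The helper name `rpn_output_shapes` is made up for illustration and is not part of detectron2.

```python
import math

def rpn_output_shapes(height, width, batch_size,
                      strides=(4, 8, 16, 32, 64),
                      anchors_per_cell=3, box_dim=4):
    """Return, per FPN level, (objectness shape, deltas shape, anchor count)."""
    shapes = []
    for stride in strides:
        # Spatial size of the feature map at this stride (assumed ceil division).
        hi, wi = math.ceil(height / stride), math.ceil(width / stride)
        objectness = (batch_size, anchors_per_cell, hi, wi)
        deltas = (batch_size, anchors_per_cell * box_dim, hi, wi)
        shapes.append((objectness, deltas, hi * wi * anchors_per_cell))
    return shapes

# Assumed input size and batch size, chosen to reproduce the example shapes.
for level, (obj, dlt, n) in zip(("p2", "p3", "p4", "p5", "p6"),
                                rpn_output_shapes(736, 1152, batch_size=4)):
    print(level, obj, dlt, n)
```

The first row corresponds to p2: objectness (4, 3, 184, 288), deltas (4, 12, 184, 288), and 184 * 288 * 3 anchors.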

2.1 match_label

In the RPN, `match_label` takes values in [0, -1, 1].

# forward pairwise_iou function
# match_quality_matrix [M x N] for each img
# where M is the number of gt-boxes and N is the number of anchors
match_quality_matrix = retry_if_cuda_oom(pairwise_iou)(gt_boxes_i, anchors)
# forward match function
# uses IOU_THRESHOLDS and gt_label: [1, 0, -1] for rpn; [1, 0] for roi
# matched_idxs holds, for each anchor, the index of the gt-box with the highest IoU in match_quality_matrix
# gt_labels_i takes values in [1, 0, -1]
# max(dim=0) gives the highest IoU between each anchor and its matched gt-box
matched_idxs, gt_labels_i = retry_if_cuda_oom(self.anchor_matcher)(match_quality_matrix)  # uses IOU = [0.3, 0.7]

Returns:
    matches (Tensor[int64]): a vector of length N, where matches[i] is a matched
        ground-truth index in [0, M)
    match_labels (Tensor[int8]): a vector of length N, where pred_labels[i] indicates
        whether a prediction is a true or false positive or ignored
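
The matching step can be sketched in plain Python as follows. This is a simplified illustration, not detectron2's implementation: `iou` and `match_anchors` are hypothetical helpers, boxes are (x1, y1, x2, y2) tuples, at least one gt box is assumed, and the low-quality-match option of the real matcher is left out.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_anchors(gt_boxes, anchors, thresholds=(0.3, 0.7), labels=(0, -1, 1)):
    """Return (matched_idxs, match_labels), one entry per anchor."""
    # M x N quality matrix: rows are gt boxes, columns are anchors.
    quality = [[iou(g, a) for a in anchors] for g in gt_boxes]
    matched_idxs, match_labels = [], []
    for j in range(len(anchors)):
        column = [quality[i][j] for i in range(len(gt_boxes))]
        best = max(range(len(column)), key=column.__getitem__)  # like max(dim=0)
        best_iou = column[best]
        if best_iou < thresholds[0]:
            label = labels[0]   # negative
        elif best_iou < thresholds[1]:
            label = labels[1]   # ignored
        else:
            label = labels[2]   # positive
        matched_idxs.append(best)
        match_labels.append(label)
    return matched_idxs, match_labels

gt = [(0, 0, 10, 10)]
anchors = [(0, 0, 10, 10), (0, 0, 10, 5), (20, 20, 30, 30)]
print(match_anchors(gt, anchors))
```

The three anchors have IoU 1.0, 0.5, and 0.0 with the single gt box, so they fall into the positive, ignored, and negative buckets respectively.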

2.2 subsample_label

# Select positive and negative samples out of 256 (RPN) / 512 (ROI head)
def subsample_labels(
    labels: torch.Tensor, num_samples: int, positive_fraction: float, bg_label: int
):
    # ... (positive / negative are the index tensors of positive and negative labels)
    num_pos = int(num_samples * positive_fraction)
    # protect against not enough positive examples
    num_pos = min(positive.numel(), num_pos)
    num_neg = num_samples - num_pos
    # protect against not enough negative examples
    num_neg = min(negative.numel(), num_neg)

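Note that RPN loss sampling defaults to 256 anchors per image, 128 positive and 128 negative. If either side cannot reach 128, the number of positives becomes min(num_positives, int(positive_fraction * num_samples)) and the number of negatives becomes min(num_negatives, num_samples - num_positives_sampled), so negatives fill the slots that missing positives leave and the total normally stays at 256. Two losses come out of this. One detail I did not fully understand: rpn_reg_loss is summed and then divided by normalizer (256 x batch size), yet unlike the classification loss, negative samples contribute no box regression loss. (The code comment on the final reg_loss, covered in 3.3, explains this.)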
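
The sampling rule can be sketched with the standard library alone. This is a list-based stand-in for the tensor version, and the name `subsample_labels_sketch` is hypothetical; labels are plain ints where -1 means ignored and `bg_label` marks negatives.

```python
import random

def subsample_labels_sketch(labels, num_samples, positive_fraction, bg_label, rng=random):
    """Return (pos_idx, neg_idx): randomly sampled indices into labels."""
    positive = [i for i, l in enumerate(labels) if l != -1 and l != bg_label]
    negative = [i for i, l in enumerate(labels) if l == bg_label]

    num_pos = int(num_samples * positive_fraction)
    # Protect against not enough positive examples.
    num_pos = min(len(positive), num_pos)
    # Negatives fill whatever the positives could not.
    num_neg = num_samples - num_pos
    # Protect against not enough negative examples.
    num_neg = min(len(negative), num_neg)

    return rng.sample(positive, num_pos), rng.sample(negative, num_neg)

# 10 positives, 1000 negatives, 50 ignored anchors (made-up numbers).
labels = [1] * 10 + [0] * 1000 + [-1] * 50
pos_idx, neg_idx = subsample_labels_sketch(labels, 256, 0.5, bg_label=0)
print(len(pos_idx), len(neg_idx))
```

With only 10 positives available, the remaining 246 slots go to negatives, so the total is still 256.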

2.3 Rpn_loss

def losses(
    self,
    anchors: List[Boxes],
    pred_objectness_logits: List[torch.Tensor],
    gt_labels: List[torch.Tensor],
    pred_anchor_deltas: List[torch.Tensor],
    gt_boxes: List[torch.Tensor],
) -> Dict[str, torch.Tensor]:
    # ...
    normalizer = self.batch_size_per_image * num_images  # 256 x bs
    losses = {
        "loss_rpn_cls": objectness_loss / normalizer,
        # The original Faster R-CNN paper uses a slightly different normalizer
        # for loc loss. But it doesn't matter in practice
        "loss_rpn_loc": localization_loss / normalizer,
    }
    # noted that valid_mask != pos_mask
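
A rough pure-Python sketch of how the two RPN losses relate to the masks and the normalizer: the classification term sums over valid anchors (label >= 0), the localization term sums over positive anchors only, and both are divided by the same normalizer. All function names here are hypothetical, and the inputs are flat Python lists rather than tensors.

```python
import math

def bce_with_logits(logit, target):
    """Numerically stable binary cross-entropy for one logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def smooth_l1(pred, target, beta):
    """Smooth L1 on one value; beta = 0 reduces to plain L1."""
    diff = abs(pred - target)
    if beta > 0 and diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta

def rpn_losses_sketch(logits, labels, pred_deltas, gt_deltas,
                      batch_size_per_image, num_images, beta=0.0):
    objectness_loss = 0.0
    localization_loss = 0.0
    for z, y, pd, gd in zip(logits, labels, pred_deltas, gt_deltas):
        if y >= 0:   # valid mask: positives and negatives, ignored anchors skipped
            objectness_loss += bce_with_logits(z, float(y))
        if y == 1:   # pos mask: only positives have a regression target
            localization_loss += sum(smooth_l1(p, t, beta) for p, t in zip(pd, gd))
    normalizer = batch_size_per_image * num_images
    return {
        "loss_rpn_cls": objectness_loss / normalizer,
        "loss_rpn_loc": localization_loss / normalizer,
    }

losses = rpn_losses_sketch(
    logits=[0.0, 0.0, 5.0],
    labels=[1, 0, -1],
    pred_deltas=[[0.5, 0, 0, 0], [9, 9, 9, 9], [9, 9, 9, 9]],
    gt_deltas=[[0, 0, 0, 0]] * 3,
    batch_size_per_image=2, num_images=1,
)
print(losses)
```

The negative and ignored anchors carry large delta errors in this toy input, yet they contribute nothing to `loss_rpn_loc`, which is the point of the `valid_mask != pos_mask` remark.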

2.4 predict_proposals

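With the loss done, predict_proposals produces the proposals, which go through NMS and are passed to the roi-head. Candidates are sorted by objectness logits in descending order; each FPN level keeps its top pre_nms_topk boxes before NMS, and post_nms_topk boxes (1000 here) are kept per image afterwards.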

def predict_proposals(
    self,
    anchors: List[Boxes],
    pred_objectness_logits: List[torch.Tensor],
    pred_anchor_deltas: List[torch.Tensor],
    image_sizes: List[Tuple[int, int]],
):
    # ...
    return find_top_rpn_proposals(
        pred_proposals,
        pred_objectness_logits,
        image_sizes,
        self.nms_thresh,
        self.pre_nms_topk[self.training],   # 2000; self.pre_nms_topk = {True: 2000, False: 1000}
        self.post_nms_topk[self.training],  # 1000; {True: 1000, False: 1000}
        self.min_box_size,
        self.training,
    )

'''
def find_top_rpn_proposals(...):
    for level_id, (proposals_i, logits_i) in enumerate(zip(proposals, pred_objectness_logits)):
        Hi_Wi_A = logits_i.shape[1]
        if isinstance(Hi_Wi_A, torch.Tensor):  # it's a tensor in tracing
            num_proposals_i = torch.clamp(Hi_Wi_A, max=pre_nms_topk)
        else:
            num_proposals_i = min(Hi_Wi_A, pre_nms_topk)
        # sort is faster than topk: https://github.com/pytorch/pytorch/issues/22812
        # topk_scores_i, topk_idx = logits_i.topk(num_proposals_i, dim=1)
        logits_i, idx = logits_i.sort(descending=True, dim=1)
        topk_scores_i = logits_i.narrow(1, 0, num_proposals_i)
        topk_idx = idx.narrow(1, 0, num_proposals_i)
    return results

eg. results[0].proposal_boxes.tensor.size() --> [1000, 4]
    results[0].objectness_logits.size() --> [1000]  # sorted in descending order
'''
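
The selection logic can be sketched for a single image in plain Python: take the top pre_nms_topk boxes per level, run NMS within each level, then keep the post_nms_topk highest-scoring survivors overall. This is a simplification under my own assumptions (greedy NMS on tuples, no clipping or min-size filtering), and `nms` and `top_proposals` are hypothetical names, not the library's functions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh):
    """Greedy NMS: keep a box unless it overlaps a kept box by more than thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thresh for k in keep):
            keep.append(i)
    return keep

def top_proposals(boxes_per_level, scores_per_level, nms_thresh,
                  pre_nms_topk, post_nms_topk):
    kept = []
    for boxes, scores in zip(boxes_per_level, scores_per_level):
        # Per level: sort by objectness, keep the top pre_nms_topk candidates.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        order = order[:pre_nms_topk]
        lvl_boxes = [boxes[i] for i in order]
        lvl_scores = [scores[i] for i in order]
        # NMS is applied within the level only.
        for i in nms(lvl_boxes, lvl_scores, nms_thresh):
            kept.append((lvl_scores[i], lvl_boxes[i]))
    # Across levels: keep the best post_nms_topk, in descending score order.
    kept.sort(key=lambda t: t[0], reverse=True)
    return kept[:post_nms_topk]

boxes = [[(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [(0, 0, 10, 10)]]
scores = [[0.9, 0.8, 0.3], [0.5]]
print(top_proposals(boxes, scores, nms_thresh=0.5, pre_nms_topk=3, post_nms_topk=2))
```

In this toy input the 0.8 box is suppressed by the 0.9 box on the same level, while the identical box on the second level survives because suppression does not cross levels.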

3. ROI_head

ROI Head

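At inference, detections are filtered by whether their score exceeds 0.05.
In training, 512 proposals per image are selected for ROI pooling, with a positive-to-negative ratio of 1:3.
After ROI pooling the layers are 7x7x256 -> 1024, 1024 -> 1024.
When computing the loss, the positive fraction is 0.25.
pairwise_iou builds the [M, N] match matrix, where M is the number of GT boxes and N is the number of proposals. The proposals are then labeled (0, 1, -1) according to IoU, and subsample_labels picks the positive and negative samples at the configured ratio.
The RPN returns its losses and the proposals; the proposals, together with the features from the FPN, are passed into roi_head.
The proposals carry only the 4-d box and the objectness-logits score (no class information).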

        results, _ = self.roi_heads(images, features, proposals, None)

(Training only!) label_and_sample_proposals: self.batch_size_per_image = 512, self.positive_fraction = 0.25

At inference, filtering is done by scores!

3.1 RoI Inference

Inference settings in the ROI head:

# inference.
_C.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.05
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
_C.MODEL.ROI_HEADS.NMS_THRESH_TEST = 0.5
# If True, augment proposals with ground-truth boxes before sampling proposals to
# train ROI heads.
_C.MODEL.ROI_HEADS.PROPOSAL_APPEND_GT = True

Branch structure in the ROI head:

self.box_head:
FastRCNNConvFCHead(
  (flatten): Flatten()
  (fc1): Linear(in_features=12544, out_features=1024, bias=True)
  (fc_relu1): ReLU()
  (fc2): Linear(in_features=1024, out_features=1024, bias=True)
  (fc_relu2): ReLU()
)
self.box_predictor:
FastRCNNOutputLayers(
  (cls_score): Linear(in_features=1024, out_features=81, bias=True)
  (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
)
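
The layer sizes printed above follow from simple arithmetic, sketched here. It assumes 80 foreground classes (COCO), a 7 x 7 pooler output with 256 channels, and class-specific box regression; `box_head_dims` is a made-up helper name.

```python
def box_head_dims(num_classes=80, channels=256, pooler_resolution=7, box_dim=4):
    """Return (fc1 in_features, cls_score out_features, bbox_pred out_features)."""
    fc1_in = channels * pooler_resolution * pooler_resolution  # flattened RoI feature
    cls_out = num_classes + 1          # one extra logit for background
    bbox_out = num_classes * box_dim   # 4 deltas per foreground class
    return fc1_in, cls_out, bbox_out

print(box_head_dims())  # (12544, 81, 320)
```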

3.2 forward_box

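The corresponding forward_box step pools features and produces the predictions for the two detection tasks (classification and box regression).

A note on box_pooler: it is easy to confuse with the earlier per-level anchor assignment in the FPN, but this is where the FPN level-assignment formula is actually applied.

One detail I did not fully understand: features contains ['p2', 'p3', 'p4', 'p5', 'p6'], but self.box_in_features is ['p2', 'p3', 'p4', 'p5']. So what is the extra p6 feature level for? Anchors are generated on it, so the proposals may include proposal_boxes that came from p6, and those boxes in x.proposal_boxes are then cropped from p2~p5. This is my own understanding.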
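
The level-assignment rule used by the pooler can be sketched as below. It follows the FPN paper's formula with a canonical box size of 224 mapped to level 4, then clamps to the pooled levels p2~p5; the defaults and the function name `assign_box_to_level` are my assumptions for illustration.

```python
import math

def assign_box_to_level(box, min_level=2, max_level=5,
                        canonical_box_size=224, canonical_level=4):
    """Pick the FPN level (2..5) that a box (x1, y1, x2, y2) is pooled from."""
    x1, y1, x2, y2 = box
    box_size = math.sqrt(max(x2 - x1, 0) * max(y2 - y1, 0))  # sqrt of area
    # The small epsilon keeps log2 finite for degenerate boxes.
    level = math.floor(canonical_level + math.log2(box_size / canonical_box_size + 1e-8))
    # Clamp: very large boxes (e.g. ones proposed from p6 anchors) land on p5.
    return min(max(level, min_level), max_level)

for side in (10, 112, 224, 448, 1000):
    print(side, "-> p%d" % assign_box_to_level((0, 0, side, side)))
```

Under this rule the level depends only on the box size, not on which RPN level proposed it, which is consistent with the reading above that boxes originating from p6 are still cropped from p2~p5.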

pred_instances = self._forward_box(features, proposals)
# During inference cascaded prediction is used: the mask and keypoints heads are only
# applied to the top scoring box detections.
pred_instances = self.forward_with_given_boxes(features, pred_instances)  # at inference, the mask branch runs on top of pred_instances
return pred_instances, {}

def _forward_box(..):
    features = [features[f] for f in self.box_in_features]
    # box_pooler takes the features plus the proposal_boxes locations (divided by the stride)
    box_features = self.box_pooler(features, [x.proposal_boxes for x in proposals])  # [512 x batch_size, 256, 7, 7]
    box_features = self.box_head(box_features)  # [512 x batch_size, 1024]
    predictions = self.box_predictor(box_features)  # predictions[0] --> [512 x batch_size, 81], predictions[1] --> [512 x batch_size, 320]

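The 1024 in the final [1024, 4] output comes from 512 * 2; in general the shape is [512 x batch_size, 4] under the default config.

The 4-d output of the reg branch in ROI_HEAD is still a set of offsets. The detail to note is that these offsets are relative to the proposal_boxes produced by the RPN, so this amounts to a "second correction of the anchors". Computing reg_loss therefore combines proposal_boxes with proposal_deltas.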
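
A minimal sketch of the box-to-delta encoding and its inverse, written with plain tuples. It follows the usual R-CNN parameterization (center offsets scaled by box size, log-scale width and height); the weights (10, 10, 5, 5) and the scale clamp are assumed defaults, and the function names simply mirror the roles of get_deltas and apply_deltas.

```python
import math

def get_deltas(src, tgt, weights=(10.0, 10.0, 5.0, 5.0)):
    """Deltas (dx, dy, dw, dh) that move box src onto box tgt."""
    sw, sh = src[2] - src[0], src[3] - src[1]
    sx, sy = src[0] + 0.5 * sw, src[1] + 0.5 * sh
    tw, th = tgt[2] - tgt[0], tgt[3] - tgt[1]
    tx, ty = tgt[0] + 0.5 * tw, tgt[1] + 0.5 * th
    wx, wy, ww, wh = weights
    return (wx * (tx - sx) / sw, wy * (ty - sy) / sh,
            ww * math.log(tw / sw), wh * math.log(th / sh))

def apply_deltas(deltas, box, weights=(10.0, 10.0, 5.0, 5.0),
                 scale_clamp=math.log(1000.0 / 16)):
    """Apply deltas to box and return the refined (x1, y1, x2, y2)."""
    w, h = box[2] - box[0], box[3] - box[1]
    cx, cy = box[0] + 0.5 * w, box[1] + 0.5 * h
    wx, wy, ww, wh = weights
    dx, dy = deltas[0] / wx, deltas[1] / wy
    # Clamp the log-scale terms so exp() cannot blow up.
    dw = min(deltas[2] / ww, scale_clamp)
    dh = min(deltas[3] / wh, scale_clamp)
    pcx, pcy = dx * w + cx, dy * h + cy
    pw, ph = math.exp(dw) * w, math.exp(dh) * h
    return (pcx - 0.5 * pw, pcy - 0.5 * ph, pcx + 0.5 * pw, pcy + 0.5 * ph)

proposal = (10.0, 10.0, 50.0, 90.0)   # made-up RPN proposal
gt_box = (12.0, 8.0, 60.0, 100.0)     # made-up ground-truth box
deltas = get_deltas(proposal, gt_box)
print(deltas)
print(apply_deltas(deltas, proposal))  # recovers gt_box up to rounding
```

The round trip shows why the regression target is defined relative to the proposal: the same deltas applied to a different proposal would give a different box.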

class FastRCNNOutputLayers(nn.Module):
    def loss(..):
        '''
        proposal_deltas [R, 320], gt_classes [R]
        R shall be the number of proposals. eg. 512 x batch_size
        gt_classes is used to pick, out of proposal_deltas, the offsets for the gt class;
        these are then combined with proposal_boxes
        '''
        losses = {
            "loss_cls": cross_entropy(scores, gt_classes, reduction="mean"),
            "loss_box_reg": self.box_reg_loss(proposal_boxes, gt_boxes, proposal_deltas, gt_classes),
        }

    def box_reg_loss():
        if self.box_reg_loss_type == "smooth_l1":
            # smooth_l1 compares the deltas from proposal_boxes to gt_boxes
            # with the deltas predicted for the proposal_boxes
            gt_pred_deltas = self.box2box_transform.get_deltas(
                proposal_boxes[fg_inds],
                gt_boxes[fg_inds],
            )
            loss_box_reg = smooth_l1_loss(fg_pred_deltas, gt_pred_deltas, self.smooth_l1_beta, reduction="sum")
        elif self.box_reg_loss_type == "giou":
            # giou compares the predicted boxes (apply_deltas on proposal_boxes) with gt_boxes
            fg_pred_boxes = self.box2box_transform.apply_deltas(fg_pred_deltas, proposal_boxes[fg_inds])
            loss_box_reg = giou_loss(fg_pred_boxes, gt_boxes[fg_inds], reduction="sum")
        # gt_classes.numel() is bs x 512, i.e. the total number of regions (R) discussed in 3.3
        return loss_box_reg / max(gt_classes.numel(), 1.0)  # return 0 if empty

3.3 reg_loss

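As for why the reg loss is also normalized by the total number of regions, the code comment explains it: doing so gives every foreground example the same training influence, no matter how many foreground regions a minibatch happens to contain.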

        # The reg loss is normalized using the total number of regions (R), not the number
        # of foreground regions even though the box regression loss is only defined on
        # foreground regions. Why? Because doing so gives equal training influence to
        # each foreground example. To see how, consider two different minibatches:
        #  (1) Contains a single foreground region
        #  (2) Contains 100 foreground regions
        # If we normalize by the number of foreground regions, the single example in
        # minibatch (1) will be given 100 times as much influence as each foreground
        # example in minibatch (2). Normalizing by the total number of regions, R,
        # means that the single example in minibatch (1) and each of the 100 examples
        # in minibatch (2) are given equal influence.
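
The argument in that comment can be checked with a few lines of arithmetic. The sketch below computes the weight each foreground example receives under the two normalizations; R = 512 and the helper name `per_example_weight` are assumptions for illustration.

```python
def per_example_weight(num_fg, num_regions, normalize_by):
    """Weight applied to each foreground example's regression loss."""
    if normalize_by == "foreground":
        return 1.0 / max(num_fg, 1)
    return 1.0 / max(num_regions, 1)   # normalize by total regions R

R = 512  # assumed number of sampled regions per minibatch
for mode in ("foreground", "regions"):
    w1 = per_example_weight(1, R, mode)     # minibatch (1): one foreground region
    w2 = per_example_weight(100, R, mode)   # minibatch (2): 100 foreground regions
    print(mode, w1, w2, "ratio:", w1 / w2)
```

Normalizing by the foreground count makes the lone example weigh about 100 times more than each example in the crowded minibatch, while normalizing by R gives a ratio of 1.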

3.4 forward_mask

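With reg and cls from forward_box done, we enter the forward_mask branch of Mask R-CNN. In training, select_foreground_proposals picks the proposals that have a gt label (the foreground ones), and the mask branch extracts their features to give the familiar [M, 256, 14, 14] mask_features.

Note that the instances (M) here have already been filtered once, so M (eg. 17) is much smaller than the count in the box_prediction branch (eg. 512 x batchsize). From debugging, the fg_pred_deltas used in the L1 term of the reg loss has shape [17, 4], matching the [17, 256, 14, 14] of mask_features; the only difference is that the reg loss is divided by the total number of regions rather than the number of fg regions.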

def select_foreground_proposals(proposals, bg_label):
    for proposals_per_image in proposals:
        gt_classes = proposals_per_image.gt_classes
        fg_selection_mask = (gt_classes != -1) & (gt_classes != bg_label)  # marks positive proposals
        fg_idxs = fg_selection_mask.nonzero().squeeze(1)
        # eg. len(proposals_per_image) = 512
        # eg. len(proposals_per_image[fg_idxs]) = 19
        fg_proposals.append(proposals_per_image[fg_idxs])  # keeps the fg_idxs entries of each image
        fg_selection_masks.append(fg_selection_mask)
    pdb.set_trace()  # debugging breakpoint
    return fg_proposals, fg_selection_masks

def _forward_mask(features, instances):
    if self.training:
        # head is only trained on positive proposals.
        # instances are filtered here, so M (eg. 17) is much smaller than
        # in the box_prediction branch (eg. 512 x batchsize)
        instances, _ = select_foreground_proposals(instances, self.num_classes)
    ....
    if self.mask_pooler is not None:
        features = [features[f] for f in self.mask_in_features]  # p2~p5 feature maps
        boxes = [x.proposal_boxes if self.training else x.pred_boxes for x in instances]  # len = 2, each [n_box, 4]
        features = self.mask_pooler(features, boxes)  # [M, 256, 14, 14]
    return self.mask_head(features, instances)  # detail: this calls the parent class's forward method
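
The foreground filter itself is small enough to sketch with lists. Here each image is represented only by its list of gt class ids, with -1 for ignored and `bg_label` (80 for COCO, an assumption) for background; `select_foreground_sketch` is a hypothetical name.

```python
def select_foreground_sketch(gt_classes_per_image, bg_label):
    """Return (fg index lists, boolean masks), one entry per image."""
    fg_indices, fg_masks = [], []
    for gt_classes in gt_classes_per_image:
        # Foreground means: not ignored (-1) and not background.
        mask = [c != -1 and c != bg_label for c in gt_classes]
        fg_masks.append(mask)
        fg_indices.append([i for i, is_fg in enumerate(mask) if is_fg])
    return fg_indices, fg_masks

# Two images: the first has two foreground proposals, the second has none.
batch = [[80, 3, 80, 17], [80, 80]]
fg_idx, fg_mask = select_foreground_sketch(batch, bg_label=80)
print(fg_idx)   # [[1, 3], []]
print(sum(len(idx) for idx in fg_idx), "foreground proposals feed the mask head")
```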

3.5 Mask_head and Mask_loss

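Finally we enter the mask_head branch, which produces the mask predictions and computes mask_loss.

The key function is mask_rcnn_loss; example values for its arguments pred_mask_logits and instances are given with the code.

The crop_and_resize call on gt_masks_per_image is a detail that is easy to overlook. A senior labmate once mentioned a small bug in ISTR concerning how the GT is chosen: whether gt_masks are cropped with gt_boxes, or with the generated proposal_boxes.

In Mask R-CNN, the mask loss uses the latter: proposal_boxes are used to crop the gt_masks.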

# eg. pred_mask_logits.size() --> [17 (n_boxes), 80, 28, 28]; len(instances) = bs; len(instances[0]) + .. + len(instances[N-1]) = 17 (n_boxes)
def mask_rcnn_loss(pred_mask_logits: torch.Tensor, instances: List[Instances], vis_period: int = 0):
    for instances_per_image in instances:  # iterate over the batch
        if len(instances_per_image) == 0:
            continue
        if not cls_agnostic_mask:
            gt_classes_per_image = instances_per_image.gt_classes.to(dtype=torch.int64)
            gt_classes.append(gt_classes_per_image)
        pdb.set_trace()  # debugging breakpoint
        gt_masks_per_image = instances_per_image.gt_masks.crop_and_resize(
            instances_per_image.proposal_boxes.tensor, mask_side_len
        ).to(device=pred_mask_logits.device)
        # gt_masks_per_image: A tensor of shape (N, M, M), N=#instances in the image; M=mask_side_len
        gt_masks.append(gt_masks_per_image)
    # ...
    mask_loss = F.binary_cross_entropy_with_logits(pred_mask_logits, gt_masks, reduction="mean")  # mean divides by M --> [M, 80, 28, 28]
    pdb.set_trace()  # debugging breakpoint
    return mask_loss
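
To see why it matters which box is used for the crop, here is a toy sketch that crops a binary mask with a box and resamples it to a fixed side length using nearest-neighbor sampling at bin centers. This is a deliberate simplification of the real crop_and_resize, and `crop_and_resize_sketch` is a hypothetical helper working on nested lists.

```python
import math

def crop_and_resize_sketch(mask, box, side_len):
    """Crop mask (list of rows of 0/1) to box (x1, y1, x2, y2), resize to side_len^2."""
    x1, y1, x2, y2 = box
    height, width = len(mask), len(mask[0])
    out = []
    for r in range(side_len):
        # Sample at the center of each output bin.
        yi = math.floor(y1 + (r + 0.5) * (y2 - y1) / side_len)
        row = []
        for c in range(side_len):
            xi = math.floor(x1 + (c + 0.5) * (x2 - x1) / side_len)
            inside = 0 <= yi < height and 0 <= xi < width
            row.append(mask[yi][xi] if inside else 0)
        out.append(row)
    return out

# 8 x 8 image whose object fills the left half (columns 0..3).
mask = [[1, 1, 1, 1, 0, 0, 0, 0] for _ in range(8)]
gt_box = (0, 0, 4, 8)         # tight box around the object
proposal_box = (2, 0, 6, 8)   # made-up proposal shifted to the right

print(crop_and_resize_sketch(mask, gt_box, 4))        # all ones
print(crop_and_resize_sketch(mask, proposal_box, 4))  # ones on the left, zeros on the right
```

Cropping with the proposal box yields the target that lines up with the features pooled from that same proposal, whereas cropping with the gt box would give an all-ones target that does not match what the mask head actually sees.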

With mask-loss and box-loss in hand, the training pipeline is basically understood.
