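I recently read several WSDN papers, one of which borrows ideas from FPN, so I went back and read the FPN paper (Feature Pyramid Networks for Object Detection). Its contribution comes down to combining complementary strengths. In a convolutional network, shallow layers carry weak semantic features but preserve image detail well, while deeper layers carry stronger semantic features; visualizations of CNN activations make this easy to see. The paper therefore merges the abstract features of the deep layers with the rich geometric information of the shallow layers, so that the resulting features carry more information. This matters especially for object detection: deep layers are semantically strong, but their feature maps are small and retain little geometric information, which hurts small-object detection. Shallow layers are the opposite: semantically weaker, but high-resolution and rich in geometric information. Combining the two should therefore substantially improve the detection of small objects.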

1、Background

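The four designs compared in Figure 1 of the paper:

(a) Featurized image pyramid: very time-consuming, so computation is slow.

(b) Single feature map: faster, but only the high-level semantic features of the last layer are used, so low-level information may be lost.

(c) Pyramidal feature hierarchy: SSD uses multi-scale feature maps, but it does not use the low-level feature maps.

(d) Feature Pyramid Network: multi-scale, and it also combines high-level semantic features with shallow-layer features, through a bottom-up pathway, a top-down pathway, and lateral connections.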

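The high-level feature map is upsampled by a factor of 2 and merged with the shallower feature map by element-wise addition. The 1×1 convolution reduces the number of channels so that the two maps match.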
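A 1×1 convolution is a per-pixel linear map across channels, which is why it can shrink the channel count without touching the spatial resolution. A minimal, dependency-free sketch of that idea follows; the helper name `conv1x1` and the weight values are illustrative assumptions, not taken from the paper or the repository.

```python
def conv1x1(feature, weight):
    """Apply a bias-free 1x1 convolution to a C_in x H x W feature map.

    feature: nested list indexed as feature[c][y][x]
    weight:  nested list indexed as weight[c_out][c_in]
    """
    c_in = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [sum(row[c] * feature[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        for row in weight
    ]


# A 4-channel 2x2 map (channel c is filled with the value c + 1).
feature = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
# Two output channels: "copy channel 0" and "average all channels" (arbitrary choices).
weight = [[1.0, 0.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25]]

reduced = conv1x1(feature, weight)
print(len(reduced), len(reduced[0]), len(reduced[0][0]))  # channels, height, width
```

The output has 2 channels but keeps the 2×2 spatial size, which is the property the lateral connections rely on.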

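Bottom-up pathway: the input image is passed through successive convolution stages, producing a progressively smaller feature map at each stage until the final feature map is reached.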
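To make the shrinking resolution concrete, the sketch below computes the approximate spatial size of each bottom-up stage output. It assumes a ResNet backbone whose stages C2 to C5 have strides 4, 8, 16 and 32 relative to the input, and it assumes padded stride-2 layers so that sizes round up; the function name `stage_sizes` is hypothetical.

```python
import math

# Strides of the stage outputs C2..C5 relative to the input image (ResNet assumption).
STAGE_STRIDES = {"C2": 4, "C3": 8, "C4": 16, "C5": 32}


def stage_sizes(height, width):
    """Return the approximate (H, W) of each stage output for a given input size."""
    return {
        name: (math.ceil(height / stride), math.ceil(width / stride))
        for name, stride in STAGE_STRIDES.items()
    }


sizes = stage_sizes(800, 1216)
for name, (h, w) in sizes.items():
    print(name, h, w)
```

For an 800×1216 input, C2 is 200×304 while C5 is only 25×38, so an object a few dozen pixels wide covers roughly one cell of C5.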

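Top-down pathway and lateral connections: the final (coarsest) feature map is upsampled and merged with the feature map of the next shallower stage, giving features that are both semantically rich and higher in resolution. This repeated merge is what "top-down with lateral connections" refers to.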
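The top-down merge can be sketched without any framework. The example below works on single-channel maps that are assumed to have already passed their 1×1 lateral convolution, resizes with nearest-neighbour sampling, and leaves out the 3×3 smoothing convolution; the names `upsample_nearest` and `top_down` are hypothetical.

```python
def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of a 2-D map (list of rows) to out_h x out_w."""
    in_h, in_w = len(fmap), len(fmap[0])
    return [
        [fmap[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def top_down(laterals):
    """Merge lateral maps ordered coarse to fine, e.g. [C5', C4', C3'].

    Each merged map is the upsampled previous (coarser) result plus the
    lateral map of the current level. Returns the maps in the same order.
    """
    merged = [laterals[0]]
    for lateral in laterals[1:]:
        h, w = len(lateral), len(lateral[0])
        up = upsample_nearest(merged[-1], h, w)
        merged.append([[up[y][x] + lateral[y][x] for x in range(w)] for y in range(h)])
    return merged


c5 = [[1.0]]                          # coarsest level, 1x1
c4 = [[10.0, 20.0], [30.0, 40.0]]     # 2x2
c3 = [[0.0] * 4 for _ in range(4)]    # 4x4
p5, p4, p3 = top_down([c5, c4, c3])
print(p4)      # [[11.0, 21.0], [31.0, 41.0]]
print(p3[0])   # [11.0, 11.0, 21.0, 21.0]
```

Resizing to the lateral map's exact size, rather than always doubling, avoids shape mismatches when a stage has an odd height or width.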

2、The FPN network

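  • Contributions

    • An end-to-end network, for both training and testing
    • Higher accuracy
    • Introduces the FPN network
    • Improves network performance without increasing test time
  • Result
    • Replacing the Faster R-CNN backbone with FPN raises mAP from 51.7% to 56.9%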

3、Code

Core FPN code

import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable, gradcheck
from torch.autograd.gradcheck import gradgradcheck
import torchvision.models as models
import numpy as np
import torchvision.utils as vutils
from model.utils.config import cfg
from model.rpn.rpn_fpn import _RPN_FPN
from model.roi_pooling.modules.roi_pool import _RoIPooling
from model.roi_crop.modules.roi_crop import _RoICrop
from model.roi_align.modules.roi_align import RoIAlignAvg
from model.rpn.proposal_target_layer import _ProposalTargetLayer
from model.utils.net_utils import _smooth_l1_loss, _crop_pool_layer, _affine_grid_gen, _affine_theta
import time
import pdb


class _FPN(nn.Module):
    """ FPN """

    def __init__(self, classes, class_agnostic):
        super(_FPN, self).__init__()
        self.classes = classes
        self.n_classes = len(classes)
        self.class_agnostic = class_agnostic
        # loss
        self.RCNN_loss_cls = 0
        self.RCNN_loss_bbox = 0

        # kernel size 1 with stride 2: pure subsampling, used to derive p6 from p5
        self.maxpool2d = nn.MaxPool2d(1, stride=2)
        # define rpn
        self.RCNN_rpn = _RPN_FPN(self.dout_base_model)
        self.RCNN_proposal_target = _ProposalTargetLayer(self.n_classes)

        # NOTE: the original paper used pool_size = 7 for cls branch, and 14 for mask branch, to save the
        # computation time, we first use 14 as the pool_size, and then do stride=2 pooling for cls branch.
        self.RCNN_roi_pool = _RoIPooling(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
        self.RCNN_roi_align = RoIAlignAvg(cfg.POOLING_SIZE, cfg.POOLING_SIZE, 1.0/16.0)
        self.grid_size = cfg.POOLING_SIZE * 2 if cfg.CROP_RESIZE_WITH_MAX_POOL else cfg.POOLING_SIZE
        self.RCNN_roi_crop = _RoICrop()

    def _init_weights(self):
        def normal_init(m, mean, stddev, truncated=False):
            """
            weight initializer: truncated normal and random normal.
            """
            # x is a parameter
            if truncated:
                m.weight.data.normal_().fmod_(2).mul_(stddev).add_(mean) # not a perfect approximation
            else:
                m.weight.data.normal_(mean, stddev)
                m.bias.data.zero_()

        # custom weights initialization called on netG and netD
        def weights_init(m, mean, stddev, truncated=False):
            classname = m.__class__.__name__
            if classname.find('Conv') != -1:
                m.weight.data.normal_(0.0, 0.02)
                m.bias.data.fill_(0)
            elif classname.find('BatchNorm') != -1:
                m.weight.data.normal_(1.0, 0.02)
                m.bias.data.fill_(0)

        normal_init(self.RCNN_toplayer, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_smooth1, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_smooth2, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_smooth3, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_latlayer1, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_latlayer2, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_latlayer3, 0, 0.01, cfg.TRAIN.TRUNCATED)

        normal_init(self.RCNN_rpn.RPN_Conv, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_rpn.RPN_cls_score, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_rpn.RPN_bbox_pred, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_cls_score, 0, 0.01, cfg.TRAIN.TRUNCATED)
        normal_init(self.RCNN_bbox_pred, 0, 0.001, cfg.TRAIN.TRUNCATED)
        weights_init(self.RCNN_top, 0, 0.01, cfg.TRAIN.TRUNCATED)

    def create_architecture(self):
        self._init_modules()
        self._init_weights()

    def _upsample_add(self, x, y):
        '''Upsample and add two feature maps.
        Args:
          x: (Variable) top feature map to be upsampled.
          y: (Variable) lateral feature map.
        Returns:
          (Variable) added feature map.
        Note in PyTorch, when input size is odd, the upsampled feature map
        with `F.upsample(..., scale_factor=2, mode='nearest')`
        may not be equal to the lateral feature map size.
        e.g.
        original input size: [N,_,15,15] ->
        conv2d feature map size: [N,_,8,8] ->
        upsampled feature map size: [N,_,16,16]
        So we choose bilinear upsample which supports arbitrary output sizes.
        '''
        _,_,H,W = y.size()
        return F.upsample(x, size=(H,W), mode='bilinear') + y

    def _PyramidRoI_Feat(self, feat_maps, rois, im_info):
        ''' roi pool on pyramid feature maps'''
        # do roi pooling based on predicted rois
        img_area = im_info[0][0] * im_info[0][1]
        h = rois.data[:, 4] - rois.data[:, 2] + 1
        w = rois.data[:, 3] - rois.data[:, 1] + 1
        # NOTE: Eq. 1 of the FPN paper is floor(4 + log2(sqrt(w*h) / 224));
        # this listing uses the natural log and round instead.
        roi_level = torch.log(torch.sqrt(h * w) / 224.0)
        roi_level = torch.round(roi_level + 4)
        roi_level[roi_level < 2] = 2
        roi_level[roi_level > 5] = 5
        # roi_level.fill_(5)
        if cfg.POOLING_MODE == 'crop':
            # pdb.set_trace()
            # pooled_feat_anchor = _crop_pool_layer(base_feat, rois.view(-1, 5))
            # NOTE: need to add pyramid
            # (base_feat is not defined in this method, so this branch is unfinished)
            grid_xy = _affine_grid_gen(rois, base_feat.size()[2:], self.grid_size)
            grid_yx = torch.stack([grid_xy.data[:,:,:,1], grid_xy.data[:,:,:,0]], 3).contiguous()
            roi_pool_feat = self.RCNN_roi_crop(base_feat, Variable(grid_yx).detach())
            if cfg.CROP_RESIZE_WITH_MAX_POOL:
                roi_pool_feat = F.max_pool2d(roi_pool_feat, 2, 2)

        elif cfg.POOLING_MODE == 'align':
            roi_pool_feats = []
            box_to_levels = []
            for i, l in enumerate(range(2, 6)):
                if (roi_level == l).sum() == 0:
                    continue
                idx_l = (roi_level == l).nonzero().squeeze()
                box_to_levels.append(idx_l)
                scale = feat_maps[i].size(2) / im_info[0][0]
                feat = self.RCNN_roi_align(feat_maps[i], rois[idx_l], scale)
                roi_pool_feats.append(feat)
            roi_pool_feat = torch.cat(roi_pool_feats, 0)
            box_to_level = torch.cat(box_to_levels, 0)
            # restore the original RoI order after grouping by level
            idx_sorted, order = torch.sort(box_to_level)
            roi_pool_feat = roi_pool_feat[order]

        elif cfg.POOLING_MODE == 'pool':
            roi_pool_feats = []
            box_to_levels = []
            for i, l in enumerate(range(2, 6)):
                if (roi_level == l).sum() == 0:
                    continue
                idx_l = (roi_level == l).nonzero().squeeze()
                box_to_levels.append(idx_l)
                scale = feat_maps[i].size(2) / im_info[0][0]
                feat = self.RCNN_roi_pool(feat_maps[i], rois[idx_l], scale)
                roi_pool_feats.append(feat)
            roi_pool_feat = torch.cat(roi_pool_feats, 0)
            box_to_level = torch.cat(box_to_levels, 0)
            idx_sorted, order = torch.sort(box_to_level)
            roi_pool_feat = roi_pool_feat[order]

        return roi_pool_feat

    def forward(self, im_data, im_info, gt_boxes, num_boxes):
        batch_size = im_data.size(0)

        im_info = im_info.data
        gt_boxes = gt_boxes.data
        num_boxes = num_boxes.data

        # feed image data to base model to obtain base feature map
        # Bottom-up
        c1 = self.RCNN_layer0(im_data)
        c2 = self.RCNN_layer1(c1)
        c3 = self.RCNN_layer2(c2)
        c4 = self.RCNN_layer3(c3)
        c5 = self.RCNN_layer4(c4)
        # Top-down pathway plus lateral connections
        p5 = self.RCNN_toplayer(c5)
        p4 = self._upsample_add(p5, self.RCNN_latlayer1(c4))
        p4 = self.RCNN_smooth1(p4)
        p3 = self._upsample_add(p4, self.RCNN_latlayer2(c3))
        p3 = self.RCNN_smooth2(p3)
        p2 = self._upsample_add(p3, self.RCNN_latlayer3(c2))
        p2 = self.RCNN_smooth3(p2)

        p6 = self.maxpool2d(p5)

        rpn_feature_maps = [p2, p3, p4, p5, p6]
        mrcnn_feature_maps = [p2, p3, p4, p5]

        # run the RPN (region proposal network) to obtain proposals
        rois, rpn_loss_cls, rpn_loss_bbox = self.RCNN_rpn(rpn_feature_maps, im_info, gt_boxes, num_boxes)

        # if it is training phase, then use ground-truth bboxes for refining
        if self.training:
            roi_data = self.RCNN_proposal_target(rois, gt_boxes, num_boxes)
            rois, rois_label, gt_assign, rois_target, rois_inside_ws, rois_outside_ws = roi_data

            ## NOTE: additionally, normalize proposals to range [0, 1],
            #        this is necessary so that the following roi pooling
            #        is correct on different feature maps
            # rois[:, :, 1::2] /= im_info[0][1]
            # rois[:, :, 2::2] /= im_info[0][0]

            rois = rois.view(-1, 5)
            rois_label = rois_label.view(-1).long()
            gt_assign = gt_assign.view(-1).long()
            pos_id = rois_label.nonzero().squeeze()
            gt_assign_pos = gt_assign[pos_id]
            rois_label_pos = rois_label[pos_id]
            rois_label_pos_ids = pos_id

            rois_pos = Variable(rois[pos_id])
            rois = Variable(rois)
            rois_label = Variable(rois_label)

            rois_target = Variable(rois_target.view(-1, rois_target.size(2)))
            rois_inside_ws = Variable(rois_inside_ws.view(-1, rois_inside_ws.size(2)))
            rois_outside_ws = Variable(rois_outside_ws.view(-1, rois_outside_ws.size(2)))
        else:
            ## NOTE: additionally, normalize proposals to range [0, 1],
            #        this is necessary so that the following roi pooling
            #        is correct on different feature maps
            # rois[:, :, 1::2] /= im_info[0][1]
            # rois[:, :, 2::2] /= im_info[0][0]

            rois_label = None
            gt_assign = None
            rois_target = None
            rois_inside_ws = None
            rois_outside_ws = None
            rpn_loss_cls = 0
            rpn_loss_bbox = 0
            rois = rois.view(-1, 5)
            pos_id = torch.arange(0, rois.size(0)).long().type_as(rois).long()
            rois_label_pos_ids = pos_id
            rois_pos = Variable(rois[pos_id])
            rois = Variable(rois)

        # pooling features based on rois, output 14x14 map
        # RoI-pool the features of each pyramid level separately
        roi_pool_feat = self._PyramidRoI_Feat(mrcnn_feature_maps, rois, im_info)

        # feed pooled features to top model
        # map the pooled features to a 4096-dimensional vector
        pooled_feat = self._head_to_tail(roi_pool_feat)

        # then compute classification and box regression
        # compute bbox offset
        bbox_pred = self.RCNN_bbox_pred(pooled_feat)
        if self.training and not self.class_agnostic:
            # select the corresponding columns according to roi labels
            bbox_pred_view = bbox_pred.view(bbox_pred.size(0), int(bbox_pred.size(1) / 4), 4)
            bbox_pred_select = torch.gather(bbox_pred_view, 1, rois_label.long().view(rois_label.size(0), 1, 1).expand(rois_label.size(0), 1, 4))
            bbox_pred = bbox_pred_select.squeeze(1)

        # compute object classification probability
        cls_score = self.RCNN_cls_score(pooled_feat)
        cls_prob = F.softmax(cls_score)

        RCNN_loss_cls = 0
        RCNN_loss_bbox = 0

        if self.training:
            # loss (cross entropy) for object classification
            RCNN_loss_cls = F.cross_entropy(cls_score, rois_label)
            # loss (l1-norm) for bounding box regression
            RCNN_loss_bbox = _smooth_l1_loss(bbox_pred, rois_target, rois_inside_ws, rois_outside_ws)

        rois = rois.view(batch_size, -1, rois.size(1))
        cls_prob = cls_prob.view(batch_size, -1, cls_prob.size(1))
        bbox_pred = bbox_pred.view(batch_size, -1, bbox_pred.size(1))

        if self.training:
            rois_label = rois_label.view(batch_size, -1)

        return rois, cls_prob, bbox_pred, rpn_loss_cls, rpn_loss_bbox, RCNN_loss_cls, RCNN_loss_bbox, rois_label
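The `_PyramidRoI_Feat` method in the listing decides which pyramid level each RoI is pooled from. The paper's rule is k = ⌊k0 + log2(√(wh) / 224)⌋ with k0 = 4, whereas the listing uses a natural logarithm followed by rounding. The sketch below puts the two side by side in plain Python; the function names are hypothetical, and clamping to levels 2 through 5 follows the listing.

```python
import math


def roi_level_paper(w, h, k0=4, k_min=2, k_max=5):
    """Level assignment from the FPN paper: floor(k0 + log2(sqrt(w*h) / 224))."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / 224.0))
    return max(k_min, min(k_max, k))


def roi_level_listing(w, h, k_min=2, k_max=5):
    """Level assignment as written in the listing: round(ln(sqrt(w*h) / 224) + 4)."""
    k = round(math.log(math.sqrt(w * h) / 224.0) + 4)
    return max(k_min, min(k_max, k))


for side in (32, 56, 112, 224, 448):
    print(side, roi_level_paper(side, side), roi_level_listing(side, side))
```

The two rules agree for RoIs near the canonical 224×224 size, but the natural-log variant sends smaller boxes to coarser levels than the paper's rule does: a 56×56 RoI lands on level 2 under the paper's formula and on level 3 under the listing's.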

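4、Summary

This paper was published quite a while ago, and many object detection systems now build on the ideas of FPN. It is a bit like putting an elephant into a refrigerator: the task can be split into two steps, but if you understand the fundamentals behind each step and approach the problem from basic network principles, it helps a great deal in improving model performance.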

Code source: https://github.com/jwyang/fpn.pytorch

Reference blog: https://www.cnblogs.com/hansjorn/p/12510888.html

Paper: https://arxiv.org/abs/1612.03144
