CVPR-2018, PyTorch code


  • 1 Background and Motivation
  • 2 Advantages / Contributions
  • 3 Method
    • 3.1 Bottom-up Path Augmentation
    • 3.2 Adaptive Feature Pooling
    • 3.3 Fully-connected Fusion
  • 4 Experiments
    • 4.1 Datasets
    • 4.2 Experiments on COCO
    • 4.3 Experiments on Cityscapes
    • 4.4 Experiments on MVD

1 Background and Motivation

The authors observe that information propagation in the state-of-the-art Mask R-CNN can be further improved.

Building on Mask R-CNN, PANet further improves both object detection and instance segmentation.

2 Advantages / Contributions

Proposes the Path Aggregation Network (PANet), aiming at boosting information flow in the proposal-based instance segmentation framework.

  • 1st place in the COCO 2017 Challenge Instance Segmentation task
  • 2nd place in the COCO 2017 Challenge Object Detection task
  • SOTA on MVD and Cityscapes

3 Method


3.1 Bottom-up Path Augmentation

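Weakness of the existing FPN structure:

There is a long path from low-level structures to the topmost features, which makes accurate localization information harder to access. (Red dashed arrow in Figure 1(a): during forward propagation, low-level information must pass through the entire backbone before it reaches the top levels, e.g., P5.)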


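A bottom-up path is augmented to make low-layer information easier to propagate (green dashed arrow in Figure 1(a)).

PANet adds a bottom-up path augmentation on top of FPN to shorten the information path: it exploits the accurate localization signals stored in low-level features to enhance the feature pyramid. (From 目标检测算法综述之FPN优化篇, a survey post on FPN optimizations in object detection.)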


The bottom-up path is built as a mirrored version of FPN's top-down pathway, as shown in Figure 2.

Note that $N_2$ is simply $P_2$, without any processing.

Keras code below, from the post 双向融合:PANet (Bidirectional Fusion: PANet):

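import keras.layers as KL
N2 = P2  # N2 is simply P2, without any processing
# a stride-2 3x3 conv downsamples N2, then the lateral P3 is added element-wise
N3 = KL.Add(name="panet_p3add")([P3, KL.Conv2D(256, (3, 3), strides=2, padding="SAME", name="panet_n2downsampled")(N2)])
N3 = KL.Conv2D(256, (3, 3), padding="SAME", name="panet_n3")(N3)  # 3x3 conv fuses the sum
N3 = KL.Activation('relu')(N3)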
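The snippet above builds only N3; N4 and N5 repeat the same block (stride-2 3x3 conv on $N_i$, element-wise add with $P_{i+1}$, 3x3 conv). The add only works if the downsampled $N_i$ has exactly the spatial size of $P_{i+1}$. Below is a shape-only Python sketch of that bookkeeping, assuming the usual FPN setup where $P_i$ has stride $2^i$ with respect to the input image and every conv uses "same" padding; it tracks sizes, not feature values.

```python
import math
from typing import Dict, Sequence, Tuple


def same_conv_out(size: int, stride: int) -> int:
    # Spatial output size of a conv with padding="same": ceil(size / stride)
    return math.ceil(size / stride)


def bottom_up_sizes(image_size: int,
                    levels: Sequence[int] = (2, 3, 4, 5)) -> Tuple[Dict[int, int], Dict[int, int]]:
    # Assumption: P_i has stride 2**i with respect to the input image (P2..P5)
    p_sizes = {i: math.ceil(image_size / 2 ** i) for i in levels}
    n_sizes = {levels[0]: p_sizes[levels[0]]}  # N2 is simply P2
    for lower, upper in zip(levels, levels[1:]):
        down = same_conv_out(n_sizes[lower], stride=2)  # 3x3 conv, stride 2, on N_i
        if down != p_sizes[upper]:
            raise ValueError(f"cannot add downsampled N{lower} to P{upper}")
        n_sizes[upper] = down  # the add and the stride-1 3x3 conv keep the size
    return p_sizes, n_sizes


p, n = bottom_up_sizes(1024)
print(n)  # {2: 256, 3: 128, 4: 64, 5: 32}
```

Because ceil-halving twice equals ceil-quartering, the sizes also line up for inputs that are not multiples of 32 (e.g. 1000 gives 250, 125, 63, 32 on both paths).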

3.2 Adaptive Feature Pooling


As anyone familiar with FPN knows, proposals are assigned to different feature levels according to their size (ROIs of different scales use different feature levels as the input to the ROI pooling layer). The structure looks like an "octopus": many "legs" and one head.

Two proposals with a 10-pixel difference can be assigned to different levels; for the exact mapping, see Mask RCNN without Mask.
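For reference, the FPN paper assigns an RoI of width $w$ and height $h$ (on the input image) to level $P_k$ with $k = \lfloor k_0 + \log_2(\sqrt{wh}/224) \rfloor$, where $k_0 = 4$. The Python sketch below applies that rule and clamps the result to P2..P5; the clamping range is an assumption matching the four levels used here. It shows how a 10-pixel size difference flips the level.

```python
import math


def fpn_level(w: float, h: float, k0: int = 4, canonical: float = 224.0,
              k_min: int = 2, k_max: int = 5) -> int:
    """Index k of the level P_k that FPN's size-based rule pools an RoI from."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the available levels P2..P5


print(fpn_level(108, 108), fpn_level(118, 118))  # 2 3
print(fpn_level(224, 224))  # 4
```

A 108x108 box lands just below the level-3 boundary (about 2.95) and a 118x118 box just above it (about 3.08), so their features come from different levels even though the boxes are nearly the same size.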

The information discarded at the other levels may be helpful for the final prediction.


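We use a max operation to fuse features from different levels.

This aggregates each proposal's features from every feature level. (From 目标检测算法综述之FPN优化篇)

In other words, the information from all levels is fused for the same proposal, instead of using the size of the proposal to decide which single FPN level it takes features from.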

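The figure below gives an intuitive sense of why features from multiple levels are needed.

Each line is the set of proposals that the original FPN would assign to one level (i.e., proposals of similar scale); the x-axis is the level the pooled features come from after Adaptive Feature Pooling.

Take the blue level-1 line as an example: with Adaptive Feature Pooling, proposals in the level-1 size range take only ~30% of their features from level 1; the rest come from level 2 (~30%), level 3 (~20%) and level 4 (~20%). In the original FPN, these proposals use 100% level-1 features.

So Adaptive Feature Pooling makes the features of each proposal more complete and richer.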
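A toy, framework-free Python sketch of the max fusion: given one pooled grid per level for the same proposal (the values below are made up), it takes the element-wise max and also counts, per level, the fraction of output elements that came from that level. Counting the source of each max is one plausible way to obtain percentages like those in the figure; that is an assumption, not code from the paper.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def adaptive_max_fuse(per_level: Sequence[Grid]) -> Tuple[Grid, List[float]]:
    """Element-wise max over the RoI features pooled from every level.

    Returns the fused grid and, for each level, the fraction of output
    elements whose value came from that level (ties go to the lowest level).
    """
    rows, cols = len(per_level[0]), len(per_level[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    wins = [0] * len(per_level)
    for r in range(rows):
        for c in range(cols):
            values = [grid[r][c] for grid in per_level]
            best = max(range(len(values)), key=values.__getitem__)
            fused[r][c] = values[best]
            wins[best] += 1
    total = rows * cols
    return fused, [w / total for w in wins]


# Hypothetical 2x2 pooled features of one proposal on N2..N5
n2 = [[1, 7], [0, 2]]
n3 = [[3, 1], [0, 1]]
n4 = [[2, 2], [4, 0]]
n5 = [[0, 6], [1, 1]]
fused, ratio = adaptive_max_fuse([n2, n3, n4, n5])
print(fused)  # [[3, 7], [4, 2]]
print(ratio)  # [0.5, 0.25, 0.25, 0.0]
```

Every level can contribute, which is the point of the method; a size-based FPN assignment would instead copy one grid unchanged.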

Keras code below, from the post 双向融合:PANet (Bidirectional Fusion: PANet):

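import tensorflow as tf
import keras.layers as KL
import keras.engine as KE  # TF 1.x / Keras 2.x style, matching the rest of the snippet


class AdaptiveFeaturePooling(KE.Layer):
    """Fuse the ROI features pooled from every level (N2..N5) with an element-wise max."""

    def __init__(self, **kwargs):
        super(AdaptiveFeaturePooling, self).__init__(**kwargs)

    def call(self, inputs):
        x2, x3, x4, x5 = inputs
        x = tf.maximum(tf.maximum(x2, x3), tf.maximum(x4, x5))
        # x = tf.add_n([x2, x3, x4, x5])  # sum fusion, the alternative left commented out
        return x

    def compute_output_shape(self, input_shape):
        # all four inputs have the same shape, so the output keeps it
        return input_shape[0]


# Level N2: ROIAlign each proposal on feature_maps[0], then one parameter layer
# (a pool_size x pool_size "valid" conv, i.e. an FC layer) + BN + ReLU before fusion.
# ROIAlign, BatchNorm, rois, feature_maps, pool_size and train_bn come from the
# surrounding Mask R-CNN code; x3, x4, x5 are built the same way from feature_maps[1..3].
x2 = ROIAlign([pool_size, pool_size], name="bbox_roi_align_n2")([rois, feature_maps[0]])
x2 = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1_n2")(x2)
x2 = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1_n2')(x2, training=train_bn)
x2 = KL.Activation('relu')(x2)
x = AdaptiveFeaturePooling(name="bbox_adaptive_feature_pooling")([x2, x3, x4, x5])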

3.3 Fully-connected Fusion


In Mask R-CNN, mask prediction is made on a single view (a fully convolutional branch), losing the chance to gather more diverse information.


A complementary branch captures a different view: a parallel FC branch is added and finally fused with the conv branch to predict the mask.

The authors argue that FC layers have these advantages:

  • FC layers are location sensitive since predictions at different spatial locations are achieved by varying sets of parameters. So they have the ability to adapt to different spatial locations.

  • Also, the prediction at each spatial location is made with global information of the entire proposal.
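To make the two points concrete, here is a toy 1-D Python comparison with made-up weights: a convolution reuses one kernel at every position, so shifting the input only shifts the output, while a fully-connected layer has a separate weight row for each output position, and each row can see the whole input. It illustrates weight sharing versus location-specific weights; it is not PANet code.

```python
from typing import List, Sequence


def conv1d_same(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    # One kernel shared by every position; zero padding keeps the length
    k = len(kernel) // 2
    padded = [0.0] * k + list(x) + [0.0] * k
    return [sum(w * padded[i + j] for j, w in enumerate(kernel)) for i in range(len(x))]


def fully_connected(x: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    # Output position i has its own weight row, and that row covers the whole input
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


x = [0, 1, 0, 0]
shifted = [0, 0, 1, 0]
kernel = [1, 2, 1]
W = [[1, 1, 1, 1],   # position 0 looks at the entire input (global information)
     [0, 2, 0, 0],
     [0, 0, 3, 0],   # positions 1 and 2 use different weights (location sensitive)
     [0, 0, 0, 4]]

print(conv1d_same(x, kernel), conv1d_same(shifted, kernel))  # response just shifts
print(fully_connected(x, W), fully_connected(shifted, W))    # response pattern changes
```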

Keras code below, from the post 双向融合:PANet (Bidirectional Fusion: PANet):

Conv branch

# Conv (FCN) branch: four 3x3 convs, a 2x deconv and a per-class 1x1 conv
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn1')(x, training=train_bn)
x = KL.Activation('relu')(x)
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn2')(x, training=train_bn)
x = KL.Activation('relu')(x)
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn3')(x, training=train_bn)
x = KL.Activation('relu')(x)
# x (output of conv3) is also the input of the FC branch
x_fcn = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x)
x_fcn = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn4')(x_fcn, training=train_bn)
x_fcn = KL.Activation('relu')(x_fcn)
x_fcn = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"), name="mrcnn_mask_deconv")(x_fcn)
# per-class mask logits; the sigmoid is applied after fusion with the FC branch
x_fcn = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1), name="mrcnn_mask")(x_fcn)
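Assuming 14x14 RoI features enter the mask branch (the usual Mask R-CNN setting; the snippet itself does not fix this value), the spatial size through the conv branch can be traced with the standard output-size formulas. A small Python sketch:

```python
from typing import Tuple


def conv_same(size: int) -> int:
    # 3x3 conv, stride 1, padding="same": size unchanged
    return size


def deconv_valid(size: int, kernel: int = 2, stride: int = 2) -> int:
    # Conv2DTranspose with padding="valid" (kernel >= stride): (size - 1) * stride + kernel
    return (size - 1) * stride + kernel


def fcn_mask_shape(roi_size: int, num_classes: int) -> Tuple[int, int, int]:
    size = roi_size
    for _ in range(4):              # mrcnn_mask_conv1..conv4
        size = conv_same(size)
    size = deconv_valid(size)       # mrcnn_mask_deconv doubles the size
    return (size, size, num_classes)  # mrcnn_mask: 1x1 conv, one channel per class


print(fcn_mask_shape(14, 81))  # (28, 28, 81), e.g. 80 COCO classes + background
```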

FC branch

# FC branch: conv4_fc, conv5_fc (channels halved to 128), then one FC layer
x_fc = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4_fc")(x)
x_fc = KL.Activation('relu')(x_fc)
x_fc = KL.TimeDistributed(KL.Conv2D(128, (3, 3), padding="same"), name="mrcnn_mask_conv5_fc")(x_fc)
x_fc = KL.Activation('relu')(x_fc)  # b, num_rois, h, w, c
t_shape = x_fc.shape  # static shape (TF 1.x style: each dim exposes .value)
x_fc = KL.Reshape([t_shape[1].value, t_shape[2].value * t_shape[3].value * t_shape[4].value])(x_fc)  # b, num_rois, h*w*c
x_fc = KL.TimeDistributed(KL.Dense(mask_shape[0] * mask_shape[1]), name="mrcnn_mask_fc")(x_fc)  # b, num_rois, mask_size * mask_size
x_fc = KL.Reshape([t_shape[1].value, mask_shape[0], mask_shape[1], 1])(x_fc)  # b, num_rois, mask_size, mask_size, 1
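Under the same assumption of 14x14 RoI features and a 28x28 mask, the FC branch flattens 14 x 14 x 128 = 25,088 values per RoI and maps them to 28 x 28 = 784 mask logits. A quick Python calculation of the Dense layer's size shows what halving conv5_fc to 128 channels buys:

```python
from typing import Tuple


def fc_branch_params(roi_size: int, channels: int, mask_size: int) -> Tuple[int, int, int]:
    flat = roi_size * roi_size * channels  # Reshape: h * w * c
    out = mask_size * mask_size            # Dense output: one logit per mask pixel
    return flat, out, flat * out + out     # weights + biases


print(fc_branch_params(14, 128, 28))  # (25088, 784, 19669776)
print(fc_branch_params(14, 256, 28))  # (50176, 784, 39338768)
```

Keeping 256 channels would roughly double this single layer to about 39M parameters.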

Fusing the conv branch and the FC branch

x = KL.Add()([x_fc, x_fcn]) # (b, num_rois, mask_size, mask_size, 1) + (b, num_rois, mask_size, mask_size, num_class)
x = KL.TimeDistributed(KL.Activation('sigmoid'))(x)
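The `KL.Add` above relies on broadcasting: the single FC channel is added to every one of the `num_classes` FCN channels, so each class mask receives the same class-agnostic FC logits before the sigmoid. A framework-free Python sketch of that step on nested lists (shapes `[H][W][num_classes]` and `[H][W]`, toy values):

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fuse_mask_logits(fcn: List[List[List[float]]], fc: List[List[float]]) -> List[List[List[float]]]:
    # fcn[r][c][k]: per-class logits from the conv branch
    # fc[r][c]: one class-agnostic logit from the FC branch, added to every class k
    return [[[sigmoid(logit + fc[r][c]) for logit in fcn[r][c]]
             for c in range(len(fcn[r]))]
            for r in range(len(fcn))]


probs = fuse_mask_logits([[[1.0, -1.0]]], [[-1.0]])  # one pixel, two classes
print(probs)  # first class 0.5 (logit 0), second class about 0.119 (logit -2)
```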

4 Experiments

4.1 Datasets

  • COCO
  • Cityscapes
  • MVD

4.2 Experiments on COCO

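1) Instance Segmentation Results

2) Object Detection Results

3) Component Ablation Studies

$AP$ is the segmentation result, $AP^{bb}$ is the detection result when object detection is trained alone, and $AP^{bbM}$ is the detection result when detection and segmentation are trained jointly.

The tricks account for 50% of the improvement:

Half of the improvement is from multi-scale training and multi-GPU synchronized BN.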

4) Ablation Studies on Adaptive Feature Pooling

5) Ablation Studies on Fully-connected Fusion

6) COCO 2017 Challenge

More tricks are introduced for the challenge entry.

1st place; DCN stands for Deformable Convolutional Networks.


4.3 Experiments on Cityscapes

4.4 Experiments on MVD
