Path Aggregation Network for Instance Segmentation (PANet), CVPR-2018, PyTorch code


Contents

  • 1 Background and Motivation
  • 2 Advantages / Contributions
  • 3 Method
    • 3.1 Bottom-up Path Augmentation
    • 3.2 Adaptive Feature Pooling
    • 3.3 Fully-connected Fusion
  • 4 Experiments
    • 4.1 Datasets
    • 4.2 Experiments on COCO
    • 4.3 Experiments on Cityscapes
    • 4.4 Experiments on MVD

1 Background and Motivation

The authors observe that information propagation in the state-of-the-art Mask R-CNN can be further improved.

Building on Mask R-CNN, PANet further improves both object detection and instance segmentation.

2 Advantages / Contributions

Proposes the Path Aggregation Network (PANet), aiming to boost information flow in the proposal-based instance segmentation framework:

  • 1st place in the COCO 2017 Challenge Instance Segmentation task
  • 2nd place in the COCO 2017 Challenge Object Detection task
  • SOTA on MVD and Cityscapes

3 Method


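PANet adds three modules on top of Mask R-CNN: bottom-up path augmentation, adaptive feature pooling, and fully-connected fusion.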

3.1 Bottom-up Path Augmentation

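Limitation of the existing FPN structure:

There is a long path from low-level structures to the topmost features, which makes it harder to access accurate localization information. (The red dashed arrow in Figure 1(a): in the forward pass, low-level information has to travel through the entire backbone before it reaches the top levels, e.g. P5.)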

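The authors' fix:

A bottom-up path is added so that low-level information propagates more easily. (The green dashed arrow in Figure 1(a).)

On top of FPN, PANet builds a bottom-up augmentation path that shortens the information path and uses the precise localization signals stored in low-level features to strengthen the feature pyramid. (From "A Survey of Object Detection Algorithms: FPN Optimizations".)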

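Details:

The bottom-up path is built as shown in Figure 2, mirroring FPN's top-down pathway in the opposite direction. Each building block takes a higher-resolution map $N_i$ and a coarser map $P_{i+1}$: $N_i$ passes through a 3×3 convolution with stride 2, the result is added element-wise to $P_{i+1}$ through a lateral connection, and the sum passes through another 3×3 convolution to give $N_{i+1}$.

Note that $N_2$ is simply $P_2$, without any processing.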

Keras code below, from the post 双向融合:PANet ("Bidirectional Fusion: PANet"):

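import keras.layers as KL
N2 = P2  # N2 is simply P2; then N3 = ReLU(Conv3x3(P3 + Conv3x3_stride2(N2)))
N3 = KL.Add(name="panet_p3add")([P3, KL.Conv2D(256, (3, 3), strides=2, padding="SAME", name="panet_n2downsampled")(N2)])
N3 = KL.Conv2D(256, (3, 3), padding="SAME", name="panet_n3")(N3)
N3 = KL.Activation('relu')(N3)
# N4 and N5 are built the same way from (N3, P4) and (N4, P5)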
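To see how low-level information flows up the augmented path, here is a minimal pure-Python sketch of the recurrence $N_{i+1} = \mathrm{ReLU}(P_{i+1} + \mathrm{down}(N_i))$ on tiny 2-D lists. Assumptions for readability: the stride-2 3×3 conv is replaced by plain stride-2 subsampling and the second 3×3 conv is dropped, so there are no learned weights; all helper names are hypothetical.

```python
def downsample2(fmap):
    """Stand-in for the 3x3 stride-2 conv: keep every other row and column."""
    return [row[::2] for row in fmap[::2]]


def add(a, b):
    """Element-wise sum of two equally sized 2-D maps (the lateral connection)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def bottom_up_path(P):
    """P maps level -> 2-D list, with P[i + 1] half the size of P[i].

    Returns N with N[2] = P[2] and N[i + 1] = relu(P[i + 1] + downsample2(N[i])).
    The real block also applies a learned 3x3 conv before the ReLU (omitted here).
    """
    N = {2: P[2]}  # N2 is simply P2, no processing
    for i in range(2, 5):
        N[i + 1] = relu(add(P[i + 1], downsample2(N[i])))
    return N


def ones(n):
    return [[1.0] * n for _ in range(n)]


P = {2: ones(8), 3: ones(4), 4: ones(2), 5: ones(1)}
N = bottom_up_path(P)
print(N[5])  # [[4.0]]: P5 plus contributions carried up from P2, P3 and P4
```

With all-ones inputs each level adds 1, so the value at $N_5$ counts how many pyramid levels reach it through the short bottom-up path.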

3.2 Adaptive Feature Pooling

Limitation:

Anyone familiar with FPN knows that proposals are assigned to different feature levels according to their size (RoIs of different scales use different feature levels as the input to RoI pooling). The structure resembles an octopus: many "legs" but only one head.

Two proposals whose sizes differ by only 10 pixels can be assigned to different levels; for the exact mapping, see the post "Mask RCNN without Mask".

Information discarded in the other levels may be helpful for the final prediction.
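For reference, the FPN paper maps a $w \times h$ RoI to level $k = \lfloor k_0 + \log_2(\sqrt{wh}/224) \rfloor$ with $k_0 = 4$. The Python sketch below implements that rule; clamping the result to P2 through P5 is an assumption that mirrors common Mask R-CNN implementations, and the function name `fpn_level` is hypothetical. It shows a 105-pixel and a 115-pixel square RoI landing on different levels.

```python
import math


def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """FPN RoI-to-level rule: k = floor(k0 + log2(sqrt(w * h) / canonical)), clamped."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


for side in (105, 115, 224, 500):
    print(side, fpn_level(side, side))
# 105 2
# 115 3
# 224 4
# 500 5
```

The boundary between P2 and P3 sits at $\sqrt{wh} = 112$, so any two proposals straddling it are pooled from different levels no matter how similar they are.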

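The authors' fix (attach a head to every leg):

The features pooled from the different levels are fused with an element-wise max operation.

Each proposal aggregates features from every feature level. (From "A Survey of Object Detection Algorithms: FPN Optimizations".)

The features of the same proposal from all levels are fused, instead of picking a single FPN level according to the proposal's size.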
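A minimal sketch of the fusion step for a single proposal, assuming each level's RoIAlign output has already been flattened to a vector of the same length. Both element-wise max and element-wise sum are shown, since the Keras code in this section uses `tf.maximum` and leaves `tf.add_n` as a commented-out alternative; the function name `fuse_levels` is hypothetical.

```python
def fuse_levels(per_level, op="max"):
    """Fuse one proposal's pooled features from several FPN levels.

    per_level: list of equal-length feature vectors, one per level
               (each already RoIAligned to the same grid and flattened).
    op: "max" for element-wise max, "sum" for element-wise sum.
    """
    if op == "max":
        return [max(vals) for vals in zip(*per_level)]
    if op == "sum":
        return [sum(vals) for vals in zip(*per_level)]
    raise ValueError(f"unknown fusion op: {op}")


feats = [[1.0, 5.0, 2.0],   # pooled from level 2
         [3.0, 0.0, 4.0],   # pooled from level 3
         [2.0, 1.0, 0.5]]   # pooled from level 4
print(fuse_levels(feats))         # [3.0, 5.0, 4.0]
print(fuse_levels(feats, "sum"))  # [6.0, 6.0, 6.5]
```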

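The paper's figure on the ratio of features pooled from each level makes the need for multi-level features very intuitive.

The x-axis is the feature level the pooled features come from; each line is the set of proposals that the original FPN would assign to one level.

Take the blue level-1 line: with adaptive feature pooling, proposals whose size falls in the level-1 range take only about 30% of their features from level 1, with the rest coming about 30% from level 2, 20% from level 3 and 20% from level 4 (in the original FPN, such proposals use 100% level-1 features).

So adaptive feature pooling makes each proposal's features more complete and richer.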
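Because max fusion takes every output element from exactly one level, one can count which level "wins" each element. The sketch below is one plausible way to compute per-level shares like those in the figure; it is an assumption, since these notes do not spell out the paper's exact counting procedure, and `level_ratios` is a hypothetical name.

```python
def level_ratios(per_level):
    """Fraction of fused elements that each level supplies under element-wise max.

    per_level: non-empty list of equal-length feature vectors, one per level.
    Ties go to the lowest level index. Returns one fraction per level.
    """
    counts = [0] * len(per_level)
    n = 0
    for vals in zip(*per_level):
        winner = max(range(len(vals)), key=lambda i: vals[i])  # first max wins ties
        counts[winner] += 1
        n += 1
    return [c / n for c in counts]


feats = [[9.0, 0.0, 0.0, 0.0],   # level 1
         [0.0, 9.0, 0.0, 0.0],   # level 2
         [0.0, 0.0, 9.0, 9.0]]   # level 3
print(level_ratios(feats))  # [0.25, 0.25, 0.5]
```

Averaging these ratios over all proposals that FPN would send to the same level gives one line of such a plot.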

Keras code below, from the post 双向融合:PANet ("Bidirectional Fusion: PANet"):

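import tensorflow as tf
import keras.layers as KL
import keras.engine as KE


class AdaptiveFeaturePooling(KE.Layer):
    """Fuse one RoI's features pooled from N2-N5 with an element-wise max."""

    def __init__(self, **kwargs):
        super(AdaptiveFeaturePooling, self).__init__(**kwargs)

    def call(self, inputs):
        x2, x3, x4, x5 = inputs
        x = tf.maximum(tf.maximum(x2, x3), tf.maximum(x4, x5))
        # x = tf.add_n([x2, x3, x4, x5])  # element-wise sum is the alternative fusion op
        return x

    def compute_output_shape(self, input_shape):
        return input_shape[0]  # all per-level inputs share the same shape


# Per-level branch, shown for N2: RoIAlign, then an fc-equivalent conv (kernel = pool_size)
x2 = ROIAlign([pool_size, pool_size], name="bbox_roi_align_n2")([rois, feature_maps[0]])
x2 = KL.TimeDistributed(KL.Conv2D(1024, (pool_size, pool_size), padding="valid"), name="mrcnn_class_conv1_n2")(x2)
x2 = KL.TimeDistributed(BatchNorm(), name='mrcnn_class_bn1_n2')(x2, training=train_bn)
x2 = KL.Activation('relu')(x2)
...
# Fuse the four per-level features into one
x = AdaptiveFeaturePooling(name="bbox_adaptive_feature_pooling")([x2, x3, x4, x5])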

3.3 Fully-connected Fusion

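Limitation:

In Mask R-CNN, mask prediction is made from a single view (a convolutional branch), losing the chance to gather more diverse information.

The authors' fix:

A complementary branch captures a different view: a parallel FC branch is added, and its output is fused with the conv branch to predict the mask.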

The authors argue that FC layers have two advantages:

  • FC layers are location sensitive since predictions at different spatial locations are achieved by varying sets of parameters. So they have the ability to adapt to different spatial locations.

  • Also, the prediction at each spatial location is made with global information from the entire proposal.
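To make the "location sensitive" and "global information" points concrete, compare parameter counts. Assumptions for this sketch: the mask RoI features are 14×14 (the usual Mask R-CNN mask pooling size), `conv5_fc` outputs 128 channels as in the FC-branch code of this section, and the predicted mask is 28×28; the helper names are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights + biases of a k x k convolution; one kernel is shared by every position."""
    return k * k * c_in * c_out + c_out


def fc_params(h, w, c_in, out_h, out_w):
    """Weights + biases of an FC layer mapping an h*w*c_in RoI feature to an out_h*out_w mask."""
    return h * w * c_in * out_h * out_w + out_h * out_w


print(conv_params(3, 256, 256))        # 590080
print(fc_params(14, 14, 128, 28, 28))  # 19669776
print(14 * 14 * 128)                   # 25088 weights feeding each single mask pixel
```

Each of the 784 output pixels gets its own 25,088 weights that look at the whole proposal, whereas the 3×3 conv reuses one small kernel at every position. The sheer size of the FC layer is presumably why `conv5_fc` halves the channels before it.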

Keras code below, from the post 双向融合:PANet ("Bidirectional Fusion: PANet").

Conv branch:

# BatchNorm is the Matterport Mask R-CNN wrapper around KL.BatchNormalization
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv1")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn1')(x, training=train_bn)
x = KL.Activation('relu')(x)
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv2")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn2')(x, training=train_bn)
x = KL.Activation('relu')(x)
x = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv3")(x)
x = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn3')(x, training=train_bn)
x = KL.Activation('relu')(x)
# The conv3 output x is shared: conv4 continues the FCN path, conv4_fc starts the FC branch
x_fcn = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4")(x)
x_fcn = KL.TimeDistributed(BatchNorm(), name='mrcnn_mask_bn4')(x_fcn, training=train_bn)
x_fcn = KL.Activation('relu')(x_fcn)
x_fcn = KL.TimeDistributed(KL.Conv2DTranspose(256, (2, 2), strides=2, activation="relu"), name="mrcnn_mask_deconv")(x_fcn)
# Per-class mask logits; the sigmoid is applied only after fusion with the FC branch
x_fcn = KL.TimeDistributed(KL.Conv2D(num_classes, (1, 1), strides=1), name="mrcnn_mask")(x_fcn)

FC branch:

# Branches off the conv3 output x
x_fc = KL.TimeDistributed(KL.Conv2D(256, (3, 3), padding="same"), name="mrcnn_mask_conv4_fc")(x)
x_fc = KL.Activation('relu')(x_fc)
x_fc = KL.TimeDistributed(KL.Conv2D(128, (3, 3), padding="same"), name="mrcnn_mask_conv5_fc")(x_fc)  # channels halved to 128
x_fc = KL.Activation('relu')(x_fc)  # b, num_rois, h, w, c
t_shape = x_fc.shape
x_fc = KL.Reshape([t_shape[1].value, t_shape[2].value * t_shape[3].value * t_shape[4].value])(x_fc)  # b, num_rois, h*w*c
x_fc = KL.TimeDistributed(KL.Dense(mask_shape[0] * mask_shape[1]), name="mrcnn_mask_fc")(x_fc)  # b, num_rois, mask_size * mask_size
x_fc = KL.Reshape([t_shape[1].value, mask_shape[0], mask_shape[1], 1])(x_fc)  # b, num_rois, mask_size, mask_size, 1

Fusing the conv branch and the FC branch:

x = KL.Add()([x_fc, x_fcn]) # (b, num_rois, mask_size, mask_size, 1) + (b, num_rois, mask_size, mask_size, num_class)
x = KL.TimeDistributed(KL.Activation('sigmoid'))(x)
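The fusion relies on broadcasting: the FC branch produces one class-agnostic mask that is added to every class channel of the FCN output before the sigmoid. A pure-Python sketch with nested lists (the function name and toy values are hypothetical):

```python
import math


def fuse_mask(fcn_logits, fc_logits):
    """Add a class-agnostic FC mask to every class channel of the FCN mask, then sigmoid.

    fcn_logits: [H][W][num_classes]  (conv branch output)
    fc_logits:  [H][W][1]            (FC branch output, broadcast over classes)
    """
    return [[[1.0 / (1.0 + math.exp(-(c + fc_px[0]))) for c in fcn_px]
             for fcn_px, fc_px in zip(fcn_row, fc_row)]
            for fcn_row, fc_row in zip(fcn_logits, fc_logits)]


fcn = [[[0.0, 2.0]]]  # 1x1 mask, 2 classes
fc = [[[-2.0]]]       # one shared logit for that pixel
print(fuse_mask(fcn, fc))  # approximately [[[0.1192, 0.5]]]
```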

4 Experiments

4.1 Datasets

  • COCO
  • Cityscapes
  • MVD

4.2 Experiments on COCO

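1) Instance Segmentation Results

2) Object Detection Results

3) Component Ablation Studies

$AP$ is the segmentation result, $AP^{bb}$ is the detection result when object detection is trained alone, and $AP^{bbM}$ is the detection result when detection and segmentation are trained jointly.

About half of the improvement comes from tricks, namely multi-scale training and multi-GPU synchronized BN.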

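4) Ablation Studies on Adaptive Feature Pooling

5) Ablation Studies on Fully-connected Fusion

6) COCO 2017 Challenge

More tricks are introduced for the challenge entries.

1st place (instance segmentation): DCN stands for Deformable Convolutional Networks.

2nd place (object detection)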

4.3 Experiments on Cityscapes


4.4 Experiments on MVD
