[paper reading] DenseNet

GitHub: Notes of Classic Detection Papers

I originally wanted to put these notes on GitHub, but GitHub does not render the formulas.
So I had to post them on CSDN instead, where the formatting is also somewhat messy.
I strongly recommend downloading the source files from GitHub to read and study; that gives the best reading experience!
Of course, if you find the notes useful, a star would be appreciated.

| topic | motivation | technique | key element | use yourself | relativity |
| --- | --- | --- | --- | --- | --- |
| DenseNet | Problem to Solve<br>Modifications | DenseNet Architecture<br>Advantages | Dense Block<br>Transition Layers<br>Growth Rate<br>Bottleneck Structure | Bottleneck Structure<br>Feature Reuse<br>Transition Layers | blogs<br>articles |

Contents

  • [paper reading] DenseNet
    • Motivation
      • Problem to Solve
      • Modifications
        • Feature Concatenate
        • Skip Connection
    • Technique
      • DenseNet Architecture
      • Advantages
        • Parameter Efficiency & Model Compactness
        • Feature Reuse & Collective Knowledge
        • Implicit Deep Supervision
        • Diversified Depth
    • Key Element
      • Dense Block
      • Transition Layers
        • Components
        • Compression
      • Growth Rate
      • Bottleneck Structure
    • Math
    • Use Yourself
      • [Bottleneck Structure](#bottleneck-structure)
      • [Transition Layers](#transition-layers)
      • [Feature Reuse](#feature-reuse)
    • Articles
    • Blogs

Motivation

Problem to Solve

DenseNet builds directly on ResNet and improves it.

In ResNet, the identity function and the output of the weight layers are combined by summation, which can impede the flow of information.

My understanding: concatenation along the channel dimension better preserves the independence of the information carried by different paths (whereas the element-wise addition in ResNet mixes the features together).

Modifications

Feature Concatenate

  • How features are combined: element-wise summation ==> channel-wise concatenation

Skip Connection

  • DenseNet greatly expands the use of skip connections, which brings a series of advantages

    As UNet++ puts it, the best approach is to train a fully dense network and then prune it.

Technique

DenseNet Architecture

The forward propagation of DenseNet is given by:

Note: this formula is not restricted to a single dense block; the author argues it holds throughout the whole DenseNet.

$$\mathbf{x}_{\ell}=H_{\ell}\left(\left[\mathbf{x}_{0}, \mathbf{x}_{1}, \ldots, \mathbf{x}_{\ell-1}\right]\right)$$

  • $\left[\mathbf{x}_{0}, \mathbf{x}_{1}, \ldots, \mathbf{x}_{\ell-1}\right]$

    the concatenation of the feature maps of layers $0, \ldots, \ell-1$

  • $H_{\ell}$

    a composite function consisting of three consecutive operations:

    • Batch Normalization
    • ReLU
    • 3×3 Conv (a single one)

    $H_{\ell}(\cdot)$ also includes a dimension-reduction step, which improves computational efficiency and learns a compact feature representation; see [Bottleneck Structure](#bottleneck-structure)
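
Below is a minimal PyTorch sketch of this dense connectivity, using the plain $H_{\ell}$ (BN-ReLU-3×3 Conv) described above; the class and argument names are my own and are not taken from the official implementation:

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> 3x3 Conv, producing k (growth rate) new feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):  # x is [x_0, ..., x_{l-1}] already concatenated
        return self.conv(self.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """x_l = H_l([x_0, x_1, ..., x_{l-1}]): concatenation along the channel dimension."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            new_features = layer(torch.cat(features, dim=1))  # channel-wise concat
            features.append(new_features)
        return torch.cat(features, dim=1)
```

For example, `DenseBlock(num_layers=6, in_channels=64, growth_rate=32)` maps a 64-channel input to a 64 + 6 × 32 = 256-channel output.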

Advantages

Parameter Efficiency & Model Compactness

  • parameter efficiency ==> less overfitting: efficient use of parameters helps, to some extent, to avoid overfitting

    One positive side-effect of the more efficient use of parameters is a tendency of DenseNets to be less prone to overfitting.

  • feature reuse ==> model compactness

This is achieved in two ways:

  • bottleneck structure
  • compression of transition layers

The DenseNet-BC with bottleneck structure and dimension reduction at transition layers is particularly parameter-efficient.

Feature Reuse & Collective Knowledge

  • Collective Knowledge

    Every layer has access to the feature maps of all of its preceding layers. Together, the feature maps from these different layers form the collective knowledge.

    One explanation for this is that each layer has access to all the preceding feature-maps in its block and, therefore, to the network's "collective knowledge".

  • Feature Reuse

    An $L$-layer DenseNet has $\frac{L(L+1)}{2}$ connections, and these connections realize feature reuse:

    • layers within the same block directly use the feature maps of earlier layers via shortcut connections
    • layers in different blocks use the dimension-reduced feature maps of earlier layers via the transition layers

    • Deep layers of a block directly use the features of shallow layers

      Within a dense block, every layer spreads its weights over many inputs from the same block.

      All layers spread their weights over many inputs within the same block. This indicates that features extracted by very early layers are, indeed, directly used by deep layers throughout the same dense block.

    • Transition layers realize indirect feature reuse

      The transition layers also spread their weights over the layers of the preceding dense block.

      indicating information flow from the first to the last layers of the DenseNet through few indirections.

    • The transition-layer outputs are redundant

      The layers in the 2nd and 3rd dense blocks assign the lowest weights to the outputs of the transition layer, indicating that the transition-layer output features are redundant (even though compression is already applied in the transition layers).

      The layers within the second and third dense block consistently assign the least weight to the outputs of the transition layer (the top row of the triangles), indicating that the transition layer outputs many redundant features (with low weight on average). This is in keeping with the strong results of DenseNet-BC where exactly these outputs are compressed.

    • High-level features are still produced in deep layers

      The final classifier spreads its weights over all of its inputs, but with a clear concentration on the final feature maps, which indicates that the deep layers of the network still produce high-level features.

      Although the final classification layer, shown on the very right, also uses weights across the entire dense block, there seems to be a concentration towards final feature-maps, suggesting that there may be some more high-level features produced late in the network.

Implicit Deep Supervision

The classifier can directly supervise all layers through shorter paths (at most 2~3 transition layers), thereby realizing an implicit form of deep supervision.

One explanation for the improved accuracy of dense convolutional networks may be that individual layers receive additional supervision from the loss function through the shorter connections.

DenseNets perform a similar deep supervision in an implicit fashion: a single classifier on top of the network provides direct supervision to all layers through at most two or three transition layers.

In fact, ResNet also embodies the idea of deep supervision, i.e., the deep classifier directly supervises the shallow layers; see the paper Residual Networks Behave Like Ensembles of Relatively Shallow Networks, which is analyzed in detail in [ResNet](./[paper reading] ResNet.md).

Diversified Depth

  • DenseNet is a special case of stochastic depth

    there is a small probability for any two layers, between the same pooling layers, to be directly connected—if all intermediate layers are randomly dropped.

  • The ensemble-like behavior argued for ResNet applies to DenseNet as well

    because its basis, a "collection of paths of different lengths", still holds in DenseNet

Key Element

Dense Block

Transition Layers

==> performs the down-sampling between dense blocks

Components

In order, a transition layer consists of the following three parts:

  • Batch Normalization
  • 1x1 Conv
  • 2x2 average pooling

Compression

If a dense block produces $m$ feature maps, the following transition layer generates $\lfloor \theta m \rfloor$ feature maps, where the compression factor $\theta$ satisfies $0<\theta \leqslant 1$.

If a dense block contains $m$ feature-maps, we let the following transition layer generate $\lfloor \theta m \rfloor$ output feature-maps, where $0<\theta \leq 1$ is referred to as the compression factor.

In the experiments, $\theta$ is set to 0.5 (a model that uses both $\theta<1$ and the bottleneck structure is called DenseNet-BC).
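
A minimal PyTorch sketch of such a transition layer with compression (the structure BN → 1×1 Conv → 2×2 average pooling follows the components listed above; the class name and the `theta` argument are my own):

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Down-sampling between dense blocks: BN -> 1x1 Conv (compression) -> 2x2 avg pool."""
    def __init__(self, in_channels, theta=0.5):
        super().__init__()
        out_channels = int(theta * in_channels)  # floor(theta * m) output feature maps
        self.norm = nn.BatchNorm2d(in_channels)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(self.norm(x)))
```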

Growth Rate

  • Practical meaning

    how much new information each layer contributes to the global state (since each layer itself produces $k$ feature maps)

    The growth rate regulates how much new information each layer contributes to the global state.

    • number of input channels of the $\ell^{th}$ layer ==> the feature maps of the preceding layers stacked along the channel dimension:
      $$k_0 + k \times (\ell - 1)$$

    • number of output channels of the $\ell^{th}$ layer ==> the fixed growth rate:
      $$k$$

      For why each layer can keep its output channel count fixed at the growth rate $k$, see [Bottleneck Structure](#bottleneck-structure).
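
A quick worked example of this channel bookkeeping (the numbers are illustrative, not from the paper): with $k_0 = 64$ input channels and growth rate $k = 32$, the 5th layer of a dense block receives $k_0 + k \times (5-1) = 64 + 128 = 192$ input channels, yet still outputs only $k = 32$ new channels.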

Bottleneck Structure

  • Reason and advantages

    • Reason

      Without bottleneck layers, the number of channels of each layer's output feature maps would grow exponentially.

      As an example, suppose each layer produces $k_0$ new channels and passes on, as its output, the concatenation of all of its inputs with these new channels, i.e. $c_{\ell} = \sum_{i<\ell} c_i + k_0$. Then:

      • channels of the 1st layer's feature map:
        $$c_1 = k_0$$

      • channels of the 2nd layer's feature map:
        $$c_2 = k_0 + k_0 = 2k_0$$

      • channels of the 3rd layer's feature map:
        $$c_3 = k_0 + 2k_0 + k_0 = 4k_0$$

      • ……

      • channels of the $\ell^{th}$ layer's feature map:
        $$c_{\ell} = 2^{\ell-1}\cdot k_0$$

      Such exponential growth in the number of channels cannot be tolerated: too many channels would drastically increase the parameter count and therefore drastically slow the network down.

    • Advantages

      1. improves computational efficiency
      2. learns a compact feature representation
  • Principle

    Note: the 1×1 Conv is placed **before** the 3×3 Conv (the normal operation), so that the input feature maps are reduced in dimension first.

    Otherwise it would not deliver the computational-efficiency benefit.

    A 1×1 Conv is added before every 3×3 Conv to compress the channel dimension.

    a 1×1 convolution can be introduced as bottleneck layer before each 3×3 convolution to reduce the number of input feature-maps, and thus to improve computational efficiency.

    $$\text{BN-ReLU-Conv}(1\times 1) \Longrightarrow \text{BN-ReLU-Conv}(3\times 3)$$

  • Parameter settings

    In the paper, the authors let every 1×1 Conv produce $4k$ feature maps (the corresponding architecture is called DenseNet-B).
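
A minimal PyTorch sketch of this bottleneck version of $H_{\ell}$ (BN-ReLU-Conv(1×1) producing $4k$ maps, then BN-ReLU-Conv(3×3) producing $k$ maps); the class and argument names are my own, not from the official code:

```python
import torch.nn as nn

class BottleneckDenseLayer(nn.Module):
    """DenseNet-B layer: BN-ReLU-Conv(1x1) ==> BN-ReLU-Conv(3x3)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate  # the 1x1 conv outputs 4k feature maps
        self.bottleneck = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        )
        self.conv3x3 = nn.Sequential(
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):  # x: the concatenated feature maps of all preceding layers
        return self.conv3x3(self.bottleneck(x))
```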

Math

This paper does not contain many mathematical formulas, so the math is spread across the sections above.

Use Yourself

[Bottleneck Structure](#bottleneck-structure)

The bottleneck structure works at the layer level and helps in the following respects:

  • controls the channel dimension
  • improves parameter efficiency
  • improves computational efficiency

[Transition Layers](#transition-layers)

Transition layers work at the block level, and their advantages are similar to those of the bottleneck structure:

  • controls the channel dimension
  • improves parameter efficiency
  • improves computational efficiency

[Feature Reuse](#feature-reuse)

Feature reuse has the following advantages:

  • multi-level: the strengths of both low-level and high-level features can be exploited at the same time
  • multi-scale: low-level features generally have a higher spatial resolution, while high-level features generally have a lower spatial resolution
  • model compactness: features do not have to be learned repeatedly

Articles

Blogs

  • 指数增长的通道数:深入解读DenseNet(附源码) (exponentially growing channel counts: an in-depth reading of DenseNet, with source code)
