CVPR-2020


Table of Contents

  • 1 Background and Motivation
  • 2 Related Work
  • 3 Advantages / Contributions
  • 4 Method
    • 4.1 Ghost Module for More Features
    • 4.2 Building Efficient CNNs
  • 5 Experiments
    • 5.1 Datasets
    • 5.2 Efficiency of Ghost Module
    • 5.3 GhostNet on Visual Benchmarks
  • 6 Conclusion(own) / Future work

1 Background and Motivation

Deploying convolutional neural networks on embedded devices is difficult due to limited memory and compute resources.

The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design.

The authors propose GhostNet: keep a subset of intrinsic features, and generate the relatively redundant features (ghost features) from them via cheap linear transformations (cheap operations). This cuts computation while preserving feature diversity. In Figure 1, each pair of same-colored boxes can be viewed as one intrinsic feature and one ghost feature obtained from it by a linear transformation.

Why keep redundant features?

  • Abundant and even redundant information in the feature maps of well-trained deep neural networks often guarantees a comprehensive understanding of the input data.

  • Redundancy in feature maps could be an important characteristic for a successful deep neural network.

The authors embrace redundant features, but in a cost-efficient way.

2 Related Work

  • Model Compression

    • Pruning connections
    • Channel pruning
    • Model quantization
    • Binarization methods
    • Tensor decomposition
    • Knowledge distillation
  • Compact Model Design
    • MobileNets
    • MobileNets V2
    • MobileNets V3
    • ShuffleNet
    • ShuffleNet V2

3 Advantages / Contributions

GhostNet is proposed; on classification tasks it achieves a better speed-accuracy trade-off than MobileNetV3.

4 Method

4.1 Ghost Module for More Features

Input: $X \in \mathbb{R}^{c \times h \times w}$

The feature maps produced by an ordinary convolution: $Y \in \mathbb{R}^{h' \times w' \times n}$, with $Y = X * f + b$

where $*$ is the convolution operation, $b$ is the bias term, and the convolution filters are $f \in \mathbb{R}^{c \times k \times k \times n}$


We point out that it is unnecessary to generate these redundant feature maps one by one with large number of FLOPs and parameters.

Suppose that the output feature maps are “ghosts” of a handful of intrinsic feature maps with some cheap transformations.

These intrinsic feature maps are often of smaller size and produced by ordinary convolution filters.

The intrinsic feature maps $Y'$ with $m$ channels are produced by an ordinary convolution:

$Y' = X * f'$, with $Y' \in \mathbb{R}^{h' \times w' \times m}$

where $f' \in \mathbb{R}^{c \times k \times k \times m}$ and $m \leq n$ (the bias term is omitted for simplicity)

Ghost features are generated from the intrinsic features $Y'$ as follows:

$$y_{ij} = \Phi_{i,j}(y'_i), \quad \forall\, i = 1, \dots, m, \;\; j = 1, \dots, s$$

where

$y'_i$ is the $i$-th intrinsic feature map,

$\Phi_{i,j}$ is the $j$-th linear operation for generating the $j$-th ghost feature map $y_{ij}$; the last operation $\Phi_{i,s}$ is fixed as the identity mapping.

Each intrinsic feature map can produce several ghost maps, $\{y_{ij}\}_{j=1}^{s}$.

With the ghost mechanism, the final output feature maps are

$Y = [y_{11}, y_{12}, \dots, y_{1s}, y_{21}, \dots, y_{ms}]$,

where $m$ is the number of intrinsic feature maps, so $n = m \cdot s$.
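As a concrete example with made-up sizes: for $n = 64$ output channels and $s = 2$, there are $m = 64 / 2 = 32$ intrinsic maps; each $y'_i$ contributes one cheap transformation $y_{i1} = \Phi_{i,1}(y'_i)$ plus itself via the identity $y_{i2} = y'_i$, giving $32 \times 2 = 64$ output maps.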

After reading the code, the structure looks like this:


Image from https://zhuanlan.zhihu.com/p/115844245

The cheap operations are depth-wise convolutions.
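A minimal sketch of this two-step structure (BN/ReLU omitted; the class name GhostSketch and the default sizes are my own, not the repo's; the full official implementation appears in Section 6):

import torch
import torch.nn as nn

class GhostSketch(nn.Module):
    # Minimal sketch: an ordinary 1x1 conv produces the m intrinsic maps,
    # a depth-wise conv produces the (s-1)*m ghost maps, then concatenate.
    def __init__(self, inp, oup, s=2, dw_size=3):
        super().__init__()
        m = oup // s  # number of intrinsic channels (assumes s divides oup)
        self.primary = nn.Conv2d(inp, m, kernel_size=1, bias=False)
        self.cheap = nn.Conv2d(m, m * (s - 1), dw_size,
                               padding=dw_size // 2, groups=m, bias=False)

    def forward(self, x):
        y = self.primary(x)  # intrinsic features (kept via identity mapping)
        return torch.cat([y, self.cheap(y)], dim=1)  # [intrinsic, ghost]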

1) How does the proposed Ghost module differ from an ordinary convolution?

  • The Ghost module can have a customized kernel size (in the intrinsic->ghost step; for the x->intrinsic step the authors still use a 1x1 conv for efficiency), unlike some lightweight modules that rely on large numbers of 1x1 convs to cut computation
  • Lightweight modules use point-wise convs to process features across channels and depth-wise convs to process spatial information; the Ghost module produces intrinsic features with an ordinary convolution, then utilizes cheap linear operations to augment the features and increase the channels
  • The operations in other modules are limited to depth-wise convs or shifts; the Ghost module allows general linear operations (e.g. affine transformations, wavelet transformations, and convolutions such as smoothing, blurring, motion, etc.), so the features can be more diverse
  • The identity mapping is paralleled with the linear transformations (at the module level, not the bottleneck level)

2) What are the complexities of the Ghost module?

A Ghost module has 1 identity mapping and $m \cdot (s-1) = \frac{n}{s} \cdot (s-1)$ linear operations.

Compared with an ordinary convolution, the theoretical FLOPs speed-up ratio is

$$r_s = \frac{n \cdot h' \cdot w' \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k \cdot k + (s-1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d} \approx \frac{s \cdot c}{c + s - 1} \approx s$$

The denominator has two terms: the first is the ordinary convolution with $c$ input channels and $m = \frac{n}{s}$ output channels; the second applies $s-1$ linear operations (e.g. depth-wise convs) to each channel of the $m$-channel intrinsic feature maps, where the averaged kernel size of each linear operation is $d \times d$.

The approximation uses $k \approx d$ and $s \ll c$.

we suggest to take linear operations of the same size (e.g. 3x3 or 5x5) in one Ghost module for efficient implementation.

The parameter compression ratio is

$$r_c = \frac{n \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot c \cdot k \cdot k + (s-1) \cdot \frac{n}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{c + s - 1} \approx s$$

So both FLOPs and parameters are reduced by roughly a factor of $s$ (the number of linear operations per intrinsic feature map).

A larger $s$ leads to larger compression and speed-up ratios.
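A quick numeric sanity check of both ratios in Python (all sizes below are made-up example values):

c, n, k, d, s = 128, 128, 3, 3, 2       # assumed example sizes
hw = 56 * 56                            # h' * w', an assumed output resolution

flops_conv  = n * hw * c * k * k        # ordinary convolution
flops_ghost = (n // s) * hw * c * k * k + (s - 1) * (n // s) * hw * d * d
print(flops_conv / flops_ghost)         # ~1.98, close to s = 2

params_conv  = n * c * k * k
params_ghost = (n // s) * c * k * k + (s - 1) * (n // s) * d * d
print(params_conv / params_ghost)       # ~1.98, close to s = 2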

4.2 Building Efficient CNNs

1)Ghost Bottlenecks


two stacked Ghost modules:

the first acts as an expansion layer, increasing the number of channels;

the second reduces the number of channels to match the shortcut path.

The second Ghost module does not use ReLU, following the MobileNetV2 insight (no ReLU after projecting to a small number of channels). A usage sketch of the two bottleneck variants follows.
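The sketch below assumes the GhostBottleneck class from the repo code in Section 6 is already defined; the channel numbers are made-up examples:

import torch

x = torch.randn(1, 40, 56, 56)
# Stride-1 bottleneck: identity shortcut (in_chs == out_chs)
blk1 = GhostBottleneck(in_chs=40, mid_chs=120, out_chs=40, stride=1)
# Stride-2 bottleneck: a depth-wise conv between the two Ghost modules
# downsamples; the shortcut becomes dw conv + 1x1 conv
blk2 = GhostBottleneck(in_chs=40, mid_chs=240, out_chs=80, stride=2)
print(blk1(x).shape)  # torch.Size([1, 40, 56, 56])
print(blk2(x).shape)  # torch.Size([1, 80, 28, 28])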

2)GhostNet


GhostNet-$\alpha$: a width factor $\alpha$ is multiplied on the number of channels in each layer.
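A minimal sketch of this width scaling; the rounding helper follows the `_make_divisible` convention common in MobileNet-style repos (the exact divisor and the example channel list are assumptions, not the paper's numbers):

def _make_divisible(v, divisor=4, min_value=None):
    # Round a scaled channel count to a multiple of `divisor`,
    # never dropping more than 10% below the unrounded value.
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

alpha = 0.5
base_channels = [16, 24, 40, 80, 112, 160]   # hypothetical stage widths
scaled = [_make_divisible(c * alpha) for c in base_channels]
print(scaled)  # [8, 12, 20, 40, 56, 80]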

5 Experiments

5.1 Datasets

  • CIFAR-10
  • ImageNet ILSVRC 2012
  • MS COCO object detection

5.2 Efficiency of Ghost Module

1)Toy Experiments

Depth-wise convolutions are used as the linear (cheap) operations here.

there are strong correlations between feature maps in deep neural networks and these redundant feature maps could be generated from several intrinsic feature maps.

An irregular module (mixing various kinds of linear operations) would reduce the efficiency of computing units, so the authors recommend fixing $d$ and using depth-wise convolutions.

2)CIFAR-10

a)Analysis on Hyper-parameters

Fix $s = 2$ (two branches) and ablate $d$ (the kernel size of the depth-wise conv in the non-identity branch).

Here $d = 3$ works best: a 1x1 kernel cannot introduce spatial information, while $d = 5$ or $d = 7$ leads to overfitting and more computation.

Then fix $d = 3$ and ablate $s$.

FLOPs and parameters drop markedly as $s$ increases, while accuracy degrades only slowly.

A larger $s$ leads to larger compression and speed-up ratios.

b)Comparison with State-of-the-arts

c)Visualization of Feature Maps

Although the generated feature maps come from the primary feature maps, they do show significant differences, which means the generated features are flexible enough to satisfy the needs of the specific task.

3)Large Models on ImageNet

Comparison of different compression approaches applied to large models.

5.3 GhostNet on Visual Benchmarks

1)ImageNet Classification


No fuss, just results: SOTA.

Actual Inference Speed

2)Object Detection

On par with MobileNetV3.

6 Conclusion(own) / Future work

  • code:https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnet_pytorch#g-ghostnet
import math
import torch
import torch.nn as nn


class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)      # intrinsic channels m
        new_channels = init_channels * (ratio - 1)  # ghost channels

        # Primary convolution: produces the intrinsic feature maps
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap operation: depth-wise conv generating the ghost feature maps
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size // 2,
                      groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        return out[:, :self.oup, :, :]


class GhostBottleneck(nn.Module):
    """ Ghost bottleneck w/ optional SE"""

    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3,
                 stride=1, act_layer=nn.ReLU, se_ratio=0.):
        super(GhostBottleneck, self).__init__()
        has_se = se_ratio is not None and se_ratio > 0.
        self.stride = stride

        # Point-wise expansion
        self.ghost1 = GhostModule(in_chs, mid_chs, relu=True)

        # Depth-wise convolution (only for stride-2 blocks)
        if self.stride > 1:
            self.conv_dw = nn.Conv2d(mid_chs, mid_chs, dw_kernel_size, stride=stride,
                                     padding=(dw_kernel_size - 1) // 2,
                                     groups=mid_chs, bias=False)
            self.bn_dw = nn.BatchNorm2d(mid_chs)

        # Squeeze-and-excitation (SqueezeExcite is defined elsewhere in the repo)
        if has_se:
            self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio)
        else:
            self.se = None

        # Point-wise linear projection
        self.ghost2 = GhostModule(mid_chs, out_chs, relu=False)

        # Shortcut
        if (in_chs == out_chs and self.stride == 1):
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_chs, in_chs, dw_kernel_size, stride=stride,
                          padding=(dw_kernel_size - 1) // 2, groups=in_chs, bias=False),
                nn.BatchNorm2d(in_chs),
                nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_chs),
            )

    def forward(self, x):
        residual = x
        # 1st ghost module (expansion)
        x = self.ghost1(x)
        # Depth-wise convolution
        if self.stride > 1:
            x = self.conv_dw(x)
            x = self.bn_dw(x)
        # Squeeze-and-excitation
        if self.se is not None:
            x = self.se(x)
        # 2nd ghost module (projection)
        x = self.ghost2(x)
        x += self.shortcut(residual)
        return x
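A quick shape check of GhostModule (sizes are made-up examples):

x = torch.randn(1, 16, 32, 32)
m = GhostModule(16, 32)    # ratio=2: 16 intrinsic + 16 ghost channels
print(m(x).shape)          # torch.Size([1, 32, 32, 32])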
  • Many kinds of linear transformations could act simultaneously; the authors use only one, the depth-wise convolution.

Some interesting takes, excerpted:

Haha, a depth-wise separable convolution run in reverse; checks out.
