【GhostNet】《GhostNet:More Features from Cheap Operations》
CVPR-2020
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Method
- 4.1 Ghost Module for More Features
- 4.2 Building Efficient CNNs
- 5 Experiments
- 5.1 Datasets
- 5.2 Efficiency of Ghost Module
- 5.3 GhostNet on Visual Benchmarks
- 6 Conclusion(own) / Future work
1 Background and Motivation
Deploying convolutional neural networks on embedded devices is difficult because memory and compute resources are limited.
The redundancy in feature maps is an important characteristic of those successful CNNs, but has rarely been investigated in neural architecture design.
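The authors propose GhostNet: keep a subset of intrinsic features and generate the relatively redundant ghost features from them with linear transformations (cheap operations). This lowers the computation while preserving feature diversity. In Figure 1, boxes of the same color can be read as a pair: one is an intrinsic feature, the other is a ghost feature obtained from it by a linear transformation.
Why do we want redundant features?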
Abundant and even redundant information in the feature maps of well-trained deep neural networks often guarantees a comprehensive understanding of the input data.
Redundancy in feature maps could be an important characteristic for a successful deep neural network.
The authors embrace redundant features, but in a cost-efficient way.
2 Related Work
- Model Compression
- Pruning connections
- Channel pruning
- Model quantization
- binarization methods
- Tensor decomposition
- Knowledge distillation
- Compact Model Design
- MobileNets
- MobileNets V2
- MobileNets V3
- ShuffleNet
- ShuffleNet V2
3 Advantages / Contributions
Proposes GhostNet, which achieves a better speed/accuracy trade-off than MobileNetV3 on classification.
4 Method
4.1 Ghost Module for More Features
Input: $X \in \mathbb{R}^{c \times h \times w}$
The feature map produced by the convolution is $Y \in \mathbb{R}^{h' \times w' \times n}$, with $Y = X * f + b$
where $*$ is the convolution operation, $b$ is the bias, and the convolution filters are $f \in \mathbb{R}^{c \times k \times k \times n}$
We point out that it is unnecessary to generate these redundant feature maps one by one with large number of FLOPs and parameters.
Suppose that the output feature maps are “ghosts” of a handful of intrinsic feature maps with some cheap transformations
These intrinsic feature maps are often of smaller size and produced by ordinary convolution filters.
The intrinsic feature $Y'$ with $m$ channels is produced by
$Y' = X * f'$, $Y' \in \mathbb{R}^{h' \times w' \times m}$
where $f' \in \mathbb{R}^{c \times k \times k \times m}$ and $m \leq n$
Ghost features are generated from the intrinsic feature $Y'$ as
$$y_{ij} = \Phi_{ij}(y'_i), \quad \forall\, i = 1, \dots, m, \quad j = 1, \dots, s$$
where
$y'_i$ is the $i$-th intrinsic feature map
$\Phi_{ij}$ is the $j$-th linear operation for generating the $j$-th ghost feature map $y_{ij}$; the last operation $\Phi_{is}$ is fixed to the identity mapping
One intrinsic feature can have multiple ghost maps, $\{y_{ij}\}_{j=1}^{s}$
With the ghost mechanism, the final output feature map is
$Y = [y_{11}, y_{12}, \dots, y_{1s}, y_{21}, \dots, y_{ms}]$,
where $m$ is the number of intrinsic features, so that $n = m \cdot s$
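The relation between $n$, $m$, and $s$ can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the helper name `ghost_channels` is hypothetical, and the ceiling plus the dropped surplus mirror the `math.ceil(oup / ratio)` and `out[:, :self.oup]` bookkeeping in the PyTorch snippet quoted in Section 6.

```python
import math


def ghost_channels(n, s):
    """Split n requested output channels into intrinsic and ghost channels."""
    m = math.ceil(n / s)      # intrinsic maps produced by the ordinary convolution
    ghosts = m * (s - 1)      # maps produced by the cheap (non-identity) operations
    surplus = m + ghosts - n  # extra maps to drop when s does not divide n
    return m, ghosts, surplus


print(ghost_channels(64, 2))  # (32, 32, 0)
print(ghost_channels(40, 3))  # (14, 28, 2)
```

When $s$ divides $n$ the surplus is zero and $n = m \cdot s$ holds exactly.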
After reading the code, the structure looks like this
Figure from https://zhuanlan.zhihu.com/p/115844245
In the code, the cheap ops are depth-wise convs
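To make the "cheap operation" concrete, here is a pure-Python toy for $s = 2$: each intrinsic map is kept as is (identity) and one ghost map is derived from it with a per-channel 3x3 filter. It is only a sketch under stated assumptions: the fixed averaging kernel stands in for the learned depth-wise kernel, and the function names `box_filter_3x3` and `ghost_module` are made up for illustration.

```python
def box_filter_3x3(fmap):
    """Apply a 3x3 averaging kernel with zero padding to one 2-D feature map."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                        total += fmap[y][x]
            out[i][j] = total / 9.0
    return out


def ghost_module(intrinsic_maps):
    """Return the intrinsic maps followed by one ghost map per intrinsic map (s = 2)."""
    ghosts = [box_filter_3x3(fm) for fm in intrinsic_maps]  # one filter per channel
    return intrinsic_maps + ghosts                          # identity branch + ghost branch


intrinsic = [[[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]]
output = ghost_module(intrinsic)
print(len(output))      # 2
print(output[1][0][0])  # 1.0
```

Each ghost map depends on exactly one intrinsic map, which is what makes the operation depth-wise and therefore cheap.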
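1)How does the proposed Ghost module differ from ordinary convolution and existing lightweight modules?
- The Ghost module can have a customized kernel size (this refers to the intrinsic -> ghost step; for x -> intrinsic the authors still use a 1x1 conv for efficiency), unlike some lightweight modules that rely heavily on 1x1 convs to cut computation
- Lightweight modules use point-wise conv to process features across channels and depth-wise conv to process spatial information; the Ghost module uses an ordinary convolution to produce the intrinsic features, and then utilizes cheap linear operations to augment the features and increase the channels
- Operations in other modules are limited to depthwise or shift, whereas the Ghost module allows any linear operation (e.g. affine transformation, wavelet transformation, and conv, which covers smoothing, blurring, motion, etc.), so the features can be more diverse
- the identity mapping is paralleled with linear transformations (at the module level, not the bottleneck level)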
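2)How complex is the Ghost module?
A Ghost module has 1 identity mapping and $m \cdot (s-1) = \frac{n}{s} \cdot (s-1)$ linear operations
Compared with an ordinary conv, the theoretical speed-up ratio in FLOPs is
$$r_s = \frac{n \cdot h' \cdot w' \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k \cdot k + (s-1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d} = \frac{c \cdot k \cdot k}{\frac{1}{s} \cdot c \cdot k \cdot k + \frac{s-1}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{s + c - 1} \approx s$$
The denominator has two terms. The first is an ordinary convolution with $c$ input channels and $m = \frac{n}{s}$ output channels. The second applies $s-1$ linear operations (e.g. depth-wise conv) to each channel of the $m$-channel intrinsic feature maps, where the averaged kernel size of each linear operation is equal to $d \times d$
The approximations assume $k \approx d$ and $s \ll c$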
we suggest to take linear operations of the same size (e.g. 3x3 or 5x5) in one Ghost module for efficient implementation.
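The parameter compression ratio is
$$r_c = \frac{n \cdot c \cdot k \cdot k}{\frac{n}{s} \cdot c \cdot k \cdot k + (s-1) \cdot \frac{n}{s} \cdot d \cdot d} \approx \frac{s \cdot c}{s + c - 1} \approx s$$
Both FLOPs and parameters are therefore reduced by roughly a factor of $s$ (the number of feature maps generated per intrinsic map, counting the identity mapping)
larger $s$ leads to larger compression and speed-up ratio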
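The two ratios can be evaluated numerically from the term counts described above. The sketch below is only an illustration: the function names and the layer sizes ($n = 128$, $c = 64$, $k = d = 3$, a 56x56 output) are assumptions chosen for the example, and $n$ is assumed to be divisible by $s$.

```python
def conv_flops(n, c, k, h_out, w_out):
    """FLOPs of an ordinary convolution with n output and c input channels."""
    return n * h_out * w_out * c * k * k


def ghost_flops(n, c, k, d, s, h_out, w_out):
    """FLOPs of a Ghost module: primary conv plus (s - 1) cheap ops per intrinsic map."""
    m = n // s                                   # intrinsic channels (assumes s divides n)
    primary = m * h_out * w_out * c * k * k      # ordinary conv producing m maps
    cheap = (s - 1) * m * h_out * w_out * d * d  # depth-wise d x d operations
    return primary + cheap


def speedup_ratio(n, c, k, d, s, h_out, w_out):
    return conv_flops(n, c, k, h_out, w_out) / ghost_flops(n, c, k, d, s, h_out, w_out)


def compression_ratio(n, c, k, d, s):
    """Ratio of parameter counts (ordinary conv / Ghost module)."""
    m = n // s
    return (n * c * k * k) / (m * c * k * k + (s - 1) * m * d * d)


print(round(speedup_ratio(128, 64, 3, 3, 2, 56, 56), 3))  # 1.969
for s in (2, 4, 8):
    print(s, compression_ratio(128, 64, 3, 3, s))
```

With $k = d$ both ratios equal $\frac{s \cdot c}{s + c - 1}$ exactly, which stays slightly below $s$ and approaches it as $c$ grows.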
4.2 Building Efficient CNNs
1)Ghost Bottlenecks
two stacked Ghost modules
The first acts as an expansion layer, increasing the number of channels
The second reduces the number of channels to match the shortcut path
The second Ghost module has no ReLU, an idea borrowed from MobileNetV2 (no ReLU where the channel count is small)
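The data flow through a Ghost bottleneck can be traced by shape alone. The sketch below is a hypothetical helper (`ghost_bottleneck_shapes` is not part of the released code) that follows the structure of the `GhostBottleneck` class quoted in Section 6, assuming an odd depth-wise kernel with "same"-style padding; the channel numbers in the example are illustrative.

```python
def ghost_bottleneck_shapes(in_chs, mid_chs, out_chs, h, w, stride=1):
    """Trace (channels, height, width) through a Ghost bottleneck."""
    trace = [("input", (in_chs, h, w))]
    # First Ghost module: expansion, followed by BN and ReLU
    trace.append(("ghost module 1 (expand, ReLU)", (mid_chs, h, w)))
    if stride > 1:
        # Odd kernel k with padding (k - 1) // 2: out = floor((h - 1) / stride) + 1
        h, w = (h - 1) // stride + 1, (w - 1) // stride + 1
        trace.append((f"depth-wise conv (stride {stride})", (mid_chs, h, w)))
    # Second Ghost module: projection back to out_chs, no ReLU
    trace.append(("ghost module 2 (project, no ReLU)", (out_chs, h, w)))
    # The shortcut is an identity only when shapes already match
    if in_chs == out_chs and stride == 1:
        trace.append(("identity shortcut", (out_chs, h, w)))
    else:
        trace.append(("projection shortcut", (out_chs, h, w)))
    return trace


for name, shape in ghost_bottleneck_shapes(16, 48, 24, 112, 112, stride=2):
    print(f"{name:36s} {shape}")
```

The last two entries always share a shape, which is the condition for the residual addition to be valid.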
2)GhostNet
GhostNet-$\alpha$: multiply a factor $\alpha$ on the number of channels
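A width multiplier is usually applied together with a rounding rule so that channel counts stay hardware-friendly. The sketch below illustrates the idea only: rounding to a multiple of 4 with the helper `make_divisible` is an assumption borrowed from common MobileNet-style implementations, and the base widths in `BASE_WIDTHS` are illustrative values, not a statement of the exact GhostNet configuration.

```python
def make_divisible(value, divisor=4, min_value=None):
    """Round value to the nearest multiple of divisor, never dropping more than 10%."""
    if min_value is None:
        min_value = divisor
    rounded = max(min_value, int(value + divisor / 2) // divisor * divisor)
    if rounded < 0.9 * value:  # avoid shrinking a layer too aggressively
        rounded += divisor
    return rounded


BASE_WIDTHS = [16, 24, 40, 80]  # illustrative channel counts (assumption)


def scale_widths(alpha):
    """Apply the width multiplier alpha to every base channel count."""
    return [make_divisible(c * alpha) for c in BASE_WIDTHS]


print(scale_widths(0.5))  # [8, 12, 20, 40]
print(scale_widths(1.3))
```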
5 Experiments
5.1 Datasets
- CIFAR-10
- ImageNet ILSVRC 2012
- MS COCO object detection
5.2 Efficiency of Ghost Module
1)Toy Experiments
The linear operation used here is a depth-wise conv
there are strong correlations between feature maps in deep neural networks and these redundant feature maps could be generated from several intrinsic feature maps.
the irregular module (mixing various kinds of linear operations) will reduce the efficiency of computing units; the authors recommend fixing $d$ and using depth-wise conv
2)CIFAR-10
a)Analysis on Hyper-parameters
Fix $s=2$ (two branches) and ablate $d$ (the kernel size of the depth-wise conv in the non-identity branch)
Here $d=3$ performs best: 1x1 cannot introduce spatial information, while $d = 5$ or $d = 7$ lead to overfitting and more computations
Fix $d=3$ and ablate $s$
FLOPs and parameters drop markedly as $s$ increases, while accuracy degrades only slowly
larger $s$ leads to larger compression and speed-up ratio
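The "more computations" part of this observation follows directly from the FLOPs terms of the Ghost module. The sketch below computes the share of a Ghost module's FLOPs spent in the cheap operations as $d$ grows; the function name and the sizes ($c = 64$, a 1x1 primary conv, $s = 2$) are assumptions for illustration, and it says nothing about accuracy or overfitting.

```python
def cheap_op_share(c, k, d, s):
    """Fraction of Ghost-module FLOPs spent in the cheap (depth-wise) operations."""
    primary = c * k * k      # primary conv cost per intrinsic map and output position
    cheap = (s - 1) * d * d  # cheap-op cost per intrinsic map and output position
    return cheap / (primary + cheap)


for d in (1, 3, 5, 7):
    print(d, round(cheap_op_share(c=64, k=1, d=d, s=2), 3))
```

The share grows quadratically with $d$, so large kernels erode the savings that motivate the cheap branch.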
b)Comparison with State-of-the-arts
c)Visualization of Feature Maps
Although the generated feature maps are from the primary feature maps, they exactly have significant difference which means the generated features are flexible enough to satisfy the need for the specific task.
3)Large Models on ImageNet
Comparison against different compression approaches
5.3 GhostNet on Visual Benchmarks
1)ImageNet Classification
Nothing more to say: SOTA
Actual Inference Speed
2)Object Detection
Roughly on par with MobileNetV3
6 Conclusion(own) / Future work
- code:https://github.com/huawei-noah/Efficient-AI-Backbones/tree/master/ghostnet_pytorch#g-ghostnet
import math

import torch
import torch.nn as nn


class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)      # intrinsic channels m
        new_channels = init_channels * (ratio - 1)  # ghost channels m * (s - 1)

        # Ordinary convolution that produces the intrinsic feature maps
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap operation: depth-wise conv (groups=init_channels) that produces the ghost maps
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        # Drop surplus channels when ratio does not divide oup
        return out[:, :self.oup, :, :]


class GhostBottleneck(nn.Module):
    """ Ghost bottleneck w/ optional SE"""

    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3,
                 stride=1, act_layer=nn.ReLU, se_ratio=0.):
        super(GhostBottleneck, self).__init__()
        has_se = se_ratio is not None and se_ratio > 0.
        self.stride = stride

        # Point-wise expansion
        self.ghost1 = GhostModule(in_chs, mid_chs, relu=True)

        # Depth-wise convolution
        if self.stride > 1:
            self.conv_dw = nn.Conv2d(mid_chs, mid_chs, dw_kernel_size, stride=stride,
                                     padding=(dw_kernel_size-1)//2,
                                     groups=mid_chs, bias=False)
            self.bn_dw = nn.BatchNorm2d(mid_chs)

        # Squeeze-and-excitation (SqueezeExcite is defined in the linked repo, not shown here)
        if has_se:
            self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio)
        else:
            self.se = None

        # Point-wise linear projection
        self.ghost2 = GhostModule(mid_chs, out_chs, relu=False)

        # shortcut
        if (in_chs == out_chs and self.stride == 1):
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_chs, in_chs, dw_kernel_size, stride=stride,
                          padding=(dw_kernel_size-1)//2, groups=in_chs, bias=False),
                nn.BatchNorm2d(in_chs),
                nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_chs),
            )

    def forward(self, x):
        residual = x

        # 1st ghost bottleneck
        x = self.ghost1(x)

        # Depth-wise convolution
        if self.stride > 1:
            x = self.conv_dw(x)
            x = self.bn_dw(x)

        # Squeeze-and-excitation
        if self.se is not None:
            x = self.se(x)

        # 2nd ghost bottleneck
        x = self.ghost2(x)

        x += self.shortcut(residual)
        return x
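- Many kinds of linear transformations could be applied at the same time, but the authors use only one: depth-wise conv
Some interesting takes, quoted
Haha, it is a depthwise separable convolution turned upside down. Fair enough.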