Original MobileNetV2 paper: "MobileNetV2: Inverted Residuals and Linear Bottlenecks"

GitHub: tonylins/pytorch-mobilenet-v2

GitHub: d-li14/mobilenetv2.pytorch

Compared with MobileNet v1, v2 achieves higher accuracy with a smaller model.

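The hallmark of MobileNet v1 is the depthwise separable convolution. In practice, however, researchers found that many of the depthwise kernels end up as all zeros after training, which means they contribute nothing to the computation. What causes this? The v2 authors attribute it to the ReLU activation: applied in a low-dimensional space, ReLU destroys much of the information, whereas in a high-dimensional space most of the useful information survives.

Given that, v2's fix is simple: replace ReLU6 with a linear (identity) activation. Not everywhere, though. Only the ReLU6 on the last layer of each block, that is, the one after the final pointwise (1×1) convolution, is replaced. v2 calls this the linear bottleneck, and it is the first key idea of the network.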

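A depthwise convolution cannot change the number of channels by itself. In the depthwise separable convolution example earlier in this article, for instance, the depthwise half takes 3 input channels and still produces 3 output channels. So, to let the depthwise convolution work in a higher-dimensional space, v2 places a channel-expanding convolution in front of it. Which operation can raise the channel count? A 1×1 convolution.

This channel expansion ahead of the depthwise convolution is called the expansion layer in v2. It is the second key idea of the network.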

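Although MobileNet v1 uses depthwise separable convolutions, its backbone is still a plain, VGG-style straight stack of layers. The third key idea of v2 is therefore to borrow the residual structure of ResNet and add skip connections on top of the v1 design. To distinguish it from the ResNet residual block, v2 names its structure the inverted residual block. Let us look at what exactly is "inverted" relative to ResNet.

As the figure shows, ResNet first reduces the channel count to 0.25×, applies a standard 3×3 convolution, and then expands the channels again, whereas MobileNet v2 first expands the channels by 6×, applies a 3×3 depthwise convolution, and finally reduces them. It can be pictured more vividly as follows:

The order in which MobileNet v2 raises and lowers the dimensionality is exactly the opposite of ResNet's, hence the name inverted residual.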
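The contrast can be made concrete with a small sketch. The helper `block_widths` is hypothetical (it comes from neither the paper nor the two repositories); it only lists the channel count at each stage of a block, under the simplifying assumption that the block's output width equals its input width. The widths 256 and 24 are illustrative values.

```python
def block_widths(channels, ratio):
    """Channel counts: input -> after first 1x1 -> after 3x3 -> after last 1x1."""
    hidden = int(channels * ratio)
    return [channels, hidden, hidden, channels]

resnet_style = block_widths(256, 0.25)   # narrow in the middle
inverted_style = block_widths(24, 6)     # wide in the middle

print(resnet_style)
print(inverted_style)
```

The first list narrows in the middle (ResNet bottleneck), the second widens in the middle (inverted residual).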

Putting the three key ideas together (linear bottlenecks, the expansion layer, and inverted residuals) gives the MobileNet v2 block, shown in the figure below.
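To get a feel for how cheap such a block is, the weights of its three convolutions can be counted by hand. This is a rough sketch only: the function name `inverted_residual_weights` is made up for illustration, biases and BatchNorm parameters are ignored, and the expansion convolution is assumed to be skipped when the expansion factor is 1, as in the code listings later in this article.

```python
def inverted_residual_weights(inp, oup, expand_ratio, kernel=3):
    """Count convolution weights (no bias, BatchNorm ignored) in one block."""
    hidden = inp * expand_ratio
    # 1x1 expansion convolution (assumed absent when expand_ratio == 1)
    expand = inp * hidden if expand_ratio != 1 else 0
    # depthwise convolution: one kernel x kernel filter per channel
    depthwise = kernel * kernel * hidden
    # 1x1 linear projection back down to `oup` channels
    project = hidden * oup
    return expand + depthwise + project

block_24 = inverted_residual_weights(24, 24, 6)
first_block = inverted_residual_weights(32, 16, 1)
standard_3x3 = 3 * 3 * 144 * 144   # a regular 3x3 conv on 144 channels, for comparison

print(block_24, first_block, standard_3x3)
```

Even though the block works on 144 channels internally, it needs far fewer weights than a single standard 3×3 convolution of that width.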

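The overall MobileNet v2 architecture is shown in the figure below.

As can be seen, the input first passes through a regular convolution, followed by 7 stages of bottleneck blocks (17 blocks in total), and then by two 1×1 convolutions with a 7×7 average pooling between them.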

I. Classic residual structure

Input and output dimensions of each convolutional layer

Input: a 3-D volume of size $w_{in} \times h_{in} \times d_{in}$ (width × height × depth)
Parameters of each convolutional layer:
- receptive field (kernel) size $f$
- number of filters $k$ (determines the depth of the output volume)
- stride $s$
- amount of zero-padding $p$
Output: a 3-D volume of size $w_{out} \times h_{out} \times d_{out}$ (width × height × depth), where:
- $w_{out}=\cfrac{w_{in}-f+2p}{s}+1$
- $h_{out}=\cfrac{h_{in}-f+2p}{s}+1$
- $d_{out}=k$
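As a quick numeric check of the output-size formula, here is a minimal sketch. The function name `conv_output_size` is hypothetical, and the division is assumed to round down when it is not exact (the convention PyTorch uses); the input sizes are illustrative.

```python
def conv_output_size(size_in, f, p, s):
    """Output width/height for kernel f, padding p, stride s (floor division assumed)."""
    return (size_in - f + 2 * p) // s + 1

stride2 = conv_output_size(224, f=3, p=1, s=2)   # 3x3 conv, padding 1, stride 2
stride1 = conv_output_size(112, f=3, p=1, s=1)   # 3x3 conv, padding 1, stride 1

print(stride2, stride1)
```

With a 3×3 kernel and padding 1, stride 2 halves the spatial size and stride 1 leaves it unchanged.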

II. MobileNet v2 structure

$t$: expansion factor;
$c$: number of channels of the output feature map, $k'$ in the figure above;
$n$: number of times the bottleneck (here meaning the inverted bottleneck) is repeated;
$s$: stride of the first layer of each sequence; all other layers of that sequence use stride 1;
Each line describes a sequence of 1 or more identical (modulo stride) layers, repeated $n$ times;
All layers in the same sequence have the same number $c$ of output channels;
The first layer of each sequence has a stride $s$ and all others use stride 1;
All spatial convolutions use 3 × 3 kernels;
The expansion factor $t$ is always applied to the input size;
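The rules above can be sketched as a short expansion of the $(t, c, n, s)$ table into individual blocks. The rows, the 32-channel stem and its stride of 2 are taken from the code listings in this article; the helper name `expand_config` and the 224×224 input size are assumptions made for illustration.

```python
# (t, c, n, s) rows as used in the code listings of this article
cfgs = [
    (1, 16, 1, 1), (6, 24, 2, 2), (6, 32, 3, 2), (6, 64, 4, 2),
    (6, 96, 3, 1), (6, 160, 3, 2), (6, 320, 1, 1),
]

def expand_config(cfgs, input_channel=32):
    """Return one (in_channels, out_channels, stride, t) tuple per block."""
    blocks = []
    for t, c, n, s in cfgs:
        for i in range(n):
            stride = s if i == 0 else 1   # only the first block of a sequence strides
            blocks.append((input_channel, c, stride, t))
            input_channel = c
    return blocks

blocks = expand_config(cfgs)

total_stride = 2                          # the stem convolution has stride 2
for _, _, stride, _ in blocks:
    total_stride *= stride

final_size = 224 // total_stride          # spatial size for a 224x224 input
print(len(blocks), total_stride, final_size)
```

The seven rows expand to 17 blocks, and the overall stride of 32 turns a 224×224 input into a 7×7 feature map, which is consistent with the 7×7 average pooling at the end of the network.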

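1. Inverted residuals [wide in the middle, narrow at both ends] [activation: ReLU6]

The inverted residual in MobileNetV2 is the exact opposite of ResNet's bottleneck residual block: it is wide in the middle and narrow at both ends. That is:

A 1×1 convolution is added in front of the depthwise convolution to raise the dimensionality (the expansion layer) [1×1 convolution (more channels, raising the dimension), 3×3 depthwise convolution (channel count unchanged), 1×1 convolution to reduce the dimension];
The input and the output are joined by a residual connection (used only when the stride is 1 and the input and output have the same number of channels), as shown in the figure below (inverted residual: wide in the middle, narrow at both ends).
The activation function is ReLU6: $y=\mathrm{ReLU6}(x)=\min(\max(x,0),6)$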
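The ReLU6 formula translates directly into code. The scalar version below is only a minimal sketch for checking values by hand; in the PyTorch listings later in this article the same operation is provided by `nn.ReLU6`.

```python
def relu6(x):
    """ReLU6: clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)

values = [relu6(v) for v in (-2.0, 0.5, 3.0, 6.0, 9.0)]
print(values)
```

Negative inputs are cut to 0 and anything above 6 is cut to 6; values in between pass through unchanged.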

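2. Linear bottleneck

Both the DW and PW convolutions would normally be followed by ReLU, and the final PW convolution reduces the dimensionality; applying ReLU to those low-dimensional features would discard a lot of information. Therefore:

When mapping from a high-dimensional space to a low-dimensional one, a ReLU activation can lose or destroy information, so no non-linear activation is used there. In other words, the last PW layer (the final 1×1 convolution of the inverted residual block) uses a linear activation instead of ReLU, as shown in the figure below.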
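A toy example may help with the intuition. It is not the experiment from the paper, just a hand-built illustration: the functions `expand_then_relu` and `recover` are hypothetical, and the "expansion" is the fixed map from x to (x, -x) rather than a learned 1×1 convolution.

```python
def relu(x):
    return max(x, 0.0)

inputs = (-3.0, -1.0, 2.0)

# Low-dimensional case: ReLU applied directly to a 1-D feature.
# Every negative input collapses to 0, so those inputs can no longer be told apart.
low_dim = [relu(x) for x in inputs]

# Higher-dimensional case: first expand x to the two channels (x, -x),
# then apply ReLU. The resulting pair still determines x exactly.
def expand_then_relu(x):
    return (relu(x), relu(-x))

def recover(features):
    pos, neg = features
    return pos - neg

recovered = [recover(expand_then_relu(x)) for x in inputs]

print(low_dim)
print(recovered)
```

Applied to the 1-D feature, ReLU maps both -3 and -1 to 0; applied after the expansion, all three inputs can be reconstructed. This is the sense in which ReLU is safer in a higher-dimensional space and is left out after the dimension-reducing projection.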

III. Performance comparison

1. Classification

2. Object detection

IV. MobileNet v2 code

import math

import numpy as np
import torch.nn as nn


def conv_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def make_divisible(x, divisible_by=8):
    return int(np.ceil(x * 1. / divisible_by) * divisible_by)


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(inp * expand_ratio)
        self.use_res_connect = self.stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, n_class=1000, input_size=224, width_mult=1.):
        super(MobileNetV2, self).__init__()
        block = InvertedResidual
        input_channel = 32
        last_channel = 1280
        interverted_residual_setting = [
            # t, c, n, s
            [1, 16, 1, 1],
            [6, 24, 2, 2],
            [6, 32, 3, 2],
            [6, 64, 4, 2],
            [6, 96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        assert input_size % 32 == 0
        # input_channel = make_divisible(input_channel * width_mult)  # first channel is always 32!
        self.last_channel = make_divisible(last_channel * width_mult) if width_mult > 1.0 else last_channel
        self.features = [conv_bn(3, input_channel, 2)]
        # building inverted residual blocks
        for t, c, n, s in interverted_residual_setting:
            output_channel = make_divisible(c * width_mult) if t > 1 else c
            for i in range(n):
                if i == 0:
                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))
                else:
                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))
                input_channel = output_channel
        # building last several layers
        self.features.append(conv_1x1_bn(input_channel, self.last_channel))
        # make it nn.Sequential
        self.features = nn.Sequential(*self.features)

        # building classifier
        self.classifier = nn.Linear(self.last_channel, n_class)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = x.mean(3).mean(2)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                n = m.weight.size(1)
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenet_v2(pretrained=True):
    model = MobileNetV2(width_mult=1)

    if pretrained:
        try:
            from torch.hub import load_state_dict_from_url
        except ImportError:
            from torch.utils.model_zoo import load_url as load_state_dict_from_url
        state_dict = load_state_dict_from_url(
            'https://www.dropbox.com/s/47tyzpofuuyyv1b/mobilenetv2_1.0-f2a8633.pth.tar?dl=1', progress=True)
        model.load_state_dict(state_dict)
    return model


if __name__ == '__main__':
    net = mobilenet_v2(True)

"""
Creates a MobileNetV2 Model as defined in:
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen. (2018).
MobileNetV2: Inverted Residuals and Linear Bottlenecks
arXiv preprint arXiv:1801.04381.
import from https://github.com/tonylins/pytorch-mobilenet-v2
"""

import torch.nn as nn
import math

__all__ = ['mobilenetv2']


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def conv_3x3_bn(inp, oup, stride):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


def conv_1x1_bn(inp, oup):
    return nn.Sequential(
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),
        nn.BatchNorm2d(oup),
        nn.ReLU6(inplace=True)
    )


class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio):
        super(InvertedResidual, self).__init__()
        assert stride in [1, 2]

        hidden_dim = round(inp * expand_ratio)
        self.identity = stride == 1 and inp == oup

        if expand_ratio == 1:
            self.conv = nn.Sequential(
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            self.conv = nn.Sequential(
                # pw
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # dw
                nn.Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),
                nn.BatchNorm2d(hidden_dim),
                nn.ReLU6(inplace=True),
                # pw-linear
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        if self.identity:
            return x + self.conv(x)
        else:
            return self.conv(x)


class MobileNetV2(nn.Module):
    def __init__(self, num_classes=1000, width_mult=1.):
        super(MobileNetV2, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = [
            # t, c, n, s
            [1,  16, 1, 1],
            [6,  24, 2, 2],
            [6,  32, 3, 2],
            [6,  64, 4, 2],
            [6,  96, 3, 1],
            [6, 160, 3, 2],
            [6, 320, 1, 1],
        ]

        # building first layer
        input_channel = _make_divisible(32 * width_mult, 4 if width_mult == 0.1 else 8)
        layers = [conv_3x3_bn(3, input_channel, 2)]
        # building inverted residual blocks
        block = InvertedResidual
        for t, c, n, s in self.cfgs:
            output_channel = _make_divisible(c * width_mult, 4 if width_mult == 0.1 else 8)
            for i in range(n):
                layers.append(block(input_channel, output_channel, s if i == 0 else 1, t))
                input_channel = output_channel
        self.features = nn.Sequential(*layers)
        # building last several layers
        output_channel = _make_divisible(1280 * width_mult, 4 if width_mult == 0.1 else 8) if width_mult > 1.0 else 1280
        self.conv = conv_1x1_bn(input_channel, output_channel)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.classifier = nn.Linear(output_channel, num_classes)

        self._initialize_weights()

    def forward(self, x):
        x = self.features(x)
        x = self.conv(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

    def _initialize_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
                m.weight.data.normal_(0, math.sqrt(2. / n))
                if m.bias is not None:
                    m.bias.data.zero_()
            elif isinstance(m, nn.BatchNorm2d):
                m.weight.data.fill_(1)
                m.bias.data.zero_()
            elif isinstance(m, nn.Linear):
                m.weight.data.normal_(0, 0.01)
                m.bias.data.zero_()


def mobilenetv2(**kwargs):
    """
    Constructs a MobileNet V2 model
    """
    return MobileNetV2(**kwargs)

