ResNet Explained

1. Introduction

1.1 Overview

Published in 2015; CVPR 2016 Best Paper: Deep Residual Learning for Image Recognition. Authors: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.

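Residual blocks solve the degradation problem of deep networks and make much greater depth trainable, so a wide range of computer vision tasks benefit from the features extracted by these deep models.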

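ResNet won first place in the 2015 ImageNet classification, localization and detection competitions, as well as in MS COCO detection and segmentation. Its ImageNet classification error also surpassed the reported human-level performance.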

Paper link

PDF link

Official implementation code

1.2 The Problem (Degradation of Deep Networks)

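ResNet was proposed to solve the degradation problem of deep networks: as more layers are added, performance on the dataset gets worse instead of better. This is not overfitting, because the training error rises as well. The figure below, taken from the paper, shows the degradation phenomenon.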

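In the figure, the authors train 20-layer and 56-layer plain networks on CIFAR-10. The 56-layer network has both a higher training error and a higher test error than the shallower 20-layer network. This is the degradation problem that ResNet sets out to solve.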

1.3 Solution (Key Idea): the Residual Structure

With the ResNet structure the degradation problem goes away, as shown in the figure below.

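The figure shows the authors' training results on ImageNet. Without residual connections (left), the 34-layer plain network plain-34 has a higher error than the 18-layer plain-18. With residual connections (right), ResNet-34 has a lower error than ResNet-18. So once residual connections are used, the deeper network achieves the lower error in this comparison: added depth helps instead of hurting.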

2. Paper Walkthrough

2.1 Residual Structure

Next we introduce how ResNet works and how it is structured.

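Suppose we want a block of layers to learn a mapping $H(x)$. Fitting $H(x)$ directly is difficult, whereas learning the residual function $F(x) = H(x) - x$ is easier: if the identity mapping is (close to) optimal, the block only has to push $F(x)$ toward 0 rather than fit a specific mapping from scratch. The final mapping is then obtained by adding $x$ back to $F(x)$: $H(x) = F(x) + x$, as shown in the figure.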

Therefore the output $y$ of this block is
$$y = F(x) + x$$
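Since the addition requires $x$ and $F(x)$ to have the same dimensions, the general form is written as follows, where $W_s$ is a linear projection used to match the dimensions:
$$y = F(x, \{W_i\}) + W_s x$$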
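The paper describes two ways to match the dimensions: (A) keep the identity shortcut and zero-pad the extra channels, which adds no parameters; (B) use a 1x1 convolution (projection) to increase the dimensions. In both cases, when the shortcut crosses feature maps of two different sizes, it is applied with a stride of 2.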
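A minimal PyTorch sketch of $y = F(x, \{W_i\}) + W_s x$ with both shortcut options. The names `ShortcutA`, `shortcut_b` and `ResidualBlock` are hypothetical. Option (A) is implemented here as strided subsampling plus zero-padding of the extra channels, which is one common reading of the paper's description rather than its exact code; option (B) follows the 1x1 convolution plus BatchNorm pattern used by torchvision.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShortcutA(nn.Module):
    """Option (A), parameter-free: subsample H and W by the stride, zero-pad extra channels."""

    def __init__(self, in_ch: int, out_ch: int, stride: int):
        super().__init__()
        self.extra = out_ch - in_ch
        self.stride = stride

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x[:, :, ::self.stride, ::self.stride]  # strided spatial subsampling
        # F.pad pads from the last dim backwards: (W_left, W_right, H_top, H_bottom, C_front, C_back)
        return F.pad(x, (0, 0, 0, 0, 0, self.extra))


def shortcut_b(in_ch: int, out_ch: int, stride: int) -> nn.Module:
    """Option (B): projection W_s as a strided 1x1 convolution followed by BatchNorm."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_ch),
    )


class ResidualBlock(nn.Module):
    """y = ReLU(F(x, {W_i}) + shortcut(x)), where F is two 3x3 conv + BN layers."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, option: str = "B"):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        if stride == 1 and in_ch == out_ch:
            self.shortcut = nn.Identity()  # shapes already match: y = F(x) + x
        elif option == "A":
            self.shortcut = ShortcutA(in_ch, out_ch, stride)
        else:
            self.shortcut = shortcut_b(in_ch, out_ch, stride)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.relu(self.residual(x) + self.shortcut(x))


if __name__ == "__main__":
    x = torch.randn(2, 64, 8, 8)
    for option in ("A", "B"):
        y = ResidualBlock(64, 128, stride=2, option=option)(x)
        print(option, tuple(y.shape))  # (2, 128, 4, 4) for both options
```

Option (A) keeps the parameter count unchanged, while option (B) learns the projection; both produce the same output shape, so the addition is well defined either way.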

The concrete residual block code is explained below.

2.2 Convolution Operations

1) 1x1 convolution

import torch.nn as nn


def conv1x1(in_planes, out_planes, stride=1):
    """1x1 convolution"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=(1, 1), stride=(stride, stride), bias=False)

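A 1x1 convolution changes only the number of channels (it can increase or reduce them); with stride 1 it leaves the height and width unchanged.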

Because F=1, S=1, P=0,
$$\left\lfloor \frac{W-F+2P}{S} \right\rfloor + 1 = \left\lfloor \frac{W-1+0}{1} \right\rfloor + 1 = W$$
a 1x1 convolution with stride 1 does not change the width or height.

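The floor brackets matter because the division is rounded down, not to the nearest integer (PyTorch's convolution output-size formula uses floor). In Python, int() truncates toward zero, which for positive numbers is the same as rounding down: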

print(int(5.5))

The result is:

5

So
$$\left\lfloor \frac{W-1+0}{1} \right\rfloor = W-1$$
and adding 1 gives back $W$.
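A small pure-Python sketch of the output-size formula used above, with the floor made explicit through `//`. The function name `conv_out_size` is hypothetical; the dilation term follows the general formula in the PyTorch `Conv2d` documentation and reduces to the expression above when the dilation is 1.

```python
import math


def conv_out_size(w: int, f: int, s: int = 1, p: int = 0, d: int = 1) -> int:
    """Output width/height of a convolution: floor((W + 2P - D*(F - 1) - 1) / S) + 1.

    With dilation D = 1 this is floor((W - F + 2P) / S) + 1, the formula above.
    Python's // operator is floor division, matching the floor in the formula.
    """
    return (w + 2 * p - d * (f - 1) - 1) // s + 1


# A 1x1 convolution with stride 1 and no padding keeps the spatial size.
for w in (7, 28, 224):
    print(w, conv_out_size(w, f=1))

# int() truncates toward zero; math.floor rounds down. They agree for
# non-negative values, which is the only case that arises for layer sizes.
print(int(5.5), math.floor(5.5))    # 5 5
print(int(-5.5), math.floor(-5.5))  # -5 -6
```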

2) 3x3 convolution

def conv3x3(in_planes, out_planes, stride=1, groups=1, dilation=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(
        in_planes,
        out_planes,
        kernel_size=(3, 3),
        stride=(stride, stride),
        padding=dilation,
        groups=groups,
        bias=False,
        dilation=(dilation, dilation),
    )

The 3x3 convolution extracts features; with stride 2 it also downsamples.

With stride 1 (and padding P=1), the width and height are unchanged:
$$\left\lfloor \frac{W-F+2P}{S} \right\rfloor + 1 = \left\lfloor \frac{W-3+2}{1} \right\rfloor + 1 = W$$
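With stride 2, the width and height are halved, which acts as downsampling (exactly $W/2$ for even $W$, and $(W+1)/2$ for odd $W$):
$$\left\lfloor \frac{W-3+2}{2} \right\rfloor + 1 = \left\lfloor \frac{W-1}{2} \right\rfloor + 1 = \frac{W}{2} \quad (W \text{ even})$$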
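An empirical check of the two cases, assuming PyTorch is installed: the helper below reuses the same layer settings as `conv3x3` (padding 1, no bias, dilation 1) and prints the output height and width for an even and an odd input size.

```python
import torch
import torch.nn as nn


def conv3x3(in_planes, out_planes, stride=1):
    # Same configuration as the conv3x3 helper above with groups=1, dilation=1.
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False)


for w in (8, 7, 56):
    x = torch.randn(1, 16, w, w)
    same = conv3x3(16, 32, stride=1)(x)
    down = conv3x3(16, 32, stride=2)(x)
    print(w, tuple(same.shape[-2:]), tuple(down.shape[-2:]))
# 8 (8, 8) (4, 4)
# 7 (7, 7) (4, 4)
# 56 (56, 56) (28, 28)
```

The odd size 7 maps to 4, not 3.5, which is the floor in the formula at work.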

2.3 Residual Blocks

In the official (torchvision) ResNet implementation:

ResNet18 and ResNet34 use the plain BasicBlock
ResNet50, ResNet101 and ResNet152 all use the Bottleneck structure
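The depth in each model's name can be recovered from its block type and the number of blocks per stage. The per-stage counts below are the standard ones from the paper's architecture table (and torchvision's `resnet18` … `resnet152` builders); the dictionary and function names are illustrative.

```python
# (block type, number of residual blocks in conv2_x .. conv5_x)
CONFIGS = {
    "resnet18": ("basic", [2, 2, 2, 2]),
    "resnet34": ("basic", [3, 4, 6, 3]),
    "resnet50": ("bottleneck", [3, 4, 6, 3]),
    "resnet101": ("bottleneck", [3, 4, 23, 3]),
    "resnet152": ("bottleneck", [3, 8, 36, 3]),
}
CONVS_PER_BLOCK = {"basic": 2, "bottleneck": 3}  # 3x3+3x3 vs 1x1+3x3+1x1


def weighted_depth(name: str) -> int:
    block, layers = CONFIGS[name]
    # +1 for the 7x7 stem convolution, +1 for the final fully connected layer.
    # 1x1 projection shortcuts are conventionally not counted.
    return CONVS_PER_BLOCK[block] * sum(layers) + 2


for name in CONFIGS:
    print(name, weighted_depth(name))
```

ResNet-34 and ResNet-50 share the same stage layout [3, 4, 6, 3]; the extra depth comes entirely from swapping the 2-layer BasicBlock for the 3-layer Bottleneck.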

2.3.1 BasicBlock

from typing import Callable, Optional

import torch.nn as nn
from torch import Tensor

# conv3x3 is the helper defined in section 2.2.


class BasicBlock(nn.Module):
    expansion: int = 1

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError("BasicBlock only supports groups=1 and base_width=64")
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

2.3.2 Bottleneck

Why it is designed this way

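From ResNet-50 onward, the residual block changes to keep the much deeper networks affordable: instead of two 3x3 convolutions, each block stacks three convolutions with kernels 1x1, 3x3 and 1x1, which reduces the number of parameters and the amount of computation. For the shortcut, the dimensions are still matched in one of two ways: 1. zero-padding to increase the dimensions; 2. a 1x1 convolution to increase the dimensions.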
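To make the saving concrete, the sketch below counts convolution weights (biases and BatchNorm parameters ignored) for the designs involved. The pairing of a 64-channel basic block with a 256-channel bottleneck is the comparison the paper draws when it notes that the two designs have similar time complexity; the helper name `conv_params` is illustrative.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a bias-free k x k convolution."""
    return c_in * c_out * k * k


# Basic block on 64 channels: two 3x3 convolutions, 64 -> 64 -> 64.
basic_64 = 2 * conv_params(64, 64, 3)

# Bottleneck on a 256-channel input: 1x1 256->64, 3x3 64->64, 1x1 64->256.
bottleneck_256 = conv_params(256, 64, 1) + conv_params(64, 64, 3) + conv_params(64, 256, 1)

# For comparison: two plain 3x3 convolutions applied directly to 256 channels.
basic_256 = 2 * conv_params(256, 256, 3)

print(basic_64, bottleneck_256, basic_256)  # 73728 69632 1179648
```

The bottleneck processes a 4x wider input for slightly fewer weights than the 64-channel basic block, and about 17x fewer than running 3x3 convolutions at full width.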

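Bottleneck blocks are also connected in two ways:

in one, the convolved output out is added directly to the identity (the input x);
in the other, out is added to the identity after it has been downsampled (projected) to match out's shape.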

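In the code, the only difference between these two connection types is the downsample argument: None for the first, a projection module for the second.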

from typing import Callable, Optional

import torch.nn as nn
from torch import Tensor

# conv1x1 and conv3x3 are the helpers defined in section 2.2.


class Bottleneck(nn.Module):
    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
    # while original implementation places the stride at the first 1x1 convolution(self.conv1)
    # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
    # This variant is also known as ResNet V1.5 and improves accuracy according to
    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.

    expansion: int = 4

    def __init__(
        self,
        inplanes: int,
        planes: int,
        stride: int = 1,
        downsample: Optional[nn.Module] = None,
        groups: int = 1,
        base_width: int = 64,
        dilation: int = 1,
        norm_layer: Optional[Callable[..., nn.Module]] = None,
    ) -> None:
        super().__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.0)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out
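Both block classes receive `downsample` from the outside. To our understanding, torchvision builds it while stacking blocks (in `ResNet._make_layer`) as a strided 1x1 convolution plus a norm layer whenever `stride != 1` or `inplanes != planes * expansion`. The hypothetical helper `make_downsample` below is a sketch of that rule, not torchvision's own function.

```python
import torch
import torch.nn as nn


def make_downsample(inplanes: int, planes: int, expansion: int, stride: int):
    """Hypothetical helper: projection shortcut only when the shape of out differs from x."""
    if stride != 1 or inplanes != planes * expansion:
        return nn.Sequential(
            nn.Conv2d(inplanes, planes * expansion, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(planes * expansion),
        )
    return None  # identity shortcut: out += x


# First Bottleneck of ResNet-50 layer1: 64 -> 256 channels at stride 1, so a projection is needed.
print(make_downsample(64, 64, expansion=4, stride=1) is not None)  # True
# Later blocks of layer1: 256 -> 256 at stride 1, so the shortcut is the identity.
print(make_downsample(256, 64, expansion=4, stride=1))             # None
# First block of layer2: 256 -> 512 channels and stride 2.
ds = make_downsample(256, 128, expansion=4, stride=2)
print(tuple(ds(torch.randn(2, 256, 8, 8)).shape))                 # (2, 512, 4, 4)
```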

Bottleneck usage details

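The only difference from the basic version is that there are three convolutions, 1x1, 3x3 and 1x1, which respectively compress the channels, process the features, and restore the channels. inplanes is the number of input channels, planes is the number of output channels, and expansion is a multiplier on the output channel count. In BasicBlock, expansion is 1, so it can be ignored entirely and the output channel count is simply planes. Bottleneck works differently: its job is to compress the channels and then expand them again, so planes no longer denotes the output channel count but the compressed channel count inside the block (exactly so when groups=1 and base_width=64, since width = int(planes * base_width / 64) * groups), and the output channel count becomes planes * expansion = 4 * planes. Next comes the main body of the network.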
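The channel bookkeeping can be checked with a few lines of plain Python. `bottleneck_channels` is an illustrative helper that applies the same `width` formula as the Bottleneck code; the grouped case (groups=32, base_width=4) corresponds to the ResNeXt-50 32x4d setting in torchvision.

```python
def bottleneck_channels(inplanes, planes, groups=1, base_width=64, expansion=4):
    """(in, out) channels of conv1 (1x1), conv2 (3x3) and conv3 (1x1) in a Bottleneck."""
    width = int(planes * (base_width / 64.0)) * groups
    return [(inplanes, width), (width, width), (width, planes * expansion)]


# First block of layer1 in ResNet-50: 64 input channels, planes=64.
print(bottleneck_channels(64, 64))    # [(64, 64), (64, 64), (64, 256)]
# First block of layer2: 256 input channels, planes=128.
print(bottleneck_channels(256, 128))  # [(256, 128), (128, 128), (128, 512)]
# Grouped variant: planes=64 but the internal width becomes 128.
print(bottleneck_channels(64, 64, groups=32, base_width=4))  # [(64, 128), (128, 128), (128, 256)]
```

With the default groups and base_width, the internal width equals planes; the output is always planes * expansion regardless of the width settings.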

2.4 Common ResNet Architectures

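Once the BasicBlock and Bottleneck structures above are understood, a ResNet is built simply by stacking them. The figure shows the structures of ResNets with five different depths.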

2.5 ResNet34 Detailed Structure
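Since the structure figure is not reproduced here, the sketch below traces the feature-map size and channel count through ResNet-34 for a 224x224 input, assuming the standard layout (7x7 stride-2 stem, 3x3 stride-2 max pooling, then four stages of 3, 4, 6 and 3 BasicBlocks with 64, 128, 256 and 512 channels). Only the first 3x3 convolution of stages 2 to 4 is strided.

```python
def out_size(w, k, s, p):
    """floor((W - F + 2P) / S) + 1"""
    return (w - k + 2 * p) // s + 1


def trace_resnet34(w=224):
    """(stage name, output channels, spatial size) after each stage of ResNet-34."""
    rows = []
    w = out_size(w, 7, 2, 3)  # conv1: 7x7, 64 channels, stride 2, padding 3
    rows.append(("conv1", 64, w))
    w = out_size(w, 3, 2, 1)  # 3x3 max pooling, stride 2, padding 1
    rows.append(("maxpool", 64, w))
    # layer1..layer4 hold 3, 4, 6, 3 BasicBlocks; (name, channels, stride of first block)
    for name, ch, stride in [("layer1", 64, 1), ("layer2", 128, 2),
                             ("layer3", 256, 2), ("layer4", 512, 2)]:
        w = out_size(w, 3, stride, 1)  # only the first 3x3 conv of the stage is strided
        rows.append((name, ch, w))
    return rows


for name, ch, w in trace_resnet34():
    print(f"{name:8s} {ch:4d} channels  {w}x{w}")
```

After layer4 the 7x7x512 map is average-pooled to 1x1x512 and fed to the fully connected classifier.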

3. Code

3.1 Handwritten Digit Recognition with ResNet18

image-classification/2015-ResNet/ResNet实现手写数字识别.ipynb · fakerlove/cv - Gitee - OSCHINA (gitee.com)

Creating the model

import os

import matplotlib.pyplot as plt
import torch
import torchvision
from torch import nn
from torch.autograd import Variable  # legacy: Variable is a no-op wrapper since PyTorch 0.4
from torch.nn import functional as F
from torch.utils.data import DataLoader


class ResBlk(nn.Module):
    """ResNet basic block"""

    def __init__(self, ch_in, ch_out, stride=1):
        """
        :param ch_in: number of input channels
        :param ch_out: number of output channels
        :param stride: stride of the first 3x3 convolution (2 roughly halves H and W)
        """
        super(ResBlk, self).__init__()
        self.conv1 = nn.Conv2d(ch_in, ch_out, kernel_size=(3, 3), stride=stride, padding=1)
        self.bn1 = nn.BatchNorm2d(ch_out)
        self.conv2 = nn.Conv2d(ch_out, ch_out, kernel_size=(3, 3), stride=(1, 1), padding=1)
        self.bn2 = nn.BatchNorm2d(ch_out)

        # Identity shortcut by default
        self.extra = nn.Sequential()
        # Project whenever the channel count OR the spatial size changes. Checking only
        # ch_out != ch_in would give blk4 (512 -> 512, stride 2) a 2x2 identity shortcut
        # that silently broadcasts against the 1x1 main path instead of raising an error.
        if ch_out != ch_in or stride != 1:
            # [b, ch_in, h, w] => [b, ch_out, h', w']
            self.extra = nn.Sequential(
                nn.Conv2d(ch_in, ch_out, kernel_size=(1, 1), stride=stride),
                nn.BatchNorm2d(ch_out),
            )

    def forward(self, x):
        """
        :param x: [batch_size, channel, height, width]
        :return: [batch_size, ch_out, height', width']
        """
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # shortcut: [b, ch_in, h, w] => [b, ch_out, h', w'], then element-wise add
        out = self.extra(x) + out
        out = F.relu(out)
        return out


class ResNet18(nn.Module):
    def __init__(self):
        super(ResNet18, self).__init__()
        # [b, 1, 28, 28] => [b, 64, 9, 9]
        self.conv1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=(3, 3), stride=(3, 3), padding=0),
            nn.BatchNorm2d(64),
        )
        # followed by 4 blocks, each with stride 2
        # [b, 64, h, w] => [b, 128, h', w']
        self.blk1 = ResBlk(64, 128, stride=2)
        # [b, 128, h, w] => [b, 256, h', w']
        self.blk2 = ResBlk(128, 256, stride=2)
        # [b, 256, h, w] => [b, 512, h', w']
        self.blk3 = ResBlk(256, 512, stride=2)
        # [b, 512, h, w] => [b, 512, h', w']
        self.blk4 = ResBlk(512, 512, stride=2)
        self.outlayer = nn.Linear(512 * 1 * 1, 10)

    def forward(self, x):
        """
        :param x: [b, 1, 28, 28] MNIST batch
        :return: [b, 10] class logits
        """
        # [b, 1, h, w] => [b, 64, h', w']
        x = F.relu(self.conv1(x))
        # [b, 64, h, w] => [b, 512, h', w']
        x = self.blk1(x)
        x = self.blk2(x)
        x = self.blk3(x)
        x = self.blk4(x)  # [b, 512, 1, 1] for a 28x28 input
        # Adaptive pooling to (1, 1) yields a 1x1 map whatever the incoming size is
        # [b, 512, h, w] => [b, 512, 1, 1]
        x = F.adaptive_avg_pool2d(x, [1, 1])
        x = x.view(x.size(0), -1)
        x = self.outlayer(x)
        return x
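A pure-Python trace of the spatial sizes through the ResNet18 class above for a 28x28 MNIST image. It shows why the shortcut needs a strided 1x1 projection whenever stride != 1: in blk4 the input is 2x2 while the main path outputs 1x1, so an identity shortcut (what a channel-only check would pick for 512 -> 512) would not match, and PyTorch's `+` would broadcast the two silently. The function names are illustrative.

```python
def out_size(w, k, s, p):
    """floor((W - F + 2P) / S) + 1"""
    return (w - k + 2 * p) // s + 1


def trace_mnist_resnet18(w=28):
    """Spatial size after each stage of the ResNet18 class above on a w x w input."""
    sizes = {"input": w}
    w = out_size(w, 3, 3, 0)  # conv1: 3x3, stride 3, padding 0
    sizes["conv1"] = w
    for blk in ("blk1", "blk2", "blk3", "blk4"):
        main = out_size(out_size(w, 3, 2, 1), 3, 1, 1)  # 3x3 stride 2, then 3x3 stride 1
        projected = out_size(w, 1, 2, 0)                # strided 1x1 shortcut
        assert main == projected, "shortcut and main path must have the same size"
        w = main
        sizes[blk] = w
    return sizes


print(trace_mnist_resnet18())
# {'input': 28, 'conv1': 9, 'blk1': 5, 'blk2': 3, 'blk3': 2, 'blk4': 1}
```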

Loading the data

path = r"./model"
if not os.path.exists(path):
    os.mkdir(path)


def get_dataloader(mode):
    """
    Build the MNIST DataLoader.
    :param mode: True for the training set, False for the test set
    :return: DataLoader yielding (images, labels)
    """
    # The dataset has already been downloaded here, so download=False.
    # 0.1307 and 0.3081 are the mean and standard deviation of MNIST, used for normalization.
    # MNIST has a single (grayscale) channel, so each tuple holds only one value.
    dataset = torchvision.datasets.MNIST(
        '../../data/mini',
        train=mode,
        download=False,
        transform=torchvision.transforms.Compose([
            torchvision.transforms.ToTensor(),
            torchvision.transforms.Normalize((0.1307,), (0.3081,)),
        ]),
    )
    return DataLoader(dataset, batch_size=64, shuffle=True)

Training and testing

# Assumed setup: the original does not show how model, loss_func and opt are created.
# CrossEntropyLoss matches the 10-class logits; the optimizer and learning rate are a guess.
model = ResNet18()
loss_func = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)


def train(epoch):
    model.train()
    loss_count = []
    # training set
    train_loader = get_dataloader(True)
    print("number of training batches:", len(train_loader))
    for i, (batch_x, batch_y) in enumerate(train_loader):
        # batch_x: [batch_size, 1, 28, 28] grayscale images, batch_y: [batch_size] labels
        out = model(batch_x)  # [batch_size, 10]
        loss = loss_func(out, batch_y)
        opt.zero_grad()  # clear the gradients left over from the previous step
        loss.backward()  # backpropagate to compute the gradients
        opt.step()  # apply the updates to the model parameters
        if i % 200 == 0:
            loss_count.append(loss.item())
            print('epoch {} --- batch {}:\t loss {}'.format(epoch, i, loss.item()))
    # save the trained weights for later use
    torch.save(model.state_dict(), r'./model/resnet_model.pkl')
    plt.figure('PyTorch_CNN_loss')
    plt.plot(range(len(loss_count)), loss_count, label='Loss')
    plt.title('PyTorch_CNN_loss')
    plt.legend()
    plt.show()


def test():
    model.eval()  # BatchNorm uses running statistics at test time
    accuracy_sum = []
    # test set
    test_loader = get_dataloader(False)
    with torch.no_grad():
        for index, (test_x, test_y) in enumerate(test_loader):
            out = model(test_x)
            accuracy = torch.max(out, 1)[1].numpy() == test_y.numpy()
            accuracy_sum.append(accuracy.mean())
            if index % 100 == 0:
                print('accuracy of batch {}:\t'.format(index), accuracy.mean())
    print('overall accuracy:\t', sum(accuracy_sum) / len(accuracy_sum))
    # accuracy plot
    plt.figure('Accuracy')
    plt.plot(range(len(accuracy_sum)), accuracy_sum, 'o', label='accuracy')
    plt.title('Pytorch_CNN_accuracy')
    plt.legend()
    plt.show()


for epoch in range(3):
    train(epoch)
test()

4. Common Questions (FAQ)

References

https://blog.csdn.net/weixin_43593330/article/details/107620042

