Action Recognition 0-10: mmaction2 (SlowFast) - Exhaustive Source Code Walkthrough (6) - Model Construction Overview

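The link below collects all of my notes on mmaction2 (SlowFast action recognition). If you spot a mistake, please point it out and I will correct it right away. Anyone interested is welcome to add me on WeChat (17575010159) to discuss the technical details. If this post helped you, please remember to give it a like, because that is the biggest encouragement for me. At the end of the post you will find my official account, which offers a large collection of resources.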

Action Recognition 0-00: mmaction2 (SlowFast) - Table of Contents - The Latest Exhaustive Walkthrough

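A commercial-grade project I strongly recommend: this is a behavior analysis project that I have deployed in production. It consists of three main modules (1. pedestrian detection, 2. pedestrian tracking, 3. behavior recognition): Behavior Analysis (Commercial Grade) 00 - Table of Contents - The Latest Exhaustive Walkthrough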

Preface

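From the previous posts we already know the overall training flow, how the dataset is loaded, and what the data we receive looks like. The next step is to analyze the model itself: how it is built, how it is trained, how the loss is computed, how testing is performed, and so on. We start with the file we modified, configs/recognition/slowfast/my_slowfast_r50_4x16x1_256e_ucf101_rgb.py, which contains the following key code: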

model = dict(
    type='Recognizer3D',  # 3D recognizer (adds a temporal dimension compared with 2D)
    backbone=dict(  # backbone configuration
        type='ResNet3dSlowFast',
        # ......
        slow_pathway=dict(  # slow pathway
            type='resnet3d',  # built on the resnet3d network
            # ......
        ),
        fast_pathway=dict(  # fast pathway
            type='resnet3d',  # built on the resnet3d network
            # ......
        )),
    cls_head=dict(  # classification head
        type='SlowFastHead',
        # ......
    ))

All of the parameters above matter, so in what follows we use them as a guide to analyze how the model is built.
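Before reading the real class, it may help to see how a `type` string in a config dict can end up selecting a class. The snippet below is a minimal sketch of that registry pattern, not mmaction2's or mmcv's actual implementation: `Registry`, `TOY_BACKBONES`, and `ToySlowFast` are hypothetical names made up for illustration.

```python
class Registry:
    """Toy name -> class table (illustrative stand-in, not the mmcv class)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Used as a decorator: stores the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        args = dict(cfg)  # copy so the caller's config is left untouched
        cls = self._modules[args.pop('type')]
        return cls(**args)  # remaining keys become constructor arguments


TOY_BACKBONES = Registry('backbone')


@TOY_BACKBONES.register_module()
class ToySlowFast:
    """Hypothetical backbone that only stores its settings."""

    def __init__(self, resample_rate=8, speed_ratio=8):
        self.resample_rate = resample_rate
        self.speed_ratio = speed_ratio


backbone = TOY_BACKBONES.build(
    dict(type='ToySlowFast', resample_rate=8, speed_ratio=8))
print(type(backbone).__name__, backbone.resample_rate)
```

The point of the pattern is that the config file stays plain data: swapping the backbone only means changing the `type` string and its keyword arguments.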

ResNet3dSlowFast

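We first look at the backbone dict, which contains the parameter type='ResNet3dSlowFast'. In mmaction/models/backbones/resnet3d_slowfast.py we can find the class ResNet3dSlowFast(nn.Module). My annotated version follows (a guided analysis comes afterwards and can be read together with the comments). For now, please do not dig into how each function is implemented; a rough idea of what each one does is enough: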

# Register the class in the BACKBONES registry
@BACKBONES.register_module()
class ResNet3dSlowFast(nn.Module):
    """Slowfast backbone.

    This module is proposed in `SlowFast Networks for Video Recognition
    <https://arxiv.org/abs/1812.03982>`_

    Args:
        # Depth of the resnet, one of {18, 34, 50, 101, 152}
        depth (int): Depth of resnet, from {18, 34, 50, 101, 152}.
        # Path to the pretrained model
        pretrained (str): The file path to a pretrained model.
        # tau, the parameter tau in the paper
        resample_rate (int): A large temporal stride ``resample_rate``
            on input frames, corresponding to the :math:`\\tau` in the paper.
            i.e., it processes only one out of ``resample_rate`` frames.
            Default: 16.
        # alpha, the parameter alpha in the paper
        speed_ratio (int): Speed ratio indicating the ratio between time
            dimension of the fast and slow pathway, corresponding to the
            :math:`\\alpha` in the paper. Default: 8.
        channel_ratio (int): Reduce the channel number of fast pathway
            by ``channel_ratio``, corresponding to :math:`\\beta` in the paper.
            Default: 8.
        slow_pathway (dict): Configuration of slow branch, should contain
            necessary arguments for building the specific type of pathway
            and:
            type (str): type of backbone the pathway bases on.
            lateral (bool): determine whether to build lateral connection
            for the pathway.
            Default:

            .. code-block:: Python

                dict(type='ResNetPathway',
                lateral=True, depth=50, pretrained=None,
                conv1_kernel=(1, 7, 7), dilations=(1, 1, 1, 1),
                conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1))

        fast_pathway (dict): Configuration of fast branch, similar to
            `slow_pathway`. Default:

            .. code-block:: Python

                dict(type='ResNetPathway',
                lateral=False, depth=50, pretrained=None, base_channels=8,
                conv1_kernel=(5, 7, 7), conv1_stride_t=1, pool1_stride_t=1)
    """

    def __init__(self,
                 pretrained,  # path to a pretrained checkpoint, or None
                 resample_rate=8,  # the parameter tau in the paper
                 speed_ratio=8,  # the parameter alpha in the paper
                 channel_ratio=8,  # the inverse of beta in the paper
                 slow_pathway=dict(  # slow pathway configuration
                     type='resnet3d',  # built on the resnet3d network
                     depth=50,  # network depth of 50
                     pretrained=None,  # pretrained checkpoint for this pathway
                     lateral=True,  # whether to build lateral connections
                     conv1_kernel=(1, 7, 7),  # kernel size of the first conv layer
                     dilations=(1, 1, 1, 1),
                     conv1_stride_t=1,  # temporal stride of the first conv layer
                     pool1_stride_t=1,  # temporal stride of the first pooling layer
                     inflate=(0, 0, 1, 1)),
                 fast_pathway=dict(  # fast pathway configuration
                     type='resnet3d',  # built on the resnet3d network
                     depth=50,
                     pretrained=None,  # pretrained checkpoint for this pathway
                     lateral=False,  # whether to build lateral connections
                     base_channels=8,  # base number of channels
                     conv1_kernel=(5, 7, 7),  # kernel size of the first conv layer
                     conv1_stride_t=1,  # temporal stride of the first conv layer
                     pool1_stride_t=1)):  # temporal stride of the first pooling layer
        super().__init__()
        # Store the arguments
        self.pretrained = pretrained
        self.resample_rate = resample_rate
        self.speed_ratio = speed_ratio
        self.channel_ratio = channel_ratio

        # If the slow pathway uses lateral connections, pass it speed_ratio
        # (alpha in the paper) and channel_ratio (the inverse of beta)
        if slow_pathway['lateral']:
            slow_pathway['speed_ratio'] = speed_ratio
            slow_pathway['channel_ratio'] = channel_ratio

        # Build the slow and fast pathways
        self.slow_path = build_pathway(slow_pathway)
        self.fast_path = build_pathway(fast_pathway)

    # Initialize the weights, loading a pretrained model if one is given
    def init_weights(self):
        """Initiate the parameters either from existing checkpoint or from
        scratch."""
        if isinstance(self.pretrained, str):
            logger = get_root_logger()
            msg = f'load model from: {self.pretrained}'
            print_log(msg, logger=logger)
            # Directly load 3D model.
            load_checkpoint(self, self.pretrained, strict=True, logger=logger)
        elif self.pretrained is None:
            # Init two branch separately.
            self.fast_path.init_weights()
            self.slow_path.init_weights()
        else:
            raise TypeError('pretrained must be a str or None')

    def forward(self, x):
        """Defines the computation performed at every call.

        Args:
            x (torch.Tensor): The input data, i.e. the video frames after
                preprocessing and augmentation.

        Returns:
            tuple[torch.Tensor]: The feature of the input
                samples extracted by the backbone.
        """
        # Sample frames with a stride of self.resample_rate (default 8, tau in the paper)
        # x[b,3,clip_len,h,w] --> x_slow[b,3,clip_len/self.resample_rate,h,w]
        # With my settings: x[4,3,16,224,224] --> x_slow[4,3,2,224,224]
        # 3 is the number of input channels per frame; h and w are the spatial dimensions
        # The 16 in x[4,3,16,224,224] and the 2 in x_slow[4,3,2,224,224] are the temporal dimension
        x_slow = x[:, :, ::self.resample_rate, :, :]
        # [b,3,clip_len/self.resample_rate,h,w] --> [b,64,clip_len/self.resample_rate,h/2,w/2]
        # With my settings: [4,3,2,224,224] --> [4,64,2,112,112]
        x_slow = self.slow_path.conv1(x_slow)
        # [b,64,clip_len/self.resample_rate,h/2,w/2] --> [b,64,clip_len/self.resample_rate,h/4,w/4]
        # [b,64,2,112,112] --> [b,64,2,56,56]
        x_slow = self.slow_path.maxpool(x_slow)

        # Sample frames with a stride of self.resample_rate (default 8, tau in the paper)
        # divided by self.speed_ratio (default 8, alpha in the paper)
        # x[b,3,clip_len,h,w] --> x_fast[b,3,clip_len/(self.resample_rate//self.speed_ratio),h,w]
        # With my settings: x[4,3,16,224,224] --> x_fast[4,3,16,224,224]
        # 3 is the number of input channels per frame; h and w are the spatial dimensions
        # The 16 in x[b,3,16,224,224] and in x_fast[b,3,16,224,224] is the temporal dimension
        x_fast = x[:, :, ::self.resample_rate // self.speed_ratio, :, :]
        # x_fast[b,3,T_fast,h,w] --> x_fast[b,8,T_fast,h/2,w/2]
        # [4,3,16,224,224] --> [4,8,16,112,112]
        x_fast = self.fast_path.conv1(x_fast)
        # x_fast[b,8,T_fast,h/2,w/2] --> x_fast[b,8,T_fast,h/4,w/4]
        # [4,8,16,112,112] --> [4,8,16,56,56]
        x_fast = self.fast_path.maxpool(x_fast)

        # If the slow pathway uses lateral connections, transform the fast
        # features and fuse them into the slow features
        if self.slow_path.lateral:
            # Pass x_fast through the slow pathway's conv1_lateral to get x_fast_lateral
            # x_fast[b,8,16,56,56] --> x_fast_lateral[b,16,2,56,56]
            x_fast_lateral = self.slow_path.conv1_lateral(x_fast)
            # Concatenate: x_slow[b,64,2,56,56] + x_fast_lateral[b,16,2,56,56] --> x_slow[b,80,2,56,56]
            x_slow = torch.cat((x_slow, x_fast_lateral), dim=1)

        # self.slow_path.res_layers = ['layer1', 'layer2', 'layer3', 'layer4']
        for i, layer_name in enumerate(self.slow_path.res_layers):
            # Fetch one res_layer of the slow pathway per iteration
            res_layer = getattr(self.slow_path, layer_name)
            # Feed x_slow through res_layer to get the new x_slow:
            # i=0 : x_slow[b, 80,   2, 56, 56] --> [b, 256,  2, 56, 56]
            # i=1 : x_slow[b, 320,  2, 56, 56] --> [b, 512,  2, 28, 28]
            # i=2 : x_slow[b, 640,  2, 28, 28] --> [b, 1024, 2, 14, 14]
            # i=3 : x_slow[b, 1280, 2, 14, 14] --> [b, 2048, 2, 7,  7 ]
            x_slow = res_layer(x_slow)
            # Fetch the matching res_layer of the fast pathway
            res_layer_fast = getattr(self.fast_path, layer_name)
            # Feed x_fast through res_layer_fast to get the new x_fast:
            # i=0 : x_fast[b, 8,   16, 56, 56] --> [b, 32,  16, 56, 56]
            # i=1 : x_fast[b, 32,  16, 56, 56] --> [b, 64,  16, 28, 28]
            # i=2 : x_fast[b, 64,  16, 28, 28] --> [b, 128, 16, 14, 14]
            # i=3 : x_fast[b, 128, 16, 14, 14] --> [b, 256, 16, 7,  7 ]
            x_fast = res_layer_fast(x_fast)

            # If this is not the last stage and the slow pathway uses lateral connections
            if (i != len(self.slow_path.res_layers) - 1
                    and self.slow_path.lateral):
                # No fusion needed in the final stage
                # For every other stage, use the slow pathway's lateral_connections
                lateral_name = self.slow_path.lateral_connections[i]
                conv_lateral = getattr(self.slow_path, lateral_name)
                # Feed x_fast through conv_lateral to get x_fast_lateral
                # i=0 : x_fast[b, 32,  16, 56, 56] --> x_fast_lateral[b, 64,  2, 56, 56]
                # i=1 : x_fast[b, 64,  16, 28, 28] --> x_fast_lateral[b, 128, 2, 28, 28]
                # i=2 : x_fast[b, 128, 16, 14, 14] --> x_fast_lateral[b, 256, 2, 14, 14]
                x_fast_lateral = conv_lateral(x_fast)
                # i=0 : x_slow[b,256, 2,56,56] + x_fast_lateral[b,64, 2,56,56] --> x_slow[b,320, 2,56,56]
                # i=1 : x_slow[b,512, 2,28,28] + x_fast_lateral[b,128,2,28,28] --> x_slow[b,640, 2,28,28]
                # i=2 : x_slow[b,1024,2,14,14] + x_fast_lateral[b,256,2,14,14] --> x_slow[b,1280,2,14,14]
                x_slow = torch.cat((x_slow, x_fast_lateral), dim=1)

        # x_slow[b, 2048, 2, 7, 7], x_fast[b, 256, 16, 7, 7]
        out = (x_slow, x_fast)

        return out
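The channel counts annotated in forward can be cross-checked with a little arithmetic. The sketch below is pure bookkeeping rather than library code. It assumes a slow stem of 64 channels, a bottleneck expansion of 4 as in ResNet-50, and lateral convolutions that double the fast channel count (read off the annotated shapes, e.g. 8 to 16 and 32 to 64). `trace_slow_channels` is a hypothetical helper name.

```python
def trace_slow_channels(slow_stem=64, channel_ratio=8,
                        num_stages=4, expansion=4):
    """Slow-path channel count after the stem fusion and after each stage."""
    fast_stem = slow_stem // channel_ratio        # 8 channels in the fast stem
    trace = [slow_stem + 2 * fast_stem]           # fusion after the stem
    for i in range(num_stages):
        slow_out = slow_stem * expansion * 2 ** i  # 256, 512, 1024, 2048
        fast_out = slow_out // channel_ratio       # 32, 64, 128, 256
        is_last = i == num_stages - 1
        # Assumed: the lateral conv doubles the fast channels before concat,
        # and the final stage is not fused.
        trace.append(slow_out if is_last else slow_out + 2 * fast_out)
    return trace


print(trace_slow_channels())  # [80, 320, 640, 1280, 2048]
```

The first four values are the input widths that layer1 to layer4 of the slow pathway receive, and the last value is the width of the final slow feature map.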

Comparison with the Paper

First, look at the following code (the beginning of the forward pass):

        x_slow = x[:, :, ::self.resample_rate, :, :]
        x_slow = self.slow_path.conv1(x_slow)
        x_slow = self.slow_path.maxpool(x_slow)

        x_fast = x[:, :, ::self.resample_rate // self.speed_ratio, :, :]
        x_fast = self.fast_path.conv1(x_fast)
        x_fast = self.fast_path.maxpool(x_fast)

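It corresponds to the data layer, conv1, and pool1 rows of Table 1 in the paper (the part circled in red in the original figure).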

The rest of def forward(self, x) corresponds to the res2 to res5 rows of the same table. Note that this does not include the global average pool, concate, fc, or classes rows.

From the comments, you should have noticed the following points:

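1. The input has shape x[b,3,clip_len,h,w]. slow_path samples frames with a stride of resample_rate = 8 = τ, while fast_path samples with a stride of self.resample_rate / self.speed_ratio (α) = 8/8 = 1. fast_path therefore keeps 8 times as many frames along the time axis as slow_path.
2. During lateral fusion, the shape of the slow_path features is generally left unchanged. It is the fast_path output that gets reshaped so that it matches slow_path. By default the lateral connection is conv_lateral, which is simply a convolution layer.
3. fast_path and slow_path always have the same spatial resolution (height and width), but in channel count the output of each fast_path stage always stays at one eighth of the corresponding slow_path stage (for example 32 versus 256 after layer1).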
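Points 1 and 2 can be checked with plain Python. The sketch below reproduces the two temporal strides on a 16-frame clip and then applies the standard convolution output-length formula to the lateral connection. The temporal kernel size of 5 with padding 2 is an assumption made for illustration (the listing above does not show it); the stride equal to speed_ratio follows from the annotated shapes, where 16 frames become 2. `sample_indices` and `conv_out_len` are hypothetical helper names.

```python
def sample_indices(clip_len, stride):
    """Frame indices kept by x[:, :, ::stride] along the time axis."""
    return list(range(clip_len))[::stride]


def conv_out_len(length, kernel, stride, padding):
    """Standard output-length formula for a strided convolution."""
    return (length + 2 * padding - kernel) // stride + 1


clip_len, resample_rate, speed_ratio = 16, 8, 8

slow_idx = sample_indices(clip_len, resample_rate)
fast_idx = sample_indices(clip_len, resample_rate // speed_ratio)

# Assumed lateral conv: temporal kernel 5, padding 2, stride = speed_ratio
lateral_len = conv_out_len(len(fast_idx), kernel=5,
                           stride=speed_ratio, padding=2)

print(slow_idx)       # [0, 8]
print(len(fast_idx))  # 16
print(lateral_len)    # 2
```

With these numbers the fast pathway sees 16 / 2 = 8 times as many frames as the slow pathway, and the strided lateral convolution brings its temporal length back to 2 so that the two feature maps can be concatenated along the channel axis.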

Conclusion

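After reading the comments above, I believe you now have a reasonable picture of the overall model architecture. Some implementation details are probably still unclear, though, such as the following: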

class ResNet3dSlowFast(nn.Module):
    def __init__(self,
                 # .......
                 ):
        # .......
        # Build the slow and fast pathways
        self.slow_path = build_pathway(slow_pathway)
        self.fast_path = build_pathway(fast_pathway)

    def forward(self, x):
        x_slow = self.slow_path.conv1(x_slow)
        x_fast = self.fast_path.conv1(x_fast)
        # self.slow_path.res_layers = ['layer1', 'layer2', 'layer3', 'layer4']
        for i, layer_name in enumerate(self.slow_path.res_layers):
            if (i != len(self.slow_path.res_layers) - 1 and self.slow_path.lateral):
                # .......

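For example, how build_pathway and slow_path.conv1 are actually implemented is not yet clear. The following posts will analyze these details one by one. Remember to give me a like; getting this far was not easy for you either. Bye for now, and see you in the next post.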