Table of Contents

  • HRNet CVPR2019
    • 1. Introduction
    • 2. Network Architecture
      • 2.1 Overview
      • 2.2 3x3 Convolution Block
      • 2.3 BasicBlock
      • 2.4 Three-Layer Residual Block (Bottleneck)
      • 2.5 HighResolutionNet
        • Structure initialization `__init__()`
        • Building inter-stage transition layers `_make_transition_layer()`
        • Building the stage1 layer `_make_layer()`
        • Building stage 2/3/4 layers `_make_stage()`
      • 2.6 The High-Resolution Module: HighResolutionModule
        • `_check_branches()`
        • Building a single branch: `_make_one_branch()`
        • forward
        • Building the multi-scale feature fusion layer: `_make_fuse_layers()`
        • The transition_layers function (the branch crossed out in the figure)
    • 3. Training
    • 4. Code
      • 4.1 Simplified version
      • 4.2 Original version

HRNet CVPR2019

HRNet is short for High-Resolution Net.

Paper link

Code link

Code link 2

1. Introduction

USTC and Microsoft Research Asia released a new human pose estimation model that set three new records on COCO and was accepted to CVPR 2019.

This network, named HRNet, has a distinctive parallel structure: it maintains a high-resolution representation throughout, rather than only recovering one from low-resolution representations. As a result, pose estimation improves noticeably:

On the three COCO tasks of keypoint detection, pose estimation, and multi-person pose estimation, HRNet surpassed all previous methods.

By changing the network head, the same backbone can also handle tasks such as segmentation and classification.

2. Network Architecture

2.1 Overview

Step 1: stem net

From the input image to a 1/4-size feature map. Once this feature map is produced, HRNet keeps a stream at this resolution for the rest of the network.
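For reference, a minimal sketch of the stem as it appears in the official repo: two 3x3, stride=2, 64-channel convolutions, each halving the resolution:

# stem net: IMG (B, 3, H, W) -> (B, 64, H/4, W/4)
self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
self.relu = nn.ReLU(inplace=True)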

Step 2: the 4 HRNet stages: four stages composed of HighResolutionModules, as shown in the figure below

  • each stage produces multi-scale feature maps
  • a transition structure sits at each junction between stages, connecting them by matching up channel counts and feature map sizes.

Step 3: segment head

Concatenate the 4 scales of features output by stage4, then add a num_channels -> num_classes layer to obtain the segmentation result, as sketched below.
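A rough sketch of this head, assuming the structure of the official segmentation repo (num_classes is hypothetical here; 48 + 96 + 192 + 384 = 720 concatenated channels for hrnet_w48):

# segment head: maps the concatenated 4-scale features to per-class scores
last_inp_channels = 48 + 96 + 192 + 384          # 720 for hrnet_w48
self.last_layer = nn.Sequential(
    nn.Conv2d(last_inp_channels, last_inp_channels, kernel_size=1),
    nn.BatchNorm2d(last_inp_channels, momentum=BN_MOMENTUM),
    nn.ReLU(inplace=True),
    nn.Conv2d(last_inp_channels, num_classes, kernel_size=1),  # num_channels -> num_classes
)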

2.2 3x3 Convolution Block

def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

2.3 BasicBlock

class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out

2.4 Three-Layer Residual Block (Bottleneck)

The `expansion` class attribute controls the block's output channel count: the final 1x1 convolution expands `planes` to `planes * expansion` output channels.

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion,
                                  momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
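As an illustration (a standalone, hypothetical instantiation; assumes torch and the classes above are in scope): with planes=64 the block outputs planes * expansion = 256 channels, so the 64-channel identity path must be projected with a matching downsample:

import torch
import torch.nn as nn

# 1x1 projection so the residual matches the widened 256-channel output
proj = nn.Sequential(
    nn.Conv2d(64, 64 * Bottleneck.expansion, kernel_size=1, bias=False),
    nn.BatchNorm2d(64 * Bottleneck.expansion, momentum=BN_MOMENTUM),
)
block = Bottleneck(inplanes=64, planes=64, downsample=proj)
out = block(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 256, 56, 56])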

2.5 HighResolutionNet

  1. The input image is first downsampled to 1/4 size.
  2. Run stage1 (4 Bottleneck blocks).
  3. A convolution creates a stream at 1/2 the resolution of the first stream, i.e. 1/8 of the input (now two streams).
  4. Run stage2's single module (4 blocks on each of the two streams, plus fusion between them).
  5. A convolution creates a stream at 1/4 the resolution of the first stream, i.e. 1/16 of the input (now three streams).
  6. Run stage3's 4 modules (4 blocks on each of the three streams, plus fusion across them).
  7. A convolution creates a stream at 1/8 the resolution of the first stream, i.e. 1/32 of the input (now four streams).
  8. Run stage4's 3 modules (4 blocks on each of the four streams, plus fusion across them).
  9. Upsample the lower three streams back to the size of the first stream and concatenate along the channel dimension for the downstream segmentation task (see the forward sketch after this list).
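A condensed sketch of the corresponding forward pass, adapted from the official repo (self.transition*, self.stage* and self.last_layer are built in __init__, covered below; F is torch.nn.functional):

def forward(self, x):
    # steps 1-2: stem (two stride-2 convs -> 1/4 size), then stage1
    x = self.relu(self.bn1(self.conv1(x)))
    x = self.relu(self.bn2(self.conv2(x)))
    x = self.layer1(x)

    # steps 3-4: transition1 splits into 2 streams; stage2 processes and fuses them
    x_list = [t(x) if t is not None else x for t in self.transition1]
    y_list = self.stage2(x_list)

    # steps 5-6: transition2 derives a third stream from the lowest-resolution one
    x_list = [self.transition2[i](y_list[-1]) if self.transition2[i] is not None
              else y_list[i] for i in range(len(self.transition2))]
    y_list = self.stage3(x_list)

    # steps 7-8: transition3 adds a fourth stream
    x_list = [self.transition3[i](y_list[-1]) if self.transition3[i] is not None
              else y_list[i] for i in range(len(self.transition3))]
    y_list = self.stage4(x_list)

    # step 9: upsample the lower streams, concat channels, run the segment head
    h, w = y_list[0].shape[2:]
    up = [F.interpolate(y, size=(h, w), mode='bilinear') for y in y_list[1:]]
    return self.last_layer(torch.cat([y_list[0]] + up, dim=1))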

Structure initialization __init__()

The HRNet class instantiates a concrete model from the structure specified in the config; the construction proceeds as follows:

def __init__(self, config, **kwargs):
    """
    # stem net
    # two 3x3 convs with stride=2, producing a 1/4-size feature map

    # the HRModule stages
    # each stage keeps the features of every previous size and adds one new
    # downsampled size:
    # stage1: [1/4]
    # stage2: [1/4, 1/8]
    # stage3: [1/4, 1/8, 1/16]
    # stage4: [1/4, 1/8, 1/16, 1/32]

    # last_layers, i.e. the segment head
    # maps num_channels to num_classes to produce the segmentation output
    """

Building inter-stage transition layers _make_transition_layer()

A transition layer performs the two conversions needed to connect consecutive stages:

  • input-channel conversion
  • feature-map downsampling
def _make_transition_layer(self, num_channels_pre_layer, num_channels_cur_layer):
    """
    :param num_channels_pre_layer: pre_stage output channels list
    :param num_channels_cur_layer: cur_stage output channels list

    cur always has one more output_channel than pre, corresponding to the
    newly added 1/2-downsampled stream:
                stage2      stage3          stage4
        pre:    [256]       [48,96]         [48,96,192]
        cur:    [48,96]     [48,96,192]     [48,96,192,384]
    the per-stage channel counts also reflect that stage 2/3/4 use the
    BASIC block (expansion=1)
    :return: transition_layers that
        1. match pre_layer output channels to cur_layer input channels
        2. match feature map sizes
    """
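The body is elided in the snippet above; the official implementation is roughly as follows:

def _make_transition_layer(self, num_channels_pre_layer, num_channels_cur_layer):
    num_branches_cur = len(num_channels_cur_layer)
    num_branches_pre = len(num_channels_pre_layer)

    transition_layers = []
    for i in range(num_branches_cur):
        if i < num_branches_pre:
            # existing branch: convert channels only if they differ
            if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
                transition_layers.append(nn.Sequential(
                    nn.Conv2d(num_channels_pre_layer[i], num_channels_cur_layer[i],
                              kernel_size=3, stride=1, padding=1, bias=False),
                    nn.BatchNorm2d(num_channels_cur_layer[i], momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)))
            else:
                transition_layers.append(None)
        else:
            # new branch: stride-2 conv(s) from the previous stage's last stream
            conv3x3s = []
            for j in range(i + 1 - num_branches_pre):
                inchannels = num_channels_pre_layer[-1]
                outchannels = num_channels_cur_layer[i] \
                    if j == i - num_branches_pre else inchannels
                conv3x3s.append(nn.Sequential(
                    nn.Conv2d(inchannels, outchannels,
                              kernel_size=3, stride=2, padding=1, bias=False),
                    nn.BatchNorm2d(outchannels, momentum=BN_MOMENTUM),
                    nn.ReLU(inplace=True)))
            transition_layers.append(nn.Sequential(*conv3x3s))

    return nn.ModuleList(transition_layers)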

Below is the concrete transition structure of hrnet_w48:

# stage 1-2
(transition1): ModuleList(
  # input channels, 1/4 -> 1/4: only the channel count is converted
  (0): Sequential(
    (0): Conv2d(256, 48, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(48, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  # input channels + downsample, 1/4 -> 1/8: channels are converted and
  # stride=2 downsamples the feature map
  (1): Sequential(
    (0): Sequential(
      (0): Conv2d(256, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
)
# stage 2-3
(transition2): ModuleList(
  (0): None  # feature map channels and size already match here, so no conversion is needed
  (1): None
  # downsample at the end of stage2, 1/8 -> 1/16, using stride=2
  (2): Sequential(
    (0): Sequential(
      (0): Conv2d(96, 192, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(192, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
)
# stage 3-4
(transition3): ModuleList(
  (0): None
  (1): None
  (2): None
  # same idea as stage2: the first 3 branches connect directly, so only the
  # new branch needs a 1/16 -> 1/32 downsample
  (3): Sequential(
    (0): Sequential(
      (0): Conv2d(192, 384, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
)

Building the stage1 layer _make_layer()

stage1 produces the 1/4 feature map and has no branch structure; it is built with a _make_layer() function identical to ResNet's.

def _make_layer(self, block, inplanes, planes, blocks, stride=1):
    """
    :param block: BasicBlock / Bottleneck
    :param inplanes: input channels
    :param planes: intermediate channels
    :param blocks: number of block repetitions in this layer
    :param stride: stride > 1 means the layer boundary downsamples, so a
                   downsample path is needed
    """
    downsample = None
    if stride != 1 or inplanes != planes * block.expansion:
        # when stride == 1 and inplanes == planes * block.expansion, the
        # block is internal to the layer and needs no downsample path
        downsample = nn.Sequential(
            nn.Conv2d(inplanes, planes * block.expansion,
                      kernel_size=1, stride=stride, bias=False),
            BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
        )

    layers = []
    layers.append(block(inplanes, planes, stride, downsample))
    inplanes = planes * block.expansion
    for i in range(1, blocks):
        layers.append(block(inplanes, planes))

    return nn.Sequential(*layers)
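In the official repo, stage1 is built with this function as follows (64-channel stem output, Bottleneck with expansion=4):

# stage1: 4 Bottleneck blocks at 1/4 resolution
self.layer1 = self._make_layer(Bottleneck, 64, 64, 4)
# input:  (B, 64, H/4, W/4)
# output: (B, 256, H/4, W/4), since 64 * Bottleneck.expansion = 256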

Building stage 2/3/4 layers _make_stage()

Stages 2/3/4 are the core of HRNet. They use HighResolutionModule, which contains the branch construction and the feature fusion module.

def _make_stage(self, layer_config, num_inchannels, multi_scale_output=True):
    """
    Creates num_modules HighResolutionModule structures; each module ends
    with HRNet's characteristic multi-scale fusion step.
    :param layer_config: the stage config read from the yaml config file
    :param num_inchannels: NUM_CHANNELS multiplied by block.expansion
    :param multi_scale_output: always True here
    :return: num_modules HighResolutionModules in series; inside each one
        the branches first run in parallel, then the different scales are
        cross-summed at the end
    """
    # e.g. stage2
    num_modules = layer_config['NUM_MODULES']    # 1, number of HighResolutionModule repetitions
    num_branches = layer_config['NUM_BRANCHES']  # 2, number of parallel branches (the "height")
    num_blocks = layer_config['NUM_BLOCKS']      # [4,4], block repetitions per branch
    num_channels = layer_config['NUM_CHANNELS']  # [48,96], channels per branch
    block = blocks_dict[layer_config['BLOCK']]   # BASIC
    fuse_method = layer_config['FUSE_METHOD']    # SUM, how multi-scale features are fused

    modules = []
    for i in range(num_modules):  # repeat the HighResolutionModule
        # multi_scale_output is only relevant for the last module
        if not multi_scale_output and i == num_modules - 1:
            reset_multi_scale_output = False
        else:
            reset_multi_scale_output = True
        modules.append(
            HighResolutionModule(num_branches,    # number of branches
                                 block,           # BASIC/BOTTLENECK
                                 num_blocks,      # blocks per branch
                                 num_inchannels,  # block feature width
                                 num_channels,
                                 fuse_method,
                                 reset_multi_scale_output)
        )
        num_inchannels = modules[-1].get_num_inchannels()

    return nn.Sequential(*modules), num_inchannels
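For reference, the stage2 entry of an hrnet_w48 yaml config corresponds to a layer_config roughly like this (values match the inline comments above):

stage2_cfg = {
    'NUM_MODULES': 1,          # HighResolutionModules in series
    'NUM_BRANCHES': 2,         # parallel streams: 1/4 and 1/8
    'BLOCK': 'BASIC',          # BasicBlock, expansion=1
    'NUM_BLOCKS': [4, 4],      # block repetitions per branch
    'NUM_CHANNELS': [48, 96],  # channels per branch
    'FUSE_METHOD': 'SUM',      # sum the rescaled branch features
}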

2.6 The High-Resolution Module: HighResolutionModule

Implements the part in the red box of the figure below: parallel multi-scale feature extraction in the branches, and fusion of the multi-scale features at the end via upsampling/downsampling.

class HighResolutionModule(nn.Module):
    def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
                 num_channels, fuse_method, multi_scale_output=True):
        super(HighResolutionModule, self).__init__()
        self._check_branches(
            num_branches, blocks, num_blocks, num_inchannels, num_channels)

        self.num_inchannels = num_inchannels
        self.fuse_method = fuse_method
        self.num_branches = num_branches
        self.multi_scale_output = multi_scale_output

        self.branches = self._make_branches(
            num_branches, blocks, num_blocks, num_channels)
        self.fuse_layers = self._make_fuse_layers()
        self.relu = nn.ReLU(False)

_check_branches()

This function checks that, within the high-resolution module, num_branches (an int) equals len(num_blocks), len(num_inchannels), and len(num_channels) (each a list of ints).

def _check_branches(self, num_branches, blocks, num_blocks,
                    num_inchannels, num_channels):
    if num_branches != len(num_blocks):
        error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
            num_branches, len(num_blocks))
        logger.error(error_msg)
        raise ValueError(error_msg)

    if num_branches != len(num_channels):
        error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
            num_branches, len(num_channels))
        logger.error(error_msg)
        raise ValueError(error_msg)

    if num_branches != len(num_inchannels):
        error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
            num_branches, len(num_inchannels))
        logger.error(error_msg)
        raise ValueError(error_msg)

Building a single branch: _make_one_branch()

Its job is to create one new branch, as shown in the figure:

def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
                     stride=1):
    downsample = None
    if stride != 1 or \
       self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
        downsample = nn.Sequential(
            nn.Conv2d(self.num_inchannels[branch_index],
                      num_channels[branch_index] * block.expansion,
                      kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(num_channels[branch_index] * block.expansion,
                           momentum=BN_MOMENTUM),
        )

    layers = []
    layers.append(block(self.num_inchannels[branch_index],
                        num_channels[branch_index], stride, downsample))
    self.num_inchannels[branch_index] = \
        num_channels[branch_index] * block.expansion
    for i in range(1, num_blocks[branch_index]):
        layers.append(block(self.num_inchannels[branch_index],
                            num_channels[branch_index]))

    return nn.Sequential(*layers)

_make_branches builds the stage's parallel branches: for each of the num_branches branches specified in the stage cfg, it calls the _make_one_branch() function above. Stages 2/3/4 have 2/3/4 branches respectively.

def _make_branches(self, num_branches, block, num_blocks, num_channels):
    """
    ModuleList of parallel branches
    :param num_branches: number of branches
    :param block: BASIC/BOTTLENECK
    :param num_blocks: block repetitions per branch
    :param num_channels: channels per branch
    """
    branches = []
    for i in range(num_branches):
        branches.append(  # add one branch of internal features, stride=1
            self._make_one_branch(i, block, num_blocks, num_channels, stride=1)
        )
    return nn.ModuleList(branches)  # ModuleList holds the parallel branches

forward

def forward(self, x):
    if self.num_branches == 1:
        return [self.branches[0](x[0])]

    # run each branch on its own scale
    for i in range(self.num_branches):
        x[i] = self.branches[i](x[i])

    # fuse: output i sums the (rescaled) features of every branch j
    x_fuse = []
    for i in range(len(self.fuse_layers)):
        y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
        for j in range(1, self.num_branches):
            if i == j:
                y = y + x[j]
            else:
                y = y + self.fuse_layers[i][j](x[j])
        x_fuse.append(self.relu(y))

    return x_fuse
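A quick shape check (hypothetical values, using the hrnet_w48 stage3 widths; assumes torch and the class above, including the fuse layers built below, are in scope):

import torch

module = HighResolutionModule(
    num_branches=3, blocks=BasicBlock, num_blocks=[4, 4, 4],
    num_inchannels=[48, 96, 192], num_channels=[48, 96, 192],
    fuse_method='SUM')

x = [torch.randn(1, 48, 128, 128),   # 1/4 scale
     torch.randn(1, 96, 64, 64),     # 1/8 scale
     torch.randn(1, 192, 32, 32)]    # 1/16 scale
y = module(x)
print([t.shape for t in y])  # the three input shapes are preserved,
                             # but each output now mixes all three scales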

Building the multi-scale feature fusion layer: _make_fuse_layers()

The feature fusion layer at the end of a HighResolutionModule.

Take the output of the blue branch in stage3 (the red box in the figure below) as an example: its output has to be converted into 4 different feature scales, one for the fusion point at the end of each branch.
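The construction itself, roughly as in the official repo: for output branch i and input branch j, j > i is brought up (1x1 conv to match channels, then nearest-neighbor upsampling), j == i passes through unchanged, and j < i is brought down by a chain of (i - j) stride-2 3x3 convs:

def _make_fuse_layers(self):
    if self.num_branches == 1:
        return None

    num_branches = self.num_branches
    num_inchannels = self.num_inchannels
    fuse_layers = []
    for i in range(num_branches if self.multi_scale_output else 1):
        fuse_layer = []
        for j in range(num_branches):
            if j > i:
                # lower-resolution j -> i: match channels, then upsample 2^(j-i)x
                fuse_layer.append(nn.Sequential(
                    nn.Conv2d(num_inchannels[j], num_inchannels[i],
                              kernel_size=1, bias=False),
                    nn.BatchNorm2d(num_inchannels[i], momentum=BN_MOMENTUM),
                    nn.Upsample(scale_factor=2 ** (j - i), mode='nearest')))
            elif j == i:
                fuse_layer.append(None)  # same scale: identity, handled in forward
            else:
                # higher-resolution j -> i: chain of (i - j) stride-2 3x3 convs
                conv3x3s = []
                for k in range(i - j):
                    if k == i - j - 1:
                        # the last conv also switches to branch i's channels, no ReLU
                        conv3x3s.append(nn.Sequential(
                            nn.Conv2d(num_inchannels[j], num_inchannels[i],
                                      kernel_size=3, stride=2, padding=1, bias=False),
                            nn.BatchNorm2d(num_inchannels[i], momentum=BN_MOMENTUM)))
                    else:
                        conv3x3s.append(nn.Sequential(
                            nn.Conv2d(num_inchannels[j], num_inchannels[j],
                                      kernel_size=3, stride=2, padding=1, bias=False),
                            nn.BatchNorm2d(num_inchannels[j], momentum=BN_MOMENTUM),
                            nn.ReLU(inplace=True)))
                fuse_layer.append(nn.Sequential(*conv3x3s))
        fuse_layers.append(nn.ModuleList(fuse_layer))

    return nn.ModuleList(fuse_layers)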