BiSeNetV1 Face Segmentation

1. Paper

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

https://arxiv.org/abs/1808.00897.pdf

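The paper observes that reaching real-time inference speed by reducing spatial resolution degrades accuracy. To address this, it proposes the Bilateral Segmentation Network (BiSeNet), which consists of two parts: a Spatial Path and a Context Path.

Spatial Path: preserves spatial information and generates high-resolution features.

Context Path: uses a fast downsampling strategy to obtain a sufficiently large receptive field.

The authors summarize three approaches that real-time semantic segmentation methods use to accelerate a model:

① Restrict the input size by cropping or resizing to reduce computational complexity. Although simple and effective, the loss of spatial detail corrupts the prediction, especially near boundaries, which lowers accuracy both in the metrics and in the visualizations.

② Prune the network's channels to speed up inference, especially in the early stages of the base model. However, this weakens the spatial capacity.

③ ENet proposes dropping the last stage of the model in pursuit of a very compact framework. The drawback is clear: because ENet abandons the downsampling operations of the last stage, the model's receptive field is not large enough to cover large objects, which results in poor discriminative ability.

For details, see: https://blog.csdn.net/sinat_17456165/article/details/106152907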

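2. Network architecture

Modules

① Spatial Path: a stack of Conv+BN+ReLU blocks. The first three convolutions use stride 2, so the output is 1/8 of the input resolution; the code below ends with a 1×1, stride-1 convolution.

Characteristics: shallow network, wide channels. Purpose: retain rich spatial information and generate high-resolution features.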

import torch.nn as nn


class SpatialPath(nn.Module):
    def __init__(self):
        super(SpatialPath, self).__init__()
        # ConvBNRelu(in_channels, out_channels, kernel_size, stride, padding)
        self.cbnr1 = ConvBNRelu(3, 64, 7, 2, 3)
        self.cbnr2 = ConvBNRelu(64, 64, 3, 2, 1)
        self.cbnr3 = ConvBNRelu(64, 64, 3, 2, 1)
        self.cbnr4 = ConvBNRelu(64, 128, 1, 1, 0)
        self.init_weight()

    def init_weight(self):
        # modules() is used instead of children() so that the Conv2d layers
        # nested inside each ConvBNRelu block are actually reached.
        for ly in self.modules():
            if isinstance(ly, nn.Conv2d):
                nn.init.kaiming_normal_(ly.weight, a=1)
                if ly.bias is not None:
                    nn.init.constant_(ly.bias, 0)

    def forward(self, x):
        x = self.cbnr1(x)
        x = self.cbnr2(x)
        x = self.cbnr3(x)
        x = self.cbnr4(x)
        return x

    def get_params(self):
        # Split parameters into those that receive weight decay (conv/linear
        # weights) and those that do not (biases and BatchNorm parameters).
        wd_params, nowd_params = [], []
        for name, module in self.named_modules():
            if isinstance(module, (nn.Linear, nn.Conv2d)):
                wd_params.append(module.weight)
                if module.bias is not None:
                    nowd_params.append(module.bias)
            elif isinstance(module, nn.BatchNorm2d):
                nowd_params += list(module.parameters())
        return wd_params, nowd_params
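The `ConvBNRelu` helper used by `SpatialPath` is not shown in this post. The following is a minimal sketch of what it might look like, assuming the positional argument order (in_channels, out_channels, kernel_size, stride, padding) implied by the calls above; the attribute names and `bias=False` are assumptions, not the original implementation.

```python
import torch
import torch.nn as nn


class ConvBNRelu(nn.Module):
    """Conv2d -> BatchNorm2d -> ReLU (hypothetical helper; argument order assumed)."""

    def __init__(self, in_chan, out_chan, ks=3, stride=1, padding=1):
        super().__init__()
        # The conv bias is omitted because BatchNorm provides its own shift term.
        self.conv = nn.Conv2d(in_chan, out_chan, kernel_size=ks,
                              stride=stride, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_chan)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


# Same layer configuration as SpatialPath: three stride-2 blocks, then a 1x1 block.
stem = nn.Sequential(
    ConvBNRelu(3, 64, 7, 2, 3),
    ConvBNRelu(64, 64, 3, 2, 1),
    ConvBNRelu(64, 64, 3, 2, 1),
    ConvBNRelu(64, 128, 1, 1, 0),
)
with torch.no_grad():
    out = stem(torch.randn(1, 3, 512, 512))
print(out.shape)  # 512 -> 256 -> 128 -> 64, i.e. 1/8 of the input resolution
```

With a 512×512 input, the three stride-2 convolutions halve the resolution three times, so the Spatial Path output is 64×64 with 128 channels.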

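② Context Path: a lightweight backbone (ResNet18, Xception39, etc.) combined with ARM (Attention Refinement Module).

Characteristics: deep network. Purpose: obtain a sufficiently large receptive field.

Taking ResNet18 as an example:

If you do not use the model library in torchvision, rewrite the ResNet18 network yourself and load the torchvision pretrained weights into it.

The parameter names in your network may differ, but the layers must match in number and order, because the weights are assigned by position.

Initialization: loading the pretrained parameters.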

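def init_weight(self):
    # Build the torchvision ResNet18, drop its classifier, and load the checkpoint.
    model = resnet18(pretrained=False)
    model.fc = None
    model.load_state_dict(torch.load('resnet18-5c106cde.pth'))
    # state_dict() returns a new dict on every call, so keep it in a temporary
    # variable; otherwise the updated values are lost.
    self_state_dict = self.state_dict()
    pretrained_values = []
    for k, v in model.state_dict().items():
        pretrained_values.append(v)
    # Copy tensors by position, so the parameter names do not have to match.
    for i, (k, v) in enumerate(self_state_dict.items()):
        self_state_dict.update({k: pretrained_values[i]})
    self.load_state_dict(self_state_dict)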

ARM module:

Refines the features of the Context Path. Characteristic: negligible extra computation.
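A minimal sketch of an attention refinement module in the spirit of the BiSeNet paper: global average pooling, a 1×1 convolution, batch normalization, and a sigmoid produce one weight per channel, which rescales the input feature map. The class name and layer layout here are assumptions for illustration and may differ from the implementation this post is based on.

```python
import torch
import torch.nn as nn


class AttentionRefinementModule(nn.Module):
    """Channel attention computed from globally pooled features (illustrative sketch)."""

    def __init__(self, channels):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (N, C, 1, 1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        attn = self.sigmoid(self.bn(self.conv(self.pool(x))))
        return x * attn  # broadcast the per-channel weight over H and W


# eval() so that BatchNorm uses running statistics on the 1x1 pooled tensor.
arm = AttentionRefinementModule(128).eval()
x = torch.randn(2, 128, 16, 16)
with torch.no_grad():
    out = arm(x)
print(out.shape)
```

Because the attention vector is computed on a 1×1 pooled tensor, the extra cost is small compared with the backbone convolutions.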

③ FFM (Feature Fusion Module)

Its main job is to fuse the feature maps produced by the two paths.
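A minimal sketch of a feature fusion module along the lines described in the BiSeNet paper: the two feature maps are concatenated, passed through a Conv+BN+ReLU block, and then reweighted by a channel-attention branch whose output is added back to the fused features. The class name, the `reduction` factor, and the channel sizes are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class FeatureFusionModule(nn.Module):
    """Concatenate two feature maps and reweight the result per channel (sketch)."""

    def __init__(self, in_chan, out_chan, reduction=4):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_chan, out_chan, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_chan),
            nn.ReLU(inplace=True),
        )
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_chan, out_chan // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_chan // reduction, out_chan, kernel_size=1, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, sp_feat, cp_feat):
        # Both inputs must share the same spatial size before concatenation.
        feat = self.fuse(torch.cat([sp_feat, cp_feat], dim=1))
        return feat + feat * self.attn(feat)


ffm = FeatureFusionModule(256, 256).eval()
sp_feat = torch.randn(2, 128, 16, 16)  # stand-in for the Spatial Path output
cp_feat = torch.randn(2, 128, 16, 16)  # stand-in for the Context Path output
with torch.no_grad():
    fused = ffm(sp_feat, cp_feat)
print(fused.shape)
```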

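3. Dataset

The CelebAMask-HQ face segmentation dataset contains 30,000 face images, together with segmentation masks for the individual parts of each face.

The dataset has 19 segmentation labels (including background): 'skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r', 'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat'.

The mask images are 24-bit PNGs, with a separate file for each class label. They need to be quantized and merged into a single 8-bit PNG per image.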

#!/usr/bin/python
# -*- encoding: utf-8 -*-
import os.path as osp
import os
import cv2
import numpy as np
from PIL import Image

face_data = '/data/CelebAMask-HQ/CelebA-HQ-img'
face_sep_mask = '/data/CelebAMask-HQ/CelebAMask-HQ-mask-anno'
mask_path = '/data/CelebAMask-HQ/mask'
counter = 0
total = 0
# The annotations are stored in 15 sub-folders of 2000 images each.
for i in range(15):
    atts = ['skin', 'l_brow', 'r_brow', 'l_eye', 'r_eye', 'eye_g', 'l_ear', 'r_ear', 'ear_r',
            'nose', 'mouth', 'u_lip', 'l_lip', 'neck', 'neck_l', 'cloth', 'hair', 'hat']
    for j in range(i * 2000, (i + 1) * 2000):
        # Start from an all-background (0) mask and paint each class on top.
        mask = np.zeros((512, 512))
        for l, att in enumerate(atts, 1):
            total += 1
            file_name = ''.join([str(j).rjust(5, '0'), '_', att, '.png'])
            path = osp.join(face_sep_mask, str(i), file_name)
            if os.path.exists(path):
                counter += 1
                sep_mask = np.array(Image.open(path).convert('P'))
                # print(np.unique(sep_mask))
                # After convert('P'), the foreground pixels carry palette index 225.
                mask[sep_mask == 225] = l
        cv2.imwrite('{}/{}.png'.format(mask_path, j), mask)
        print(j)
print(counter, total)

[Figure: merged mask image]

Dataset split: train : test = 9 : 1
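The post does not say how the 9:1 split is drawn. One simple option, sketched here under the assumption that images are identified by the integer ids 0 to 29999 used in the merged mask file names, is a seeded random shuffle; the function name and the seed are illustrative.

```python
import random


def split_ids(num_images, train_ratio=0.9, seed=0):
    """Shuffle the image ids reproducibly and cut them into train and test lists."""
    ids = list(range(num_images))
    random.Random(seed).shuffle(ids)
    cut = round(num_images * train_ratio)
    return ids[:cut], ids[cut:]


train_ids, test_ids = split_ids(30000)
print(len(train_ids), len(test_ids))  # 27000 3000
```

Fixing the seed keeps the split identical across runs, so test results remain comparable.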

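4. Loading the dataset

① Data augmentation

The augmentation methods are random cropping, mirroring, scaling, and color-space augmentation.

Random cropping: the same crop is applied to the image and its mask.

Mirroring: the image and its mask are flipped together, and the left/right labels in the mask are swapped: eyes, eyebrows, and ears.

Scaling: the same scale is applied to the image and its mask.

Color space: the saturation, contrast, and brightness of the image are adjusted (the mask is left unchanged).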
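The mirroring step is the one that is easy to get wrong, because a flipped left eye must be relabeled as a right eye. Below is a minimal NumPy sketch; the label ids follow the `enumerate(atts, 1)` order of the mask-merging script (l_brow=2, r_brow=3, l_eye=4, r_eye=5, l_ear=7, r_ear=8), and the function name is hypothetical.

```python
import numpy as np

# (left, right) label ids that must be exchanged after a horizontal flip.
SWAP_PAIRS = [(2, 3), (4, 5), (7, 8)]  # eyebrows, eyes, ears


def hflip_with_labels(image, mask):
    """Flip image and mask left-to-right, then swap the left/right class ids."""
    image = image[:, ::-1].copy()
    mask = mask[:, ::-1].copy()
    swapped = mask.copy()
    for left, right in SWAP_PAIRS:
        # Index with the unswapped mask so a pixel is never relabeled twice.
        swapped[mask == left] = right
        swapped[mask == right] = left
    return image, swapped


image = np.arange(18).reshape(2, 3, 3)
mask = np.array([[2, 0, 5],
                 [7, 1, 3]])
flipped_image, flipped_mask = hflip_with_labels(image, mask)
print(flipped_mask)
# [[4 0 3]
#  [2 1 8]]
```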

② Loading

Use DataLoader together with Dataset

Apply the transforms

Iterate over the images

Group the images into batches
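The four points above can be sketched with a small `Dataset` subclass wrapped in a `DataLoader`. To stay self-contained, this example uses random in-memory arrays instead of reading CelebAMask-HQ from disk; the class name, array sizes, and batch size are assumptions for illustration.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class FaceMaskDataset(Dataset):
    """Hypothetical dataset over (H, W, 3) uint8 images and (H, W) label masks."""

    def __init__(self, images, masks, transform=None):
        assert len(images) == len(masks)
        self.images, self.masks, self.transform = images, masks, transform

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        image, mask = self.images[idx], self.masks[idx]
        if self.transform is not None:
            # Joint transform: image and mask must be augmented together.
            image, mask = self.transform(image, mask)
        image = torch.from_numpy(image.astype(np.float32) / 255.0).permute(2, 0, 1)
        mask = torch.from_numpy(mask.astype(np.int64))
        return image, mask


rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
masks = rng.integers(0, 19, size=(8, 64, 64), dtype=np.uint8)

loader = DataLoader(FaceMaskDataset(images, masks), batch_size=4, shuffle=True)
batch_images, batch_masks = next(iter(loader))
print(batch_images.shape, batch_masks.shape)
```

The mask is returned as `int64` because PyTorch's cross-entropy loss expects class indices of that type.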

5. Loss function

Li: log-softmax

lp: principal loss (on the final output)

li: auxiliary loss (on the Context Path outputs)
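The BiSeNet paper combines these terms as a principal loss plus a weighted sum of auxiliary losses. A minimal sketch of that combination, using PyTorch's `F.cross_entropy` (log-softmax followed by negative log-likelihood): the function name, the default `alpha=1.0`, and `ignore_index=255` are assumptions, not values stated in this post.

```python
import torch
import torch.nn.functional as F


def joint_loss(main_logits, aux_logits_list, target, alpha=1.0, ignore_index=255):
    """lp + alpha * sum(li): principal loss plus weighted auxiliary losses."""
    principal = F.cross_entropy(main_logits, target, ignore_index=ignore_index)
    auxiliary = sum(F.cross_entropy(aux, target, ignore_index=ignore_index)
                    for aux in aux_logits_list)
    return principal + alpha * auxiliary


torch.manual_seed(0)
target = torch.randint(0, 19, (2, 8, 8))          # class index per pixel
main = torch.randn(2, 19, 8, 8)                   # logits of the final output
aux = [torch.randn(2, 19, 8, 8) for _ in range(2)]  # logits of two auxiliary heads
loss = joint_loss(main, aux, target)
print(loss.item())
```

The auxiliary heads are only needed during training; at inference time only the main output is used.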

6. Optimizer

Stochastic gradient descent (SGD), including how its hyperparameters are set and updated during training.
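The post gives no concrete hyperparameters. As a hedged sketch, the example below uses SGD with momentum and a polynomial ("poly") learning-rate decay, a common choice for segmentation training; the base learning rate, momentum, weight decay, and power are assumed values, and the tiny model is only a stand-in. The two parameter groups mirror the `wd_params` / `nowd_params` split returned by `get_params`.

```python
import torch
import torch.nn as nn


def poly_lr(base_lr, it, max_iter, power=0.9):
    """Polynomial decay from base_lr at it=0 down to 0 at it=max_iter."""
    return base_lr * (1 - it / max_iter) ** power


def set_lr(optimizer, lr):
    """Write the new learning rate into every parameter group."""
    for group in optimizer.param_groups:
        group["lr"] = lr


# Tiny stand-in model; in practice this would be the BiSeNet network.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.BatchNorm2d(8))
wd_params = [model[0].weight]                                  # weight decay applied
nowd_params = [model[0].bias] + list(model[1].parameters())    # no weight decay

base_lr, max_iter = 1e-2, 80000  # assumed values
optimizer = torch.optim.SGD(
    [{"params": wd_params},
     {"params": nowd_params, "weight_decay": 0.0}],
    lr=base_lr, momentum=0.9, weight_decay=5e-4,
)

# In a training loop this would run once per iteration, before optimizer.step().
set_lr(optimizer, poly_lr(base_lr, 40000, max_iter))
print(optimizer.param_groups[0]["lr"])
```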

7. Logging

Use the logger library to record data during training.
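A minimal sketch using Python's standard `logging` module, which is one way to implement this step; the logger name, the message format, and the numbers in the sample message are placeholders, not values from the actual training run.

```python
import logging
import sys


def setup_logger(name="bisenet", log_file=None):
    """Create a logger that writes to stdout and, optionally, to a file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called twice
    fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    handlers = [logging.StreamHandler(sys.stdout)]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    return logger


logger = setup_logger()
# Placeholder values, only to show the message layout.
logger.info("iter: %d/%d, lr: %.6f, loss: %.4f", 100, 80000, 0.00999, 1.2345)
```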

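8. Evaluation metrics

Form of the confusion matrix:

T(F)/P(N)	Predicted positive	Predicted negative
Actual positive	True positive (TP)	False negative (FN)
Actual negative	False positive (FP)	True negative (TN)

Building it in code: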

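def confusion_matrix(self, pre, lab):
    # Flatten the prediction and label maps to 1-D.
    P_pre = pre.flatten()
    L_lab = lab.flatten()
    # Keep only the pixels whose label is a valid class id.
    mask = (L_lab >= 0) & (L_lab < self.num_class)
    confusion = np.zeros((self.num_class, self.num_class))  # optionally dtype=np.int32
    # Each (label, prediction) pair maps to the flat index n * L + P.
    confusion += np.bincount(
        self.num_class * L_lab[mask].astype(int) + P_pre[mask].astype(int),
        minlength=self.num_class ** 2,
    ).reshape(self.num_class, self.num_class)
    return confusion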
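The `n * L + P` trick is easier to follow on a tiny example. The sketch below uses three classes and six made-up pixels: each (label, prediction) pair is encoded as one integer, `np.bincount` counts the occurrences, and reshaping yields a matrix whose rows are true classes and whose columns are predicted classes.

```python
import numpy as np

num_class = 3
label = np.array([0, 0, 1, 1, 2, 2])  # ground truth (toy data)
pred = np.array([0, 1, 1, 1, 2, 0])   # prediction (toy data)

index = num_class * label + pred       # [0, 1, 4, 4, 8, 6]
confusion = np.bincount(index, minlength=num_class ** 2).reshape(num_class, num_class)
print(confusion)
# [[1 1 0]
#  [0 2 0]
#  [1 0 1]]
```

For instance, `confusion[0, 1] == 1` means one pixel of class 0 was predicted as class 1.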

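The evaluation metrics are computed from the confusion matrix:

Pixel accuracy:

def pixel_acc(self, confusion):
    # Correctly classified pixels (the diagonal) over all pixels.
    return np.diag(confusion).sum() / confusion.sum()

Per-class accuracy:

def class_acc(self, confusion):
    # Diagonal over row sums (pixels of each true class); vector of length num_class.
    return np.diag(confusion) / np.maximum(confusion.sum(axis=1), 1)

Mean per-class accuracy:

def mpa(self, cls_acc):
    return np.nanmean(cls_acc)

IoU (intersection over union):

def iou(self, confusion):
    # Intersection is the diagonal; union is row sum + column sum - diagonal.
    return np.diag(confusion) / np.maximum(
        np.sum(confusion, axis=1) + np.sum(confusion, axis=0) - np.diag(confusion), 1)

mIoU (mean IoU):

def miou(self, iou_):
    return np.nanmean(iou_)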
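As a worked example, the same formulas can be applied to a small hand-made 3-class confusion matrix (rows are true classes, columns are predictions); the matrix values are made up for illustration.

```python
import numpy as np

confusion = np.array([[1, 1, 0],
                      [0, 2, 0],
                      [1, 0, 1]], dtype=float)

diag = np.diag(confusion)
rows = confusion.sum(axis=1)   # pixels per true class: [2, 2, 2]
cols = confusion.sum(axis=0)   # pixels per predicted class: [2, 3, 1]

pixel_acc = diag.sum() / confusion.sum()            # 4 / 6
class_acc = diag / np.maximum(rows, 1)              # [0.5, 1.0, 0.5]
mean_acc = np.nanmean(class_acc)                    # 2 / 3
iou = diag / np.maximum(rows + cols - diag, 1)      # [1/3, 2/3, 1/2]
mean_iou = np.nanmean(iou)                          # 0.5

print(pixel_acc, class_acc, mean_acc, iou, mean_iou)
```

Note that because the denominators are clamped with `np.maximum(..., 1)`, a class that never appears yields 0 rather than NaN, so `np.nanmean` still counts it and the mean is pulled down. This may be worth keeping in mind when the mean accuracy is much lower than the pixel accuracy.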

9. Results

After 80,000 training iterations:
acc=94.95%, macc=57.41%, mIoU=52.40%

Test:

References:

GitHub - zllrunning/face-makeup.PyTorch: Lip and hair color editor using face parsing maps.

Implementations of Various Semantic Segmentation Evaluation Metrics, 络小绎's blog, CSDN