Reposted from AI Studio. Original article: 高分辨率实时抠像-RobustVideoMatting(RVM)飞桨复现 (High-Resolution Real-Time Matting: RobustVideoMatting (RVM) Reproduced in PaddlePaddle) - PaddlePaddle AI Studio

Reproducing RobustVideoMatting (RVM) Video Matting in PaddlePaddle

This is author Shanchuan Lin's second paper on video matting. His first paper received a CVPR 2021 Best Student Paper honorable mention (only three papers were nominated). Paper code link

Shanchuan Lin's first paper was published while he was still a student; the second, RobustVideoMatting, was published at ByteDance. I look forward to a third paper from him at Microsoft.

After weighing the options, I decided to reproduce his second paper, Robust High-Resolution Video Matting with Temporal Guidance, in PaddlePaddle. Paper page · GitHub repo · Gitee mirror

PaddlePaddle reproduction: GitHub repo · Gitee repo

The paper Robust High-Resolution Video Matting with Temporal Guidance, RVM for short, is designed for robust human video matting. Unlike existing networks that treat each frame as an independent image, RVM uses a recurrent neural network, so it keeps temporal memory while processing a video stream. RVM can perform real-time high-resolution matting on arbitrary videos, reaching 4K at 76 FPS and HD at 104 FPS on an Nvidia GTX 1080 Ti. The project comes from ByteDance.
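To make "temporal memory" concrete, here is a minimal sketch of the recurrent inference loop, assuming the PaddlePaddle MattingNetwork defined later in this notebook (the random frames are stand-ins for real video):

# Minimal sketch of RVM's recurrent inference (assumes the MattingNetwork from this notebook).
import paddle

model = MattingNetwork('resnet50')
model.eval()

frames = paddle.rand([10, 1, 3, 288, 512])   # stand-in for 10 frames, each [B=1, 3, H, W] in [0, 1]
rec = [None] * 4                             # r1..r4: recurrent states, empty before the first frame
with paddle.no_grad():
    for t in range(frames.shape[0]):
        fgr, pha, *rec = model(frames[t], *rec, downsample_ratio=0.25)
        # fgr: foreground, pha: alpha matte; rec carries memory into the next frame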

Overall approach and process of the reproduction

I still remember the first time I saw the paper's results: I stumbled on the demo video on Bilibili and was blown away. So this is what video matting can do! It effectively brings the green-screen/blue-screen technique, popularized since The Matrix, within everyone's reach; anyone can play director now! Right then I quietly set myself a small goal: reproduce the code in the PaddlePaddle framework. In the blink of an eye, twenty-one weeks have passed. No rainbow without the rain: as a complete beginner in video matting, reproducing the new work of a CVPR Best Student Paper (nominee) author from scratch was full of twists and turns, even though it was "only" porting the code from PyTorch to PaddlePaddle.

Thanks to the Peng Cheng Laboratory AI platform for providing the compute that supported my debugging!

Overview of the reproduction steps

First attempt: converting the code directly with X2Paddle

X2Paddle can convert a project and its model files in one click, which is simple and fast. But sometimes too many operators cannot be converted; in this case the higher-order functions in PyTorch could not be converted, so I eventually abandoned this route. I later learned that X2Paddle ships ready-made implementations of common models such as ResNet50, already aligned with PyTorch, so for future reproductions it is worth checking X2Paddle's model zoo first; it saves a lot of time.

X2Paddle conversion is convenient, but the converted code has to carry the X2Paddle library as a dependency, which is sometimes inconvenient.
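For reference, a trace-based conversion attempt looks roughly like this (a sketch following X2Paddle's documented pytorch2paddle API; the interface has changed across releases, so verify against your installed version):

# Sketch of X2Paddle's trace-based PyTorch-to-Paddle conversion.
import torch
from x2paddle.convert import pytorch2paddle

torch_model = torch.hub.load('PeterL1n/RobustVideoMatting', 'resnet50')
torch_model.eval()
dummy = torch.randn(1, 3, 224, 224)

# Trace the model and emit equivalent Paddle code plus weights into save_dir.
pytorch2paddle(module=torch_model, save_dir='pd_model',
               jit_type='trace', input_examples=[dummy])

In this project the conversion tripped over PyTorch's higher-order functions, which is what forced the manual rewrite below.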

The fallback: rewriting the code entirely by hand.

The overall flow: get the ResNet50 backbone working → the ResNet50 MattingNetwork module → the inference path → the MobileNetV3 backbone → the MobileNetV3 MattingNetwork → its inference path. A rough PyTorch-to-Paddle API cheat sheet used throughout the port is sketched below.
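These are the recurring correspondences this port relies on (a working cheat sheet collected from the rewrites below, not an official or exhaustive mapping):

# Recurring PyTorch -> PaddlePaddle substitutions used in this port:
#   torch.nn.Module                 -> paddle.nn.Layer
#   nn.Conv2d(..., bias=False)      -> nn.Conv2D(..., bias_attr=False)
#   nn.BatchNorm2d                  -> nn.BatchNorm2D
#   torch.cat(xs, dim=d)            -> paddle.concat(xs, axis=d)
#   x.split(size, dim=d)            -> x.split(num_sections, axis=d)  (int = number of chunks, not chunk size)
#   x.unflatten(0, (B, T))          -> x.reshape([B, T] + x.shape[1:])
#   x.permute(...)                  -> x.transpose([...])
#   model.load_state_dict(sd)       -> model.set_state_dict(sd)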

1 Getting the ResNet50 backbone working

Starting with ResNet50 turned out to be the right call: aligning MobileNetV3 later proved far harder than aligning ResNet50.

2 Getting the MattingNetwork working

With the ResNet backbone working, the next step was the MattingNetwork itself: take the network apart module by module, port each piece, and test it. This notebook mainly demonstrates, on AI Studio, getting ResNet50, MattingNetwork, and the inference path working.

3 Getting the inference module working

This was the module that took the least time.

4 Getting the MobileNetV3 backbone and its MattingNetwork working. This was by far the most time-consuming module; the full story needs a project of its own: 飞桨源码MobileNetV3分类模型对齐Pytorch-省事版 (aligning the Paddle MobileNetV3 classification model with PyTorch, the easy way)

Implementing RobustVideoMatting in PaddlePaddle

Most of the earlier steps were debugged on the OpenI Qizhi AI platform, mainly converting the PyTorch code to Paddle; this AI Studio project mostly handles the later precision-alignment work, since I am used to running Paddle on AI Studio.

Package installation and file downloads

First download the code: git clone https://github.com/PeterL1n/RobustVideoMatting. Then enter the directory and install the required libraries: pip install -r requirements_inference.txt

You need av and PyTorch 1.9. Note that the Paddle-adapted OpenI Qizhi platform mentioned above ships an older PyTorch by default, so install or upgrade to the versions pinned in requirements_inference.txt.

In [ ]

# Install the required packages; even after ipywidgets is installed, an error still asks for a kernel restart
!pip install av tqdm pims ipywidgets

Let the reproduction begin! (Honestly, it is mostly rewriting.)

1. Rewriting ResNet50

As noted above, X2Paddle has ready-made ResNet50 code that is already aligned with the PyTorch model. But I could not get the X2Paddle code to run smoothly inside AI Studio, and I had reproduced ResNet50 myself before, so I initially chose to rewrite it by hand.

Another reason was that the X2Paddle ResNet50 appeared not to align in this project, which is why I rewrote it. In the end I did take the ResNet50 code from X2Paddle, but replaced everything X2Paddle-specific with plain Paddle code, so that it runs cleanly on AI Studio.

Only at the very end of the reproduction did I finally understand why things failed to align at first: it came down to a single parameter setting. Both the ready-made X2Paddle version and my own earlier ResNet50 do align with PyTorch; I simply had not noticed the parameter at the time.

1.1 ReLU and constant_init_()

In [ ]

# Hand-tweak a few pieces of x2paddle code so they run on AI Studio
import paddle

class ReLU(paddle.nn.ReLU):
    def __init__(self, inplace=False):
        super().__init__()
        self.inplace = inplace

    def forward(self, x):
        if self.inplace:
            out = paddle.nn.functional.relu_(x)
        else:
            out = super().forward(x)
        return out

def constant_init_(param, val):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=paddle.nn.initializer.Assign(
            paddle.full(param.shape, val, param.dtype)))
    paddle.assign(replaced_param, param)

1.2 PaddleDtypes()

In [ ]

# Copied verbatim from x2paddle's paddle_dtypes, because x2paddle occasionally acts up
# -*- coding:UTF-8 -*-
# Copyright (c) 2021  PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import paddle

def string(param):
    """Render a value as a quoted string."""
    return "'{}'".format(param)

def check_version():
    version = paddle.__version__
    v0, v1, v2 = version.split('.')
    if not ((v0 == '0' and v1 == '0' and v2 == '0') or
            (int(v0) >= 2 and int(v1) >= 1)):
        return False
    else:
        return True

class PaddleDtypes():
    def __init__(self, is_new_version=True):
        if is_new_version:
            self.t_float16 = paddle.float16
            self.t_float32 = paddle.float32
            self.t_float64 = paddle.float64
            self.t_uint8 = paddle.uint8
            self.t_int8 = paddle.int8
            self.t_int16 = paddle.int16
            self.t_int32 = paddle.int32
            self.t_int64 = paddle.int64
            self.t_bool = paddle.bool
        else:
            self.t_float16 = "paddle.fluid.core.VarDesc.VarType.FP16"
            self.t_float32 = "paddle.fluid.core.VarDesc.VarType.FP32"
            self.t_float64 = "paddle.fluid.core.VarDesc.VarType.FP64"
            self.t_uint8 = "paddle.fluid.core.VarDesc.VarType.UINT8"
            self.t_int8 = "paddle.fluid.core.VarDesc.VarType.INT8"
            self.t_int16 = "paddle.fluid.core.VarDesc.VarType.INT16"
            self.t_int32 = "paddle.fluid.core.VarDesc.VarType.INT32"
            self.t_int64 = "paddle.fluid.core.VarDesc.VarType.INT64"
            self.t_bool = "paddle.fluid.core.VarDesc.VarType.BOOL"

is_new_version = check_version()
paddle_dtypes = PaddleDtypes(is_new_version)

1.3 Kaiming initialization

In [ ]

# Kaiming initialization, written out separately
# Copyright (c) 2021  PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import math
from functools import reduce
import paddle
from paddle.fluid import framework
from paddle.fluid.core import VarDesc
from paddle.fluid.initializer import XavierInitializer, MSRAInitializer
from paddle.fluid.data_feeder import check_variable_and_dtype
# from x2paddle.utils import paddle_dtypes
from paddle.fluid import unique_name  # added: the original snippet referenced unique_name without importing it

def _calculate_fan_in_and_fan_out(var):
    dimensions = var.dim()
    if dimensions < 2:
        raise ValueError(
            "Fan in and fan out can not be computed for var with fewer than 2 dimensions")
    num_input_fmaps = var.shape[0]
    num_output_fmaps = var.shape[1]
    receptive_field_size = 1
    if var.dim() > 2:
        receptive_field_size = reduce(lambda x, y: x * y, var.shape[2:])
    fan_in = num_input_fmaps * receptive_field_size
    fan_out = num_output_fmaps * receptive_field_size
    return fan_in, fan_out

def _calculate_correct_fan(var, mode):
    mode = mode.lower()
    valid_modes = ['fan_in', 'fan_out']
    if mode not in valid_modes:
        raise ValueError("Mode {} not supported, please use one of {}".format(
            mode, valid_modes))
    fan_in, fan_out = _calculate_fan_in_and_fan_out(var)
    return fan_in if mode == 'fan_in' else fan_out

def _calculate_gain(nonlinearity, param=None):
    linear_fns = [
        'linear', 'conv1d', 'conv2d', 'conv3d', 'conv_transpose1d',
        'conv_transpose2d', 'conv_transpose3d'
    ]
    if nonlinearity in linear_fns or nonlinearity == 'sigmoid':
        return 1
    elif nonlinearity == 'tanh':
        return 5.0 / 3
    elif nonlinearity == 'relu':
        return math.sqrt(2.0)
    elif nonlinearity == 'leaky_relu':
        if param is None:
            negative_slope = 0.01
        elif not isinstance(param, bool) and isinstance(
                param, int) or isinstance(param, float):
            # True/False are instances of int, hence check above
            negative_slope = param
        else:
            raise ValueError("negative_slope {} not a valid number".format(param))
        return math.sqrt(2.0 / (1 + negative_slope**2))
    elif nonlinearity == 'selu':
        return 3.0 / 4  # Value found empirically (https://github.com/pytorch/pytorch/pull/50664)
    else:
        raise ValueError("Unsupported nonlinearity {}".format(nonlinearity))

class KaimingNormal(MSRAInitializer):
    def __init__(self, a=0, mode='fan_in', nonlinearity='leaky_relu'):
        super(KaimingNormal, self).__init__(uniform=False, fan_in=None, seed=0)
        self.a = a
        self.mode = mode
        self.nonlinearity = nonlinearity

    def __call__(self, var, block=None):
        """Initialize the input tensor with MSRA initialization.

        Args:
            var(Tensor): Tensor that needs to be initialized.
            block(Block, optional): The block in which initialization ops
                should be added. Used in static graph only, default None.

        Returns:
            The initialization op
        """
        block = self._check_block(block)
        assert isinstance(var, framework.Variable)
        assert isinstance(block, framework.Block)
        f_in, f_out = self._compute_fans(var)
        if self._seed == 0:
            self._seed = block.program.random_seed
        # to be compatible of fp16 initalizers
        if var.dtype == paddle_dtypes.t_float16:
            out_dtype = paddle_dtypes.t_float32
            out_var = block.create_var(
                name=unique_name.generate(".".join(['masra_init', var.name, 'tmp'])),
                shape=var.shape,
                dtype=out_dtype,
                type=VarDesc.VarType.LOD_TENSOR,
                persistable=False)
        else:
            out_dtype = var.dtype
            out_var = var
        fan = _calculate_correct_fan(var, self.mode)
        gain = _calculate_gain(self.nonlinearity, self.a)
        std = gain / math.sqrt(fan)
        op = block._prepend_op(
            type="gaussian_random",
            outputs={"Out": out_var},
            attrs={
                "shape": out_var.shape,
                "dtype": int(out_dtype),
                "mean": 0.0,
                "std": std,
                "seed": self._seed
            },
            stop_gradient=True)
        if var.dtype == VarDesc.VarType.FP16:
            block.append_op(
                type="cast",
                inputs={"X": out_var},
                outputs={"Out": var},
                attrs={"in_dtype": out_var.dtype,
                       "out_dtype": var.dtype})
        if not framework.in_dygraph_mode():
            var.op = op
        return op

def kaiming_normal_(param, a=0, mode='fan_in', nonlinearity='leaky_relu'):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=KaimingNormal(a=a, mode=mode, nonlinearity=nonlinearity))
    paddle.assign(replaced_param, param)

class XavierNormal(XavierInitializer):
    def __init__(self, gain=1.0):
        super(XavierNormal, self).__init__(uniform=True, fan_in=None, fan_out=None, seed=0)
        self._gain = gain

    def __call__(self, var, block=None):
        block = self._check_block(block)
        assert isinstance(block, framework.Block)
        check_variable_and_dtype(var, "Out", ["float16", "float32", "float64"],
                                 "xavier_init")
        fan_in, fan_out = _calculate_fan_in_and_fan_out(var)
        if self._seed == 0:
            self._seed = block.program.random_seed
        # to be compatible of fp16 initalizers
        if var.dtype == paddle_dtypes.t_float16:
            out_dtype = paddle_dtypes.t_float32
            out_var = block.create_var(
                name=unique_name.generate(".".join(['xavier_init', var.name, 'tmp'])),
                shape=var.shape,
                dtype=out_dtype,
                type=VarDesc.VarType.LOD_TENSOR,
                persistable=False)
        else:
            out_dtype = var.dtype
            out_var = var
        std = self._gain * math.sqrt(2.0 / float(fan_in + fan_out))
        op = block._prepend_op(
            type="uniform_random",
            inputs={},
            outputs={"Out": out_var},
            attrs={
                "shape": out_var.shape,
                "dtype": out_dtype,
                "min": 0,
                "max": std,
                "seed": self._seed
            },
            stop_gradient=True)
        if var.dtype == paddle_dtypes.t_float16:
            block.append_op(
                type="cast",
                inputs={"X": out_var},
                outputs={"Out": var},
                attrs={"in_dtype": out_var.dtype,
                       "out_dtype": var.dtype})
        if not framework.in_dygraph_mode():
            var.op = op
        return op

def xavier_normal_(param, gain=1.0):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=XavierNormal(gain=gain))
    paddle.assign(replaced_param, param)

class XavierUniform(XavierInitializer):
    def __init__(self, gain=1.0):
        super(XavierUniform, self).__init__(uniform=True, fan_in=None, fan_out=None, seed=0)
        self._gain = gain

    def __call__(self, var, block=None):
        block = self._check_block(block)
        assert isinstance(block, framework.Block)
        check_variable_and_dtype(var, "Out", ["float16", "float32", "float64"],
                                 "xavier_init")
        fan_in, fan_out = _calculate_fan_in_and_fan_out(var)
        if self._seed == 0:
            self._seed = block.program.random_seed
        # to be compatible of fp16 initalizers
        if var.dtype == paddle_dtypes.t_float16:
            out_dtype = paddle_dtypes.t_float32
            out_var = block.create_var(
                name=unique_name.generate(".".join(['xavier_init', var.name, 'tmp'])),
                shape=var.shape,
                dtype=out_dtype,
                type=VarDesc.VarType.LOD_TENSOR,
                persistable=False)
        else:
            out_dtype = var.dtype
            out_var = var
        std = self._gain * math.sqrt(2.0 / float(fan_in + fan_out))
        limit = math.sqrt(3.0) * std
        op = block._prepend_op(
            type="uniform_random",
            inputs={},
            outputs={"Out": out_var},
            attrs={
                "shape": out_var.shape,
                "dtype": out_dtype,
                "min": -limit,
                "max": limit,
                "seed": self._seed
            },
            stop_gradient=True)
        if var.dtype == paddle_dtypes.t_float16:
            block.append_op(
                type="cast",
                inputs={"X": out_var},
                outputs={"Out": var},
                attrs={"in_dtype": out_var.dtype,
                       "out_dtype": var.dtype})
        if not framework.in_dygraph_mode():
            var.op = op
        return op

def xavier_uniform_(param, gain=1.0):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=XavierUniform(gain=gain))
    paddle.assign(replaced_param, param)

def constant_init_(param, val):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=paddle.nn.initializer.Assign(
            paddle.full(param.shape, val, param.dtype)))
    paddle.assign(replaced_param, param)

def normal_init_(param, mean=0.0, std=1.0):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=paddle.nn.initializer.Assign(
            paddle.normal(mean=mean, std=std, shape=param.shape)))
    paddle.assign(replaced_param, param)

def ones_init_(param):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=paddle.nn.initializer.Assign(
            paddle.ones(param.shape, param.dtype)))
    paddle.assign(replaced_param, param)

def zeros_init_(param):
    replaced_param = paddle.create_parameter(
        shape=param.shape,
        dtype=param.dtype,
        default_initializer=paddle.nn.initializer.Assign(
            paddle.zeros(param.shape, param.dtype)))
    paddle.assign(replaced_param, param)

1.5 ResNet main code

In [ ]

# Use the code from x2paddle, which mirrors torch's ResNet exactly, so that the parameters align
import paddle
import paddle.nn as nn
from paddle import Tensor
from paddle.utils.download import get_weights_path_from_url
from typing import Type, Any, Callable, Union, List, Optional
# from x2paddle import torch2paddle

__all__ = [
    'ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152',
    'resnext50_32x4d', 'resnext101_32x8d', 'wide_resnet50_2', 'wide_resnet101_2'
]

model_urls = {
    'resnet18': 'https://x2paddle.bj.bcebos.com/vision/models/resnet18-pt.pdparams',
    'resnet34': 'https://x2paddle.bj.bcebos.com/vision/models/resnet34-pt.pdparams',
    'resnet50': 'https://x2paddle.bj.bcebos.com/vision/models/resnet50-pt.pdparams',
    'resnet101': 'https://x2paddle.bj.bcebos.com/vision/models/resnet101-pt.pdparams',
    'resnet152': 'https://x2paddle.bj.bcebos.com/vision/models/resnet152-pt.pdparams',
    'resnext50_32x4d': 'https://x2paddle.bj.bcebos.com/vision/models/resnext50_32x4d-pt.pdparams',
    'resnext101_32x8d': 'https://x2paddle.bj.bcebos.com/vision/models/resnext101_32x8d-pt.pdparams',
    'wide_resnet50_2': 'https://x2paddle.bj.bcebos.com/vision/models/wide_resnet50_2-pt.pdparams',
    'wide_resnet101_2': 'https://x2paddle.bj.bcebos.com/vision/models/wide_resnet101_2-pt.pdparams',
}

def conv3x3(in_planes: int, out_planes: int, stride: int = 1,
            groups: int = 1, dilation: int = 1) -> nn.Conv2D:
    """3x3 convolution with padding"""
    return nn.Conv2D(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=dilation, groups=groups, bias_attr=False,
                     dilation=dilation)

def conv1x1(in_planes: int, out_planes: int, stride: int = 1) -> nn.Conv2D:
    """1x1 convolution"""
    return nn.Conv2D(in_planes, out_planes, kernel_size=1, stride=stride,
                     bias_attr=False)

class BasicBlock(nn.Layer):
    expansion: int = 1

    def __init__(self,
                 inplanes: int,
                 planes: int,
                 stride: int = 1,
                 downsample: Optional[nn.Layer] = None,
                 groups: int = 1,
                 base_width: int = 64,
                 dilation: int = 1,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D  # fixed: was the torch spelling BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError('BasicBlock only supports groups=1 and base_width=64')
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = norm_layer(planes)
        self.relu = ReLU(True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class Bottleneck(nn.Layer):
    # Bottleneck in torchvision places the stride for downsampling at 3x3 convolution(self.conv2)
    # while original implementation places the stride at the first 1x1 convolution(self.conv1)
    # according to "Deep residual learning for image recognition" https://arxiv.org/abs/1512.03385.
    # This variant is also known as ResNet V1.5 and improves accuracy according to
    # https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch.
    expansion: int = 4

    def __init__(self,
                 inplanes: int,
                 planes: int,
                 stride: int = 1,
                 downsample: Optional[nn.Layer] = None,
                 groups: int = 1,
                 base_width: int = 64,
                 dilation: int = 1,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = conv1x1(inplanes, width)
        self.bn1 = norm_layer(width)
        self.conv2 = conv3x3(width, width, stride, groups, dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = conv1x1(width, planes * self.expansion)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = ReLU(True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x: Tensor) -> Tensor:
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity
        out = self.relu(out)
        return out

class ResNet(nn.Layer):
    def __init__(self,
                 block: Type[Union[BasicBlock, Bottleneck]],
                 layers: List[int],
                 num_classes: int = 1000,
                 zero_init_residual: bool = False,
                 groups: int = 1,
                 width_per_group: int = 64,
                 replace_stride_with_dilation: Optional[List[bool]] = None,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            # fixed: this read "[False, False, Ture]"; torchvision's default is all
            # False (RVM passes [False, False, True] explicitly when constructing)
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(
                                 replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
                               padding=3, bias_attr=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = ReLU(True)
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2D((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        for m in self.sublayers():
            if isinstance(m, nn.Conv2D):
                kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2D, nn.GroupNorm)):
                constant_init_(m.weight, 1)
                constant_init_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.sublayers():
                if isinstance(m, Bottleneck):
                    constant_init_(m.bn3.weight, 0)  # type: ignore[arg-type]
                elif isinstance(m, BasicBlock):
                    constant_init_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(self,
                    block: Type[Union[BasicBlock, Bottleneck]],
                    planes: int,
                    blocks: int,
                    stride: int = 1,
                    dilate: bool = False) -> nn.Sequential:
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                conv1x1(self.inplanes, planes * block.expansion, stride),
                norm_layer(planes * block.expansion), )
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))
        return nn.Sequential(*layers)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = paddle.flatten(x, 1)
        x = self.fc(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)

def _resnet(arch: str,
            block: Type[Union[BasicBlock, Bottleneck]],
            layers: List[int],
            pretrained: bool,
            **kwargs: Any) -> ResNet:
    model = ResNet(block, layers, **kwargs)
    if pretrained:
        state_dict = paddle.load(get_weights_path_from_url(model_urls[arch]))
        model.load_dict(state_dict)
    return model

def resnet18(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-18 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet18', BasicBlock, [2, 2, 2, 2], pretrained, **kwargs)

def resnet34(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-34 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet34', BasicBlock, [3, 4, 6, 3], pretrained, **kwargs)

def resnet50(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-50 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet50', Bottleneck, [3, 4, 6, 3], pretrained, **kwargs)

def resnet101(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-101 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet101', Bottleneck, [3, 4, 23, 3], pretrained, **kwargs)

def resnet152(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNet-152 model from
    `"Deep Residual Learning for Image Recognition" <https://arxiv.org/pdf/1512.03385.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    return _resnet('resnet152', Bottleneck, [3, 8, 36, 3], pretrained, **kwargs)

def resnext50_32x4d(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNeXt-50 32x4d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 4
    return _resnet('resnext50_32x4d', Bottleneck, [3, 4, 6, 3], pretrained, **kwargs)

def resnext101_32x8d(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""ResNeXt-101 32x8d model from
    `"Aggregated Residual Transformation for Deep Neural Networks" <https://arxiv.org/pdf/1611.05431.pdf>`_.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['groups'] = 32
    kwargs['width_per_group'] = 8
    return _resnet('resnext101_32x8d', Bottleneck, [3, 4, 23, 3], pretrained, **kwargs)

def wide_resnet50_2(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""Wide ResNet-50-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet50_2', Bottleneck, [3, 4, 6, 3], pretrained, **kwargs)

def wide_resnet101_2(pretrained: bool = False, progress: bool = True, **kwargs: Any) -> ResNet:
    r"""Wide ResNet-101-2 model from
    `"Wide Residual Networks" <https://arxiv.org/pdf/1605.07146.pdf>`_.

    The model is the same as ResNet except for the bottleneck number of channels
    which is twice larger in every block. The number of channels in outer 1x1
    convolutions is the same, e.g. last block in ResNet-50 has 2048-512-2048
    channels, and in Wide ResNet-50-2 has 2048-1024-2048.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
        progress (bool): If True, displays a progress bar of the download to stderr
    """
    kwargs['width_per_group'] = 64 * 2
    return _resnet('wide_resnet101_2', Bottleneck, [3, 4, 23, 3], pretrained, **kwargs)

Because at one point it would not align with the original torch ResNet50, I also tried a hand-written version:

# Rewritten resnet; this is the hand-written version. In the end the x2paddle version was used.
import paddle
import paddle.nn as nn
from typing import Type, Any, Callable, Union, List, Optional
from paddle import Tensor
from paddle.nn import functional as F
from typing import Tuple, Optional
from paddle.vision.models.resnet import BottleneckBlock as Bottleneck  # if this fails, rewrite it following torch, or use the x2paddle code

# basicblock
class Identity(nn.Layer):
    def __init__(self):  # fixed: was misspelled as __init_
        super().__init__()

    def forward(self, x):
        return x


class BasicBlock(nn.Layer):
    def __init__(self, in_dim, out_dim, stride):
        super().__init__()
        ## fill in the code
        self.conv1 = nn.Conv2D(in_channels=in_dim, out_channels=out_dim,
                               kernel_size=3, stride=stride, padding=1, bias_attr=False)
        self.bn1 = nn.BatchNorm2D(out_dim)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(in_channels=out_dim, out_channels=out_dim,
                               kernel_size=3, stride=1, padding=1, bias_attr=False)
        self.bn2 = nn.BatchNorm2D(out_dim)
        # fixed: the condition read `stride == 1 or in_dim != out_dim`, which built a
        # projection shortcut even when the identity would do; the usual test is:
        if stride != 1 or in_dim != out_dim:
            self.downsample = nn.Sequential(*[
                nn.Conv2D(in_dim, out_dim, 1, stride=stride),
                nn.BatchNorm2D(out_dim)])
        else:
            self.downsample = Identity()

    def forward(self, x):
        ## fill in the code
        h = x
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        identity = self.downsample(h)
        x = x + identity
        x = self.relu(x)
        return x


# Bottleneck = BottleneckBlock
class ResNet(nn.Layer):
    def __init__(self,
                 block: Type[Union[BasicBlock, Bottleneck]],
                 layers: List[int],
                 num_classes: int = 1000,
                 zero_init_residual: bool = False,
                 groups: int = 1,
                 width_per_group: int = 64,
                 replace_stride_with_dilation: Optional[List[bool]] = None,
                 norm_layer: Optional[Callable[..., nn.Layer]] = None) -> None:
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 1
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
        if len(replace_stride_with_dilation) != 3:
            raise ValueError("replace_stride_with_dilation should be None "
                             "or a 3-element tuple, got {}".format(
                                 replace_stride_with_dilation))
        self.groups = groups
        self.base_width = width_per_group
        # self.conv1 = nn.Conv2D(in_channels=3, out_channels=in_dim,
        #                        kernel_size=3, stride=1, padding=1, bias_attr=False)
        self.conv1 = nn.Conv2D(3, self.inplanes, kernel_size=7, stride=2,
                               padding=3, bias_attr=False)
        self.bn1 = norm_layer(self.inplanes)
        # self.relu = nn.ReLU(inplace=True)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2,
                                       dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2,
                                       dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2,
                                       dilate=replace_stride_with_dilation[2])
        # self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.avgpool = nn.AdaptiveAvgPool2D(1)
        self.fc = nn.Linear(512 * block.expansion, num_classes)
        # initialization removed for now
        # for m in self.children():
        #     if isinstance(m, nn.Conv2D):
        #         nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
        #     elif isinstance(m, (nn.BatchNorm2D, nn.GroupNorm)):
        #         nn.init.constant_(m.weight, 1)
        #         nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        # initialization removed for now
        # if zero_init_residual:
        #     for m in self.modules():
        #         if isinstance(m, Bottleneck):
        #             nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
        #         elif isinstance(m, BasicBlock):
        #             nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(self, block: Type[Union[BasicBlock, Bottleneck]], planes: int,
                    blocks: int, stride: int = 1, dilate: bool = False) -> nn.Sequential:
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        # if stride != 1 or self.inplanes != planes * block.expansion:
        #     downsample = nn.Sequential(
        #         conv1x1(self.inplanes, planes * block.expansion, stride),
        #         norm_layer(planes * block.expansion),
        #     )
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(*[
                nn.Conv2D(self.inplanes, planes * block.expansion, 1, stride=stride),
                nn.BatchNorm2D(planes * block.expansion)])
        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample, self.groups,
                            self.base_width, previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, groups=self.groups,
                                base_width=self.base_width, dilation=self.dilation,
                                norm_layer=norm_layer))
        return nn.Sequential(*layers)

    def _forward_impl(self, x: Tensor) -> Tensor:
        # See note [TorchScript super()]
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = paddle.flatten(x, 1)
        x = self.fc(x)
        return x

    def forward(self, x: Tensor) -> Tensor:
        return self._forward_impl(x)

1.6 Testing that the Paddle ResNet50 aligns with PyTorch

Load identical saved weights into both frameworks, compare the outputs, and confirm they match (precision error below 1e-4). Note that the PyTorch side of the test cannot run on AI Studio; use a local machine or another AI platform.

First I compared output shapes against the PyTorch ResNet, and that check passed. There is a pitfall here: ResNet50's output shape is always [batch_size, 1000], so a model whose internals were not actually aligned still slipped through. Once the problem surfaced later, I came back and fixed the ResNet50 code.
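The numeric check looks roughly like this (a sketch; it assumes resnet50.pdparams and resnet50.pth hold the same converted weights, and the torch half must run outside AI Studio):

# Sketch of the forward-alignment check between the two frameworks.
import numpy as np
import paddle
import torch, torchvision

x = np.random.RandomState(0).randn(1, 3, 224, 224).astype('float32')

pd_model = ResNet(block=Bottleneck, layers=[3, 4, 6, 3],
                  replace_stride_with_dilation=[False, False, True])
pd_model.set_state_dict(paddle.load('resnet50.pdparams'))
pd_model.eval()
pd_out = pd_model(paddle.to_tensor(x)).numpy()

th_model = torchvision.models.resnet50(replace_stride_with_dilation=[False, False, True])
th_model.load_state_dict(torch.load('resnet50.pth'))
th_model.eval()
th_out = th_model(torch.from_numpy(x)).detach().numpy()

print(np.abs(pd_out - th_out).max())   # should come out below 1e-4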

In [ ]

a = paddle.randn([2,3,224,224])
model = ResNet(block=Bottleneck, layers=[3, 4, 6, 3], replace_stride_with_dilation=[False, False, True])
tmp = model(a)
print(a.shape, tmp.shape)

Structure alignment against the PyTorch ResNet: passed.

At first the model structures simply would not match. It turned out the replace_stride_with_dilation parameter was the culprit: the paper's code sets replace_stride_with_dilation to a non-default value. I lost time on this detour and only found it by tracing back through the code.
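Concretely, the difference comes down to this single argument (the default shown is torchvision's; RVM's encoder passes the non-default value, as ResNet50Encoder in section 1.7 confirms):

# Default torchvision-style construction: no dilation anywhere.
m_default = ResNet(block=Bottleneck, layers=[3, 4, 6, 3])   # replace_stride_with_dilation -> [False, False, False]

# What RVM actually uses: layer4 trades its stride for dilation, so the structure differs.
m_rvm = ResNet(block=Bottleneck, layers=[3, 4, 6, 3],
               replace_stride_with_dilation=[False, False, True])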

In [ ]

# Model structure alignment
import numpy as np
paddlemodel = ResNet(block=Bottleneck, layers=[3, 4, 6, 3], replace_stride_with_dilation=[False, False, True])
img = np.ones([1,3,224,224]).astype('float32')
img = paddle.to_tensor(img)
model = paddlemodel
paddle.summary(model, input=img)

Test resnet50 forward alignment: with the same input, the outputs should match. Passed.

In [ ]

# Test resnet50 alignment; the test passes!
import numpy as np
paddlemodel = ResNet(block=Bottleneck, layers=[3, 4, 6, 3], replace_stride_with_dilation=[False, False, True])
img = np.ones([1,3,224,224]).astype('float32')
img = paddle.to_tensor(img)
model = paddlemodel
modelpath = "work/resnet50.pdparams"

from collections import OrderedDict  # added: copyStateDict below needs it

def copyStateDict(state_dict):
    if list(state_dict.keys())[0].startswith('module'):
        start_idx = 1
    else:
        start_idx = 0
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = '.'.join(k.split('.')[start_idx:])
        new_state_dict[name] = v
    return new_state_dict

# model.load_state_dict(copyStateDict(paddle.load(modelpath)))
model.set_state_dict(paddle.load(modelpath))
# model.set_state_dict(para_state_dict)
model.eval()
out = model(img)
print(out)

1.7 Testing that the Paddle ResNet50Encoder aligns with PyTorch

Here mainly the output shapes were compared.

In [ ]

import paddle
from paddle import nn
# from paddle.vision.models.resnet import BottleneckBlock
# from paddle.vision.models import ResNet
# from torchvision.models.resnet import ResNet, Bottleneck

class ResNet50Encoder(ResNet):
    def __init__(self, pretrained: bool = False):
        super().__init__(
            block=Bottleneck,
            layers=[3, 4, 6, 3],
            replace_stride_with_dilation=[False, False, True],
            norm_layer=None)
        if pretrained:
            # self.load_state_dict(torch.hub.load_state_dict_from_url(
            #     'https://download.pytorch.org/models/resnet50-0676ba61.pth'))
            load_weight = paddle.load("rvm_resnet50.pdparams")
            self.set_state_dict(load_weight)  # fixed: was self.weight.set_value(load_weight)
        del self.avgpool
        del self.fc

    def forward_single_frame(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        f1 = x  # 1/2
        x = self.maxpool(x)
        x = self.layer1(x)
        f2 = x  # 1/4
        x = self.layer2(x)
        f3 = x  # 1/8
        x = self.layer3(x)
        x = self.layer4(x)
        f4 = x  # 1/16
        return [f1, f2, f3, f4]

    def forward_time_series(self, x):
        B, T = x.shape[:2]
        features = self.forward_single_frame(x.flatten(0, 1))
        # features = [f.unflatten(0, (B, T)) for f in features]  # torch; paddle has no unflatten
        features = [f.reshape([B, T] + f.shape[1:]) for f in features]
        return features

    def forward(self, x):
        if x.ndim == 5:
            return self.forward_time_series(x)
        else:
            return self.forward_single_frame(x)
# a = paddle.randn((2, 3, 224, 224))
# testmodel = ResNet50Encoder()
# tmp = testmodel(a)
# print(len(tmp))

In [ ]

a = paddle.randn((2, 3, 3, 244, 244))
testmodel = ResNet50Encoder()
tmp = testmodel(a)
print(len(tmp))

In [ ]

for i in tmp:
    print(i.shape)

ResNet50Encoder passes verification; the output shapes match. Along the way, 5-D input initially failed with this error:

/tmp/ipykernel_101/852320671.py in <listcomp>(.0)
     38         B, T = x.shape[:2]
     39         features = self.forward_single_frame(x.flatten(0, 1))
---> 40         features = [f.unflatten(0, (B, T)) for f in features]
     41         return features
     42

AttributeError: 'Tensor' object has no attribute 'unflatten'

After rewriting the unflatten statement, 5-D input passes too:

# features = [f.unflatten(0, (B, T)) for f in features]
features = [f.reshape([B, T] + f.shape[1:]) for f in features]
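A quick sanity check that the replacement is faithful (a minimal sketch; the flatten/reshape round trip should reproduce the input exactly, which is what torch's unflatten(0, (B, T)) guarantees):

import paddle

x = paddle.randn([2, 3, 16, 8, 8])            # [B, T, C, H, W]
flat = x.flatten(0, 1)                        # [B*T, C, H, W]
back = flat.reshape([2, 3] + flat.shape[1:])  # back to [B, T, C, H, W]
print(bool((back == x).all()))                # True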

2. Getting the MattingNetwork working

Take the paper's MattingNetwork apart and reproduce and test each module separately.

2.1 Rewriting LRASPP

By reading the source and setting breakpoints in the PyTorch program, I obtained the shapes of LRASPP's inputs and outputs, then compared against them:

f1, f2, f3, f4 = self.backbone(src_sm)
# torch.Size([2, 64, 112, 112]) torch.Size([2, 256, 56, 56])
# torch.Size([2, 512, 28, 28]) torch.Size([2, 2048, 14, 14])
aspp = LRASPP(960, 128)
f4 = self.aspp(f4)
# torch.Size([2, 256, 14, 14])

In [ ]

class LRASPP(nn.Layer):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.aspp1 = nn.Sequential(
            nn.Conv2D(in_channels, out_channels, 1, bias_attr=False),
            nn.BatchNorm2D(out_channels),
            nn.ReLU())  # fixed: paddle's nn.ReLU takes no inplace flag
        self.aspp2 = nn.Sequential(
            nn.AdaptiveAvgPool2D(1),
            nn.Conv2D(in_channels, out_channels, 1, bias_attr=False),
            nn.Sigmoid())

    def forward_single_frame(self, x):
        return self.aspp1(x) * self.aspp2(x)

    def forward_time_series(self, x):
        B, T = x.shape[:2]
        # x = self.forward_single_frame(x.flatten(0, 1)).unflatten(0, (B, T))  # torch
        x = self.forward_single_frame(x.flatten(0, 1))
        x = x.reshape([B, T] + x.shape[1:])
        return x

    def forward(self, x):
        if x.ndim == 5:
            return self.forward_time_series(x)
        else:
            return self.forward_single_frame(x)

In [ ]

a = paddle.randn((2, 2048, 14, 14))
print(a.max())
testmodel = LRASPP(2048, 256)
tmp = testmodel(a)
print(len(tmp))
for i in tmp:
    print(i.shape)
    # print(i.max(2))

LRASPP done.

Same story as before: the 4-D path worked first while 5-D input failed, mainly for lack of "unflatten"; in the end reshape was used to implement unflatten and solve the problem.

2.2 Rewriting RecurrentDecoder and Projection

self.decoder = RecurrentDecoder([64, 256, 512, 256], [128, 64, 32, 16])

Decoder input: f4 = self.aspp(f4), torch.Size([2, 256, 14, 14])

The call: hid, *rec = self.decoder(src_sm, f1, f2, f3, f4, r1, r2, r3, r4)

The outputs are roughly x0, r1, r2, r3, r4: torch.Size([2, 16, 224, 224]) torch.Size([2, 16, 112, 112]) torch.Size([2, 32, 56, 56]) torch.Size([2, 64, 28, 28]) torch.Size([2, 128, 14, 14])

In [ ]

class RecurrentDecoder(nn.Layer):
    def __init__(self, feature_channels, decoder_channels):
        super().__init__()
        self.avgpool = AvgPool()
        self.decode4 = BottleneckBlock(feature_channels[3])
        self.decode3 = UpsamplingBlock(feature_channels[3], feature_channels[2], 3, decoder_channels[0])
        self.decode2 = UpsamplingBlock(decoder_channels[0], feature_channels[1], 3, decoder_channels[1])
        self.decode1 = UpsamplingBlock(decoder_channels[1], feature_channels[0], 3, decoder_channels[2])
        self.decode0 = OutputBlock(decoder_channels[2], 3, decoder_channels[3])

    def forward(self,
                s0: Tensor, f1: Tensor, f2: Tensor, f3: Tensor, f4: Tensor,
                r1: Optional[Tensor], r2: Optional[Tensor],
                r3: Optional[Tensor], r4: Optional[Tensor]):
        s1, s2, s3 = self.avgpool(s0)
        x4, r4 = self.decode4(f4, r4)
        x3, r3 = self.decode3(x4, f3, s3, r3)
        x2, r2 = self.decode2(x3, f2, s2, r2)
        x1, r1 = self.decode1(x2, f1, s1, r1)
        x0 = self.decode0(x1, s0)
        return x0, r1, r2, r3, r4


class AvgPool(nn.Layer):
    def __init__(self):
        super().__init__()
        self.avgpool = nn.AvgPool2D(2, 2, exclusive=False, ceil_mode=True)  # count_include_pad -> exclusive

    def forward_single_frame(self, s0):
        s1 = self.avgpool(s0)
        s2 = self.avgpool(s1)
        s3 = self.avgpool(s2)
        return s1, s2, s3

    def forward_time_series(self, s0):
        B, T = s0.shape[:2]
        s0 = s0.flatten(0, 1)
        s1, s2, s3 = self.forward_single_frame(s0)
        # s1 = s1.unflatten(0, (B, T)), etc. in torch
        s1 = s1.reshape([B, T] + s1.shape[1:])
        s2 = s2.reshape([B, T] + s2.shape[1:])
        s3 = s3.reshape([B, T] + s3.shape[1:])
        return s1, s2, s3

    def forward(self, s0):
        if s0.ndim == 5:
            return self.forward_time_series(s0)
        else:
            return self.forward_single_frame(s0)


class BottleneckBlock(nn.Layer):
    def __init__(self, channels):
        super().__init__()
        self.channels = channels
        self.gru = ConvGRU(channels // 2)

    def forward(self, x, r: Optional[Tensor]):
        # torch: a, b = x.split(self.channels // 2, dim=-3)  (int = chunk size)
        # paddle: an int means the number of sections, so split into 2 halves
        a, b = x.split(2, axis=-3)
        b, r = self.gru(b, r)
        x = paddle.concat([a, b], axis=-3)  # torch.cat -> paddle.concat
        return x, r


class UpsamplingBlock(nn.Layer):
    def __init__(self, in_channels, skip_channels, src_channels, out_channels):
        super().__init__()
        self.out_channels = out_channels
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2D(in_channels + skip_channels + src_channels, out_channels, 3, 1, 1, bias_attr=False),
            nn.BatchNorm2D(out_channels),
            nn.ReLU(),
        )
        self.gru = ConvGRU(out_channels // 2)

    def forward_single_frame(self, x, f, s, r: Optional[Tensor]):
        x = self.upsample(x)
        x = x[:, :, :s.shape[2], :s.shape[3]]
        x = paddle.concat([x, f, s], axis=1)  # torch.cat(dim) -> paddle.concat(axis)
        x = self.conv(x)
        # a, b = x.split(self.out_channels // 2, axis=1)
        a, b = x.split(2, axis=1)
        b, r = self.gru(b, r)
        x = paddle.concat([a, b], axis=1)
        return x, r

    def forward_time_series(self, x, f, s, r: Optional[Tensor]):
        B, T, _, H, W = s.shape
        x = x.flatten(0, 1)
        f = f.flatten(0, 1)
        s = s.flatten(0, 1)
        x = self.upsample(x)
        x = x[:, :, :H, :W]
        x = paddle.concat([x, f, s], axis=1)
        x = self.conv(x)
        # x = x.unflatten(0, (B, T))
        x = x.reshape([B, T] + x.shape[1:])
        # a, b = x.split(self.out_channels // 2, axis=2)
        a, b = x.split(2, axis=2)
        b, r = self.gru(b, r)
        x = paddle.concat([a, b], axis=2)
        return x, r

    def forward(self, x, f, s, r: Optional[Tensor]):
        if x.ndim == 5:
            return self.forward_time_series(x, f, s, r)
        else:
            return self.forward_single_frame(x, f, s, r)


class OutputBlock(nn.Layer):
    def __init__(self, in_channels, src_channels, out_channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2D(in_channels + src_channels, out_channels, 3, 1, 1, bias_attr=False),
            nn.BatchNorm2D(out_channels),
            nn.ReLU(),
            nn.Conv2D(out_channels, out_channels, 3, 1, 1, bias_attr=False),
            nn.BatchNorm2D(out_channels),
            nn.ReLU(),
        )

    def forward_single_frame(self, x, s):
        x = self.upsample(x)
        x = x[:, :, :s.shape[2], :s.shape[3]]
        x = paddle.concat([x, s], axis=1)
        x = self.conv(x)
        return x

    def forward_time_series(self, x, s):
        B, T, _, H, W = s.shape
        x = x.flatten(0, 1)
        s = s.flatten(0, 1)
        x = self.upsample(x)
        x = x[:, :, :H, :W]
        x = paddle.concat([x, s], axis=1)
        x = self.conv(x)
        # x = x.unflatten(0, (B, T))
        x = x.reshape([B, T] + x.shape[1:])
        return x

    def forward(self, x, s):
        if x.ndim == 5:
            return self.forward_time_series(x, s)
        else:
            return self.forward_single_frame(x, s)


class ConvGRU(nn.Layer):
    def __init__(self, channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        self.channels = channels
        self.ih = nn.Sequential(
            nn.Conv2D(channels * 2, channels * 2, kernel_size, padding=padding),
            nn.Sigmoid())
        self.hh = nn.Sequential(
            nn.Conv2D(channels * 2, channels, kernel_size, padding=padding),
            nn.Tanh())

    def forward_single_frame(self, x, h):
        # r, z = self.ih(paddle.concat([x, h], axis=1)).split(self.channels, axis=1)
        r, z = self.ih(paddle.concat([x, h], axis=1)).split(2, axis=1)
        c = self.hh(paddle.concat([x, r * h], axis=1))
        h = (1 - z) * h + z * c
        return h, h

    def forward_time_series(self, x, h):
        o = []
        for xt in x.unbind(axis=1):  # torch dim -> paddle axis
            ot, h = self.forward_single_frame(xt, h)
            o.append(ot)
        o = paddle.stack(o, axis=1)  # torch.stack(dim) -> paddle.stack(axis)
        return o, h

    def forward(self, x, h: Optional[Tensor]):
        if h is None:
            h = paddle.zeros((x.shape[0], x.shape[-3], x.shape[-2], x.shape[-1]),
                             dtype=x.dtype)
        if x.ndim == 5:
            return self.forward_time_series(x, h)
        else:
            return self.forward_single_frame(x, h)


class Projection(nn.Layer):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2D(in_channels, out_channels, 1)

    def forward_single_frame(self, x):
        return self.conv(x)

    def forward_time_series(self, x):
        B, T = x.shape[:2]
        # return self.conv(x.flatten(0, 1)).unflatten(0, (B, T))
        x = self.conv(x.flatten(0, 1))
        return x.reshape([B, T] + x.shape[1:])

    def forward(self, x):
        if x.ndim == 5:
            return self.forward_time_series(x)
        else:
            return self.forward_single_frame(x)

In [ ]

# We need the RecurrentDecoder's input specs in order to verify it
s0 = paddle.randn([2, 2, 3, 224, 224])
f1=paddle.randn([2, 2, 64, 112, 112])
f2=paddle.randn([2, 2, 256, 56, 56])
f3=paddle.randn([2, 2, 512, 28, 28])
f4=paddle.randn([2, 2, 256, 14, 14])
r1 = r2 = r3 = r4 = None
testmodel = RecurrentDecoder([64, 256, 512, 256], [128, 64, 32, 16])
tmp = testmodel(s0, f1, f2, f3, f4, r1, r2, r3, r4)
print(len(tmp))
for i in tmp:
    print(i.shape)

RecurrentDecoder passes verification; compared against torch, the shapes align.

2.3 FastGuidedFilterRefiner

This one did not seem to be exercised at all in the torch run?

It later became clear why: the program has a default argument, and by default FastGuidedFilterRefiner is not used (MattingNetwork below defaults to 'deep_guided_filter').

In [ ]

"""
Adopted from <https://github.com/wuhuikai/DeepGuidedFilter/>
"""class FastGuidedFilterRefiner(nn.Layer):def __init__(self, *args, **kwargs):super().__init__()self.guilded_filter = FastGuidedFilter(1)def forward_single_frame(self, fine_src, base_src, base_fgr, base_pha):fine_src_gray = fine_src.mean(1, keepdim=True)base_src_gray = base_src.mean(1, keepdim=True)fgr, pha = self.guilded_filter(# torch.cat([base_src, base_src_gray], dim=1),# torch.cat([base_fgr, base_pha], dim=1),# torch.cat([fine_src, fine_src_gray], dim=1)).split([3, 1], dim=1) paddle.concat([base_src, base_src_gray], axis=1),torch.cat([base_fgr, base_pha], dim=1),torch.cat([fine_src, fine_src_gray], dim=1)).split([3, 1], dim=1)
#         print("FastGuidedFilterRefiner forward_single_frame fgr, pha", fgr.shape, pha.shape)return fgr, phadef forward_time_series(self, fine_src, base_src, base_fgr, base_pha):
#         print("==FastGuidedFilterRefiner fine_src, base_src, base_fgr, base_pha", fine_src, base_src, base_fgr, base_pha)B, T = fine_src.shape[:2]fgr, pha = self.forward_single_frame(fine_src.flatten(0, 1),base_src.flatten(0, 1),base_fgr.flatten(0, 1),base_pha.flatten(0, 1))# fgr = fgr.unflatten(0, (B, T))fgr = fgr.reshape([B, T] + fgr.shape[1:])# pha = pha.unflatten(0, (B, T))pha = fgr.reshape([B, T] + pha.shape[1:])
#         print("FastGuidedFilterRefiner forward_time_series fgr, pha", fgr.shape, pha.shape)return fgr, phadef forward(self, fine_src, base_src, base_fgr, base_pha, base_hid):# print("fine_src.ndim=", fine_src.ndim)if fine_src.ndim == 5:return self.forward_time_series(fine_src, base_src, base_fgr, base_pha)else:return self.forward_single_frame(fine_src, base_src, base_fgr, base_pha)class FastGuidedFilter(nn.Layer):def __init__(self, r: int, eps: float = 1e-5):super().__init__()self.r = rself.eps = epsself.boxfilter = BoxFilter(r)def forward(self, lr_x, lr_y, hr_x):mean_x = self.boxfilter(lr_x)mean_y = self.boxfilter(lr_y)cov_xy = self.boxfilter(lr_x * lr_y) - mean_x * mean_yvar_x = self.boxfilter(lr_x * lr_x) - mean_x * mean_xA = cov_xy / (var_x + self.eps)b = mean_y - A * mean_xA = F.interpolate(A, hr_x.shape[2:], mode='bilinear', align_corners=False)b = F.interpolate(b, hr_x.shape[2:], mode='bilinear', align_corners=False)return A * hr_x + bclass BoxFilter(nn.Layer):def __init__(self, r):super(BoxFilter, self).__init__()self.r = rdef forward(self, x):# Note: The original implementation at <https://github.com/wuhuikai/DeepGuidedFilter/>#       uses faster box blur. However, it may not be friendly for ONNX export.#       We are switching to use simple convolution for box blur.kernel_size = 2 * self.r + 1# kernel_x = torch.full((x.data.shape[1], 1, 1, kernel_size), 1 / kernel_size, device=x.device, dtype=x.dtype)# kernel_y = torch.full((x.data.shape[1], 1, kernel_size, 1), 1 / kernel_size, device=x.device, dtype=x.dtype)kernel_x = paddle.full((x.shape[1], 1, 1, kernel_size), 1 / kernel_size, dtype=x.dtype)kernel_y = paddle.full((x.shape[1], 1, kernel_size, 1), 1 / kernel_size, dtype=x.dtype)x = F.conv2d(x, kernel_x, padding=(0, self.r), groups=x.shape[1])x = F.conv2d(x, kernel_y, padding=(self.r, 0), groups=x.shape[1])
#         print(x.shape)return x

2.4 Rewriting DeepGuidedFilterRefiner

In [ ]

"""
Adopted from <https://github.com/wuhuikai/DeepGuidedFilter/>
"""class DeepGuidedFilterRefiner(nn.Layer):def __init__(self, hid_channels=16):super().__init__()self.box_filter = nn.Conv2D(4, 4, kernel_size=3, padding=1, bias_attr=False, groups=4) # 修改bisa# print("box_filter", type(self.box_filter), self.box_filter.weight)# self.box_filter.weight.data[...] = 1 / 9# self.box_filter.weight =1/9# x = paddle.to_tensor(1/9, dtype="float32")# print("==self.box_filter.weight.shape", self.box_filter.weight.shape)x = paddle.full(self.box_filter.weight.shape, 1/9, dtype="float32") # shape=[4,1,3,1,]self.box_filter.weight = paddle.create_parameter(shape=x.shape,dtype=str(x.numpy().dtype),default_initializer=paddle.nn.initializer.Assign(x))self.conv = nn.Sequential(nn.Conv2D(4 * 2 + hid_channels, hid_channels, kernel_size=1, bias_attr=False),nn.BatchNorm2D(hid_channels),nn.ReLU(True),nn.Conv2D(hid_channels, hid_channels, kernel_size=1, bias_attr=False),nn.BatchNorm2D(hid_channels),nn.ReLU(True),nn.Conv2D(hid_channels, 4, kernel_size=1, bias_attr=True))def forward_single_frame(self, fine_src, base_src, base_fgr, base_pha, base_hid):# fine_x = torch.cat([fine_src, fine_src.mean(1, keepdim=True)], dim=1) # axis# base_x = torch.cat([base_src, base_src.mean(1, keepdim=True)], dim=1)# base_y = torch.cat([base_fgr, base_pha], dim=1)fine_x = paddle.concat([fine_src, fine_src.mean(1, keepdim=True)], axis=1) # axisbase_x = paddle.concat([base_src, base_src.mean(1, keepdim=True)], axis=1)base_y = paddle.concat([base_fgr, base_pha], axis=1)mean_x = self.box_filter(base_x)mean_y = self.box_filter(base_y)cov_xy = self.box_filter(base_x * base_y) - mean_x * mean_yvar_x  = self.box_filter(base_x * base_x) - mean_x * mean_x# A = self.conv(torch.cat([cov_xy, var_x, base_hid], dim=1))A = self.conv(paddle.concat([cov_xy, var_x, base_hid], axis=1))b = mean_y - A * mean_xH, W = fine_src.shape[2:]A = F.interpolate(A, (H, W), mode='bilinear', align_corners=False)b = F.interpolate(b, (H, W), mode='bilinear', align_corners=False)out = A * fine_x + b# fgr, pha = out.split([3, 1], dim=1) fgr = out[:, :3, ]pha = out[:, 3:, ]
#         print("DeepGuidedFilterRefiner forward_single_frame fgr, pha", fgr.shape, pha.shape)return fgr, phadef forward_time_series(self, fine_src, base_src, base_fgr, base_pha, base_hid):B, T = fine_src.shape[:2]fgr, pha = self.forward_single_frame(fine_src.flatten(0, 1),base_src.flatten(0, 1),base_fgr.flatten(0, 1),base_pha.flatten(0, 1),base_hid.flatten(0, 1))# fgr = fgr.unflatten(0, (B, T))fgr = fgr.reshape([B, T] + fgr.shape[1:])# pha = pha.unflatten(0, (B, T))pha = pha.reshape([B, T] + pha.shape[1:])
#         print("DeepGuidedFilterRefiner forward_time_series fgr, pha", fgr.shape, pha.shape)return fgr, phadef forward(self, fine_src, base_src, base_fgr, base_pha, base_hid):if fine_src.ndim == 5:# print("if fine_src.ndim == 5:")return self.forward_time_series(fine_src, base_src, base_fgr, base_pha, base_hid)else:# print("if fine_src.ndim != 5:")return self.forward_single_frame(fine_src, base_src, base_fgr, base_pha, base_hid)

2.5 The final sprint: MattingNetwork

At this point the light at the end of the tunnel was in sight. With every component reproduced, only the final assembly remained!

In [ ]

class MattingNetwork(nn.Layer):
    def __init__(self,
                 variant: str = 'mobilenetv3',
                 refiner: str = 'deep_guided_filter',
                 pretrained_backbone: bool = False):
        super().__init__()
        assert variant in ['mobilenetv3', 'resnet50']
        assert refiner in ['fast_guided_filter', 'deep_guided_filter']
        if variant == 'mobilenetv3':
            self.backbone = MobileNetV3LargeEncoder(pretrained_backbone)
            self.aspp = LRASPP(960, 128)
            self.decoder = RecurrentDecoder([16, 24, 40, 128], [80, 40, 32, 16])
        else:
            self.backbone = ResNet50Encoder(pretrained_backbone)
            self.aspp = LRASPP(2048, 256)
            self.decoder = RecurrentDecoder([64, 256, 512, 256], [128, 64, 32, 16])
        self.project_mat = Projection(16, 4)
        self.project_seg = Projection(16, 1)
        if refiner == 'deep_guided_filter':
            self.refiner = DeepGuidedFilterRefiner()
        else:
            self.refiner = FastGuidedFilterRefiner()

    def forward(self,
                src: Tensor,
                r1: Optional[Tensor] = None,
                r2: Optional[Tensor] = None,
                r3: Optional[Tensor] = None,
                r4: Optional[Tensor] = None,
                downsample_ratio: float = 1,
                segmentation_pass: bool = False):
        if downsample_ratio != 1:
            src_sm = self._interpolate(src, scale_factor=downsample_ratio)
        else:
            src_sm = src
        f1, f2, f3, f4 = self.backbone(src_sm)
        f4 = self.aspp(f4)
        hid, *rec = self.decoder(src_sm, f1, f2, f3, f4, r1, r2, r3, r4)
        if not segmentation_pass:
            # fgr_residual, pha = self.project_mat(hid).split([3, 1], dim=-3)  # torch
            fgr_residual, pha = self.project_mat(hid).split([3, 1], axis=-3)
            if downsample_ratio != 1:
                fgr_residual, pha = self.refiner(src, src_sm, fgr_residual, pha, hid)
            fgr = fgr_residual + src
            fgr = fgr.clip(0., 1.)
            pha = pha.clip(0., 1.)
            return [fgr, pha, *rec]
        else:
            seg = self.project_seg(hid)
            return [seg, *rec]

    def _interpolate(self, x: Tensor, scale_factor: float):
        if x.ndim == 5:
            B, T = x.shape[:2]
            # torch passed recompute_scale_factor=False, which paddle does not have
            x = F.interpolate(x.flatten(0, 1), scale_factor=scale_factor,
                              mode='bilinear', align_corners=False)
            # x = x.unflatten(0, (B, T))
            x = x.reshape([B, T] + x.shape[1:])
        else:
            x = F.interpolate(x, scale_factor=scale_factor,
                              mode='bilinear', align_corners=False)
        return x

The MattingNetwork model's outputs were compared and passed.

It was actually at this step that this reproduction's structural misalignment surfaced: loading the X2Paddle-converted model raised errors, and the forward computation disagreed with torch. That revealed the structures differed, and tracing backwards from here led to the unaligned ResNet50.
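When that happens, diffing parameter names and shapes between the model and the converted weights narrows the mismatch down quickly (a sketch; 'work/rvm_resnet50.pdparams' is a placeholder path for the converted checkpoint):

# Sketch: locate a structural mismatch by diffing parameter names and shapes.
import paddle

model = MattingNetwork('resnet50')
pd_shapes = {name: tuple(p.shape) for name, p in model.named_parameters()}

converted = paddle.load('work/rvm_resnet50.pdparams')   # placeholder path
cv_shapes = {name: tuple(v.shape) for name, v in converted.items()}

print('only in model:     ', sorted(set(pd_shapes) - set(cv_shapes))[:10])
print('only in checkpoint:', sorted(set(cv_shapes) - set(pd_shapes))[:10])
print('shape mismatches:  ', [(k, pd_shapes[k], cv_shapes[k])
                              for k in pd_shapes.keys() & cv_shapes.keys()
                              if pd_shapes[k] != cv_shapes[k]][:10])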

In [ ]

# Verification passed
model = MattingNetwork('resnet50')
# a = paddle.randn((2, 24, 3, 224, 224))
import numpy as np
np.random.seed(1)
a = np.random.randn(3,3,244,244).astype('float32')
a = paddle.to_tensor(a)
tmp = model(a)
print(len(tmp))
for i in tmp:
    print(i.max())

Inspect the model structure and compare it against the paper's original; aligning the two structures took considerable effort.

In [ ]

model.parameters

3. Taking apart the inference module

3.1 A Paddle implementation of ToPILImage

In [ ]

# A Paddle implementation of ToPILImage (tensor/ndarray -> PIL image)
import paddle
import PIL
import numbers
import numpy as np
from PIL import Image
from paddle.vision.transforms import BaseTransform
from paddle.vision.transforms import functional as F

class ToPILImage(BaseTransform):
    def __init__(self, mode=None, keys=None):
        super(ToPILImage, self).__init__(keys)  # fixed: called super(ToTensor, ...)
        self.mode = mode  # fixed: store mode; the method below referenced a bare name

    def _apply_image(self, pic):
        """
        Args:
            pic (Tensor|np.ndarray): Image to be converted to PIL Image.
        Returns:
            PIL: Converted image.
        """
        mode = self.mode
        if not (isinstance(pic, paddle.Tensor) or isinstance(pic, np.ndarray)):
            raise TypeError('pic should be Tensor or ndarray. Got {}.'.format(type(pic)))
        elif isinstance(pic, paddle.Tensor):
            if pic.ndimension() not in {2, 3}:
                raise ValueError('pic should be 2/3 dimensional. Got {} dimensions.'
                                 .format(pic.ndimension()))
            elif pic.ndimension() == 2:
                # if 2D image, add channel dimension (CHW)
                pic = pic.unsqueeze(0)
        elif isinstance(pic, np.ndarray):
            if pic.ndim not in {2, 3}:
                raise ValueError('pic should be 2/3 dimensional. Got {} dimensions.'
                                 .format(pic.ndim))
            elif pic.ndim == 2:
                # if 2D image, add channel dimension (HWC)
                pic = np.expand_dims(pic, 2)
        npimg = pic
        if isinstance(pic, paddle.Tensor) and "float" in str(pic.numpy().dtype) and mode != 'F':
            pic = (pic * 255).astype('uint8')  # fixed: paddle has no .mul().byte()
        if isinstance(pic, paddle.Tensor):
            npimg = np.transpose(pic.numpy(), (1, 2, 0))
        if not isinstance(npimg, np.ndarray):
            raise TypeError('Input pic must be a paddle.Tensor or NumPy ndarray, '
                            'not {}'.format(type(npimg)))
        if npimg.shape[2] == 1:
            expected_mode = None
            npimg = npimg[:, :, 0]
            if npimg.dtype == np.uint8:
                expected_mode = 'L'
            elif npimg.dtype == np.int16:
                expected_mode = 'I;16'
            elif npimg.dtype == np.int32:
                expected_mode = 'I'
            elif npimg.dtype == np.float32:
                expected_mode = 'F'
            if mode is not None and mode != expected_mode:
                raise ValueError("Incorrect mode ({}) supplied for input type {}. Should be {}"
                                 .format(mode, np.dtype, expected_mode))
            mode = expected_mode
        elif npimg.shape[2] == 2:
            permitted_2_channel_modes = ['LA']
            if mode is not None and mode not in permitted_2_channel_modes:
                raise ValueError("Only modes {} are supported for 2D inputs"
                                 .format(permitted_2_channel_modes))
            if mode is None and npimg.dtype == np.uint8:
                mode = 'LA'
        elif npimg.shape[2] == 4:
            permitted_4_channel_modes = ['RGBA', 'CMYK', 'RGBX']
            if mode is not None and mode not in permitted_4_channel_modes:
                raise ValueError("Only modes {} are supported for 4D inputs"
                                 .format(permitted_4_channel_modes))
            if mode is None and npimg.dtype == np.uint8:
                mode = 'RGBA'
        else:
            permitted_3_channel_modes = ['RGB', 'YCbCr', 'HSV']
            if mode is not None and mode not in permitted_3_channel_modes:
                raise ValueError("Only modes {} are supported for 3D inputs"
                                 .format(permitted_3_channel_modes))
            if mode is None and npimg.dtype == np.uint8:
                mode = 'RGB'
        if mode is None:
            raise TypeError('Input type {} is not supported'.format(npimg.dtype))
        return Image.fromarray(npimg, mode=mode)

3.2 Inference utilities

In [ ]

# RobustVideoMatting/inference_utils.py
# These four classes will be used later, replacing: from inference_utils import VideoReader, VideoWriter, ImageSequenceReader, ImageSequenceWriter
import av
import os
import pims
import numpy as np
# from torch.utils.data import Dataset
from paddle.io import Dataset  # reportedly functionally equivalent to torch's
# from torchvision.transforms.functional import to_pil_image
# from paddle.vision.transforms.functional import to_pil_image
to_pil_image = ToPILImage()  # fixed: instantiate the transform; passing a frame to the class itself would misread it as the mode argument
from PIL import Image# @property创建只读属性class VideoReader(Dataset):def __init__(self, path, transform=None):self.video = pims.PyAVVideoReader(path)self.rate = self.video.frame_rateself.transform = transform@propertydef frame_rate(self):return self.ratedef __len__(self):return len(self.video)def __getitem__(self, idx):frame = self.video[idx]frame = Image.fromarray(np.asarray(frame))if self.transform is not None:frame = self.transform(frame)return frameclass VideoWriter:def __init__(self, path, frame_rate, bit_rate=1000000):self.container = av.open(path, mode='w')self.stream = self.container.add_stream('h264', rate=round(frame_rate))self.stream.pix_fmt = 'yuv420p'self.stream.bit_rate = bit_ratedef write(self, frames):# frames: [T, C, H, W]
#         print("==frames: [T, C, H, W]", frames.shape, frames[0,0,0,0])self.stream.width = frames.shape[3] #shape size(3)self.stream.height = frames.shape[2]if frames.shape[1] == 1:
#             print("==write frames before repeat", frames.shape)
#             frames = frames.repeat(1, 3, 1, 1) # convert grayscale to RGB repeat对应飞桨什么呢?frames = frames.tile([1, 3, 1, 1])
#             print("==write frames after repeat", frames.shape)# 拆分下面的长句,以便单步执行和代码替换x=frames*255
#         print("==x=frames*255", x[0,0,0,0])x=x.transpose([0,2,3,1])x=x.astype('uint8')
#         print("==x.astype('uint8')", x)# print("==write", x.shape)x=x.numpy()# frames = frames.mul(255).byte().cpu().permute(0, 2, 3, 1).numpy()frames = x
#         print("==frames", frames.shape, frames[0,0,0,0])for t in range(frames.shape[0]):frame = frames[t]
#             print('=frame', frame.shape, type(frame), frame)frame = av.VideoFrame.from_ndarray(frame, format='rgb24')self.container.mux(self.stream.encode(frame))def close(self):self.container.mux(self.stream.encode())self.container.close()class ImageSequenceReader(Dataset):def __init__(self, path, transform=None):self.path = pathself.files = sorted(os.listdir(path))self.transform = transformdef __len__(self):return len(self.files)def __getitem__(self, idx):with Image.open(os.path.join(self.path, self.files[idx])) as img:img.load()if self.transform is not None:return self.transform(img)return imgclass ImageSequenceWriter:def __init__(self, path, extension='jpg'):self.path = pathself.extension = extensionself.counter = 0os.makedirs(path, exist_ok=True)def write(self, frames):# frames: [T, C, H, W]for t in range(frames.shape[0]):to_pil_image(frames[t]).save(os.path.join(self.path, str(self.counter).zfill(4) + '.' + self.extension))self.counter += 1def close(self):pass
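Before wiring up the model, the utilities can be smoke-tested with a tiny round trip (a sketch; it assumes a local video.mp4 exists, and copy.mp4 is an arbitrary output name):

import paddle
from paddle.vision import transforms

reader = VideoReader('video.mp4', transform=transforms.ToTensor())
writer = VideoWriter('copy.mp4', frame_rate=reader.frame_rate)
writer.write(paddle.stack([reader[0]]))  # stack one frame into [T=1, C, H, W]
writer.close()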

3.3 Porting the main inference file

In [ ]

"""
python inference.py \--variant mobilenetv3 \--checkpoint "CHECKPOINT" \--device cuda \--input-source "input.mp4" \--output-type video \--output-composition "composition.mp4" \--output-alpha "alpha.mp4" \--output-foreground "foreground.mp4" \--output-video-mbps 4 \--seq-chunk 1
"""import numpy as npimport paddle
import paddle.nn as nn
import paddle.nn.functional as F
from paddle.io import Dataset, BatchSampler, DataLoaderimport os
# from paddle.vision.transforms import functional as F
from paddle.vision import transforms
from typing import Optional, Tuple
from tqdm.auto import tqdm# from inference_utils import VideoReader, VideoWriter, ImageSequenceReader, ImageSequenceWriterdef convert_video(model,input_source: str,input_resize: Optional[Tuple[int, int]] = None,downsample_ratio: Optional[float] = None,output_type: str = 'video',output_composition: Optional[str] = None,output_alpha: Optional[str] = None,output_foreground: Optional[str] = None,output_video_mbps: Optional[float] = None,seq_chunk: int = 1,num_workers: int = 0,progress: bool = True,device: Optional[str] = None,dtype: Optional[paddle.dtype] = None): # torch.dtype"""Args:input_source:A video file, or an image sequence directory. Images must be sorted in accending order, support png and jpg.input_resize: If provided, the input are first resized to (w, h).downsample_ratio: The model's downsample_ratio hyperparameter. If not provided, model automatically set one.output_type: Options: ["video", "png_sequence"].output_composition:The composition output path. File path if output_type == 'video'. Directory path if output_type == 'png_sequence'.If output_type == 'video', the composition has green screen background.If output_type == 'png_sequence'. the composition is RGBA png images.output_alpha: The alpha output from the model.output_foreground: The foreground output from the model.seq_chunk: Number of frames to process at once. Increase it for better parallelism.num_workers: PyTorch's DataLoader workers. Only use >0 for image input.progress: Show progress bar.device: Only need to manually provide if model is a TorchScript freezed model.dtype: Only need to manually provide if model is a TorchScript freezed model."""assert downsample_ratio is None or (downsample_ratio > 0 and downsample_ratio <= 1), 'Downsample ratio must be between 0 (exclusive) and 1 (inclusive).'assert any([output_composition, output_alpha, output_foreground]), 'Must provide at least one output.'assert output_type in ['video', 'png_sequence'], 'Only support "video" and "png_sequence" output modes.'assert seq_chunk >= 1, 'Sequence chunk must be >= 1'assert num_workers >= 0, 'Number of workers must be >= 0'# Initialize transformif input_resize is not None:transform = transforms.Compose([transforms.Resize(input_resize[::-1]),transforms.ToTensor()])else:transform = transforms.ToTensor()# Initialize readerif os.path.isfile(input_source):source = VideoReader(input_source, transform)else:source = ImageSequenceReader(input_source, transform)
#     print("source.shape", source.shape)# reader = DataLoader(source, batch_size=seq_chunk, pin_memory=True, num_workers=num_workers)reader = DataLoader(source, batch_size=seq_chunk, num_workers=num_workers)# Initialize writersif output_type == 'video':frame_rate = source.frame_rate if isinstance(source, VideoReader) else 30output_video_mbps = 1 if output_video_mbps is None else output_video_mbpsif output_composition is not None:writer_com = VideoWriter(path=output_composition,frame_rate=frame_rate,bit_rate=int(output_video_mbps * 1000000))if output_alpha is not None:writer_pha = VideoWriter(path=output_alpha,frame_rate=frame_rate,bit_rate=int(output_video_mbps * 1000000))if output_foreground is not None:writer_fgr = VideoWriter(path=output_foreground,frame_rate=frame_rate,bit_rate=int(output_video_mbps * 1000000))else:if output_composition is not None:writer_com = ImageSequenceWriter(output_composition, 'png')if output_alpha is not None:writer_pha = ImageSequenceWriter(output_alpha, 'png')if output_foreground is not None:writer_fgr = ImageSequenceWriter(output_foreground, 'png')# Inference# model = model.eval() model.eval()# if device is None or dtype is None: # 先暂时屏蔽看看#     param = next(model.parameters())#     dtype = param.dtype#     device = param.deviceif (output_composition is not None) and (output_type == 'video'):# bgr = torch.tensor([120, 255, 155], device=device, dtype=dtype).div(255).view(1, 1, 3, 1, 1)bgr = (paddle.to_tensor([120, 255, 155], dtype="float32") / paddle.to_tensor(255.0, dtype='float32')).reshape([1,1,3,1,1])
#         print ("==bgr", bgr.shape, bgr[0,0,0])try:with paddle.no_grad():# bar = tqdm(total=len(source), disable=not progress, dynamic_ncols=True)bar = tqdm(total=len(source))rec = [None] * 4for src in reader:if downsample_ratio is None:downsample_ratio = auto_downsample_ratio(*src.shape[2:])# src = src.to(device, dtype, non_blocking=True).unsqueeze(0) # [B, T, C, H, W]src =src.unsqueeze(0)fgr, pha, *rec = model(src, *rec, downsample_ratio)if output_foreground is not None:writer_fgr.write(fgr[0])if output_alpha is not None:writer_pha.write(pha[0])if output_composition is not None:if output_type == 'video':com = fgr * pha + bgr * (1 - pha)else:fgr = fgr * pha.gt(0)# com = torch.cat([fgr, pha], dim=-3)com = paddle.concat([fgr, pha], axis=-3)writer_com.write(com[0])#                 bar.update(src.size(1))bar.update(src.shape[1])finally:# Clean upif output_composition is not None:writer_com.close()if output_alpha is not None:writer_pha.close()if output_foreground is not None:writer_fgr.close()def auto_downsample_ratio(h, w):"""Automatically find a downsample ratio so that the largest side of the resolution be 512px."""return min(512 / max(h, w), 1)class Converter:def __init__(self, variant: str, checkpoint: str, device: str):self.model = MattingNetwork(variant).eval().to(device)self.model.load_state_dict(torch.load(checkpoint, map_location=device))self.model = torch.jit.script(self.model)self.model = torch.jit.freeze(self.model)self.device = devicedef convert(self, *args, **kwargs):convert_video(self.model, device=self.device, dtype=torch.float32, *args, **kwargs)

3.4 Converting PyTorch weights to PaddlePaddle weights

Because this cell cannot run on AIStudio (torch is unavailable there), it is kept in Markdown mode here; on the OpenI (启智) AI platform or another environment, switch it back to a code cell.

import pickle
from collections import OrderedDict

import paddle
import torch


def export_weight_names(net):
    # Dump the Paddle model's parameter keys to a text file
    print(net.state_dict().keys())
    with open('paddle.txt', 'w') as f:
        for key in net.state_dict().keys():
            f.write(key + '\n')


# Convert PyTorch weights to PaddlePaddle weights
def transfer(paddlemodel=None):
    # res2net_paddle_implement = paddle.vision.models.resnet50(pretrained=False)
    res2net_paddle_implement = paddlemodel
    export_weight_names(res2net_paddle_implement)  # save the Paddle model's keys to txt
    with open('paddle.txt') as f:  # the Paddle keys
        paddle_list = f.readlines()
    # state_dict = torch.load('resnet50-0676ba61.pth')
    state_dict = torch.load("rvm_resnet50.pth")
    paddle_state_dict = OrderedDict()
    torch_list = state_dict.keys()
    for p in paddle_list:
        p = p.strip()
        t = p
        # BatchNorm statistics are named differently in the two frameworks
        if "mean" in p:
            t = p.replace("_mean", "running_mean")
        if "variance" in p:
            t = p.replace("_variance", "running_var")
        if t in torch_list:
            if 'fc' not in p:
                paddle_state_dict[p] = state_dict[t].detach().cpu().numpy()
            else:
                # fully-connected weights need transposing
                paddle_state_dict[p] = state_dict[t].detach().cpu().numpy().T
        else:
            print(p)  # report keys that found no match
    with open("rvm_resnet50.pdparams", 'wb') as f:
        pickle.dump(paddle_state_dict, f)


model = MattingNetwork('resnet50')
transfer(paddlemodel=model)
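After conversion, a quick spot-check can confirm an array survived the trip. This is a hedged sketch: it assumes rvm_resnet50.pth is still on disk, was saved on CPU, and that the key name matches on both sides (the name below is taken from the parameter dump later in this article):

import pickle
import numpy as np
import torch

torch_sd = torch.load("rvm_resnet50.pth", map_location='cpu')
with open("rvm_resnet50.pdparams", 'rb') as f:
    paddle_sd = pickle.load(f)  # we wrote it with plain pickle above

key = 'backbone.conv1.weight'  # same name in both dicts (no BN rename needed)
print(key, 'max abs diff:', np.abs(paddle_sd[key] - torch_sd[key].numpy()).max())  # expect 0.0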

3.5 Final inference

Run the inference to check that it executes normally and that the output files come out right.

In [ ]

# For plain video matting, a simple API is provided:
# from inference import convert_video
model = MattingNetwork('resnet50')
model.set_state_dict(paddle.load("rvm_resnet50.pdparams"))
convert_video(
    model,                           # the model (can live on any device, cpu or cuda)
    input_source='video.mp4',        # a video file, or an image-sequence directory
    output_type='video',             # "video" or "png_sequence"
    output_composition='com.mp4',    # file path for video output; directory path for a PNG sequence
    output_alpha="pha.mp4",          # [optional] output the alpha prediction
    output_foreground="fgr.mp4",     # [optional] output the foreground prediction
    output_video_mbps=4,             # video bitrate, for video output
    downsample_ratio=None,           # downsample ratio; tune per video, or None for auto
    seq_chunk=1                      # number of frames processed in parallel at once
)

Download the resulting com.mp4 to your machine. A handsome guy in front of a green screen? That proves the whole reproduction succeeded! The handsome guy's project lives at: https://gitee.com/roy-kwok/robust-video-matting-master

Space does not permit covering the MobileNetV3 backbone here; see the companion project 飞桨源码MobileNetV3分类模型对齐Pytorch-省事版.

Lessons learned

This reproduction involved some corner-cutting and falls short of the paper-reproduction competition's standards on a few points:

  • The training part was not reproduced.
  • The backbone's precision is aligned, but the full network never got a proper precision-alignment test; the final output looked fine, so I called the job done.
  • The code was never cleaned up; many test statements are merely commented out rather than deleted.

Still, carrying an entire project across the finish line, from model to inference, is thrilling. After taking part in roughly four rounds of the paper-reproduction competition, this is my best result yet. My own verdict: satisfied.

Those test statements are actually worth keeping. Give it one or two weeks and the code already reads like a stranger's; the test code helps me find my bearings again faster.

Common reproduction pitfalls are archived in a companion project for future work: 飞桨与Pytorch算子对照表-论文复现小助手.

There is also a spin-off hit project, 没有绿幕,AI给我们造! 超强的稳定视频抠像 (RVM), which wraps this reproduction into a one-click mode anyone can use. Come give it a try!

Debugging notes

Error: too many values to unpack (expected 2)

     58     def forward(self, x, r: Optional[Tensor]):
---> 59         a, b = x.split(self.channels // 2, axis=-3)  # torch: dim
     60         b, r = self.gru(b, r)
     61         x = paddle.concat([a, b], axis=-3)  # torch: cat
ValueError: too many values to unpack (expected 2)

Paddle emits as many outputs as the number of chunks requested; torch evidently behaves differently.

  • The second argument differs. PyTorch: split_size_or_sections is an int or list(int), i.e. the size of each chunk. PaddlePaddle: num_or_sections is an int, list(int), or tuple(int), i.e. the number of chunks. In short, torch splits by chunk size while Paddle splits into a chunk count; the sketch below shows the difference.
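A minimal sketch of the semantic difference (shapes are made up): to halve 16 channels, torch takes the chunk size 8 while Paddle takes the chunk count 2.

import paddle

x = paddle.rand([1, 16, 8, 8])
a, b = paddle.split(x, 2, axis=-3)   # Paddle: 2 = number of chunks
# torch equivalent: a, b = x.split(8, dim=-3)  # 8 = size of each chunk
print(a.shape, b.shape)              # [1, 8, 8, 8] [1, 8, 8, 8]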

self.box_filter.weight = 1/9

TypeError: assignment to parameter 'weight' should be of type Parameter or None, but got 'float'. In other words: how do you assign a value to a layer's weight in Paddle?

import paddle

weight_attr = paddle.ParamAttr(
    name="weight",
    learning_rate=0.5,
    regularizer=paddle.regularizer.L2Decay(1.0),
    trainable=True)
print(weight_attr.name)  # "weight"
paddle.nn.Linear(3, 4, weight_attr=weight_attr)

or

# PaddlePaddle example:
import paddle

x = paddle.zeros([2, 3], dtype="float32")
param = paddle.create_parameter(
    shape=x.shape,
    dtype=str(x.numpy().dtype),
    default_initializer=paddle.nn.initializer.Assign(x))
param.stop_gradient = True
# Output:
# Parameter containing:
# Tensor(shape=[2, 3], dtype=float32, place=CPUPlace, stop_gradient=True,
#        [[0., 0., 0.],
#         [0., 0., 0.]])
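Applied to the actual problem, a constant 1/9 box-filter kernel can be written into an existing conv layer with paddle.assign. This is a sketch; box_filter here is a hypothetical stand-in 3x3 conv, not the model's real layer:

import paddle

box_filter = paddle.nn.Conv2D(1, 1, kernel_size=3, bias_attr=False)  # stand-in layer
with paddle.no_grad():
    # overwrite the existing Parameter in place instead of assigning a float
    paddle.assign(paddle.full(box_filter.weight.shape, 1.0 / 9), box_filter.weight)
print(box_filter.weight[0, 0])  # every entry is 1/9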

split with a list of section sizes

torch's split([3, 1], dim=-3). I first tried to replace it with plain slicing:

[:, :3, ] and [:, 3:, ]

but slicing turned out not to work, because the tensor's rank is unknown at that point! In fact split works as-is; only dim has to become axis:

# fgr_residual, pha = self.project_mat(hid).split([3, 1], dim=-3)  # torch
fgr_residual, pha = self.project_mat(hid).split([3, 1], axis=-3)

'Tensor' object has no attribute 'clamp'

---> 57             fgr = fgr.clamp(0., 1.)
     58             pha = pha.clamp(0., 1.)
     59             return [fgr, pha, *rec]
AttributeError: 'Tensor' object has no attribute 'clamp'

Just use paddle.clip (or the Tensor.clip method) instead, as in the sketch below.
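The replacement is one-for-one (a minimal sketch with made-up shapes):

import paddle

fgr = paddle.rand([1, 3, 4, 4]) * 2 - 0.5     # values spill outside [0, 1]
fgr = paddle.clip(fgr, 0., 1.)                # torch: fgr.clamp(0., 1.)
pha = paddle.rand([1, 1, 4, 4]).clip(0., 1.)  # the Tensor method also works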

Errors when loading the trained weights file

.../paddle/fluid/dygraph/layers.py:1441: UserWarning: Skip loading for refiner.conv.4._mean. refiner.conv.4._mean is not found in the provided dict.
.../paddle/fluid/dygraph/layers.py:1441: UserWarning: Skip loading for refiner.conv.4._variance. refiner.conv.4._variance is not found in the provided dict.

It was this pile of warnings while loading the weights that made me resolve to rewrite ResNet.

In fact, any such warning is proof that the model is not aligned.

unflatten error

/tmp/ipykernel_166/3011955335.py in forward_time_series(self, x)
     40         B, T = x.shape[:2]
     41         features = self.forward_single_frame(x.flatten(0, 1))
---> 42         features = [f.unflatten(0, (B, T)) for f in features]
     43         return features
AttributeError: 'Tensor' object has no attribute 'unflatten'

Paddle tensors have no unflatten method. torch's works like this:

>>> torch.randn(3, 4, 1).unflatten(1, (2, 2)).shape
torch.Size([3, 2, 2, 1])
>>> torch.randn(3, 4, 1).unflatten(1, (-1, 2)).shape  # the size -1 is inferred from the size of dimension 1
torch.Size([3, 2, 2, 1])
>>> torch.randn(2, 4, names=('A', 'B')).unflatten('B', (('B1', 2), ('B2', 2)))
tensor([[[-1.1772,  0.0180],
         [ 0.2412,  0.1431]],
        [[-1.1819, -0.8899],
         [ 1.5813,  0.2274]]], names=('A', 'B1', 'B2'))
>>> torch.randn(2, names=('A',)).unflatten('A', (('B1', -1), ('B2', 1)))
tensor([[-0.8591],
        [ 0.3100]], names=('B1', 'B2'))

split felt like it should handle this, but experiments showed it does not. Reading torch's source for the key trick did not pan out either; a reshape solves the problem:

# features = [f.unflatten(0, (B, T)) for f in features]
features = [f.reshape([B, T] + f.shape[1:]) for f in features]
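A quick check that the reshape really reproduces unflatten's behavior (a sketch with made-up B, T and feature shapes):

import paddle

B, T = 2, 3
x = paddle.rand([B * T, 16, 8, 8])     # batch and time flattened together
y = x.reshape([B, T] + x.shape[1:])    # torch: x.unflatten(0, (B, T))
print(y.shape)                         # [2, 3, 16, 8, 8]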

Broadcast error

ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [6, 3, 4, 224, 224] and the shape of Y = [6, 8, 3, 224, 224]. Received [3] in X is not equal to [8] in Y at i:1.
  [operator < elementwise_add > error]

Why is this wrong? A blind guess: is the earlier slicing to blame?

Testing with adjusted shapes showed line 56 is where it fails:

     54             if downsample_ratio != 1:
     55                 fgr_residual, pha = self.refiner(src, src_sm, fgr_residual, pha, hid)
---> 56             fgr = fgr_residual + src
     57             fgr = fgr.clip(0., 1.)
     58             pha = pha.clip(0., 1.)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [2, 3, 4, 224, 224] and the shape of Y = [2, 24, 3, 224, 224]. Received [3] in X is not equal to [24] in Y at i:1.
  [operator < elementwise_add > error]

Switching from slicing back to split solved it!

# fgr_residual, pha = self.project_mat(hid).split([3, 1], dim=-3)  # torch
fgr_residual, pha = self.project_mat(hid).split([3, 1], axis=-3)

Forward results differ after loading the weights

Compare against torch's model.named_parameters; below is the Paddle side (tmpmodel.sublayers):

Tensor(shape=[64, 3, 7, 7], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[[[ 0.00920054,  0.00598424, -0.02342818, ..., -0.05485044, -0.05662441, -0.08892745],
          [ 0.00141057,  0.00058585,  0.00649579, ..., -0.00672991, -0.02917050, -0.05985972],
          [ 0.02777172,  0.03086680,  0.01981828, ...,  0.10167034,  0.06547850,  0.04509544],

torch

OrderedDict([('backbone.conv1.weight',
              tensor([[[[ 9.2005e-03,  5.9842e-03, -2.3428e-02,  ..., -5.4850e-02, -5.6624e-02, -8.8927e-02],
                        [ 1.4106e-03,  5.8585e-04,  6.4958e-03,  ..., -6.7299e-03, -2.9171e-02, -5.9860e-02],
                        [ 2.7772e-02,  3.0867e-02,  1.9818e-02,  ...,  1.0167e-01,  6.5478e-02,  4.5095e-02],

So the head of the network is aligned after all.

Now look at the tail entries; they don't seem to match up:

paddle vs torch; the torch side still carries the refiner.* entries:

('refiner.conv.4.weight',
 tensor([1.1835, 1.1862, 1.3103, 1.0038, 1.2307, 1.5397, 1.0028, 1.1493, 1.2051,
         1.1066, 0.8704, 1.1728, 1.0673, 1.1416, 1.0163, 1.1113], device='cuda:0')),
('refiner.conv.4.bias',
 tensor([-0.0158,  0.0759, -0.0215, -0.1072, -0.2232,  0.0079,  0.0211, -0.0166,
         -0.0709, -0.0674, -0.0459,  0.1081,  0.0335,  0.0550,  0.0274,  0.1692], device='cuda:0')),
('refiner.conv.4.running_mean',
 tensor([-0.0813, -0.1583, -0.2181,  0.2204,  0.0940,  0.0842, -0.1248, -0.0449,
          0.1911,  0.1713, -0.1602,  0.0618, -0.0110, -0.0663,  0.2190,  0.0388], device='cuda:0')),
('refiner.conv.4.running_var',
 tensor([0.1016, 0.0304, 0.0449, 0.0309, 0.0496, 0.0117, 0.4346, 0.0580, 0.0196,
         0.0543, 0.2745, 0.0461, 0.0263, 0.0308, 0.1099, 0.0121], device='cuda:0')),
('refiner.conv.4.num_batches_tracked', tensor(11863, device='cuda:0')),
('refiner.conv.6.weight',
 tensor([[[[-7.8937e-02]], [[-1.7649e-01]], [[ 2.6084e-01]], [[ 6.5176e-02]],
          [[ 1.6884e-01]], [[ 1.6580e-03]], ........ [[ 1.5224e-01]]]], device='cuda:0')),
('refiner.conv.6.bias', tensor([ 0.1447,  0.1293,  0.0782, -0.0261], device='cuda:0'))])

TypeError: 'numpy.int64' object is not callable

/tmp/ipykernel_1229/1406632871.py in write(self, frames)
     44     def write(self, frames):
     45         # frames: [T, C, H, W]
---> 46         self.stream.width = frames.size(3)
     47         self.stream.height = frames.size(2)
     48         if frames.size(1) == 1:
TypeError: 'numpy.int64' object is not callable

The cause: torch's size() is a method (size(3) returns the length of dimension 3), while Paddle's Tensor.size is a plain integer (the total element count), so calling it fails. Use the shape attribute instead: frames.shape[3]. The sketch below shows the difference.
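The difference in one snippet (a sketch):

import paddle

t = paddle.zeros([2, 3, 4, 5])
print(t.size)       # 120: in Paddle, size is an int (total element count)
print(t.shape[3])   # 5:   what torch's t.size(3) would return
# t.size(3)         # TypeError: not callable, since size is already an int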

transpose() takes from 2 to 3 positional arguments but 5 were given

---> 54         x = x.transpose(0, 2, 3, 1)
     55         x = x.numpy()
     56         # frames = frames.mul(255).byte().cpu().permute(0, 2, 3, 1).numpy()
TypeError: transpose() takes from 2 to 3 positional arguments but 5 were given

Odd: where would five arguments come from? (It is self plus the four dims.) Oh, the parameters were simply written wrong; Paddle's transpose takes a single perm list: transpose([0, 2, 3, 1]).
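The corrected call in isolation (a sketch with a made-up shape):

import paddle

x = paddle.rand([1, 3, 4, 5])
y = x.transpose([0, 2, 3, 1])   # torch: x.permute(0, 2, 3, 1)
print(y.shape)                  # [1, 4, 5, 3]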

Error: Operator transpose2 does not have kernel for data_type[int8_t]

RuntimeError: (NotFound) Operator transpose2 does not have kernel for data_type[int8_t]:data_layout[ANY_LAYOUT]:place[CUDAPlace(0)]:library_type[PLAIN].
  [operator < transpose2 > error]

Simple: move the dtype cast after the transpose (transpose in float first, then cast to uint8).

An error with no helpful message

/tmp/ipykernel_1229/4196880029.py in write(self, frames)
     62         for t in range(frames.shape[0]):
     63             frame = frames[t]
---> 64             frame = av.VideoFrame.from_ndarray(frame, format='rgb24')
     65             self.container.mux(self.stream.encode(frame))
av/video/frame.pyx in av.video.frame.VideoFrame.from_ndarray()
AssertionError:

Debug prints revealed that the earlier cast used 'int8'; values above 127 wrap to negative and trip the assertion. Casting to 'uint8' fixed it, as the sketch below illustrates.
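Why 'int8' broke it (a sketch): pixel values above 127 wrap negative in int8, while rgb24 frames must be uint8.

import numpy as np

x = np.array([200.0, 255.0])
print(x.astype('int8'))    # [-56 -1]   wrapped negative, so av asserts
print(x.astype('uint8'))   # [200 255]  what rgb24 expects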

Error: 'Tensor' object has no attribute 'repeat'. The repeat question has to be solved after all.

<ipython-input-72-dcd21ea5364d> in write(self, frames)
     48         self.stream.height = frames.shape[2]
     49         if frames.shape[1] == 1:
---> 50             frames = frames.repeat(1, 3, 1, 1)  # convert grayscale to RGB; what is the Paddle equivalent?
     51         # split the long line below for single-stepping
     52         x = frames * 255
AttributeError: 'Tensor' object has no attribute 'repeat'

Change repeat to tile:

# frames = frames.repeat(1, 3, 1, 1)  # torch: convert grayscale to RGB
frames = frames.tile([1, 3, 1, 1])
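tile reproduces torch's repeat here (a sketch with made-up frame shapes):

import paddle

gray = paddle.rand([4, 1, 32, 32])   # [T, 1, H, W] grayscale frames
rgb = gray.tile([1, 3, 1, 1])        # torch: gray.repeat(1, 3, 1, 1)
print(rgb.shape)                     # [4, 3, 32, 32]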

'numpy.int64' object is not callable, again

<ipython-input-86-db69f39ec30d> in convert_video(model, input_source, input_resize, downsample_ratio, output_type, output_composition, output_alpha, output_foreground, output_video_mbps, seq_chunk, num_workers, progress, device, dtype)
    157                     writer_com.write(com[0])
    158
--> 159                 bar.update(src.size(1))
    160
    161     finally:
TypeError: 'numpy.int64' object is not callable

size was used again; change it to shape:

# bar.update(src.size(1))  # torch
bar.update(src.shape[1])

Version history

Version 1.0, 2022-04-21

Closing words

With PaddlePaddle, usher in a new era! Let's row our twin paddles and ride the wind and waves of the AI ocean!

PaddlePaddle official site: https://www.paddlepaddle.org.cn

My skill is limited and shortcomings are inevitable; corrections and help are very welcome.

Author: Duan Chunhua (段春华), known online as skywalk or 天马行空; AI architect at Jining Jikuai Software Technology Co., Ltd. and a Baidu PaddlePaddle PPDE.

I hold the top (至尊) rank on AI Studio with 11 badges; come follow me: https://aistudio.baidu.com/aistudio/personalcenter/thirdview/141218
