Neural Network Inference Acceleration: Principle and Experiments of Merging Convolution and BN Layers

1. Why merge the BN layer

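When training deep network models, a BN (Batch Normalization) layer speeds up convergence and helps control overfitting; it is usually placed right after a convolution layer. By normalizing the data, the BN layer effectively mitigates vanishing and exploding gradients. Although BN is beneficial during training, at inference time it adds extra per-layer computation to the forward pass, which slows the model down and takes up more memory or GPU memory. Many state-of-the-art network models (ResNet, MobileNet, Xception, ShuffleNet, etc.) use BN, so it is worthwhile to merge the BN parameters into the convolution layer to speed up forward inference.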

2. The mathematics of merging the BN layer into the convolution layer

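In the convolution layer

Convolution weights: $W$; convolution bias: $B$

Convolution operation (with input $X$):

$$Y = W * X + B$$

In the BN layer

Mean: $\mu$; variance: $\sigma^2$; scale factor: $\gamma$; shift: $\beta$; a small constant (prevents division by zero): $\varepsilon$

BN operation at inference time:

$$\hat{Y} = \gamma \cdot \frac{Y - \mu}{\sqrt{\sigma^2 + \varepsilon}} + \beta$$

After merging the BN layer into the convolution layer:

$$\alpha = \frac{\gamma}{\sqrt{\sigma^2 + \varepsilon}}, \qquad W_{merged} = \alpha \cdot W, \qquad B_{merged} = \alpha \cdot (B - \mu) + \beta$$

so that $\hat{Y} = W_{merged} * X + B_{merged}$, with $\alpha$ applied per output channel.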
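
The merge can be checked numerically. The following is a minimal NumPy sketch, not the author's code: it assumes a 1x1 convolution (so the convolution reduces to a per-pixel matrix product), random parameters, and the usual inference-time BN with mean, variance, scale factor, shift and a small constant `eps`. The function names `conv1x1`, `batch_norm` and `fold_bn` are made up for illustration.

```python
import numpy as np


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in), bias: (C_out,)."""
    return np.einsum('oc,chw->ohw', weight, x) + bias[:, None, None]


def batch_norm(y, mean, var, gamma, beta, eps=1e-5):
    """Inference-time BN, applied independently to each output channel."""
    scale = gamma / np.sqrt(var + eps)
    return (y - mean[:, None, None]) * scale[:, None, None] + beta[:, None, None]


def fold_bn(weight, bias, mean, var, gamma, beta, eps=1e-5):
    """Fold the BN parameters into the convolution weight and bias."""
    alpha = gamma / np.sqrt(var + eps)          # one factor per output channel
    merged_weight = weight * alpha[:, None]     # scale each output filter
    merged_bias = (bias - mean) * alpha + beta
    return merged_weight, merged_bias


rng = np.random.default_rng(0)
c_in, c_out = 3, 4
x = rng.standard_normal((c_in, 5, 5))
weight = rng.standard_normal((c_out, c_in))
bias = rng.standard_normal(c_out)
mean = rng.standard_normal(c_out)
var = rng.uniform(0.5, 2.0, c_out)
gamma = rng.standard_normal(c_out)
beta = rng.standard_normal(c_out)

# Convolution followed by BN (two layers) ...
reference = batch_norm(conv1x1(x, weight, bias), mean, var, gamma, beta)
# ... versus a single convolution with the merged parameters.
merged_weight, merged_bias = fold_bn(weight, bias, mean, var, gamma, beta)
fused = conv1x1(x, merged_weight, merged_bias)

max_abs_diff = np.abs(reference - fused).max()
print('max abs difference:', max_abs_diff)
```

The two outputs should agree up to floating-point rounding, which is why merging leaves the classification accuracy unchanged. For a general k x k convolution the same per-output-channel scaling applies to every element of each filter.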

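3. Experimental results

Hardware: GTX 1080Ti GPU, i7 CPU

This experiment compares the performance of a ResNet50 model before and after merging the BN layers. Classification accuracy is unchanged, while the forward pass becomes about 9% faster on the CPU and about 51% faster on the GPU.

Model	CPU forward time	GPU forward time
ResNet50 (before merging)	176.17 ms	11.03 ms
ResNet50 (after merging)	161.69 ms	7.3 ms
Speedup (time before / time after - 1)	9%	51%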
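
An improvement percentage can be computed from the timings in two common ways: as a speedup (how much more work per unit time) or as the fraction of time saved. The short calculation below uses only the four timings from the table above; the function names are illustrative.

```python
def speedup_percent(before_ms, after_ms):
    """Throughput gain: how much faster the merged model runs."""
    return (before_ms / after_ms - 1.0) * 100.0


def time_saved_percent(before_ms, after_ms):
    """Latency reduction: share of the original forward time that is saved."""
    return (1.0 - after_ms / before_ms) * 100.0


# Forward times in milliseconds, (before merging, after merging).
timings = {'CPU': (176.17, 161.69), 'GPU': (11.03, 7.3)}

for device, (before, after) in timings.items():
    print('{}: speedup {:.1f}%, time saved {:.1f}%'.format(
        device,
        speedup_percent(before, after),
        time_saved_percent(before, after)))
```

The GPU figure of 51% corresponds to the speedup definition; measured as time saved, the same timings give roughly one third.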

4. Python script for merging

The script requires Caffe's Python interface (pycaffe).

#!/usr/bin/env python
# -*- coding: UTF-8 -*-
import numpy as np
import os.path as osp
import google.protobuf as pb
import google.protobuf.text_format
from argparse import ArgumentParser
import caffe

caffe.set_mode_cpu()


def load_and_fill_biases(src_model, src_weights, dst_model, dst_weights):
    """Write a copy of the prototxt in which every Convolution layer has a
    bias term, then load the source weights into that network."""
    with open(src_model) as f:
        model = caffe.proto.caffe_pb2.NetParameter()
        pb.text_format.Merge(f.read(), model)

    for i, layer in enumerate(model.layer):
        if layer.type == 'Convolution':  # or layer.type == 'Scale':
            # Add bias layer if needed
            if layer.convolution_param.bias_term == False:
                layer.convolution_param.bias_term = True
                layer.convolution_param.bias_filler.type = 'constant'
                layer.convolution_param.bias_filler.value = 0.0

    with open(dst_model, 'w') as f:
        f.write(pb.text_format.MessageToString(model))

    caffe.set_mode_cpu()
    net_src = caffe.Net(src_model, src_weights, caffe.TEST)
    net_dst = caffe.Net(dst_model, caffe.TEST)
    for key in net_src.params.keys():
        for i in range(len(net_src.params[key])):
            net_dst.params[key][i].data[:] = net_src.params[key][i].data[:]

    if dst_weights is not None:
        # Store params
        pass

    return net_dst


def merge_conv_and_bn(net, i_conv, i_bn, i_scale):
    # This is based on Kyeheyon's work
    assert i_conv is not None
    assert i_bn is not None

    def copy_double(data):
        return np.array(data, copy=True, dtype=np.double)

    key_conv = net._layer_names[i_conv]
    key_bn = net._layer_names[i_bn]
    key_scale = net._layer_names[i_scale] if i_scale else None

    # Copy
    bn_mean = copy_double(net.params[key_bn][0].data)
    bn_variance = copy_double(net.params[key_bn][1].data)
    num_bn_samples = copy_double(net.params[key_bn][2].data)

    # and Invalidate the BN layer
    net.params[key_bn][0].data[:] = 0
    net.params[key_bn][1].data[:] = 1
    net.params[key_bn][2].data[:] = 1

    if num_bn_samples[0] == 0:
        num_bn_samples[0] = 1

    if key_scale in net.params:
        print('Combine {:s} + {:s} + {:s}'.format(key_conv, key_bn, key_scale))
        scale_weight = copy_double(net.params[key_scale][0].data)
        scale_bias = copy_double(net.params[key_scale][1].data)
        net.params[key_scale][0].data[:] = 1
        net.params[key_scale][1].data[:] = 0
    else:
        print('Combine {:s} + {:s}'.format(key_conv, key_bn))
        scale_weight = 1
        scale_bias = 0

    weight = copy_double(net.params[key_conv][0].data)
    bias = copy_double(net.params[key_conv][1].data)
    # Per-output-channel factor; 1e-5 is the epsilon used for the BN layer
    alpha = scale_weight / np.sqrt(bn_variance / num_bn_samples[0] + 1e-5)
    net.params[key_conv][1].data[:] = bias * alpha + (scale_bias - (bn_mean / num_bn_samples[0]) * alpha)
    for i in range(len(alpha)):
        net.params[key_conv][0].data[i] = weight[i] * alpha[i]


def merge_batchnorms_in_net(net):
    # for each BN
    for i, layer in enumerate(net.layers):
        if layer.type != 'BatchNorm':
            continue

        l_name = net._layer_names[i]
        l_bottom = net.bottom_names[l_name]
        assert len(l_bottom) == 1
        l_bottom = l_bottom[0]
        l_top = net.top_names[l_name]
        assert len(l_top) == 1
        l_top = l_top[0]

        can_be_absorbed = True
        conv_ind = None

        # Search all (bottom) layers
        for j in range(i - 1, -1, -1):
            tops_of_j = net.top_names[net._layer_names[j]]
            if l_bottom in tops_of_j:
                if net.layers[j].type not in ['Convolution', 'InnerProduct']:
                    can_be_absorbed = False
                else:
                    # There must be only one layer
                    conv_ind = j
                    break

        # Skip BN layers that have no Convolution/InnerProduct producer
        if not can_be_absorbed or conv_ind is None:
            continue

        # find the following Scale
        scale_ind = None
        for j in range(i + 1, len(net.layers)):
            bottoms_of_j = net.bottom_names[net._layer_names[j]]
            if l_top in bottoms_of_j:
                if scale_ind:
                    # Followed by two or more layers
                    scale_ind = None
                    break

                if net.layers[j].type in ['Scale']:
                    scale_ind = j

                    top_of_j = net.top_names[net._layer_names[j]][0]
                    if top_of_j == bottoms_of_j[0]:
                        # On-the-fly => Can be merged
                        break

                else:
                    # Followed by a layer which is not 'Scale'
                    scale_ind = None
                    break

        merge_conv_and_bn(net, conv_ind, i, scale_ind)

    return net


def process_model(net, src_model, dst_model, func_loop, func_finally):
    with open(src_model) as f:
        model = caffe.proto.caffe_pb2.NetParameter()
        pb.text_format.Merge(f.read(), model)

    # Explicit loops instead of map(): map() is lazy in Python 3
    for i, layer in enumerate(model.layer):
        for func in func_loop:
            func(layer, net, model, i)

    for func in func_finally:
        func(net, model)

    with open(dst_model, 'w') as f:
        f.write(pb.text_format.MessageToString(model))


# Functions to remove (redundant) BN and Scale layers
to_delete_empty = []


def pick_empty_layers(layer, net, model, i):
    if layer.type not in ['BatchNorm', 'Scale']:
        return

    bottom = layer.bottom[0]
    top = layer.top[0]

    if bottom != top:
        # Not supported yet
        return

    if layer.type == 'BatchNorm':
        zero_mean = np.all(net.params[layer.name][0].data == 0)
        one_var = np.all(net.params[layer.name][1].data == 1)

        if zero_mean and one_var:
            print('Delete layer: {}'.format(layer.name))
            to_delete_empty.append(layer)

    if layer.type == 'Scale':
        no_scaling = np.all(net.params[layer.name][0].data == 1)
        zero_bias = np.all(net.params[layer.name][1].data == 0)

        if no_scaling and zero_bias:
            print('Delete layer: {}'.format(layer.name))
            to_delete_empty.append(layer)


def remove_empty_layers(net, model):
    for layer in to_delete_empty:
        model.layer.remove(layer)


# A function to add 'engine: CAFFE' param into 1x1 convolutions
def set_engine_caffe(layer, net, model, i):
    if layer.type == 'Convolution':
        if layer.convolution_param.kernel_size == 1 \
                or (layer.convolution_param.kernel_h == layer.convolution_param.kernel_w == 1):
            layer.convolution_param.engine = dict(layer.convolution_param.Engine.items())['CAFFE']


def main():
    # Set default output file names
    if args.output_model is None:
        file_name = osp.splitext(args.model)[0]
        args.output_model = file_name + '_inference.prototxt'
    if args.output_weights is None:
        file_name = osp.splitext(args.weights)[0]
        args.output_weights = file_name + '_inference.caffemodel'

    net = load_and_fill_biases(args.model, args.weights, args.model + '.temp.pt', None)
    net = merge_batchnorms_in_net(net)

    process_model(net, args.model + '.temp.pt', args.output_model,
                  [pick_empty_layers, set_engine_caffe],
                  [remove_empty_layers])

    # Store params
    net.save(args.output_weights)


if __name__ == '__main__':
    parser = ArgumentParser(description="Generate Batch Normalized model for inference")
    parser.add_argument('--model', default="MobileNetSSD_deploy.prototxt", help="The net definition prototxt")
    parser.add_argument('--weights', default="MobileNetSSD_deploy.caffemodel", help="The weights caffemodel")
    parser.add_argument('--output_model')
    parser.add_argument('--output_weights')
    args = parser.parse_args()
    main()
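
The arithmetic inside `merge_conv_and_bn` can be tried without a Caffe installation. The sketch below is a stand-alone NumPy restatement under the same conventions the script relies on: the BatchNorm layer holds three blobs (accumulated mean, accumulated variance, and a scale factor that both must be divided by), and the optional Scale layer holds a per-channel weight and bias. The function name `fold_caffe_bn` and the toy numbers are hypothetical, and `eps` is set to 0 in the example only to keep the values round.

```python
import numpy as np


def fold_caffe_bn(weight, bias, bn_mean_blob, bn_var_blob, bn_factor_blob,
                  scale_weight=1.0, scale_bias=0.0, eps=1e-5):
    """Fold BatchNorm (+ optional Scale) parameters into conv weight and bias.

    weight has the output channel as its first axis, e.g. (C_out, C_in, kH, kW).
    """
    weight = np.asarray(weight, dtype=np.double)
    bias = np.asarray(bias, dtype=np.double)

    # The third BatchNorm blob normalizes the accumulated statistics.
    factor = float(np.asarray(bn_factor_blob).ravel()[0])
    if factor == 0:
        factor = 1.0
    mean = np.asarray(bn_mean_blob, dtype=np.double) / factor
    var = np.asarray(bn_var_blob, dtype=np.double) / factor

    alpha = np.asarray(scale_weight, dtype=np.double) / np.sqrt(var + eps)
    new_bias = bias * alpha + (np.asarray(scale_bias, dtype=np.double) - mean * alpha)
    # Broadcast alpha over all non-output axes of the weight tensor.
    new_weight = weight * alpha.reshape((-1,) + (1,) * (weight.ndim - 1))
    return new_weight, new_bias


# Toy example: two output channels, 1x1 kernels with a single input channel.
weight = np.array([2.0, 3.0]).reshape(2, 1, 1, 1)
bias = [1.0, -1.0]
new_w, new_b = fold_caffe_bn(
    weight, bias,
    bn_mean_blob=[4.0, 8.0], bn_var_blob=[8.0, 2.0], bn_factor_blob=[2.0],
    scale_weight=[3.0, 0.5], scale_bias=[0.5, 0.25], eps=0.0)
print(new_w.ravel(), new_b)
```

After the fold, the script resets the BatchNorm blobs to mean 0 and variance 1 and the Scale blobs to weight 1 and bias 0, which is what lets `pick_empty_layers` recognize those layers as removable.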

Script download:

https://download.csdn.net/download/kangdi7547/10578152

Reference blog post: http://keep.01ue.com/?pi=943537&_a=app&_c=index&_m=p
