Model Pruning--Rethinking the Value of Network Pruning

Rethinking the Value of Network Pruning
https://github.com/Eric-mingjie/rethinking-network-pruning

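Today's deep learning models generally come with a heavy computational cost. Reducing that cost while preserving as much of the network's performance as possible is an important research topic.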

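The standard three-stage pruning pipeline is: 1) train a large, over-parameterized network to obtain the best possible performance, which serves as the baseline; 2) prune the large model according to some criterion; 3) fine-tune the pruned model on the dataset.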

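Two common beliefs underlie this pipeline:
1) Training a large, over-parameterized network first is generally considered important: pruning starting from the strong large model is believed to work better than training a small model from scratch.
2) Both the pruned architecture and its inherited weights are considered important, which is why most current methods fine-tune the pruned model instead of retraining it. The preserved weights after pruning are usually considered to be critical.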

Through extensive experiments, the paper reaches two rather surprising conclusions:
1) For pruning algorithms with predefined target network architectures (Figure 2), directly training the small target model from random initialization can achieve the same, if not better, performance as the model obtained from the three-stage pipeline. In this case, starting with a large model is not necessary and one could instead directly train the target model from scratch.

2) For pruning algorithms without a predefined target network, training the pruned model from scratch can also achieve comparable or even better performance than fine-tuning. This observation shows that for these pruning algorithms, what matters is the obtained architecture, instead of the preserved weights.

Pruning may therefore essentially be a search for an efficient network architecture: our results suggest that the value of automatic pruning algorithms may lie in identifying efficient structures and performing implicit architecture search, rather than selecting “important” weights.

Target architectures fall into two kinds: predefined and non-predefined (automatically discovered).

Predefined target architectures. For example, prune 50% of the channels in each layer of VGG. No matter which specific channels are removed, the resulting architecture is the same, because the pruning algorithm always removes the least important 50% of the channels in every layer. The ratio in each layer is usually selected through empirical studies or heuristics.
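A minimal sketch of why a predefined target architecture does not depend on which channels are removed. The VGG-16 width list `VGG16_CFG` and the helper names are hypothetical, chosen for illustration; the `scores` passed to `channels_to_keep` stand in for whatever importance criterion a pruning method uses.

```python
# Hypothetical VGG-16 conv widths ("M" marks a max-pooling layer), used only
# for illustration.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def pruned_cfg(cfg, ratio):
    """Widths left after removing `ratio` of the channels in every conv layer.

    Only the ratio enters the computation, so the resulting architecture is
    the same no matter which individual channels are removed.
    """
    return [layer if layer == "M" else layer - int(layer * ratio) for layer in cfg]


def channels_to_keep(scores, ratio):
    """Indices of the channels kept after dropping the lowest-scoring `ratio`."""
    n_keep = len(scores) - int(len(scores) * ratio)
    by_score = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(by_score[:n_keep])


print(pruned_cfg(VGG16_CFG, 0.5))
# Two different importance scores select different channels, but the same count:
print(channels_to_keep([0.9, 0.1, 0.5, 0.3], 0.5))  # [0, 2]
print(channels_to_keep([0.1, 0.9, 0.3, 0.5], 0.5))  # [1, 3]
```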

A network model can be characterized by the following metrics: model size, memory footprint, the number of computation operations (FLOPs) and power usage.
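Two of these metrics can be computed directly for a convolution layer. A minimal sketch, assuming a standard k×k convolution and counting FLOPs as multiply-accumulate operations (conventions differ; some count each multiply-add as two FLOPs). The function names are illustrative.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Number of weights in a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def conv_macs(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate operations for one input image (output size h_out x w_out)."""
    return k * k * c_in * c_out * h_out * w_out


full = conv_params(64, 64, 3)        # 36864
halved = conv_params(32, 32, 3)      # 9216
print(full, halved, full // halved)  # 36864 9216 4
print(conv_macs(64, 64, 3, 32, 32))  # 37748736
```

Halving both the input and output channels of a layer cuts its parameters and FLOPs by 4×, which is why channel pruning reduces cost so quickly.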

The paper uses three datasets and three standard network architectures:
CIFAR-10, CIFAR-100, and ImageNet
VGG, ResNet, and DenseNet

Six network pruning methods are evaluated:
L1-norm based Channel Pruning (Li et al., 2017)
ThiNet (Luo et al., 2017)
Regression based Feature Reconstruction (He et al., 2017b)
Network Slimming (Liu et al., 2017)
Sparse Structure Selection (Huang & Wang, 2018)
Non-structured Weight Pruning (Han et al., 2015)

Training Budget. One crucial question is how long we should train the small pruned model from scratch, i.e., how many epochs (iterations) it gets.

Two settings are tried:
Scratch-E: train the small pruned model for the same number of epochs as the large model.
Scratch-B: train the small pruned model with the same amount of computation budget as the large model.
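The difference between the two budgets comes down to one multiplication. A minimal sketch, assuming the dataset is unchanged so that the cost of one epoch scales with per-image FLOPs; the 160-epoch schedule and FLOPs values are illustrative numbers, and `max_factor` is a hypothetical cap on the multiplier, not a setting taken from the paper.

```python
import math


def scratch_epochs(large_epochs, large_flops, pruned_flops, max_factor=None):
    """Return (Scratch-E, Scratch-B) epoch counts for a pruned model.

    large_flops / pruned_flops: per-image FLOPs of the large and pruned models.
    max_factor: hypothetical optional cap on the epoch multiplier.
    """
    scratch_e = large_epochs                      # same number of epochs
    factor = large_flops / pruned_flops           # e.g. 2.0 if pruning halves FLOPs
    if max_factor is not None:
        factor = min(factor, max_factor)
    scratch_b = math.ceil(large_epochs * factor)  # same total compute budget
    return scratch_e, scratch_b


print(scratch_epochs(160, 4.0e8, 2.0e8))                  # (160, 320)
print(scratch_epochs(160, 4.0e8, 2.0e8, max_factor=1.5))  # (160, 240)
```

A pruned model that needs half the FLOPs per image thus gets twice as many epochs under Scratch-B, so its total training compute matches the large model's.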

4 Experiments
4.1 Predefined target architectures

L1-norm based Channel Pruning (Li et al., 2017):
In each layer, a certain percentage of the channels whose filter weights have the smallest L1-norm is pruned.
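A minimal sketch of the L1-norm criterion in plain Python, assuming each filter is given as a flat list of its weights. Removing output channel i of one layer also removes input channel i of the next layer, which `drop_input_channels` illustrates. All helper names are hypothetical.

```python
def l1_norms(filters):
    """L1 norm of each filter, given as a flat list of its weights."""
    return [sum(abs(w) for w in f) for f in filters]


def prune_filters_l1(filters, ratio):
    """Remove `ratio` of the filters with the smallest L1 norms.

    Returns the kept indices and the kept filters.
    """
    norms = l1_norms(filters)
    n_prune = int(len(filters) * ratio)
    smallest = set(sorted(range(len(filters)), key=lambda i: norms[i])[:n_prune])
    keep = [i for i in range(len(filters)) if i not in smallest]
    return keep, [filters[i] for i in keep]


def drop_input_channels(next_filters, keep):
    """next_filters[o][c] is the kernel linking input channel c to output o."""
    return [[f[c] for c in keep] for f in next_filters]


filters = [[1.0, -1.0], [0.1, 0.2], [3.0, 0.0], [-0.5, 0.5]]  # L1: 2.0, 0.3, 3.0, 1.0
keep, kept = prune_filters_l1(filters, 0.5)
print(keep)  # [0, 2]
next_layer = [[[1.0], [2.0], [3.0], [4.0]]]  # one output, four 1x1 input kernels
print(drop_input_channels(next_layer, keep))  # [[[1.0], [3.0]]]
```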

ThiNet (Luo et al., 2017):
greedily prunes the channel that has the smallest effect on the next layer’s activation values.
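A sketch of ThiNet-style greedy channel selection, assuming each input channel's contribution to a set of sampled output values of the next layer is precomputed (`contrib[c][s]`, a hypothetical input). Because the next layer's output is the sum of all channel contributions, removing a set of channels changes each sampled output by the sum of their contributions; the greedy loop grows the removed set so that this squared change stays as small as possible. The least-squares rescaling ThiNet applies afterwards is omitted.

```python
def thinet_greedy(contrib, n_remove):
    """Greedily choose n_remove channels whose joint removal least changes the output.

    contrib[c][s]: contribution of input channel c to sampled output value s.
    Returns (kept channel indices, removed channel indices in selection order).
    """
    n_samples = len(contrib[0])
    removed = []
    removed_sum = [0.0] * n_samples  # summed contribution of removed channels
    remaining = set(range(len(contrib)))
    while len(removed) < n_remove:
        best, best_val = None, float("inf")
        for c in sorted(remaining):
            # squared change of the sampled outputs if c is removed as well
            val = sum((removed_sum[s] + contrib[c][s]) ** 2 for s in range(n_samples))
            if val < best_val:
                best, best_val = c, val
        removed.append(best)
        remaining.remove(best)
        for s in range(n_samples):
            removed_sum[s] += contrib[best][s]
    return sorted(remaining), removed


contrib = [[1.0, 2.0], [0.1, -0.1], [-0.05, 0.05], [3.0, 1.0]]
print(thinet_greedy(contrib, 2))  # ([0, 3], [2, 1])
```

Channels 1 and 2 partly cancel each other, so removing both barely changes the output; judging the removed set jointly rather than channel by channel is what captures this.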

Regression based Feature Reconstruction (He et al., 2017b):
prunes channels by minimizing the feature map reconstruction error of the next layer.
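The objective can be illustrated with a brute-force stand-in: for every candidate subset of k input channels, fit per-channel coefficients by least squares so that the kept channels reconstruct the next layer's original output, and keep the subset with the smallest error. He et al. select channels with LASSO regression rather than exhaustive search, so this shows only the objective, not their algorithm; `contrib[c][s]` (channel c's contribution to sampled output s) is an assumed precomputed input and all names are hypothetical.

```python
from itertools import combinations

RIDGE = 1e-9  # tiny ridge term keeps the normal equations non-singular


def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system a x = b."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_coefficients(contrib, target, subset):
    """Least-squares beta so that sum_c beta_c * contrib[c] approximates target."""
    a = [[sum(x * y for x, y in zip(contrib[i], contrib[j])) + (RIDGE if i == j else 0.0)
          for j in subset] for i in subset]
    b = [sum(x * t for x, t in zip(contrib[i], target)) for i in subset]
    return solve(a, b)


def reconstruction_error(contrib, target, subset, beta):
    return sum((t - sum(bc * contrib[c][s] for bc, c in zip(beta, subset))) ** 2
               for s, t in enumerate(target))


def select_channels(contrib, target, k):
    """Exhaustively pick the k channels that best reconstruct the target output."""
    best = None
    for subset in combinations(range(len(contrib)), k):
        beta = fit_coefficients(contrib, target, subset)
        err = reconstruction_error(contrib, target, subset, beta)
        if best is None or err < best[2]:
            best = (subset, beta, err)
    return best


contrib = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 0.0, 1.0], [0.5, 0.0, 0.05, 0.5]]
target = [sum(col) for col in zip(*contrib)]  # original output: all channels kept
print(select_channels(contrib, target, 2))    # subset (0, 1), beta close to [1.5, 1.0]
```

Channel 2 is mostly a scaled copy of channel 0, so it is pruned and most of its contribution is absorbed by the coefficient of channel 0 (beta close to 1.5).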

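Conclusion: once the target architecture is fixed, training the small model from scratch is at least as good as fine-tuning it. When the small model trained from scratch gets the same computation budget as the large model (Scratch-B), it generally outperforms the fine-tuned model.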

4.2 Automatically discovered target architectures

Network Slimming (Liu et al., 2017):
imposes L1-sparsity on channel-wise scaling factors from Batch Normalization layers (Ioffe & Szegedy, 2015) during training, and prunes channels with lower scaling factors afterward.
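A minimal sketch of the two parts of Network Slimming: the extra subgradient that the L1 penalty adds to the BN scaling factors (gamma) during training, and pruning with one global threshold taken as a percentile of all |gamma| values across layers. The function names, the penalty strength and the toy gamma values are hypothetical.

```python
def l1_subgradient(gammas, lam):
    """Extra gradient from the penalty lam * sum(|gamma|) on BN scaling factors."""
    return [lam * ((g > 0) - (g < 0)) for g in gammas]


def slimming_prune(gammas_per_layer, prune_ratio):
    """Keep channels whose |gamma| reaches a single global percentile threshold.

    Returns, for each layer, the indices of the surviving channels.
    """
    all_abs = sorted(abs(g) for layer in gammas_per_layer for g in layer)
    k = int(len(all_abs) * prune_ratio)
    threshold = all_abs[k] if k < len(all_abs) else float("inf")
    return [[i for i, g in enumerate(layer) if abs(g) >= threshold]
            for layer in gammas_per_layer]


print(l1_subgradient([0.5, -0.2, 0.0], 1e-4))  # [0.0001, -0.0001, 0.0]
layers = [[0.9, 0.01, 0.5], [0.02, 0.03, 0.8]]
print(slimming_prune(layers, 0.5))              # [[0, 2], [2]]
```

Because the threshold is global, layers end up with different widths (here 2 and 1 channels), so the target architecture is discovered automatically rather than predefined. A real implementation would also guard against a layer losing all of its channels.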

Sparse Structure Selection (Huang & Wang, 2018):
uses sparsified scaling factors to prune structures, and can be seen as a generalization of Network Slimming. Besides channels, pruning can also be applied to residual blocks in ResNet or groups in ResNeXt (Xie et al., 2017).
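A sketch of block-level pruning with sparse scaling factors, assuming each residual block computes x + s_i * F_i(x) with one learnable factor s_i. A proximal-gradient step (soft thresholding) drives small factors exactly to zero, and a block whose factor is zero reduces to the identity and can be removed. This is a plain, non-accelerated proximal step; the original method uses an accelerated proximal gradient variant. Names and values are hypothetical.

```python
def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink x toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_step(scales, grads, lr, penalty):
    """One plain proximal-gradient step on the block scaling factors."""
    return [soft_threshold(s - lr * g, lr * penalty) for s, g in zip(scales, grads)]


def surviving_blocks(scales):
    """Blocks whose residual branch still has a non-zero scaling factor."""
    return [i for i, s in enumerate(scales) if s != 0.0]


scales = [1.0, 0.05, -0.5, 0.02]  # one factor per residual block
grads = [0.0, 0.0, 0.0, 0.0]      # data-loss gradients (zero for brevity)
scales = proximal_step(scales, grads, lr=0.1, penalty=1.0)
print(scales)                     # approximately [0.9, 0.0, -0.4, 0.0]
print(surviving_blocks(scales))   # [0, 2]
```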

Non-structured Weight Pruning (Han et al., 2015):
prunes individual weights that have small magnitudes. This pruning granularity leaves the weight matrices sparse, hence it is commonly referred to as non-structured weight pruning.
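A minimal sketch of magnitude-based weight pruning on one flattened weight tensor, assuming a simple "remove the smallest X% by |w|" rule; Han et al. choose their thresholds differently, so treat the rule and the helper names as illustrative.

```python
def magnitude_mask(weights, sparsity):
    """0/1 mask that removes the `sparsity` fraction of smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(by_magnitude[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Zero out pruned weights; the tensor keeps its original shape."""
    return [w * m for w, m in zip(weights, mask)]


w = [0.5, -0.01, 0.2, -0.9, 0.03, 0.0]
mask = magnitude_mask(w, 0.5)
print(mask)  # [1, 0, 1, 1, 0, 0]
print(apply_mask(w, mask))
```

Unlike channel pruning, the layer keeps its shape and only becomes sparse, so the zeros save memory or compute only with sparse storage formats or kernels. During retraining, the mask is kept fixed so that pruned weights stay at zero.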

4.3 Transfer Learning to object detection

We evaluate the L1-norm based pruning method on the PASCAL VOC object detection task, using Faster R-CNN.

Prune-C refers to pruning on classification pre-trained weights.
Prune-D refers to pruning after the weights are transferred to the detection task.

5 Network pruning as architecture search

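Different layers are pruned by different amounts: the per-layer pruning ratios in the discovered architectures are not uniform.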
