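Contents

Preface
Model pruning
- MNIST
  - Regular training
    - Setup
    - Training the baseline model
    - Evaluating the model
  - Pruning
    - Defining the model
    - Training the model
    - Evaluating the model
  - Pruning your own model
  - API
    - prune_low_magnitude
    - strip_pruning
  - PolynomialDecay
  - In practice
Model quantization
- Quantization-aware training
  - MNIST
    - Training the model
    - Quantizing the model
- Post-training quantization
References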

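Preface

A while ago I came across a question on Zhihu: "How are trained deep learning models deployed?" One highly upvoted answer said, roughly, that the deployment approach depends on your requirements:

Requirement 1: A simple demo to see the results. Switch the trained model to inference mode, or wrap it with CPython so that a C++ project can call it.
Requirement 2: Run it as a service on a server, with no throughput or latency requirements. Build a C++ deployment on top of the training framework (tensorflow, pytorch, caffe, etc.). This approach is still tied to the framework, so it takes up a lot of storage space.
Requirement 3: Run it on a server with throughput and latency requirements. Use an inference framework such as TensorRT or OpenVINO.
Requirement 4: Run it on an Nvidia embedded platform such as PX2, TX2, or Xavier, where latency matters.
Requirement 5: None of the above fits, so write your own inference framework.

Given these requirements, once a model is trained it is usually impractical to deploy it by simply switching the training framework to inference mode. Setting aside the throughput and latency the system demands, a deep learning training framework such as tensorflow takes up a great deal of storage (unless cost is no object). So, to make models more efficient, the rest of this post discusses deep learning model deployment.

This post mainly covers slimming a model down through quantization and pruning. Later posts may cover other ways to shrink and speed up models, such as moving off the original training framework to inference frameworks like TensorRT and OpenVINO, as well as model acceleration and mobile deployment (Tensorflow Lite). This post is based on the Tensorflow framework, and model optimization is discussed in terms of pruning, quantization, and weight clustering.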

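Model pruning

MNIST

MNIST is the "Hello world" of deep learning, so it serves as the example here for how to prune a model. As an aside, pruning also matters a great deal in decision trees, where it is divided into pre-pruning and post-pruning. In the MNIST example, model pruning is similar in spirit to pre-pruning. For details, see: https://github.com/tensorflow/model-optimization/blob/master/tensorflow_model_optimization/g3doc/guide/pruning/pruning_with_keras.ipynb. Pruning a Tensorflow model requires installing the tensorflow-model-optimization package.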

Regular training

Setup

 pip install -q tensorflow-model-optimization

import tempfile
import os
import tensorflow as tf
import numpy as np
from tensorflow import keras
%load_ext tensorboard

Training the baseline model

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    epochs=4,
    validation_split=0.1,
)

Output:

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step
Epoch 1/4
1688/1688 [==============================] - 10s 6ms/step - loss: 0.2785 - accuracy: 0.9220 - val_loss: 0.1031 - val_accuracy: 0.9740
Epoch 2/4
1688/1688 [==============================] - 9s 5ms/step - loss: 0.1063 - accuracy: 0.9691 - val_loss: 0.0782 - val_accuracy: 0.9790
Epoch 3/4
1688/1688 [==============================] - 9s 5ms/step - loss: 0.0815 - accuracy: 0.9765 - val_loss: 0.0788 - val_accuracy: 0.9775
Epoch 4/4
1688/1688 [==============================] - 9s 5ms/step - loss: 0.0689 - accuracy: 0.9797 - val_loss: 0.0633 - val_accuracy: 0.9840
<tensorflow.python.keras.callbacks.History at 0x7f146fbd8bd0>

Evaluating the model

_, baseline_model_accuracy = model.evaluate(test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
tf.keras.models.save_model(model, keras_file, include_optimizer=False)
print('Saved baseline model to:', keras_file)

Baseline test accuracy: 0.9775999784469604
Saved baseline model to: /tmp/tmpjj6swf59.h5

Pruning

Defining the model

In this pruning example, the model starts at 50% sparsity and ends at 80% sparsity.
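
To make "sparsity" concrete: magnitude-based pruning zeroes out the weights with the smallest absolute values until the target fraction of weights is zero. The following is a minimal pure-Python sketch of that idea, not the tfmot implementation. The function name `prune_by_magnitude` and the sample weights are made up for illustration.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to 0.

    `sparsity` is the fraction of entries to zero out (0.0 to 1.0).
    """
    num_to_prune = int(round(len(weights) * sparsity))
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:num_to_prune])
    return [0.0 if i in pruned_indices else w for i, w in enumerate(weights)]


# Hypothetical weights, only for illustration.
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
half = prune_by_magnitude(weights, 0.5)  # 50% sparsity
most = prune_by_magnitude(weights, 0.8)  # 80% sparsity
print(half)
print(most)
```

At 80% sparsity only the two largest-magnitude weights (0.9 and -0.7) survive in this toy list. The real schedule moves between the two sparsity levels gradually while the model keeps training.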

import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute end step to finish pruning after 2 epochs.
batch_size = 128
epochs = 2
validation_split = 0.1 # 10% of training set will be used for validation set.

num_images = train_images.shape[0] * (1 - validation_split)
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

# Define model for pruning.
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.50,
                                                             final_sparsity=0.80,
                                                             begin_step=0,
                                                             end_step=end_step)
}

model_for_pruning = prune_low_magnitude(model, **pruning_params)

# `prune_low_magnitude` requires a recompile.
model_for_pruning.compile(optimizer='adam',
                          loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                          metrics=['accuracy'])

model_for_pruning.summary()

/tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py:2191: UserWarning: `layer.add_variable` is deprecated and will be removed in a future version. Please use `layer.add_weight` method instead.
  warnings.warn('`layer.add_variable` is deprecated and '
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
prune_low_magnitude_reshape  (None, 28, 28, 1)         1
_________________________________________________________________
prune_low_magnitude_conv2d ( (None, 26, 26, 12)        230
_________________________________________________________________
prune_low_magnitude_max_pool (None, 13, 13, 12)        1
_________________________________________________________________
prune_low_magnitude_flatten  (None, 2028)              1
_________________________________________________________________
prune_low_magnitude_dense (P (None, 10)                40572
=================================================================
Total params: 40,805
Trainable params: 20,410
Non-trainable params: 20,395
_________________________________________________________________
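
The parameter counts in the summary can be checked by hand. The sketch below assumes that the pruning wrapper adds, for each pruned kernel, a non-trainable mask of the same shape plus a scalar threshold, and one scalar step counter per wrapped layer. That breakdown is my assumption rather than something stated in the output, but it reproduces the printed numbers.

```python
# Trainable parameters of the original layers.
conv_kernel = 3 * 3 * 1 * 12      # 3x3 kernel, 1 input channel, 12 filters
conv_bias = 12
dense_kernel = 13 * 13 * 12 * 10  # 2028 flattened inputs, 10 outputs
dense_bias = 10
trainable = conv_kernel + conv_bias + dense_kernel + dense_bias

# Assumed bookkeeping added by the pruning wrapper (non-trainable):
# one mask entry per kernel weight, one threshold per pruned kernel,
# and one step counter per wrapped layer (5 layers in the summary).
masks = conv_kernel + dense_kernel
thresholds = 2
step_counters = 5
non_trainable = masks + thresholds + step_counters

print(trainable, non_trainable, trainable + non_trainable)
# prints: 20410 20395 40805
```

Under this reading, roughly half of the parameters in the wrapped model are pruning bookkeeping, which is why the model has to be stripped before export.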

Two callbacks are used during training:

tfmot.sparsity.keras.UpdatePruningStep is required during training, and
tfmot.sparsity.keras.PruningSummaries provides logs for tracking progress and debugging.

If UpdatePruningStep is left out, training fails with an error.

Training the model

logdir = tempfile.mkdtemp()

callbacks = [
    tfmot.sparsity.keras.UpdatePruningStep(),
    tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]

model_for_pruning.fit(train_images, train_labels,
                      batch_size=batch_size, epochs=epochs, validation_split=validation_split,
                      callbacks=callbacks)

Epoch 1/2
WARNING:tensorflow:From /tmpfs/src/tf_docs_env/lib/python3.7/site-packages/tensorflow/python/ops/array_ops.py:5049: calling gather (from tensorflow.python.ops.array_ops) with validate_indices is deprecated and will be removed in a future version.
Instructions for updating:
The `validate_indices` argument has no effect. Indices are always validated on CPU and never validated on GPU.
  3/422 [..............................] - ETA: 12s - loss: 0.0628 - accuracy: 0.9896
WARNING:tensorflow:Callback method `on_train_batch_end` is slow compared to the batch time (batch time: 0.0075s vs `on_train_batch_end` time: 0.0076s). Check your callbacks.
422/422 [==============================] - 5s 9ms/step - loss: 0.0797 - accuracy: 0.9771 - val_loss: 0.0828 - val_accuracy: 0.9790
Epoch 2/2
422/422 [==============================] - 3s 8ms/step - loss: 0.0971 - accuracy: 0.9741 - val_loss: 0.0839 - val_accuracy: 0.9775
<tensorflow.python.keras.callbacks.History at 0x7f12e4502910>

Evaluating the model

_, model_for_pruning_accuracy = model_for_pruning.evaluate(test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Pruned test accuracy:', model_for_pruning_accuracy)

Baseline test accuracy: 0.9775999784469604
Pruned test accuracy: 0.972100019454956
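
Pruning by itself leaves the weight tensors the same shape. The size benefit typically shows up once the mostly-zero weights pass through a standard compressor. The following stdlib-only sketch illustrates this with random numbers rather than the model above. The array size, the seed, and the helper name `gzipped_size` are assumptions for illustration.

```python
import gzip
import random
import struct


def gzipped_size(values):
    """Pack floats as float32 bytes and return the gzip-compressed length."""
    raw = struct.pack(f"{len(values)}f", *values)
    return len(gzip.compress(raw))


random.seed(0)
dense = [random.gauss(0.0, 1.0) for _ in range(20000)]

# Zero out the 80% of entries with the smallest magnitude.
threshold = sorted(abs(v) for v in dense)[int(len(dense) * 0.8)]
sparse = [v if abs(v) >= threshold else 0.0 for v in dense]

dense_size = gzipped_size(dense)
sparse_size = gzipped_size(sparse)
print(dense_size, sparse_size)
```

The long runs of zeros compress well, so the pruned array ends up much smaller after gzip even though it holds the same number of entries.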

Pruning your own model

For pruning your own model, you can refer to my repository: https://github.com/RyanCCC/Yolov4/blob/main/pruning.py

API

prune_low_magnitude

tfmot.sparsity.keras.prune_low_magnitude(
    to_prune,
    pruning_schedule=pruning_sched.ConstantSparsity(0.5, 0),
    block_size=(1, 1),
    block_pooling_type='AVG',
    pruning_policy=None,
    sparsity_m_by_n=None,
    **kwargs
)

Prune a model:

pruning_params = {
    'pruning_schedule': ConstantSparsity(0.5, 0),
    'block_size': (1, 1),
    'block_pooling_type': 'AVG'
}

model = prune_low_magnitude(
    keras.Sequential([
        layers.Dense(10, activation='relu', input_shape=(100,)),
        layers.Dense(2, activation='sigmoid')
    ]), **pruning_params)

Prune a layer

pruning_params = {
    'pruning_schedule': PolynomialDecay(initial_sparsity=0.2,
                                        final_sparsity=0.8,
                                        begin_step=1000,
                                        end_step=2000),
    'block_size': (2, 3),
    'block_pooling_type': 'MAX'
}

model = keras.Sequential([
    layers.Dense(10, activation='relu', input_shape=(100,)),
    prune_low_magnitude(layers.Dense(2, activation='tanh'), **pruning_params)
])

strip_pruning

tfmot.sparsity.keras.strip_pruning(model)

orig_model = tf.keras.Model(inputs, outputs)
pruned_model = prune_low_magnitude(orig_model)
exported_model = strip_pruning(pruned_model)

PolynomialDecay

tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity, final_sparsity, begin_step, end_step, power=3, frequency=100
)
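
As I understand the schedule, sparsity ramps from initial_sparsity to final_sparsity between begin_step and end_step along a polynomial curve. The pure-Python function below is a sketch of that curve under this assumption, not the library code, and it ignores the `frequency` argument. The name `polynomial_sparsity` is hypothetical. The step numbers reuse the MNIST example: 60000 images with a 10% validation split, batch size 128, 2 epochs.

```python
import math


def polynomial_sparsity(step, initial_sparsity, final_sparsity,
                        begin_step, end_step, power=3):
    """Target sparsity at `step` for a polynomial ramp (sketch, ignores frequency)."""
    # Clamp progress to [0, 1] so sparsity stays fixed outside the ramp.
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(1.0, max(0.0, progress))
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power


# 54000 training images / batch size 128, rounded up, times 2 epochs.
end_step = math.ceil(60000 * (1 - 0.1) / 128) * 2
for step in (0, end_step // 4, end_step // 2, end_step):
    print(step, round(polynomial_sparsity(step, 0.50, 0.80, 0, end_step), 4))
```

With power=3 the curve is steep at first and flattens near the end, so most of the pruning happens early, when the network still has many redundant weights to give up.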

For more of the API, see: https://www.tensorflow.org/model_optimization/api_docs/python/tfmot

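In practice

In practice, pruning a custom layer raises an error.

A custom layer may inherit from keras.layers.Layer, but that alone does not make it prunable, which is what triggers the error. The fix is to have your model or layer inherit from both Layer and PrunableLayer, so that it is an instance of PrunableLayer.

For example:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer
import tensorflow_model_optimization as tfmot

class Mish(Layer, tfmot.sparsity.keras.PrunableLayer):
    def __init__(self, **kwargs):
        super(Mish, self).__init__(**kwargs)
        self.supports_masking = True

    def call(self, inputs):
        return inputs * K.tanh(K.softplus(inputs))

    def get_config(self):
        config = super(Mish, self).get_config()
        return config

    def compute_output_shape(self, input_shape):
        return input_shape

    def get_prunable_weights(self):
        # Mish has no weights of its own, so this returns an empty list.
        return self.weights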

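Model quantization

Model quantization comes in two forms: post-training quantization and quantization-aware training. Post-training quantization is easier to use, but the resulting model usually does not perform as well as one produced by quantization-aware training.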
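
Both forms rest on the same basic mapping from floats to low-precision integers. The following is a minimal pure-Python sketch of affine int8 quantization, assuming a scale and zero point derived from the min and max of the values. The helper names and sample weights are hypothetical, and real toolchains differ in details such as per-channel scales.

```python
def quantize_params(x_min, x_max, q_min=-128, q_max=127):
    """Scale and zero point mapping [x_min, x_max] onto the int8 range."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = int(round(q_min - x_min / scale))
    return scale, zero_point


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Round each float to the nearest integer level and clamp to int8."""
    return [min(q_max, max(q_min, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integer levels back to approximate float values."""
    return [(q - zero_point) * scale for q in q_values]


# Hypothetical weights, only for illustration.
weights = [-1.2, -0.3, 0.0, 0.45, 2.0]
scale, zero_point = quantize_params(min(weights), max(weights))
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
print(q)
print([round(v, 3) for v in restored])
```

Each restored value differs from the original by at most half a quantization step. Quantization-aware training exposes the model to this rounding error during training, which is one way to see why it tends to hold accuracy better.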

Quantization-aware training

MNIST

Import the required modules:

import tempfile
import os

import tensorflow as tf

from tensorflow import keras

Training the model

# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input image so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(28, 28)),
    keras.layers.Reshape(target_shape=(28, 28, 1)),
    keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
    keras.layers.MaxPooling2D(pool_size=(2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
    train_images,
    train_labels,
    epochs=1,
    validation_split=0.1,
)

Quantizing the model

import tensorflow_model_optimization as tfmot

quantize_model = tfmot.quantization.keras.quantize_model

# q_aware stands for quantization aware.
q_aware_model = quantize_model(model)

# `quantize_model` requires a recompile.
q_aware_model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])

q_aware_model.summary()

train_images_subset = train_images[0:1000] # out of 60000
train_labels_subset = train_labels[0:1000]

q_aware_model.fit(train_images_subset, train_labels_subset,
                  batch_size=500, epochs=1, validation_split=0.1)

_, baseline_model_accuracy = model.evaluate(test_images, test_labels, verbose=0)

_, q_aware_model_accuracy = q_aware_model.evaluate(test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)
print('Quant test accuracy:', q_aware_model_accuracy)

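For a more detailed walkthrough, see the comprehensive guide to quantization aware training.

Post-training quantization

This part is aimed mainly at mobile deployment. For details, see: quantization/post_training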

import tensorflow as tf

def representative_dataset_gen():
    for _ in range(num_calibration_steps):
        # Get sample input data as a numpy array in a method of your choosing.
        yield [input]

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
tflite_quant_model = converter.convert()
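
The representative dataset is what lets the converter calibrate value ranges before choosing integer scales. As a rough, library-free sketch of that idea (not what the TFLite converter literally does), one can track the running min and max over the calibration batches. `fake_batches` and `calibrate_range` are hypothetical names, and the random data stands in for real input samples.

```python
import random


def fake_batches(num_calibration_steps, batch_size=4, seed=0):
    """Stand-in for a representative dataset: yields lists of floats."""
    rng = random.Random(seed)
    for _ in range(num_calibration_steps):
        yield [rng.uniform(0.0, 1.0) for _ in range(batch_size)]


def calibrate_range(batches):
    """Track the smallest and largest value seen across all batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


lo, hi = calibrate_range(fake_batches(num_calibration_steps=100))
scale = (hi - lo) / 255  # step size if the range is spread over 256 int8 levels
print(lo, hi, scale)
```

The observed range determines the quantization step, so a calibration set that does not resemble real inputs yields a poor range and larger rounding error at inference time.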

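References

EfficientDL Book (recommended). Related article: "How do Google and Meta slim down large models? A Google engineer shares deployment tips (free book)"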
