1. 背景

Dynamic shapes指的是我们可以在runtime（推理）阶段来指定some或者all输入数据的维度，同时，提供C++和Python两种接口。一般需要指定为dynamic的是batch_size这一个维度，使得我们可以根据自己实际情况动态设置batch，而不需要每次都重新生成engine文件。

2. 总体流程

如何生成及使用支持dynamic shapes的engine的大致步骤如下：

1. 使用最新的接口创建NetworkDefinition对象

截止到tensorrt-7.2.2，builder创建NetworkDefinition提供两个接口：createNetwork()和createNetworkV2()，这两者之间区别有两点：1. 后者处理的维度从原来的（C,H,W)变成了（B,C,H,W)即包括了batch_size; 2. 是否支持dynamic shapes.

2. 对于input tensor中dynamic的维度，通过-1来占位这个维度

3. 在build阶段设置一个或多个optimization profiles，用来指定在runtime阶段inputs允许的维度范围，一般设置三个profiles，最小，最大和最优。

通过上述设置，在build阶段就可以生成一个带dynamic shapes的engine文件，然后就是在推理阶段如何使用这个engine。

4. 如何使用带dynamic shapes的engine：

a. 创建一个execution context，此时的context也是dynamic shapes的；

b. 将想要设置的input dimension绑定context，那么此时context处理的维度确定了；

c. 基于这个context进行推理。

3. 代码实现

3.1 创建NetworkDefinition

3.2 设置输入维度

对于3维的input tensor，其name为“foo"的设置流程如下

3.3 设置Optimization Profiles

通过上述设置完成后，即可编译得到engine文件。

3.4 context对象设置

基于engine对象生成一个execution context对象，此时利用get_binding_shape函数得到的结果如下（此时engine和context获取得到的维度相同）：

然后通过如下方式将输入tensor的维度绑定到该context上

然后再通过get_binding_shape函数获取维度，得到的结果如下：

此时，输入输出的维度固定，分配输入输出所需的内存和显存，调用enqueueV2()函数进行推理。

4. resnet18部署

重要代码讲解

1. 生成engine

注意点包括如下几点：

使用最新的接口创建NetworkDefinition
创建config对象来设置workspace，mixed precision
如果onnx模型中包含dynamic shape，通过传入dynamic_shapes参数设置对应的op

def build_engine(onnx_file_path, trt_logger, trt_engine_datatype=trt.DataType.FLOAT, batch_size=1, silent=False, dynamic_shapes={}):"""Takes an ONNX file and creates a TensorRT engine to run inference with"""EXPLICIT_BATCH = [] if trt.__version__[0] < '7' else [1 << (int)(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)]with trt.Builder(trt_logger) as builder, builder.create_network(*EXPLICIT_BATCH) as network, trt.OnnxParser(network,trt_logger) as parser:builder.max_batch_size = batch_size                                                                                         config = builder.create_builder_config()                                                                                        config.max_workspace_size = 1 << 30        # work spaceif trt_engine_datatype == trt.DataType.HALF:        # float 16config.set_flag(trt.BuilderFlag.FP16)#  Parse model fileif not os.path.exists(onnx_file_path):print('ONNX file {} not found, please run yolov3_to_onnx.py first to generate it.'.format(onnx_file_path))exit(0)print('Loading ONNX file from path {}...'.format(onnx_file_path))with open(onnx_file_path, 'rb') as model:print('Beginning ONNX file parsing')if not parser.parse(model.read()):print('ERROR: Failed to parse the ONNX file.')for error in range(parser.num_errors):print(parser.get_error(error))return Noneprint('Completed parsing of ONNX file')if not silent:print('Building an engine from file {}; this may take a while...'.format(onnx_file_path))# dynamic batch_size if len(dynamic_shapes) > 0:print("===> using dynamic shapes!")profile = builder.create_optimization_profile()for binding_name, dynamic_shape in dynamic_shapes.items():min_shape, opt_shape, max_shape = dynamic_shapeprofile.set_shape(binding_name, min_shape, opt_shape, max_shape)config.add_optimization_profile(profile)return builder.build_engine(network, config)

2. 绑定维度

需要在执行execute函数之前绑定维度，从而确定输入输出维度。

dynamic_batch = 6
input_data = np.ones((dynamic_batch, 3, 224, 224), dtype=np.float32)
# At runtime you need to set an optimization profile before setting input dimensions.
# trt_inference_wrapper.context.active_optimization_profile = 0
# specifying runtime dimensions
trt_inference_wrapper.context.set_binding_shape(0, input_data.shape)

3. 分配内存和显存

def allocate_buffers(engine, context):"""Allocates host and device buffer for TRT engine inference.This function is similair to the one in ../../common.py, butconverts network outputs (which are np.float32) appropriatelybefore writing them to Python buffer. This is needed, sinceTensorRT plugins doesn't support output type description, andin our particular case, we use NMS plugin as network output.Args:engine (trt.ICudaEngine): TensorRT engineReturns:inputs [HostDeviceMem]: engine input memoryoutputs [HostDeviceMem]: engine output memorybindings [int]: buffer to device bindingsstream (cuda.Stream): cuda stream for engine inference synchronization"""inputs = []outputs = []bindings = []stream = cuda.Stream()# Current NMS implementation in TRT only supports DataType.FLOAT but# it may change in the future, which could brake this sample here# when using lower precision [e.g. NMS output would not be np.float32# anymore, even though this is assumed in binding_to_type]for binding in range(engine.num_bindings):size = trt.volume(context.get_binding_shape(binding))# size = trt.volume(engine.get_binding_shape(binding))dtype = trt.nptype(engine.get_binding_dtype(binding))# Allocate host and device buffershost_mem = cuda.pagelocked_empty(size, dtype)device_mem = cuda.mem_alloc(host_mem.nbytes)# Append the device buffer to device bindings.bindings.append(int(device_mem))# Append to the appropriate list.if engine.binding_is_input(binding):inputs.append(HostDeviceMem(host_mem, device_mem))else:outputs.append(HostDeviceMem(host_mem, device_mem))return inputs, outputs, bindings, stream

完整代码分享在github，支持dynamic和static shapes。

egbertYeah/simple_tensorrt_dynamic: a simple example to learn tensorrt with dynamic shapes (github.com)

参考链接

tensorrt动态输入（Dynamic shapes） - 知乎 (zhihu.com)

Developer Guide :: NVIDIA Deep Learning TensorRT Documentation

TensorRT7 Onnx模型动态多batch问题解决_weixin_45415546的博客-CSDN博客_tensorrt动态batchsize

【tensorrt之dynamic shapes】相关推荐

安全实践：webshell检测
2020/03/07 - 引言最近在看webshell检测的相关文章,最初的搜索的关键词是webshell +cnn,通过找了一些论文来看看具体的检测效果,这里看了几篇文章感觉不错. 基于机器学习的 ...
Technology Document Guide of TensorRT
Technology Document Guide of TensorRT Abstract 本示例支持指南概述了GitHub和产品包中包含的所有受支持的TensorRT 7.2.1示例.Tensor ...
Python API vs C++ API of TensorRT
Python API vs C++ API of TensorRT 本质上,C++ API和Python API应该在支持您的需求方面接近相同.pythonapi的主要优点是数据预处理和后处理都很容易 ...
tensorrt动态输入分辨率尺寸
本文只有 tensorrt python部分涉动态分辨率设置,没有c++的. 目录 pytorch转onnx: onnx转tensorrt: python tensorrt推理: 知乎博客也可以参考: ...
onnx格式转tensorRT
自己写的onnx转trt的代码. 此处记录几点: 模型的输入须为numpy格式,所以直接从DataLoader取出的数据是不能直接扔进模型的模型的输入是多个的时候,如输入多张图片时,可以通过下面这种 ...
TensorRT 命令行程序trtexec常用用法
安装TensorRT后,进入到/usr/src/tensorrt/bin目录下,可以看到有个trtexec二进制可执行文件,执行 ./trtexec --help可以看到命令行支持的所有参数项: == ...
使用TensorRT对AlphaPose模型进行加速
最近刚完成使用TensorRT对AlphaPose人体姿态估计网络的加速处理,在这里记录一下大概的流程,具体代码我放在这里了. 目前主要有三种方式构建TensorRT的engine模型. (1) 第一 ...
[翻译] TensorRT 中的 Explicit 与 Implicit Batch
本文翻译自 Explicit Versus Implicit Batch 原文地址: https://docs.nvidia.com/deeplearning/tensorrt/developer-g ...
【TensorRT】trtexec工具转engine
目前官方的转换工具 ONNX-TensorRT https://github.com/onnx/onnx-tensorrt trtexec的用法说明参考 https://blog.csdn.net/q ...

【tensorrt之dynamic shapes】