FP16

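FP16: FP32 is 32-bit single-precision floating point (often called full precision), and FP16 is 16-bit half-precision floating point. FP16 halves the storage per value and can shorten inference time.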
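To make the 16-bit format concrete, here is a minimal sketch that decodes an FP16 bit pattern into a float. It assumes the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits); the function names `halfBitsToFloat` and `checkKnownPatterns` are hypothetical and are not part of TensorRT.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decode an IEEE 754 binary16 (FP16) bit pattern into a float.
// Layout: 1 sign bit, 5 exponent bits (bias 15), 10 fraction bits.
float halfBitsToFloat(std::uint16_t bits) {
    const int sign = (bits >> 15) & 0x1;
    const int exponent = (bits >> 10) & 0x1F;
    const int fraction = bits & 0x3FF;
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: no implicit leading 1, value = fraction * 2^-24.
        magnitude = std::ldexp(static_cast<float>(fraction), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent encodes infinity (fraction == 0) or NaN.
        magnitude = (fraction == 0) ? std::numeric_limits<float>::infinity()
                                    : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal number: (1 + fraction / 1024) * 2^(exponent - 15).
        magnitude = std::ldexp(static_cast<float>(fraction + 1024), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}

// Usage example: a few well-known bit patterns.
void checkKnownPatterns() {
    assert(halfBitsToFloat(0x3C00) == 1.0f);      // 1.0
    assert(halfBitsToFloat(0xC000) == -2.0f);     // -2.0
    assert(halfBitsToFloat(0x7BFF) == 65504.0f);  // largest finite FP16 value
}
```

The largest finite value, 65504, shows how much narrower the FP16 range is than FP32, which is one reason reduced-precision inference needs care.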
Half2Mode: a TensorRT execution mode. In the documentation's words, "Half2Mode is an execution mode where internal tensors interleave 16-bits from adjacent pairs of images, and is the fastest mode of operation for batch sizes greater than one." The interleaving is between paired images in a batch, not between neighbouring regions of a single image.

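This is a memory-layout choice of the kind covered in computer organization courses, and I don't fully understand it. It looks roughly like this:

[Figure: sequential and interleaved storage]

The 2D and 3D cases, respectively:

[Figures: the 2D and 3D cases]

Compare sequential storage with interleaved storage: interleaving can raise the effective memory bandwidth. For more detail, see the reference at the end of this post.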
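The difference between the two layouts can be sketched for a pair of images as follows. This is an illustration only: `std::uint16_t` stands in for a 16-bit half value, the function names are hypothetical, and placing image A in the low half of each 32-bit word is an assumption, since the source does not specify TensorRT's actual internal layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential layout: all of image A's values, followed by all of image B's.
std::vector<std::uint16_t> storeSequential(const std::vector<std::uint16_t>& a,
                                           const std::vector<std::uint16_t>& b) {
    std::vector<std::uint16_t> out(a);
    out.insert(out.end(), b.begin(), b.end());
    return out;
}

// Interleaved layout: element i of A and element i of B share one 32-bit word
// (A in the low 16 bits, B in the high 16 bits), so a single 32-bit load
// fetches the matching value from both images.
std::vector<std::uint32_t> storeInterleaved(const std::vector<std::uint16_t>& a,
                                            const std::vector<std::uint16_t>& b) {
    assert(a.size() == b.size());  // the two images must have the same shape
    std::vector<std::uint32_t> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint32_t>(a[i]) |
                 (static_cast<std::uint32_t>(b[i]) << 16);
    }
    return out;
}
```

With a batch size of one there is no second image to pair with, which fits the documentation's remark that the mode is fastest for batch sizes greater than one.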

2 Procedure

2.1 Configuring the builder

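According to the official TensorRT 3.0 documentation, simply using float16 data in place of float32 does not bring much of a performance gain. What really improves performance is half2mode, the mode described above that uses interleaved storage.

How do you use half2mode?

First, initialize the network object with float16 data. The main step is to pass DataType::kHALF when calling the NvCaffeParser tool to parse the Caffe model: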
```
const IBlobNameToTensor *blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kHALF);
```
Then configure the builder to use half2mode, which takes a single statement:
```
builder->setHalf2Mode(true);
```

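INT8

When defining the network, pay attention to the dataType passed to the parser. For FP16 inference you pass FP16, that is, kHALF. For INT8 inference you pass kFLOAT, that is, FP32, because INT8 first needs FP32 precision to determine the conversion factors; TensorRT then converts to INT8 internally.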

```
const IBlobNameToTensor* blobNameToTensor =
    parser->parse(locateFile(deployFile).c_str(),
                  locateFile(modelFile).c_str(),
                  *network,
                  DataType::kFLOAT);
```

This looks the same as the FP32 workflow: both the inputs and the outputs of INT8-mode inference are FP32.
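As a rough picture of what a conversion factor does, the sketch below maps FP32 values to INT8 and back with a single scale. It assumes a symmetric scheme in which the range [-dynamicRange, +dynamicRange] maps onto [-127, 127] and values outside it saturate; the function names are hypothetical, and this is not TensorRT's internal code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Scale that maps the symmetric range [-dynamicRange, +dynamicRange]
// onto the INT8 range [-127, 127].
float scaleFromDynamicRange(float dynamicRange) {
    assert(dynamicRange > 0.0f);
    return dynamicRange / 127.0f;
}

// FP32 -> INT8: divide by the scale, round to nearest, saturate at +/-127.
std::int8_t quantize(float x, float scale) {
    assert(scale > 0.0f);
    const float rounded = std::round(x / scale);
    const float clamped = std::max(-127.0f, std::min(127.0f, rounded));
    return static_cast<std::int8_t>(clamped);
}

// INT8 -> FP32: multiply by the same scale. The rounding error is not recovered.
float dequantize(std::int8_t q, float scale) {
    return static_cast<float>(q) * scale;
}
```

For example, with a dynamic range of 254 the scale is 2, so 6.0 quantizes to 3 and anything beyond ±254 saturates at ±127. Choosing that dynamic range per tensor is the job of calibration.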

Configure the builder to use INT8:

```
// Enable INT8 mode
builder->setInt8Mode(dataType == DataType::kINT8);
// Set the INT8 calibrator
builder->setInt8Calibrator(calibrator);
// Build a CUDA engine from the network definition.
// The calibration itself runs inside buildCudaEngine.
engine = builder->buildCudaEngine(*network);
```

INT8 calibration
The official documentation describes quantization as follows:

INT8 calibration provides an alternative to generate per activation tensor the dynamic range. This methods can be categorized as post training technique to generate the appropriate quantization scale. The process of determining these scale factors is called calibration, and requires the application to pass batches of representative input for the network (typically batches from the training set.) Experiments indicate that about 500 images is sufficient for calibrating ImageNet classification networks.

The interface used for INT8 calibration is IInt8EntropyCalibrator.

When building an INT8 engine, the builder performs the following steps:
Builds a 32-bit engine, runs it on the calibration set, and records a histogram for each tensor of the distribution of activation values.
Builds a calibration table from the histograms.
Builds the INT8 engine from the calibration table and the network definition.
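The first two steps can be pictured with a small sketch: record a histogram of absolute activation values, then pick a dynamic range from it. Note the swapped-in technique: the range here is chosen by a simple coverage fraction, not by the entropy-based criterion that the IInt8EntropyCalibrator name suggests. All names (`ActivationHistogram`, `buildHistogram`, `chooseDynamicRange`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Histogram of |activation| over [0, maxAbs], split into equal-width bins.
struct ActivationHistogram {
    float maxAbs;
    std::vector<std::size_t> bins;
};

ActivationHistogram buildHistogram(const std::vector<float>& activations,
                                   std::size_t numBins) {
    assert(numBins > 0);
    ActivationHistogram h{0.0f, std::vector<std::size_t>(numBins, 0)};
    for (float v : activations) h.maxAbs = std::max(h.maxAbs, std::fabs(v));
    if (h.maxAbs == 0.0f) return h;  // empty or all-zero input: nothing to count
    for (float v : activations) {
        std::size_t idx = static_cast<std::size_t>(
            std::fabs(v) / h.maxAbs * static_cast<float>(numBins));
        idx = std::min(idx, numBins - 1);  // the maximum lands in the last bin
        ++h.bins[idx];
    }
    return h;
}

// Smallest bin upper edge that covers at least `coverage` of all samples.
// Values beyond the returned range would saturate after quantization.
float chooseDynamicRange(const ActivationHistogram& h, double coverage) {
    std::size_t total = 0;
    for (std::size_t c : h.bins) total += c;
    if (total == 0) return 0.0f;
    std::size_t running = 0;
    for (std::size_t i = 0; i < h.bins.size(); ++i) {
        running += h.bins[i];
        if (static_cast<double>(running) >= coverage * static_cast<double>(total)) {
            return h.maxAbs * static_cast<float>(i + 1) /
                   static_cast<float>(h.bins.size());
        }
    }
    return h.maxAbs;
}
```

A coverage of 1.0 keeps the full observed range; a lower value clips rare outliers in exchange for finer resolution on the bulk of the values. A calibration table would then hold one such range (or the scale derived from it) per tensor.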

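The calibration table can be cached. Caching matters when the same network is built many times: the first build generates the calibration table, and later builds read it directly instead of calibrating again.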

Set the INT8 calibrator on the builder:
```
builder->setInt8Calibrator(calibrator);
```

Use the writeCalibrationCache() and readCalibrationCache() functions to cache the calibration table.
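A minimal sketch of the file handling that such caching could delegate to is shown below. The two helpers are hypothetical and use only the standard library; how they are wired into a calibrator's writeCalibrationCache() and readCalibrationCache() callbacks, and the cache file name, are left as assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: persist the calibration table bytes to a file.
// Returns false if the file could not be opened or written.
bool saveCalibrationCache(const std::string& path, const void* data,
                          std::size_t length) {
    assert(data != nullptr || length == 0);
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(static_cast<const char*>(data),
              static_cast<std::streamsize>(length));
    return static_cast<bool>(out);
}

// Hypothetical helper: load a previously saved table.
// An empty result means there is no usable cache and calibration must run.
std::vector<char> loadCalibrationCache(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

On the first build the cache file does not exist, so the load returns an empty buffer and calibration proceeds; the table produced by that build is saved, and later builds find it and skip calibration.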

References

TensorRT(4)-Profiling and 16-bit Inference: https://arleyzhang.github.io/articles/fda11be6/
