An Introduction to Some TensorRT Optimization Techniques

Figure 1. A quantizable AveragePool layer (in blue) is fused with a DQ layer and a Q layer. All three layers are replaced by a quantized AveragePool layer (in green).
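
To make the fusion in Figure 1 concrete, here is a minimal Python sketch of per-tensor symmetric INT8 quantization (Q) and dequantization (DQ). It compares the unfused DQ → AveragePool → Q chain with a quantized AveragePool that accumulates the INT8 inputs directly and applies one combined rescale. The helper names, the symmetric scheme, and Python's round-half-to-even rounding are assumptions for illustration. TensorRT's actual kernels may accumulate and round differently.

```python
# Sketch only: per-tensor symmetric INT8 Q/DQ and a quantized AveragePool.
# Assumptions (not TensorRT API): scales are positive floats, the INT8 range
# is [-128, 127], and rounding follows Python's round() (half to even).

INT8_MIN, INT8_MAX = -128, 127


def clamp_int8(v: int) -> int:
    return max(INT8_MIN, min(INT8_MAX, v))


def quantize(x: float, scale: float) -> int:
    """Q layer: float -> INT8."""
    return clamp_int8(round(x / scale))


def dequantize(q: int, scale: float) -> float:
    """DQ layer: INT8 -> float."""
    return q * scale


def avgpool_unfused(window: list[int], in_scale: float, out_scale: float) -> int:
    """DQ -> floating-point AveragePool -> Q (the pattern before fusion)."""
    xs = [dequantize(q, in_scale) for q in window]
    return quantize(sum(xs) / len(xs), out_scale)


def avgpool_fused(window: list[int], in_scale: float, out_scale: float) -> int:
    """Quantized AveragePool: integer accumulation, then a single rescale."""
    acc = sum(window)  # accumulate INT8 values without dequantizing each one
    multiplier = in_scale / (out_scale * len(window))
    return clamp_int8(round(acc * multiplier))


window = [10, 20, 30, 41]
print(avgpool_unfused(window, 0.1, 0.2), avgpool_fused(window, 0.1, 0.2))
```

The fused version touches the data only as integers and folds both scales and the window size into one multiplier, which is the kind of saving the fusion is meant to deliver.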

Figure 2. An illustration of DQ forward-propagation and Q backward-propagation.
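
Both moves in Figure 2 rest on one property: with a positive scale, Q and DQ are monotonic non-decreasing functions, and any such function commutes with max(). The sketch below checks this for a MaxPool window in plain Python. The helpers and the choice of MaxPool as the commuting operator are illustrative assumptions, not a description of TensorRT internals.

```python
# Sketch: DQ forward propagation and Q backward propagation across MaxPool.
# Assumption: per-tensor scale > 0, so Q and DQ are monotonic non-decreasing
# and therefore commute with max().


def quantize(x: float, scale: float) -> int:
    return max(-128, min(127, round(x / scale)))


def dequantize(q: int, scale: float) -> float:
    return q * scale


def maxpool(window):
    return max(window)


def dq_can_move_forward(q_window: list[int], scale: float) -> bool:
    """MaxPool(DQ(q)) == DQ(MaxPool(q)): MaxPool may run on INT8 data."""
    before = maxpool([dequantize(q, scale) for q in q_window])
    after = dequantize(maxpool(q_window), scale)
    return before == after


def q_can_move_backward(x_window: list[float], scale: float) -> bool:
    """Q(MaxPool(x)) == MaxPool(Q(x)): Q may be moved ahead of MaxPool."""
    before = quantize(maxpool(x_window), scale)
    after = maxpool([quantize(x, scale) for x in x_window])
    return before == after


print(dq_can_move_forward([-7, 33, 12, -128], 0.02))
print(q_can_move_backward([0.131, -0.5, 0.777, 0.702], 0.02))
```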

Figure 3. Two examples of how TensorRT fuses convolutional layers. On the left, only the inputs are quantized. On the right, both inputs and output are quantized.

Figure 4. Example of a linear operation followed by an activation function.

Figure 5. Batch normalization is fused with convolution and ReLU while keeping the same execution order as defined in the pre-fusion network. There is no need to simulate BN-folding in the training network.
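
Figure 5 works because batch normalization in inference mode is an affine per-channel transform, so it can be folded into the preceding convolution's weights and bias. The sketch below shows the folding arithmetic for one output channel, with the convolution at a single position reduced to a dot product. The function names and the eps value are illustrative assumptions rather than TensorRT's implementation.

```python
import math

# Sketch: folding inference-mode BatchNorm into a preceding convolution.
# One output channel; the convolution at one position is a dot product.
# eps = 1e-5 is an assumed default, not taken from the source.


def conv_point(x, w, b):
    """Convolution output for one channel at one position."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def relu(v):
    return max(0.0, v)


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') such that conv(x, w', b') == BN(conv(x, w, b))."""
    factor = gamma / math.sqrt(var + eps)
    return [wi * factor for wi in w], (b - mean) * factor + beta


x = [0.5, -1.2, 2.0]
w, b = [0.3, 0.8, 0.4], 0.1
bn = dict(gamma=1.5, beta=0.2, mean=-0.3, var=0.9)

unfused = relu(batchnorm(conv_point(x, w, b), **bn))
w_f, b_f = fold_bn(w, b, **bn)
fused = relu(conv_point(x, w_f, b_f))
print(unfused, fused)  # equal up to floating-point rounding
```

Because the folded weights can be computed once when the engine is built, the training network can keep an ordinary Conv → BN → ReLU sequence, which is consistent with the caption's point that BN folding need not be simulated during training.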

Figure 6. The precision of xf1 is floating-point, so the output of the fused convolution is limited to floating-point, and the trailing Q-layer cannot be fused with the convolution.

Figure 7. When xf1 is quantized to INT8, the output of the fused convolution is also INT8, and the trailing Q-layer is fused with the convolution.

Figure 8. An example of quantizing a quantizable-operator. An element-wise addition operator is fused with the input DQ operators and the output Q operator.
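
The fused element-wise addition in Figure 8 can be modeled as rescaling each INT8 input by the ratio of its own scale to the output scale, adding, and requantizing. The Python sketch below compares that fused form with the explicit DQ → Add → Q chain. The per-tensor symmetric scheme and the helper names are assumptions, and TensorRT's actual kernel may compute the rescale differently.

```python
# Sketch: quantized element-wise addition (input DQs + Add + output Q fused).
# Assumptions: per-tensor symmetric INT8 scales; helper names are hypothetical.


def clamp_int8(v: int) -> int:
    return max(-128, min(127, v))


def quantize(x: float, scale: float) -> int:
    return clamp_int8(round(x / scale))


def dequantize(q: int, scale: float) -> float:
    return q * scale


def add_unfused(qa: int, qb: int, sa: float, sb: float, so: float) -> int:
    """DQ(a) + DQ(b), then Q with the output scale."""
    return quantize(dequantize(qa, sa) + dequantize(qb, sb), so)


def add_fused(qa: int, qb: int, sa: float, sb: float, so: float) -> int:
    """Single quantized operator: the per-input rescale factors can be
    precomputed once, then applied directly to the INT8 inputs."""
    ma, mb = sa / so, sb / so
    return clamp_int8(round(qa * ma + qb * mb))


print(add_unfused(50, -20, 0.1, 0.05, 0.08), add_fused(50, -20, 0.1, 0.05, 0.08))
```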

Figure 9. An example of suboptimal quantization fusion: contrast the suboptimal fusion in A with the optimal fusion in B. The extra pair of Q/DQ operators (highlighted with a glowing-green border) forces the convolution operator to be separated from the element-wise addition operator.

Figure 10. An example showing that the scales of Q1 and Q2 are compared for equality and, if equal, are allowed to propagate backward. If the engine is refitted with new values for Q1 and Q2 such that Q1 != Q2, an exception aborts the refitting process.
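
The rule in Figure 10 can be modeled as a build-time equality check plus a guard on refit. The class below is a hypothetical toy, not TensorRT's refit API; `ToyEngine`, `RefitError`, and their methods are invented for illustration.

```python
# Toy model of the Figure 10 rule. All names are hypothetical; this is not
# the TensorRT refit API.


class RefitError(Exception):
    """Raised when a refit would break an assumption made at build time."""


class ToyEngine:
    def __init__(self, q1_scale: float, q2_scale: float):
        # Build time: the scale may only be propagated backward (shared)
        # when Q1 and Q2 have equal scales.
        self.shared_scale = q1_scale == q2_scale
        self.scales = (q1_scale, q2_scale)

    def refit(self, q1_scale: float, q2_scale: float) -> None:
        # An engine built with a shared scale requires Q1 == Q2 to keep
        # holding; otherwise the refit is aborted and the old scales remain.
        if self.shared_scale and q1_scale != q2_scale:
            raise RefitError(f"Q1 scale {q1_scale} != Q2 scale {q2_scale}")
        self.scales = (q1_scale, q2_scale)


engine = ToyEngine(0.1, 0.1)
engine.refit(0.2, 0.2)  # accepted: the scales stay equal
try:
    engine.refit(0.2, 0.3)
except RefitError as err:
    print("refit aborted:", err)
```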

Reference link:
https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html
