First, a note: the model I am using has dynamic shapes, so the build must be given an optimization profile:

--minShapes=input:1x1x80x92x60  
--optShapes=input:2x1x80x92x60 
--maxShapes=input:10x1x80x92x60

min batch = 1

opt batch = 2

max batch = 10
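The same profile can also be set up through the TensorRT Python API. Below is a minimal sketch against the TensorRT 8.x Python bindings, assuming the file name "dynamic.onnx" and input tensor name "input" from the command further down; it mirrors the min/opt/max shapes and the 5632 MB workspace.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(TRT_LOGGER)
# Dynamic shapes require an explicit-batch network.
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("dynamic.onnx", "rb") as f:
    assert parser.parse(f.read()), parser.get_error(0)

# One profile covering batch 1..10, tuned for batch 2,
# mirroring --minShapes/--optShapes/--maxShapes.
profile = builder.create_optimization_profile()
profile.set_shape("input",
                  (1, 1, 80, 92, 60),   # min
                  (2, 1, 80, 92, 60),   # opt
                  (10, 1, 80, 92, 60))  # max

config = builder.create_builder_config()
config.add_optimization_profile(profile)
config.max_workspace_size = 5632 << 20  # 5632 MB, as with --workspace=5632

with open("soft.engine", "wb") as f:
    f.write(builder.build_serialized_network(network, config))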

Second, I use INT8 quantization; quantization needs calibration data configured via --calib (here a calib folder; note that, per the help output below, trtexec documents --calib as reading an INT8 calibration cache file).
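Since trtexec consumes a ready-made calibration cache through --calib, the cache itself is usually produced by running a custom calibrator once during a build. Below is a minimal sketch of such a calibrator with the TensorRT Python API; the folder of preprocessed float32 .npy volumes shaped 1x1x80x92x60 is an assumption for illustration (DICOM loading and preprocessing are not shown).

import os
import numpy as np
import pycuda.autoinit  # noqa: F401 (creates a CUDA context)
import pycuda.driver as cuda
import tensorrt as trt

class FolderCalibrator(trt.IInt8EntropyCalibrator2):
    """Feeds volumes from a folder one at a time and persists calib.cache."""

    def __init__(self, data_dir, cache_file="calib.cache"):
        trt.IInt8EntropyCalibrator2.__init__(self)
        self.cache_file = cache_file
        # Assumption: the folder holds preprocessed float32 .npy volumes
        # already shaped (1, 1, 80, 92, 60).
        self.files = sorted(
            os.path.join(data_dir, f)
            for f in os.listdir(data_dir) if f.endswith(".npy"))
        self.index = 0
        nbytes = 1 * 1 * 80 * 92 * 60 * np.float32().nbytes
        self.device_input = cuda.mem_alloc(nbytes)

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        if self.index >= len(self.files):
            return None  # signals the end of the calibration data
        batch = np.load(self.files[self.index]).astype(np.float32)
        cuda.memcpy_htod(self.device_input, np.ascontiguousarray(batch))
        self.index += 1
        return [int(self.device_input)]

    def read_calibration_cache(self):
        if os.path.exists(self.cache_file):
            with open(self.cache_file, "rb") as f:
                return f.read()
        return None

    def write_calibration_cache(self, cache):
        with open(self.cache_file, "wb") as f:
            f.write(cache)

Attaching it via config.set_flag(trt.BuilderFlag.INT8) and config.int8_calibrator = FolderCalibrator(...) makes the build write calib.cache, which can then be handed to trtexec as --calib=calib.cache.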

D:\Download\TensorRT-8.2.1.8.Windows10.x86_64.cuda-11.4.cudnn8.2\TensorRT-8.2.1.8\bin\trtexec
--onnx=dynamic.onnx
--minShapes=input:1x1x80x92x60
--optShapes=input:2x1x80x92x60
--maxShapes=input:10x1x80x92x60
--workspace=5632
--int8
--best
--calib=D:\PATIENT_DICOM
--saveEngine=soft.engine
--buildOnly  
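Two notes on the flags: --best already enables all precisions (fp32/fp16/int8), so --int8 is redundant next to it, and --buildOnly skips the benchmark run. Once built, the saved engine can be benchmarked at any batch size inside the profile, for example batch 4 (a sketch; the path to trtexec is abbreviated):

trtexec --loadEngine=soft.engine --shapes=input:4x1x80x92x60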

Attached: the trtexec command-line options

(base) zxl@R7000P:~/TensorRT-7.2.3.4/bin$ ./trtexec --help
&&&& RUNNING TensorRT.trtexec # ./trtexec --help
=== Model Options ===
  --uff=<file>                UFF model
  --onnx=<file>               ONNX model
  --model=<file>              Caffe model (default = no model, random weights used)
  --deploy=<file>             Caffe prototxt file
  --output=<name>[,<name>]*   Output names (it can be specified multiple times); at least one output is required for UFF and Caffe
  --uffInput=<name>,X,Y,Z     Input blob name and its dimensions (X,Y,Z=C,H,W), it can be specified multiple times; at least one is required for UFF models
  --uffNHWC                   Set if inputs are in the NHWC layout instead of NCHW (use X,Y,Z=H,W,C order in --uffInput)

=== Build Options ===
  --maxBatch                  Set max batch size and build an implicit batch engine (default = 1)
  --explicitBatch             Use explicit batch sizes when building the engine (default = implicit)
  --minShapes=spec            Build with dynamic shapes using a profile with the min shapes provided
  --optShapes=spec            Build with dynamic shapes using a profile with the opt shapes provided
  --maxShapes=spec            Build with dynamic shapes using a profile with the max shapes provided
  --minShapesCalib=spec       Calibrate with dynamic shapes using a profile with the min shapes provided
  --optShapesCalib=spec       Calibrate with dynamic shapes using a profile with the opt shapes provided
  --maxShapesCalib=spec       Calibrate with dynamic shapes using a profile with the max shapes provided
                              Note: All three of min, opt and max shapes must be supplied.
                              However, if only opt shapes is supplied then it will be expanded so
                              that min shapes and max shapes are set to the same values as opt shapes.
                              In addition, use of dynamic shapes implies explicit batch.
                              Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
                              Example input shapes spec: input0:1x3x256x256,input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.
                              Each key-value pair has the key and value separated using a colon (:).
                              Multiple input shapes can be provided via comma-separated key-value pairs.
  --inputIOFormats=spec       Type and format of each of the input tensors (default = all inputs in fp32:chw)
                              See --outputIOFormats help for the grammar of type and format list.
                              Note: If this option is specified, please set comma-separated types and formats for all
                              inputs following the same order as network inputs ID (even if only one input
                              needs specifying IO format) or set the type and format once for broadcasting.
  --outputIOFormats=spec      Type and format of each of the output tensors (default = all outputs in fp32:chw)
                              Note: If this option is specified, please set comma-separated types and formats for all
                              outputs following the same order as network outputs ID (even if only one output
                              needs specifying IO format) or set the type and format once for broadcasting.
                              IO Formats: spec  ::= IOfmt[","spec]
                                          IOfmt ::= type:fmt
                                          type  ::= "fp32"|"fp16"|"int32"|"int8"
                                          fmt   ::= ("chw"|"chw2"|"chw4"|"hwc8"|"chw16"|"chw32"|"dhwc8")["+"fmt]
  --workspace=N               Set workspace size in megabytes (default = 16)
  --noBuilderCache            Disable timing cache in builder (default is to enable timing cache)
  --nvtxMode=mode             Specify NVTX annotation verbosity. mode ::= default|verbose|none
  --minTiming=M               Set the minimum number of iterations used in kernel selection (default = 1)
  --avgTiming=M               Set the number of times averaged in each iteration for kernel selection (default = 8)
  --noTF32                    Disable tf32 precision (default is to enable tf32, in addition to fp32)
  --refit                     Mark the engine as refittable. This will allow the inspection of refittable layers and weights within the engine.
  --fp16                      Enable fp16 precision, in addition to fp32 (default = disabled)
  --int8                      Enable int8 precision, in addition to fp32 (default = disabled)
  --best                      Enable all precisions to achieve the best performance (default = disabled)
  --calib=<file>              Read INT8 calibration cache file
  --safe                      Only test the functionality available in safety restricted flows
  --saveEngine=<file>         Save the serialized engine
  --loadEngine=<file>         Load a serialized engine
  --tacticSources=tactics     Specify the tactics to be used by adding (+) or removing (-) tactics from the default tactic sources (default = all available tactics).
                              Note: Currently only cuBLAS and cuBLAS LT are listed as optional tactics.
                              Tactic Sources: tactics ::= [","tactic]
                                              tactic  ::= (+|-)lib
                                              lib     ::= "cublas"|"cublasLt"

=== Inference Options ===
  --batch=N                   Set batch size for implicit batch engines (default = 1)
  --shapes=spec               Set input shapes for dynamic shapes inference inputs.
                              Note: Use of dynamic shapes implies explicit batch.
                              Input names can be wrapped with escaped single quotes (ex: \'Input:0\').
                              Example input shapes spec: input0:1x3x256x256, input1:1x3x128x128
                              Each input shape is supplied as a key-value pair where key is the input name and
                              value is the dimensions (including the batch dimension) to be used for that input.
                              Each key-value pair has the key and value separated using a colon (:).
                              Multiple input shapes can be provided via comma-separated key-value pairs.
  --loadInputs=spec           Load input values from files (default = generate random inputs). Input names can be wrapped with single quotes (ex: 'Input:0')
                              Input values spec ::= Ival[","spec]
                                                    Ival ::= name":"file
  --iterations=N              Run at least N inference iterations (default = 10)
  --warmUp=N                  Run for N milliseconds to warmup before measuring performance (default = 200)
  --duration=N                Run performance measurements for at least N seconds wallclock time (default = 3)
  --sleepTime=N               Delay inference start with a gap of N milliseconds between launch and compute (default = 0)
  --streams=N                 Instantiate N engines to use concurrently (default = 1)
  --exposeDMA                 Serialize DMA transfers to and from device (default = disabled)
  --noDataTransfers           Do not transfer data to and from the device during inference (default = disabled)
  --useSpinWait               Actively synchronize on GPU events. This option may decrease synchronization time but increase CPU usage and power (default = disabled)
  --threads                   Enable multithreading to drive engines with independent threads (default = disabled)
  --useCudaGraph              Use cuda graph to capture engine execution and then launch inference (default = disabled)
  --separateProfileRun        Do not attach the profiler in the benchmark run; if profiling is enabled, a second profile run will be executed (default = disabled)
  --buildOnly                 Skip inference perf measurement (default = disabled)

=== Build and Inference Batch Options ===
  When using implicit batch, the max batch size of the engine, if not given, is set to the inference batch size;
  when using explicit batch, if shapes are specified only for inference, they will be used also as min/opt/max in the build profile;
  if shapes are specified only for the build, the opt shapes will be used also for inference;
  if both are specified, they must be compatible; and if explicit batch is enabled but neither is specified,
  the model must provide complete static dimensions, including batch size, for all inputs

=== Reporting Options ===
  --verbose                   Use verbose logging (default = false)
  --avgRuns=N                 Report performance measurements averaged over N consecutive iterations (default = 10)
  --percentile=P              Report performance for the P percentage (0<=P<=100, 0 representing max perf, and 100 representing min perf) (default = 99%)
  --dumpRefit                 Print the refittable layers and weights from a refittable engine
  --dumpOutput                Print the output tensor(s) of the last inference iteration (default = disabled)
  --dumpProfile               Print profile information per layer (default = disabled)
  --exportTimes=<file>        Write the timing results in a json file (default = disabled)
  --exportOutput=<file>       Write the output tensors to a json file (default = disabled)
  --exportProfile=<file>      Write the profile information per layer in a json file (default = disabled)

=== System Options ===
  --device=N                  Select cuda device N (default = 0)
  --useDLACore=N              Select DLA core N for layers that support DLA (default = none)
  --allowGPUFallback          When DLA is enabled, allow GPU fallback for unsupported layers (default = disabled)
  --plugins                   Plugin library (.so) to load (can be specified multiple times)

=== Help ===
  --help, -h                  Print this message
