50. Ubuntu 18.04/20.04 + CUDA 11.1 + cuDNN (CUDA 11.3 build) + TensorRT 7.2/8.6 + DeepStream 5.1 + Vulkan: environment setup and YOLOv5 deployment

Basic idea: I wanted to learn how to use TensorRT, so here are some informal notes.

Link: https://pan.baidu.com/s/1uFOktdF-bHcDDsufIqmNSA
Extraction code: k55w

For reference, the pip install command (using the Aliyun mirror):

pip install **** -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com

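I. Install the GPU driver (the card is an RTX 2080)

Note: first set Secure Boot to disabled.

Do not install with sudo apt-get install nvidia-*; that route can leave you stuck in a login loop.

1. On a default Ubuntu 18.04 install, nouveau must be disabled before the NVIDIA driver can be installed.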

ubuntu@ubuntu:~$ sudo vim /etc/modprobe.d/blacklist.conf

Append the following two lines at the end of the file:

blacklist nouveau
options nouveau modeset=0

Regenerate the initramfs so the change takes effect:

ubuntu@ubuntu:~$ sudo update-initramfs -u

Reboot, then verify that nouveau is disabled:

ubuntu@ubuntu:~$ lsmod | grep nouveau

If nothing is printed, nouveau has been disabled. Reboot once more.
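The same check can be scripted. The following is a minimal Python sketch, assuming the text passed in is the output of lsmod; the helper name nouveau_loaded is hypothetical. Unlike grep, it only looks at the module-name column.

```python
def nouveau_loaded(lsmod_output: str) -> bool:
    """Return True if a module named 'nouveau' is listed in lsmod output."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        # The first column of lsmod output is the module name.
        if fields and fields[0] == "nouveau":
            return True
    return False


# Sample inputs standing in for real `lsmod` output.
still_loaded = "Module                  Size  Used by\nnouveau              1949696  0\n"
disabled = "Module                  Size  Used by\nnvidia              34037760  0\n"
print(nouveau_loaded(still_loaded), nouveau_loaded(disabled))
```

On a real machine the input could come from subprocess.run(["lsmod"], capture_output=True, text=True).stdout.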

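2. Look up your GPU model on NVIDIA's website and download the matching driver. Site: Official GeForce Drivers | NVIDIA

The version I downloaded: NVIDIA-Linux-x86_64-460.67.run

Copy the downloaded .run file to your home directory.

3. In Ubuntu, press Ctrl+Alt+F1 to switch to a text console,

then run:

ubuntu@ubuntu:~$ sudo apt-get install lightdm
ubuntu@ubuntu:~$ sudo service lightdm stop

4. Then remove the existing driver: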

ubuntu@ubuntu:~$ sudo apt-get remove nvidia-*

Make the driver .run file executable:

ubuntu@ubuntu:~$ sudo chmod  a+x NVIDIA-Linux-x86_64-460.67.run

Run the installer:

ubuntu@ubuntu:~$ sudo ./NVIDIA-Linux-x86_64-460.67.run -no-x-check -no-nouveau-check -no-opengl-files

Check that the driver installed correctly:

ubuntu@ubuntu:~$ nvidia-smi

If you see output like the following, the installation succeeded:

Mon Mar 22 13:00:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0 Off |                  N/A |
| 22%   43C    P0    53W / 250W |      0MiB / 11016MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:03:00.0 Off |                  N/A |
| 19%   42C    P0    62W / 250W |      0MiB / 11019MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  Off  | 00000000:82:00.0 Off |                  N/A |
|  0%   40C    P0     1W / 250W |      0MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
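When scripting this check, the banner line is the useful part. Here is a minimal sketch, assuming the banner keeps the "Driver Version: ... CUDA Version: ..." layout shown above; parse_smi_header and version_tuple are hypothetical helper names. Note that the CUDA version nvidia-smi reports is the highest version the driver supports, not the installed toolkit, so the check below only confirms that 11.2 is new enough for a CUDA 11.1 toolkit.

```python
import re


def parse_smi_header(text: str) -> dict:
    """Extract the driver and CUDA versions from the nvidia-smi banner."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text)
    if m is None:
        raise ValueError("nvidia-smi banner not found")
    return {"driver": m.group(1), "cuda": m.group(2)}


def version_tuple(version: str) -> tuple:
    """Turn '11.2' into (11, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


sample = "| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |"
info = parse_smi_header(sample)
supports_11_1 = version_tuple(info["cuda"]) >= (11, 1)
print(info, supports_11_1)
```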

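One more setting is needed: stop the kernel from being updated automatically, to avoid problems later.

Disable automatic Linux kernel updates from the command line: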

ubuntu@ubuntu:~$ cat /etc/apt/apt.conf.d/10periodic

APT::Periodic::Update-Package-Lists "1";

APT::Periodic::Download-Upgradeable-Packages "0";

APT::Periodic::AutocleanInterval "0";

Set the "Update-Package-Lists" parameter in this file to "0".
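The edit can also be expressed as a text substitution. This is a minimal sketch that works on the file contents as a string; the function name disable_periodic_update is hypothetical, and writing the result back to /etc/apt/apt.conf.d/10periodic (which needs root) is left out on purpose.

```python
import re


def disable_periodic_update(conf_text: str) -> str:
    """Set APT::Periodic::Update-Package-Lists to "0"; other lines are untouched."""
    return re.sub(
        r'(APT::Periodic::Update-Package-Lists\s+)"\d+";',
        r'\1"0";',
        conf_text,
    )


original = (
    'APT::Periodic::Update-Package-Lists "1";\n'
    'APT::Periodic::Download-Upgradeable-Packages "0";\n'
    'APT::Periodic::AutocleanInterval "0";\n'
)
updated = disable_periodic_update(original)
print(updated)
```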

II. Download the CUDA 11.1 installer and run it

ubuntu@ubuntu:~$ sudo ./cuda_11.1.0_455.23.05_linux.run

The choices, in order:

Next: accept the EULA.

Next: install CUDA 11.1 without the bundled driver.

The installer reports completion:


===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.1/
Samples:  Installed in /home/ubuntu/

Please make sure that
 -   PATH includes /usr/local/cuda-11.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log

Then add the following entries to the shell configuration:

ubuntu@ubuntu:~$ sudo gedit ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export PATH=$PATH:/usr/local/cuda-11.1/bin
export CUDA_HOME=/usr/local/cuda-11.1
ubuntu@ubuntu:~$ source ~/.bashrc

Check the version:

nvcc -V
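
If nvcc is not found, the usual cause is that the exports were not picked up. Below is a minimal sketch that checks an environment mapping for the two CUDA directories; the function name missing_cuda_paths and the sample dictionaries are illustrative assumptions, and the default cuda_home matches the install path used in these notes.

```python
def missing_cuda_paths(env: dict, cuda_home: str = "/usr/local/cuda-11.1") -> list:
    """List the CUDA directories that are absent from PATH / LD_LIBRARY_PATH."""
    wanted = {
        "PATH": f"{cuda_home}/bin",
        "LD_LIBRARY_PATH": f"{cuda_home}/lib64",
    }
    missing = []
    for var, directory in wanted.items():
        # Split the colon-separated list and ignore trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
        if directory not in entries:
            missing.append(f"{var} lacks {directory}")
    return missing


good = {
    "PATH": "/usr/bin:/usr/local/cuda-11.1/bin",
    "LD_LIBRARY_PATH": "/usr/local/cuda-11.1/lib64",
}
bad = {"PATH": "/usr/bin"}
print(missing_cuda_paths(good))
print(missing_cuda_paths(bad))
```

To check the live shell environment, pass dict(os.environ) instead of the sample dictionaries.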

Install PyTorch built for CUDA 11.1:

pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

III. Download cuDNN from https://developer.nvidia.com/rdp/cudnn-download

Extract the archive and copy the files (this is the cuDNN build for CUDA 11.3):

ubuntu@ubuntu:~$ tar -zxvf cudnn-11.3-linux-x64-v8.2.0.53.tgz
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn*.h   /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn.h    /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/lib64/libcudnn*    /usr/local/cuda/lib64
ubuntu@ubuntu:~$ sudo chmod a+r /usr/local/cuda/include/cudnn.h   /usr/local/cuda/lib64/libcudnn*

Check the result:

ubuntu@ubuntu:~$ cat /usr/local/cuda/include/cudnn.h | grep cudnn
/*   cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"

IV. Create the symbolic links

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8

sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8
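
The two groups of commands above differ only in the cuDNN version suffix (8.1.0 and 8.2.0). As a minimal sketch, the list can be generated from the version string; symlink_commands is a hypothetical helper that only prints the commands and does not run them.

```python
CUDNN_LIBS = [
    "libcudnn_adv_train", "libcudnn_ops_infer", "libcudnn_cnn_train",
    "libcudnn_adv_infer", "libcudnn_ops_train", "libcudnn_cnn_infer",
    "libcudnn",
]


def symlink_commands(version: str, lib_dir: str) -> list:
    """Build one `ln -sf` command per cuDNN library for the given version."""
    major = version.split(".")[0]
    return [
        f"sudo ln -sf {lib_dir}/{name}.so.{version} {lib_dir}/{name}.so.{major}"
        for name in CUDNN_LIBS
    ]


cmds = symlink_commands("8.2.0", "/usr/local/cuda-11.1/targets/x86_64-linux/lib")
print("\n".join(cmds))
```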

V. Install TensorRT. Only one version needs to be installed; I later upgraded my local install from TensorRT 7.2 to TensorRT 8.6.

1) Install 7.2. Download TensorRT from https://developer.nvidia.com/nvidia-tensorrt-7x-download

ubuntu@ubuntu:~$ tar xzvf TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz -C /home/ubuntu/NVIDIA_CUDA-11.1_Samples
ubuntu@ubuntu:~$ sudo vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib
ubuntu@ubuntu:~$ source ~/.bashrc
ubuntu@ubuntu:~$ cd NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3
ubuntu@ubuntu:~$ cd python
ubuntu@ubuntu:~$ sudo pip3 install tensorrt-7.2.2.3-cp37-none-linux_x86_64.whl
ubuntu@ubuntu:~$ cd ../uff
ubuntu@ubuntu:~$ sudo pip3 install uff-0.6.5-py2.py3-none-any.whl
ubuntu@ubuntu:~$ cd ../graphsurgeon
ubuntu@ubuntu:~$ sudo pip3 install graphsurgeon-0.4.1-py2.py3-none-any.whl

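Copy the libraries and headers:

# run from the TensorRT directory
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include

If code (for example Python) later fails at run time because a shared library cannot be found, copy the missing .so into /usr/lib. From then on, whenever a .so is missing, copy it from the TensorRT directory into /usr/lib or /usr/local/lib.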

ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvonnxparser.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvparsers.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libmyelin.so.1 /usr/lib/
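
Rather than waiting for each "cannot open shared object file" error, the libraries can be probed in one pass. This is a minimal sketch using ctypes; can_load is a hypothetical helper, and the list simply repeats the five sonames copied above. It asks the dynamic loader to open each library and reports the ones it cannot find.

```python
import ctypes


def can_load(soname: str) -> bool:
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


TRT7_LIBS = [
    "libnvinfer.so.7",
    "libnvonnxparser.so.7",
    "libnvinfer_plugin.so.7",
    "libnvparsers.so.7",
    "libmyelin.so.1",
]

status = {name: can_load(name) for name in TRT7_LIBS}
for name, ok in status.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```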

The final .bashrc configuration:

export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib:$LD_LIBRARY_PATH

Check the version information:

cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2

ubuntu@ubuntu:~$ sudo ldconfig /usr/local/cuda/lib64
ubuntu@ubuntu:~$ python3
Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
>>> import tensorrt
>>>

2) Install 8.6

ubuntu@ubuntu:~$ axel -n 100 https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.6.1/tars/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ chmod 777 TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ tar -zxvf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ mv TensorRT-8.6.1.6 NVIDIA_CUDA-11.1_Samples/
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/python$ pip3 install tensorrt-8.6.1-cp38-none-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./tensorrt-8.6.1-cp38-none-linux_x86_64.whl
Installing collected packages: tensorrt
  Attempting uninstall: tensorrt
    Found existing installation: tensorrt 7.2.2.3
    Uninstalling tensorrt-7.2.2.3:
      Successfully uninstalled tensorrt-7.2.2.3
Successfully installed tensorrt-8.6.1
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/uff$ pip3 install uff-0.6.9-py2.py3-none-any.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./uff-0.6.9-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.11.0 in /home/ubuntu/.local/lib/python3.8/site-packages (from uff==0.6.9) (1.23.4)
Requirement already satisfied: protobuf>=3.3.0 in /home/ubuntu/.local/lib/python3.8/site-packages (from uff==0.6.9) (3.20.2)
Installing collected packages: uff
Successfully installed uff-0.6.9
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/graphsurgeon$ pip3 install graphsurgeon-0.4.6-py2.py3-none-any.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./graphsurgeon-0.4.6-py2.py3-none-any.whl
Installing collected packages: graphsurgeon
  Attempting uninstall: graphsurgeon
    Found existing installation: graphsurgeon 0.4.5
    Uninstalling graphsurgeon-0.4.5:
      Successfully uninstalled graphsurgeon-0.4.5
Successfully installed graphsurgeon-0.4.6
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ sudo cp -r ./lib/* /usr/lib
[sudo] password for ubuntu:
Sorry, try again.
[sudo] password for ubuntu:
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ sudo cp -r ./include/* /usr/include
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt
>>> tensorrt.__version__
'8.6.1'

Update .bashrc to point at the new version:

export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6
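
After switching versions it is worth confirming which tensorrt wheel Python actually imports. The following is a minimal sketch; installed_version is a hypothetical helper, and it returns None both when the package is absent and when the import fails because a shared library cannot be loaded.

```python
import importlib
import importlib.util


def installed_version(package: str):
    """Return package.__version__ if the package imports cleanly, else None."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        module = importlib.import_module(package)
    except (ImportError, OSError):
        # The wheel is present but a native library could not be loaded.
        return None
    return getattr(module, "__version__", None)


version = installed_version("tensorrt")
print("tensorrt:", version if version else "not importable")
```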

Next, download the code (yolo5).

While at it, install Vulkan as well, so that ncnn's Vulkan acceleration can be used later:

wget -qO - http://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.1.97-bionic.list http://packages.lunarg.com/vulkan/1.1.97/lunarg-vulkan-1.1.97-bionic.list
sudo apt update
sudo apt install lunarg-vulkan-sdk
sudo apt-get install cmake git gcc g++ mesa-* libwayland-dev libxrandr-dev
sudo apt-get install libvulkan1 mesa-vulkan-drivers vulkan-utils
vulkaninfo

Test result:

ubuntu@ubuntu:~/ncnn/build/benchmark$ vulkaninfo
ERROR: [Loader Message] Code 0 : libGLX_nvidia.so.0: cannot open shared object file: No such file or directory
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
==========
VULKANINFO
==========

Vulkan Instance Version: 1.2.131

Instance Extensions: count = 18
====================
	VK_EXT_acquire_xlib_display            : extension revision 1
	VK_EXT_debug_report                    : extension revision 9
	VK_EXT_debug_utils                     : extension revision 1
	VK_EXT_direct_mode_display             : extension revision 1
	VK_EXT_display_surface_counter         : extension revision 1
	VK_KHR_device_group_creation           : extension revision 1
	VK_KHR_display                         : extension revision 23
	VK_KHR_external_fence_capabilities     : extension revision 1
	VK_KHR_external_memory_capabilities    : extension revision 1
	VK_KHR_external_semaphore_capabilities : extension revision 1
	VK_KHR_get_display_properties2         : extension revision 1
	VK_KHR_get_physical_device_properties2 : extension revision 1
	VK_KHR_get_surface_capabilities2       : extension revision 1
	VK_KHR_surface                         : extension revision 25
	VK_KHR_surface_protected_capabilities  : extension revision 1
	VK_KHR_wayland_surface                 : extension revision 6
	VK_KHR_xcb_surface                     : extension revision 6
	VK_KHR_xlib_surface                    : extension revision 6

Layers: count = 12
=======
....
---------------------------------
	samplerMirrorClampToEdge                           = true
	drawIndirectCount                                  = true
	storageBuffer8BitAccess                            = true
	uniformAndStorageBuffer8BitAccess                  = true
	storagePushConstant8                               = true
	shaderBufferInt64Atomics                           = true
	shaderSharedInt64Atomics                           = false
	shaderFloat16                                      = true
	shaderInt8                                         = true
	descriptorIndexing                                 = true
	shaderInputAttachmentArrayDynamicIndexing          = false
	shaderUniformTexelBufferArrayDynamicIndexing       = true
	shaderStorageTexelBufferArrayDynamicIndexing       = true
	shaderUniformBufferArrayNonUniformIndexing         = false
	shaderSampledImageArrayNonUniformIndexing          = true
	shaderStorageBufferArrayNonUniformIndexing         = true
	shaderStorageImageArrayNonUniformIndexing          = true
	shaderInputAttachmentArrayNonUniformIndexing       = false
	shaderUniformTexelBufferArrayNonUniformIndexing    = true
	shaderStorageTexelBufferArrayNonUniformIndexing    = true
	descriptorBindingUniformBufferUpdateAfterBind      = false
	descriptorBindingSampledImageUpdateAfterBind       = true
	descriptorBindingStorageImageUpdateAfterBind       = true
	descriptorBindingStorageBufferUpdateAfterBind      = true
	descriptorBindingUniformTexelBufferUpdateAfterBind = true
	descriptorBindingStorageTexelBufferUpdateAfterBind = true
	descriptorBindingUpdateUnusedWhilePending          = true
	descriptorBindingPartiallyBound                    = true
	descriptorBindingVariableDescriptorCount           = false
	runtimeDescriptorArray                             = true
	samplerFilterMinmax                                = true
	scalarBlockLayout                                  = true
	imagelessFramebuffer                               = true
	uniformBufferStandardLayout                        = true
	shaderSubgroupExtendedTypes                        = true
	separateDepthStencilLayouts                        = true
	hostQueryReset                                     = true
	timelineSemaphore                                  = true
	bufferDeviceAddress                                = true
	bufferDeviceAddressCaptureReplay                   = true
	bufferDeviceAddressMultiDevice                     = false
	vulkanMemoryModel                                  = true
	vulkanMemoryModelDeviceScope                       = true
	vulkanMemoryModelAvailabilityVisibilityChains      = true
	shaderOutputViewportIndex                          = true
	shaderOutputLayer                                  = true
	subgroupBroadcastDynamicId                         = true

VkPhysicalDeviceVulkanMemoryModelFeatures:
------------------------------------------
	vulkanMemoryModel                             = true
	vulkanMemoryModelDeviceScope                  = true
	vulkanMemoryModelAvailabilityVisibilityChains = true

VkPhysicalDeviceYcbcrImageArraysFeaturesEXT:
--------------------------------------------
	ycbcrImageArrays = true

ubuntu@ubuntu:~/ncnn/build/benchmark$ vkcube
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0

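(Screenshot of the test result)

It seems ncnn needs Vulkan to be supplied through the SDK. I downloaded vulkansdk-linux-x86_64-1.2.182.0 from the official site https://vulkan.lunarg.com/sdk/home, extracted it, and placed it under /usr/local: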

ubuntu@ubuntu:~$ sudo cp -r  vulkansdk-linux-x86_64-1.2.182.0/ /usr/local/
[sudo] password for ubuntu:
ubuntu@ubuntu:~$ cd /usr/local/

Add the environment variables below, and remember to source the file afterwards:

export Vulkan_LIBRARY=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/lib
export Vulkan_INCLUDE_DIR=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/include
export Vulkan_BIN=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/bin
export PATH=$PATH:$Vulkan_LIBRARY
export PATH=$PATH:$Vulkan_INCLUDE_DIR
export PATH=$PATH:$Vulkan_BIN

VII. Download the TensorRT source code shared by an expert developer

ubuntu@ubuntu:~$ git clone https://github.com/wang-xinyu/tensorrtx.git
ubuntu@ubuntu:~$ git clone https://github.com/ultralytics/yolov5.git
ubuntu@ubuntu:~$ cp tensorrtx/yolov5/gen_wts.py yolov5

Then edit the script so it reads as follows:

import torch
import struct
from utils.torch_utils import select_device

# Initialize
device = select_device('cpu')
# Load model
model = torch.load('/home/ubuntu/yolov5/runs/train/exp/weights/best.pt', map_location=device)['model'].float()  # load to FP32
model.to(device).eval()

f = open('/home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts', 'w')
f.write('{}\n'.format(len(model.state_dict().keys())))
for k, v in model.state_dict().items():
    vr = v.reshape(-1).cpu().numpy()
    f.write('{} {} '.format(k, len(vr)))
    for vv in vr:
        f.write(' ')
        f.write(struct.pack('>f', float(vv)).hex())
    f.write('\n')
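
The script stores every weight as the hex text of a big-endian 32-bit float. Here is a minimal sketch of that encoding and its inverse; float_to_hex and hex_to_float are hypothetical helper names, not part of tensorrtx.

```python
import struct


def float_to_hex(value: float) -> str:
    """Encode a float as 8 hex digits of its big-endian IEEE 754 float32 form."""
    return struct.pack(">f", value).hex()


def hex_to_float(text: str) -> float:
    """Decode 8 hex digits back into a Python float."""
    return struct.unpack(">f", bytes.fromhex(text))[0]


print(float_to_hex(1.0))         # 3f800000
print(hex_to_float("c0000000"))  # -2.0
```

Each line of the .wts file is therefore a tensor name, the element count, and that many 8-digit hex words.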

Fixing a problem that came up (the lzma module failed to import):

ubuntu@ubuntu:~/yolov5$ sudo apt-get install liblzma-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
liblzma-dev is already the newest version (5.2.2-1.3).
0 upgraded, 0 newly installed, 0 to remove and 190 not upgraded.
ubuntu@ubuntu:~/yolov5$ sudo pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple backports.lzma
ubuntu@ubuntu:~/yolov5$ sudo vim /usr/local/lib/python3.7/lzma.py
Original code:
from _lzma import *
from _lzma import _encode_filter_properties, _decode_filter_properties
Change it to:
try:
    from _lzma import *
    from _lzma import _encode_filter_properties, _decode_filter_properties
except ImportError:
    from backports.lzma import *
    from backports.lzma import _encode_filter_properties, _decode_filter_properties

After that, the script runs successfully:

ubuntu@ubuntu:~/yolov5$ python3 gen_wts.py
ubuntu@ubuntu:~/yolov5$ ls runs/train/exp/weights/
best.pt  bestyolov5x.wts  last.pt
ubuntu@ubuntu:~/yolov5$ cp runs/train/exp/weights/bestyolov5x.wts ../tensorrtx/yolov5

Convert the model:

ubuntu@ubuntu:~$ mkdir tensorrtx/yolov5/build
ubuntu@ubuntu:~$ cd tensorrtx/yolov5/build
ubuntu@ubuntu:~/tensorrtx/yolov5$ cp /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/include/* .
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/myplugins.dir/build.make:341: recipe for target 'libmyplugins.so' failed
make[2]: *** [libmyplugins.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
[ 40%] Built target myplugins
Scanning dependencies of target yolov5
[ 60%] Building CXX object CMakeFiles/yolov5.dir/calibrator.cpp.o
[ 80%] Building CXX object CMakeFiles/yolov5.dir/yolov5.cpp.o
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so.7 /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libmyelin.so.1: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libmyelin.so.1 /usr/local/lib/

The conversion fails with an error:

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
Building engine, please wait for a while...
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 475) [Convolution]_output, because the network is not valid.
[03/23/2021-09:24:48] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/ubuntu/tensorrtx/yolov5/yolov5.cpp:143: void APIToModel(unsigned int, nvinfer1::IHostMemory**, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
Aborted

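Change the number of classes in the code

In yololayer.h, set static constexpr int CLASS_NUM = 2; (the default is 80) so that it matches the number of classes the model was trained on.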
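
The numbers in the error message follow directly from CLASS_NUM. The sketch below is a Python port of the get_width helper from yolov5.cpp plus a hypothetical det_weight_count function. It assumes the yolov5x width multiplier gw = 1.25 and the 1x1 detection convolution with 3 * (CLASS_NUM + 5) output channels, both taken from that source file.

```python
def get_width(x: int, gw: float, divisor: int = 8) -> int:
    """Scale a channel count by gw and round up to a multiple of divisor."""
    if int(x * gw) % divisor == 0:
        return int(x * gw)
    return (int(x * gw / divisor) + 1) * divisor


def det_weight_count(class_num: int, in_channels: int, anchors: int = 3) -> int:
    """Weights in a 1x1 detection conv: in_channels * 1 * 1 * out_channels."""
    out_channels = anchors * (class_num + 5)
    return in_channels * 1 * 1 * out_channels


in_ch = get_width(256, 1.25)            # input channels of the first detection conv
expected = det_weight_count(80, in_ch)  # what the default CLASS_NUM = 80 asks for
actual = det_weight_count(2, in_ch)     # what a 2-class .wts file provides
print(in_ch, expected, actual)
```

With 80 classes the layer expects 320 * 255 = 81600 weights, while a 2-class model supplies 320 * 21 = 6720, which is exactly the mismatch TensorRT reports.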

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
Building engine, please wait for a while...
[03/23/2021-09:32:55] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/23/2021-09:33:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[03/23/2021-09:34:10] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Build engine successfully!

Compare the speed of PyTorch and TensorRT

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090237.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 1 minute 11 seconds;

PyTorch processing time (166.625s)

ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090749.avi --weights runs/train/exp/weights/best.pt --device "0"

Video length: 5 seconds;

PyTorch processing time (12.689s)

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090237.avi

Video length: 1 minute 11 seconds;

TensorRT processing time (137200ms)

ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090749.avi

Video length: 5 seconds;

TensorRT processing time (12832ms)
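
To put the four measurements side by side, here is a minimal sketch that computes the PyTorch-to-TensorRT time ratio per video. It assumes the 137200 ms figure belongs to the TensorRT run, since it follows the ./yolov5 -v command; the dictionary layout is only illustrative.

```python
# Wall-clock processing times from the runs above, in seconds.
timings = {
    "20210201090237.avi": {"pytorch_s": 166.625, "tensorrt_s": 137200 / 1000},
    "20210201090749.avi": {"pytorch_s": 12.689, "tensorrt_s": 12832 / 1000},
}

# A ratio above 1 means the TensorRT run finished faster than PyTorch.
ratios = {
    name: t["pytorch_s"] / t["tensorrt_s"] for name, t in timings.items()
}

for name, ratio in ratios.items():
    print(f"{name}: PyTorch/TensorRT time ratio = {ratio:.3f}")
```

On these numbers TensorRT is roughly 1.2x faster on the long clip and about even on the 5-second clip. Both totals are whole-program times, so they include video decoding and any result drawing or saving, not only inference.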

Appendix: a modified version of /home/ubuntu/tensorrtx/yolov5/yolov5.cpp that reads video input; the change is simple.

#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char *INPUT_BLOB_NAME = "data";
const char *OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine *
build_engine(unsigned int maxBatchSize, IBuilder *builder, IBuilderConfig *config, DataType dt, float &gd, float &gw,std::string &wts_name) {INetworkDefinition *network = builder->createNetworkV2(0U);// Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAMEITensor *data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W});assert(data);std::map<std::string, Weights> weightMap = loadWeights(wts_name);Weights emptywts{DataType::kFLOAT, nullptr, 0};/* ------ yolov5 backbone------ */auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw),get_depth(3, gd), true, 1, 0.5, "model.2");auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw),get_depth(9, gd), true, 1, 0.5, "model.4");auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw),get_depth(9, gd), true, 1, 0.5, "model.6");auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13,"model.8");/* ------ yolov5 head ------ */auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw),get_depth(3, gd), false, 1, 0.5, "model.9");auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1,"model.10");float *deval = reinterpret_cast<float *>(malloc(sizeof(float) * 
get_width(512, gw) * 2 * 2));for (int i = 0; i < get_width(512, gw) * 2 * 2; i++) {deval[i] = 1.0;}Weights deconvwts11{DataType::kFLOAT, deval, get_width(512, gw) * 2 * 2};IDeconvolutionLayer *deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), get_width(512, gw), DimsHW{2, 2},deconvwts11, emptywts);deconv11->setStrideNd(DimsHW{2, 2});deconv11->setNbGroups(get_width(512, gw));weightMap["deconv11"] = deconvwts11;ITensor *inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};auto cat12 = network->addConcatenation(inputTensors12, 2);auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw),get_depth(3, gd), false, 1, 0.5, "model.13");auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1,"model.14");Weights deconvwts15{DataType::kFLOAT, deval, get_width(256, gw) * 2 * 2};IDeconvolutionLayer *deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), get_width(256, gw), DimsHW{2, 2},deconvwts15, emptywts);deconv15->setStrideNd(DimsHW{2, 2});deconv15->setNbGroups(get_width(256, gw));ITensor *inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};auto cat16 = network->addConcatenation(inputTensors16, 2);auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw),get_depth(3, gd), false, 1, 0.5, "model.17");// yolo layer 0IConvolutionLayer *det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),DimsHW{1, 1}, weightMap["model.24.m.0.weight"],weightMap["model.24.m.0.bias"]);auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1,"model.18");ITensor *inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};auto cat19 = network->addConcatenation(inputTensors19, 2);auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw),get_depth(3, 
gd), false, 1, 0.5, "model.20");//yolo layer 1IConvolutionLayer *det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),DimsHW{1, 1}, weightMap["model.24.m.1.weight"],weightMap["model.24.m.1.bias"]);auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1,"model.21");ITensor *inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};auto cat22 = network->addConcatenation(inputTensors22, 2);auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw),get_depth(3, gd), false, 1, 0.5, "model.23");IConvolutionLayer *det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),DimsHW{1, 1}, weightMap["model.24.m.2.weight"],weightMap["model.24.m.2.bias"]);auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);network->markOutput(*yolo->getOutput(0));// Build enginebuilder->setMaxBatchSize(maxBatchSize);config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2 *calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/",
                                                                    "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine *engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto &mem : weightMap) {
        free((void *) (mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory **modelStream, float &gd, float &gw, std::string &wts_name) {
    // Create builder
    IBuilder *builder = createInferBuilder(gLogger);
    IBuilderConfig *config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine *engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext &context, cudaStream_t &stream, void **buffers, float *input, float *output,
                 int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost,
                               stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char **argv, std::string &wts, std::string &engine, float &gd, float &gw,
                std::string &img_dir, std::string &video_path) {
    if (argc < 4) return false;
    if (std::string(argv[1]) == "-s" && (argc == 5 || argc == 7)) {
        wts = std::string(argv[2]);
        engine = std::string(argv[3]);
        auto net = std::string(argv[4]);
        if (net == "s") {
            gd = 0.33;
            gw = 0.50;
        } else if (net == "m") {
            gd = 0.67;
            gw = 0.75;
        } else if (net == "l") {
            gd = 1.0;
            gw = 1.0;
        } else if (net == "x") {
            gd = 1.33;
            gw = 1.25;
        } else if (net == "c" && argc == 7) {
            gd = atof(argv[5]);
            gw = atof(argv[6]);
        } else {
            return false;
        }
    } else if (std::string(argv[1]) == "-d" && argc == 4) {
        engine = std::string(argv[2]);
        img_dir = std::string(argv[3]);
    } else if (std::string(argv[1]) == "-v" && argc == 4) {
        engine = std::string(argv[2]);
        video_path = std::string(argv[3]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char **argv) {
    cudaSetDevice(DEVICE);

    std::string wts_name = "";
    std::string engine_name = "";
    float gd = 0.0f, gw = 0.0f;
    std::string img_dir;
    std::string video_path = "";
    if (!parse_args(argc, argv, wts_name, engine_name, gd, gw, img_dir, video_path)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -s [.wts] [.engine] [s/m/l/x or c gd gw]  // serialize model to plan file" << std::endl;
        std::cerr << "./yolov5 -d [.engine] ../samples  // deserialize plan file and run inference" << std::endl;
        std::cerr << "./yolov5 -v [.engine] [.mp4]  // deserialize plan file and run inference"
                  << std::endl; //sxj731533730
        return -1;
    }

    // create a model using the API directly and serialize it to a stream
    if (!wts_name.empty()) {
        IHostMemory *modelStream{nullptr};
        APIToModel(BATCH_SIZE, &modelStream, gd, gw, wts_name);
        assert(modelStream != nullptr);
        std::ofstream p(engine_name, std::ios::binary);
        if (!p) {
            std::cerr << "could not open plan output file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char *>(modelStream->data()), modelStream->size());
        modelStream->destroy();
        return 0;
    }

    // deserialize the .engine and run inference
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << "read " << engine_name << " error!" << std::endl;
        return -1;
    }
    char *trtModelStream = nullptr;
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();

    std::vector<std::string> file_names;
    if (read_files_in_dir(img_dir.c_str(), file_names) < 0 && video_path.empty()) {
        std::cerr << "read_files_in_dir failed." << std::endl;
        return -1;
    }

    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime *runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine *engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext *context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void *buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    if (!video_path.empty()) {
        cv::Mat frame;
        std::cout << video_path << std::endl;
        cv::VideoCapture capture(video_path);
        if (!capture.isOpened()) {
            printf("could not read this video file...\n");
            return -1;
        }
        int type = static_cast<int>(capture.get(cv::CAP_PROP_FOURCC));
        cv::Size S = cv::Size((int) capture.get(cv::CAP_PROP_FRAME_WIDTH),
                              (int) capture.get(cv::CAP_PROP_FRAME_HEIGHT));
        int fps = capture.get(cv::CAP_PROP_FPS);
        printf("Video file FPS:  %d \n", fps);
        cv::VideoWriter out("/home/ubuntu/yolov5/runs/detect/tensorRTbest/20210201090237.mp4", type, fps, S, true);
        auto Tstart = std::chrono::system_clock::now();
        while (true) {
            capture >> frame;     // read the current frame
            if (frame.empty()) {  // stop when no frame is left
                break;
            }
            cv::Mat pr_img = preprocess_img(frame, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar *uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[0 * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(1);
            auto &res = batch_res[0];
            nms(res, &prob[0 * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            //std::cout << res.size() << std::endl;
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                cv::putText(frame, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                            cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
            }
            cv::imshow("demo", frame);
            out << frame;
            if (cv::waitKey(20) == 'q')  // wait 20 ms for a key press; pressing q exits the loop
                break;
        }
        auto Tend = std::chrono::system_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(Tend - Tstart).count() << "ms"
                  << std::endl;
        out.release();
        capture.release();  // release the video capture
    } else {
        int fcount = 0;
        for (int f = 0; f < (int) file_names.size(); f++) {
            fcount++;
            if (fcount < BATCH_SIZE && f + 1 != (int) file_names.size()) continue;
            for (int b = 0; b < fcount; b++) {
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                if (img.empty()) continue;
                cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
                int i = 0;
                for (int row = 0; row < INPUT_H; ++row) {
                    uchar *uc_pixel = pr_img.data + row * pr_img.step;
                    for (int col = 0; col < INPUT_W; ++col) {
                        data[b * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                        uc_pixel += 3;
                        ++i;
                    }
                }
            }

            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            }
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                //std::cout << res.size() << std::endl;
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                for (size_t j = 0; j < res.size(); j++) {
                    cv::Rect r = get_rect(img, res[j].bbox);
                    cv::rectangle(img, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                    cv::putText(img, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                                cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                }
                cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
            }
            fcount = 0;
        }
    }

    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    // Print histogram of the output distribution
    //std::cout << "\nOutput:\n\n";
    //for (unsigned int i = 0; i < OUTPUT_SIZE; i++)
    //{
    //    std::cout << prob[i] << ", ";
    //    if (i % 10 == 0) std::cout << std::endl;
    //}
    //std::cout << std::endl;

    return 0;
}
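
Both pixel loops in main do the same job: they take the interleaved 8-bit BGR image returned by preprocess_img and write it into the input buffer as three float planes (R, then G, then B), scaled to [0, 1]. Here is a minimal standalone sketch of just that conversion, without OpenCV or TensorRT. The helper name bgr_to_planar_rgb and its raw-pointer interface are assumptions made for illustration; they are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not in the original source): convert one interleaved
// 8-bit BGR image (height x width x 3, rows `step` bytes apart) into planar
// RGB floats in [0, 1], laid out as the R plane, then the G plane, then B.
std::vector<float> bgr_to_planar_rgb(const std::uint8_t *pixels, int height, int width, std::size_t step) {
    assert(pixels != nullptr && height > 0 && width > 0);
    assert(step >= static_cast<std::size_t>(width) * 3);

    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> out(3 * plane);
    std::size_t i = 0;  // index of the current pixel inside one plane
    for (int row = 0; row < height; ++row) {
        const std::uint8_t *px = pixels + row * step;
        for (int col = 0; col < width; ++col) {
            out[i] = px[2] / 255.0f;              // R comes from byte 2 of a BGR pixel
            out[i + plane] = px[1] / 255.0f;      // G
            out[i + 2 * plane] = px[0] / 255.0f;  // B comes from byte 0
            px += 3;
            ++i;
        }
    }
    return out;
}
```

In the original loops the same offsets appear as i, i + INPUT_H * INPUT_W and i + 2 * INPUT_H * INPUT_W, with an extra b * 3 * INPUT_H * INPUT_W term that selects the image's slot inside the batch.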

Occasionally I seem to hit this problem:

cuDNN fails to initialize.

Installing the following fixed it:

pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html

If one day you run into the following error:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

To fix it, first turn off automatic kernel updates (see the article referenced above), then follow the steps below:

ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo apt-get install dkms
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-5.8.0-50-generic linux-hwe-5.8-headers-5.8.0-50 linux-image-5.8.0-50-generic linux-modules-5.8.0-50-generic
  linux-modules-extra-5.8.0-50-generic
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  dctrl-tools
Suggested packages:
  debtags menu
The following NEW packages will be installed:
  dctrl-tools dkms
0 upgraded, 2 newly installed, 0 to remove and 173 not upgraded.
Need to get 128 kB of archives.
After this operation, 599 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.aliyun.com/ubuntu focal/main amd64 dctrl-tools amd64 2.24-3 [61.5 kB]
Get:2 http://mirrors.aliyun.com/ubuntu focal-updates/main amd64 dkms all 2.8.1-5ubuntu2 [66.8 kB]
Fetched 128 kB in 1s (157 kB/s)
Selecting previously unselected package dctrl-tools.
(Reading database ... 288405 files and directories currently installed.)
Preparing to unpack .../dctrl-tools_2.24-3_amd64.deb ...
Unpacking dctrl-tools (2.24-3) ...
Selecting previously unselected package dkms.
Preparing to unpack .../dkms_2.8.1-5ubuntu2_all.deb ...
Unpacking dkms (2.8.1-5ubuntu2) ...
Setting up dctrl-tools (2.24-3) ...
Setting up dkms (2.8.1-5ubuntu2) ...
Processing triggers for man-db (2.9.1-1) ...
ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo dkms install -m nvidia -v 460.67

Creating symlink /var/lib/dkms/nvidia/460.67/source ->
                 /usr/src/nvidia-460.67

DKMS: add completed.

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area...
'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.8.0-53-generic IGNORE_CC_MISMATCH='' modules.............
Signing module:
Generating a new Secure Boot signing key:
Can't load /var/lib/shim-signed/mok/.rnd into RNG
140140304328000:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:98:Filename=/var/lib/shim-signed/mok/.rnd
Generating a RSA private key
.+++++
......................................+++++
writing new private key to '/var/lib/shim-signed/mok/MOK.priv'
-----
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-drm.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-uvm.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-modeset.ko
Secure Boot not enabled on this system.
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

depmod...........

DKMS: install completed.
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
Wed Jul  7 22:35:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8    N/A /  N/A |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

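If the fix still does not take effect, choose "Advanced options" while booting Ubuntu, try each of the three kernel versions listed there, and use whichever one works.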
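
The log above suggests why the driver stopped responding: the 5.8.0-50 kernel packages were marked as no longer required, while DKMS had to build and install the modules for 5.8.0-53-generic under /lib/modules/5.8.0-53-generic/updates/dkms/. A rough way to check whether the kernel you booted has an NVIDIA module in that location is sketched below. This is only a sketch under assumptions: the function names are made up for illustration, and it assumes uncompressed .ko files as shown in the log (some systems store compressed modules such as .ko.zst instead).

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <sys/utsname.h>

// Release string of the running kernel (the same value `uname -r` prints),
// or an empty string if the uname() call fails.
std::string running_kernel_release() {
    utsname info{};
    if (uname(&info) != 0) return "";
    return info.release;
}

// Hypothetical helper: build the path DKMS reported installing to in the log,
// e.g. /lib/modules/5.8.0-53-generic/updates/dkms/nvidia.ko
std::string dkms_module_path(const std::string &release, const std::string &module) {
    assert(!release.empty() && !module.empty());
    return "/lib/modules/" + release + "/updates/dkms/" + module + ".ko";
}

// True if that module file exists for the given kernel release.
bool dkms_module_present(const std::string &release, const std::string &module) {
    struct stat st{};
    return stat(dkms_module_path(release, module).c_str(), &st) == 0;
}
```

Calling dkms_module_present(running_kernel_release(), "nvidia") after a reboot gives a quick hint: if it returns false, the running kernel probably has no DKMS-built driver yet, and the dkms install step above (or booting another kernel from the advanced options) is worth trying.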

(8) Installing DeepStream 5.1

See: 25. Comparing the acceleration performance of pt, onnx, engine, and DeepStream models with yolov5 on the Jetson Xavier NX GPU (sxj731533730, CSDN blog)
