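50. Ubuntu 18.04/20.04 + CUDA 11.1 + cuDNN (for CUDA 11.3) + TensorRT 7.2/8.6 + DeepStream 5.1 + Vulkan environment setup and YOLOv5 deployment
Basic idea: I wanted to learn how to use TensorRT, so these are my notes along the way.
Link: https://pan.baidu.com/s/1uFOktdF-bHcDDsufIqmNSA
Extraction code: k55w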
For reference, the pip install command (using the Aliyun mirror):
pip install **** -i http://mirrors.aliyun.com/pypi/simple --trusted-host mirrors.aliyun.com
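I. Install the GPU driver (the GPU is an RTX 2080)
Note: first set Secure Boot to disabled.
Do not install the driver with sudo apt-get install nvidia-*, which can leave you stuck in a login loop.
1. On Ubuntu 18.04, nouveau must be disabled before the NVIDIA driver can be installed.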
ubuntu@ubuntu:~$ sudo vim /etc/modprobe.d/blacklist.conf
Append the following two lines to the end of the file:
blacklist nouveau
options nouveau modeset=0
Regenerate the initramfs:
ubuntu@ubuntu:~$ sudo update-initramfs -u
Reboot, then verify that nouveau is disabled:
ubuntu@ubuntu:~$ lsmod | grep nouveau
If nothing is printed, nouveau has been disabled; reboot once more before continuing.
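The lsmod check can also be scripted. The following is a minimal sketch, assuming Python 3.7+ is available; the helper names `module_loaded` and `nouveau_is_loaded` are made up for illustration. It only looks at the first column of the text that `lsmod` prints.

```python
import subprocess


def module_loaded(lsmod_text: str, name: str) -> bool:
    """Return True if `name` appears as a module name (first column) in lsmod output."""
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def nouveau_is_loaded() -> bool:
    # Runs the real `lsmod` command, so this part is only meaningful on a Linux host.
    out = subprocess.run(["lsmod"], capture_output=True, text=True, check=True).stdout
    return module_loaded(out, "nouveau")
```

Calling `nouveau_is_loaded()` after the reboot should return False if the blacklist took effect.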
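2. Look up your GPU model on the NVIDIA website and download the matching driver. Site: Official GeForce Drivers | NVIDIA
The version I downloaded: NVIDIA-Linux-x86_64-460.67.run
Copy the downloaded .run file to your home directory.
3. In Ubuntu, press Ctrl+Alt+F1 to switch to a text console, then run:
ubuntu@ubuntu:~$ sudo apt-get install lightdm
ubuntu@ubuntu:~$ sudo service lightdm stop
4. Then remove the existing driver: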
ubuntu@ubuntu:~$ sudo apt-get remove nvidia-*
Make the driver .run file executable:
ubuntu@ubuntu:~$ sudo chmod a+x NVIDIA-Linux-x86_64-460.67.run
Run the installer:
ubuntu@ubuntu:~$ sudo ./NVIDIA-Linux-x86_64-460.67.run -no-x-check -no-nouveau-check -no-opengl-files
Check that the driver was installed correctly:
ubuntu@ubuntu:~$ nvidia-smi
Output like the following means the installation succeeded:
Mon Mar 22 13:00:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.67 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:01:00.0 Off | N/A |
| 22% 43C P0 53W / 250W | 0MiB / 11016MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:03:00.0 Off | N/A |
| 19% 42C P0 62W / 250W | 0MiB / 11019MiB | 1% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:82:00.0 Off | N/A |
| 0% 40C P0 1W / 250W | 0MiB / 11019MiB | 0% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
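One more setting: disable automatic kernel updates so they do not cause problems later.
To disable automatic Linux kernel updates from the command line, look at the current configuration: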
ubuntu@ubuntu:~$ cat /etc/apt/apt.conf.d/10periodic
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
Set the "Update-Package-Lists" parameter in this file to "0".
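The edit to 10periodic can be expressed as a small text transformation. This is only a sketch: the function name `disable_periodic_updates` is hypothetical, it works on the file's text rather than on the file itself, and writing the result back to /etc/apt/apt.conf.d/10periodic (which needs root) is left to the reader.

```python
import re


def disable_periodic_updates(conf_text: str) -> str:
    """Set APT::Periodic::Update-Package-Lists to "0" and leave every other line untouched."""
    pattern = re.compile(
        r'^(\s*APT::Periodic::Update-Package-Lists\s+)"\d+";', re.MULTILINE
    )
    return pattern.sub(r'\1"0";', conf_text)


# The three lines shown by `cat /etc/apt/apt.conf.d/10periodic` in this post.
original = '''APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Download-Upgradeable-Packages "0";
APT::Periodic::AutocleanInterval "0";
'''

print(disable_periodic_updates(original))
```

Applying the function twice gives the same text, so it is safe to re-run.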
II. Download the CUDA 11.1 installer and run it
ubuntu@ubuntu:~$ sudo ./cuda_11.1.0_455.23.05_linux.run
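The installer choices are:
Next: accept the EULA
Next: install CUDA 11.1 without the bundled driver
The installer then reports that it has finished: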
===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-11.1/
Samples:  Installed in /home/ubuntu/

Please make sure that
 -   PATH includes /usr/local/cuda-11.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-11.1/lib64, or, add /usr/local/cuda-11.3/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-11.3/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least .00 is required for CUDA 11.3 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

Logfile is /var/log/cuda-installer.log
Then add the following entries to the shell configuration file:
ubuntu@ubuntu:~$ sudo gedit ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.1/lib64
export PATH=$PATH:/usr/local/cuda-11.1/bin
export CUDA_HOME=/usr/local/cuda-11.1
ubuntu@ubuntu:~$ source ~/.bashrc
Check the version:
nvcc -V
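If `nvcc -V` is not found, the usual cause is that the exports did not take effect in the current shell. The sketch below, which assumes the /usr/local/cuda-11.1 install location used in this post, reports which of the three variables are still missing; `missing_cuda_entries` is a hypothetical helper name.

```python
import os


def missing_cuda_entries(env, cuda_home="/usr/local/cuda-11.1"):
    """Return the names of variables that do not yet reference the CUDA install."""
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var, directory in expected.items():
        # Split the colon-separated list and ignore empty entries and trailing slashes.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
        if directory not in entries:
            missing.append(var)
    if env.get("CUDA_HOME", "").rstrip("/") != cuda_home:
        missing.append("CUDA_HOME")
    return missing


if __name__ == "__main__":
    # An empty list means all three exports are visible to this process.
    print(missing_cuda_entries(os.environ))
```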
Install PyTorch:
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
III. Download cuDNN: https://developer.nvidia.com/rdp/cudnn-download
Extract the archive (the CUDA 11.3 build) and copy the files:
ubuntu@ubuntu:~$ tar -zxvf cudnn-11.3-linux-x64-v8.2.0.53.tgz
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
ubuntu@ubuntu:~$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
ubuntu@ubuntu:~$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
Check the result:
ubuntu@ubuntu:~$ cat /usr/local/cuda/include/cudnn.h | grep cudnn
/* cudnn : Neural Networks Library
#include "cudnn_version.h"
#include "cudnn_ops_infer.h"
#include "cudnn_ops_train.h"
#include "cudnn_adv_infer.h"
#include "cudnn_adv_train.h"
#include "cudnn_cnn_infer.h"
#include "cudnn_cnn_train.h"
#include "cudnn_backend.h"
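IV. Create the symbolic links (use the block that matches the installed cuDNN version; the first block is for 8.1.0)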
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.1.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8
or, for cuDNN 8.2.0:
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8
sudo ln -sf /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.2.0 /usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8
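The two blocks above differ only in the version suffix, so the commands can be generated instead of copied by hand. A minimal sketch, assuming the same library directory as above; `symlink_commands` is a hypothetical helper that only prints the commands and does not run them.

```python
# The seven cuDNN libraries linked in the commands above.
CUDNN_LIBS = [
    "libcudnn_adv_train",
    "libcudnn_ops_infer",
    "libcudnn_cnn_train",
    "libcudnn_adv_infer",
    "libcudnn_ops_train",
    "libcudnn_cnn_infer",
    "libcudnn",
]


def symlink_commands(version, lib_dir="/usr/local/cuda-11.1/targets/x86_64-linux/lib"):
    """Build one `ln -sf` command per library, linking .so.<major> to .so.<version>."""
    major = version.split(".")[0]
    return [
        f"sudo ln -sf {lib_dir}/{name}.so.{version} {lib_dir}/{name}.so.{major}"
        for name in CUDNN_LIBS
    ]


for cmd in symlink_commands("8.2.0"):
    print(cmd)
```

Review the printed commands before pasting them into a shell, since `ln -sf` overwrites existing links.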
V. Install TensorRT. Only one version is needed; I later upgraded my local install from TensorRT 7.2 to TensorRT 8.6.
1) Install 7.2. Download TensorRT from https://developer.nvidia.com/nvidia-tensorrt-7x-download
ubuntu@ubuntu:~$ tar xzvf TensorRT-7.2.2.3.Ubuntu-18.04.x86_64-gnu.cuda-11.1.cudnn8.0.tar.gz -C /home/ubuntu/NVIDIA_CUDA-11.1_Samples
ubuntu@ubuntu:~$ sudo vim ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib
ubuntu@ubuntu:~$ source ~/.bashrc
ubuntu@ubuntu:~$ cd NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3
ubuntu@ubuntu:~$ cd python
ubuntu@ubuntu:~$ sudo pip3 install tensorrt-7.2.2.3-cp37-none-linux_x86_64.whl
ubuntu@ubuntu:~$ cd ../uff
ubuntu@ubuntu:~$ sudo pip3 install uff-0.6.5-py2.py3-none-any.whl
ubuntu@ubuntu:~$ cd ../graphsurgeon
ubuntu@ubuntu:~$ sudo pip3 install graphsurgeon-0.4.1-py2.py3-none-any.whl
Copy the libraries:
# run from inside the TensorRT directory
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include
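If code (for example Python) later fails at run time because a shared library is missing, copy the missing .so from this directory into /usr/lib (or /usr/local/lib). Whatever turns out to be missing, copy it over from here, for example: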
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvonnxparser.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvinfer_plugin.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libnvparsers.so.7 /usr/lib/
ubuntu@ubuntu:~/tensorrt_inference/arcface/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/targets/x86_64-linux-gnu/lib/libmyelin.so.1 /usr/lib/
The final .bashrc configuration is:
export CUDA_HOME=/usr/local/cuda
export PATH=$PATH:$CUDA_HOME/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib:$LD_LIBRARY_PATH
Check the version information:
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
ubuntu@ubuntu:~$ sudo ldconfig /usr/local/cuda/lib64
ubuntu@ubuntu:~$ python3
Python 3.8.6 (default, Sep 25 2020, 09:36:53)
[GCC 10.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.cuda.is_available())
True
>>> from torch.backends import cudnn
>>> print(cudnn.is_available())
True
>>> import tensorrt
>>>
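The `grep CUDNN_MAJOR -A 2` check above reads three `#define` lines from cudnn_version.h. The same information can be pulled out in Python; this is a sketch, `cudnn_version` is a hypothetical helper, and the sample text below is an assumed excerpt that matches the v8.2.0 archive used in this post rather than a copy of the real header.

```python
import re


def cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^#define\s+{key}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{key} not found in header text")
        parts.append(int(match.group(1)))
    return tuple(parts)


# Assumed sample; on a real machine read /usr/local/cuda/include/cudnn_version.h instead.
sample_header = """#define CUDNN_MAJOR 8
#define CUDNN_MINOR 2
#define CUDNN_PATCHLEVEL 0
"""

print(cudnn_version(sample_header))
```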
2) Install 8.6
ubuntu@ubuntu:~$ axel -n 100 https://developer.nvidia.com/downloads/compute/machine-learning/tensorrt/secure/8.6.1/tars/TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ chmod 777 TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ tar -zxvf TensorRT-8.6.1.6.Linux.x86_64-gnu.cuda-11.8.tar.gz
ubuntu@ubuntu:~$ mv TensorRT-8.6.1.6 NVIDIA_CUDA-11.1_Samples/
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/python$ pip3 install tensorrt-8.6.1-cp38-none-linux_x86_64.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./tensorrt-8.6.1-cp38-none-linux_x86_64.whl
Installing collected packages: tensorrt
  Attempting uninstall: tensorrt
    Found existing installation: tensorrt 7.2.2.3
    Uninstalling tensorrt-7.2.2.3:
      Successfully uninstalled tensorrt-7.2.2.3
Successfully installed tensorrt-8.6.1
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/uff$ pip3 install uff-0.6.9-py2.py3-none-any.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./uff-0.6.9-py2.py3-none-any.whl
Requirement already satisfied: numpy>=1.11.0 in /home/ubuntu/.local/lib/python3.8/site-packages (from uff==0.6.9) (1.23.4)
Requirement already satisfied: protobuf>=3.3.0 in /home/ubuntu/.local/lib/python3.8/site-packages (from uff==0.6.9) (3.20.2)
Installing collected packages: uff
Successfully installed uff-0.6.9
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/graphsurgeon$ pip3 install graphsurgeon-0.4.6-py2.py3-none-any.whl
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing ./graphsurgeon-0.4.6-py2.py3-none-any.whl
Installing collected packages: graphsurgeon
  Attempting uninstall: graphsurgeon
    Found existing installation: graphsurgeon 0.4.5
    Uninstalling graphsurgeon-0.4.5:
      Successfully uninstalled graphsurgeon-0.4.5
Successfully installed graphsurgeon-0.4.6
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ sudo cp -r ./lib/* /usr/lib
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ sudo cp -r ./include/* /usr/include
ubuntu@ubuntu:~/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6$ python3
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorrt
>>> tensorrt.__version__
'8.6.1'
Update .bashrc accordingly:
export LD_LIBRARY_PATH=/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-8.6.1.6
Then download the code (YOLOv5).
While at it, install Vulkan as well, so that ncnn's Vulkan acceleration can be used later:
wget -qO - http://packages.lunarg.com/lunarg-signing-key-pub.asc | sudo apt-key add -
sudo wget -qO /etc/apt/sources.list.d/lunarg-vulkan-1.1.97-bionic.list http://packages.lunarg.com/vulkan/1.1.97/lunarg-vulkan-1.1.97-bionic.list
sudo apt update
sudo apt install lunarg-vulkan-sdk
sudo apt-get install cmake git gcc g++ mesa-* libwayland-dev libxrandr-dev
sudo apt-get install libvulkan1 mesa-vulkan-drivers vulkan-utils
vulkaninfo
Test result:
ubuntu@ubuntu:~/ncnn/build/benchmark$ vulkaninfo
ERROR: [Loader Message] Code 0 : libGLX_nvidia.so.0: cannot open shared object file: No such file or directory
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_radeon.so: wrong ELF class: ELFCLASS32
ERROR: [Loader Message] Code 0 : /usr/lib/i386-linux-gnu/libvulkan_intel.so: wrong ELF class: ELFCLASS32
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
==========
VULKANINFO
==========

Vulkan Instance Version: 1.2.131

Instance Extensions: count = 18
====================
    VK_EXT_acquire_xlib_display : extension revision 1
    VK_EXT_debug_report : extension revision 9
    VK_EXT_debug_utils : extension revision 1
    VK_EXT_direct_mode_display : extension revision 1
    VK_EXT_display_surface_counter : extension revision 1
    VK_KHR_device_group_creation : extension revision 1
    VK_KHR_display : extension revision 23
    VK_KHR_external_fence_capabilities : extension revision 1
    VK_KHR_external_memory_capabilities : extension revision 1
    VK_KHR_external_semaphore_capabilities : extension revision 1
    VK_KHR_get_display_properties2 : extension revision 1
    VK_KHR_get_physical_device_properties2 : extension revision 1
    VK_KHR_get_surface_capabilities2 : extension revision 1
    VK_KHR_surface : extension revision 25
    VK_KHR_surface_protected_capabilities : extension revision 1
    VK_KHR_wayland_surface : extension revision 6
    VK_KHR_xcb_surface : extension revision 6
    VK_KHR_xlib_surface : extension revision 6

Layers: count = 12
=======
....
---------------------------------
    samplerMirrorClampToEdge = true
    drawIndirectCount = true
    storageBuffer8BitAccess = true
    uniformAndStorageBuffer8BitAccess = true
    storagePushConstant8 = true
    shaderBufferInt64Atomics = true
    shaderSharedInt64Atomics = false
    shaderFloat16 = true
    shaderInt8 = true
    descriptorIndexing = true
    shaderInputAttachmentArrayDynamicIndexing = false
    shaderUniformTexelBufferArrayDynamicIndexing = true
    shaderStorageTexelBufferArrayDynamicIndexing = true
    shaderUniformBufferArrayNonUniformIndexing = false
    shaderSampledImageArrayNonUniformIndexing = true
    shaderStorageBufferArrayNonUniformIndexing = true
    shaderStorageImageArrayNonUniformIndexing = true
    shaderInputAttachmentArrayNonUniformIndexing = false
    shaderUniformTexelBufferArrayNonUniformIndexing = true
    shaderStorageTexelBufferArrayNonUniformIndexing = true
    descriptorBindingUniformBufferUpdateAfterBind = false
    descriptorBindingSampledImageUpdateAfterBind = true
    descriptorBindingStorageImageUpdateAfterBind = true
    descriptorBindingStorageBufferUpdateAfterBind = true
    descriptorBindingUniformTexelBufferUpdateAfterBind = true
    descriptorBindingStorageTexelBufferUpdateAfterBind = true
    descriptorBindingUpdateUnusedWhilePending = true
    descriptorBindingPartiallyBound = true
    descriptorBindingVariableDescriptorCount = false
    runtimeDescriptorArray = true
    samplerFilterMinmax = true
    scalarBlockLayout = true
    imagelessFramebuffer = true
    uniformBufferStandardLayout = true
    shaderSubgroupExtendedTypes = true
    separateDepthStencilLayouts = true
    hostQueryReset = true
    timelineSemaphore = true
    bufferDeviceAddress = true
    bufferDeviceAddressCaptureReplay = true
    bufferDeviceAddressMultiDevice = false
    vulkanMemoryModel = true
    vulkanMemoryModelDeviceScope = true
    vulkanMemoryModelAvailabilityVisibilityChains = true
    shaderOutputViewportIndex = true
    shaderOutputLayer = true
    subgroupBroadcastDynamicId = true

VkPhysicalDeviceVulkanMemoryModelFeatures:
------------------------------------------
    vulkanMemoryModel = true
    vulkanMemoryModelDeviceScope = true
    vulkanMemoryModelAvailabilityVisibilityChains = true

VkPhysicalDeviceYcbcrImageArraysFeaturesEXT:
--------------------------------------------
    ycbcrImageArrays = true

ubuntu@ubuntu:~/ncnn/build/benchmark$ vkcube
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
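Test result screenshot
It seems that ncnn needs Vulkan to be provided through the SDK. I downloaded vulkansdk-linux-x86_64-1.2.182.0 from the official site https://vulkan.lunarg.com/sdk/home, extracted it, and placed it under /usr/local: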
ubuntu@ubuntu:~$ sudo cp -r vulkansdk-linux-x86_64-1.2.182.0/ /usr/local/
ubuntu@ubuntu:~$ cd /usr/local/
Add the environment variables (remember to source the file afterwards):
export Vulkan_LIBRARY=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/lib
export Vulkan_INCLUDE_DIR=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/include
export Vulkan_BIN=/usr/local/vulkansdk-linux-x86_64-1.2.182.0/1.2.182.0/x86_64/bin
export PATH=$PATH:$Vulkan_LIBRARY
export PATH=$PATH:$Vulkan_INCLUDE_DIR
export PATH=$PATH:$Vulkan_BIN
(VII) Download the tensorrtx TensorRT source code shared by an expert developer
ubuntu@ubuntu:~$ git clone https://github.com/wang-xinyu/tensorrtx.git
ubuntu@ubuntu:~$ git clone https://github.com/ultralytics/yolov5.git
ubuntu@ubuntu:~$ cp tensorrtx/yolov5/gen_wts.py yolov5
Then modify the Python script as follows:
import torch
import struct
from utils.torch_utils import select_device

# Initialize
device = select_device('cpu')
# Load model
model = torch.load('/home/ubuntu/yolov5/runs/train/exp/weights/best.pt', map_location=device)['model'].float()  # load to FP32
model.to(device).eval()

f = open('/home/ubuntu/yolov5/runs/train/exp/weights/bestyolov5x.wts', 'w')
# First line: number of tensors in the state dict
f.write('{}\n'.format(len(model.state_dict().keys())))
for k, v in model.state_dict().items():
    vr = v.reshape(-1).cpu().numpy()
    # Each following line: tensor name, element count, then the values
    f.write('{} {} '.format(k, len(vr)))
    for vv in vr:
        f.write(' ')
        # Each value is written as the hex of a big-endian 32-bit float
        f.write(struct.pack('>f',float(vv)).hex())
    f.write('\n')
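To make the .wts text format written by the script concrete, here is a small round-trip sketch that needs only the standard library. The function names and the tensor name `model.0.conv.weight` are illustrative assumptions, not part of tensorrtx.

```python
import struct


def encode_wts_line(name, values):
    """Format one tensor like the script above: name, count, then big-endian float hex words."""
    line = "{} {} ".format(name, len(values))
    for v in values:
        line += " " + struct.pack(">f", float(v)).hex()
    return line


def decode_wts_line(line):
    """Parse one tensor line back into (name, list of floats)."""
    fields = line.split()
    name, count = fields[0], int(fields[1])
    values = [struct.unpack(">f", bytes.fromhex(h))[0] for h in fields[2:]]
    if len(values) != count:
        raise ValueError("declared count does not match number of values")
    return name, values


line = encode_wts_line("model.0.conv.weight", [1.0, -0.5, 0.25])
print(line)
print(decode_wts_line(line))
```

Because every value is stored as 8 hex characters, the file is several times larger than the binary checkpoint, but it is easy to parse from C++ without any PyTorch dependency.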
Fixing a problem encountered along the way (the lzma import):
ubuntu@ubuntu:~/yolov5$ sudo apt-get install liblzma-dev
Reading package lists... Done
Building dependency tree
Reading state information... Done
liblzma-dev is already the newest version (5.2.2-1.3).
0 upgraded, 0 newly installed, 0 to remove and 190 not upgraded.
ubuntu@ubuntu:~/yolov5$ sudo pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple backports.lzma
ubuntu@ubuntu:~/yolov5$ sudo vim /usr/local/lib/python3.7/lzma.py
Original code:
from _lzma import *
from _lzma import _encode_filter_properties, _decode_filter_properties
Change it to:
try:
    from _lzma import *
    from _lzma import _encode_filter_properties, _decode_filter_properties
except ImportError:
    from backports.lzma import *
    from backports.lzma import _encode_filter_properties, _decode_filter_properties
After that, the script runs successfully:
ubuntu@ubuntu:~/yolov5$ python3 gen_wts.py
ubuntu@ubuntu:~/yolov5$ ls runs/train/exp/weights/
best.pt bestyolov5x.wts last.pt
ubuntu@ubuntu:~/yolov5$ cp runs/train/exp/weights/bestyolov5x.wts ../tensorrtx/yolov5
Convert the model:
ubuntu@ubuntu:~$ mkdir tensorrtx/yolov5/build
ubuntu@ubuntu:~$ cd tensorrtx/yolov5/build
ubuntu@ubuntu:~/tensorrtx/yolov5$ cp /home/ubuntu/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/include/* .
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
/usr/bin/ld: cannot find -lnvinfer
collect2: error: ld returned 1 exit status
CMakeFiles/myplugins.dir/build.make:341: recipe for target 'libmyplugins.so' failed
make[2]: *** [libmyplugins.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/myplugins.dir/all' failed
make[1]: *** [CMakeFiles/myplugins.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ make
[ 20%] Linking CXX shared library libmyplugins.so
[ 40%] Built target myplugins
Scanning dependencies of target yolov5
[ 60%] Building CXX object CMakeFiles/yolov5.dir/calibrator.cpp.o
[ 80%] Building CXX object CMakeFiles/yolov5.dir/yolov5.cpp.o
[100%] Linking CXX executable yolov5
[100%] Built target yolov5
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libnvinfer.so.7: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libnvinfer.so.7 /usr/local/lib/
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s
./yolov5: error while loading shared libraries: libmyelin.so.1: cannot open shared object file: No such file or directory
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo cp ~/NVIDIA_CUDA-11.1_Samples/TensorRT-7.2.2.3/lib/libmyelin.so.1 /usr/local/lib/
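Instead of discovering the missing libraries one failed run at a time, the output of `ldd ./yolov5` can be scanned for "not found" entries. The sketch below is an illustration only: `missing_libraries` is a hypothetical helper and the sample text is made up to mirror the errors above. Note that `ldd` cannot list the dependencies of a library it failed to locate, so it may need to be re-run after each copy.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and line.strip().endswith("not found"):
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    # Runs the real `ldd`, so this part only works on a Linux host.
    out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
    return missing_libraries(out)


# Made-up sample in the format ldd prints.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffd)\n"
    "\tlibnvinfer.so.7 => not found\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f00)\n"
)
print(missing_libraries(sample))
```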
The model conversion fails with errors:
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
Building engine, please wait for a while...
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: kernel weights has count 6720 but 81600 was expected
[03/23/2021-09:24:48] [E] [TRT] (Unnamed Layer* 475) [Convolution]: count of 6720 weights in kernel, but kernel dimensions (1,1) with 320 input channels, 255 output channels and 1 groups were specified. Expected Weights count is 320 * 1*1 * 255 / 1 = 81600
[03/23/2021-09:24:48] [E] [TRT] Could not compute dimensions for (Unnamed Layer* 475) [Convolution]_output, because the network is not valid.
[03/23/2021-09:24:48] [E] [TRT] Network validation failed.
Build engine successfully!
yolov5: /home/ubuntu/tensorrtx/yolov5/yolov5.cpp:143: void APIToModel(unsigned int, nvinfer1::IHostMemory**, float&, float&, std::__cxx11::string&): Assertion `engine != nullptr' failed.
Aborted
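Fix: change the number of classes in the code.
In yololayer.h,
set static constexpr int CLASS_NUM = 2;
The default is 80, which does not match a model trained on 2 classes.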
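The numbers in the error message follow directly from the class count. In the yolov5.cpp listing in this post, each detection convolution has 3 * (CLASS_NUM + 5) output channels with a 1x1 kernel, so the expected weight count is in_channels * 3 * (CLASS_NUM + 5). The sketch below reproduces the arithmetic; the function names are illustrative.

```python
ANCHORS_PER_SCALE = 3  # from the 3 * (Yolo::CLASS_NUM + 5) term in yolov5.cpp


def expected_weight_count(in_channels, class_num):
    """Weights in a 1x1 detection convolution for a given class count."""
    return in_channels * 1 * 1 * ANCHORS_PER_SCALE * (class_num + 5)


def class_num_from_weights(weight_count, in_channels):
    """Invert the formula: recover the class count a weight tensor was trained with."""
    out_channels, rem = divmod(weight_count, in_channels)
    per_anchor, rem2 = divmod(out_channels, ANCHORS_PER_SCALE)
    if rem or rem2:
        raise ValueError("weight count is not consistent with a 1x1 detection conv")
    return per_anchor - 5


print(expected_weight_count(320, 80))     # what TensorRT expected with the default CLASS_NUM
print(class_num_from_weights(6720, 320))  # what the trained weights actually contain
```

With 320 input channels, the default of 80 classes gives 81600 weights, while the 6720 weights found in the file correspond to 2 classes, which is why CLASS_NUM has to be set to 2.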
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ sudo ./yolov5 -s ../bestyolov5x.wts ../bestyolov5.eigine x
Loading weights: ../bestyolov5x.wts
Building engine, please wait for a while...
[03/23/2021-09:32:55] [W] [TRT] Half2 support requested on hardware without native FP16 support, performance will be negatively affected.
[03/23/2021-09:33:03] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
[03/23/2021-09:34:10] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.3.0 but loaded cuBLAS/cuBLAS LT 11.2.1
Build engine successfully!
Compare the speed of PyTorch and TensorRT:
ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090237.avi --weights runs/train/exp/weights/best.pt --device "0"
Video length: 1 minute 11 seconds
PyTorch processing time: 166.625 s
ubuntu@ubuntu:~/yolov5$ python3 detect.py --source /home/ubuntu/Downloads/video/defect/20210201090749.avi --weights runs/train/exp/weights/best.pt --device "0"
Video length: 5 seconds
PyTorch processing time: 12.689 s
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090237.avi
Video length: 1 minute 11 seconds
TensorRT processing time: 137200 ms
ubuntu@ubuntu:~/tensorrtx/yolov5/build$ ./yolov5 -v ../bestyolov5.eigine /home/ubuntu/Downloads/video/defect/20210201090749.avi
Video length: 5 seconds
TensorRT processing time: 12832 ms
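The four measurements are easier to compare as ratios. This sketch only rearranges the numbers reported above (the `./yolov5 -v` runs are the TensorRT ones); `speedup` is an illustrative helper, and the totals are whole-run times, not pure inference times.

```python
# (video length in s, PyTorch time in s, TensorRT time in s), taken from the runs above
runs = [
    (71.0, 166.625, 137.200),
    (5.0, 12.689, 12.832),
]


def speedup(pytorch_s, tensorrt_s):
    """How many times faster the TensorRT run was (values below 1 mean slower)."""
    return pytorch_s / tensorrt_s


for video_s, pt_s, trt_s in runs:
    print(
        f"{video_s:.0f} s video: speedup {speedup(pt_s, trt_s):.2f}x, "
        f"TensorRT needs {trt_s / video_s:.2f} s per second of video"
    )
```

On these two clips the TensorRT build is about 1.2x faster on the long video and roughly on par on the short one, so neither path reaches real-time speed in this setup.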
Appendix: a modified version of /home/ubuntu/tensorrtx/yolov5/yolov5.cpp that reads a video; the change is simple.
#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char *INPUT_BLOB_NAME = "data";
const char *OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine *
build_engine(unsigned int maxBatchSize, IBuilder *builder, IBuilderConfig *config, DataType dt, float &gd, float &gw,
             std::string &wts_name) {
    INetworkDefinition *network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor *data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{3, INPUT_H, INPUT_W});
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);
    Weights emptywts{DataType::kFLOAT, nullptr, 0};

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw),
                              get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw),
                              get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13,
                    "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                              get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1,
                            "model.10");

    float *deval = reinterpret_cast<float *>(malloc(sizeof(float) * get_width(512, gw) * 2 * 2));
    for (int i = 0; i < get_width(512, gw) * 2 * 2; i++) {
        deval[i] = 1.0;
    }
    Weights deconvwts11{DataType::kFLOAT, deval, get_width(512, gw) * 2 * 2};
    IDeconvolutionLayer *deconv11 = network->addDeconvolutionNd(*conv10->getOutput(0), get_width(512, gw), DimsHW{2, 2},
                                                                deconvwts11, emptywts);
    deconv11->setStrideNd(DimsHW{2, 2});
    deconv11->setNbGroups(get_width(512, gw));
    weightMap["deconv11"] = deconvwts11;

    ITensor *inputTensors12[] = {deconv11->getOutput(0), bottleneck_csp6->getOutput(0)};
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1,
                            "model.14");

    Weights deconvwts15{DataType::kFLOAT, deval, get_width(256, gw) * 2 * 2};
    IDeconvolutionLayer *deconv15 = network->addDeconvolutionNd(*conv14->getOutput(0), get_width(256, gw), DimsHW{2, 2},
                                                                deconvwts15, emptywts);
    deconv15->setStrideNd(DimsHW{2, 2});
    deconv15->setNbGroups(get_width(256, gw));

    ITensor *inputTensors16[] = {deconv15->getOutput(0), bottleneck_csp4->getOutput(0)};
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer *det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.0.weight"],
                                                        weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1,
                            "model.18");
    ITensor *inputTensors19[] = {conv18->getOutput(0), conv14->getOutput(0)};
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.20");

    //yolo layer 1
    IConvolutionLayer *det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.1.weight"],
                                                        weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1,
                            "model.21");
    ITensor *inputTensors22[] = {conv21->getOutput(0), conv10->getOutput(0)};
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw),
                               get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer *det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5),
                                                        DimsHW{1, 1}, weightMap["model.24.m.2.weight"],
                                                        weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, det0, det1, det2);
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine *engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto &mem : weightMap) {
        free((void *) (mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory **modelStream, float &gd, float &gw, std::string &wts_name) {
    // Create builder
    IBuilder *builder = createInferBuilder(gLogger);
    IBuilderConfig *config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine *engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext &context, cudaStream_t &stream, void **buffers, float *input, float *output,
                 int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float),
                               cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost,
                               stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char **argv, std::string &wts, std::string &engine, float &gd, float &gw,
                std::string &img_dir, std::string &video_path) {
    if (argc < 4) return false;
    if (std::string(argv[1]) == "-s" && (argc == 5 || argc == 7)) {
        wts = std::string(argv[2]);
        engine = std::string(argv[3]);
        auto net = std::string(argv[4]);
        if (net == "s") {
            gd = 0.33;
            gw = 0.50;
        } else if (net == "m") {
            gd = 0.67;
            gw = 0.75;
        } else if (net == "l") {
            gd = 1.0;
            gw = 1.0;
        } else if (net == "x") {
            gd = 1.33;
            gw = 1.25;
        } else if (net == "c" && argc == 7) {
            gd = atof(argv[5]);
            gw = atof(argv[6]);
        } else {
            return false;
        }
    } else if (std::string(argv[1]) == "-d" && argc == 4) {
        engine = std::string(argv[2]);
        img_dir = std::string(argv[3]);
    } else if (std::string(argv[1]) == "-v" && argc == 4) {
        engine = std::string(argv[2]);
        video_path = std::string(argv[3]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char **argv) {
    cudaSetDevice(DEVICE);
    std::string wts_name = "";
    std::string engine_name = "";
    float gd = 0.0f, gw = 0.0f;
    std::string img_dir;
    std::string video_path = "";
    if (!parse_args(argc, argv, wts_name, engine_name, gd, gw, img_dir, video_path)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -s [.wts] [.engine] [s/m/l/x or c gd gw] // serialize model to plan file" << std::endl;
        std::cerr << "./yolov5 -d [.engine] ../samples // deserialize plan file and run inference" << std::endl;
        std::cerr << "./yolov5 -v [.engine] [.mp4] // deserialize plan file and run inference"
                  << std::endl; //sxj731533730
        return -1;
    }
    // create a model using the API directly and serialize it to a stream
    if (!wts_name.empty()) {
        IHostMemory *modelStream{nullptr};
        APIToModel(BATCH_SIZE, &modelStream, gd, gw, wts_name);
        assert(modelStream != nullptr);
        std::ofstream p(engine_name, std::ios::binary);
        if (!p) {
            std::cerr << "could not open plan output file" << std::endl;
            return -1;
        }
        p.write(reinterpret_cast<const char *>(modelStream->data()), modelStream->size());
        modelStream->destroy();
        return 0;
    }
    // deserialize the .engine and run inference
    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << "read " << engine_name << " error!" << std::endl;
        return -1;
    }
    char *trtModelStream = nullptr;
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();
    std::vector<std::string> file_names;
    if (read_files_in_dir(img_dir.c_str(), file_names) < 0 && video_path.empty()) {
        std::cerr << "read_files_in_dir failed." << std::endl;
        return -1;
    }
    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime *runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine *engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext *context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void *buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));
    if (!video_path.empty()) {
        cv::Mat frame;
        std::cout << video_path << std::endl;
        cv::VideoCapture capture(video_path);
        if (!capture.isOpened()) {
            printf("could not read this video file...\n");
            return -1;
        }
        int type = static_cast<int>(capture.get(cv::CAP_PROP_FOURCC));
        cv::Size S = cv::Size((int)capture.get(cv::CAP_PROP_FRAME_WIDTH), (int)capture.get(cv::CAP_PROP_FRAME_HEIGHT));
        int fps = capture.get(cv::CAP_PROP_FPS);
        printf("Current video file FPS: %d \n", fps);
        cv::VideoWriter out("/home/ubuntu/yolov5/runs/detect/tensorRTbest/20210201090237.mp4", type, fps, S, true);
        auto Tstart = std::chrono::system_clock::now();
        while (true) {
            capture >> frame; // read the current frame
            if (frame.empty()) { // no more frames: end of the video
                break;
            }
            cv::Mat pr_img = preprocess_img(frame, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar *uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[0 * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                    data[0 * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(1);
            auto &res = batch_res[0];
            nms(res, &prob[0 * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            //std::cout << res.size() << std::endl;
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                cv::putText(frame, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                            cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
            }
            cv::imshow("demo", frame);
            out << frame;
            if (cv::waitKey(20) == 'q') // wait 20 ms for a key press; pressing q exits the loop
                break;
        }
        auto Tend = std::chrono::system_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(Tend - Tstart).count() << "ms"
                  << std::endl;
        out.release();
        capture.release(); // release the video capture resource
    } else {
        int fcount = 0;
        for (int f = 0; f < (int) file_names.size(); f++) {
            fcount++;
            if (fcount < BATCH_SIZE && f + 1 != (int) file_names.size()) continue;
            for (int b = 0; b < fcount; b++) {
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                if (img.empty()) continue;
                cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
                int i = 0;
                for (int row = 0; row < INPUT_H; ++row) {
                    uchar *uc_pixel = pr_img.data + row * pr_img.step;
                    for (int col = 0; col < INPUT_W; ++col) {
                        data[b * 3 * INPUT_H * INPUT_W + i] = (float) uc_pixel[2] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float) uc_pixel[1] / 255.0;
                        data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float) uc_pixel[0] / 255.0;
                        uc_pixel += 3;
                        ++i;
                    }
                }
            }
            // Run inference
            auto start = std::chrono::system_clock::now();
            doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
            auto end = std::chrono::system_clock::now();
            std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms"
                      << std::endl;
            std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
            }
            for (int b = 0; b < fcount; b++) {
                auto &res = batch_res[b];
                //std::cout << res.size() << std::endl;
                cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
                for (size_t j = 0; j < res.size(); j++) {
                    cv::Rect r = get_rect(img, res[j].bbox);
                    cv::rectangle(img, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                    cv::putText(img, std::to_string((int) res[j].class_id), cv::Point(r.x, r.y - 1),
                                cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                }
                cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
            }
            fcount = 0;
        }
    }
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    // Print histogram of the output distribution
    //std::cout << "\nOutput:\n\n";
    //for (unsigned int i = 0; i < OUTPUT_SIZE; i++)
    //{
    //    std::cout << prob[i] << ", ";
    //    if (i % 10 == 0) std::cout << std::endl;
    //}
    //std::cout << std::endl;
    return 0;
}
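The two pixel loops in main() do the same job: they take the interleaved 8-bit BGR image that OpenCV produces (HWC layout) and write it into the network input buffer as three float planes in R, G, B order (CHW layout), scaled to [0, 1]. A minimal sketch of that conversion without OpenCV or TensorRT is below. The helper name bgr_hwc_to_rgb_chw and its parameters are hypothetical, not part of the original program; row_stride plays the role of cv::Mat::step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the original program): converts one
// interleaved 8-bit BGR image (HWC) into planar RGB floats (CHW) in [0, 1].
// row_stride is the number of bytes per source row (cv::Mat::step in OpenCV).
std::vector<float> bgr_hwc_to_rgb_chw(const std::uint8_t *src, int height, int width,
                                      std::size_t row_stride) {
    assert(src != nullptr && height > 0 && width > 0);
    assert(row_stride >= static_cast<std::size_t>(width) * 3);
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    std::vector<float> dst(3 * plane);
    std::size_t i = 0;  // index of the current pixel inside one plane
    for (int row = 0; row < height; ++row) {
        const std::uint8_t *px = src + static_cast<std::size_t>(row) * row_stride;
        for (int col = 0; col < width; ++col) {
            dst[i] = px[2] / 255.0f;              // R plane
            dst[i + plane] = px[1] / 255.0f;      // G plane
            dst[i + 2 * plane] = px[0] / 255.0f;  // B plane
            px += 3;
            ++i;
        }
    }
    return dst;
}
```

For a batch, the original code offsets each image by b * 3 * INPUT_H * INPUT_W in the shared data array; with this sketch the equivalent would be copying each returned vector to that offset.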
Occasionally cuDNN fails to initialize.
Installing the following packages fixed it for me:
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html
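If one day you run into the following error:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Solution: first turn off automatic kernel updates (see the article referenced above), then rebuild the driver module as shown below.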
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo apt-get install dkms
[sudo] password for ubuntu:
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-5.8.0-50-generic linux-hwe-5.8-headers-5.8.0-50 linux-image-5.8.0-50-generic linux-modules-5.8.0-50-generic
  linux-modules-extra-5.8.0-50-generic
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  dctrl-tools
Suggested packages:
  debtags menu
The following NEW packages will be installed:
  dctrl-tools dkms
0 upgraded, 2 newly installed, 0 to remove and 173 not upgraded.
Need to get 128 kB of archives.
After this operation, 599 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://mirrors.aliyun.com/ubuntu focal/main amd64 dctrl-tools amd64 2.24-3 [61.5 kB]
Get:2 http://mirrors.aliyun.com/ubuntu focal-updates/main amd64 dkms all 2.8.1-5ubuntu2 [66.8 kB]
Fetched 128 kB in 1s (157 kB/s)
Selecting previously unselected package dctrl-tools.
(Reading database ... 288405 files and directories currently installed.)
Preparing to unpack .../dctrl-tools_2.24-3_amd64.deb ...
Unpacking dctrl-tools (2.24-3) ...
Selecting previously unselected package dkms.
Preparing to unpack .../dkms_2.8.1-5ubuntu2_all.deb ...
Unpacking dkms (2.8.1-5ubuntu2) ...
Setting up dctrl-tools (2.24-3) ...
Setting up dkms (2.8.1-5ubuntu2) ...
Processing triggers for man-db (2.9.1-1) ...
ubuntu@ubuntu:~/ncnn/build/benchmark$ sudo dkms install -m nvidia -v 460.67
Creating symlink /var/lib/dkms/nvidia/460.67/source -> /usr/src/nvidia-460.67
DKMS: add completed.
Kernel preparation unnecessary for this kernel. Skipping...
Building module:
cleaning build area...
'make' -j12 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.8.0-53-generic IGNORE_CC_MISMATCH='' modules.............
Signing module:
Generating a new Secure Boot signing key:
Can't load /var/lib/shim-signed/mok/.rnd into RNG
140140304328000:error:2406F079:random number generator:RAND_load_file:Cannot open file:../crypto/rand/randfile.c:98:Filename=/var/lib/shim-signed/mok/.rnd
Generating a RSA private key
.+++++
......................................+++++
writing new private key to '/var/lib/shim-signed/mok/MOK.priv'
-----
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-drm.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-uvm.ko
 - /var/lib/dkms/nvidia/460.67/5.8.0-53-generic/x86_64/module/nvidia-modeset.ko
Secure Boot not enabled on this system.
cleaning build area...
DKMS: build completed.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.8.0-53-generic/updates/dkms/

depmod...........

DKMS: install completed.
ubuntu@ubuntu:~/ncnn/build/benchmark$ nvidia-smi
Wed Jul 7 22:35:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   52C    P8    N/A /  N/A |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
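If the driver still does not work after this change, choose Advanced options in the boot menu while Ubuntu starts, try each of the three kernel versions listed there, and use whichever one works.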
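The failure above typically appears when the running kernel has no NVIDIA module built for it, which is what the dkms install step repairs. As a rough check, the sketch below tests whether nvidia.ko exists for the running kernel. The directory layout is copied from the dkms log above (/lib/modules/<kernel>/updates/dkms/) and is an assumption on other setups, for example where modules are stored compressed. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <sys/stat.h>
#include <sys/utsname.h>

// Builds the path where DKMS installed a module for a given kernel release.
// Assumption: same layout as in the dkms log above (uncompressed .ko files).
std::string dkms_module_path(const std::string &kernel_release, const std::string &module) {
    assert(!kernel_release.empty() && !module.empty());
    return "/lib/modules/" + kernel_release + "/updates/dkms/" + module + ".ko";
}

// Returns the running kernel release (the same string as `uname -r`),
// or an empty string if uname() fails.
std::string running_kernel_release() {
    struct utsname info {};
    return uname(&info) == 0 ? std::string(info.release) : std::string();
}

// True if a DKMS-built nvidia.ko exists for the kernel that is running now.
bool nvidia_module_built_for_running_kernel() {
    const std::string release = running_kernel_release();
    if (release.empty()) return false;
    struct stat st {};
    return stat(dkms_module_path(release, "nvidia").c_str(), &st) == 0;
}
```

If nvidia_module_built_for_running_kernel() returns false after a kernel update, that is consistent with the situation in the log, where the module had to be rebuilt for 5.8.0-53-generic.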
(8) Installing DeepStream 5.1
See: 25. Jetson Xavier NX: comparing the acceleration performance of pt, onnx, engine, and DeepStream for yolov5 GPU models (sxj731533730, CSDN blog)