I. Introduction

Because YOLOv5 runs too slowly on live camera input on the Xavier, TensorRT is needed to accelerate its inference. Below I document my implementation process.

II. Environment Setup

If you have not yet set up a Python environment for YOLOv5, follow the steps below; otherwise, skip step 1 and go straight to step 2.

1. Follow the article "Jetson AGX Xavier: setting up a yolov5 virtual environment" to build a Python environment for YOLOv5, and follow "Jetson AGX Xavier: installing the Archiconda virtual-environment manager and calling opencv inside a virtual environment" to make OpenCV available in the environment. This article uses OpenCV version 3.4.3.

2. Import the TensorRT libraries into the environment, in the same way as OpenCV: copy the TensorRT-related folders under /usr/lib/python3.6/dist-packages/ into the site-packages folder of the environment you created, e.g. into /home/jetson/archiconda3/envs/yolov5env/lib/python3.6/site-packages/.
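
A quick way to confirm the copy worked is to import TensorRT from inside the environment and check where Python found it (a minimal sketch based on my paths above; adjust to yours):

# Run inside yolov5env: the copied package should be the one that gets imported
import tensorrt as trt
print(trt.__version__)
print(trt.__file__)  # should resolve under .../yolov5env/lib/python3.6/site-packages/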

3. Install pycuda in the environment. If the pip install fails, there are plenty of fixes described online.

conda activate yolov5env
pip install pycuda
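
To verify that pycuda can actually reach the GPU, a minimal smoke test (my own sketch, assuming device 0) is:

# Quick pycuda smoke test inside yolov5env
import pycuda.autoinit            # creates a CUDA context on device 0
import pycuda.driver as cuda

buf = cuda.mem_alloc(1024)        # allocate 1 KiB of device memory
print("pycuda OK on", pycuda.autoinit.device.name())
buf.free()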

III. Acceleration Steps

Taking the YOLOv5s model as the example, two versions are given below, v4.0 and v5.0; pick whichever one matches your setup.

1. Clone the repositories

① v4.0

git clone -b v4.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v4.0 https://github.com/wang-xinyu/tensorrtx.git

② v5.0

git clone -b v5.0 https://github.com/ultralytics/yolov5.git
git clone -b yolov5-v5.0 https://github.com/wang-xinyu/tensorrtx.git

2. Generate the engine file

① Download yolov5s.pt into the weights folder of the yolov5 project.

② Copy the gen_wts.py file from the tensorrtx/yolov5 folder into the yolov5 project.

③ Generate the yolov5s.wts file.

conda activate yolov5env
cd /xxx/yolov5
# choose the command for the version you cloned
# v4.0
python gen_wts.py
# v5.0
python gen_wts.py -w yolov5s.pt -o yolov5s.wts
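
gen_wts.py dumps the PyTorch weights into a plain-text .wts file that the C++ code parses when building the engine. As a sanity check (my own sketch; the first line of the file should be the number of weight blobs that follow, each on a line of the form "name count hex..."):

# Peek at the generated .wts file
with open("yolov5s.wts") as f:
    num_blobs = int(f.readline())
    name, count = f.readline().split()[:2]
print(num_blobs, "weight blobs; first blob:", name, "with", count, "values")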

④ Generate the engine file

Enter the tensorrtx/yolov5 folder.

mkdir build

Copy the yolov5s.wts file generated in the yolov5 project into the tensorrtx/yolov5/build folder, then open a terminal in the build folder:

cmake ..
make
#v4.0   sudo ./yolov5 -s [.wts] [.engine] [s/m/l/x]
#v5.0   sudo ./yolov5 -s [.wts] [.engine] [s/m/l/x/s6/m6/l6/x6 or c/c6 gd gw]
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s

This produces the yolov5s.engine file.
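
Before wiring up the camera, it is worth checking that the engine deserializes cleanly. A minimal sketch (it assumes the plugin library built by make sits at build/libmyplugins.so and the engine at build/yolov5s.engine, the same paths the scripts below use; adjust if you run it from elsewhere):

# Load the YoloLayer plugin, deserialize the engine, and list its bindings
import ctypes
import tensorrt as trt

ctypes.CDLL("build/libmyplugins.so")
logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open("build/yolov5s.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
for binding in engine:
    print(binding, engine.get_binding_shape(binding))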

IV. Running the Accelerated Detection

1. Accelerated image detection

# C++ version
sudo ./yolov5 -d yolov5s.engine ../samples
# or the Python version
conda activate yolov5env
python yolov5_trt.py

2. Accelerated real-time camera detection

Since I have never studied C++, I could only grit my teeth and modify the yolov5_trt.py script. The code style is rough, but it does achieve the speed-up; feel free to use it as a reference.

Create a new file yolo_trt_test.py in the tensorrtx project, and copy the v4.0 or v5.0 code below into it. Mind the path to yolov5s.engine and change it to match your setup.

① v4.0 code

"""
An example that uses TensorRT's Python api to make inferences.
"""
import ctypes
import os
import random
import sys
import threading
import timeimport cv2
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import torch
import torchvisionINPUT_W = 608
INPUT_H = 608
CONF_THRESH = 0.15
IOU_THRESHOLD = 0.45
int_box=[0,0,0,0]
int_box1=[0,0,0,0]
fps1=0.0
def plot_one_box(x, img, color=None, label=None, line_thickness=None):"""description: Plots one bounding box on image img,this function comes from YoLov5 project.param:x:      a box likes [x1,y1,x2,y2]img:    a opencv image objectcolor:  color to draw rectangle, such as (0,255,0)label:  strline_thickness: intreturn:no return"""tl = (line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1)  # line/font thicknesscolor = color or [random.randint(0, 255) for _ in range(3)]c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))C2 = c2cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)if label:tf = max(tl - 1, 1)  # font thicknesst_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]c2 = c1[0] + t_size[0], c1[1] + t_size[1] + 8cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filledcv2.putText(img,label,(c1[0], c1[1]+t_size[1] + 5),0,tl / 3,[255,255,255],thickness=tf,lineType=cv2.LINE_AA,)class YoLov5TRT(object):"""description: A YOLOv5 class that warps TensorRT ops, preprocess and postprocess ops."""def __init__(self, engine_file_path):# Create a Context on this device,self.cfx = cuda.Device(0).make_context()stream = cuda.Stream()TRT_LOGGER = trt.Logger(trt.Logger.INFO)runtime = trt.Runtime(TRT_LOGGER)# Deserialize the engine from filewith open(engine_file_path, "rb") as f:engine = runtime.deserialize_cuda_engine(f.read())context = engine.create_execution_context()host_inputs = []cuda_inputs = []host_outputs = []cuda_outputs = []bindings = []for binding in engine:size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_sizedtype = trt.nptype(engine.get_binding_dtype(binding))# Allocate host and device buffershost_mem = cuda.pagelocked_empty(size, dtype)cuda_mem = cuda.mem_alloc(host_mem.nbytes)# Append the device buffer to device bindings.bindings.append(int(cuda_mem))# Append to the appropriate list.if engine.binding_is_input(binding):host_inputs.append(host_mem)cuda_inputs.append(cuda_mem)else:host_outputs.append(host_mem)cuda_outputs.append(cuda_mem)# Storeself.stream = streamself.context = contextself.engine = engineself.host_inputs = host_inputsself.cuda_inputs = cuda_inputsself.host_outputs = host_outputsself.cuda_outputs = cuda_outputsself.bindings = bindingsdef infer(self, input_image_path):global int_box,int_box1,fps1# threading.Thread.__init__(self)# Make self the active context, pushing it on top of the context stack.self.cfx.push()# Restorestream = self.streamcontext = self.contextengine = self.enginehost_inputs = self.host_inputscuda_inputs = self.cuda_inputshost_outputs = self.host_outputscuda_outputs = self.cuda_outputsbindings = self.bindings# Do image preprocessinput_image, image_raw, origin_h, origin_w = self.preprocess_image(input_image_path)# Copy input image to host buffernp.copyto(host_inputs[0], input_image.ravel())# Transfer input data  to the GPU.cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)# Run inference.context.execute_async(bindings=bindings, stream_handle=stream.handle)# Transfer predictions back from the GPU.cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)# Synchronize the streamstream.synchronize()# Remove any context from the top of the context stack, deactivating it.self.cfx.pop()# Here we use the first row of output in that batch_size = 1output = host_outputs[0]# Do postprocessresult_boxes, result_scores, result_classid = self.post_process(output, origin_h, origin_w)# Draw rectangles and labels on the original imagefor i in range(len(result_boxes)):box1 = 
result_boxes[i]plot_one_box(box1,image_raw,label="{}:{:.2f}".format(categories[int(result_classid[i])], result_scores[i]),)return image_raw# parent, filename = os.path.split(input_image_path)# save_name = os.path.join(parent, "output_" + filename)# #  Save image# cv2.imwrite(save_name, image_raw)def destroy(self):# Remove any context from the top of the context stack, deactivating it.self.cfx.pop()def preprocess_image(self, input_image_path):"""description: Read an image from image path, convert it to RGB,resize and pad it to target size, normalize to [0,1],transform to NCHW format.param:input_image_path: str, image pathreturn:image:  the processed imageimage_raw: the original imageh: original heightw: original width"""image_raw = input_image_pathh, w, c = image_raw.shapeimage = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)# Calculate widht and height and paddingsr_w = INPUT_W / wr_h = INPUT_H / hif r_h > r_w:tw = INPUT_Wth = int(r_w * h)tx1 = tx2 = 0ty1 = int((INPUT_H - th) / 2)ty2 = INPUT_H - th - ty1else:tw = int(r_h * w)th = INPUT_Htx1 = int((INPUT_W - tw) / 2)tx2 = INPUT_W - tw - tx1ty1 = ty2 = 0# Resize the image with long side while maintaining ratioimage = cv2.resize(image, (tw, th))# Pad the short side with (128,128,128)image = cv2.copyMakeBorder(image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, (128, 128, 128))image = image.astype(np.float32)# Normalize to [0,1]image /= 255.0# HWC to CHW format:image = np.transpose(image, [2, 0, 1])# CHW to NCHW formatimage = np.expand_dims(image, axis=0)# Convert the image to row-major order, also known as "C order":image = np.ascontiguousarray(image)return image, image_raw, h, wdef xywh2xyxy(self, origin_h, origin_w, x):"""description:    Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-rightparam:origin_h:   height of original imageorigin_w:   width of original imagex:          A boxes tensor, each row is a box [center_x, center_y, w, h]return:y:          A boxes tensor, each row is a box [x1, y1, x2, y2]"""y = torch.zeros_like(x) if isinstance(x, torch.Tensor) else np.zeros_like(x)r_w = INPUT_W / origin_wr_h = INPUT_H / origin_hif r_h > r_w:y[:, 0] = x[:, 0] - x[:, 2] / 2y[:, 2] = x[:, 0] + x[:, 2] / 2y[:, 1] = x[:, 1] - x[:, 3] / 2 - (INPUT_H - r_w * origin_h) / 2y[:, 3] = x[:, 1] + x[:, 3] / 2 - (INPUT_H - r_w * origin_h) / 2y /= r_welse:y[:, 0] = x[:, 0] - x[:, 2] / 2 - (INPUT_W - r_h * origin_w) / 2y[:, 2] = x[:, 0] + x[:, 2] / 2 - (INPUT_W - r_h * origin_w) / 2y[:, 1] = x[:, 1] - x[:, 3] / 2y[:, 3] = x[:, 1] + x[:, 3] / 2y /= r_hreturn ydef post_process(self, output, origin_h, origin_w):"""description: postprocess the predictionparam:output:     A tensor likes [num_boxes,cx,cy,w,h,conf,cls_id, cx,cy,w,h,conf,cls_id, ...]origin_h:   height of original imageorigin_w:   width of original imagereturn:result_boxes: finally boxes, a boxes tensor, each row is a box [x1, y1, x2, y2]result_scores: finally scores, a tensor, each element is the score correspoing to boxresult_classid: finally classid, a tensor, each element is the classid correspoing to box"""# Get the num of boxes detectednum = int(output[0])# Reshape to a two dimentional ndarraypred = np.reshape(output[1:], (-1, 6))[:num, :]# to a torch Tensorpred = torch.Tensor(pred).cuda()# Get the boxesboxes = pred[:, :4]# Get the scoresscores = pred[:, 4]# Get the classidclassid = pred[:, 5]# Choose those boxes that score > CONF_THRESHsi = scores > CONF_THRESHboxes = boxes[si, :]scores = scores[si]classid = classid[si]# Trandform bbox from [center_x, center_y, 
w, h] to [x1, y1, x2, y2]boxes = self.xywh2xyxy(origin_h, origin_w, boxes)# Do nmsindices = torchvision.ops.nms(boxes, scores, iou_threshold=IOU_THRESHOLD).cpu()result_boxes = boxes[indices, :].cpu()result_scores = scores[indices].cpu()result_classid = classid[indices].cpu()return result_boxes, result_scores, result_classidclass myThread(threading.Thread):def __init__(self, func, args):threading.Thread.__init__(self)self.func = funcself.args = argsdef run(self):self.func(*self.args)if __name__ == "__main__":# load custom pluginsPLUGIN_LIBRARY = "build/libmyplugins.so"ctypes.CDLL(PLUGIN_LIBRARY)engine_file_path = "yolov5s.engine"# load coco labelscategories = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light","fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow","elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee","skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard","tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple","sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch","potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone","microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear","hair drier", "toothbrush"]# a  YoLov5TRT instanceyolov5_wrapper = YoLov5TRT(engine_file_path)cap = cv2.VideoCapture(0)while 1:_,image =cap.read()img=yolov5_wrapper.infer(image)cv2.imshow("result", img)if cv2.waitKey(1) & 0XFF == ord('q'):  # 1 millisecondbreakcap.release()cv2.destroyAllWindows()yolov5_wrapper.destroy()
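
To test the script against a video file instead of the live camera, the only change needed is the capture source (a hypothetical variation, not part of my original script):

cap = cv2.VideoCapture("test.mp4")  # any video file path instead of camera index 0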

② v5.0 code

"""
An example that uses TensorRT's Python api to make inferences.
"""
import ctypes
import os
import shutil
import random
import sys
import threading
import time
import cv2
import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt
import torch
import torchvision
import argparseCONF_THRESH = 0.5
IOU_THRESHOLD = 0.4def get_img_path_batches(batch_size, img_dir):ret = []batch = []for root, dirs, files in os.walk(img_dir):for name in files:if len(batch) == batch_size:ret.append(batch)batch = []batch.append(os.path.join(root, name))if len(batch) > 0:ret.append(batch)return retdef plot_one_box(x, img, color=None, label=None, line_thickness=None):"""description: Plots one bounding box on image img,this function comes from YoLov5 project.param: x:      a box likes [x1,y1,x2,y2]img:    a opencv image objectcolor:  color to draw rectangle, such as (0,255,0)label:  strline_thickness: intreturn:no return"""tl = (line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1)  # line/font thicknesscolor = color or [random.randint(0, 255) for _ in range(3)]c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)if label:tf = max(tl - 1, 1)  # font thicknesst_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filledcv2.putText(img,label,(c1[0], c1[1] - 2),0,tl / 3,[225, 255, 255],thickness=tf,lineType=cv2.LINE_AA,)class YoLov5TRT(object):"""description: A YOLOv5 class that warps TensorRT ops, preprocess and postprocess ops."""def __init__(self, engine_file_path):# Create a Context on this device,self.ctx = cuda.Device(0).make_context()stream = cuda.Stream()TRT_LOGGER = trt.Logger(trt.Logger.INFO)runtime = trt.Runtime(TRT_LOGGER)# Deserialize the engine from filewith open(engine_file_path, "rb") as f:engine = runtime.deserialize_cuda_engine(f.read())context = engine.create_execution_context()host_inputs = []cuda_inputs = []host_outputs = []cuda_outputs = []bindings = []for binding in engine:print('bingding:', binding, engine.get_binding_shape(binding))size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_sizedtype = trt.nptype(engine.get_binding_dtype(binding))# Allocate host and device buffershost_mem = cuda.pagelocked_empty(size, dtype)cuda_mem = cuda.mem_alloc(host_mem.nbytes)# Append the device buffer to device bindings.bindings.append(int(cuda_mem))# Append to the appropriate list.if engine.binding_is_input(binding):self.input_w = engine.get_binding_shape(binding)[-1]self.input_h = engine.get_binding_shape(binding)[-2]host_inputs.append(host_mem)cuda_inputs.append(cuda_mem)else:host_outputs.append(host_mem)cuda_outputs.append(cuda_mem)# Storeself.stream = streamself.context = contextself.engine = engineself.host_inputs = host_inputsself.cuda_inputs = cuda_inputsself.host_outputs = host_outputsself.cuda_outputs = cuda_outputsself.bindings = bindingsself.batch_size = engine.max_batch_sizedef infer(self, input_image_path):threading.Thread.__init__(self)# Make self the active context, pushing it on top of the context stack.self.ctx.push()self.input_image_path = input_image_path# Restorestream = self.streamcontext = self.contextengine = self.enginehost_inputs = self.host_inputscuda_inputs = self.cuda_inputshost_outputs = self.host_outputscuda_outputs = self.cuda_outputsbindings = self.bindings# Do image preprocessbatch_image_raw = []batch_origin_h = []batch_origin_w = []batch_input_image = np.empty(shape=[self.batch_size, 3, self.input_h, self.input_w])input_image, image_raw, origin_h, origin_w = self.preprocess_image(input_image_path)batch_origin_h.append(origin_h)batch_origin_w.append(origin_w)np.copyto(batch_input_image, input_image)batch_input_image = 
np.ascontiguousarray(batch_input_image)# Copy input image to host buffernp.copyto(host_inputs[0], batch_input_image.ravel())start = time.time()# Transfer input data  to the GPU.cuda.memcpy_htod_async(cuda_inputs[0], host_inputs[0], stream)# Run inference.context.execute_async(batch_size=self.batch_size, bindings=bindings, stream_handle=stream.handle)# Transfer predictions back from the GPU.cuda.memcpy_dtoh_async(host_outputs[0], cuda_outputs[0], stream)# Synchronize the streamstream.synchronize()end = time.time()# Remove any context from the top of the context stack, deactivating it.self.ctx.pop()# Here we use the first row of output in that batch_size = 1output = host_outputs[0]# Do postprocessresult_boxes, result_scores, result_classid = self.post_process(output, origin_h, origin_w)# Draw rectangles and labels on the original imagefor j in range(len(result_boxes)):box = result_boxes[j]plot_one_box(box,image_raw,label="{}:{:.2f}".format(categories[int(result_classid[j])], result_scores[j]),)return image_raw, end - startdef destroy(self):# Remove any context from the top of the context stack, deactivating it.self.ctx.pop()def get_raw_image(self, image_path_batch):"""description: Read an image from image path"""for img_path in image_path_batch:yield cv2.imread(img_path)def get_raw_image_zeros(self, image_path_batch=None):"""description: Ready data for warmup"""for _ in range(self.batch_size):yield np.zeros([self.input_h, self.input_w, 3], dtype=np.uint8)def preprocess_image(self, input_image_path):"""description: Convert BGR image to RGB,resize and pad it to target size, normalize to [0,1],transform to NCHW format.param:input_image_path: str, image pathreturn:image:  the processed imageimage_raw: the original imageh: original heightw: original width"""image_raw = input_image_pathh, w, c = image_raw.shapeimage = cv2.cvtColor(image_raw, cv2.COLOR_BGR2RGB)# Calculate widht and height and paddingsr_w = self.input_w / wr_h = self.input_h / hif r_h > r_w:tw = self.input_wth = int(r_w * h)tx1 = tx2 = 0ty1 = int((self.input_h - th) / 2)ty2 = self.input_h - th - ty1else:tw = int(r_h * w)th = self.input_htx1 = int((self.input_w - tw) / 2)tx2 = self.input_w - tw - tx1ty1 = ty2 = 0# Resize the image with long side while maintaining ratioimage = cv2.resize(image, (tw, th))# Pad the short side with (128,128,128)image = cv2.copyMakeBorder(image, ty1, ty2, tx1, tx2, cv2.BORDER_CONSTANT, (128, 128, 128))image = image.astype(np.float32)# Normalize to [0,1]image /= 255.0# HWC to CHW format:image = np.transpose(image, [2, 0, 1])# CHW to NCHW formatimage = np.expand_dims(image, axis=0)# Convert the image to row-major order, also known as "C order":image = np.ascontiguousarray(image)return image, image_raw, h, wdef xywh2xyxy(self, origin_h, origin_w, x):"""description:    Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-rightparam:origin_h:   height of original imageorigin_w:   width of original imagex:          A boxes tensor, each row is a box [center_x, center_y, w, h]return:y:          A boxes tensor, each row is a box [x1, y1, x2, y2]"""y = torch.zeros_like(x) if isinstance(x, torch.Tensor) else np.zeros_like(x)r_w = self.input_w / origin_wr_h = self.input_h / origin_hif r_h > r_w:y[:, 0] = x[:, 0] - x[:, 2] / 2y[:, 2] = x[:, 0] + x[:, 2] / 2y[:, 1] = x[:, 1] - x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2y[:, 3] = x[:, 1] + x[:, 3] / 2 - (self.input_h - r_w * origin_h) / 2y /= r_welse:y[:, 0] = x[:, 0] - x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2y[:, 2] = 
x[:, 0] + x[:, 2] / 2 - (self.input_w - r_h * origin_w) / 2y[:, 1] = x[:, 1] - x[:, 3] / 2y[:, 3] = x[:, 1] + x[:, 3] / 2y /= r_hreturn ydef post_process(self, output, origin_h, origin_w):"""description: postprocess the predictionparam:output:     A tensor likes [num_boxes,cx,cy,w,h,conf,cls_id, cx,cy,w,h,conf,cls_id, ...] origin_h:   height of original imageorigin_w:   width of original imagereturn:result_boxes: finally boxes, a boxes tensor, each row is a box [x1, y1, x2, y2]result_scores: finally scores, a tensor, each element is the score correspoing to boxresult_classid: finally classid, a tensor, each element is the classid correspoing to box"""# Get the num of boxes detectednum = int(output[0])# Reshape to a two dimentional ndarraypred = np.reshape(output[1:], (-1, 6))[:num, :]# to a torch Tensorpred = torch.Tensor(pred).cuda()# Get the boxesboxes = pred[:, :4]# Get the scoresscores = pred[:, 4]# Get the classidclassid = pred[:, 5]# Choose those boxes that score > CONF_THRESHsi = scores > CONF_THRESHboxes = boxes[si, :]scores = scores[si]classid = classid[si]# Trandform bbox from [center_x, center_y, w, h] to [x1, y1, x2, y2]boxes = self.xywh2xyxy(origin_h, origin_w, boxes)# Do nmsindices = torchvision.ops.nms(boxes, scores, iou_threshold=IOU_THRESHOLD).cpu()result_boxes = boxes[indices, :].cpu()result_scores = scores[indices].cpu()result_classid = classid[indices].cpu()return result_boxes, result_scores, result_classidclass inferThread(threading.Thread):def __init__(self, yolov5_wrapper):threading.Thread.__init__(self)self.yolov5_wrapper = yolov5_wrapperdef infer(self , frame):batch_image_raw, use_time = self.yolov5_wrapper.infer(frame)# for i, img_path in enumerate(self.image_path_batch):#     parent, filename = os.path.split(img_path)#     save_name = os.path.join('output', filename)#     # Save image#     cv2.imwrite(save_name, batch_image_raw[i])# print('input->{}, time->{:.2f}ms, saving into output/'.format(self.image_path_batch, use_time * 1000))return batch_image_raw,use_timeclass warmUpThread(threading.Thread):def __init__(self, yolov5_wrapper):threading.Thread.__init__(self)self.yolov5_wrapper = yolov5_wrapperdef run(self):batch_image_raw, use_time = self.yolov5_wrapper.infer(self.yolov5_wrapper.get_raw_image_zeros())print('warm_up->{}, time->{:.2f}ms'.format(batch_image_raw[0].shape, use_time * 1000))if __name__ == "__main__":# load custom pluginsparser = argparse.ArgumentParser()parser.add_argument('--engine', nargs='+', type=str, default="build/yolov5s.engine", help='.engine path(s)')parser.add_argument('--save', type=int, default=0, help='save?')opt = parser.parse_args()PLUGIN_LIBRARY = "build/libmyplugins.so"engine_file_path = opt.enginectypes.CDLL(PLUGIN_LIBRARY)# load coco labelscategories = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light","fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow","elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee","skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard","tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple","sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch","potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone","microwave", "oven", "toaster", "sink", "refrigerator", "book", 
"clock", "vase", "scissors", "teddy bear","hair drier", "toothbrush"]# a YoLov5TRT instanceyolov5_wrapper = YoLov5TRT(engine_file_path)cap = cv2.VideoCapture(0)try:thread1 = inferThread(yolov5_wrapper)thread1.start()thread1.join()while 1:_,frame = cap.read()img,t=thread1.infer(frame)cv2.imshow("result", img)if cv2.waitKey(1) & 0XFF == ord('q'):  # 1 millisecondbreakfinally:# destroy the instancecap.release()cv2.destroyAllWindows()yolov5_wrapper.destroy()

Finally, run the yolo_trt_test.py script in the yolov5env environment (the v5.0 script also accepts an --engine argument if your engine file lives somewhere else):

conda activate yolov5env
python yolo_trt_test.py

3. Results

[Figure: detection results (a) without acceleration, (b) with TensorRT acceleration]

V. Summary

TensorRT acceleration matters a great deal for deploying deep-learning models on mobile and embedded devices: it makes such algorithms usable on devices whose compute is otherwise too limited to deploy them at all, or whose unaccelerated performance is too poor. Personally, I recommend the v5.0 route, since it supports accelerating the newer YOLOv5 model variants. That wraps up my process of accelerating yolov5 with TensorRT; feel free to ask me if you run into problems, and likes and follows are appreciated. Over one mountain lies another; see you at the next peak.

VI. References

https://github.com/wang-xinyu/tensorrtx
