Rewriting YOLOv5's detect.py so other programs can call it, with low-latency (<0.5 s) live-stream inference


YOLOv5's code is highly modular and friendly to beginners, but if you want to build on top of it and call some of its functions directly, it still takes some work.

References: https://www.pythonheidong.com/blog/article/851830/44a42d351037d307d02d/
and https://blog.csdn.net/ld_long/article/details/113920521 (this link no longer works, for reasons unknown).

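What this implements:
t=detectapi(weights)
results,names=t.detect(source)
The weights parameter is the path to the weights file. The source parameter is a list whose elements are images read with cv2. The return value results is a list with one element per image in source, and each element is the processing result for that image. Each result has two parts. The first is a cv2 image: a copy of the original with boxes drawn around the detected objects. The second is a list with one entry per object detected in that image, and each entry holds that object's information: (index of the object's class in names, [object position x1, y1, x2, y2], confidence). The return value names maps class indices to class names.
Usage example: open the camera and detect target objects in real time.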

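import cv2
import detect

cap = cv2.VideoCapture(0)
a = detect.detectapi(weights='weights/yolov5s.pt')
while True:
    rec, img = cap.read()
    if not rec:  # no frame was read (camera unavailable or disconnected)
        break
    result, names = a.detect([img])
    img = result[0][0]  # annotated image for the first (and only) input image
    '''
    for cls, (x1, y1, x2, y2), conf in result[0][1]:  # labels for the first image
        print(cls, x1, y1, x2, y2, conf)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0))
        cv2.putText(img, names[cls], (x1, y1 - 20), cv2.FONT_HERSHEY_DUPLEX, 1.5, (255, 0, 0))
    '''
    cv2.imshow("video", img)
    if cv2.waitKey(1) == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()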
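
To make the return structure concrete, here is a minimal sketch that walks a results list of the shape described above and counts detections per class. The function name count_by_class and the sample values are made up for illustration. No model is run, and the annotated image is replaced by None.

```python
from collections import Counter


def count_by_class(results, names):
    """Count detections per class name across all images.

    results: list of (annotated_image, labels) pairs, where labels is a list
             of (class_index, [x1, y1, x2, y2], confidence) tuples.
    names:   class names indexed by class index.
    """
    counts = Counter()
    for _annotated, labels in results:
        for cls, _box, _conf in labels:
            counts[names[cls]] += 1
    return dict(counts)


# Hand-written stand-in for what detect() returns (hypothetical values).
sample_names = ['person', 'bicycle', 'car']
sample_results = [
    (None, [(0, [10, 20, 110, 220], 0.91), (2, [200, 50, 400, 180], 0.78)]),
    (None, [(0, [5, 5, 50, 90], 0.66)]),
]
summary = count_by_class(sample_results, sample_names)
print(summary)
```

With real output, the same loop can be applied directly to the results and names returned by t.detect(source).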

Next, add the following to detect.py. Do not delete the original code.

# Add run parameters. Originally they came from the command-line parser object; here the
# caller supplies them in code, so we need a parameter object that does roughly the same job.
# What I want: pass a list of images read with cv2 to the API and get back a list of annotated
# images, plus each image's class indices, box positions and confidences, and the class-name table.
# That needs two inputs, the weights file and the input images. Every other parameter simply
# keeps the default value of the original command-line arguments.
# Note: MyLoadImages (added to utils/datasets.py later in this post) must also be imported
# in detect.py, next to the existing LoadImages import.
class simulation_opt:  # parameter object
    def __init__(self, weights, img_size=640, conf_thres=0.25, iou_thres=0.45, device='', view_img=False,
                 classes=None, agnostic_nms=False, augment=False, update=False, exist_ok=False):
        self.weights = weights
        self.source = None
        self.img_size = img_size
        self.conf_thres = conf_thres
        self.iou_thres = iou_thres
        self.device = device
        self.view_img = view_img
        self.classes = classes
        self.agnostic_nms = agnostic_nms
        self.augment = augment
        self.update = update
        self.exist_ok = exist_ok


# Add a new class, a trimmed-down version of the original detect function. You can copy the
# original detect function's code first and then start modifying it.
class detectapi:
    def __init__(self, weights, img_size=640):
        # The constructor does the necessary preparation: initialize parameters, load the model.
        ''' Removed
        source, weights, view_img, save_txt, imgsz = opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
        webcam = source.isnumeric() or source.endswith('.txt') or source.lower().startswith(('rtsp://', 'rtmp://', 'http://'))
        '''
        # Changed to
        self.opt = simulation_opt(weights=weights, img_size=img_size)
        weights, imgsz = self.opt.weights, self.opt.img_size

        ''' Removed
        # Directories
        #save_dir = Path(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))  # increment run
        #(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir
        '''
        # Initialize
        set_logging()
        self.device = select_device(self.opt.device)
        self.half = self.device.type != 'cpu'  # half precision only supported on CUDA

        # Load model
        self.model = attempt_load(weights, map_location=self.device)  # load FP32 model
        self.stride = int(self.model.stride.max())  # model stride
        self.imgsz = check_img_size(imgsz, s=self.stride)  # check img_size
        if self.half:
            self.model.half()  # to FP16

        # Second-stage classifier
        self.classify = False
        if self.classify:
            self.modelc = load_classifier(name='resnet101', n=2)  # initialize
            self.modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=self.device)['model']).to(self.device).eval()

        # self.names and self.colors were moved here from later in the original code.
        # names is the class-name table; colors holds the colors used when drawing boxes.
        # read names and colors
        self.names = self.model.module.names if hasattr(self.model, 'module') else self.model.names
        self.colors = [[random.randint(0, 255) for _ in range(3)] for _ in self.names]

    def detect(self, source):  # call this function to run inference
        if type(source) != list:
            raise TypeError('source must be a list of images read by cv2')
        ''' Removed
        if webcam:
            view_img = check_imshow()
            cudnn.benchmark = True  # set True to speed up constant image size inference
            dataset = LoadStreams(source, img_size=imgsz, stride=stride)
        else:
            save_img = True
            dataset = LoadImages(source, img_size=imgsz, stride=stride)
        '''
        # Changed to
        # Set Dataloader
        dataset = MyLoadImages(source, img_size=self.imgsz, stride=self.stride)
        # The original loaded the dataset from a path. Now source already holds the loaded images,
        # so the dataset object has to be reimplemented. The modified code is given later in this
        # post; it goes into utils/datasets.py.

        ''' Moved to the end of the constructor. names is the class-name table; colors holds the box colors.
        names = model.module.names if hasattr(model, 'module') else model.names
        colors = [[random.randint(0, 255) for _ in range(3)] for _ in names]
        '''
        # Run inference
        if self.device.type != 'cpu':
            self.model(torch.zeros(1, 3, self.imgsz, self.imgsz).to(self.device).type_as(next(self.model.parameters())))  # run once
        result = []
        ''' Removed
        for path, img, im0s, vid_cap in dataset:
        Nothing is saved, so path is not needed; no video is processed, so vid_cap is not needed.
        '''
        # Changed to
        for img, im0s in dataset:
            img = torch.from_numpy(img).to(self.device)
            img = img.half() if self.half else img.float()  # uint8 to fp16/32
            img /= 255.0  # 0 - 255 to 0.0 - 1.0
            if img.ndimension() == 3:
                img = img.unsqueeze(0)

            # Inference
            # t1 = time_synchronized()  # only times the prediction; optional
            pred = self.model(img, augment=self.opt.augment)[0]

            # Apply NMS
            pred = non_max_suppression(pred, self.opt.conf_thres, self.opt.iou_thres, classes=self.opt.classes, agnostic=self.opt.agnostic_nms)
            # t2 = time_synchronized()  # only times the prediction; optional

            # Apply Classifier
            if self.classify:
                pred = apply_classifier(pred, self.modelc, img, im0s)

            ''' Removed
            for i, det in enumerate(pred):  # detections per image
                if webcam:  # batch_size >= 1
                    p, s, im0, frame = path[i], '%g: ' % i, im0s[i].copy(), dataset.count
                else:
                    p, s, im0, frame = path, '', im0s, getattr(dataset, 'frame', 0)
                p = Path(p)  # to Path
                save_path = str(save_dir / p.name)  # img.jpg
                txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # img.txt
                s += '%gx%g ' % img.shape[2:]  # print string
                gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
                if len(det):
                    # Rescale boxes from img_size to im0 size
                    det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()
                    # Print results
                    for c in det[:, -1].unique():
                        n = (det[:, -1] == c).sum()  # detections per class
                        s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string
                    # Write results
                    for *xyxy, conf, cls in reversed(det):
                        if save_txt:  # Write to file
                            xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                            line = (cls, *xywh, conf) if opt.save_conf else (cls, *xywh)  # label format
                            with open(txt_path + '.txt', 'a') as f:
                                f.write(('%g ' * len(line)).rstrip() % line + '\n')
                        if save_img or view_img:  # Add bbox to image
                            label = f'{names[int(cls)]} {conf:.2f}'
                            plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)
            '''
            # Changed to
            # Process detections
            # The original had to save images, so it carried a lot of save-path handling.
            # Also, pred is really a list with batch_size elements. This API processes one image
            # at a time, so pred has a single element; take it directly instead of looping.
            det = pred[0]
            # im0s is the original image and shares memory with the image that was passed in,
            # so work on a copy; otherwise the caller's image would be modified.
            im0 = im0s.copy()
            # s += '%gx%g ' % img.shape[2:]  # print string
            # gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            # One image may contain several detectable targets, so there may be several labels.
            # result_txt grows by one for each detected object. Each element records the object's
            # class index, its position in the image, and its confidence.
            result_txt = []
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                '''
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += f"{n} {self.names[int(c)]}{'s' * (n > 1)}, "  # add to string
                '''
                # Write results
                for *xyxy, conf, cls in reversed(det):
                    # xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                    line = (int(cls.item()), [int(_.item()) for _ in xyxy], conf.item())  # label format
                    result_txt.append(line)
                    label = f'{self.names[int(cls)]} {conf:.2f}'
                    plot_one_box(xyxy, im0, label=label, color=self.colors[int(cls)], line_thickness=3)
            # For each image, return the annotated image and that image's label list.
            result.append((im0, result_txt))
        return result, self.names

Next, modify yolov5/utils/datasets.py (the file that defines LoadImages). Simply add the code below to it; nothing else in the file needs to change.

class MyLoadImages:  # for inference
    def __init__(self, path, img_size=640, stride=32):
        # Here path is a list of images already read by cv2, not a file path.
        for img in path:
            if type(img) != np.ndarray or len(img.shape) != 3:
                raise TypeError('source contains an object that is not an image read by cv2')
        '''
        p = str(Path(path).absolute())  # os-agnostic absolute path
        if '*' in p:
            files = sorted(glob.glob(p, recursive=True))  # glob
        elif os.path.isdir(p):
            files = sorted(glob.glob(os.path.join(p, '*.*')))  # dir
        elif os.path.isfile(p):
            files = [p]  # files
        else:
            raise Exception(f'ERROR: {p} does not exist')
        images = [x for x in files if x.split('.')[-1].lower() in img_formats]
        videos = [x for x in files if x.split('.')[-1].lower() in vid_formats]
        ni, nv = len(images), len(videos)
        '''
        self.img_size = img_size
        self.stride = stride
        self.files = path
        self.nf = len(path)
        # self.video_flag = [False] * ni + [True] * nv
        self.mode = 'image'
        # if any(videos):
        #     self.new_video(videos[0])  # new video
        # else:
        #     self.cap = None
        # assert self.nf > 0, f'No images or videos found in {p}. ' \
        #                     f'Supported formats are:\nimages: {img_formats}\nvideos: {vid_formats}'

    def __iter__(self):
        self.count = 0
        return self

    def __next__(self):
        if self.count == self.nf:
            raise StopIteration
        path = self.files[self.count]  # the original image itself (BGR array)
        '''
        if self.video_flag[self.count]:
            # Read video
            self.mode = 'video'
            ret_val, img0 = self.cap.read()
            if not ret_val:
                self.count += 1
                self.cap.release()
                if self.count == self.nf:  # last video
                    raise StopIteration
                else:
                    path = self.files[self.count]
                    self.new_video(path)
                    ret_val, img0 = self.cap.read()
            self.frame += 1
            print(f'video {self.count + 1}/{self.nf} ({self.frame}/{self.nframes}) {path}: ', end='')
        '''
        # Read image
        self.count += 1
        # img0 = cv2.imread(path)  # BGR
        # assert img0 is not None, 'Image Not Found ' + path
        # print(f'image {self.count}/{self.nf} {path}: ', end='')

        # Padded resize
        img = letterbox(path, self.img_size, stride=self.stride)[0]

        # Convert
        img = img[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, HWC to CHW
        img = np.ascontiguousarray(img)
        return img, path  # (network input, original image)
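
The "padded resize" step decides the size of the tensor that reaches the model. The sketch below approximates that arithmetic as I understand the default letterbox behaviour: scale the image so it fits inside img_size while keeping its aspect ratio, then pad each side only up to the next multiple of the stride. The helper name letterbox_shape is hypothetical, and this is not YOLOv5's letterbox itself; it only computes the resulting shape.

```python
def letterbox_shape(height, width, new_size=640, stride=32):
    """Return (out_height, out_width) after an aspect-preserving resize plus
    the minimal padding that makes both sides a multiple of `stride`."""
    # Scale factor so that the whole image fits inside new_size x new_size.
    r = min(new_size / height, new_size / width)
    new_h, new_w = int(round(height * r)), int(round(width * r))
    # Pad only as far as the next stride multiple, not to the full square.
    pad_h = (new_size - new_h) % stride
    pad_w = (new_size - new_w) % stride
    return new_h + pad_h, new_w + pad_w


print(letterbox_shape(720, 1280))  # (384, 640)
print(letterbox_shape(480, 640))   # (480, 640)
```

Under these assumptions a 1280x720 frame becomes a 640x384 input rather than a full 640x640 square, which is less work per frame for a live stream.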

The result looks like this:

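In practice I did not use a camera. I read the data by pulling a video stream over RTMP, which gave roughly 10 seconds of inference delay. To fix this I created a buffer queue: frames read from the stream go into a queue capped at two entries, and whenever more than one frame is waiting the reader drops the oldest one (otherwise it pauses for 0.01 s), so the enqueue and dequeue rates stay matched, much like a sliding window. YOLOv5 always takes the frame at the head of the queue for inference, which keeps the delay from building up over time.
The code is as follows.
Create demo.py and import the modified detect.py into it.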


import cv2
import multiprocessing as mp
import detect
import time


def image_put(q, ip, port, name):
    # ip and port are unused here; the stream URL is hard-coded.
    cap = cv2.VideoCapture("rtmp://localhost:1935/live/movie")
    if cap.isOpened():
        print(name)
    while True:
        ok, frame = cap.read()
        if not ok:  # skip failed reads so None never reaches detect()
            time.sleep(0.01)
            continue
        q.put(frame)
        # Drop the oldest frame when more than one is waiting; otherwise pause briefly.
        q.get() if q.qsize() > 1 else time.sleep(0.01)
        # print("555" * 25) if cap.read()[0] == False else print(" ")


def get_frames():
    camera_ip, camera_port, camera_name = "192.168.2.119", "554", "stream0"
    mp.set_start_method(method='spawn')  # init
    queue = mp.Queue(maxsize=2)
    processes = [mp.Process(target=image_put, args=(queue, camera_ip, camera_port, camera_name), daemon=True)]
    for process in processes:
        process.start()
    while True:
        yield queue.get()


def main():
    a = detect.detectapi(weights='runs/train/exp24/weights/best.pt')
    frames = get_frames()
    cv2.namedWindow("video", cv2.WINDOW_NORMAL)
    for frame in frames:
        result, names = a.detect([frame])
        img = result[0][0]  # annotated image for the first (and only) input image
        '''
        for cls, (x1, y1, x2, y2), conf in result[0][1]:  # labels for the first image
            print(cls, x1, y1, x2, y2, conf)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0))
            cv2.putText(img, names[cls], (x1, y1 - 20), cv2.FONT_HERSHEY_DUPLEX, 1.5, (255, 0, 0))
        '''
        cv2.imshow("video", img)
        cv2.waitKey(1)


if __name__ == '__main__':
    main()
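
The drop-the-oldest idea behind the queue can be shown in isolation. The sketch below is a simplified variant, not the code from demo.py: it uses the standard library's in-process queue.Queue instead of multiprocessing, and the helper name put_latest is made up. Integers stand in for video frames.

```python
import queue


def put_latest(q, frame):
    """Put `frame` into q, discarding the oldest queued frame if q is full."""
    while True:
        try:
            q.put_nowait(frame)
            return
        except queue.Full:
            try:
                q.get_nowait()  # drop the stalest frame to make room
            except queue.Empty:
                pass  # a consumer emptied the queue in the meantime; retry


frames = queue.Queue(maxsize=2)
for i in range(10):  # a fast producer pushing frames 0..9 with no consumer
    put_latest(frames, i)

remaining = [frames.get_nowait(), frames.get_nowait()]
print(remaining)  # [8, 9]
```

However far the producer runs ahead, the consumer only ever sees the newest couple of frames, which is why the delay stays bounded instead of growing with the backlog.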

For the final result, see the video:

YOLOv5 stream push/pull and real-time inference

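The setup works as follows:

It follows the post "yolov5 rtmp实时推理" (YOLOv5 RTMP real-time inference).
Step 1: Download OBS and capture a specific desktop window. The overhead is very low.

Step 2: Install golang; it is enough that the go command runs from the command line. Then clone livego with git. livego is a local streaming server: OBS pushes the stream to it, and the RTMP stream is then pulled back from it.
Git repository: https://github.com/gwuhaolin/livego.git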

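Steps for using livego:
1. Go to the livego directory and run go build or make build.
2. Double-click the exe file to start livego.
3. Get the stream key from http://localhost:8090/control/get?room=movie
4. Push address: rtmp://localhost:1935/live
5. Playback (pull) address: rtmp://localhost:1935/live/movie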
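
The three addresses in the steps above differ only in port and path, so they can be derived from the room name. The helper below is a hypothetical convenience, not part of livego; the default host, ports, and the "live" application name are simply copied from the addresses listed above.

```python
def livego_urls(room, host="localhost", rtmp_port=1935, api_port=8090, app="live"):
    """Build the stream-key, push, and playback addresses for one livego room."""
    return {
        "key": f"http://{host}:{api_port}/control/get?room={room}",
        "push": f"rtmp://{host}:{rtmp_port}/{app}",
        "play": f"rtmp://{host}:{rtmp_port}/{app}/{room}",
    }


urls = livego_urls("movie")
for purpose, url in urls.items():
    print(purpose, url)
```

The "play" entry is the address that demo.py passes to cv2.VideoCapture.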

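Step 3: Verify that the window's RTMP stream is available. In OBS, set a custom streaming server pointing at the livego push address rtmp://localhost:1935/live, then open any media player's network-stream option and enter the livego playback address rtmp://localhost:1935/live/movie. You should see the RTMP stream of your window.

Step 4: Pass the livego playback address to the YOLOv5 inference command after --source, followed by --view-img, for example python detect.py --source rtmp://localhost:1935/live/movie --view-img, and it will run real-time inference on that specific window. (In the end, though, I did not use the official detect.)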
