A Hands-On Guide to Object Detection with Deep Learning (4): Using the Model

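The previous post, "A Hands-On Guide to Object Detection with Deep Learning (3): Model Training", showed how to train our own object detection model with yolov3. This post focuses on how to use that trained model to detect objects in images or video.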

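If you read the previous post, you know we are using the AlexeyAB/darknet project. The project does ship object detection code, written in both C/C++ and Python, but it has several problems:

Neither version can display Chinese text.
Neither version shows the confidence score.
The detection boxes they draw are not very readable.
The Python detection code keeps failing with type-related errors, probably because of a problem in the underlying C/C++ code.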

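The garbled Chinese text is an OpenCV issue. Plenty of articles online cover it, but the fixes they describe are all tedious. So, working in Python and borrowing code from the qqwweee/keras-yolo3 project, I rewrote the detection program. The main idea is to draw the detection information onto the image with Python's PIL library instead of OpenCV. There are a few other small changes that I won't list one by one. Straight to the code.
In darknet.py, the main change is to the detect_image method: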

def detect_image(class_names, net, meta, im, thresh=.5, hier_thresh=.5, nms=.45, debug=False):
    num = c_int(0)
    if debug: print("Assigned num")
    pnum = pointer(num)
    if debug: print("Assigned pnum")
    predict_image(net, im)
    if debug: print("did prediction")
    dets = get_network_boxes(net, im.w, im.h, thresh, hier_thresh, None, 0, pnum, 0)
    if debug: print("Got dets")
    num = pnum[0]
    if debug: print("got zeroth index of pnum")
    if nms:
        do_nms_sort(dets, num, meta.classes, nms)
    if debug: print("did sort")
    res = []
    if debug: print("about to range")
    for j in range(num):
        if debug: print("Ranging on " + str(j) + " of " + str(num))
        if debug: print("Classes: " + str(meta), meta.classes, meta.names)
        for i in range(meta.classes):
            if debug: print("Class-ranging on " + str(i) + " of " + str(meta.classes) + "= " + str(dets[j].prob[i]))
            if dets[j].prob[i] > 0.0:
                b = dets[j].bbox
                # altNames is expected to be the module-level variable defined in darknet.py
                if altNames is None:
                    # nameTag = meta.names[i]
                    # The line above causes a segmentation fault, which looks related to
                    # the C/C++ code, so the class list is passed in as a parameter to
                    # work around the problem.
                    nameTag = class_names[i]
                    print(nameTag)
                else:
                    nameTag = altNames[i]
                    print(nameTag)
                if debug:
                    print("Got bbox", b)
                    print(nameTag)
                    print(dets[j].prob[i])
                    print((b.x, b.y, b.w, b.h))
                res.append((nameTag, dets[j].prob[i], (b.x, b.y, b.w, b.h)))
    if debug: print("did range")
    res = sorted(res, key=lambda x: -x[1])
    if debug: print("did sort")
    free_detections(dets, num)
    if debug: print("freed detections")
    return res
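The result-building part of detect_image can be tried out without darknet at all. The following is a minimal sketch on plain Python data: collect_detections, probs and boxes are hypothetical stand-ins for the loop body, dets[j].prob[i] and dets[j].bbox, and the class names and scores are made up for illustration.

```python
def collect_detections(probs, boxes, class_names):
    """Mimic the result-building loop of detect_image on plain Python data.

    probs[j][i] is the score of class i for candidate box j (a stand-in for
    dets[j].prob[i]); boxes[j] is that candidate's (x, y, w, h).
    """
    results = []
    for j, scores in enumerate(probs):
        for i, score in enumerate(scores):
            # The source keeps only entries whose score is greater than 0.
            if score > 0.0:
                # The name comes from our own list, not from meta.names.
                results.append((class_names[i], score, boxes[j]))
    # Highest confidence first, like sorted(res, key=lambda x: -x[1]).
    return sorted(results, key=lambda r: -r[1])


names = ["apple", "banana"]  # made-up class names
probs = [[0.0, 0.62], [0.91, 0.0], [0.0, 0.0]]
boxes = [(100.0, 80.0, 40.0, 30.0), (200.0, 150.0, 60.0, 50.0), (10.0, 10.0, 5.0, 5.0)]

detections = collect_detections(probs, boxes, names)
for name, score, box in detections:
    print(name, score, box)
```

Each entry has the shape (name, confidence, (x, y, w, h)), where x and y are the box center. This is the tuple layout the drawing code later unpacks.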

Add darknet_video_custom.py with the following content:

# -*- coding: utf-8 -*-
"""
This module uses a yolov3 model to locate targets in an image or a video.
"""
__author__ = '程序员一一涤生'

import colorsys
import os
import re
from timeit import default_timer as timer

import cv2
import numpy as np
from PIL import ImageDraw, ImageFont, Image

import darknet


def _convertBack(x, y, w, h):
    # Convert a center-based box (x, y, w, h) to corner coordinates
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax


def letterbox_image(image, size):
    '''resize image with unchanged aspect ratio using padding'''
    iw, ih = image.size
    w, h = size
    scale = min(w / iw, h / ih)
    nw = int(iw * scale)
    nh = int(ih * scale)

    image = image.resize((nw, nh), Image.BICUBIC)
    new_image = Image.new('RGB', size, (128, 128, 128))
    new_image.paste(image, ((w - nw) // 2, (h - nh) // 2))
    return new_image


class YOLO(object):
    _defaults = {
        "configPath": "names-data/yolo-obj.cfg",
        "weightPath": "names-data/backup/yolo-obj_3000.weights",
        "metaPath": "names-data/voc.data",
        "classes_path": "names-data/voc.names",
        "thresh": 0.3,
        "iou_thresh": 0.5,
        # "model_image_size": (416, 416),
        # "model_image_size": (608, 608),
        "model_image_size": (800, 800),
        "gpu_num": 1,
    }

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)  # set up default values
        self.__dict__.update(kwargs)  # and update with user overrides
        self.class_names = self._get_class()
        self.colors = self._get_colors()
        self.netMain = darknet.load_net_custom(self.configPath.encode("ascii"), self.weightPath.encode("ascii"), 0,
                                               1)  # batch size = 1
        self.metaMain = darknet.load_meta(self.metaPath.encode("ascii"))
        self.altNames = self._get_alt_names()

    def _get_class(self):
        classes_path = os.path.expanduser(self.classes_path)
        with open(classes_path, encoding="utf-8") as f:
            class_names = f.readlines()
        class_names = [c.strip() for c in class_names]
        return class_names

    def _get_colors(self):
        class_names = self._get_class()
        # Generate colors for drawing bounding boxes.
        hsv_tuples = [(x / len(class_names), 1., 1.)
                      for x in range(len(class_names))]
        colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))
        colors = list(map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)), colors))
        np.random.seed(10101)  # Fixed seed for consistent colors across runs.
        np.random.shuffle(colors)  # Shuffle colors to decorrelate adjacent classes.
        np.random.seed(None)  # Reset seed to default.
        return colors

    def _get_alt_names(self):
        # Stays None when the names file cannot be located or read
        altNames = None
        try:
            with open(self.metaPath) as metaFH:
                metaContents = metaFH.read()
                match = re.search("names *= *(.*)$", metaContents, re.IGNORECASE | re.MULTILINE)
                if match:
                    result = match.group(1)
                else:
                    result = None
                try:
                    if os.path.exists(result):
                        with open(result, encoding="utf-8") as namesFH:
                            namesList = namesFH.read().strip().split("\n")
                            altNames = [x.strip() for x in namesList]
                except TypeError:
                    pass
        except Exception:
            pass
        return altNames

    def cvDrawBoxes(self, detections, image):
        # Font settings: path of the font file and the font size
        font = ImageFont.truetype(font='font/simfang.ttf',
                                  size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))
        # Border thickness of the detection box; this formula scales it with the image size
        thickness = (image.size[0] + image.size[1]) // 300

        # Iterate over every detected target; detection: (classname, probability, (x, y, w, h))
        for c, detection in enumerate(detections):
            # Class name and confidence score of the current target
            classname = detection[0]
            # score = round(detection[1] * 100, 2)
            score = round(detection[1], 2)
            label = '{} {:.2f}'.format(classname, score)
            # Compute the top-left (xmin, ymin) and bottom-right (xmax, ymax) corners of the box
            x, y, w, h = detection[2][0], \
                         detection[2][1], \
                         detection[2][2], \
                         detection[2][3]
            xmin, ymin, xmax, ymax = _convertBack(float(x), float(y), float(w), float(h))
            # Get a drawing context
            draw = ImageDraw.Draw(image)
            # Size of the text to be displayed.
            # Note: ImageDraw.textsize was removed in Pillow 10; draw.textbbox is its replacement there.
            label_size = draw.textsize(label, font)
            # Map the coordinates to top, left, bottom, right; be careful not to mix them up
            top, left, bottom, right = (ymin, xmin, ymax, xmax)
            top = max(0, np.floor(top + 0.5).astype('int32'))
            left = max(0, np.floor(left + 0.5).astype('int32'))
            bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
            right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
            print(label, (left, top), (right, bottom))
            # Put the label above the box when there is room, otherwise just inside it
            if top - label_size[1] >= 0:
                text_origin = np.array([left, top - label_size[1]])
            else:
                text_origin = np.array([left, top + 1])
            # Pick a color by detection index, wrapping around the palette
            color = self.colors[c % len(self.colors)]
            # Draw the border, one pixel of thickness per pass
            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i], outline=color)
            # Draw the filled background behind the label text
            draw.rectangle([tuple(text_origin), tuple(text_origin + label_size)], fill=color)
            # Draw the text
            draw.text(text_origin, label, fill=(0, 0, 0), font=font)
            del draw
        return image

    def detect_video(self, video_path, output_path="", show=True):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        vid = cv2.VideoCapture(video_path)
        if not vid.isOpened():
            raise IOError("Couldn't open webcam or video")
        video_FourCC = cv2.VideoWriter_fourcc(*"mp4v")
        video_fps = vid.get(cv2.CAP_PROP_FPS)
        video_size = (nw, nh)
        isOutput = True if output_path != "" else False
        if isOutput:
            print("!!! TYPE:", type(output_path), type(video_FourCC), type(video_fps), type(video_size))
            out = cv2.VideoWriter(output_path, video_FourCC, video_fps, video_size)
        accum_time = 0
        curr_fps = 0
        fps = "FPS: ??"
        prev_time = timer()
        # Create an image we reuse for each detect
        darknet_image = darknet.make_image(nw, nh, 3)
        while True:
            return_value, frame = vid.read()
            if return_value:
                # Convert to RGB: OpenCV reads images as BGR by default, while PIL uses RGB
                frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
                image = Image.fromarray(frame_rgb)
                image_resized = image.resize(video_size, Image.BILINEAR)
                darknet.copy_image_from_bytes(darknet_image, np.asarray(image_resized).tobytes())
                detections = darknet.detect_image(self.class_names, self.netMain, self.metaMain, darknet_image,
                                                  thresh=self.thresh, debug=True)
                image_resized = self.cvDrawBoxes(detections, image_resized)
                result = np.asarray(image_resized)
                # Convert back to BGR so OpenCV can handle it
                result = cv2.cvtColor(result, cv2.COLOR_RGB2BGR)
                curr_time = timer()
                exec_time = curr_time - prev_time
                prev_time = curr_time
                accum_time = accum_time + exec_time
                curr_fps = curr_fps + 1
                if accum_time > 1:
                    accum_time = accum_time - 1
                    fps = "FPS: " + str(curr_fps)
                    curr_fps = 0
                cv2.putText(result, text=fps, org=(3, 15), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                            fontScale=0.50, color=(255, 0, 0), thickness=2)
                if show:
                    cv2.imshow("Object Detect", result)
                if isOutput:
                    print("start write...==========================================")
                    out.write(result)
                if cv2.waitKey(1) & 0xFF == ord('q'):
                    break
            else:
                break
        # The writer only exists when an output path was given
        if isOutput:
            out.release()
        vid.release()
        cv2.destroyAllWindows()

    def detect_image(self, image_path, save_path):
        nw = self.model_image_size[0]
        nh = self.model_image_size[1]
        assert nw % 32 == 0, 'Multiples of 32 required'
        assert nh % 32 == 0, 'Multiples of 32 required'
        try:
            image = Image.open(image_path)
        except Exception:
            print('Open Error! Try again!')
        else:
            image_resized = image.resize((nw, nh), Image.BILINEAR)
            darknet_image = darknet.make_image(nw, nh, 3)
            darknet.copy_image_from_bytes(darknet_image, np.asarray(image_resized).tobytes())
            # Run detection to get each target's class, confidence, center point, and box width and height
            detections = darknet.detect_image(self.class_names, self.netMain, self.metaMain, darknet_image,
                                              thresh=0.25, debug=True)
            # Draw the detections onto the image
            image_resized = self.cvDrawBoxes(detections, image_resized)
            # Show the annotated image
            image_resized.show()
            image_resized.save(save_path)


if __name__ == "__main__":
    _yolo = YOLO()
    _yolo.detect_image("names-data/images/food.JPG", "names-data/images/food_detect.JPG")
    # _yolo.detect_video("names-data/videos/food.mp4", "names-data/videos/food_detect.mp4", show=False)
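The box geometry in cvDrawBoxes is easy to get wrong, so here is a small standalone sketch of it. It assumes, as the code above does, that a detection's (x, y) is the box center in pixels of the resized image. center_to_corners mirrors _convertBack, while clamp_to_image is a hypothetical helper name for the max/min clipping done inline in cvDrawBoxes.

```python
def center_to_corners(x, y, w, h):
    """Convert a center-based box (x, y, w, h) to (xmin, ymin, xmax, ymax)."""
    xmin = int(round(x - w / 2))
    ymin = int(round(y - h / 2))
    xmax = int(round(x + w / 2))
    ymax = int(round(y + h / 2))
    return xmin, ymin, xmax, ymax


def clamp_to_image(box, image_size):
    """Clip corner coordinates so the box stays inside a (width, height) image."""
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    return max(0, xmin), max(0, ymin), min(width, xmax), min(height, ymax)


# A box in the middle of an 800x800 image
box = center_to_corners(400.0, 300.0, 100.0, 50.0)
print(box)   # (350, 275, 450, 325)

# A box that sticks out past the top and right edges gets clipped
edge = clamp_to_image(center_to_corners(780.0, 20.0, 100.0, 60.0), (800, 800))
print(edge)  # (730, 0, 800, 50)
```

Note that the drawing code then reads these values in (top, left, bottom, right) order, that is (ymin, xmin, ymax, xmax), which is why the source comment warns about mixing them up.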

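The key parts of the code above are commented, so I won't walk through them one by one. The Chinese font file is available at the link below; put it in the project's font directory.

Download link: https://github.com/Halfish/lstm-ctc-ocr/blob/master/fonts/simfang.ttf

Below are some other fonts I have collected; pick whichever you like.

Link: https://pan.baidu.com/s/1PWS7Hw1z3dkDyq7feZxqEQ
Extraction code: xu8q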
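Whichever font you choose, the drawing code sizes the text and the box border from the image dimensions. The sketch below reproduces those two formulas from cvDrawBoxes in plain Python; drawing_params is a hypothetical helper name, and the image sizes are only examples.

```python
import math


def drawing_params(image_width, image_height):
    """Return (font_size, thickness) the way cvDrawBoxes derives them."""
    # Font size is 3% of the image height, rounded to the nearest integer.
    font_size = int(math.floor(3e-2 * image_height + 0.5))
    # Border thickness grows by one pixel per 300 pixels of width + height.
    thickness = (image_width + image_height) // 300
    return font_size, thickness


for size in [(416, 416), (800, 800), (1920, 1080)]:
    print(size, drawing_params(*size))
```

For very small images, where width plus height is under 300, the thickness comes out as 0 and no border is drawn, so wrapping it in max(1, ...) may be worth considering.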

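Next, let's see how to display the confidence score in darknet's own drawing code. Open src/images.c, replace the draw_detections_cv_v3 function with the code below, and remember to re-run make on the project afterwards: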

void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)
{
    int i, j;
    if (!show_img) return;
    static int frame_id = 0;
    frame_id++;

    for (i = 0; i < num; ++i) {
        char labelstr[4096] = { 0 };
        int class_id = -1;
        for (j = 0; j < classes; ++j) {
            int show = strncmp(names[j], "dont_show", 9);
            if (dets[i].prob[j] > thresh && show) {
                float score = dets[i].prob[j];
                // Append the confidence score to the label text
                if (class_id < 0) {
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                    class_id = j;
                }
                else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                    strcat(labelstr, ", ");
                    sprintf(labelstr + strlen(labelstr), "%0.2f", score);
                }
                printf("%s: %.0f%% ", names[j], score * 100);
            }
        }
        if (class_id >= 0) {
            int width = show_img->height * .006;

            int offset = class_id * 123457 % classes;
            float red = get_color(2, offset, classes);
            float green = get_color(1, offset, classes);
            float blue = get_color(0, offset, classes);
            float rgb[3];

            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = dets[i].bbox;
            b.w = (b.w < 1) ? b.w : 1;
            b.h = (b.h < 1) ? b.h : 1;
            b.x = (b.x < 1) ? b.x : 1;
            b.y = (b.y < 1) ? b.y : 1;

            // Box coordinates are relative (0..1); scale them to pixels
            int left = (b.x - b.w / 2.)*show_img->width;
            int right = (b.x + b.w / 2.)*show_img->width;
            int top = (b.y - b.h / 2.)*show_img->height;
            int bot = (b.y + b.h / 2.)*show_img->height;

            if (left < 0) left = 0;
            if (right > show_img->width - 1) right = show_img->width - 1;
            if (top < 0) top = 0;
            if (bot > show_img->height - 1) bot = show_img->height - 1;

            float const font_size = show_img->height / 1000.F;
            CvPoint pt1, pt2, pt_text, pt_text_bg1, pt_text_bg2;
            pt1.x = left;
            pt1.y = top;
            pt2.x = right;
            pt2.y = bot;
            pt_text.x = left;
            pt_text.y = top - 12;
            pt_text_bg1.x = left;
            pt_text_bg1.y = top - (10 + 25 * font_size);
            pt_text_bg2.x = right;
            pt_text_bg2.y = top;
            CvScalar color;
            color.val[0] = red * 256;
            color.val[1] = green * 256;
            color.val[2] = blue * 256;

            cvRectangle(show_img, pt1, pt2, color, width, 8, 0);
            if (ext_output)
                printf("\t(left_x: %4.0f   top_y: %4.0f   width: %4.0f   height: %4.0f)\n",
                    (float)left, (float)top, b.w*show_img->width, b.h*show_img->height);
            else
                printf("\n");
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, width, 8, 0);
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, CV_FILLED, 8, 0);    // filled
            CvScalar black_color;
            black_color.val[0] = 0;
            black_color.val[1] = 0;
            black_color.val[2] = 0;
            CvFont font;
            cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, font_size, font_size, 0, font_size * 3, 8);
            cvPutText(show_img, labelstr, pt_text, &font, black_color);
        }
    }
    if (ext_output) {
        fflush(stdout);
    }
}
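To see what label text the C function above produces, the string-building part can be sketched in Python. This is only an illustration of the same logic, not part of darknet: build_label is a hypothetical name, and the class names and scores are made up.

```python
def build_label(names, probs, thresh):
    """Build a label such as "cat, 0.87, dog, 0.31" from per-class scores."""
    parts = []
    for name, score in zip(names, probs):
        # Same two conditions as the C code: the score is above the threshold
        # and the class name does not start with "dont_show".
        if score > thresh and not name.startswith("dont_show"):
            # "%0.2f" in C corresponds to two decimal places here.
            parts.append("{}, {:.2f}".format(name, score))
    return ", ".join(parts)


label = build_label(["cat", "dog", "dont_show_x"], [0.87, 0.31, 0.9], thresh=0.25)
print(label)  # cat, 0.87, dog, 0.31
```

Every class above the threshold contributes a "name, score" pair, so a box that matches several classes gets one combined label.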

Once all of the above is in place, run python darknet_video_custom.py to start detecting objects in an image or video.

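Pretty cool, isn't it? O(∩_∩)O~ This series now has four posts: "A Quick Taste of Object Detection", "Data Annotation", "Model Training", and "Using the Model". We have gone through the whole object detection workflow and have a working feel for each step. The next post, "A Hands-On Guide to Object Detection with Deep Learning (5): YOLO", covers the YOLO algorithm itself, so we can understand how object detection works under the hood.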

OK, that's all for this post~ Thanks for reading O(∩_∩)O, bye-bye~

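Quote to share

Travel is not only rushing outward; sitting quietly and thinking is travel too. Any exploring, pursuing, or touching of situations we cannot know, whether of the land or of the mind, is a kind of travel.

-- Lin Qingxuan (林清玄)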

Reposted from: https://www.cnblogs.com/anai/p/11459569.html
