Single shot object detection SSD using MobileNet and OpenCV

WeChat official account: 小白图像与视觉

① Versions: OpenCV 3.4.1, numpy, imutils

② Model weights: place them in model_data/

③ Project structure:

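1. Introduction

In this section we give a practical overview of Single Shot Detection (SSD) using deep learning, MobileNet, and OpenCV. Object detection is one of the most active topics in computer vision. It is moving into a wide range of industries, with use cases ranging from personal safety to workplace productivity. Object detection and recognition are applied in many areas of computer vision, including image retrieval, security, surveillance, automatic license plate recognition, optical character recognition, traffic control, medicine, agriculture, and more.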

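2. What is object detection

In object detection we classify the objects in an image and also determine where they are located. To do so we must obtain a bounding box for each object, i.e. its (x, y) coordinates in the image. Predicting the box is treated as a regression problem. Single Shot Detection (SSD) is one approach to object detection.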

3. Object detection methods

Whenever we talk about object detection, three main methods come up:

Faster R-CNN
You Only Look Once (YOLO)
Single Shot Detector (SSD)

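A:
Faster R-CNN runs detection on individual region proposals, so it ends up making many predictions for different regions of the image. It runs at roughly 5 to 7 frames per second.

B:
Another type of detector is YOLO, which is very popular for real-time object detection in computer vision. Like Faster R-CNN it is built on a convolutional network, but YOLO sees the whole image during both training and testing, so it implicitly encodes contextual information about the classes and their appearance. In short, the image is passed through the network once and the predictions are output directly.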
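Properties of YOLO:

It processes about 45 frames per second, which is faster than real time.
Compared with R-CNN, it makes fewer background errors.
YOLO (from YOLOv2 onward) runs k-means clustering on the training set to choose its default bounding boxes.
YOLO's main weakness is that its accuracy still falls short.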
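
The k-means idea in the list above can be sketched as follows. This is only a minimal illustration, not YOLO's actual training code: it clusters (width, height) pairs and assigns each box to the centroid with the highest IoU, which is the distance choice described for YOLOv2. The toy boxes and the names `iou_wh` and `kmeans_anchors` are made up for this example.

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, assuming every box shares the same top-left corner."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)

def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (w, h) pairs into k default boxes; highest IoU means closest."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(boxes), size=k, replace=False)
    centroids = boxes[start].astype(float)
    for _ in range(iters):
        # Assign each box to the centroid it overlaps most.
        assign = np.argmax(iou_wh(boxes, centroids), axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members) > 0:
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids

# Hypothetical ground-truth box sizes (width, height) in pixels.
boxes = np.array([[10, 12], [11, 10], [12, 11],
                  [100, 50], [110, 55], [90, 45]], dtype=float)
anchors = kmeans_anchors(boxes, k=2)
print(anchors)
```

On this toy data the two clusters are well separated, so the result is one small, roughly square default box and one large, wide one.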

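C:
The Single Shot Detector (SSD) needs only a single pass over the image to detect multiple objects in it.
It consists of two parts:

Extracting feature maps
Applying convolutional filters to detect objects
SSD was developed by a team that included Google researchers, with the aim of striking a balance between the two approaches represented by YOLO and R-CNN.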

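Specifically, two SSD variants are available:

SSD300: the input size is fixed at 300×300. It is used for lower-resolution images and runs faster, but is less accurate than SSD512.
SSD512: the input size is fixed at 512×512. It is used for higher-resolution images and is more accurate than SSD300.
SSD is faster than R-CNN because R-CNN needs two shots, one to generate region proposals and one to detect the object in each proposal, whereas SSD does both in a single shot.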
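
To make the "single shot" concrete, the sketch below counts how many default boxes SSD300 scores in one forward pass. The feature-map sizes and boxes per location are those reported for the original VGG-based SSD300 in the SSD paper, not values taken from this article, and a MobileNet backbone may use a different layout. `count_default_boxes` is a hypothetical helper name.

```python
# (feature map side length, default boxes per location), assumed from the
# original VGG-based SSD300 configuration in the SSD paper.
SSD300_FEATURE_MAPS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def count_default_boxes(feature_maps):
    """Total number of default boxes predicted in a single forward pass."""
    return sum(side * side * boxes for side, boxes in feature_maps)

total = count_default_boxes(SSD300_FEATURE_MAPS)
print(total)  # 8732
```

Every one of these boxes gets class scores and box offsets from the same pass, which is why no separate region-proposal stage is needed.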

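The MobileNet SSD model was first trained on the COCO dataset and then fine-tuned on PASCAL VOC, reaching 72.7% mAP (mean average precision).

For comparison, on the Pascal VOC 2007 dataset SSD300 reaches 79.6% mAP and SSD512 reaches 81.6% mAP, both higher than Faster R-CNN's 78.8% mAP.

We use MobileNet-SSD, a Caffe implementation of the MobileNet-SSD detection network with weights pretrained on VOC0712 (mAP = 0.727).

VOC0712 is an image dataset for object class recognition (the PASCAL VOC 2007 and 2012 sets combined), and mAP (mean average precision) is the most commonly used metric for object detection. Combining the MobileNet architecture with the Single Shot Detector (SSD) framework gives a fast, efficient deep-learning-based object detector.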
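
As a rough illustration of how an mAP figure is obtained, the sketch below computes 11-point interpolated average precision, the scheme used by the PASCAL VOC 2007 benchmark, and averages it over classes. The precision/recall values and per-class APs are invented for the example, and `eleven_point_ap` and `mean_average_precision` are hypothetical helper names.

```python
def eleven_point_ap(recalls, precisions):
    """11-point interpolated AP: average, over recall thresholds 0.0..1.0,
    of the best precision achieved at recall >= threshold."""
    ap = 0.0
    for i in range(11):
        threshold = i / 10.0
        candidates = [p for r, p in zip(recalls, precisions) if r >= threshold]
        ap += max(candidates) if candidates else 0.0
    return ap / 11.0

def mean_average_precision(ap_per_class):
    """mAP is the plain mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)

# Invented precision/recall points for one class.
ap_example = eleven_point_ap(recalls=[0.5, 1.0], precisions=[1.0, 0.5])
# Invented per-class APs.
map_example = mean_average_precision({"aeroplane": 0.8, "bicycle": 0.6})
print(ap_example, map_example)
```

A reported mAP of 0.727 therefore means that the per-class AP values average out to 72.7% over the 20 VOC classes.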

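4. Why use MobileNet in SSD

This raises the question of why we use MobileNet rather than ResNet, VGG, or AlexNet.
The answer is simple. ResNet, VGG, and AlexNet are large networks that increase the amount of computation, whereas MobileNet has a simple architecture built from 3×3 depthwise convolutions and 1×1 pointwise convolutions.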
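
The saving from splitting a convolution into a depthwise and a pointwise step can be checked with a little arithmetic. The sketch below counts multiply-adds for one layer, following the cost formulas given in the MobileNet paper. The layer size (512 input and output channels, 14×14 feature map) is just an illustrative assumption.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a k x k standard convolution with m input channels,
    n output channels, and an f x f output feature map."""
    return k * k * m * n * f * f

def depthwise_separable_cost(k, m, n, f):
    """Multiply-adds of a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution, for the same layer shape."""
    depthwise = k * k * m * f * f   # one k x k filter per input channel
    pointwise = m * n * f * f       # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

k, m, n, f = 3, 512, 512, 14        # illustrative layer size (assumption)
ratio = depthwise_separable_cost(k, m, n, f) / standard_conv_cost(k, m, n, f)
print(round(ratio, 4))              # equals 1/n + 1/k**2, about 0.113
```

For 3×3 kernels the separable version needs roughly 8 to 9 times fewer multiply-adds, which is why MobileNet suits a lightweight detector.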

5. Comparison of the main object detection methods

6. Complete code

mobilenet-ssd.py

"""
Step 1: cd ..   (go to the project root first)
Step 2: python src/mobilenet-ssd.py -m model_data/MobileNetSSD_deploy.caffemodel -p model_data/MobileNetSSD_deploy.prototxt -i images/airp.jpg -o out/out.jpg
"""

import numpy as np
import argparse
import cv2

# 1. Build the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="path to input image")
ap.add_argument("-p", "--prototxt", required=True, help="path to caffe 'deploy' prototxt file")
ap.add_argument("-m", "--model", required=True, help="path to Caffe pre-trained model")
ap.add_argument("-c", "--confidence", type=float, default=0.2,
                help="minimum probability to filter weak detections")
ap.add_argument("-o", "--out", required=True, help="path to output image")
args = vars(ap.parse_args())
print("args:", args)

# 2. Initialize the list of class labels MobileNet SSD was trained to detect,
#    then generate one bounding-box color per class
CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

# 3. Load the model from the paths given on the command line
print("[INFO] loding model....")
net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

# 4. Load the input image and construct an input blob for it by resizing
#    to a fixed 300x300 pixels and then normalizing it (the normalization
#    constants come from the authors of the MobileNet SSD implementation)
image = cv2.imread(args["image"])
if image is None:
    # cv2.imread returns None instead of raising when the file cannot be read
    raise SystemExit("could not read image: {}".format(args["image"]))
(h, w) = image.shape[:2]
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843, (300, 300), 127.5)

# 5. Pass the blob through the network
print('[INFO] computing object detection...')
net.setInput(blob)
detections = net.forward()
print("detections-tensor: ", detections)
print("detections-shape: ", detections.shape)

# 6. Loop over the detections
for i in np.arange(0, detections.shape[2]):
    # extract the confidence (i.e., the probability) associated with the prediction
    confidence = detections[0, 0, i, 2]
    print("confidence: ", confidence)
    # filter out weak detections by ensuring the confidence exceeds the minimum
    if confidence > args['confidence']:
        # The network produces an output blob of shape 1x1xNx7, where N is the
        # number of detections and every detection is a vector of values
        # [batchId, classId, confidence, left, top, right, bottom].
        # Extract the class index, then scale the normalized box to pixel
        # (x, y)-coordinates of the original image.
        idx = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype('int')

        # display the prediction
        label = '{}: {:.2f}%'.format(CLASSES[idx], confidence * 100)
        print('[INFO] {}'.format(label))
        cv2.rectangle(image, (startX, startY), (endX, endY), COLORS[idx], 2)
        # put the label above the box, or inside it when the box touches the top edge
        y = startY - 15 if startY - 15 > 15 else startY + 15
        cv2.putText(image, label, (startX, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
    print("i=", i)

# 7. Show the output image and save it
cv2.imshow('Output', image)
cv2.imwrite(args['out'], image)
cv2.waitKey(0)
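
The preprocessing done by `cv2.dnn.blobFromImage` in the script can be reproduced in plain numpy, which shows what the constants 0.007843 and 127.5 do. This is a sketch under the assumption that the image has already been resized to 300×300 and that channels are not swapped (the default). `manual_blob` and the dummy image are hypothetical and only serve the illustration.

```python
import numpy as np

def manual_blob(image_bgr, scale=0.007843, mean=127.5):
    """Numpy equivalent of the blobFromImage call used above:
    subtract the mean, multiply by the scale factor, and reorder
    from HxWxC to 1xCxHxW (NCHW)."""
    x = image_bgr.astype(np.float32)
    x = (x - mean) * scale            # maps pixel values 0..255 to about -1..1
    x = np.transpose(x, (2, 0, 1))    # HWC -> CHW
    return x[np.newaxis, ...]         # add the batch dimension

# Dummy 300x300 BGR image: blue and green channels 0, red channel 255.
dummy = np.zeros((300, 300, 3), dtype=np.uint8)
dummy[..., 2] = 255
blob = manual_blob(dummy)
print(blob.shape, blob.min(), blob.max())
```

Since 0.007843 is approximately 1/127.5, the network receives inputs in roughly the range [-1, 1].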

(opencver) D:\PythonProjects\MobileNet_SSD>python src/mobilenet-ssd.py -m model_data/MobileNetSSD_deploy.caffemodel -p model_data/MobileNetSSD_deploy.prototxt -i images/airp.jpg -o out/out.jpg
args: {'image': 'images/airp.jpg', 'prototxt': 'model_data/MobileNetSSD_deploy.prototxt', 'model': 'model_data/MobileNetSSD_deploy.caffemodel', 'confidence': 0.2, 'out': 'out/out.jpg'}
[INFO] loding model....
[INFO] computing object detection...
detections-tensor:  [[[[0.         1.         0.9873991  0.4052945  0.28947714 0.8447337    0.6101209 ]]]]
detections-shape:  (1, 1, 1, 7)
confidence:  0.9873991
i= 0
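
The detection tensor printed in the run can be decoded without OpenCV. The sketch below applies the same `[batchId, classId, confidence, left, top, right, bottom]` layout to that single detection. The image size of 1000×500 pixels is an assumption, since the real size of airp.jpg is not shown, and `parse_detections` is a hypothetical helper. Unlike the script, it also clips boxes to the image bounds.

```python
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
           "bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
           "dog", "horse", "motorbike", "person", "pottedplant", "sheep",
           "sofa", "train", "tvmonitor"]

def parse_detections(detections, width, height, min_confidence=0.2):
    """Turn a 1x1xNx7 SSD output into a list of labelled pixel boxes."""
    results = []
    for i in range(detections.shape[2]):
        _, class_id, confidence, left, top, right, bottom = detections[0, 0, i]
        if confidence <= min_confidence:
            continue  # drop weak detections
        # Scale normalized coordinates to pixels and keep them inside the image.
        box = np.array([left, top, right, bottom]) * np.array([width, height, width, height])
        box = np.clip(box, 0, [width - 1, height - 1, width - 1, height - 1])
        results.append({
            "label": CLASSES[int(class_id)],
            "confidence": float(confidence),
            "box": tuple(int(v) for v in box),
        })
    return results

# The tensor from the console output, with an assumed 1000x500 image.
sample = np.array([[[[0.0, 1.0, 0.9873991, 0.4052945,
                      0.28947714, 0.8447337, 0.6101209]]]])
results = parse_detections(sample, width=1000, height=500)
print(results)
```

Class id 1 maps to "aeroplane", matching the airplane input image, and the four normalized values become the corners of the box that the script draws.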

7. Summary

8. References:

1、https://honingds.com/blog/ssd-single-shot-object-detection-mobilenet-opencv/

2、https://blog.csdn.net/v_JULY_v/article/details/80170182
