1. Key Concepts

  • Objects and landmarks

Classification: decide whether the image is a picture of a car.

Localization: determine where the car is in the image.

Detection: find all the different objects in the image, together with their locations.

  • Convolutional output for classification with localization: whether an object is present, plus its location coordinates

y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3], where p_c indicates whether there is an object in the image, (b_x, b_y, b_h, b_w) gives the object's position, and each c_i indicates whether an object of that class is present.

  • Loss function

For the class indicators c_i, use cross-entropy; for the bounding-box coordinates, use squared error; for p_c (whether an object is present in the image), use cross-entropy (logistic loss). A sketch of the combined loss follows.
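As a concrete illustration, here is a minimal NumPy sketch of this combined loss for a single example, assuming the label layout y = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3] with three classes (the equal weighting of the three terms is an assumption, not the exact formulation from the course):

import numpy as np

def detection_loss(y_true, y_pred, eps=1e-7):
    # Toy per-example loss for y = [p_c, b_x, b_y, b_h, b_w, c1, c2, c3]:
    # cross-entropy on p_c, squared error on the box, cross-entropy on the classes.
    # The box and class terms only count when an object is actually present.
    pc_true, pc_pred = y_true[0], np.clip(y_pred[0], eps, 1 - eps)
    loss_pc = -(pc_true * np.log(pc_pred) + (1 - pc_true) * np.log(1 - pc_pred))
    if pc_true == 1:
        loss_box = np.sum((y_true[1:5] - y_pred[1:5]) ** 2)
        loss_cls = -np.sum(y_true[5:] * np.log(np.clip(y_pred[5:], eps, 1.0)))
        return loss_pc + loss_box + loss_cls
    return loss_pc

# A car (class 2 of 3) centred at (0.5, 0.7) with height 0.3 and width 0.4
y_true = np.array([1.0, 0.5, 0.7, 0.3, 0.4, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.48, 0.72, 0.28, 0.41, 0.05, 0.90, 0.05])
print(detection_loss(y_true, y_pred))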

  • Landmark detection: recognize features of a target by localizing its characteristic points (landmarks)

  • Object detection, training set: the inputs X are images almost entirely filled by a car, or images containing no car; the labels Y are 0 or 1.

  • Obtaining inputs with a sliding window: crop the image with a window of a given size, slide it over the whole image with a fixed stride, and classify each crop; then repeat the process with larger windows.

  • Replacing the fully connected layers of a ConvNet with 1×1 convolutions: the number of filters in the 1×1 convolutional layer equals the number of neurons in the corresponding hidden layer.

  • Sliding a window over the whole image and running the ConvNet on each crop is equivalent to running the convolution once over the whole image: neighbouring windows share most of their pixels, so a single full-image pass computes all of the per-window outputs at once instead of recomputing the shared regions window by window.
  • Object detection: there is no need to crop sliding windows explicitly; convolve over the whole image directly, as in the sketch below.
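A minimal Keras sketch of this equivalence (the layer sizes are illustrative, not taken from the lecture): the "fully connected" layers are written as convolutions, so applying the same network to a larger image yields a whole grid of window predictions in a single forward pass.

from keras.layers import Input, Conv2D, MaxPooling2D
from keras.models import Model

def build(input_size):
    # A classifier designed for 14x14 crops, with its FC layers expressed as convolutions.
    x_in = Input(shape=(input_size, input_size, 3))
    x = Conv2D(16, (5, 5), activation="relu")(x_in)       # 14 -> 10
    x = MaxPooling2D((2, 2))(x)                           # 10 -> 5
    x = Conv2D(400, (5, 5), activation="relu")(x)         # "FC" layer as a 5x5 convolution
    x = Conv2D(400, (1, 1), activation="relu")(x)         # "FC" layer as a 1x1 convolution
    x = Conv2D(4, (1, 1), activation="softmax")(x)        # 4-way output for each window
    return Model(x_in, x)

print(build(14).output_shape)   # (None, 1, 1, 4) - one window
print(build(28).output_shape)   # (None, 8, 8, 4) - a grid of sliding-window predictions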

  • YOLO: obtain bounding boxes that are more accurate than what the sliding-window approach can give

Divide the image into an n×n grid of cells.

Apply the image classification and localization algorithm to each of the n×n cells.

Output n×n label vectors y_i = [p_c, b_x, b_y, b_h, b_w, c_1, c_2, c_3], i = 1, 2, ..., n×n; a toy example follows.
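For illustration, a tiny NumPy sketch of what such a label tensor could look like for a 3×3 grid with one labelled car (the cell index and values are made up):

import numpy as np

# One vector per cell: [p_c, b_x, b_y, b_h, b_w, c1, c2, c3], with the classes
# {pedestrian, car, motorcycle}. Coordinates are relative to the cell holding
# the object's midpoint, so b_h and b_w may be larger than 1.
n, vec_len = 3, 8
Y = np.zeros((n, n, vec_len))
Y[2, 0] = [1, 0.4, 0.3, 0.9, 1.6, 0, 1, 0]   # a car whose midpoint falls in cell (2, 0)

print(Y.shape)    # (3, 3, 8)
print(Y[2, 0])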

  • YOLO notation

1) For an object that spans several grid cells, look at the object's midpoint and assign the object to the cell that contains the midpoint.

2) The output bounding boxes are not constrained by a sliding-window stride.

3) This is a single convolutional pass, not n×n separate passes.

  • Bounding-box coordinate definition: b_x, b_y are the coordinates of the object's midpoint, and b_h, b_w are the object's height and width (see the conversion sketch below).
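The code in section 2 delegates this conversion to the YAD2K helper yolo_boxes_to_corners; the stand-in sketch below only shows the idea of turning a midpoint/size box into corner coordinates (the helper's exact output ordering may differ):

import numpy as np

def center_to_corners(box_xy, box_wh):
    # (b_x, b_y) midpoint and (b_w, b_h) size -> (x1, y1, x2, y2) corners.
    mins = box_xy - box_wh / 2.0
    maxes = box_xy + box_wh / 2.0
    return np.concatenate([mins, maxes], axis=-1)

print(center_to_corners(np.array([0.5, 0.5]), np.array([0.4, 0.2])))   # [0.3 0.4 0.7 0.6]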

  • Intersection over Union (IoU): the ratio of the intersection to the union of the ideal (ground-truth) bounding box and the predicted bounding box, used to evaluate how well the detection algorithm localizes objects.

  • Non-max suppression (non-max suppression, NMS): for detections of the same object, discard the outputs with smaller p_c

Repeatedly compare the p_c values of the remaining outputs, pick the box with the largest p_c, output its bounding box, and suppress the boxes that overlap it heavily; a plain-Python sketch follows.
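The assignment below performs this step with tf.image.non_max_suppression; as a reference for the idea itself, here is a plain NumPy sketch of the same greedy procedure (box format and threshold are illustrative):

import numpy as np

def iou_corners(a, b):
    # IoU of two boxes in corner format (x1, y1, x2, y2).
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much, repeat.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        order = np.array([i for i in rest if iou_corners(boxes[best], boxes[i]) <= iou_threshold])
    return keep

boxes = np.array([[0.0, 0.0, 2.0, 2.0], [0.1, 0.1, 2.0, 2.0], [3.0, 3.0, 5.0, 5.0]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2] - box 1 is suppressed by box 0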

  • Anchor boxes: let each grid cell detect multiple objects, by assigning each object to the anchor box whose shape it matches best.

  • Object detection with YOLO: detect the people, cars, and motorcycles in an image

Input: the image, divided into a 3×3 grid of cells

Output: a 3×3×16 volume (2 anchor boxes × (p_c + 4 box coordinates + 3 class probabilities) = 16 values per cell); see the unpacking sketch below

Use NMS to suppress the low-probability bounding boxes
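A small NumPy sketch of how such a 3×3×16 output volume can be unpacked into per-anchor vectors and per-class scores (the random values are placeholders; in section 2 the real output comes from yolo_head):

import numpy as np

# 16 = 2 anchor boxes x (p_c + b_x + b_y + b_h + b_w + 3 class probabilities)
np.random.seed(0)
raw = np.random.rand(3, 3, 16)
per_anchor = raw.reshape(3, 3, 2, 8)           # (grid_row, grid_col, anchor, vector)

p_c = per_anchor[..., 0]                       # object-presence probability
class_probs = per_anchor[..., 5:]              # c1, c2, c3
scores = p_c[..., np.newaxis] * class_probs    # per-class scores, as used in section 2
print(scores.shape)                            # (3, 3, 2, 3)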

  • Proposing candidate regions in advance: the R-CNN algorithm runs an image-segmentation algorithm that splits the image into blobs (e.g. by color), then places windows only on those blobs, which improves detection efficiency.

2. Worked Example: Detecting the Objects in an Image

  • Implementation outline:

Class-score threshold filtering:
1) Multiply the object-presence probability p_c by each class probability (c1, c2, c3, ..., c80) to compute the per-class scores box_scores.

2) Take the highest-scoring class box_classes (for example class c = 3, car) and the corresponding class score box_class_scores (for example score 0.44).

3) Build a mask from the threshold; for example, [0.9, 0.3, 0.4, 0.5, 0.1] > 0.4 returns [1, 0, 0, 1, 0]. The score threshold defines the mask filtering_mask, which is used to discard the low-scoring classes.

4) Apply the mask to box_classes, box_class_scores, and boxes, discarding the low-scoring predictions.

Non-max suppression:
1) Choose an IoU threshold.
2) Select the box with the highest score.
3) Compute the IoU of the selected box with each of the other boxes in turn and discard the boxes whose IoU exceeds the threshold.

Prediction with the pretrained yolov2.h5:
1) Load the trained model yolov2.h5.
2) Run prediction on an image to obtain scores (predicted probabilities), boxes (box locations), and classes (predicted classes).

import argparse
import os
import matplotlib.pyplot as plt
from matplotlib.pyplot import imshow
import scipy.io
import scipy.misc
import numpy as np
import pandas as pd
import PIL
import tensorflow as tf
from keras import backend as K
from keras.layers import Input, Lambda, Conv2D
from keras.models import load_model, Model
from yad2k.models.keras_yolo import yolo_head, yolo_boxes_to_corners, preprocess_true_boxes, yolo_loss, yolo_body
import yolo_utils

%matplotlib inline
Using TensorFlow backend.
def yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.6):
    """Filter boxes by thresholding on object and class confidence.

    Arguments:
    box_confidence -- tensor of shape (19, 19, 5, 1), the p_c (object-presence confidence) of every one of the 5 anchor boxes predicted in each of the 19x19 cells.
    boxes -- tensor of shape (19, 19, 5, 4), the (p_x, p_y, p_h, p_w) of all the anchor boxes.
    box_class_probs -- tensor of shape (19, 19, 5, 80), the detection probabilities of all 80 classes (c1, c2, c3, ..., c80) for every anchor box in every cell.
    threshold -- real number; a box is kept only when its best class score is above this threshold.

    Returns:
    scores -- tensor of shape (None,), the class-probability scores of the kept boxes.
    boxes -- tensor of shape (None, 4), the (b_x, b_y, b_h, b_w) of the kept boxes.
    classes -- tensor of shape (None,), the class indices of the kept boxes.

    Note: "None" appears because the exact number of kept boxes is unknown in advance; it depends on the threshold. For example, if 10 boxes are kept, the actual shape of scores is (10,).
    """
    # Step 1: compute the box scores
    box_scores = box_confidence * box_class_probs

    # Step 2: find the index of the best class for each box and the corresponding score
    box_classes = K.argmax(box_scores, axis=-1)
    box_class_scores = K.max(box_scores, axis=-1)
    print("box_class_scores=", box_class_scores)

    # Step 3: build the mask from the threshold
    filtering_mask = (box_class_scores >= threshold)

    # Apply the mask to scores, boxes and classes
    scores = tf.boolean_mask(box_class_scores, filtering_mask)
    print("scores=", scores)
    boxes = tf.boolean_mask(boxes, filtering_mask)
    classes = tf.boolean_mask(box_classes, filtering_mask)

    return scores, boxes, classes
with tf.Session() as test_a:
    box_confidence = tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1)
    boxes = tf.random_normal([19, 19, 5, 4], mean=1, stddev=4, seed=1)
    box_class_probs = tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1)
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, threshold=0.5)
    print("scores=", scores)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.shape))
    print("boxes.shape = " + str(boxes.shape))
    print("classes.shape = " + str(classes.shape))
    test_a.close()
box_class_scores= Tensor("Max_10:0", shape=(19, 19, 5), dtype=float32)
scores= Tensor("boolean_mask_30/GatherV2:0", shape=(?,), dtype=float32)
scores= Tensor("boolean_mask_30/GatherV2:0", shape=(?,), dtype=float32)
scores[2] = 10.750582
boxes[2] = [ 8.426533   3.2713668 -0.5313436 -4.9413733]
classes[2] = 7
scores.shape = (?,)
boxes.shape = (?, 4)
classes.shape = (?,)
def iou(box1, box2):
    """Compute the intersection over union of two boxes.

    Arguments:
    box1 -- first box, tuple (x1, y1, x2, y2)
    box2 -- second box, tuple (x1, y1, x2, y2)

    Returns:
    iou -- real number, the intersection over union.
    """
    # Area of the intersection (clamped at 0 so non-overlapping boxes give 0)
    xi1 = np.maximum(box1[0], box2[0])
    yi1 = np.maximum(box1[1], box2[1])
    xi2 = np.minimum(box1[2], box2[2])
    yi2 = np.minimum(box1[3], box2[3])
    inter_area = np.maximum(xi2 - xi1, 0) * np.maximum(yi2 - yi1, 0)

    # Area of the union: Union(A, B) = A + B - Inter(A, B)
    box1_area = (box1[2] - box1[0]) * (box1[3] - box1[1])
    box2_area = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = box1_area + box2_area - inter_area

    # Intersection over union
    iou = inter_area / union_area

    return iou
box1 = (2,1,4,3)
box2 = (1,2,3,4)
print("iou = " + str(iou(box1, box2)))
iou = 0.14285714285714285
def yolo_non_max_suppression(scores, boxes, classes, max_boxes=10, iou_threshold=0.5):
    """Apply non-max suppression (NMS) to the boxes.

    Arguments:
    scores -- tensor of shape (None,), output of yolo_filter_boxes()
    boxes -- tensor of shape (None, 4), output of yolo_filter_boxes(), scaled to the image size (see below)
    classes -- tensor of shape (None,), output of yolo_filter_boxes()
    max_boxes -- integer, maximum number of predicted boxes to keep
    iou_threshold -- real number, the IoU threshold.

    Returns:
    scores -- tensor of shape (, None), predicted score of each kept box
    boxes -- tensor of shape (4, None), coordinates of the kept boxes
    classes -- tensor of shape (, None), predicted class of each kept box

    Note: the "None" dimension will be no larger than max_boxes. This function also reshapes scores, boxes and classes, which is convenient for the next step.
    """
    max_boxes_tensor = K.variable(max_boxes, dtype="int32")  # used by tf.image.non_max_suppression()
    K.get_session().run(tf.variables_initializer([max_boxes_tensor]))  # initialize the variable max_boxes_tensor

    # Use tf.image.non_max_suppression() to get the list of indices of the boxes to keep
    nms_indices = tf.image.non_max_suppression(boxes, scores, max_boxes, iou_threshold)

    # Use K.gather() to select the kept boxes
    scores = K.gather(scores, nms_indices)
    boxes = K.gather(boxes, nms_indices)
    classes = K.gather(classes, nms_indices)

    return scores, boxes, classes
with tf.Session() as test_b:
    scores = tf.random_normal([54,], mean=1, stddev=4, seed=1)
    boxes = tf.random_normal([54, 4], mean=1, stddev=4, seed=1)
    classes = tf.random_normal([54,], mean=1, stddev=4, seed=1)
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))
    test_b.close()
scores[2] = 6.938395
boxes[2] = [-5.299932    3.1379814   4.450367    0.95942086]
classes[2] = -2.2452729
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)
def yolo_eval(yolo_outputs, image_shape=(720., 1280.), max_boxes=10, score_threshold=0.6, iou_threshold=0.5):
    """Convert the YOLO encoding (a large number of boxes) into predicted boxes together with their scores, coordinates and classes.

    Arguments:
    yolo_outputs -- output of the encoding model (for an image of shape (608, 608, 3)), a tuple of 4 tensors:
        box_confidence : tensor of shape (None, 19, 19, 5, 1)
        box_xy         : tensor of shape (None, 19, 19, 5, 2)
        box_wh         : tensor of shape (None, 19, 19, 5, 2)
        box_class_probs: tensor of shape (None, 19, 19, 5, 80)
    image_shape -- tensor of shape (2,), the dimensions of the input image, here (608., 608.)
    max_boxes -- integer, maximum number of predicted boxes to keep
    score_threshold -- real number, the score threshold.
    iou_threshold -- real number, the IoU threshold.

    Returns:
    scores -- tensor of shape (, None), predicted score of each kept box
    boxes -- tensor of shape (4, None), coordinates of the kept boxes
    classes -- tensor of shape (, None), predicted class of each kept box
    """
    # Unpack the outputs of the YOLO model
    box_confidence, box_xy, box_wh, box_class_probs = yolo_outputs

    # Convert box midpoints to corner coordinates
    boxes = yolo_boxes_to_corners(box_xy, box_wh)

    # Confidence-score filtering
    scores, boxes, classes = yolo_filter_boxes(box_confidence, boxes, box_class_probs, score_threshold)

    # Scale the boxes back to the original image size
    boxes = yolo_utils.scale_boxes(boxes, image_shape)

    # Non-max suppression
    scores, boxes, classes = yolo_non_max_suppression(scores, boxes, classes, max_boxes, iou_threshold)

    return scores, boxes, classes
with tf.Session() as test_c:
    yolo_outputs = (tf.random_normal([19, 19, 5, 1], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 2], mean=1, stddev=4, seed=1),
                    tf.random_normal([19, 19, 5, 80], mean=1, stddev=4, seed=1))
    scores, boxes, classes = yolo_eval(yolo_outputs)
    print("scores[2] = " + str(scores[2].eval()))
    print("boxes[2] = " + str(boxes[2].eval()))
    print("classes[2] = " + str(classes[2].eval()))
    print("scores.shape = " + str(scores.eval().shape))
    print("boxes.shape = " + str(boxes.eval().shape))
    print("classes.shape = " + str(classes.eval().shape))
    test_c.close()
scores[2] = 138.79124
boxes[2] = [1292.3297  -278.52167 3876.9893  -835.56494]
classes[2] = 54
scores.shape = (10,)
boxes.shape = (10, 4)
classes.shape = (10,)
sess = K.get_session()
class_names = yolo_utils.read_classes("model_data/coco_classes.txt")
anchors = yolo_utils.read_anchors("model_data/yolo_anchors.txt")
image_shape = (720.,1280.)
yolo_model = load_model("model_data/yolov2.h5")
yolo_model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            (None, 608, 608, 3)  0
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 608, 608, 32) 864         input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 608, 608, 32) 128         conv2d_1[0][0]
__________________________________________________________________________________________________
leaky_re_lu_1 (LeakyReLU)       (None, 608, 608, 32) 0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 304, 304, 32) 0           leaky_re_lu_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 304, 304, 64) 18432       max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 304, 304, 64) 256         conv2d_2[0][0]
__________________________________________________________________________________________________
leaky_re_lu_2 (LeakyReLU)       (None, 304, 304, 64) 0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 152, 152, 64) 0           leaky_re_lu_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 152, 152, 128 73728       max_pooling2d_2[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 152, 152, 128 512         conv2d_3[0][0]
__________________________________________________________________________________________________
leaky_re_lu_3 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 152, 152, 64) 8192        leaky_re_lu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 152, 152, 64) 256         conv2d_4[0][0]
__________________________________________________________________________________________________
leaky_re_lu_4 (LeakyReLU)       (None, 152, 152, 64) 0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 152, 152, 128 73728       leaky_re_lu_4[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 152, 152, 128 512         conv2d_5[0][0]
__________________________________________________________________________________________________
leaky_re_lu_5 (LeakyReLU)       (None, 152, 152, 128 0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
max_pooling2d_3 (MaxPooling2D)  (None, 76, 76, 128)  0           leaky_re_lu_5[0][0]
__________________________________________________________________________________________________
conv2d_6 (Conv2D)               (None, 76, 76, 256)  294912      max_pooling2d_3[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 76, 76, 256)  1024        conv2d_6[0][0]
__________________________________________________________________________________________________
leaky_re_lu_6 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
conv2d_7 (Conv2D)               (None, 76, 76, 128)  32768       leaky_re_lu_6[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 76, 76, 128)  512         conv2d_7[0][0]
__________________________________________________________________________________________________
leaky_re_lu_7 (LeakyReLU)       (None, 76, 76, 128)  0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 76, 76, 256)  294912      leaky_re_lu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 76, 76, 256)  1024        conv2d_8[0][0]
__________________________________________________________________________________________________
leaky_re_lu_8 (LeakyReLU)       (None, 76, 76, 256)  0           batch_normalization_8[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 38, 38, 256)  0           leaky_re_lu_8[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 38, 38, 512)  1179648     max_pooling2d_4[0][0]
__________________________________________________________________________________________________
batch_normalization_9 (BatchNor (None, 38, 38, 512)  2048        conv2d_9[0][0]
__________________________________________________________________________________________________
leaky_re_lu_9 (LeakyReLU)       (None, 38, 38, 512)  0           batch_normalization_9[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_9[0][0]
__________________________________________________________________________________________________
batch_normalization_10 (BatchNo (None, 38, 38, 256)  1024        conv2d_10[0][0]
__________________________________________________________________________________________________
leaky_re_lu_10 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_10[0][0]
__________________________________________________________________________________________________
conv2d_11 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_10[0][0]
__________________________________________________________________________________________________
batch_normalization_11 (BatchNo (None, 38, 38, 512)  2048        conv2d_11[0][0]
__________________________________________________________________________________________________
leaky_re_lu_11 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_11[0][0]
__________________________________________________________________________________________________
conv2d_12 (Conv2D)              (None, 38, 38, 256)  131072      leaky_re_lu_11[0][0]
__________________________________________________________________________________________________
batch_normalization_12 (BatchNo (None, 38, 38, 256)  1024        conv2d_12[0][0]
__________________________________________________________________________________________________
leaky_re_lu_12 (LeakyReLU)      (None, 38, 38, 256)  0           batch_normalization_12[0][0]
__________________________________________________________________________________________________
conv2d_13 (Conv2D)              (None, 38, 38, 512)  1179648     leaky_re_lu_12[0][0]
__________________________________________________________________________________________________
batch_normalization_13 (BatchNo (None, 38, 38, 512)  2048        conv2d_13[0][0]
__________________________________________________________________________________________________
leaky_re_lu_13 (LeakyReLU)      (None, 38, 38, 512)  0           batch_normalization_13[0][0]
__________________________________________________________________________________________________
max_pooling2d_5 (MaxPooling2D)  (None, 19, 19, 512)  0           leaky_re_lu_13[0][0]
__________________________________________________________________________________________________
conv2d_14 (Conv2D)              (None, 19, 19, 1024) 4718592     max_pooling2d_5[0][0]
__________________________________________________________________________________________________
batch_normalization_14 (BatchNo (None, 19, 19, 1024) 4096        conv2d_14[0][0]
__________________________________________________________________________________________________
leaky_re_lu_14 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_14[0][0]
__________________________________________________________________________________________________
conv2d_15 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_14[0][0]
__________________________________________________________________________________________________
batch_normalization_15 (BatchNo (None, 19, 19, 512)  2048        conv2d_15[0][0]
__________________________________________________________________________________________________
leaky_re_lu_15 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_15[0][0]
__________________________________________________________________________________________________
conv2d_16 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_15[0][0]
__________________________________________________________________________________________________
batch_normalization_16 (BatchNo (None, 19, 19, 1024) 4096        conv2d_16[0][0]
__________________________________________________________________________________________________
leaky_re_lu_16 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_16[0][0]
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, 19, 19, 512)  524288      leaky_re_lu_16[0][0]
__________________________________________________________________________________________________
batch_normalization_17 (BatchNo (None, 19, 19, 512)  2048        conv2d_17[0][0]
__________________________________________________________________________________________________
leaky_re_lu_17 (LeakyReLU)      (None, 19, 19, 512)  0           batch_normalization_17[0][0]
__________________________________________________________________________________________________
conv2d_18 (Conv2D)              (None, 19, 19, 1024) 4718592     leaky_re_lu_17[0][0]
__________________________________________________________________________________________________
batch_normalization_18 (BatchNo (None, 19, 19, 1024) 4096        conv2d_18[0][0]
__________________________________________________________________________________________________
leaky_re_lu_18 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_18[0][0]
__________________________________________________________________________________________________
conv2d_19 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_18[0][0]
__________________________________________________________________________________________________
batch_normalization_19 (BatchNo (None, 19, 19, 1024) 4096        conv2d_19[0][0]
__________________________________________________________________________________________________
conv2d_21 (Conv2D)              (None, 38, 38, 64)   32768       leaky_re_lu_13[0][0]
__________________________________________________________________________________________________
leaky_re_lu_19 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_19[0][0]
__________________________________________________________________________________________________
batch_normalization_21 (BatchNo (None, 38, 38, 64)   256         conv2d_21[0][0]
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, 19, 19, 1024) 9437184     leaky_re_lu_19[0][0]
__________________________________________________________________________________________________
leaky_re_lu_21 (LeakyReLU)      (None, 38, 38, 64)   0           batch_normalization_21[0][0]
__________________________________________________________________________________________________
batch_normalization_20 (BatchNo (None, 19, 19, 1024) 4096        conv2d_20[0][0]
__________________________________________________________________________________________________
space_to_depth_x2 (Lambda)      (None, 19, 19, 256)  0           leaky_re_lu_21[0][0]
__________________________________________________________________________________________________
leaky_re_lu_20 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_20[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 19, 19, 1280) 0           space_to_depth_x2[0][0]          leaky_re_lu_20[0][0]
__________________________________________________________________________________________________
conv2d_22 (Conv2D)              (None, 19, 19, 1024) 11796480    concatenate_1[0][0]
__________________________________________________________________________________________________
batch_normalization_22 (BatchNo (None, 19, 19, 1024) 4096        conv2d_22[0][0]
__________________________________________________________________________________________________
leaky_re_lu_22 (LeakyReLU)      (None, 19, 19, 1024) 0           batch_normalization_22[0][0]
__________________________________________________________________________________________________
conv2d_23 (Conv2D)              (None, 19, 19, 425)  435625      leaky_re_lu_22[0][0]
==================================================================================================
Total params: 50,983,561
Trainable params: 50,962,889
Non-trainable params: 20,672
__________________________________________________________________________________________________
yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
def predict(sess, image_file, is_show_info=True, is_plot=True):
    """Run the graph stored in sess to predict boxes for image_file, then print and plot the predictions.

    Arguments:
    sess -- the TensorFlow/Keras session containing the YOLO graph.
    image_file -- name of an image stored in the images folder

    Returns:
    out_scores -- tensor of shape (None,), predicted scores of the boxes.
    out_boxes -- tensor of shape (None, 4), locations of the boxes.
    out_classes -- tensor of shape (None,), predicted class indices of the boxes.
    """
    import imageio

    # Preprocess the image
    image, image_data = yolo_utils.preprocess_image("images/" + image_file, model_image_size=(608, 608))

    # Run the session, feeding the right placeholders in feed_dict.
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes],
                                                  feed_dict={yolo_model.input: image_data, K.learning_phase(): 0})

    # Print prediction info
    if is_show_info:
        print("Found " + str(len(out_boxes)) + " boxes in " + str(image_file) + ".")

    # Colors used for drawing the bounding boxes
    colors = yolo_utils.generate_colors(class_names)

    # Draw the bounding boxes on the image
    yolo_utils.draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)

    # Save the image with the boxes drawn on it
    image.save(os.path.join("out", image_file), quality=100)

    # Display the image with the boxes drawn on it
    if is_plot:
        # output_image = scipy.misc.imread(os.path.join("out", image_file))
        output_image = imageio.imread(os.path.join("out", image_file))
        plt.imshow(output_image)

    return out_scores, out_boxes, out_classes
out_scores, out_boxes, out_classes = predict(sess, "test.jpg")
Found 7 boxes in test.jpg.
car 0.60 (925, 285) (1045, 374)
car 0.66 (706, 279) (786, 350)
bus 0.67 (5, 266) (220, 407)
car 0.70 (947, 324) (1280, 705)
car 0.74 (159, 303) (346, 440)
car 0.80 (761, 282) (942, 412)
car 0.89 (367, 300) (745, 648)


for i in range(1, 121):
    # Pad the index with leading zeros to 4 digits, e.g. 1 -> "0001"
    filename = str(i).zfill(4) + ".jpg"
    print("Current file: " + str(filename))

    # Run the prediction without printing info and without plotting
    out_scores, out_boxes, out_classes = predict(sess, filename, is_show_info=False, is_plot=False)

print("Done drawing!")
Current file: 0001.jpg
Current file: 0002.jpg
Current file: 0003.jpg
car 0.69 (347, 289) (445, 321)
car 0.70 (230, 307) (317, 354)
car 0.73 (671, 284) (770, 315)
Current file: 0004.jpg
car 0.63 (400, 285) (515, 327)
car 0.66 (95, 297) (227, 342)
car 0.68 (1, 321) (121, 410)
car 0.72 (539, 277) (658, 318)
Current file: 0005.jpg
car 0.64 (207, 297) (338, 340)
car 0.65 (741, 266) (918, 313)
car 0.67 (15, 313) (128, 362)
car 0.72 (883, 260) (1026, 303)
car 0.75 (517, 282) (689, 336)
Current file: 0006.jpg
car 0.72 (470, 286) (686, 343)
car 0.72 (72, 320) (220, 367)
Current file: 0007.jpg
car 0.67 (1086, 243) (1225, 312)
car 0.78 (468, 292) (685, 353)
Current file: 0008.jpg
truck 0.63 (852, 252) (1083, 330)
car 0.78 (1082, 275) (1275, 340)
Current file: 0009.jpg
Current file: 0010.jpg
truck 0.66 (736, 266) (1054, 368)
Current file: 0011.jpg
truck 0.73 (727, 269) (1054, 376)
car 0.85 (6, 336) (212, 457)
Current file: 0012.jpg
car 0.77 (792, 279) (1163, 408)
car 0.87 (539, 330) (998, 459)
Current file: 0013.jpg
truck 0.65 (718, 276) (1053, 385)
Current file: 0014.jpg
truck 0.64 (715, 274) (1056, 385)
Current file: 0015.jpg
truck 0.72 (713, 275) (1086, 386)
Current file: 0016.jpg
truck 0.63 (708, 276) (1106, 388)
Current file: 0017.jpg
truck 0.64 (666, 274) (1063, 392)
car 0.79 (1103, 300) (1271, 356)
car 0.82 (1, 358) (183, 427)
Current file: 0018.jpg
car 0.76 (71, 362) (242, 419)
car 0.77 (340, 339) (553, 401)
Current file: 0019.jpg
truck 0.64 (685, 275) (1050, 396)
car 0.85 (16, 377) (450, 559)
Current file: 0020.jpg
truck 0.75 (538, 286) (926, 413)
Current file: 0021.jpg
car 0.62 (691, 292) (914, 403)
truck 0.72 (88, 317) (493, 450)
Current file: 0022.jpg
car 0.65 (894, 302) (980, 348)
car 0.79 (751, 318) (879, 370)
Current file: 0023.jpg
Current file: 0024.jpg
Current file: 0025.jpg
car 0.65 (664, 296) (705, 321)
Current file: 0026.jpg
Current file: 0027.jpg
Current file: 0028.jpg
car 0.72 (711, 303) (792, 368)
Current file: 0029.jpg
truck 0.65 (698, 282) (781, 336)
Current file: 0030.jpg
Current file: 0031.jpg
car 0.67 (187, 316) (313, 409)
Current file: 0032.jpg
Current file: 0033.jpg
car 0.62 (899, 279) (964, 306)
Current file: 0034.jpg
traffic light 0.61 (200, 107) (228, 170)
car 0.70 (179, 326) (312, 424)
Current file: 0035.jpg
car 0.62 (1084, 278) (1194, 319)
Current file: 0036.jpg
car 0.65 (211, 313) (349, 402)
car 0.73 (1014, 274) (1201, 338)
Current file: 0037.jpg
car 0.63 (326, 302) (419, 365)
Current file: 0038.jpg
Current file: 0039.jpg
car 0.67 (312, 301) (398, 364)
Current file: 0040.jpg
car 0.61 (330, 299) (415, 363)
Current file: 0041.jpg
car 0.65 (341, 294) (415, 367)
Current file: 0042.jpg
Current file: 0043.jpg
car 0.76 (118, 312) (237, 384)
Current file: 0044.jpg
car 0.61 (551, 283) (624, 329)
Current file: 0045.jpg
traffic light 0.70 (383, 40) (416, 101)
traffic light 0.73 (569, 33) (604, 102)
Current file: 0046.jpg
Current file: 0047.jpg
Current file: 0048.jpg
Current file: 0049.jpg
Current file: 0050.jpg
Current file: 0051.jpg
car 0.68 (151, 323) (247, 379)
traffic light 0.72 (500, 79) (532, 138)
Current file: 0052.jpg
Current file: 0053.jpg
Current file: 0054.jpg
car 0.63 (726, 293) (800, 353)
car 0.72 (786, 292) (941, 410)
Current file: 0055.jpg
car 0.73 (758, 277) (904, 389)
Current file: 0056.jpg
Current file: 0057.jpg
Current file: 0058.jpg
Current file: 0059.jpg
car 0.77 (0, 307) (257, 464)
car 0.82 (570, 277) (864, 417)
car 0.86 (86, 319) (527, 493)
Current file: 0060.jpg
Current file: 0061.jpg
Current file: 0062.jpg
Current file: 0063.jpg
Current file: 0064.jpg
Current file: 0065.jpg
car 0.69 (380, 270) (462, 324)
Current file: 0066.jpg
traffic light 0.62 (532, 68) (564, 113)
car 0.77 (372, 281) (454, 333)
Current file: 0067.jpg
traffic light 0.65 (535, 60) (570, 105)
car 0.70 (369, 280) (454, 345)
Current file: 0068.jpg
traffic light 0.64 (378, 87) (405, 146)
traffic light 0.64 (536, 60) (572, 108)
car 0.66 (367, 288) (450, 348)
Current file: 0069.jpg
traffic light 0.60 (537, 62) (577, 109)
car 0.62 (367, 289) (450, 346)
traffic light 0.63 (379, 87) (407, 147)
Current file: 0070.jpg
car 0.65 (369, 291) (452, 354)
Current file: 0071.jpg
truck 0.70 (87, 287) (569, 450)
Current file: 0072.jpg
traffic light 0.61 (535, 65) (572, 111)
traffic light 0.62 (378, 91) (406, 148)
car 0.62 (291, 301) (357, 351)
truck 0.64 (1049, 263) (1280, 399)
car 0.64 (0, 331) (84, 449)
car 0.66 (368, 292) (450, 357)
Current file: 0073.jpg
car 0.74 (145, 313) (248, 374)
car 0.85 (503, 299) (858, 421)
Current file: 0074.jpg
traffic light 0.60 (380, 91) (407, 147)
car 0.61 (365, 294) (450, 346)
car 0.87 (31, 319) (424, 485)
Current file: 0075.jpg
car 0.75 (151, 315) (246, 372)
Current file: 0076.jpg
traffic light 0.62 (381, 93) (407, 146)
car 0.66 (246, 298) (336, 366)
car 0.70 (369, 292) (451, 357)
car 0.75 (150, 313) (245, 375)
Current file: 0077.jpg
traffic light 0.60 (536, 65) (571, 112)
traffic light 0.60 (380, 92) (407, 147)
car 0.70 (243, 296) (345, 368)
car 0.71 (368, 292) (450, 356)
car 0.75 (150, 313) (245, 374)
Current file: 0078.jpg
traffic light 0.62 (380, 92) (407, 146)
car 0.65 (367, 293) (453, 353)
car 0.70 (242, 295) (339, 367)
car 0.75 (151, 313) (245, 373)
Current file: 0079.jpg
traffic light 0.61 (535, 65) (571, 111)
traffic light 0.61 (378, 92) (406, 148)
car 0.72 (235, 298) (327, 367)
car 0.76 (151, 314) (243, 373)
Current file: 0080.jpg
traffic light 0.61 (379, 92) (407, 147)
car 0.64 (5, 309) (188, 416)
car 0.71 (237, 298) (324, 366)
car 0.79 (714, 282) (916, 362)
Current file: 0081.jpg
car 0.68 (187, 309) (301, 381)
car 0.76 (612, 293) (722, 353)
car 0.79 (25, 328) (141, 398)
Current file: 0082.jpg
traffic light 0.61 (380, 92) (408, 147)
car 0.62 (585, 287) (660, 335)
car 0.70 (410, 282) (609, 388)
Current file: 0083.jpg
traffic light 0.63 (380, 91) (407, 148)
car 0.64 (0, 328) (82, 447)
car 0.72 (609, 280) (888, 397)
Current file: 0084.jpg
traffic light 0.62 (378, 91) (406, 150)
car 0.70 (990, 272) (1270, 381)
Current file: 0085.jpg
traffic light 0.61 (535, 64) (571, 114)
traffic light 0.61 (378, 91) (406, 150)
Current file: 0086.jpg
traffic light 0.60 (378, 92) (407, 150)
traffic light 0.61 (536, 65) (572, 113)
Current file: 0087.jpg
truck 0.60 (0, 315) (93, 404)
traffic light 0.60 (536, 66) (572, 112)
traffic light 0.61 (382, 92) (410, 149)
Current file: 0088.jpg
traffic light 0.61 (377, 92) (406, 150)
Current file: 0089.jpg
Current file: 0090.jpg
Current file: 0091.jpg
traffic light 0.71 (300, 76) (333, 159)
Current file: 0092.jpg
traffic light 0.62 (232, 25) (266, 95)
Current file: 0093.jpg
car 0.60 (361, 313) (414, 341)
Current file: 0094.jpg
Current file: 0095.jpg
Current file: 0096.jpg
car 0.68 (202, 319) (301, 369)
Current file: 0097.jpg
car 0.69 (74, 330) (188, 395)
car 0.69 (235, 315) (336, 375)
Current file: 0098.jpg
car 0.60 (747, 289) (811, 356)
car 0.63 (836, 291) (968, 405)
car 0.81 (898, 315) (1113, 452)
Current file: 0099.jpg
car 0.75 (1046, 352) (1279, 608)
car 0.78 (859, 316) (972, 427)
car 0.86 (921, 336) (1120, 476)
Current file: 0100.jpg
Current file: 0101.jpg
Current file: 0102.jpg
bus 0.79 (180, 259) (304, 362)
Current file: 0103.jpg
bus 0.74 (0, 286) (203, 420)
Current file: 0104.jpg
traffic light 0.62 (241, 24) (273, 97)
truck 0.82 (0, 223) (291, 421)
Current file: 0105.jpg
Current file: 0106.jpg
car 0.62 (1200, 287) (1276, 432)
Current file: 0107.jpg
car 0.60 (376, 309) (421, 335)
Current file: 0108.jpg
Current file: 0109.jpg
Current file: 0110.jpg
Current file: 0111.jpg
car 0.61 (97, 322) (180, 366)
fire hydrant 0.63 (1177, 374) (1237, 483)
Current file: 0112.jpg
Current file: 0113.jpg
Current file: 0114.jpg
Current file: 0115.jpg
Current file: 0116.jpg
traffic light 0.63 (522, 76) (543, 113)
car 0.80 (5, 271) (241, 672)
Current file: 0117.jpg
Current file: 0118.jpg
Current file: 0119.jpg
traffic light 0.61 (1056, 0) (1138, 131)
Current file: 0120.jpg
Done drawing!
