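Note: This series of articles covers SSD, from the algorithm's principles and code through to deployment on the RK3588 chip. All environments use TF 2.2; for the installation steps, please refer to other articles online.

I. Introduction to SSD

SSD is a strong one-stage object detection algorithm: it localizes and classifies objects in a single pass. The main idea is to extract features with a CNN and then sample densely at different positions across the image, using boxes of different scales and aspect ratios. Classification and bounding-box regression are done at the same time, which is why it is fast.

II. How SSD is implemented

1. Backbone network

Figure 1: SSD architecture

This figure shows the SSD network structure clearly. The backbone is VGG, with the following modifications:

1) FC6 and FC7 of VGG are converted into convolutional layers

2) All Dropout layers and the FC8 layer are removed

3) Conv6, Conv7, Conv8, and Conv9 are added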

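The overall pipeline:

1) Input: a 300×300×3 image (three RGB channels)

2) Conv1: two [3,3] convolutions, output [300,300,64]; then [2,2] max pooling with stride 2, output [150,150,64]

3) Conv2: two [3,3] convolutions, output [150,150,128]; then [2,2] max pooling with stride 2, output [75,75,128]

4) Conv3: three [3,3] convolutions, output [75,75,256]; then [2,2] max pooling with stride 2, output [38,38,256] (75 is odd, so the 'same'-padded pooling rounds up to 38)

5) Conv4: three [3,3] convolutions, output [38,38,512]; then [2,2] max pooling with stride 2, output [19,19,512]

6) Conv5: three [3,3] convolutions, output [19,19,512]; then [3,3] max pooling with stride 1, output [19,19,512]

7) FC6, FC7: one [3,3] convolution (dilated, with rate 6 in the code) and one [1,1] convolution, each with 1024 output channels, output [19,19,1024]

8) Conv6: one [1,1] convolution to adjust the channel count, then one [3,3] convolution with stride 2, output [10,10,512]

9) Conv7: one [1,1] convolution to adjust the channel count, then one [3,3] convolution with stride 2, output [5,5,256]

10) Conv8: one [1,1] convolution to adjust the channel count, then one [3,3] convolution with valid padding, output [3,3,256]

11) Conv9: one [1,1] convolution to adjust the channel count, then one [3,3] convolution with valid padding, output [1,1,256]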
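The spatial sizes in the list above can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of the model: the helper names (`conv_out`, `same_pool_out`, `trace_ssd300_sizes`) are made up for illustration, and it assumes 'same'-padded pooling (output = ceil(size / stride)) and one pixel of zero padding before the stride-2 convolutions in Conv6 and Conv7.

```python
import math


def conv_out(size, kernel, stride, pad):
    """Output size of a convolution with explicit symmetric zero padding."""
    return (size + 2 * pad - kernel) // stride + 1


def same_pool_out(size, stride):
    """Output size of a 'same'-padded pooling layer: ceil(size / stride)."""
    return math.ceil(size / stride)


def trace_ssd300_sizes(size=300):
    """Follow the spatial size of the feature map through the SSD300 backbone."""
    sizes = {}
    for name in ("pool1", "pool2", "pool3", "pool4"):
        if name == "pool4":
            sizes["conv4_3"] = size          # conv4_3 is read before pool4
        size = same_pool_out(size, 2)
        sizes[name] = size
    sizes["pool5"] = same_pool_out(size, 1)  # 3x3 pool, stride 1: size unchanged
    sizes["fc7"] = size
    for name in ("conv6_2", "conv7_2"):      # zero padding of 1, then 3x3 stride-2 conv
        size = conv_out(size, kernel=3, stride=2, pad=1)
        sizes[name] = size
    for name in ("conv8_2", "conv9_2"):      # 3x3 conv with valid padding
        size = conv_out(size, kernel=3, stride=1, pad=0)
        sizes[name] = size
    return sizes


sizes = trace_ssd300_sizes()
print([sizes[k] for k in ("conv4_3", "fc7", "conv6_2", "conv7_2", "conv8_2", "conv9_2")])
# [38, 19, 10, 5, 3, 1]
```

The six values printed are the resolutions of the six feature layers that feed the detection heads.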

2. Backbone network code

The overall structure of the SSD network is fairly clear. The implementation is below:

import numpy as np
import tensorflow.keras.backend as K
from tensorflow.keras.layers import (Activation, Concatenate, Conv2D, Flatten, Input,
                                     InputSpec, Layer, MaxPooling2D, Reshape, ZeroPadding2D)
from tensorflow.keras.models import Model
from tensorflow.keras.regularizers import l2


# L2-normalizes along the channel axis, then rescales with a per-channel gamma initialized to `scale`
class Normalize(Layer):
    def __init__(self, scale, **kwargs):
        self.axis = 3
        self.scale = scale
        super(Normalize, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        shape = (input_shape[self.axis],)
        init_gamma = self.scale * np.ones(shape)
        self.gamma = K.variable(init_gamma, name='{}_gamma'.format(self.name))

    def call(self, x, mask=None):
        output = K.l2_normalize(x, self.axis)
        output *= self.gamma
        return output


# class_num is the number of object classes to detect; it is required
# input_shape is normally (300, 300, 3)
def ssd_net(class_num, input_shape=(300, 300, 3), weight_decay=5e-4):
    # The first layers of SSD are VGG
    input_tensor = Input(shape=input_shape)
    print('input_tensor: ' + str(input_tensor))
    # The SSD model; net is a dict of layer outputs
    net = {}
    # Block 0: input layer
    net['input'] = input_tensor

    # Block 1: 300,300,3 -> 150,150,64
    # Two [3, 3] convolutions with 64 channels give [300, 300, 64]; 2x2 max pooling with stride 2 gives [150, 150, 64]
    net['conv1_1'] = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv1_1')(net['input'])
    net['conv1_2'] = Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv1_2')(net['conv1_1'])
    net['pool1'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='pool1')(net['conv1_2'])

    # Block 2: 150,150,64 -> 75,75,128
    # Two [3, 3] convolutions with 128 channels give [150, 150, 128]; 2x2 max pooling with stride 2 gives [75, 75, 128]
    net['conv2_1'] = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv2_1')(net['pool1'])
    net['conv2_2'] = Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv2_2')(net['conv2_1'])
    net['pool2'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='pool2')(net['conv2_2'])

    # Block 3: 75,75,128 -> 38,38,256
    # Three [3, 3] convolutions with 256 channels give [75, 75, 256]; 2x2 max pooling with stride 2 gives [38, 38, 256]
    net['conv3_1'] = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv3_1')(net['pool2'])
    net['conv3_2'] = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv3_2')(net['conv3_1'])
    net['conv3_3'] = Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv3_3')(net['conv3_2'])
    net['pool3'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='pool3')(net['conv3_3'])

    # Block 4: 38,38,256 -> 19,19,512
    # Three [3, 3] convolutions with 512 channels give [38, 38, 512]; 2x2 max pooling with stride 2 gives [19, 19, 512]
    net['conv4_1'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv4_1')(net['pool3'])
    net['conv4_2'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv4_2')(net['conv4_1'])
    net['conv4_3'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv4_3')(net['conv4_2'])
    net['pool4'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same', name='pool4')(net['conv4_3'])

    # Block 5: 19,19,512 -> 19,19,512
    # Three [3, 3] convolutions with 512 channels give [19, 19, 512]; 3x3 max pooling with stride 1 keeps [19, 19, 512]
    net['conv5_1'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv5_1')(net['pool4'])
    net['conv5_2'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv5_2')(net['conv5_1'])
    net['conv5_3'] = Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv5_3')(net['conv5_2'])
    net['pool5'] = MaxPooling2D((3, 3), strides=(1, 1), padding='same', name='pool5')(net['conv5_3'])

    # FC6: 19,19,512 -> 19,19,1024
    # fc6 is a [3, 3] convolution (dilation rate 6) and fc7 a [1, 1] convolution, both with 1024 channels
    net['fc6'] = Conv2D(1024, kernel_size=(3, 3), dilation_rate=(6, 6), activation='relu', padding='same',
                        kernel_regularizer=l2(weight_decay), name='fc6')(net['pool5'])
    # FC7: 19,19,1024 -> 19,19,1024
    net['fc7'] = Conv2D(1024, kernel_size=(1, 1), activation='relu', padding='same',
                        kernel_regularizer=l2(weight_decay), name='fc7')(net['fc6'])
    # ------------- Everything above is VGG (with fc6 and fc7 modified) ------------- #

    # Block 6: 19,19,1024 -> 10,10,512
    # One [1, 1] convolution to adjust the channel count, then a stride-2 [3, 3] convolution with 512 channels
    net['conv6_1'] = Conv2D(256, kernel_size=(1, 1), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv6_1')(net['fc7'])
    # ZeroPadding2D adds one row/column of zeros on every side (rows + 2, columns + 2),
    # so the following valid-padded stride-2 convolution turns 19 into 10
    net['conv6_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv6_padding')(net['conv6_1'])
    net['conv6_2'] = Conv2D(512, kernel_size=(3, 3), strides=(2, 2), activation='relu',
                            kernel_regularizer=l2(weight_decay), name='conv6_2')(net['conv6_2'])

    # Block 7: 10,10,512 -> 5,5,256
    # One [1, 1] convolution to adjust the channel count, then a stride-2 [3, 3] convolution with 256 channels
    net['conv7_1'] = Conv2D(128, kernel_size=(1, 1), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv7_1')(net['conv6_2'])
    net['conv7_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv7_padding')(net['conv7_1'])
    net['conv7_2'] = Conv2D(256, kernel_size=(3, 3), strides=(2, 2), activation='relu', padding='valid',
                            kernel_regularizer=l2(weight_decay), name='conv7_2')(net['conv7_2'])

    # Block 8: 5,5,256 -> 3,3,256
    # One [1, 1] convolution to adjust the channel count, then a valid-padded [3, 3] convolution with 256 channels
    net['conv8_1'] = Conv2D(128, kernel_size=(1, 1), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv8_1')(net['conv7_2'])
    net['conv8_2'] = Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='valid',
                            kernel_regularizer=l2(weight_decay), name='conv8_2')(net['conv8_1'])

    # Block 9: 3,3,256 -> 1,1,256
    # One [1, 1] convolution to adjust the channel count, then a valid-padded [3, 3] convolution with 256 channels
    net['conv9_1'] = Conv2D(128, kernel_size=(1, 1), activation='relu', padding='same',
                            kernel_regularizer=l2(weight_decay), name='conv9_1')(net['conv8_2'])
    net['conv9_2'] = Conv2D(256, kernel_size=(3, 3), strides=(1, 1), activation='relu', padding='valid',
                            kernel_regularizer=l2(weight_decay), name='conv9_2')(net['conv9_1'])
    # ------------------- End of the backbone feature extractor ------------------- #

    # ------------------- Process the extracted backbone features ------------------ #
    # L2-normalize the channels of conv4_3: 38,38,512
    net['conv4_3_norm'] = Normalize(20, name='conv4_3_norm')(net['conv4_3'])

    # (source layer, prior boxes per grid cell), with the feature shape noted on the right
    head_sources = [
        ('conv4_3_norm', 4),  # 38,38,512
        ('fc7', 6),           # 19,19,1024
        ('conv6_2', 6),       # 10,10,512
        ('conv7_2', 6),       # 5,5,256
        ('conv8_2', 4),       # 3,3,256
        ('conv9_2', 4),       # 1,1,256
    ]
    for src, num_priors in head_sources:
        # Box regression head: num_priors boxes per cell, 4 = adjustments to x, y, h, w
        net[src + '_mbox_loc'] = Conv2D(num_priors * 4, kernel_size=(3, 3), padding='same',
                                        kernel_regularizer=l2(weight_decay),
                                        name=src + '_mbox_loc')(net[src])
        net[src + '_mbox_loc_flat'] = Flatten(name=src + '_mbox_loc_flat')(net[src + '_mbox_loc'])
        # Classification head: num_priors boxes per cell, class_num scores per box
        net[src + '_mbox_conf'] = Conv2D(num_priors * class_num, kernel_size=(3, 3), padding='same',
                                         kernel_regularizer=l2(weight_decay),
                                         name=src + '_mbox_conf')(net[src])
        net[src + '_mbox_conf_flat'] = Flatten(name=src + '_mbox_conf_flat')(net[src + '_mbox_conf'])

    # NOTE: the original listing uses net['predictions'] without ever defining it.
    # The block below is a reconstruction (an assumption), inferred from how the decoding code
    # slices the output: [..., :4] is the box regression and [..., 4:] the class confidences.
    net['mbox_loc'] = Concatenate(axis=1, name='mbox_loc')(
        [net[src + '_mbox_loc_flat'] for src, _ in head_sources])
    net['mbox_conf'] = Concatenate(axis=1, name='mbox_conf')(
        [net[src + '_mbox_conf_flat'] for src, _ in head_sources])
    net['mbox_loc'] = Reshape((-1, 4), name='mbox_loc_final')(net['mbox_loc'])              # 8732,4
    net['mbox_conf'] = Reshape((-1, class_num), name='mbox_conf_logits')(net['mbox_conf'])  # 8732,class_num
    net['mbox_conf'] = Activation('softmax', name='mbox_conf_final')(net['mbox_conf'])
    net['predictions'] = Concatenate(axis=-1, name='predictions')([net['mbox_loc'], net['mbox_conf']])

    # Final model: the input is net['input'], the output is net['predictions']
    model = Model(net['input'], net['predictions'])
    return model

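At this point the backbone, which runs across the top of Figure 1, has been covered. This part extracts features; next comes the part that uses them.

In Figure 1, the outputs of the third convolution of Conv4, FC7, and the second convolutions of Conv6, Conv7, Conv8, and Conv9 all branch downward. These six layers are processed further and yield feature maps at different resolutions, which is what lets SSD detect objects of different sizes.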

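3. Prior boxes

Before going further into the six feature layers, let's first look at prior boxes.

Figure 2: An image divided into 8×8

Suppose a [300,300] image is divided into an [8,8] grid (note: no layer in the SSD network actually divides the image into [8,8]; this is only an example). The prior boxes are the dashed boxes in Figure 2. If an object lies inside the four dashed boxes at the lower left, it can be found through those four boxes. The sizes of the four dashed boxes are not fixed; the specific parameters have to be learned in training. However large an object is, dividing the image at some granularity always gives a dashed box that encloses it (this is a simplification: in practice an object may not fit in the image, or may be only partly inside it). So training an SSD model really means learning the relationship between the four parameters (x,y,w,h) of each dashed box and the actual object. The object's actual (x,y,w,h) is annotated in advance with a labeling tool. Adding the object's name gives five parameters in total: (x,y,w,h,name).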
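To make the 8×8 example concrete, here is a small sketch that computes the centre of every grid cell on a 300×300 image. It is an illustration only: `grid_centers` is a hypothetical helper name, and the 8×8 grid is the article's teaching example rather than a real SSD layer.

```python
import numpy as np


def grid_centers(img_size=300, grid=8):
    """Return the (x, y) centre of every cell when the image is split into grid x grid cells."""
    step = img_size / grid                                         # side length of one cell in pixels
    line = np.linspace(0.5 * step, img_size - 0.5 * step, grid)    # centres along one axis
    cx, cy = np.meshgrid(line, line)
    return np.stack([cx.ravel(), cy.ravel()], axis=1)              # shape (grid * grid, 2)


centers = grid_centers()
print(centers.shape)  # (64, 2)
print(centers[0])     # [18.75 18.75]
```

Each of these centres is where a set of prior boxes is anchored; the real layers use the same construction with grids of 38, 19, 10, 5, 3, and 1 cells per side.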

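These dashed boxes are called prior boxes, and training an SSD model means training the parameters attached to them. SSD does not learn (x,y,w,h) directly; it learns the relationship between the object's true coordinates and a prior box (x0,y0,w0,h0) that is set in advance. The ground truth therefore has to be converted (encoded) once before training, and at prediction time the outputs have to be converted back (decoded) before they are displayed.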
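The conversion in both directions can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's training code: it uses the common SSD parameterization (centre offsets scaled by the prior's size, log-ratios for width and height) with variances (0.1, 0.1, 0.2, 0.2), chosen so that `decode_box` here mirrors the decoding formulas used in this article. The function names and the sample boxes are hypothetical.

```python
import numpy as np

VARIANCES = (0.1, 0.1, 0.2, 0.2)  # assumed scaling factors


def _center_form(box):
    """(xmin, ymin, xmax, ymax) -> (cx, cy, w, h)."""
    box = np.asarray(box, dtype=float)
    w, h = box[2] - box[0], box[3] - box[1]
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h


def encode_box(gt, prior, variances=VARIANCES):
    """Turn a ground-truth box into regression targets relative to one prior box."""
    g_cx, g_cy, g_w, g_h = _center_form(gt)
    p_cx, p_cy, p_w, p_h = _center_form(prior)
    return np.array([
        (g_cx - p_cx) / p_w / variances[0],
        (g_cy - p_cy) / p_h / variances[1],
        np.log(g_w / p_w) / variances[2],
        np.log(g_h / p_h) / variances[3],
    ])


def decode_box(loc, prior, variances=VARIANCES):
    """Inverse of encode_box: turn predicted offsets back into corner coordinates."""
    p_cx, p_cy, p_w, p_h = _center_form(prior)
    cx = p_cx + loc[0] * p_w * variances[0]
    cy = p_cy + loc[1] * p_h * variances[1]
    w = np.exp(loc[2] * variances[2]) * p_w
    h = np.exp(loc[3] * variances[3]) * p_h
    return np.array([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h])


prior = [0.25, 0.25, 0.55, 0.75]   # made-up prior box, normalized coordinates
gt = [0.20, 0.30, 0.60, 0.70]      # made-up ground-truth box
target = encode_box(gt, prior)
print(np.allclose(decode_box(target, prior), gt))  # True
```

Because the width and height are encoded as logarithms of a ratio, decoding has to apply the exponential, which also guarantees that a decoded width or height is always positive.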

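For the six feature layers, the number of prior boxes per pixel (grid cell) is [4,6,6,6,4,4] (each layer has a different resolution).

For Conv4, the image becomes a [38,38] grid (the counterpart of the [8,8] example above) with 4 prior boxes per cell, so this layer has 38×38×4=5776 prior boxes. Likewise, the other layers have 2166, 600, 150, 36, and 4 prior boxes, for a total of 8732. In other words, an SSD model places 8732 prior boxes of different sizes and positions on a [300,300] image and uses them to detect objects. One object may be detected by several prior boxes, so non-maximum suppression is applied to the detections to keep the most suitable one as the final result.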
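The arithmetic above is easy to verify. The snippet below only recomputes the counts from the grid sizes and the per-cell numbers given in the text; the variable names are arbitrary.

```python
feature_sizes = [38, 19, 10, 5, 3, 1]    # grid resolution of each feature layer
priors_per_cell = [4, 6, 6, 6, 4, 4]     # prior boxes per grid cell

counts = [s * s * k for s, k in zip(feature_sizes, priors_per_cell)]
total = sum(counts)

print(counts)  # [5776, 2166, 600, 150, 36, 4]
print(total)   # 8732
```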

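Each of the six feature layers goes through a convolution with anchors_num×4 output channels (this is the process described above; as I understand it, the 4 stands for (x,y,w,h)). In addition, each layer goes through a convolution with anchors_num×classes_num output channels, which predicts the object class, so every prior box gets its own class prediction.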
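How the outputs of these two convolutions line up with the 8732 prior boxes can be sketched with plain NumPy arrays standing in for the real feature maps. This is an illustration under assumptions: `merge_head_outputs` is a hypothetical name, the arrays are zero-filled placeholders rather than network outputs, and `class_num=21` is just an example value.

```python
import numpy as np


def merge_head_outputs(class_num, feature_sizes=(38, 19, 10, 5, 3, 1),
                       priors_per_cell=(4, 6, 6, 6, 4, 4)):
    """Flatten per-layer head outputs and stack them into one row per prior box."""
    loc_parts, conf_parts = [], []
    for size, k in zip(feature_sizes, priors_per_cell):
        loc_map = np.zeros((size, size, k * 4))            # stand-in for the anchors_num x 4 conv output
        conf_map = np.zeros((size, size, k * class_num))   # stand-in for the anchors_num x classes_num conv output
        loc_parts.append(loc_map.reshape(-1))
        conf_parts.append(conf_map.reshape(-1))
    loc = np.concatenate(loc_parts).reshape(-1, 4)             # one (x, y, w, h) row per prior box
    conf = np.concatenate(conf_parts).reshape(-1, class_num)   # one score row per prior box
    return np.concatenate([loc, conf], axis=1)


pred = merge_head_outputs(class_num=21)
print(pred.shape)  # (8732, 25)
```

Each row holds the four box adjustments followed by the class scores for one prior box, so the first four columns and the remaining columns can be sliced apart again when decoding.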

The prior box code is below; this part generates the 8732 prior boxes.

import numpy as np


# Compute the anchor (prior) boxes of each effective feature layer
# The six layers are
# 38*38*4  19*19*6  10*10*6  5*5*6  3*3*4  1*1*4
#   5776     2166     600     150     36      4
class AnchorBox():
    def __init__(self, input_shape, min_size, max_size=None, aspect_ratios=None, flip=True):
        self.input_shape = input_shape
        self.min_size = min_size
        self.max_size = max_size
        # e.g. aspect_ratios [1, 2] expands to [1, 1.0, 2, 0.5]
        self.aspect_ratios = []
        for ar in aspect_ratios:
            self.aspect_ratios.append(ar)
            self.aspect_ratios.append(1.0 / ar)

    def call(self, layer_shape, mask=None):
        # Height and width of the incoming feature layer, e.g. 38x38
        layer_height = layer_shape[0]
        layer_width = layer_shape[1]
        # Height and width of the input image, e.g. 300x300
        img_height = self.input_shape[0]
        img_width = self.input_shape[1]

        box_widths = []
        box_heights = []
        # self.aspect_ratios normally takes one of two forms:
        #   [1, 1, 2, 1/2]
        #   [1, 1, 2, 1/2, 3, 1/3]
        for ar in self.aspect_ratios:
            # First add a smaller square
            if ar == 1 and len(box_widths) == 0:
                box_widths.append(self.min_size)
                box_heights.append(self.min_size)
            # Then add a larger square
            elif ar == 1 and len(box_widths) > 0:
                box_widths.append(np.sqrt(self.min_size * self.max_size))
                box_heights.append(np.sqrt(self.min_size * self.max_size))
            # Then add the rectangles
            elif ar != 1:
                box_widths.append(self.min_size * np.sqrt(ar))
                box_heights.append(self.min_size / np.sqrt(ar))

        # Half of the width and height of every prior box
        box_widths = 0.5 * np.array(box_widths)
        box_heights = 0.5 * np.array(box_heights)

        # Stride of this feature layer: the layer splits the image into
        # [layer_width, layer_height] cells, and each cell covers step_x by step_y pixels of [300, 300].
        # For a [3, 3] layer, step_x = 300 / 3 = 100
        step_x = img_width / layer_width
        step_y = img_height / layer_height

        # Generate the grid centres, from left to right, for every cell of the feature layer.
        # linx and liny have layer_width and layer_height entries (here [3, 3])
        linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x, layer_width)    # [ 50. 150. 250.]
        liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y, layer_height)  # [ 50. 150. 250.]

        # Turn the x and y sequences into coordinate grids of shape (layer_height, layer_width), here 3x3
        centers_x, centers_y = np.meshgrid(linx, liny)
        centers_x = centers_x.reshape(-1, 1)  # flatten the 3x3 grid to 9 rows, 1 column
        centers_y = centers_y.reshape(-1, 1)

        # Each prior box needs (centers_x, centers_y) twice: once for the top-left corner,
        # once for the bottom-right corner
        num_anchors_ = len(self.aspect_ratios)  # 4
        # 9 rows, one (x, y) centre per row; the first centre is (50, 50)
        anchor_boxes = np.concatenate((centers_x, centers_y), axis=1)
        # 9 rows, 16 columns (num_anchors_ * 4 = 16): each centre is repeated 8 times
        anchor_boxes = np.tile(anchor_boxes, (1, 2 * num_anchors_))

        # Top-left and bottom-right corners of the prior boxes.
        # Every 4th column starting at 0 is an x_min; box_widths has one entry per prior box,
        # so each x_min column is reduced by the matching half-width
        anchor_boxes[:, ::4] -= box_widths
        anchor_boxes[:, 1::4] -= box_heights
        anchor_boxes[:, 2::4] += box_widths
        anchor_boxes[:, 3::4] += box_heights

        # Normalize the coordinates to fractions of the image size
        anchor_boxes[:, ::2] /= img_width
        anchor_boxes[:, 1::2] /= img_height
        # 4 columns, rows inferred: 36 rows here, one prior box per row (4 prior boxes per cell)
        anchor_boxes = anchor_boxes.reshape(-1, 4)
        # Clip to [0, 1], which removes the negative values
        anchor_boxes = np.minimum(np.maximum(anchor_boxes, 0.0), 1.0)
        return anchor_boxes


#---------------------------------------------------#
#   Computes the sizes of the feature layers
#   For (height, width) = (300, 300) the sizes are [150, 75, 38, 19, 10, 5, 3, 1]
#   The last six, [38, 19, 10, 5, 3, 1], are the resolutions of the six effective feature layers
#   [38, 38, 512], [19, 19, 1024], [10, 10, 512],
#   [ 5,  5, 256], [ 3,  3,  256], [ 1,  1, 256]
#---------------------------------------------------#
def get_img_output_length(height, width):
    filter_sizes    = [3, 3, 3, 3, 3, 3, 3, 3]
    padding         = [1, 1, 1, 1, 1, 1, 0, 0]
    stride          = [2, 2, 2, 2, 2, 2, 1, 1]
    feature_heights = []
    feature_widths  = []

    for i in range(len(filter_sizes)):
        height = (height + 2 * padding[i] - filter_sizes[i]) // stride[i] + 1
        width  = (width + 2 * padding[i] - filter_sizes[i]) // stride[i] + 1
        feature_heights.append(height)
        feature_widths.append(width)
    return np.array(feature_heights)[-6:], np.array(feature_widths)[-6:]


# Get all anchor boxes, computed on the six effective feature layers
def get_anchors(input_shape=[300, 300], anchors_size=[30, 60, 111, 162, 213, 264, 315]):
    feature_heights, feature_widths = get_img_output_length(input_shape[0], input_shape[1])  # h, w = (300, 300)
    aspect_ratios = [[1, 2], [1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2], [1, 2]]
    anchors = []
    for i in range(len(feature_heights)):
        # Anchor boxes of each effective feature layer, in order:
        # 38*38*4  19*19*6  10*10*6  5*5*6  3*3*4  1*1*4
        #   5776     2166     600     150     36      4
        anchors.append(AnchorBox(input_shape, anchors_size[i], max_size=anchors_size[i + 1],
                                 aspect_ratios=aspect_ratios[i]).call([feature_heights[i], feature_widths[i]]))

    anchors = np.concatenate(anchors, axis=0)
    return anchors
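To see which box shapes one layer ends up with, the size rules can be pulled out on their own. The sketch below restates the rules from the listing (a small square, a larger square of side sqrt(min_size × max_size), then one rectangle per aspect ratio and its reciprocal); `prior_sizes` is a hypothetical helper, not part of the original code.

```python
import math


def prior_sizes(min_size, max_size, aspect_ratios):
    """Return the (width, height) in pixels of every prior box attached to one grid cell."""
    big = math.sqrt(min_size * max_size)
    sizes = [(min_size, min_size), (big, big)]    # small square, then larger square
    for ar in aspect_ratios:
        if ar == 1:
            continue                              # the squares are already covered
        for r in (ar, 1.0 / ar):                  # each ratio and its reciprocal
            sizes.append((min_size * math.sqrt(r), min_size / math.sqrt(r)))
    return sizes


# First feature layer (38x38): min_size 30, max_size 60, aspect ratios [1, 2]
for w, h in prior_sizes(30, 60, [1, 2]):
    print(round(w, 2), round(h, 2))
```

With aspect ratios [1, 2] this gives 4 boxes per cell, and with [1, 2, 3] it gives 6, which matches the per-cell counts [4,6,6,6,4,4].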

This is the encoding and decoding mentioned earlier; the code below implements the decoding side.

import numpy as np
import tensorflow as tf


# Recover the real boxes from the network output
class BBoxUtility(object):
    def __init__(self, num_classes, nms_thresh=0.45, top_k=300):
        self.num_classes = num_classes
        self._nms_thresh = nms_thresh
        self._top_k = top_k

    def ssd_correct_boxes(self, box_xy, box_wh, input_shape, image_shape, letterbox_image):
        # Put the y axis first so the boxes can be multiplied directly by the image (height, width)
        box_yx = box_xy[..., ::-1]
        box_hw = box_wh[..., ::-1]
        input_shape = np.array(input_shape)
        image_shape = np.array(image_shape)

        if letterbox_image:
            # offset is the shift of the valid image area relative to the top-left corner;
            # new_shape is the image size after scaling to fit the network input
            new_shape = np.round(image_shape * np.min(input_shape / image_shape))
            offset = (input_shape - new_shape) / 2. / input_shape
            scale = input_shape / new_shape

            box_yx = (box_yx - offset) * scale
            box_hw *= scale

        box_mins = box_yx - (box_hw / 2.)
        box_maxes = box_yx + (box_hw / 2.)
        boxes = np.concatenate([box_mins[..., 0:1], box_mins[..., 1:2],
                                box_maxes[..., 0:1], box_maxes[..., 1:2]], axis=-1)
        boxes *= np.concatenate([image_shape, image_shape], axis=-1)
        return boxes

    # Process the regression output to get the box positions; decodes the predictions of one image
    # mbox_loc: regression output for the 8732 prior boxes, four values (x, y, w, h) each
    # anchors: all 8732 anchor boxes
    def decode_boxes(self, mbox_loc, anchors, variances=[0.1, 0.1, 0.2, 0.2]):
        # Width and height of each prior box
        # Each anchor holds four values: (x_min, y_min, x_max, y_max), normalized to [0, 1]
        anchor_width = anchors[:, 2] - anchors[:, 0]
        anchor_height = anchors[:, 3] - anchors[:, 1]
        # Centre of each prior box
        anchor_center_x = 0.5 * (anchors[:, 2] + anchors[:, 0])
        anchor_center_y = 0.5 * (anchors[:, 3] + anchors[:, 1])

        # Combine each prediction with its prior box: the prior is fixed and the offset comes from the model,
        # so offset * prior size * variance is the shift of the decoded centre from the prior centre
        decode_bbox_center_x = anchor_center_x + mbox_loc[:, 0] * anchor_width * variances[0]
        decode_bbox_center_y = anchor_center_y + mbox_loc[:, 1] * anchor_height * variances[1]

        # Width and height of the decoded box. The regression target is the logarithm of the
        # width (height) ratio, so decoding applies exp, which also keeps the result positive
        decode_bbox_width = np.exp(mbox_loc[:, 2] * variances[2])
        decode_bbox_width *= anchor_width
        decode_bbox_height = np.exp(mbox_loc[:, 3] * variances[3])
        decode_bbox_height *= anchor_height

        # Top-left and bottom-right corners of the decoded box
        decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
        decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
        decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
        decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height

        # Stack the corners of all boxes into one array
        decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                      decode_bbox_ymin[:, None],
                                      decode_bbox_xmax[:, None],
                                      decode_bbox_ymax[:, None]), axis=-1)
        # Keep the values within [0, 1]
        decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
        return decode_bbox

    # Decode the predictions produced by the SSD model
    # anchors: all anchor boxes
    # image_shape: size of the original image; not fixed, e.g. 1330x1330
    # input_shape: input size of the SSD model, fixed at 300x300
    def decode_box(self, predictions, anchors, image_shape, input_shape, letterbox_image,
                   variances=[0.1, 0.1, 0.2, 0.2], confidence=0.5):
        # [:4] is the box regression for all 8732 prior boxes
        mbox_loc = predictions[:, :, :4]
        # [4:] is the class confidence for all 8732 prior boxes
        mbox_conf = predictions[:, :, 4:]

        results = []
        # Process one image at a time; predict.py feeds a single image, so this loop runs once
        for i in range(len(mbox_loc)):
            results.append([])
            # Decode the prior boxes with the regression output; each decoded box has four values,
            # the top-left and bottom-right corners
            decode_bbox = self.decode_boxes(mbox_loc[i], anchors, variances)

            # Go through every class except background (index 0)
            for c in range(1, self.num_classes):
                # Confidence of every box for this class, and whether it exceeds the threshold
                c_confs = mbox_conf[i, :, c]
                c_confs_m = c_confs > confidence
                # len(c_confs[c_confs_m]) is the number of boxes of this class above the threshold
                if len(c_confs[c_confs_m]) > 0:
                    # Boxes and confidences above the threshold; both have len(c_confs[c_confs_m]) entries
                    boxes_to_process = decode_bbox[c_confs_m]
                    confs_to_process = c_confs[c_confs_m]

                    # IoU-based non-maximum suppression: several boxes may cover the same object,
                    # and only the highest-scoring one is kept for each object.
                    # len(idx) is the number of objects of this class that are finally detected
                    idx = tf.image.non_max_suppression(tf.cast(boxes_to_process, tf.float32),
                                                       tf.cast(confs_to_process, tf.float32),
                                                       self._top_k,
                                                       iou_threshold=self._nms_thresh).numpy()

                    # Keep what survived the suppression: good_boxes holds the positions of the detected
                    # objects of this class and confs the matching confidences (same length)
                    good_boxes = boxes_to_process[idx]
                    # [:, None] turns a row into a column, e.g.
                    # [0.9922133 0.9003193 0.81056666] -> [[0.9922133], [0.9003193], [0.81056666]]
                    confs = confs_to_process[idx][:, None]
                    # np.ones builds a len(idx) x 1 array of ones, scaled to the class label
                    labels = (c - 1) * np.ones((len(idx), 1))

                    # Stack box position, label, and confidence, then add them to the results
                    c_pred = np.concatenate((good_boxes, labels, confs), axis=1)
                    results[-1].extend(c_pred)

            if len(results[-1]) > 0:
                results[-1] = np.array(results[-1])
                box_xy = (results[-1][:, 0:2] + results[-1][:, 2:4]) / 2
                box_wh = results[-1][:, 2:4] - results[-1][:, 0:2]
                results[-1][:, :4] = self.ssd_correct_boxes(box_xy, box_wh, input_shape, image_shape, letterbox_image)

        return results
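The listing relies on `tf.image.non_max_suppression` for the suppression step. To show what that step does, here is a plain-NumPy sketch of the usual greedy procedure. It is an illustration, not TensorFlow's implementation: the function names and the sample boxes are made up, and boxes are taken as (x_min, y_min, x_max, y_max).

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x_min, y_min, x_max, y_max)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_threshold=0.45, top_k=300):
    """Greedy NMS: keep the best box, drop boxes overlapping it too much, repeat."""
    order = np.argsort(scores)[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0 and len(keep) < top_k:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        ious = iou_one_to_many(boxes[best], boxes[rest])
        order = rest[ious <= iou_threshold]   # survivors of this round
    return keep


boxes = np.array([[0.10, 0.10, 0.50, 0.50],   # two heavily overlapping boxes
                  [0.12, 0.12, 0.52, 0.52],
                  [0.60, 0.60, 0.90, 0.90]])  # one separate box
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, above the 0.45 threshold, so only the higher-scoring one survives.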

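In my view, this is roughly the core code of the SSD algorithm; the rest is fairly easy to write. I am not posting the GitHub link yet because some parts are unfinished; I will open-source it once everything is written.

III. Testing

The SSD model has been trained, using a self-made dataset annotated with labelImg.

Terminal output

Image output, with the detected objects marked in the image.

IV. Summary

SSD's speed and accuracy are both decent. Put simply, the algorithm spreads 8732 prior boxes over a [300,300] image, makes each prior box responsible for detecting one region, and regresses from them to get the final result.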

SSD Algorithm Principles and Code (Part 3)