文章目录

SSD简介
网络搭建
- 卷积块
- 下采样块
- 主干网
- 多层特征提起层
- 输出头
数据处理
- 形成训练TXT
- Dataset
- DataLoader
- Anchors
- - 生成先验框
  - 匹配先验框
  - 位置 offset
损失函数
训练
代码及参考

SSD简介

SSD，全称Single Shot MultiBox Detector，是Wei Liu在ECCV 2016上提出的一种目标检测算法，截至目前是主要的检测框架之一，相比Faster RCNN有明显的速度优势，相比YOLO又有明显的mAP优势，是在 RCNN 系列和 YOLO 系列之外属于单阶段的另外一系列的奠基之作。
论文链接：https://arxiv.org/pdf/1512.02325.pdf
官方代码：https://github.com/weiliu89/caffe/tree/ssd

SSD的主要设计理念：根据不同的特征层设置不同大小的先验框，在不同的特征层上建立检测头和分类头，以满足大、小目标检测的需求。关于SSD的具体的技术细节，网上大神给出的解释很多，这里不再赘述。
网络架构如下：

本文主要是使用 pytorch 对 SSD 进行简单的实现。包括各个模块的讲解、数据的前、后处理、训练参数解释等。

网络搭建

论文中以 VGG16 为主干网络，替换了 VGG16 5_3 层和后面的部分，换成了 3*3 的卷积，再加上多尺度特征层，来实现多个检测和分类头。

卷积块

论文中以卷积 + 激活函数为一个卷积标准模块，这里实现的时候加上了 BatchNorm，具体实现如下：

def conv_blk(in_channels, out_channels, stride=1, padding=1):"""卷积块"""return nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride,padding=padding),nn.BatchNorm2d(out_channels),nn.ReLU())

下采样块

论文中以池化层作为下采样层，来实现特征层的大小减半，这里实现的时候，将卷积 + 池化层作为一个标准下采样模块，具体实现如下：

def down_sample_blk(in_channels, out_channels, ceil_mode = False):"""下采样块"""return nn.Sequential(conv_blk(in_channels, out_channels),nn.MaxPool2d(2, ceil_mode=ceil_mode))

主干网

使用上面两个模块，按照 VGG16 网络的基本架构实现主干网：具体实现代码如下：

def backbone(input_shape):"""搭建vgg16主干网络"""return nn.Sequential(conv_blk(input_shape[0], 64),down_sample_blk(64, 64),conv_blk(64, 128),down_sample_blk(128, 128),conv_blk(128, 256),conv_blk(256, 256),down_sample_blk(256, 512, ceil_mode=True),conv_blk(512, 512),conv_blk(512, 512),conv_blk(512, 512) # vgg16 4-3层)

多层特征提起层

主干网仅仅到 vgg16 4-3层，后面的我们需要加上特征提取层，来实现输出不同特征层的需求，具体实现代码如下：

class extra_feature(nn.Module):"""根据backbone输出的特征图，生成额外的特征图"""def __init__(self, input_shape) -> None:super().__init__()# (3, 300, 300) -> (512, 38, 38)self.out_layer1 = backbone(input_shape)# (512, 38, 38) -> (512, 19, 19)self.out_layer2 = nn.Sequential(nn.MaxPool2d(2, stride=1, padding=1),conv_blk(512, 512),conv_blk(512, 512),down_sample_blk(512, 521),conv_blk(521, 1024),conv_blk(1024, 1024))# (512, 19, 19) -> (256, 10, 10)self.out_layer3 = nn.Sequential(conv_blk(1024, 256),conv_blk(256, 512, stride=2))# (256, 10, 10) -> (256, 5, 5)self.out_layer4 = nn.Sequential(conv_blk(512, 128),conv_blk(128, 256, stride=2))# (256, 5, 5) -> (256, 3, 3)self.out_layer5 = nn.Sequential(conv_blk(256, 128),conv_blk(128, 256, stride=2))# (256, 3, 3) -> (256, 1, 1)self.out_layer6 = nn.Sequential(conv_blk(256, 128),nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=0))def forward(self, x):out1 = self.out_layer1(x)out2 = self.out_layer2(out1)out3 = self.out_layer3(out2)out4 = self.out_layer4(out3)out5 = self.out_layer5(out4)out6 = self.out_layer6(out5)return out1, out2, out3, out4, out5, out6

经过 extra_feature 层，我们将会得到六层特征层的输出，对应上面图片的六层输出层，特征层的大小依次为：（38, 38）、（19， 19）、（10， 10）、（5， 5）、（3， 3）、（1， 1）。

输出头

得到六个输出层之后，我们需要对六层输出的通道数目进行调整，使其为我们需要分类的数目和需要输出的检测框的数目的大小。具体代码如下：

def mutiboxhead(num_classes):"""搭建SSD多尺度检测头"""num_anchors = [4, 6, 6, 6, 4, 4] # 每个尺度的锚框数量num_channels = [512, 1024, 512, 256, 256, 256] # 每个尺度的通道数量cls_predictors = [] # 类别预测器bbox_predictors = [] # 边界框预测器for i in range(6):cls_predictors.append(nn.Conv2d(num_channels[i], num_anchors[i] * (num_classes + 1), kernel_size=3, padding=1))bbox_predictors.append(nn.Conv2d(num_channels[i], num_anchors[i] * 4, kernel_size=3, padding=1))cls_predictors = nn.ModuleList(cls_predictors)bbox_predictors = nn.ModuleList(bbox_predictors)return cls_predictors, bbox_predictors

根据上面构建的所有东西来实现我们的SSD网络，实现代码如下：

class SSD(nn.Module):def __init__(self, input_shape=(3, 300, 300), num_classes=20, **kwargs):super(SSD, self).__init__(**kwargs)self.num_classes = num_classesself.input_shape = input_shapeself.extra_feature = extra_feature(input_shape)self.cls_predictors, self.bbox_predictors = mutiboxhead(num_classes)def forward(self, x):x = self.extra_feature(x)cls_preds = []bbox_preds = []for i, feature in enumerate(x):cls_preds.append(self.cls_predictors[i](feature).permute(0, 2, 3, 1)) # (batch_size, num_classes, h, w) -> (batch_size, h, w, num_classes)bbox_preds.append(self.bbox_predictors[i](feature).permute(0, 2, 3, 1)) # (batch_size, 4, h, w) -> (batch_size, h, w, 4)bbox_preds = torch.cat([pred.reshape(pred.shape[0], -1, 4) for pred in bbox_preds], dim=1) # (batch_size, num_anchors, 4)cls_preds = torch.cat([pred.reshape(pred.shape[0], -1, self.num_classes + 1) for pred in cls_preds], dim=1) # (batch_size, num_anchors, num_classes + 1)return bbox_preds, cls_preds # (batch_size, num_anchors, 4), (batch_size, num_anchors, num_classes + 1)

如果对搭建的网络参数或者输出的形状不是很放心的话，可以输出和跟踪一下看看具体的网络的架构。

if __name__ == "__main__":x = torch.randn(1, 3, 300, 300)ssd = SSD((3, 300, 300), 20)cls_preds, bbox_preds = ssd(x)print(cls_preds[0].shape, bbox_preds[0].shape)

数据处理

一般情况下，我们使用 COCO 或者 VOC 数据集进行训练，这里以 VOC 数据进行讲解，如何加载数据。

形成训练TXT

VOC 的标签为 XML 文件，在网络加载数据的时候，不是很方便，这里将其处理为 TXT 文件，方便我们读取。处理的结果如下：

每一行分别为：图片的路径，x_min y_min x_max y_max label … x_min y_min x_max y_max label
在后面加载数据的时候，直接读取即可，稍微方便一点。
处理的代码如下;

# 从 voc 数据集中读取数据，并生成可用于训练、验证和测试txt文件
import os
try:import xml.etree.cElementTree as ET
except ImportError:import xml.etree.ElementTree as ETdef get_class_names(class_names_path):"""获取数据集的类别名字Args:class_names_path (str): 数据集路径Returns:class_names (list): 类别名字列表"""with open(class_names_path, "r") as f:class_names = f.readlines()class_names = {c.strip(): i  for i, c in enumerate(class_names)}return class_names, len(class_names)def write_txt(txt_path, JPEGImages_path, annotation_path, txt_f, name_dict):"""将txt文件写入到txt_pathArgs:txt_path (str): txt文件保存路径txt_f (list): txt文件内容"""with open(txt_path, "r") as f:img_name_lists = f.readlines()train_name_lists = [img_name.strip() for img_name in img_name_lists]for train_name in train_name_lists:txt_f.write(os.path.join(JPEGImages_path, train_name + ".jpg").replace('\\', '/'))xml_name = train_name + ".xml"xml_path = os.path.join(annotation_path, xml_name)tree = ET.parse(xml_path)root = tree.getroot()for obj in root.iter("object"):cls = name_dict[obj.find("name").text]xmlbox = obj.find("bndbox")xmin = xmlbox.find("xmin").textymin = xmlbox.find("ymin").textxmax = xmlbox.find("xmax").textymax = xmlbox.find("ymax").texttxt_f.write(f",{xmin} {ymin} {xmax} {ymax} {cls}")txt_f.write("\n")def convert_annotation_2_txt(dataset_path, txt_path):"""将voc数据集的xml标注转换为txt格式Args:dataset_path (str): 数据集路径txt_path (str): txt文件保存路径"""annotation_path = os.path.join(dataset_path, "Annotations")ImageSets_path = os.path.join(dataset_path, "ImageSets/Main")JPEGImages_path = os.path.join(dataset_path, "JPEGImages")train_txt_f = open(os.path.join(txt_path, "train.txt"), "w")val_txt_f = open(os.path.join(txt_path, "val.txt"), "w")test_txt_f = open(os.path.join(txt_path, "test.txt"), "w")name_dict, _ = get_class_names(os.path.join(txt_path, "voc_classes.txt"))write_txt(os.path.join(ImageSets_path, "train.txt"), JPEGImages_path, annotation_path, train_txt_f, name_dict)train_txt_f.close()write_txt(os.path.join(ImageSets_path, "val.txt"), JPEGImages_path, annotation_path, val_txt_f, name_dict)val_txt_f.close()write_txt(os.path.join(ImageSets_path, "test.txt"), JPEGImages_path, annotation_path, test_txt_f, name_dict)test_txt_f.close()if __name__ == "__main__":convert_annotation_2_txt("你的\VOC07+12", "输出txt想要保存的地点")

运行完以上程序，会形成 train.txt, test.txt val.txt，分别对应训练集、验证集和测试集。训练的时候读取对应的txt即可。

Dataset

获得完上面的 txt，我们就可以编写自己的 Dataset 函数，形成自己在网络训练时候的数据加载方式。具体代码如下：

# 加载数据集的函数
class SSDDataset(Dataset):"""定义自己的数据集加载方式"""def __init__(self, data_lines, transform=None):"""初始化文件路径和数据增强所需的信息Args:data (list): 文件路径列表transform: torchvision.transforms"""self.data_lines = data_linesself.transform = transformdef __len__(self):"""返回数据集的总长度Returns:length (int): 数据集的总长度"""return len(self.data_lines)def __getitem__(self, index):"""通过索引返回一个数据Args:index (int): 索引Returns:img (Tensor): 图像数据label (Tensor): 标签数据"""data_line = self.data_lines[index].split(",")image       = np.array((Image.open(data_line[0]).convert("RGB")))bboxes      = np.array([list(map(int, box.split(' '))) for box in data_line[1:]])if self.transform is not None:augmentations = self.transform(image=image, bboxes=bboxes)image = augmentations["image"]bboxes = augmentations["bboxes"]return image, bboxes

也就是通过输入的 index，来获取图片和对应的标签信息， transform 是是否采用数据增强，数据增强的具体代码如下：

train_transforms = A.Compose([# 保持图片的比例进行图片的放大A.LongestMaxSize(max_size=300),# 不保证图片比例进行放大，需要的时候会对图片添加 0 A.PadIfNeeded(min_height=300, min_width=300, border_mode=cv2.BORDER_CONSTANT),# 进行正则化A.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010], max_pixel_value=255,),# 转变为 tensorToTensorV2(),],# 对检测框进行调整bbox_params=A.BboxParams(format="pascal_voc", label_fields=[],check_each_transform=False),
) val_transforms = A.Compose([# 保持图片的比例进行图片的放大A.LongestMaxSize(max_size=300),# 不保证图片比例进行放大，需要的时候会对图片添加 0 A.PadIfNeeded(min_height=300, min_width=300, border_mode=cv2.BORDER_CONSTANT),# 进行正则化A.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010], max_pixel_value=255,),# 转变为 tensorToTensorV2(),],# 对检测框进行调整bbox_params=A.BboxParams(format="pascal_voc", label_fields=[], check_each_transform=False),
)

因为我们使用的格式为 pascal_voc 的格式，所以这里的我们选择的增强方式为 pascal_voc，如果使用的为 coco 的话，换成 coco 即可。
一般情况下，在训练的时候和在验证的时候采用的数据增强手段是不一样的，训练的时候，我们强调数据的多样性，会使用较多的数据增强的手段，为反转、随机剪裁、色域变换等，但是在验证的时候，我们强调当前网络的性能，一般采取较少的数据增强手段。

注意：我们在对图片进行数据增强的时候，要保证检测的一致性。

DataLoader

刚刚只是定义了如何获取图片，但是如何设置为可以供 pytorch 使用，需要放到 DataLoader 中，形成迭代器，来实现不间断的拿取 batch_size 的图片。
具体代码如下：

def load_data(root_path, batch_size=8, num_workers=0):# 第一步：加载数据集with open(os.path.join(root_path, "train.txt"), "r") as f:train_lines = f.readlines()with open(os.path.join(root_path, "train.txt"), "r") as f:val_lines = f.readlines()train_dataset = SSDDataset(train_lines, train_transforms)val_dataset = SSDDataset(val_lines, val_transforms)train_iter = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, collate_fn=collate_fn)test_iter = DataLoader(val_dataset, batch_size=batch_size, shuffle=True, num_workers=num_workers, collate_fn=collate_fn)return train_iter, test_iterdef collate_fn(batch):"""自定义一个函数，将一个batch的数据拼接起来Args:batch (list): batch数据Returns:images (Tensor): 图像数据bboxes (list): 边界框数据"""images = []bboxes = []for image, bbox in batch:images.append(image)bboxes.append(torch.tensor(bbox))return torch.stack(images, dim=0), bboxes

Anchors

生成先验框

SSD 网络需要生成在不同的特征层生成先验框，以满足对不同大、小物体检测的需求；具体实现代码如下：

# SSD 生成 anchors 的方式, 中心宽高的方式
def generate_anchors(feature_maps_size=[38, 19, 10, 5, 3, 1], image_size=300, steps=[8, 16, 32, 64, 100, 300], min_sizes=[30, 60, 111, 162, 213, 264], max_sizes=[60, 111, 162, 213, 264, 315], aspect_ratios=[[2], [2, 3], [2, 3], [2, 3], [2], [2]], clip=True):mean = []for k, f in enumerate(feature_maps_size): #[38, 19, 10, 5, 3, 1],for i, j in product(range(f), repeat=2):f_k = image_size / steps[k]# unit center x,ycx = (j + 0.5) / f_kcy = (i + 0.5) / f_k# aspect_ratio: 1# rel size: min_sizes_k = min_sizes[k] / image_sizemean += [cx, cy, s_k, s_k]# aspect_ratio: 1# rel size: sqrt(s_k * s_(k+1))s_k_prime = sqrt(s_k * (max_sizes[k]/image_size))mean += [cx, cy, s_k_prime, s_k_prime]# rest of aspect ratiosfor ar in aspect_ratios[k]:mean += [cx, cy, s_k*sqrt(ar), s_k/sqrt(ar)]mean += [cx, cy, s_k/sqrt(ar), s_k*sqrt(ar)]# back to torch landprior_anchors = torch.Tensor(mean).view(-1, 4)  # * image_size ##[8732,4]# 对于超出边界的先验框进行裁剪到边界内if clip:prior_anchors.clamp_(max=1, min=0)# 返回的时候转化为 corner 的数据格式，并且乘上 图片 对应的尺度获取真实大小return box_center_to_corner(prior_anchors) * image_size

匹配先验框

在训练的时候，我们需要将真实的标签框和先验框进行匹配，来获取训练的正样本。也就是先确保每个标签框有一个先验框负责预测，完成一对一的匹配。如果其他的先验框没有被分配，且和其中某个标签框的 iou 也很大，也将对列为正样本，完成先验框 -> 标签框的多对一的匹配。具体技术细节，请参考：沐神的动手学深度学习的 13.4. 锚框的章节。
具体实现代码如下：

def assign_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):""" 为锚框分配边界框Args:ground_truth (Tensor): 边界框，形状为 (n, 4)anchors (Tensor): 锚框，形状为 (k, 4)device (torch.device): 用于分配张量的设备iou_threshold (float): 阈值Returns:Tensor: 所分配的边界框，形状为 (k, 4)"""num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]# 计算锚框和边界框的交并比anchors_boxes_iou = box_iou(anchors, ground_truth)# 为每个锚框分配边界框的索引, 形状为 (k, ), 默认为 -1anchor_to_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long, device=device)# 找到每个 anchors 对应的最大 iou 值和索引max_ious, indices = torch.max(anchors_boxes_iou, dim=1)# 将交并比大于阈值的锚框分配给边界框anchor_to_bbox_map[max_ious >= iou_threshold] = indices[max_ious >= iou_threshold]# 找到应该丢弃的行和列，也就是找到每一个 gt 真实值匹配的 anchor 所以对应的行和列col_discard = torch.full((num_anchors, ), -1)row_discard = torch.full((num_gt_boxes,), -1)# 每个 iou 较大的 anchor只保留一个gt对应for _ in range(num_gt_boxes):# 找到目前 iou 最大的位置，因为没有指定维度的话，默认将所有维度参与# anchors_boxes_iou 的维度为 ： [num_anchors, num_gt_boxes]max_idx = torch.argmax(anchors_boxes_iou)# 找到对应的列box_idx = (max_idx % num_gt_boxes).long()# 找到对应的行anc_idx = (max_idx / num_gt_boxes).long()# 将对应的 gt 作为 anchor 的真值anchor_to_bbox_map[anc_idx] = box_idx# 取出对应的行和列anchors_boxes_iou[:, box_idx] = col_discardanchors_boxes_iou[anc_idx, :] = row_discardreturn anchor_to_bbox_map

位置 offset

网络并不是直接预测标签的中心点和宽高，那样会增加网络预测的难度，我们预测的时候，先验框相对于标签框的偏移量，包括中心点的偏移量和宽高的偏移量。参考沐神动手学深度的讲解如下;

具体代码如下：

def offset_boxes(c_anc, assigned_bbox, eps=1e-6):"""根据所分配的边界框来调整锚框Args:c_anc (Tensor): 锚框，形状为 (n, 4) center 形状assigned_bbox (Tensor): 所分配的边界框，形状为 (n, 4) corner 形状eps (float): 一个极小值，防止被零整除Returns:Tensor: 调整后的锚框，形状为 (n, 4)"""c_anc = box_corner_to_center(c_anc)c_assigned_bbox = box_corner_to_center(assigned_bbox)offset_xy = 10.0 * (c_assigned_bbox[:, :2] - c_anc[:, :2]) / (c_anc[:, 2:] + eps)offset_wh = 5.0 * torch.log((c_assigned_bbox[:, 2:] + eps) / (c_anc[:, 2:] + eps))offset = torch.cat([offset_xy, offset_wh], axis=1)return offset

这些偏移量将作为训练的时候，真正的标签。

损失函数

有了网络的输出和真正的偏移量的标签，我们就可以定义我们的损失函数了。
关于 SSD 损失讲解，可以参考大神的解释：链接
简单来说，分为两个部分：

前面是置信度损失函数，后面是位置损失函数，中间为权重因子。再详细划分如下：

根据上面公式，定义损失函数如下：

def loc_loss(self, loc_preds, bbox_labels, bbox_masks):"""计算位置损失Args:loc_preds (Tensor): 预测的位置结果，形状为 (batch_size, num_anchors, 4)bbox_labels (Tensor): 真实的位置标签，形状为 (batch_size, num_anchors, 4)bbox_masks (Tensor): 真实的位置标签的掩码，形状为 (batch_size, num_anchors, 4)Returns:Tensor: 位置损失"""# 第一步：计算损失return  F.l1_loss(loc_preds * bbox_masks, bbox_labels * bbox_masks, reduction='sum')def conf_loss(self, conf_preds, cls_labels):"""计算分类损失Args:conf_preds (Tensor): 预测的分类结果，形状为 (batch_size, num_anchors, num_classes)cls_labels (Tensor): 真实的分类标签，形状为 (batch_size, num_anchors, 1)Returns:Tensor: 分类损失"""# 第一步：计算正样本的损失# 获取正样本的索引pos_mask = cls_labels > 1# 获取正样本的数量num_pos = pos_mask.long().sum(dim=1, keepdim=True) # (batch_size, 1)# 计算正样本的损失, 交叉熵损失函数的输入为 (N, C) 和 (N, )，其中 N 为样本数量，C 为类别数量，会自动找到(N, C)中的对应的值，计算损失pos_loss = F.cross_entropy( conf_preds.reshape(-1, conf_preds.shape[-1])[pos_mask.reshape(-1)], cls_labels[pos_mask])# 第二步：计算负样本的损失# 获取负样本的索引neg_mask = cls_labels == 0# 计算负样本的损失neg_loss = F.cross_entropy(conf_preds.reshape(-1, conf_preds.shape[-1])[neg_mask.reshape(-1)], cls_labels[neg_mask])# 第三步：计算损失loss = pos_loss +  neg_loss# 返回损失return loss

根据上面的损失函数，可以定义我们训练的总的损失函数类：具体代码如下：

class MultiBoxLoss():"""根据预测的结果计算损失函数"""def __init__(self, num_classes, device, alpha=1) -> None:super().__init__()self.num_classes   = num_classesself.image_size    = 300self.prior_anchors = generate_anchors()self.device        = deviceself.alpha         = alphadef __call__(self, net_output, labels):"""计算损失函数Args:net_output (tuple): 网络输出，包含回归预测和分类预测，形状为 loc_preds:(batch_size, num_anchors, 4), conf_preds: (batch_size, num_anchors, 21)labels (Tensor): 标签，形状为 (n, 5)，其中 n 是所有边界框的数量, 5 表示 (类别, x, y, w, h)Returns:Tensor: 损失值"""# 第一步：获取网络输出 (batch_size, num_anchors, 4), (batch_size, num_anchors, 21)loc_preds, conf_preds = net_outputbatch_size, device = len(labels), loc_preds.devicenum_anchors = self.prior_anchors.shape[0] # self.prior_anchors (num_anchors, 4)# 第二步：为不同大小的特征层生成对应的 anchors 找到对应 target 标号, 并且将其转换为相对于 anchor 的偏移量bbox_labels, bbox_masks, cls_labels = multibox_target(self.prior_anchors, labels)cls_labels = cls_labels.to(device)bbox_labels = bbox_labels.to(device)bbox_masks = bbox_masks.to(device)num_prior_anchors = bbox_masks[:,:, 0].sum() # (batch_size, 1)# 第三步：计算损失# 将预测的结果转换为 (batch_size, num_anchors, num_classes + 1)， + 1 为背景类conf_preds = conf_preds.view(batch_size, num_anchors, self.num_classes + 1)# 计算分类损失cls_loss = self.conf_loss(conf_preds, cls_labels)# 计算位置损失loc_loss = self.loc_loss(loc_preds, bbox_labels, bbox_masks)all_loss = (cls_loss + self.alpha * loc_loss) / num_prior_anchorsreturn all_lossdef loc_loss(self, loc_preds, bbox_labels, bbox_masks):"""计算位置损失Args:loc_preds (Tensor): 预测的位置结果，形状为 (batch_size, num_anchors, 4)bbox_labels (Tensor): 真实的位置标签，形状为 (batch_size, num_anchors, 4)bbox_masks (Tensor): 真实的位置标签的掩码，形状为 (batch_size, num_anchors, 4)Returns:Tensor: 位置损失"""# 第一步：计算损失return  F.l1_loss(loc_preds * bbox_masks, bbox_labels * bbox_masks, reduction='sum')def conf_loss(self, conf_preds, cls_labels):"""计算分类损失Args:conf_preds (Tensor): 预测的分类结果，形状为 (batch_size, num_anchors, num_classes)cls_labels (Tensor): 真实的分类标签，形状为 (batch_size, num_anchors, 1)Returns:Tensor: 分类损失"""# 第一步：计算正样本的损失# 获取正样本的索引pos_mask = cls_labels > 1# 获取正样本的数量num_pos = pos_mask.long().sum(dim=1, keepdim=True) # (batch_size, 1)# 计算正样本的损失, 交叉熵损失函数的输入为 (N, C) 和 (N, )，其中 N 为样本数量，C 为类别数量，会自动找到(N, C)中的对应的值，计算损失pos_loss = F.cross_entropy( conf_preds.reshape(-1, conf_preds.shape[-1])[pos_mask.reshape(-1)], cls_labels[pos_mask])# 第二步：计算负样本的损失# 获取负样本的索引neg_mask = cls_labels == 0# 计算负样本的损失neg_loss = F.cross_entropy(conf_preds.reshape(-1, conf_preds.shape[-1])[neg_mask.reshape(-1)], cls_labels[neg_mask])# 第三步：计算损失loss = pos_loss +  neg_loss# 返回损失return loss#-------------------------------##   计算预测结果#-------------------------------#def eval(self, net_output, labels, show_img_flag=False):"""计算准确率Args:net_output (tuple): 网络输出，包含回归预测和分类预测，形状为 (loc_preds (1, num_anchors, 4), conf_preds (1, num_anchors, 21))labels (Tensor): 标签，形状为 (n, 5)，其中 n 是所有边界框的数量, 5 表示 (类别, x, y, w, h)Returns:Tensor: 损失值"""# 第一步：获取网络输出 (1, num_anchors, 4), (1, num_anchors, 21)loc_preds, conf_preds = net_outputbatch_size, device = len(labels), loc_preds.device# self.prior_anchors (num_anchors, 4)num_anchors = self.prior_anchors.shape[0]# 第二步：为不同大小的特征层生成对应的 anchors 找到对应 target 标号, 并且将其转换为相对于 anchor 的偏移量bbox_labels, bbox_masks, cls_labels = multibox_target(self.prior_anchors, labels)cls_labels = cls_labels.to(device)bbox_labels = bbox_labels.to(device)bbox_masks = bbox_masks.to(device)num_prior_anchors = bbox_masks[:,:, 0].sum() # (1,)# 第三步：计算准确度# 计算分类准确度cls_acc = self.cls_eval(conf_preds, cls_labels) / (batch_size * num_anchors)#计算分类损失cls_loss = self.conf_loss(conf_preds, cls_labels) # / (batch_size * num_anchors)# 计算位置损失bbox_loss = self.bbox_eval(loc_preds, bbox_labels, bbox_masks) #  / (batch_size * num_anchors)# 返回准确度return cls_acc, cls_loss / num_prior_anchors , bbox_loss / num_prior_anchors def cls_eval(self, cls_preds, cls_labels):# 由于类别预测结果放在最后一维，argmax需要指定最后一维。return float((cls_preds.argmax(dim=-1).type(cls_labels.dtype) == cls_labels).sum())def bbox_eval(self, bbox_preds, bbox_labels, bbox_masks):return float((torch.abs((bbox_labels - bbox_preds) * bbox_masks)).sum())

其中 multibox_target 函数的作用为：为不同大小的特征层生成对应的 anchors 找到对应 target 标号, 并且将其转换为相对于 anchor 的偏移量

def multibox_target(anchors, labels):""" 使用真实边界框标记锚框Args:anchors (Tensor): 先验框，对应的大小为，(num_anchors, 4)labels (Tensor): 多个图片的真实标签，(batch_size, num_labels, 4)  都是 corner 的形式 Returns:bbox_offset  (batch_size, num_boxes, 4)bbox_mask    (batch_size, num_boxes, 4)class_labels (batch_size, num_boxes, 1)"""# labels 为多个图片的box的标签合集，维度为 [batch_size, num_gt, 4 + 1]， 其中 1 为类别batch_size, anchors = len(labels), anchors.squeeze(0)batch_offset, batch_mask, batch_class_labels = [], [], []device, num_anchors = anchors.device, anchors.shape[0]for i in range(batch_size):# 取出第 i 张图片对应的 box 的 gtlabel = labels[i]# 找到和标签 box 最接近的真实标签框，作为训练的正样本anchor_bbox_map = assign_anchor_to_bbox(label[:, :4], anchors, device)# 构造 mask 矩阵，大于零的我们认为找打了相应的gt，扩展到4维的目的是为了后面bbox的四个点的maskbbox_mask = ((anchor_bbox_map >= 0).float().unsqueeze(-1)).repeat(1, 4)# 将类标签和分配的边界框坐标初始化为0class_labels = torch.zeros(num_anchors, dtype=torch.long, device=device)assigned_bbox = torch.zeros((num_anchors, 4), dtype=torch.float32, device=device)# 使用真实边界框来标记锚框的类别# 如果一个锚框没有被分配，标记为背景，值为零indices_true = torch.nonzero(anchor_bbox_map >= 0)# 拿出已经分配好的锚框对应的 gt bboxbbox_idx = anchor_bbox_map[indices_true]# 将类别和对应的标签进行赋值。label + 1 的原因是为了和 0 进行区别。class_labels[indices_true] = label[bbox_idx, 4].long() + 1assigned_bbox[indices_true] = label[bbox_idx, :4].float()# 偏移量的转换，得到真实框和对应anchor之间的位置偏移offset = offset_boxes(anchors, assigned_bbox) * bbox_mask# 记录一下对应的关系batch_offset.append(offset)batch_mask.append(bbox_mask)batch_class_labels.append(class_labels)bbox_offset = torch.stack(batch_offset)bbox_mask = torch.stack(batch_mask)class_labels = torch.stack(batch_class_labels)return bbox_offset, bbox_mask, class_labels

训练

定义相关的优化器、学习率等，我们就可以开始我们的训练了；具体训练代码如下;

if __name__ == "__main__":# -------------------超参数-------------------#batch_size = 32num_workers = 4input_size = (3, 300, 300)num_classes = 20 learning_rate = 1e-2epochs = 100weight_decay = 5e-4root_path = "./data"# -------------------超参数-------------------## 设置使用的设备# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")device = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")# 第一步：加载数据集train_iter, test_iter = load_data(root_path, batch_size, num_workers)# 第二步：定义模型net = SSD(input_size, num_classes)net.to(device)# summary(net,input_size=(3, 300, 300))# 第三步：定义损失函数Loss = MultiBoxLoss(num_classes, device)# 第四步：定义优化器optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9, weight_decay=weight_decay)# 第五步：训练模型train_day = datetime.now().strftime("%Y_%m_%d_%H_%M_%S")print("#", "-"*30, "训练参数", "-"*30, "#")print("训练日期:{0:>65}".format(train_day))print("训练设备:{0:>65}".format(str(device)))print("训练数据集大小:{0:>59}".format(len(train_iter) * batch_size))print("测试数据集大小:{0:>59}".format(len(test_iter) * batch_size))print("训练批次大小:{0:>61}".format(batch_size))print("训练轮次:{0:>65}".format(epochs))print("学习率:{0:>67}".format(learning_rate))print("权重衰减:{0:>65}".format(weight_decay))print("#", "-"*30, "训练开始", "-"*30, "#")train_file = "./log/train_log_" + train_dayos.makedirs(train_file, exist_ok=True)min_loss = 1e+10for epoch in range(epochs):net.train()phgr = tqdm(train_iter, total=len(train_iter), desc="Train epoch " + str(epoch + 1) + "/" + str(epochs))epoch_train_losses = []for i, (images, labels) in enumerate(phgr):# 将数据转换为 cuda 的 tensorimages = images.to(device)# 前向传播net_output = net(images)# 计算损失net_loss = Loss(net_output, labels)loss = net_loss.item()epoch_train_losses.append(loss)# 反向传播optimizer.zero_grad()net_loss.backward()optimizer.step()phgr.set_postfix({"loss": loss})# 保存最优模型epoch_train_loss = (sum(epoch_train_losses) / len(epoch_train_losses))print("epoch_train_loss:", epoch_train_loss, " min_loss:", min_loss)if epoch_train_loss < min_loss:min_loss = epoch_train_losssave_weight_path = train_file + f"/ssd_best_loss.pth"torch.save(net.state_dict(), save_weight_path) if epoch % 5 == 0 or epoch == epochs - 1:net.eval()epoch_test_losses = []show_img_flag = Truewith torch.no_grad():phgr_test = tqdm(test_iter, total=len(test_iter), desc="Test epoch " + str(epoch + 1) + "/" + str(epochs))for images, labels in phgr_test:# 将数据转换为 cuda 的 tensortensor_images = images.to(device)# 前向传播net_output = net(tensor_images)if show_img_flag:# 显示第一张图片的预测结果作为显示# decoder_out = decoder(net_output, Loss.prior_anchors)# out_frame = draw_infer_box(images[0], decoder_out[0], decoder_out[1], decoder_out[2], class_names, colors)# cv2.imwrite("train_eval_result.png", out_frame)show_img_flag = False# 计算准确度cls_acc, cls_loss, bbox_loss = Loss.eval(net_output, labels)epoch_test_losses.append(cls_loss + bbox_loss)phgr_test.set_postfix({"cls_acc": cls_acc," cls_loss": cls_loss.item(), " bbox_loss": bbox_loss})epoch_test_loss = (sum(epoch_test_losses) / len(epoch_test_losses))save_weight_path = train_file + f"/ssd_epoch_{epoch + 1}_testloss_{0:.3f}_trainloss_{0:.3f}.pth".format(epoch_test_loss, epoch_train_loss)

代码及参考

完整包含预测的代码地址为：
参考
1、沐神的动手学深度学习

目标检测第三篇：基于SSD的目标检测算法相关推荐

【NodeJs-5天学习】第三天实战篇③ ——基于MQTT的环境温度检测
[NodeJs-5天学习]第三天实战篇③ --基于MQTT的环境温度检测 1. 前言 2.实现思路 2.1 NodeJs服务器代码 2.2.1 本地部署MQTT服务器,端口1883 2.2.1.1 用 ...
【物联网服务NodeJs-5天学习】第三天实战篇③ ——基于MQTT的环境温度检测
[NodeJs-5天学习]第三天实战篇③ --基于MQTT的环境温度检测 1. 前言 2.实现思路 2.1 NodeJs服务器代码 2.2.1 本地部署MQTT服务器,端口1883 2.2.1.1 用 ...
【NodeJs-5天学习】第四天存储篇④ ——基于MQTT的环境温度检测，升级存储为mysql
[NodeJs-5天学习]第四天存储篇④ --基于MQTT的环境温度检测,升级存储为mysql 1. 前言 2. 服务器代码 2.1 配置MySQL服务器 2.2 NodeJs服务器代码 2.2.1 ...
【物联网服务NodeJs-5天学习】第四天存储篇④ ——基于MQTT的环境温度检测，升级存储为mysql
[NodeJs-5天学习]第四天存储篇④ --基于MQTT的环境温度检测,升级存储为mysql 1. 前言 2. 服务器代码 2.1 配置MySQL服务器 2.2 NodeJs服务器代码 2.2.1 ...
自动驾驶采标系列三：基于图像的目标检测技术
标注猿的第54篇原创一个用数据视角看AI世界的标注猿上一篇文章我们从"环境感知"数据的采集设备上进行了详细说明,已经了解了相应设备采集的数据及采集前 ...
智慧交通day02-车流量检测实现12：基于yoloV3的目标检测
在本章节代码编写中,发现之前的代码所处的环境是python3,因此导致了cv2.dnn.readNetFromDarknet()在代码运行中导致了i[0]的获值失败,故总结如下: cv2.dnn.re ...
python运动目标检测与跟踪_基于OpenCV的运动目标检测与跟踪
尹俊超,刘直芳:基于 OpenCV 的运动目标检测与跟踪 2011, V ol.32, No.8 2817 0 引言运动目标检测跟踪技术在航空航天遥感. 生物医学. 工业自动化生产. 军事公安目 ...
基于51单片机的光照强度检测c语言程序,基于51单片机光照强度检测报告.doc
基于51单片机光照强度检测报告课程设计报告课程名称: 智能仪器课程设计题目: 基于51单片机的光照强度摘要光敏电阻测光强度系统,该系统可以自动检测光照强度的强弱并显示让人们知道此时光照强度 ...
电感检测_三、电感线圈的识别与检测（二）
变压器变压器也是一种电感器.它是利用两个电感线圈靠近时的互感现象工作的,是电子产品中十分常见的元件.变压器是将两组或两组以上线圈绕在同一骨架上,并在绕好的线圈中插入铁芯或磁芯等导磁材料而构成,它在 ...

目标检测第三篇：基于SSD的目标检测算法