说明

本博客代码来自开源项目：《动手学深度学习》(PyTorch版)
并且在博主学习的理解上对代码进行了大量注释，方便理解各个函数的原理和用途

配置环境

使用环境：python3.8
平台：Windows10
IDE：PyCharm

此节说明

此节对应书本上9.5节
此节功能为：多尺度目标检测
由于此节相对复杂，代码注释量较多

代码

# 本书链接https://tangshusen.me/Dive-into-DL-PyTorch/#/
# 7.7 AdaDelta算法
# 注释：黄文俊
# E-mail：hurri_cane@qq.comfrom matplotlib import pyplot as plt
from PIL import Image
import numpy as np
import torchimport sys
sys.path.append("..")
import d2lzh_pytorch as d2limg = Image.open('F:/PyCharm/Learning_pytorch/data/img/catdog.jpg')
w, h = img.size # (728, 561)d2l.set_figsize()def display_anchors(fmap_w, fmap_h, s):# 前两维的取值不影响输出结果(原书这里是(1, 10, fmap_w, fmap_h), 我认为错了)fmap = torch.zeros((1, 10, fmap_h, fmap_w), dtype=torch.float32)# 平移所有锚框使均匀分布在图片上offset_x, offset_y = 1.0/fmap_w, 1.0/fmap_hanchors = d2l.MultiBoxPrior(fmap, sizes=s, ratios=[1, 2, 0.5]) + torch.tensor([offset_x/2, offset_y/2, offset_x/2, offset_y/2])# d2l.MultiBoxPrior函数用处：指定输入（fmap）、一组大小和一组宽高比，该函数将返回输入的所有锚框。'''这里之所以说会均匀采样，是因为在图像位置标示值中都采用了归一化，及所有图像上的位置都可以用两个0到1的数表示。通过anchors=d2l.MultiBoxPrior(fmap,sizes=s,ratios=[1,2,0.5])+torch.tensor([offset_x/2,offset_y/2,offset_x/2,offset_y/2])得到的Anchors是针对fmap的anchor，其形状为1，fmap的像素高宽乘积再乘上设定的锚框高宽比长度，4其实就是返回fmap的像素高宽乘积再乘上设定的锚框高宽比长度个锚框，每个锚框包含4个坐标，坐标值为归一化之后的值在后面绘制目标图像（非fmap）时，因为采用的是归一化位置大小来表示锚框位置，所以本来在fmap上紧密排列的锚框被均匀分布了'''bbox_scale = torch.tensor([[w, h, w, h]], dtype=torch.float32)d2l.show_bboxes(d2l.plt.imshow(img).axes,anchors[0] * bbox_scale)display_anchors(fmap_w=4, fmap_h=2, s=[0.15])
# 锚框大小s = 0.15的意思是，锚框的基准长度为整个图像长度的0.15倍
plt.show()# 特征图的高和宽分别减半，并用更大的锚框检测更大的目标。当锚框大小设0.4时，有些锚框的区域有重合
display_anchors(fmap_w=2, fmap_h=1, s=[0.4])
plt.show()# 最后，我们将特征图的宽进一步减半至1，并将锚框大小增至0.8。此时锚框中心即图像中心
display_anchors(fmap_w=1, fmap_h=1, s=[0.8])
plt.show()print("*"*50)# 为锚框标注类别和偏移量
# 以下函数已保存在d2lzh_pytorch包中方便以后使用
def assign_anchor(bb, anchor, jaccard_threshold=0.5):"""Args:bb: 真实边界框(bounding box), shape:（nb, 4）anchor: 待分配的anchor, shape:（na, 4）jaccard_threshold: 预先设定的阈值Returns:assigned_idx: shape: (na, ), 每个anchor分配的真实bb对应的索引, 若未分配任何bb则为-1"""na = anchor.shape[0]nb = bb.shape[0]jaccard = compute_jaccard(anchor, bb).detach().cpu().numpy()    # shape: (na, nb)assigned_idx = np.ones(na) * -1  # 初始全为-1# 先为每个bb分配一个anchor(不要求满足jaccard_threshold)jaccard_cp = jaccard.copy()for j in range(nb):i = np.argmax(jaccard_cp[:, j])assigned_idx[i] = jjaccard_cp[i, :] = float("-inf")    # 赋值为负无穷, 相当于去掉这一行# 处理还未被分配的anchor, 要求满足jaccard_thresholdfor i in range(na):if assigned_idx[i] == -1:j = np.argmax(jaccard[i, :])if jaccard[i, j] >= jaccard_threshold:assigned_idx[i] = jreturn torch.tensor(assigned_idx, dtype=torch.long)def xy_to_cxcy(xy):"""Args:xy: bounding boxes in boundary coordinates, a tensor of size (n_boxes, 4)Returns:bounding boxes in center-size coordinates, a tensor of size (n_boxes, 4)"""return torch.cat([(xy[:, 2:] + xy[:, :2]) / 2,  # c_x, c_yxy[:, 2:] - xy[:, :2]], 1)  # w, hdef MultiBoxTarget(anchor, label):"""Args:anchor: torch tensor, 输入的锚框, 一般是通过MultiBoxPrior生成, shape:（1，锚框总数，4）label: 真实标签, shape为(bn, 每张图片最多的真实锚框数, 5)第二维中，如果给定图片没有这么多锚框, 可以先用-1填充空白, 最后一维中的元素为[类别标签, 四个坐标值]Returns:列表, [bbox_offset, bbox_mask, cls_labels]bbox_offset: 每个锚框的标注偏移量，形状为(bn，锚框总数*4)bbox_mask: 形状同bbox_offset, 每个锚框的掩码, 一一对应上面的偏移量, 负类锚框(背景)对应的掩码均为0, 正类锚框的掩码均为1cls_labels: 每个锚框的标注类别, 其中0表示为背景, 形状为(bn，锚框总数)"""assert len(anchor.shape) == 3 and len(label.shape) == 3bn = label.shape[0]def MultiBoxTarget_one(anc, lab, eps=1e-6):"""MultiBoxTarget函数的辅助函数, 处理batch中的一个Args:anc: shape of (锚框总数, 4)lab: shape of (真实锚框数, 5), 5代表[类别标签, 四个坐标值]eps: 一个极小值, 防止log0Returns:offset: (锚框总数*4, )bbox_mask: (锚框总数*4, ), 0代表背景, 1代表非背景cls_labels: (锚框总数, 4), 0代表背景"""an = anc.shape[0]assigned_idx = assign_anchor(lab[:, 1:], anc)   # (锚框总数, )# 将真实锚框的位置代入assign_anchor函数，而不代入真实锚框的编号bbox_mask = ((assigned_idx >= 0).float().unsqueeze(-1)).repeat(1, 4)    # (锚框总数, 4)# .repeat(1, 4)第一列元素重复4次cls_labels = torch.zeros(an, dtype=torch.long)  # 0表示背景assigned_bb = torch.zeros((an, 4), dtype=torch.float32) # 所有anchor对应的bb坐标for i in range(an):bb_idx = assigned_idx[i]if bb_idx >= 0: # 即非背景cls_labels[i] = lab[bb_idx, 0].long().item() + 1 # 注意要加一assigned_bb[i, :] = lab[bb_idx, 1:]center_anc = xy_to_cxcy(anc) # (center_x, center_y, w, h)center_assigned_bb = xy_to_cxcy(assigned_bb)offset_xy = 10.0 * (center_assigned_bb[:, :2] - center_anc[:, :2]) / center_anc[:, 2:]offset_wh = 5.0 * torch.log(eps + center_assigned_bb[:, 2:] / center_anc[:, 2:])offset = torch.cat([offset_xy, offset_wh], dim = 1) * bbox_mask # (锚框总数, 4)return offset.view(-1), bbox_mask.view(-1), cls_labelsbatch_offset = []batch_mask = []batch_cls_labels = []for b in range(bn):offset, bbox_mask, cls_labels = MultiBoxTarget_one(anchor[0, :, :], label[b, :, :])batch_offset.append(offset)batch_mask.append(bbox_mask)batch_cls_labels.append(cls_labels)bbox_offset = torch.stack(batch_offset)bbox_mask = torch.stack(batch_mask)cls_labels = torch.stack(batch_cls_labels)return [bbox_offset, bbox_mask, cls_labels]# 我们通过unsqueeze函数为锚框和真实边界框添加样本维。
labels = MultiBoxTarget(anchors.unsqueeze(dim=0),ground_truth.unsqueeze(dim=0))print(labels)
print("*"*50)# 9.4.4 输出预测边界框# 非极大值抑制
# 以下函数已保存在d2lzh_pytorch包中方便以后使用
from collections import namedtuple
Pred_BB_Info = namedtuple("Pred_BB_Info", ["index", "class_id", "confidence", "xyxy"])def non_max_suppression(bb_info_list, nms_threshold = 0.5):"""非极大抑制处理预测的边界框Args:bb_info_list: Pred_BB_Info的列表, 包含预测类别、置信度等信息nms_threshold: 阈值Returns:output: Pred_BB_Info的列表, 只保留过滤后的边界框信息"""output = []# 先根据置信度从高到低排序sorted_bb_info_list = sorted(bb_info_list, key = lambda x: x.confidence, reverse=True)while len(sorted_bb_info_list) != 0:best = sorted_bb_info_list.pop(0)output.append(best)if len(sorted_bb_info_list) == 0:breakbb_xyxy = []for bb in sorted_bb_info_list:bb_xyxy.append(bb.xyxy)iou = compute_jaccard(torch.tensor([best.xyxy]),torch.tensor(bb_xyxy))[0] # shape: (len(sorted_bb_info_list), )n = len(sorted_bb_info_list)sorted_bb_info_list = [sorted_bb_info_list[i] for i in range(n) if iou[i] <= nms_threshold]return outputdef MultiBoxDetection(cls_prob, loc_pred, anchor, nms_threshold = 0.5):"""Args:cls_prob: 经过softmax后得到的各个锚框的预测概率, shape:(bn, 预测总类别数+1, 锚框个数)loc_pred: 预测的各个锚框的偏移量, shape:(bn, 锚框个数*4)anchor: MultiBoxPrior输出的默认锚框, shape: (1, 锚框个数, 4)nms_threshold: 非极大抑制中的阈值Returns:所有锚框的信息, shape: (bn, 锚框个数, 6)每个锚框信息由[class_id, confidence, xmin, ymin, xmax, ymax]表示class_id=-1 表示背景或在非极大值抑制中被移除了"""assert len(cls_prob.shape) == 3 and len(loc_pred.shape) == 2 and len(anchor.shape) == 3bn = cls_prob.shape[0]def MultiBoxDetection_one(c_p, l_p, anc, nms_threshold = 0.5):"""MultiBoxDetection的辅助函数, 处理batch中的一个Args:c_p: (预测总类别数+1, 锚框个数)l_p: (锚框个数*4, )anc: (锚框个数, 4)nms_threshold: 非极大抑制中的阈值Return:output: (锚框个数, 6)"""pred_bb_num = c_p.shape[1]anc = (anc + l_p.view(pred_bb_num, 4)).detach().cpu().numpy()   # 加上偏移量confidence, class_id = torch.max(c_p, 0)confidence = confidence.detach().cpu().numpy()class_id = class_id.detach().cpu().numpy()pred_bb_info = [Pred_BB_Info(index = i,class_id = class_id[i] - 1, # 正类label从0开始confidence = confidence[i],xyxy=[*anc[i]]) # xyxy是个列表for i in range(pred_bb_num)]# 正类的indexobj_bb_idx = [bb.index for bb in non_max_suppression(pred_bb_info, nms_threshold)]output = []for bb in pred_bb_info:output.append([(bb.class_id if bb.index in obj_bb_idx else -1.0),bb.confidence,*bb.xyxy])return torch.tensor(output) # shape: (锚框个数, 6)batch_output = []for b in range(bn):batch_output.append(MultiBoxDetection_one(cls_prob[b], loc_pred[b], anchor[0], nms_threshold))return torch.stack(batch_output)anchors = torch.tensor([[0.1, 0.08, 0.52, 0.92], [0.08, 0.2, 0.56, 0.95],[0.15, 0.3, 0.62, 0.91], [0.55, 0.2, 0.9, 0.88]])
offset_preds = torch.tensor([0.0] * (4 * len(anchors)))
cls_probs = torch.tensor([[0., 0., 0., 0.,],  # 背景的预测概率[0.9, 0.8, 0.7, 0.1],  # 狗的预测概率[0.1, 0.2, 0.3, 0.9]])  # 猫的预测概率fig = d2l.plt.imshow(img)
show_bboxes(fig.axes, anchors * bbox_scale,['dog=0.9', 'dog=0.8', 'dog=0.7', 'cat=0.9'])
plt.show()output = MultiBoxDetection(cls_probs.unsqueeze(dim=0), offset_preds.unsqueeze(dim=0),anchors.unsqueeze(dim=0), nms_threshold=0.5)
print(output)fig = d2l.plt.imshow(img)
for i in output[0].detach().cpu().numpy():if i[0] == -1:continuelabel = ('dog=', 'cat=')[int(i[0])] + str(i[1])show_bboxes(fig.axes, [torch.tensor(i[2:]) * bbox_scale], label)
plt.show()print("*"*50)

《动手学深度学习》(PyTorch版)代码注释 - 48 【Multi-scale_target_detection】相关推荐

伯禹公益AI《动手学深度学习PyTorch版》Task 04 学习笔记
伯禹公益AI<动手学深度学习PyTorch版>Task 04 学习笔记 Task 04:机器翻译及相关技术:注意力机制与Seq2seq模型:Transformer 微信昵称:WarmIce ...
伯禹公益AI《动手学深度学习PyTorch版》Task 07 学习笔记
伯禹公益AI<动手学深度学习PyTorch版>Task 07 学习笔记 Task 07:优化算法进阶:word2vec:词嵌入进阶微信昵称:WarmIce 优化算法进阶 emmmm,讲实 ...
伯禹公益AI《动手学深度学习PyTorch版》Task 03 学习笔记
伯禹公益AI<动手学深度学习PyTorch版>Task 03 学习笔记 Task 03:过拟合.欠拟合及其解决方案:梯度消失.梯度爆炸:循环神经网络进阶微信昵称:WarmIce 过拟合. ...
【动手学深度学习PyTorch版】6 权重衰退
上一篇移步[动手学深度学习PyTorch版]5 模型选择 + 过拟合和欠拟合_水w的博客-CSDN博客目录一.权重衰退 1.1 权重衰退 weight decay:处理过拟合的最常见方法(L2_p ...
【动手学深度学习PyTorch版】12 卷积层
上一篇移步[动手学深度学习PyTorch版]11 使用GPU_水w的博客-CSDN博客目录一.卷积层 1.1从全连接到卷积 ◼ 回顾单隐藏层MLP ◼ Waldo在哪里? ◼ 原则1-平移不变性 ...
【动手学深度学习PyTorch版】27 数据增强
上一篇请移步[动手学深度学习PyTorch版]23 深度学习硬件CPU 和 GPU_水w的博客-CSDN博客目录一.数据增强 1.1 数据增强(主要是关于图像增强) ◼ CES上的真实的故事 ◼ ...
【动手学深度学习PyTorch版】13 卷积层的填充和步幅
上一篇移步[动手学深度学习PyTorch版]12 卷积层_水w的博客-CSDN博客目录一.卷积层的填充和步幅 1.1 填充 1.2 步幅 1.3 总结二.代码实现填充和步幅(使用框架) 一.卷积 ...
【动手学深度学习PyTorch版】23 深度学习硬件CPU 和 GPU
上一篇请移步[动手学深度学习PyTorch版]22续 ResNet为什么能训练出1000层的模型_水w的博客-CSDN博客目录一.深度学习硬件CPU 和 GPU 1.1 深度学习硬件 ◼ 计算机构 ...
【动手学深度学习PyTorch版】15 池化层
上一篇请移步[动手学深度学习PyTorch版]14 卷积层里的多输入多输出通道_水w的博客-CSDN博客目录一.池化层 1.1 池化层 ◼池化层原因 ◼ 二维最大池化 1.2 填充.步幅与多个通道 ...
伯禹公益AI《动手学深度学习PyTorch版》Task 05 学习笔记
伯禹公益AI<动手学深度学习PyTorch版>Task 05 学习笔记 Task 05:卷积神经网络基础:LeNet:卷积神经网络进阶微信昵称:WarmIce 昨天打了一天的<大革 ...

《动手学深度学习》(PyTorch版)代码注释 - 48 【Multi-scale_target_detection】

目录

说明

配置环境

此节说明

代码

《动手学深度学习》(PyTorch版)代码注释 - 48 【Multi-scale_target_detection】相关推荐

最新文章

热门文章