All of the code below: https://pan.baidu.com/s/1p-Q-edFXXcvzxlZNd9saOw (extraction code: x72s)
For the underlying theory, see the companion notes: yolov1-v5 study notes and source-code walkthrough

Contents

  • 1 Directory structure
  • 2 train.py
    • 2.1 Data loading: datasets.py
    • 2.2 Building the network: models.py
      • 2.2.1 Building the model
      • 2.2.2 Implementing the YOLO layer
      • 2.2.3 Forward pass through Darknet
  • 3 test.py
  • 4 detect.py
  • 5 Utility scripts
    • 5.1 utils.py
    • 5.2 logger.py
    • 5.3 augmentations.py
    • 5.4 parse_config.py

1 Directory structure

config folder:
coco.data: index of the training data (paths and class count)
yolov3.cfg: the full network definition (configuration of every layer)

data folder: holds all the training data
coco.names: the class names, one per line

utils folder:
datasets.py: data preparation script
logger.py: log-writing script
utils.py: assorted helper functions
parse_config.py: parsers that read the parameters out of the config files

weights folder: holds the pretrained models
models.py: the script that builds the network model
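
For reference, coco.data is a plain key=value file. A minimal sketch, consistent with the dictionary that parse_data_config returns later in this post (the paths are placeholders for your own layout):

classes=80
train=data/coco/trainvalno5k.txt
valid=data/coco/5k.txt
names=data/coco.names
backup=backup/
eval=coco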

2 train.py

# Imports
from __future__ import division

from models import *
from utils.logger import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *
from test import evaluate

import warnings
warnings.filterwarnings("ignore")

from terminaltables import AsciiTable

import os
import sys
import time
import datetime
import argparse

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms
from torch.autograd import Variable
import torch.optim as optim

if __name__ == "__main__":
    # Parse the training arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--epochs", type=int, default=100, help="number of epochs")
    parser.add_argument("--batch_size", type=int, default=4, help="size of each image batch")
    parser.add_argument("--gradient_accumulations", type=int, default=2, help="number of gradient accums before step")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/coco.data", help="path to data config file")
    parser.add_argument("--pretrained_weights", type=str, default="weights/darknet53.conv.74", help="if specified starts from checkpoint model")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_interval", type=int, default=1, help="interval between saving model weights")
    parser.add_argument("--evaluation_interval", type=int, default=1, help="interval evaluations on validation set")
    parser.add_argument("--compute_map", default=False, help="if True computes mAP every tenth batch")
    parser.add_argument("--multiscale_training", default=True, help="allow for multi-scale training")
    opt = parser.parse_args()
    print(opt)

    logger = Logger("logs")

    # Pick the training device
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Create the output folders
    os.makedirs("output", exist_ok=True)
    os.makedirs("checkpoints", exist_ok=True)

    # Get data configuration (data paths and class names), e.g.:
    # {'gpus': '0,1,2,3', 'num_workers': '10', 'classes': '80',
    #  'train': 'data/coco/trainvalno5k.txt', 'valid': 'data/coco/5k.txt',
    #  'names': 'data/coco.names', 'backup': 'backup/', 'eval': 'coco'}
    data_config = parse_data_config(opt.data_config)
    train_path = data_config["train"]  # 'data/coco/trainvalno5k.txt'
    valid_path = data_config["valid"]  # 'data/coco/5k.txt'
    class_names = load_classes(data_config["names"])  # ['person', 'bicycle', 'car', 'motorbike', 'aeroplane', 'bus', 'train', 'truck', ...]

    # Initiate model: the network is assembled here from the cfg parameters
    model = Darknet(opt.model_def).to(device)
    model.apply(weights_init_normal)

    # If specified we start from a checkpoint (load pretrained weights)
    if opt.pretrained_weights:
        if opt.pretrained_weights.endswith(".pth"):
            model.load_state_dict(torch.load(opt.pretrained_weights))
        else:
            model.load_darknet_weights(opt.pretrained_weights)

    # Get dataloader: load the training data (images and labels)
    dataset = ListDataset(train_path, augment=True, multiscale=opt.multiscale_training)
    # Build the loader that feeds training batches
    dataloader = torch.utils.data.DataLoader(
        dataset,
        batch_size=opt.batch_size,
        shuffle=True,
        num_workers=opt.n_cpu,
        pin_memory=True,
        collate_fn=dataset.collate_fn,
    )

    # Optimizer
    optimizer = torch.optim.Adam(model.parameters())

    # Metric names, used below when writing the logs
    metrics = [
        "grid_size",
        "loss",
        "x",
        "y",
        "w",
        "h",
        "conf",
        "cls",
        "cls_acc",
        "recall50",
        "recall75",
        "precision",
        "conf_obj",
        "conf_noobj",
    ]

    # Loop over the data once per epoch and train
    for epoch in range(opt.epochs):
        model.train()
        # Record the start time
        start_time = time.time()
        # Read training batches from the loader one by one
        for batch_i, (_, imgs, targets) in enumerate(dataloader):
            # Index of the current batch across all epochs
            batches_done = len(dataloader) * epoch + batch_i

            imgs = Variable(imgs.to(device))
            targets = Variable(targets.to(device), requires_grad=False)
            print('imgs', imgs.shape)
            print('targets', targets.shape)

            # Forward pass
            loss, outputs = model(imgs, targets)
            # Backward pass
            loss.backward()

            # Step the optimizer every few batches (gradient accumulation)
            if batches_done % opt.gradient_accumulations:
                # Accumulates gradient before each step
                optimizer.step()
                optimizer.zero_grad()

            # ----------------
            #   Log progress
            # ----------------
            log_str = "\n---- [Epoch %d/%d, Batch %d/%d] ----\n" % (epoch, opt.epochs, batch_i, len(dataloader))
            metric_table = [["Metrics", *[f"YOLO Layer {i}" for i in range(len(model.yolo_layers))]]]

            # Log metrics at each YOLO layer: record every entry of the
            # metrics list above as training progresses
            for i, metric in enumerate(metrics):
                formats = {m: "%.6f" for m in metrics}
                formats["grid_size"] = "%2d"
                formats["cls_acc"] = "%.2f%%"
                row_metrics = [formats[metric] % yolo.metrics.get(metric, 0) for yolo in model.yolo_layers]
                metric_table += [[metric, *row_metrics]]

            # Tensorboard logging (for visualization)
            tensorboard_log = []
            for j, yolo in enumerate(model.yolo_layers):
                for name, metric in yolo.metrics.items():
                    if name != "grid_size":
                        tensorboard_log += [(f"{name}_{j+1}", metric)]
            tensorboard_log += [("loss", loss.item())]
            logger.list_of_scalars_summary(tensorboard_log, batches_done)

            log_str += AsciiTable(metric_table).table
            log_str += f"\nTotal loss {loss.item()}"

            # Determine approximate time left for epoch
            epoch_batches_left = len(dataloader) - (batch_i + 1)
            time_left = datetime.timedelta(seconds=epoch_batches_left * (time.time() - start_time) / (batch_i + 1))
            log_str += f"\n---- ETA {time_left}"

            print(log_str)
            model.seen += imgs.size(0)

        # Evaluate the model on the validation set every few epochs
        if epoch % opt.evaluation_interval == 0:
            print("\n---- Evaluating Model ----")
            precision, recall, AP, f1, ap_class = evaluate(
                model,
                path=valid_path,
                iou_thres=0.5,
                conf_thres=0.5,
                nms_thres=0.5,
                img_size=opt.img_size,
                batch_size=8,
            )
            evaluation_metrics = [
                ("val_precision", precision.mean()),
                ("val_recall", recall.mean()),
                ("val_mAP", AP.mean()),
                ("val_f1", f1.mean()),
            ]
            logger.list_of_scalars_summary(evaluation_metrics, epoch)

            # Print class APs and mAP
            ap_table = [["Index", "Class name", "AP"]]
            for i, c in enumerate(ap_class):
                ap_table += [[c, class_names[c], "%.5f" % AP[i]]]
            print(AsciiTable(ap_table).table)
            print(f"---- mAP {AP.mean()}")

        # Save the current model every few epochs
        if epoch % opt.checkpoint_interval == 0:
            torch.save(model.state_dict(), "checkpoints/yolov3_ckpt_%d.pth" % epoch)
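
With the defaults above, a training run from the Darknet-53 backbone weights looks like this (a sketch; paths assume the layout from section 1):

python train.py --model_def config/yolov3.cfg --data_config config/coco.data --pretrained_weights weights/darknet53.conv.74 --epochs 100 --batch_size 4

One quirk worth noting: the condition "if batches_done % opt.gradient_accumulations:" is truthy whenever the remainder is non-zero, so with the default of 2 the optimizer actually steps on the odd-numbered batches rather than strictly every second batch; the code here mirrors the upstream repository's logic.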

2.1 Data loading: datasets.py


In the datasets.py script:

import glob
import random
import os
import sys
import numpy as np
from PIL import Image
import torch
import torch.nn.functional as F

from utils.augmentations import horisontal_flip
from torch.utils.data import Dataset
import torchvision.transforms as transforms


# Pad an image out to a square
def pad_to_square(img, pad_value):
    c, h, w = img.shape
    dim_diff = np.abs(h - w)
    # (upper / left) padding and (lower / right) padding
    pad1, pad2 = dim_diff // 2, dim_diff - dim_diff // 2
    # Determine padding
    pad = (0, 0, pad1, pad2) if h <= w else (pad1, pad2, 0, 0)
    # Add padding
    img = F.pad(img, pad, "constant", value=pad_value)
    return img, pad


def resize(image, size):
    image = F.interpolate(image.unsqueeze(0), size=size, mode="nearest").squeeze(0)
    return image


def random_resize(images, min_size=288, max_size=448):
    new_size = random.sample(list(range(min_size, max_size + 1, 32)), 1)[0]
    images = F.interpolate(images, size=new_size, mode="nearest")
    return images


# Data preparation at detection time
class ImageFolder(Dataset):
    def __init__(self, folder_path, img_size=416):
        self.files = sorted(glob.glob("%s/*.*" % folder_path))
        self.img_size = img_size

    def __getitem__(self, index):
        img_path = self.files[index % len(self.files)]
        # Extract image as PyTorch tensor
        img = transforms.ToTensor()(Image.open(img_path))
        # Pad to square resolution
        img, _ = pad_to_square(img, 0)
        # Resize
        img = resize(img, self.img_size)
        return img_path, img

    def __len__(self):
        return len(self.files)


# Data preparation at training time
class ListDataset(Dataset):
    # list_path is the file listing the training image paths
    def __init__(self, list_path, img_size=416, augment=True, multiscale=True, normalized_labels=True):
        with open(list_path, "r") as file:
            # Read the image paths
            self.img_files = file.readlines()
        # Derive the matching label-file paths from the image paths
        self.label_files = [
            path.replace("images", "labels").replace(".png", ".txt").replace(".jpg", ".txt")
            for path in self.img_files
        ]
        self.img_size = img_size
        self.max_objects = 100
        self.augment = augment
        self.multiscale = multiscale
        self.normalized_labels = normalized_labels
        self.min_size = self.img_size - 3 * 32
        self.max_size = self.img_size + 3 * 32
        self.batch_count = 0

    def __getitem__(self, index):
        # ---------
        #  Image
        # ---------
        img_path = self.img_files[index % len(self.img_files)].rstrip()
        # This prefix is where your own copy of the data lives
        img_path = 'E:...\\PyTorch-YOLOv3\\data\\coco' + img_path
        # print(img_path)
        # Extract image as PyTorch tensor
        img = transforms.ToTensor()(Image.open(img_path).convert('RGB'))
        # Handle images with fewer than three channels by replicating the channel
        # (rarely hit here, since .convert('RGB') above already yields 3 channels)
        if len(img.shape) != 3:
            img = img.unsqueeze(0)
            img = img.expand((3, *img.shape[1:]))
        _, h, w = img.shape
        h_factor, w_factor = (h, w) if self.normalized_labels else (1, 1)
        # Pad to square resolution
        img, pad = pad_to_square(img, 0)
        _, padded_h, padded_w = img.shape

        # ---------
        #  Label
        # ---------
        label_path = self.label_files[index % len(self.img_files)].rstrip()
        label_path = 'E:...\\PyTorch-YOLOv3\\data\\coco\\labels' + label_path
        # print(label_path)
        targets = None
        # Transform the label data to match the padding and rescaling applied above
        if os.path.exists(label_path):
            boxes = torch.from_numpy(np.loadtxt(label_path).reshape(-1, 5))
            # Extract coordinates for unpadded + unscaled image
            x1 = w_factor * (boxes[:, 1] - boxes[:, 3] / 2)
            y1 = h_factor * (boxes[:, 2] - boxes[:, 4] / 2)
            x2 = w_factor * (boxes[:, 1] + boxes[:, 3] / 2)
            y2 = h_factor * (boxes[:, 2] + boxes[:, 4] / 2)
            # Adjust for added padding
            x1 += pad[0]
            y1 += pad[2]
            x2 += pad[1]
            y2 += pad[3]
            # Returns (x, y, w, h)
            boxes[:, 1] = ((x1 + x2) / 2) / padded_w
            boxes[:, 2] = ((y1 + y2) / 2) / padded_h
            boxes[:, 3] *= w_factor / padded_w
            boxes[:, 4] *= h_factor / padded_h
            targets = torch.zeros((len(boxes), 6))
            targets[:, 1:] = boxes

        # Apply augmentations
        if self.augment:
            if np.random.random() < 0.5:
                img, targets = horisontal_flip(img, targets)

        return img_path, img, targets

    def collate_fn(self, batch):
        paths, imgs, targets = list(zip(*batch))
        # Remove empty placeholder targets
        targets = [boxes for boxes in targets if boxes is not None]
        # Add sample index to targets
        for i, boxes in enumerate(targets):
            boxes[:, 0] = i
        targets = torch.cat(targets, 0)
        # Selects new image size every tenth batch
        if self.multiscale and self.batch_count % 10 == 0:
            self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
        # Resize images to input shape
        imgs = torch.stack([resize(img, self.img_size) for img in imgs])
        self.batch_count += 1
        return paths, imgs, targets

    def __len__(self):
        return len(self.img_files)
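
A quick sanity check of the padding and resizing helpers (a standalone sketch, not part of the repo):

import torch
img = torch.zeros(3, 300, 400)      # fake image: height < width, so padding goes top/bottom
padded, pad = pad_to_square(img, 0)
print(padded.shape)                 # torch.Size([3, 400, 400])
print(pad)                          # (0, 0, 50, 50) in F.pad order: (left, right, top, bottom)
print(resize(padded, 416).shape)    # torch.Size([3, 416, 416])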

2.2 Building the network: models.py

2.2.1 Building the model

In models.py:

def create_modules(module_defs):
    """Constructs module list of layer blocks from module configuration in module_defs"""
    # Pop the [net] block off the front of yolov3.cfg; module_defs then holds
    # only the actual layer blocks. hyperparams looks like:
    # {'type': 'net', 'batch': '16', 'subdivisions': '1', 'width': '416', 'height': '416', 'channels': '3',
    #  'momentum': '0.9', 'decay': '0.0005', 'angle': '0', 'saturation': '1.5', 'exposure': '1.5', 'hue': '.1',
    #  'learning_rate': '0.001', 'burn_in': '1000', 'max_batches': '500200', 'policy': 'steps',
    #  'steps': '400000,450000', 'scales': '.1,.1'}
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams["channels"])]
    # module_list holds the model blocks; the network is assembled block by block
    module_list = nn.ModuleList()
    # Iterate over the remaining layer blocks. module_defs looks like:
    # [{'type': 'convolutional', 'batch_normalize': '1', 'filters': '32', 'size': '3', 'stride': '1', 'pad': '1', 'activation': 'leaky'},
    #  {'type': 'convolutional', 'batch_normalize': '1', 'filters': '64', 'size': '3', 'stride': '2', 'pad': '1', 'activation': 'leaky'},
    #  {'type': 'convolutional', 'batch_normalize': '1', 'filters': '32', 'size': '1', 'stride': '1', 'pad': '1', 'activation': 'leaky'},
    #  {'type': 'convolutional', 'batch_normalize': '1', 'filters': '64', 'size': '3', 'stride': '1', 'pad': '1', 'activation': 'leaky'},
    #  {'type': 'shortcut', 'from': '-3', 'activation': 'linear'}, ...]
    for module_i, module_def in enumerate(module_defs):
        modules = nn.Sequential()

        # Convolutional block
        if module_def["type"] == "convolutional":
            # Read this layer's parameters out of the cfg entry
            bn = int(module_def["batch_normalize"])
            filters = int(module_def["filters"])
            kernel_size = int(module_def["size"])
            pad = (kernel_size - 1) // 2
            modules.add_module(
                f"conv_{module_i}",
                nn.Conv2d(
                    in_channels=output_filters[-1],    # input channels = previous layer's output
                    out_channels=filters,              # number of output filters
                    kernel_size=kernel_size,           # kernel size
                    stride=int(module_def["stride"]),  # stride
                    padding=pad,                       # padding
                    bias=not bn,                       # bias only when there is no batch norm
                ),
            )
            # Batch-norm layer
            if bn:
                modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
            # Activation: leaky ReLU
            if module_def["activation"] == "leaky":
                modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))

        # The other layer types follow the same pattern:
        elif module_def["type"] == "maxpool":
            kernel_size = int(module_def["size"])
            stride = int(module_def["stride"])
            if kernel_size == 2 and stride == 1:
                modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
            maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2))
            modules.add_module(f"maxpool_{module_i}", maxpool)

        elif module_def["type"] == "upsample":
            upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
            modules.add_module(f"upsample_{module_i}", upsample)

        # The route layer concatenates feature maps: after upsampling, the paper
        # concatenates with an earlier layer of matching spatial size,
        # e.g. input 1: 26x26x256, input 2: 26x26x128, output: 26x26x(256+128)
        elif module_def["type"] == "route":
            layers = [int(x) for x in module_def["layers"].split(",")]
            filters = sum([output_filters[1:][i] for i in layers])
            modules.add_module(f"route_{module_i}", EmptyLayer())

        # The shortcut is a residual connection: a plain elementwise add,
        # not a concatenation along the channel dimension (unlike route above)
        elif module_def["type"] == "shortcut":
            filters = output_filters[1:][int(module_def["from"])]
            modules.add_module(f"shortcut_{module_i}", EmptyLayer())

        # There are three [yolo] blocks in the cfg, one detector per scale:
        # [yolo]
        # mask = 3,4,5
        # anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
        # classes=80
        # num=9
        # jitter=.3
        # ignore_thresh = .7
        # truth_thresh = 1
        # random=1
        elif module_def["type"] == "yolo":
            # mask selects which 3 of the 9 anchors this scale uses
            anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
            # Extract anchors: the cfg stores them flat as w,h,w,h,...
            anchors = [int(x) for x in module_def["anchors"].split(",")]
            # Pair them up: anchors = [(w,h), (w,h), ...] (9 pairs)
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            # Keep only the 3 pairs this scale's mask points at
            anchors = [anchors[i] for i in anchor_idxs]
            num_classes = int(module_def["classes"])
            img_size = int(hyperparams["height"])
            # Define detection layer (details in the next section)
            yolo_layer = YOLOLayer(anchors, num_classes, img_size)
            modules.add_module(f"yolo_{module_i}", yolo_layer)

        # Register module list and number of output filters:
        # each loop iteration builds one block and appends it to the model
        module_list.append(modules)
        output_filters.append(filters)

    return hyperparams, module_list
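
How the parser and the builder fit together (a sketch; assumes config/yolov3.cfg from the repo is on disk):

from utils.parse_config import parse_model_config

module_defs = parse_model_config("config/yolov3.cfg")
hyperparams, module_list = create_modules(module_defs)
print(hyperparams["height"])   # '416'
print(len(module_list))        # one nn.Sequential per cfg layer block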

2.2.2 Implementing the YOLO layer

Let's look closely at the implementation of the YOLO layer. Its job is to decode the raw network output into boxes and, at training time, to compute the loss terms whose gradients update the parameters.
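
The original post illustrated these parameters with a figure; in its place, the decoding the layer implements is the standard YOLOv3 parameterization, where $(t_x, t_y, t_w, t_h)$ are the raw outputs, $(c_x, c_y)$ the grid cell's top-left corner, and $(p_w, p_h)$ the anchor size at feature-map scale:

$$b_x = \sigma(t_x) + c_x, \qquad b_y = \sigma(t_y) + c_y, \qquad b_w = p_w\, e^{t_w}, \qquad b_h = p_h\, e^{t_h}$$

This is exactly what pred_boxes computes below, and the multiplication by stride maps the result back to input-image pixels.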

class YOLOLayer(nn.Module):
    """Detection layer"""

    def __init__(self, anchors, num_classes, img_dim=416):
        # Initialize the layer's parameters
        super(YOLOLayer, self).__init__()
        self.anchors = anchors
        self.num_anchors = len(anchors)
        self.num_classes = num_classes
        self.ignore_thres = 0.5
        self.mse_loss = nn.MSELoss()
        self.bce_loss = nn.BCELoss()
        self.obj_scale = 1
        self.noobj_scale = 100
        self.metrics = {}
        self.img_dim = img_dim
        self.grid_size = 0  # grid size

    # Compute the per-cell grid offsets
    def compute_grid_offsets(self, grid_size, cuda=True):
        self.grid_size = grid_size
        g = self.grid_size
        FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
        self.stride = self.img_dim / self.grid_size
        # Calculate offsets for each grid cell: turn the grid into a coordinate plane
        self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
        self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
        # Scale the anchors down by the stride so they live on the same scale as the grid
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))

    def forward(self, x, targets=None, img_dim=None):
        print(x.shape)  # e.g. [4, 255, 15, 15]: batch, filters, grid, grid

        # Tensors for cuda support
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor

        self.img_dim = img_dim
        num_samples = x.size(0)
        grid_size = x.size(2)  # grid size

        # Reshape: split the filter dimension into (anchor, box attributes) and move it last
        prediction = (
            x.view(num_samples, self.num_anchors, self.num_classes + 5, grid_size, grid_size)
            .permute(0, 1, 3, 4, 2)
            .contiguous()
        )
        print(prediction.shape)

        # Get outputs
        # x, y here are normalized offsets relative to the current cell's top-left corner
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        pred_conf = torch.sigmoid(prediction[..., 4])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 5:])  # Cls pred.

        # If grid size does not match current we compute new offsets:
        # relative positions become absolute ones, e.g. (0.5, 0.5) inside a
        # cell becomes (11.5, 11.5) on the grid
        if grid_size != self.grid_size:
            self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

        # Add offset and scale with anchors: actual positions on the feature map
        pred_boxes = FloatTensor(prediction[..., :4].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h

        output = torch.cat(
            (
                pred_boxes.view(num_samples, -1, 4) * self.stride,  # scale back to the input image
                pred_conf.view(num_samples, -1, 1),
                pred_cls.view(num_samples, -1, self.num_classes),
            ),
            -1,
        )
        # Before flattening, the predictions have shape (4, 3, 13, 13, ...):
        # 4 = batch, 3 = anchors per cell, 13 = grid size. To compute the loss,
        # the ground-truth labels must be converted into the same format.
        if targets is None:
            return output, 0
        else:
            # build_targets converts the ground-truth labels
            iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )
            # iou_scores: IoU between each ground-truth box and its best-matching anchor
            # class_mask: 1 where the class prediction is correct
            # obj_mask: 1 at the best anchor for each ground-truth box
            # noobj_mask: 0 wherever obj_mask is 1 or the IoU exceeds the ignore threshold, 1 elsewhere
            # tx, ty, tw, th: the regression targets at this feature-map scale (the values we fit)
            # tconf: the objectness targets

            # Loss: mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])  # only cells that contain an object count
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            # Objectness should approach 1 where there is an object and 0 where there is none
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])  # classification loss
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls  # total loss

            # Metrics
            cls_acc = 100 * class_mask[obj_mask].mean()
            conf_obj = pred_conf[obj_mask].mean()
            conf_noobj = pred_conf[noobj_mask].mean()
            conf50 = (pred_conf > 0.5).float()
            iou50 = (iou_scores > 0.5).float()
            iou75 = (iou_scores > 0.75).float()
            detected_mask = conf50 * class_mask * tconf
            precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
            recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
            recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

            self.metrics = {
                "loss": to_cpu(total_loss).item(),
                "x": to_cpu(loss_x).item(),
                "y": to_cpu(loss_y).item(),
                "w": to_cpu(loss_w).item(),
                "h": to_cpu(loss_h).item(),
                "conf": to_cpu(loss_conf).item(),
                "cls": to_cpu(loss_cls).item(),
                "cls_acc": to_cpu(cls_acc).item(),
                "recall50": to_cpu(recall50).item(),
                "recall75": to_cpu(recall75).item(),
                "precision": to_cpu(precision).item(),
                "conf_obj": to_cpu(conf_obj).item(),
                "conf_noobj": to_cpu(conf_noobj).item(),
                "grid_size": grid_size,
            }

            return output, total_loss

build_targets (defined in utils.py):

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)  # batch size, e.g. 4
    nA = pred_boxes.size(1)  # anchors per grid cell, 3
    nC = pred_cls.size(-1)   # number of classes, 80
    nG = pred_boxes.size(2)  # grid size

    # Output tensors, initialized with their default values
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)     # 1 where an anchor contains an object (foreground), default 0
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)   # 1 where an anchor contains no object (background), default 1
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)  # class mask: 1 where the class is predicted correctly, default 0
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)  # IoU score between prediction and ground truth
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)          # ground-truth position relative to its grid cell
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Convert to position relative to box: the xywh in target are in [0, 1],
    # so scaling by nG gives their coordinates on the current grid
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]  # coordinates at the grid scale
    gwh = target_boxes[:, 2:]

    # Get anchors with best IoU: score every anchor shape against every ground-truth box
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    print(ious.shape)
    best_ious, best_n = ious.max(0)  # the best score, and which anchor shape fits each target best

    # Separate target values: the batch index and the actual class of each box
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()  # cell indices, floored

    # Set masks
    obj_mask[b, best_n, gj, gi] = 1    # mark the cells that actually contain an object
    noobj_mask[b, best_n, gj, gi] = 0  # and clear them in the background mask

    # Set noobj mask to zero where the IoU exceeds the ignore threshold:
    # such a high-IoU anchor is treated as if it contained an object
    for i, anchor_ious in enumerate(ious.t()):
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates: the ground-truth position relative to its cell
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1

    # Compute label correctness and IoU at the best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = bbox_iou(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()  # ground-truth objectness, i.e. 1
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tcls, tconf
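
These targets are just the inverse of the decoding equations from section 2.2.2: for a ground-truth box $(g_x, g_y, g_w, g_h)$ at grid scale, matched to an anchor $(p_w, p_h)$,

$$t_x = g_x - \lfloor g_x \rfloor, \qquad t_y = g_y - \lfloor g_y \rfloor, \qquad t_w = \log\frac{g_w}{p_w}, \qquad t_h = \log\frac{g_h}{p_h}$$

(the code adds 1e-16 inside the log for numerical safety), so minimizing the MSE between $(x, y, w, h)$ and $(t_x, t_y, t_w, t_h)$ drives the decoded boxes toward the ground truth.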

2.2.3 Forward pass through Darknet

In train.py the model is instantiated through the Darknet class, whose forward pass also computes the loss used for the parameter updates. The Darknet class in full:

class Darknet(nn.Module):
    """YOLOv3 object detection model"""

    # Initialize the model parameters and build the corresponding modules
    def __init__(self, config_path, img_size=416):
        super(Darknet, self).__init__()
        # Parse the model configuration
        self.module_defs = parse_model_config(config_path)
        # Build the network with create_modules above
        self.hyperparams, self.module_list = create_modules(self.module_defs)
        # Collect the YOLO layers (the only modules that carry metrics)
        self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
        self.img_size = img_size
        self.seen = 0
        self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)

    def forward(self, x, targets=None):
        img_dim = x.shape[2]
        # Initialize the loss
        loss = 0
        # Lists that collect the per-layer outputs and the YOLO-layer outputs
        layer_outputs, yolo_outputs = [], []
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
                # Ordinary layers: just run the module
                x = module(x)
            elif module_def["type"] == "route":
                # Concatenate earlier feature maps along the channel dimension
                x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
            elif module_def["type"] == "shortcut":
                # Residual connection: add layer -1 and layer layer_i
                layer_i = int(module_def["from"])
                x = layer_outputs[-1] + layer_outputs[layer_i]
            elif module_def["type"] == "yolo":
                # The YOLO layer computes its share of the loss (see section 2.2.2)
                x, layer_loss = module[0](x, targets, img_dim)
                loss += layer_loss
                yolo_outputs.append(x)
            layer_outputs.append(x)
        yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
        return yolo_outputs if targets is None else (loss, yolo_outputs)

In addition, the upsampling layer and the empty placeholder layer:

class Upsample(nn.Module):
    """ nn.Upsample is deprecated """

    def __init__(self, scale_factor, mode="nearest"):
        super(Upsample, self).__init__()
        self.scale_factor = scale_factor
        self.mode = mode

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
        return x


class EmptyLayer(nn.Module):
    """Placeholder for 'route' and 'shortcut' layers"""

    def __init__(self):
        super(EmptyLayer, self).__init__()

3 test.py

Its content is similar to train.py; refer there for the shared pieces.

from __future__ import division

from models import *
from utils.utils import *
from utils.datasets import *
from utils.parse_config import *

import os
import sys
import time
import datetime
import argparse
import tqdm

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms
from torch.autograd import Variable
import torch.optim as optim


def evaluate(model, path, iou_thres, conf_thres, nms_thres, img_size, batch_size):
    model.eval()

    # Get dataloader
    dataset = ListDataset(path, img_size=img_size, augment=False, multiscale=False)
    dataloader = torch.utils.data.DataLoader(
        dataset, batch_size=batch_size, shuffle=False, num_workers=1, collate_fn=dataset.collate_fn
    )

    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

    labels = []
    sample_metrics = []  # List of tuples (TP, confs, pred)
    for batch_i, (_, imgs, targets) in enumerate(tqdm.tqdm(dataloader, desc="Detecting objects")):
        # Extract labels
        labels += targets[:, 1].tolist()
        # Rescale target
        targets[:, 2:] = xywh2xyxy(targets[:, 2:])
        targets[:, 2:] *= img_size

        imgs = Variable(imgs.type(Tensor), requires_grad=False)

        with torch.no_grad():
            outputs = model(imgs)
            outputs = non_max_suppression(outputs, conf_thres=conf_thres, nms_thres=nms_thres)

        sample_metrics += get_batch_statistics(outputs, targets, iou_threshold=iou_thres)

    # Concatenate sample statistics
    true_positives, pred_scores, pred_labels = [np.concatenate(x, 0) for x in list(zip(*sample_metrics))]
    precision, recall, AP, f1, ap_class = ap_per_class(true_positives, pred_scores, pred_labels, labels)

    return precision, recall, AP, f1, ap_class


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=8, help="size of each image batch")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--data_config", type=str, default="config/coco.data", help="path to data config file")
    parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    parser.add_argument("--iou_thres", type=float, default=0.5, help="iou threshold required to qualify as detected")
    parser.add_argument("--conf_thres", type=float, default=0.001, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.5, help="iou threshold for non-maximum suppression")
    parser.add_argument("--n_cpu", type=int, default=8, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    opt = parser.parse_args()
    print(opt)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    data_config = parse_data_config(opt.data_config)
    valid_path = data_config["valid"]
    class_names = load_classes(data_config["names"])

    # Initiate model
    model = Darknet(opt.model_def).to(device)
    if opt.weights_path.endswith(".weights"):
        # Load darknet weights
        model.load_darknet_weights(opt.weights_path)
    else:
        # Load checkpoint weights
        model.load_state_dict(torch.load(opt.weights_path))

    print("Compute mAP...")

    precision, recall, AP, f1, ap_class = evaluate(
        model,
        path=valid_path,
        iou_thres=opt.iou_thres,
        conf_thres=opt.conf_thres,
        nms_thres=opt.nms_thres,
        img_size=opt.img_size,
        batch_size=8,
    )

    print("Average Precisions:")
    for i, c in enumerate(ap_class):
        print(f"+ Class '{c}' ({class_names[c]}) - AP: {AP[i]}")

    print(f"mAP: {AP.mean()}")
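
With the official pretrained weights downloaded into weights/, evaluation runs with the defaults (a sketch; the weights file itself is not part of this code snapshot):

python test.py --weights_path weights/yolov3.weights --data_config config/coco.data --img_size 416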

4 detect.py

from __future__ import division

from models import *
from utils.utils import *
from utils.datasets import *

import os
import sys
import time
import datetime
import argparse

from PIL import Image

import torch
from torch.utils.data import DataLoader
from torchvision import datasets
from torch.autograd import Variable

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from matplotlib.ticker import NullLocator

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--image_folder", type=str, default="data/samples", help="path to dataset")
    parser.add_argument("--model_def", type=str, default="config/yolov3.cfg", help="path to model definition file")
    parser.add_argument("--weights_path", type=str, default="weights/yolov3.weights", help="path to weights file")
    parser.add_argument("--class_path", type=str, default="data/coco.names", help="path to class label file")
    parser.add_argument("--conf_thres", type=float, default=0.8, help="object confidence threshold")
    parser.add_argument("--nms_thres", type=float, default=0.4, help="iou threshold for non-maximum suppression")
    parser.add_argument("--batch_size", type=int, default=1, help="size of the batches")
    parser.add_argument("--n_cpu", type=int, default=0, help="number of cpu threads to use during batch generation")
    parser.add_argument("--img_size", type=int, default=416, help="size of each image dimension")
    parser.add_argument("--checkpoint_model", type=str, help="path to checkpoint model")
    opt = parser.parse_args()
    print(opt)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    os.makedirs("output", exist_ok=True)

    # Set up (load) the model
    model = Darknet(opt.model_def, img_size=opt.img_size).to(device)

    if opt.weights_path.endswith(".weights"):
        # Load darknet weights
        model.load_darknet_weights(opt.weights_path)
    else:
        # Load checkpoint weights
        model.load_state_dict(torch.load(opt.weights_path))

    model.eval()  # Set in evaluation mode

    # Load the test data
    dataloader = DataLoader(
        ImageFolder(opt.image_folder, img_size=opt.img_size),
        batch_size=opt.batch_size,
        shuffle=False,
        num_workers=opt.n_cpu,
    )

    # Extract class labels from file
    classes = load_classes(opt.class_path)

    Tensor = torch.cuda.FloatTensor if torch.cuda.is_available() else torch.FloatTensor

    imgs = []  # Stores image paths
    img_detections = []  # Stores detections for each image index

    print("\nPerforming object detection:")
    prev_time = time.time()
    for batch_i, (img_paths, input_imgs) in enumerate(dataloader):
        # Configure input
        input_imgs = Variable(input_imgs.type(Tensor))

        # Get detections
        with torch.no_grad():
            detections = model(input_imgs)
            # Filter with NMS
            detections = non_max_suppression(detections, opt.conf_thres, opt.nms_thres)

        # Log progress
        current_time = time.time()
        inference_time = datetime.timedelta(seconds=current_time - prev_time)
        prev_time = current_time
        print("\t+ Batch %d, Inference Time: %s" % (batch_i, inference_time))

        # Save image and detections
        imgs.extend(img_paths)
        img_detections.extend(detections)

    # Bounding-box colors
    cmap = plt.get_cmap("tab20b")
    colors = [cmap(i) for i in np.linspace(0, 1, 20)]

    print("\nSaving images:")
    # Iterate through images and save plot of detections
    for img_i, (path, detections) in enumerate(zip(imgs, img_detections)):
        print("(%d) Image: '%s'" % (img_i, path))

        # Create plot for visualization
        img = np.array(Image.open(path))
        plt.figure()
        fig, ax = plt.subplots(1)
        ax.imshow(img)

        # Draw bounding boxes and labels of detections
        if detections is not None:
            # Rescale boxes to original image
            detections = rescale_boxes(detections, opt.img_size, img.shape[:2])
            unique_labels = detections[:, -1].cpu().unique()
            n_cls_preds = len(unique_labels)
            bbox_colors = random.sample(colors, n_cls_preds)
            for x1, y1, x2, y2, conf, cls_conf, cls_pred in detections:
                print("\t+ Label: %s, Conf: %.5f" % (classes[int(cls_pred)], cls_conf.item()))

                box_w = x2 - x1
                box_h = y2 - y1

                color = bbox_colors[int(np.where(unique_labels == int(cls_pred))[0])]
                # Create a Rectangle patch
                bbox = patches.Rectangle((x1, y1), box_w, box_h, linewidth=2, edgecolor=color, facecolor="none")
                # Add the bbox to the plot
                ax.add_patch(bbox)
                # Add label
                plt.text(
                    x1,
                    y1,
                    s=classes[int(cls_pred)],
                    color="white",
                    verticalalignment="top",
                    bbox={"color": color, "pad": 0},
                )

        # Save generated image with detections
        plt.axis("off")
        plt.gca().xaxis.set_major_locator(NullLocator())
        plt.gca().yaxis.set_major_locator(NullLocator())
        filename = path.split("/")[-1].split(".")[0]
        plt.savefig(f"output/{filename}.png", bbox_inches="tight", pad_inches=0.0)
        plt.close()
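
A typical run over the bundled sample folder (a sketch; the annotated images are written to output/):

python detect.py --image_folder data/samples --weights_path weights/yolov3.weights --class_path data/coco.names --conf_thres 0.8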

5 Utility scripts

5.1 utils.py

from __future__ import division
import math
import time
import tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches


def to_cpu(tensor):
    return tensor.detach().cpu()


def load_classes(path):
    """Loads class labels at 'path'"""
    fp = open(path, "r")
    names = fp.read().split("\n")[:-1]
    return names


def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)


def rescale_boxes(boxes, current_dim, original_shape):
    """ Rescales bounding boxes to the original shape """
    orig_h, orig_w = original_shape
    # The amount of padding that was added
    pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
    pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
    # Image height and width after padding is removed
    unpad_h = current_dim - pad_y
    unpad_w = current_dim - pad_x
    # Rescale bounding boxes to dimension of original image
    boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w
    boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h
    boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
    boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
    return boxes


def xywh2xyxy(x):
    y = x.new(x.shape)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y


def ap_per_class(tp, conf, pred_cls, target_cls):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:    True positives (list).
        conf:  Objectness value from 0-1 (list).
        pred_cls: Predicted object classes (list).
        target_cls: True object classes (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]

    # Find unique classes
    unique_classes = np.unique(target_cls)

    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # Number of ground truth objects
        n_p = i.sum()  # Number of predicted objects

        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # Accumulate FPs and TPs
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()

            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])

            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])

            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))

    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)

    return p, r, ap, f1, unique_classes.astype("int32")


def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.
    Code originally from https://github.com/rbgirshick/py-faster-rcnn.
    # Arguments
        recall:    The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def get_batch_statistics(outputs, targets, iou_threshold):
    """ Compute true positives, predicted scores and predicted labels per sample """
    batch_metrics = []
    for sample_i in range(len(outputs)):

        if outputs[sample_i] is None:
            continue

        output = outputs[sample_i]
        pred_boxes = output[:, :4]
        pred_scores = output[:, 4]
        pred_labels = output[:, -1]

        true_positives = np.zeros(pred_boxes.shape[0])

        annotations = targets[targets[:, 0] == sample_i][:, 1:]
        target_labels = annotations[:, 0] if len(annotations) else []
        if len(annotations):
            detected_boxes = []
            target_boxes = annotations[:, 1:]

            for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):

                # If all targets are found, break
                if len(detected_boxes) == len(annotations):
                    break

                # Ignore if label is not one of the target labels
                if pred_label not in target_labels:
                    continue

                iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)
                if iou >= iou_threshold and box_index not in detected_boxes:
                    true_positives[pred_i] = 1
                    detected_boxes += [box_index]
        batch_metrics.append([true_positives, pred_scores, pred_labels])
    return batch_metrics


def bbox_wh_iou(wh1, wh2):
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area


def bbox_iou(box1, box2, x1y1x2y2=True):
    """Returns the IoU of two bounding boxes"""
    if not x1y1x2y2:
        # Transform from center and width to exact coordinates
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        # Get the coordinates of bounding boxes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    # Get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
        inter_rect_y2 - inter_rect_y1 + 1, min=0
    )
    # Union Area
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou


def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs
    Non-Maximum Suppression to further filter detections.
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class_score, class_pred)
    """
    # From (center x, center y, width, height) to (x1, y1, x2, y2)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])
    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # Filter out confidence scores below threshold
        image_pred = image_pred[image_pred[:, 4] >= conf_thres]
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # Object confidence times class confidence
        score = image_pred[:, 4] * image_pred[:, 5:].max(1)[0]
        # Sort by it
        image_pred = image_pred[(-score).argsort()]
        class_confs, class_preds = image_pred[:, 5:].max(1, keepdim=True)
        detections = torch.cat((image_pred[:, :5], class_confs.float(), class_preds.float()), 1)
        # Perform non-maximum suppression
        keep_boxes = []
        while detections.size(0):
            large_overlap = bbox_iou(detections[0, :4].unsqueeze(0), detections[:, :4]) > nms_thres
            label_match = detections[0, -1] == detections[:, -1]
            # Indices of boxes with lower confidence scores, large IOUs and matching labels
            invalid = large_overlap & label_match
            weights = detections[invalid, 4:5]
            # Merge overlapping bboxes by order of confidence
            detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
            keep_boxes += [detections[0]]
            detections = detections[~invalid]
        if keep_boxes:
            output[image_i] = torch.stack(keep_boxes)

    return output


# build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres) also lives
# in this file; it is listed in full, with commentary, in section 2.2.2 above.
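
A tiny numeric check of bbox_iou (a standalone sketch; note the +1 terms, which treat box coordinates as inclusive pixel indices):

import torch
a = torch.tensor([[0.0, 0.0, 9.0, 9.0]])    # a 10x10 box under the +1 convention
b = torch.tensor([[5.0, 5.0, 14.0, 14.0]])  # same size, shifted by 5 in x and y
print(bbox_iou(a, b))  # intersection 5*5 = 25, union 100 + 100 - 25 = 175, IoU ~ 0.1429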

5.2 logger.py

import tensorflow as tf


class Logger(object):
    def __init__(self, log_dir):
        """Create a summary writer logging to log_dir."""
        self.writer = tf.summary.create_file_writer(log_dir)

    def scalar_summary(self, tag, value, step):
        with self.writer.as_default():
            tf.summary.scalar(tag, value, step=step)
            self.writer.flush()

    def list_of_scalars_summary(self, tag_value_pairs, step):
        with self.writer.as_default():
            for tag, value in tag_value_pairs:
                tf.summary.scalar(tag, value, step=step)
            self.writer.flush()
        # TF1-style equivalent of the above:
        # summary = tf.Summary(value=[tf.Summary.Value(tag=tag, simple_value=value) for tag, value in tag_value_pairs])
        # self.writer.add_summary(summary, step)
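
A minimal usage sketch matching how train.py drives it (the logs/ directory name is the one train.py uses); this Logger targets the TensorFlow 2 tf.summary API:

logger = Logger("logs")
logger.list_of_scalars_summary([("loss", 1.23), ("val_mAP", 0.41)], step=100)
# inspect the curves with: tensorboard --logdir logs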

5.3 augmentations.py

import torch
import torch.nn.functional as F
import numpy as np


def horisontal_flip(images, targets):
    # Flip the image along its last (width) dimension
    images = torch.flip(images, [-1])
    # Mirror the normalized x-center of every box
    targets[:, 2] = 1 - targets[:, 2]
    return images, targets

5.4 parse_config.py

def parse_model_config(path):
    """Parses the yolo-v3 layer configuration file and returns module definitions"""
    file = open(path, 'r')
    lines = file.read().split('\n')
    lines = [x for x in lines if x and not x.startswith('#')]
    lines = [x.rstrip().lstrip() for x in lines]  # get rid of fringe whitespace
    module_defs = []
    for line in lines:
        if line.startswith('['):  # This marks the start of a new block
            module_defs.append({})
            module_defs[-1]['type'] = line[1:-1].rstrip()
            if module_defs[-1]['type'] == 'convolutional':
                module_defs[-1]['batch_normalize'] = 0
        else:
            key, value = line.split("=")
            value = value.strip()
            module_defs[-1][key.rstrip()] = value.strip()

    return module_defs


def parse_data_config(path):
    """Parses the data configuration file"""
    options = dict()
    options['gpus'] = '0,1,2,3'
    options['num_workers'] = '10'
    with open(path, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.strip()
        if line == '' or line.startswith('#'):
            continue
        key, value = line.split('=')
        options[key.strip()] = value.strip()
    return options
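
Putting the two parsers together (a sketch; the printed values match the stock config files shown earlier in this post):

data_config = parse_data_config("config/coco.data")
print(data_config["train"])    # 'data/coco/trainvalno5k.txt'

module_defs = parse_model_config("config/yolov3.cfg")
print(module_defs[0]["type"])  # 'net' (the hyperparameter block that create_modules pops off)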
