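1. Purpose of this article

There are already PyTorch implementations of YOLOv3 on GitHub, but simply taking someone else's code never felt quite right to me, and doing it yourself is the best way to learn. This article is therefore mainly a learning exercise. I build the YOLOv3 network myself and then run predictions with the officially released pretrained weights, which helps in understanding the YOLOv3 model.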

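2. The object detection task

Object detection is a computer vision task that involves identifying the presence, location and type of one or more objects in a given photograph. It is a challenging problem that builds on methods for object recognition (e.g., where are they?), object localization (e.g., what is their extent?) and object classification (e.g., what are they?). For the photo below, for example, the task is to identify what is in the photo and where, and to mark each object with a bounding box.

Three zebras (photo by Boegh)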

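3. The YOLOv3 model

There is plenty of material online introducing YOLOv3, so I won't repeat it here. The paper's author builds it with his Darknet framework and calls the backbone "Darknet-53", a name that sounds a bit odd. The network structure is shown in the figure below:

One more thing: the model we build will make predictions with pretrained weights, so download the pretrained weights file first. (From within China, a download manager such as Thunder (Xunlei) is worth using unless you have plenty of time to wait or a connection that never drops. It is a handy tool.)

Pretrained DarkNet weights on the MS COCO dataset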

4. Building the model

Import the required libraries:

```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
from PIL import Image
import matplotlib.pyplot as plt
```

Define the layers of the YOLOv3 (DarkNet) network:

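Each DarkNet layer consists of a convolution layer, a batch norm (BN) layer and an activation function. If a DarkNet layer has a BN layer, its convolution has only weights and no bias.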
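Why can the bias be dropped? Batch norm subtracts the per-channel mean, so any constant per-channel bias added by the convolution cancels out. The check below is a sketch (the function name `bias_is_redundant` is ours, not part of the original code) that compares a convolution with and without bias, both followed by the same `BatchNorm2d` in training mode:

```python
import torch
import torch.nn as nn


def bias_is_redundant(seed=0):
    """Return True if a conv bias before BatchNorm2d (training mode) leaves the output unchanged."""
    torch.manual_seed(seed)
    x = torch.randn(4, 3, 8, 8)  # random batch of four 3-channel 8x8 inputs

    conv_bias = nn.Conv2d(3, 16, 3, padding=1, bias=True)
    conv_plain = nn.Conv2d(3, 16, 3, padding=1, bias=False)
    with torch.no_grad():
        conv_plain.weight.copy_(conv_bias.weight)  # identical kernels, only the bias differs

    bn = nn.BatchNorm2d(16)  # training mode: normalizes with the batch mean and variance
    y_bias = bn(conv_bias(x))
    y_plain = bn(conv_plain(x))
    return torch.allclose(y_bias, y_plain, atol=1e-5)


print(bias_is_redundant())  # True
```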

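The DarkNet network outputs predictions at layers 82, 94 and 106, i.e., three outputs at different strides (32, 16 and 8). The convolution layers that produce these outputs have neither a BN layer nor an activation function.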
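The shapes of the three outputs follow from the input size and the strides. A quick sketch, assuming a 416×416 input, the 80 COCO classes and 3 anchors per scale (the helper name is illustrative):

```python
def yolo_output_shapes(input_size=416, num_classes=80, anchors_per_scale=3, strides=(32, 16, 8)):
    """Return (channels, grid_h, grid_w) for each YOLOv3 output scale."""
    # Each anchor predicts x, y, w, h, objectness plus one score per class
    channels = anchors_per_scale * (5 + num_classes)
    return [(channels, input_size // s, input_size // s) for s in strides]


print(yolo_output_shapes())  # [(255, 13, 13), (255, 26, 26), (255, 52, 52)]
```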

```python
# Darknet layer: convolution + optional batch norm + optional LeakyReLU
class DarknetLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, bnorm=True, leaky=True):
        super().__init__()
        # With BN the convolution has no bias; BN's own bias plays that role
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=not bnorm)
        self.bnorm = nn.BatchNorm2d(out_channels, eps=1e-3) if bnorm else None
        self.leaky = nn.LeakyReLU(0.1) if leaky else None

    def forward(self, x):
        x = self.conv(x)
        if self.bnorm is not None:
            x = self.bnorm(x)
        if self.leaky is not None:
            x = self.leaky(x)
        return x
```

Define the blocks of the YOLOv3 network:

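This borrows the idea of the residual block from ResNet. Each block has a skip connection: the input passes through every layer of the block to produce a temporary output, and the input and the temporary output are then added element-wise to give the block's output.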
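A stripped-down residual unit in the same spirit, written with `nn.Sequential` instead of the configuration dictionaries used below (the names `conv_bn_leaky` and `ResidualUnit` are our own). A 1×1 convolution halves the channels, a 3×3 convolution restores them, and the input is added back, so the output has exactly the input's shape:

```python
import torch
import torch.nn as nn


def conv_bn_leaky(in_c, out_c, k):
    # Convolution without bias, batch norm, LeakyReLU(0.1); padding k // 2 keeps the spatial size
    return nn.Sequential(nn.Conv2d(in_c, out_c, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(out_c),
                         nn.LeakyReLU(0.1))


class ResidualUnit(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.reduce = conv_bn_leaky(channels, channels // 2, 1)  # 1x1: C -> C/2
        self.expand = conv_bn_leaky(channels // 2, channels, 3)  # 3x3: C/2 -> C

    def forward(self, x):
        # Element-wise sum of the block input and the block's temporary output
        return x + self.expand(self.reduce(x))


unit = ResidualUnit(64).eval()
x = torch.randn(1, 64, 52, 52)
print(unit(x).shape)  # torch.Size([1, 64, 52, 52])
```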

```python
# DarkNet block
class DarknetBlock(nn.Module):
    def __init__(self, layers, skip=True):
        super().__init__()
        self.skip = skip
        self.layers = nn.ModuleDict()
        for cfg in layers:
            self.layers[cfg['id']] = DarknetLayer(cfg['in_channels'], cfg['out_channels'], cfg['kernel_size'],
                                                  cfg['stride'], cfg['padding'], cfg['bnorm'], cfg['leaky'])

    def forward(self, x):
        skip_connection = None
        for count, layer in enumerate(self.layers.values()):
            # The skip tensor is the input of the second-to-last layer: the output of the
            # stride-2 layer if the block has one, otherwise the block input
            if self.skip and count == len(self.layers) - 2:
                skip_connection = x
            x = layer(x)
        return x + skip_connection if self.skip else x
```

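The code above stacks several DarkNet layers into a block. `layers` is a list of dicts, each declaring the parameters of one DarkNet layer, such as the number of input channels and the number of kernels. `skip` indicates whether the block is used as a residual block.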

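The `forward` function contains an `if` statement: if the block is a residual block, the output of its stride-2 layer is added to the block's temporary output. Only when the block has no stride-2 layer is the block's input added to the temporary output.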
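To see which tensor that `if` statement picks, here is a small plain-Python trace that mirrors the rule in `DarknetBlock.forward` (the skip tensor is the input of the second-to-last layer). `skip_source` is a hypothetical helper for illustration only. For the first block it reports layer 1, which agrees with the official configuration, where the shortcut at layer 4 adds the output of layer 1:

```python
def skip_source(layer_ids):
    """Describe which tensor DarknetBlock.forward stores as skip_connection."""
    idx = len(layer_ids) - 2  # the skip is captured just before the second-to-last layer runs
    if idx == 0:
        return 'block input'
    return 'output of ' + layer_ids[idx - 1]


print(skip_source(['layer_0', 'layer_1', 'layer_2', 'layer_3']))  # output of layer_1 (stride-2 conv)
print(skip_source(['layer_5', 'layer_6', 'layer_7']))             # output of layer_5 (stride-2 conv)
print(skip_source(['layer_9', 'layer_10']))                       # block input
```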

Stack the blocks into the YOLOv3 network:

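The layers are numbered 0 to 106 as in the official configuration, with the three outputs at layers 82, 94 and 106. The structure is a little involved.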
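The places where the structure branches are the route layers. The deep 13×13 features are reduced by a 1×1 convolution, upsampled ×2 and concatenated along the channel axis with a saved earlier feature map. A shape-only sketch with dummy tensors, using `torch.nn.functional.interpolate` in nearest-neighbour mode (which is how Darknet's upsample layer works):

```python
import torch
import torch.nn.functional as F

deep = torch.zeros(1, 256, 13, 13)    # stands in for the output of layer 84 (1x1 conv after layer 79)
skip61 = torch.zeros(1, 512, 26, 26)  # stands in for the saved output of layer 61

up = F.interpolate(deep, scale_factor=2, mode='nearest')  # layer 85: (256, 13, 13) -> (256, 26, 26)
merged = torch.cat((up, skip61), dim=1)                    # layer 86 (route): 256 + 512 channels
print(merged.shape)  # torch.Size([1, 768, 26, 26])
```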

```python
# DarkNet network
def layer_cfg(layer_id, in_c, out_c, k, s, bnorm=True, leaky=True):
    # Build the dict DarknetBlock expects; padding = k // 2 (1 for 3x3, 0 for 1x1)
    return {'id': layer_id, 'in_channels': in_c, 'out_channels': out_c, 'kernel_size': k,
            'stride': s, 'padding': k // 2, 'bnorm': bnorm, 'leaky': leaky}


def residual_cfg(start, channels):
    # layer_{start}: 1x1, channels -> channels/2; layer_{start+1}: 3x3, back to channels
    return [layer_cfg('layer_%d' % start, channels, channels // 2, 1, 1),
            layer_cfg('layer_%d' % (start + 1), channels // 2, channels, 3, 1)]


class Yolov3(nn.Module):
    def __init__(self):
        super().__init__()
        # Darknet's upsample layer is nearest-neighbour; bilinear would not match the pretrained weights
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.blocks = nn.ModuleDict()
        # layer0 -> layer4, input = (3, 416, 416), flow_out = (64, 208, 208)
        self.blocks['block0_4'] = DarknetBlock([layer_cfg('layer_0', 3, 32, 3, 1),
                                                layer_cfg('layer_1', 32, 64, 3, 2)] + residual_cfg(2, 64))
        # layer5 -> layer8, input = (64, 208, 208), flow_out = (128, 104, 104)
        self.blocks['block5_8'] = DarknetBlock([layer_cfg('layer_5', 64, 128, 3, 2)] + residual_cfg(6, 128))
        # layer9 -> layer11, input = (128, 104, 104), flow_out = (128, 104, 104)
        self.blocks['block9_11'] = DarknetBlock(residual_cfg(9, 128))
        # layer12 -> layer15, input = (128, 104, 104), flow_out = (256, 52, 52)
        self.blocks['block12_15'] = DarknetBlock([layer_cfg('layer_12', 128, 256, 3, 2)] + residual_cfg(13, 256))
        # layer16 -> layer36, input = (256, 52, 52), flow_out = (256, 52, 52): block16_18 ... block34_36
        for s in range(16, 35, 3):
            self.blocks['block%d_%d' % (s, s + 2)] = DarknetBlock(residual_cfg(s, 256))
        # layer37 -> layer40, input = (256, 52, 52), flow_out = (512, 26, 26)
        self.blocks['block37_40'] = DarknetBlock([layer_cfg('layer_37', 256, 512, 3, 2)] + residual_cfg(38, 512))
        # layer41 -> layer61, input = (512, 26, 26), flow_out = (512, 26, 26): block41_43 ... block59_61
        for s in range(41, 60, 3):
            self.blocks['block%d_%d' % (s, s + 2)] = DarknetBlock(residual_cfg(s, 512))
        # layer62 -> layer65, input = (512, 26, 26), flow_out = (1024, 13, 13)
        self.blocks['block62_65'] = DarknetBlock([layer_cfg('layer_62', 512, 1024, 3, 2)] + residual_cfg(63, 1024))
        # layer66 -> layer74, input = (1024, 13, 13), flow_out = (1024, 13, 13): block66_68 ... block72_74
        for s in range(66, 73, 3):
            self.blocks['block%d_%d' % (s, s + 2)] = DarknetBlock(residual_cfg(s, 1024))
        # layer75 -> layer79, input = (1024, 13, 13), flow_out = (512, 13, 13)
        self.blocks['block75_79'] = DarknetBlock(residual_cfg(75, 1024) + residual_cfg(77, 1024)
                                                 + [layer_cfg('layer_79', 1024, 512, 1, 1)], skip=False)
        # layer80 -> layer82, input = (512, 13, 13), yolo_out = (255, 13, 13)
        self.blocks['yolo_82'] = DarknetBlock([layer_cfg('layer_80', 512, 1024, 3, 1),
                                               layer_cfg('layer_81', 1024, 255, 1, 1, bnorm=False, leaky=False)],
                                              skip=False)
        # layer83 -> layer86, input = (512, 13, 13) -> (256, 13, 13) -> upsample and concat layer61 (512, 26, 26),
        # flow_out = (768, 26, 26)
        self.blocks['block83_86'] = DarknetBlock([layer_cfg('layer_84', 512, 256, 1, 1)], skip=False)
        # layer87 -> layer91, input = (768, 26, 26), flow_out = (256, 26, 26)
        self.blocks['block87_91'] = DarknetBlock([layer_cfg('layer_87', 768, 256, 1, 1),
                                                  layer_cfg('layer_88', 256, 512, 3, 1),
                                                  layer_cfg('layer_89', 512, 256, 1, 1),
                                                  layer_cfg('layer_90', 256, 512, 3, 1),
                                                  layer_cfg('layer_91', 512, 256, 1, 1)], skip=False)
        # layer92 -> layer94, input = (256, 26, 26), yolo_out = (255, 26, 26)
        self.blocks['yolo_94'] = DarknetBlock([layer_cfg('layer_92', 256, 512, 3, 1),
                                               layer_cfg('layer_93', 512, 255, 1, 1, bnorm=False, leaky=False)],
                                              skip=False)
        # layer95 -> layer98, input = (256, 26, 26) -> (128, 26, 26) -> upsample and concat layer36 (256, 52, 52),
        # flow_out = (384, 52, 52)
        self.blocks['block95_98'] = DarknetBlock([layer_cfg('layer_96', 256, 128, 1, 1)], skip=False)
        # layer99 -> layer106, input = (384, 52, 52), yolo_out = (255, 52, 52)
        self.blocks['yolo_106'] = DarknetBlock([layer_cfg('layer_99', 384, 128, 1, 1),
                                                layer_cfg('layer_100', 128, 256, 3, 1),
                                                layer_cfg('layer_101', 256, 128, 1, 1),
                                                layer_cfg('layer_102', 128, 256, 3, 1),
                                                layer_cfg('layer_103', 256, 128, 1, 1),
                                                layer_cfg('layer_104', 128, 256, 3, 1),
                                                layer_cfg('layer_105', 256, 255, 1, 1, bnorm=False, leaky=False)],
                                               skip=False)

    def forward(self, x):
        skips = {}
        # Backbone plus block75_79, in registration order; keep the outputs of layers 36 and 61
        for name, block in self.blocks.items():
            if name == 'yolo_82':
                break
            x = block(x)
            if name in ('block34_36', 'block59_61'):
                skips[name] = x
        yolo_82 = self.blocks['yolo_82'](x)                 # (255, 13, 13)
        x = self.upsample(self.blocks['block83_86'](x))      # layers 84-85: (256, 26, 26)
        x = torch.cat((x, skips['block59_61']), dim=1)       # route 86: (768, 26, 26)
        x = self.blocks['block87_91'](x)
        yolo_94 = self.blocks['yolo_94'](x)                  # (255, 26, 26)
        x = self.upsample(self.blocks['block95_98'](x))      # layers 96-97: (128, 52, 52)
        x = torch.cat((x, skips['block34_36']), dim=1)       # route 98: (384, 52, 52)
        yolo_106 = self.blocks['yolo_106'](x)                # (255, 52, 52)
        return yolo_82, yolo_94, yolo_106
```

Instantiate the model:

```python
model = Yolov3()
```

At this point you can call `print(model)` to inspect the model structure.

Loading the pretrained weights

By now the weights file should be downloaded. A weight reader class loads its parameters into our model:

```python
# Weight reader
class WeightReader():
    def __init__(self, weight_file):
        # The weights file is binary, so open it in 'rb' mode
        with open(weight_file, 'rb') as fp:
            # Header: major, minor, revision (int32) and images seen (int64) = 5 int32 words
            header = np.fromfile(fp, dtype=np.int32, count=5)
            self.header = torch.from_numpy(header)
            self.seen = self.header[3]
            # The rest of the values are the weights
            self.weights = np.fromfile(fp, dtype=np.float32)

    def _take(self, ptr, like):
        # Read like.numel() floats starting at ptr, reshaped to like's shape
        n = like.numel()
        data = torch.from_numpy(self.weights[ptr:ptr + n]).view_as(like)
        return data, ptr + n

    # Load the parameters, layer by layer in forward order
    def load_weights(self, model):
        ptr = 0
        for block in model.blocks.values():
            for layer in block.layers.values():
                bn, conv = layer.bnorm, layer.conv
                if bn is not None:
                    # With BN: bias, weight, running mean, running variance
                    for target in (bn.bias.data, bn.weight.data, bn.running_mean, bn.running_var):
                        data, ptr = self._take(ptr, target)
                        target.copy_(data)
                else:
                    # Without BN: the convolution bias
                    data, ptr = self._take(ptr, conv.bias.data)
                    conv.bias.data.copy_(data)
                # Then the convolution weights, for every layer
                data, ptr = self._take(ptr, conv.weight.data)
                conv.weight.data.copy_(data)
        # Every value in the file should have been consumed
        assert ptr == len(self.weights), 'loaded %d of %d values' % (ptr, len(self.weights))

    # Count the network parameters
    def weight_summary(self, model):
        train_able, train_disable = 0, 0
        for block in model.blocks.values():
            for layer in block.layers.values():
                bn, conv = layer.bnorm, layer.conv
                if bn is not None:
                    train_able += bn.bias.numel() + bn.weight.numel()
                    train_disable += bn.running_mean.numel() + bn.running_var.numel()
                else:
                    train_able += conv.bias.numel()
                train_able += conv.weight.numel()
        print("total = %d" % (train_able + train_disable))
        print("count of train_able = %d" % train_able)
        print("count of train_disable = %d" % train_disable)
```

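The first 5 values of the official pretrained weights file are a header; only the remaining values are parameters to load into the model. Pay attention to the order in which the parameters are stored. Here is a figure from the official source: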

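The parameters are stored in the order of the forward pass. For a DarkNet layer with a BN layer, the file stores the BN bias, BN weight, running mean and running variance, followed by the convolution weights. For a DarkNet layer without BN, it stores the convolution bias followed by the convolution weights.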
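Given that order, the number of float32 values each DarkNet layer consumes from the file follows from its shape alone. A sketch (the helper `floats_in_file` is illustrative, not part of the original code):

```python
def floats_in_file(in_channels, out_channels, kernel_size, bnorm):
    """Number of float32 values a DarkNet layer reads from the weights file."""
    conv_weights = out_channels * in_channels * kernel_size * kernel_size
    if bnorm:
        # BN bias, BN weight, running mean, running variance: one value each per output channel
        return 4 * out_channels + conv_weights
    # No BN: one convolution bias per output channel
    return out_channels + conv_weights


print(floats_in_file(3, 32, 3, True))       # layer_0: 4*32 + 32*3*3*3 = 992
print(floats_in_file(1024, 255, 1, False))  # layer_81: 255 + 255*1024 = 261375
```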

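In a BN layer, the bias and weight are trainable parameters, while the running mean and variance are not trainable. All four still have to be loaded into the network.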
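PyTorch draws the same distinction. In `nn.BatchNorm2d`, `weight` and `bias` are parameters (returned by `parameters()`), while `running_mean` and `running_var` are buffers. A quick check; note that PyTorch also keeps a `num_batches_tracked` counter buffer, which the Darknet file does not store:

```python
import torch.nn as nn

bn = nn.BatchNorm2d(32)
trainable = sum(p.numel() for p in bn.parameters())               # weight + bias
running_stats = bn.running_mean.numel() + bn.running_var.numel()  # running mean + variance
buffer_names = [name for name, _ in bn.named_buffers()]
print(trainable, running_stats)  # 64 64
print(buffer_names)              # ['running_mean', 'running_var', 'num_batches_tracked']
```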

The following code loads the parameters and prints the parameter counts.

```python
# Load the model parameters and print the parameter counts
# Total number of parameters: 62,001,757
# Trainable (BN and convolution weights and biases) = 61,949,149;
# non-trainable (BN running mean and variance) = 52,608
weight_reader = WeightReader('yolov3.weights')
weight_reader.load_weights(model)
weight_reader.weight_summary(model)
```
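To exercise the reading logic without the real weights file, one can write a tiny fake file with the same layout (five int32 header words followed by float32 values) and read it back the same way `WeightReader` does. The header values below are made up, and the function names are illustrative. Note the binary `'wb'`/`'rb'` modes:

```python
import os
import tempfile

import numpy as np


def write_fake_weights(path, floats):
    # 5 int32 header words (major, minor, revision, 64-bit 'seen' as two words); made-up values
    header = np.array([0, 2, 0, 1000, 0], dtype=np.int32)
    with open(path, 'wb') as fp:
        header.tofile(fp)
        np.asarray(floats, dtype=np.float32).tofile(fp)


def read_weights(path):
    # Same sequence as WeightReader: the header first, then every remaining value as float32
    with open(path, 'rb') as fp:
        header = np.fromfile(fp, dtype=np.int32, count=5)
        weights = np.fromfile(fp, dtype=np.float32)
    return header, weights


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'fake.weights')
    write_fake_weights(path, np.arange(992))  # 992 values, what layer_0 would consume
    header, weights = read_weights(path)
    file_size = os.path.getsize(path)         # 20 header bytes + 992 * 4 bytes

print(header.tolist(), weights.size, file_size)  # [0, 2, 0, 1000, 0] 992 3988
```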

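Input processing

Define an image-loading function that resizes (rather than crops) the input image to the network input size (416×416), scales every pixel to [0, 1] (`ToTensor` divides by 255) and converts the result into a 4-D tensor. It returns the tensor together with the original width and height of the image.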

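```python
# Load an image
def img_loader(photo_file, input_w, input_h):
    img = Image.open(photo_file).convert('RGB')  # drop alpha / expand grayscale to 3 channels
    img_w, img_h = img.size
    img = img.resize((input_w, input_h))
    img = torchvision.transforms.ToTensor()(img)  # C x H x W, scaled to [0, 1]
    img = torch.unsqueeze(img, 0)                 # add the batch dimension
    # Return the resized image tensor and the original width and height
    return img, img_w, img_h
```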
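For 8-bit RGB images, `ToTensor` amounts to dividing by 255 and moving the channel axis first. A NumPy-only sketch of the same preprocessing (the function name `to_network_input` is illustrative), applied to a synthetic image instead of a real photo:

```python
import numpy as np
from PIL import Image


def to_network_input(img, input_w=416, input_h=416):
    """Resize (not crop) a PIL image and return a 1 x 3 x H x W float32 array in [0, 1]."""
    img = img.convert('RGB').resize((input_w, input_h))
    arr = np.asarray(img, dtype=np.float32) / 255.0  # H x W x C, scaled to [0, 1]
    return arr.transpose(2, 0, 1)[np.newaxis]         # channels first, plus a batch axis


photo = Image.new('RGB', (640, 480), color=(255, 128, 0))  # synthetic 640x480 orange image
batch = to_network_input(photo)
print(batch.shape, photo.size)  # (1, 3, 416, 416) (640, 480)
```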

Next, the model can compute its output for the input image:

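```python
photo_file = 'zebra.jpg'
input_w, input_h = 416, 416
img, img_w, img_h = img_loader(photo_file, input_w, input_h)
model.eval()           # inference mode: BN layers use the loaded running mean and variance
with torch.no_grad():  # no gradients are needed for prediction
    y_hat = model(img)
```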

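The resulting `y_hat` is a tuple of three elements, each a 4-D tensor. What remains is to decode these tensors, filter the boxes by score, apply non-maximum suppression (which uses IoU), draw the boxes, and so on. All the functions involved are listed below.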
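Decoding turns the raw values tx, ty, tw, th of one anchor in cell (col, row) into a box: x = (col + sigmoid(tx)) / grid_w, y = (row + sigmoid(ty)) / grid_h, w = anchor_w * exp(tw) / net_w, h = anchor_h * exp(th) / net_h, with the anchor size in pixels of the network input. This is the arithmetic in `decode_netout` below; here is a scalar sketch (the function name `decode_one` is ours):

```python
import math


def decode_one(tx, ty, tw, th, col, row, anchor_w, anchor_h,
               grid_w=13, grid_h=13, net_w=416, net_h=416):
    """Decode one anchor prediction into (xmin, ymin, xmax, ymax) as fractions of the image size."""
    def sigmoid(v):
        return 1.0 / (1.0 + math.exp(-v))

    x = (col + sigmoid(tx)) / grid_w     # box center, fraction of image width
    y = (row + sigmoid(ty)) / grid_h     # box center, fraction of image height
    w = anchor_w * math.exp(tw) / net_w  # box width, fraction of image width
    h = anchor_h * math.exp(th) / net_h  # box height, fraction of image height
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2


# All-zero raw outputs in cell (6, 6) of the 13x13 map with the 116x90 anchor:
# the center lands at (6 + 0.5) / 13 = 0.5 and the box keeps the anchor's size
print([round(v, 4) for v in decode_one(0, 0, 0, 0, 6, 6, 116, 90)])  # [0.3606, 0.3918, 0.6394, 0.6082]
```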

```python
# Bounding box class
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
        return self.score


def _sigmoid(x):
    return 1. / (1. + np.exp(-x))


# Decode one output map of the network into a list of BoundBox objects
def decode_netout(netout, anchors, obj_thresh, net_w, net_h):
    grid_h, grid_w = netout.shape[1:]
    nb_box = 3
    # (255, H, W) -> (H, W, 3, 85): per anchor x, y, w, h, objectness, 80 class scores
    netout = netout.permute(1, 2, 0).detach().cpu().numpy().reshape((grid_h, grid_w, nb_box, -1)).copy()
    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])  # x, y offsets inside the cell
    netout[..., 4:] = _sigmoid(netout[..., 4:])  # objectness and class probabilities
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]  # score = objectness * class prob
    netout[..., 5:] *= netout[..., 5:] > obj_thresh  # zero out scores below the threshold
    for row in range(grid_h):
        for col in range(grid_w):
            for b in range(nb_box):
                # The 5th element (index 4) is the objectness score
                objectness = netout[row, col, b, 4]
                if objectness <= obj_thresh:
                    continue
                # The first 4 elements are x, y, w and h
                x, y, w, h = netout[row, col, b, :4]
                x = (col + x) / grid_w  # center position, unit: image width
                y = (row + y) / grid_h  # center position, unit: image height
                w = anchors[2 * b + 0] * np.exp(w) / net_w  # unit: image width
                h = anchors[2 * b + 1] * np.exp(h) / net_h  # unit: image height
                # The remaining elements are the class scores
                classes = netout[row, col, b, 5:]
                boxes.append(BoundBox(x - w / 2, y - h / 2, x + w / 2, y + h / 2, objectness, classes))
    return boxes


# Convert box coordinates to pixels of the original image, given the list of boxes,
# the original image size and the network input size. The boxes are updated in place.
def correct_yolo_boxes(boxes, image_w, image_h, net_w, net_h):
    # The image was simply stretched to net_w x net_h, so the offset is 0 and the scale is 1
    new_w, new_h = net_w, net_h
    x_offset, x_scale = (net_w - new_w) / 2. / net_w, float(new_w) / net_w
    y_offset, y_scale = (net_h - new_h) / 2. / net_h, float(new_h) / net_h
    for box in boxes:
        box.xmin = int((box.xmin - x_offset) / x_scale * image_w)
        box.xmax = int((box.xmax - x_offset) / x_scale * image_w)
        box.ymin = int((box.ymin - y_offset) / y_scale * image_h)
        box.ymax = int((box.ymax - y_offset) / y_scale * image_h)


# Overlap length of two 1-D intervals, used for IoU
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        return min(x2, x4) - x1
    if x2 < x3:
        return 0
    return min(x2, x4) - x3


# IoU of two boxes
def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax - box1.xmin, box1.ymax - box1.ymin
    w2, h2 = box2.xmax - box2.xmin, box2.ymax - box2.ymin
    union = w1 * h1 + w2 * h2 - intersect
    return float(intersect) / union if union > 0 else 0.0


# Non-maximum suppression: per class, zero the score of every box that overlaps
# a higher-scoring box by at least nms_thresh
def do_nms(boxes, nms_thresh):
    if len(boxes) == 0:
        return
    nb_class = len(boxes[0].classes)
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0:
                continue
            for j in range(i + 1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0


# Keep the boxes that strongly predict an object: class score above thresh
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    for box in boxes:
        for i in range(len(labels)):
            # Don't break: several labels may pass the threshold for one box
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i] * 100)
    return v_boxes, v_labels, v_scores


# Draw the bounding boxes
def draw_boxes(photo_file, v_boxes, v_labels, v_scores):
    data = plt.imread(photo_file)
    plt.imshow(data)
    ax = plt.gca()
    for box, label, score in zip(v_boxes, v_labels, v_scores):
        x1, y1, x2, y2 = box.xmin, box.ymin, box.xmax, box.ymax
        rect = plt.Rectangle((x1, y1), x2 - x1, y2 - y1, fill=False, color='white')
        ax.add_patch(rect)
        # Label and score in the top-left corner
        plt.text(x1, y1, "%s (%.1f)" % (label, score), color='white', bbox=dict(facecolor='red'))
    plt.show()
```
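A worked example of the IoU that `bbox_iou` computes, followed by the greedy suppression idea behind `do_nms` reduced to a single class. Boxes here are plain (xmin, ymin, xmax, ymax) tuples, and the function names `iou` and `greedy_nms` are illustrative:

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, thresh=0.5):
    """Visit boxes by descending score; keep a box only if it overlaps every kept box by < thresh."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 = 0.142857...
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
print(greedy_nms(boxes, [0.9, 0.8, 0.7]))   # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```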

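Wrap the steps above in a single function. It relies on the globals `model`, `input_w`, `input_h`, `anchors`, `class_threshold` and `labels`, which are defined before it is called.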

```python
def make_predict(photo_file):
    img, img_w, img_h = img_loader(photo_file, input_w, input_h)
    model.eval()           # BN layers must use the loaded running statistics
    with torch.no_grad():
        y_hat = model(img)
    boxes = []
    for i in range(len(y_hat)):
        # Decode each of the three outputs (first and only image in the batch)
        boxes += decode_netout(y_hat[i][0], anchors[i], class_threshold, input_w, input_h)
    # Correct the sizes of the bounding boxes for the shape of the image
    correct_yolo_boxes(boxes, img_w, img_h, input_w, input_h)
    # Suppress non-maximal boxes
    do_nms(boxes, 0.5)
    # Get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # Summarize what we found
    for label, score in zip(v_labels, v_scores):
        print(label, score)
    # Draw what we found
    draw_boxes(photo_file, v_boxes, v_labels, v_scores)
```

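The network outputs class indices, which must be mapped to human-readable names. The labels the pretrained weights can predict (the 80 COCO classes) are: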

```python
# Labels the pretrained weights can predict
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
          "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
          "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
          "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
          "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
          "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
          "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
          "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
```

Finally, run the wrapped function.

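```python
# Preset anchors: (w, h) pairs in pixels of the network input, one group per output
# (13x13, 26x26, 52x52)
anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]
# Network input width and height
input_w, input_h = 416, 416
# Score threshold
class_threshold = 0.75
# Read the image and predict
photo_file = 'zebra.jpg'
make_predict(photo_file)
```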
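Each inner list of `anchors` holds three (width, height) pairs in pixels of the 416×416 input. The first group goes with the 13×13 output (stride 32, large objects) and the last with the 52×52 output (stride 8, small objects). A small sketch that pairs them up and expresses each anchor in grid cells of its own scale (the helper name `anchors_in_cells` is illustrative):

```python
def anchors_in_cells(anchors, strides=(32, 16, 8)):
    """Group flat anchor lists into (w, h) pairs and convert pixels to grid cells of each scale."""
    result = []
    for flat, stride in zip(anchors, strides):
        pairs = list(zip(flat[0::2], flat[1::2]))
        result.append([(w / stride, h / stride) for w, h in pairs])
    return result


anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]
for cells in anchors_in_cells(anchors):
    print([(round(w, 2), round(h, 2)) for w, h in cells])
```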

The result is as follows:

References:

YOLOv3 paper (YOLOv3: An Incremental Improvement)

How to Perform Object Detection With YOLOv3 in Keras

How to implement a YOLO (v3) object detector from scratch in PyTorch

YOLOv3 network structure and analysis
