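Object Detection with YOLOv3 (PyTorch Implementation)

1. Purpose of this article

There are already PyTorch implementations of YOLOv3 on GitHub, but using code I did not write myself never feels quite right. So, mainly as a learning exercise, this article builds the YOLOv3 network from scratch and then runs prediction with the officially provided pretrained weights, which helps in understanding the YOLOv3 model.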

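2. The object detection task

Object detection is a computer vision task that involves identifying the presence, location, and type of one or more objects in a given photo. It is a challenging problem that builds on object recognition (where the objects are), object localization (what their extent is), and object classification (what they are). For the photo below, for example, the task is to identify what is in the photo and where, and to mark each object with a bounding box.

Three zebras (Taken by Boegh)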

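3. The YOLOv3 model

There are plenty of introductions to the YOLOv3 model online, so I will not repeat them here. (The paper names the backbone Darknet-53, after the author's Darknet framework; this article uses "DarkNet" loosely for the whole network.) The network structure is shown in the figure below:

One more point: once the model is built, we will predict with pretrained weights, so download the pretrained weight file first (the download can be slow or unreliable on some networks, in which case a download manager such as Xunlei helps):

DarkNet weights pretrained on the MS COCO dataset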

4. Building the model

Import the required libraries:

import numpy as np
import torch
import torch.nn as nn
import torchvision
from PIL import Image
import matplotlib.pyplot as plt

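Define the layers of the YOLOv3 (DarkNet) network:

Each DarkNet layer consists of a convolution, a batch norm (BN) layer, and an activation function. If a DarkNet layer has a BN layer, its convolution has only weights and no bias.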
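
The bias can be dropped because BN subtracts the per-channel mean, so any constant added by the convolution cancels out. A minimal pure-Python sketch of this (the helper name normalize and the sample values are illustrative and not part of the model code; BN's learnable scale and shift are left out):

```python
import math

def normalize(values, eps=1e-3):
    """Normalize the activations of one channel the way BN does (no scale/shift)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

conv_out = [0.5, -1.0, 2.0, 0.25]          # hypothetical conv outputs for one channel
with_bias = [v + 3.0 for v in conv_out]    # the same outputs with a constant bias added

a = normalize(conv_out)
b = normalize(with_bias)
# The two normalized results agree up to floating-point rounding
max_diff = max(abs(x - y) for x, y in zip(a, b))
print(max_diff < 1e-9)
```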

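The DarkNet network emits predictions at layers 82, 94, and 106, i.e. three outputs at different strides. These three output layers have neither a BN layer nor an activation function.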
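
The shapes of the three outputs follow from simple arithmetic. The sketch below assumes a 416x416 input, strides of 32, 16, and 8 for layers 82, 94, and 106, 3 anchors per grid cell, and the 80 COCO classes; the function name is illustrative.

```python
def yolo_output_shape(input_size, stride, num_anchors=3, num_classes=80):
    """Return (channels, grid_h, grid_w) for one YOLOv3 output."""
    grid = input_size // stride
    # Each anchor predicts 4 box offsets, 1 objectness score, and the class scores
    channels = num_anchors * (4 + 1 + num_classes)
    return channels, grid, grid

shapes = [yolo_output_shape(416, s) for s in (32, 16, 8)]
print(shapes)  # [(255, 13, 13), (255, 26, 26), (255, 52, 52)]

# Total number of candidate boxes over the three scales
total_boxes = sum(3 * g * g for _, g, _ in shapes)
print(total_boxes)  # 10647
```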

# DarkNet layer: convolution -> optional batch norm -> optional LeakyReLU
class DarknetLayer(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride, padding, bnorm=True, leaky=True):
        super().__init__()
        # The convolution only carries a bias when there is no BN layer after it
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=not bnorm)
        self.bnorm = nn.BatchNorm2d(out_channels, eps=1e-3) if bnorm else None
        self.leaky = nn.LeakyReLU(0.1) if leaky else None

    def forward(self, x):
        x = self.conv(x)
        if self.bnorm is not None:
            x = self.bnorm(x)
        if self.leaky is not None:
            x = self.leaky(x)
        return x

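Define the blocks of the YOLOv3 network:

This borrows the idea of the residual block from ResNet. A block contains a skip connection: the input passes through every layer of the block to give a temporary output, and the input and the temporary output are then added element-wise to give the block's output.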

# DarkNet block
class DarknetBlock(nn.Module):
    def __init__(self, layers, skip=True):
        super().__init__()
        self.skip = skip
        self.layers = nn.ModuleDict()
        for cfg in layers:
            self.layers[cfg['id']] = DarknetLayer(cfg['in_channels'], cfg['out_channels'], cfg['kernel_size'],
                                                  cfg['stride'], cfg['padding'], cfg['bnorm'], cfg['leaky'])

    def forward(self, x):
        count = 0
        for _, layer in self.layers.items():
            # Save the input to the second-to-last layer as the skip connection
            if count == (len(self.layers) - 2) and self.skip:
                skip_connection = x
            count += 1
            x = layer(x)
        return x + skip_connection if self.skip else x

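The code above stacks several DarkNet layers into a block. layers is a list of dictionaries, each of which specifies the parameters of one DarkNet layer, such as the number of input channels and the number of convolution kernels. skip indicates whether the block is used as a residual block.

The if statement in forward picks the tensor used for the skip connection, namely the input to the block's last two layers. If the block is a residual block that contains a stride-2 layer, this is the output of that stride-2 layer, which is added to the temporary output. Only when the block has no stride-2 layer is the block's input itself added to the temporary output to give the block's output.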
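
This rule can be checked without PyTorch. The sketch below replays the loop of forward on layer names only; skip_source is a hypothetical helper written for this illustration, and the layer ids are taken from the network definition in this article.

```python
def skip_source(layer_ids):
    """Return the name of the tensor that DarknetBlock.forward would save as the skip connection."""
    current = 'block input'
    saved = None
    for count, layer_id in enumerate(layer_ids):
        # Same condition as in forward: save the input to the second-to-last layer
        if count == len(layer_ids) - 2:
            saved = current
        current = 'output of ' + layer_id
    return saved

print(skip_source(['layer_9', 'layer_10']))                       # block input
print(skip_source(['layer_5', 'layer_6', 'layer_7']))             # output of layer_5
print(skip_source(['layer_0', 'layer_1', 'layer_2', 'layer_3']))  # output of layer_1
```

In the last two cases the saved tensor is the output of the stride-2 layer (layer_5 and layer_1), so the shapes of the two tensors being added always match.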

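Stack the blocks into the YOLOv3 network:

There are 106 layers in total, with layers 82, 94, and 106 as the outputs, so the structure is somewhat involved.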

# DarkNet network
class Yolov3(nn.Module):
    def __init__(self):
        super().__init__()
        # Darknet's upsample layer repeats pixels, so nearest-neighbor mode matches the pretrained weights
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        self.blocks = nn.ModuleDict()

        # layer0 -> layer4, input = (3, 416, 416), flow_out = (64, 208, 208)
        self.blocks['block0_4'] = DarknetBlock([
            {'id': 'layer_0', 'in_channels': 3, 'out_channels': 32, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_1', 'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_2', 'in_channels': 64, 'out_channels': 32, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_3', 'in_channels': 32, 'out_channels': 64, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer5 -> layer8, input = (64, 208, 208), flow_out = (128, 104, 104)
        self.blocks['block5_8'] = DarknetBlock([
            {'id': 'layer_5', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_6', 'in_channels': 128, 'out_channels': 64, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_7', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer9 -> layer11, input = (128, 104, 104), flow_out = (128, 104, 104)
        self.blocks['block9_11'] = DarknetBlock([
            {'id': 'layer_9', 'in_channels': 128, 'out_channels': 64, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_10', 'in_channels': 64, 'out_channels': 128, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer12 -> layer15, input = (128, 104, 104), flow_out = (256, 52, 52)
        self.blocks['block12_15'] = DarknetBlock([
            {'id': 'layer_12', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_13', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_14', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer16 -> layer36, input = (256, 52, 52), flow_out = (256, 52, 52)
        self.blocks['block16_18'] = DarknetBlock([
            {'id': 'layer_16', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_17', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block19_21'] = DarknetBlock([
            {'id': 'layer_19', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_20', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block22_24'] = DarknetBlock([
            {'id': 'layer_22', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_23', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block25_27'] = DarknetBlock([
            {'id': 'layer_25', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_26', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block28_30'] = DarknetBlock([
            {'id': 'layer_28', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_29', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block31_33'] = DarknetBlock([
            {'id': 'layer_31', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_32', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block34_36'] = DarknetBlock([
            {'id': 'layer_34', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_35', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer37 -> layer40, input = (256, 52, 52), flow_out = (512, 26, 26)
        self.blocks['block37_40'] = DarknetBlock([
            {'id': 'layer_37', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_38', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_39', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer41 -> layer61, input = (512, 26, 26), flow_out = (512, 26, 26)
        self.blocks['block41_43'] = DarknetBlock([
            {'id': 'layer_41', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_42', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block44_46'] = DarknetBlock([
            {'id': 'layer_44', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_45', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block47_49'] = DarknetBlock([
            {'id': 'layer_47', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_48', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block50_52'] = DarknetBlock([
            {'id': 'layer_50', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_51', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block53_55'] = DarknetBlock([
            {'id': 'layer_53', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_54', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block56_58'] = DarknetBlock([
            {'id': 'layer_56', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_57', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block59_61'] = DarknetBlock([
            {'id': 'layer_59', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_60', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer62 -> layer65, input = (512, 26, 26), flow_out = (1024, 13, 13)
        self.blocks['block62_65'] = DarknetBlock([
            {'id': 'layer_62', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 2, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_63', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_64', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer66 -> layer74, input = (1024, 13, 13), flow_out = (1024, 13, 13)
        self.blocks['block66_68'] = DarknetBlock([
            {'id': 'layer_66', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_67', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block69_71'] = DarknetBlock([
            {'id': 'layer_69', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_70', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])
        self.blocks['block72_74'] = DarknetBlock([
            {'id': 'layer_72', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_73', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True}])

        # layer75 -> layer79, input = (1024, 13, 13), flow_out = (512, 13, 13)
        self.blocks['block75_79'] = DarknetBlock([
            {'id': 'layer_75', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_76', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_77', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_78', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_79', 'in_channels': 1024, 'out_channels': 512, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}], skip=False)

        # layer80 -> layer82, input = (512, 13, 13), yolo_out = (255, 13, 13)
        self.blocks['yolo_82'] = DarknetBlock([
            {'id': 'layer_80', 'in_channels': 512, 'out_channels': 1024, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_81', 'in_channels': 1024, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}], skip=False)

        # layer83 -> layer86, input = (512, 13, 13) -> (256, 13, 13) -> upsample and concatenate with layer61 (512, 26, 26), flow_out = (768, 26, 26)
        self.blocks['block83_86'] = DarknetBlock([
            {'id': 'layer_84', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}], skip=False)

        # layer87 -> layer91, input = (768, 26, 26), flow_out = (256, 26, 26)
        self.blocks['block87_91'] = DarknetBlock([
            {'id': 'layer_87', 'in_channels': 768, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_88', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_89', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_90', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_91', 'in_channels': 512, 'out_channels': 256, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}], skip=False)

        # layer92 -> layer94, input = (256, 26, 26), yolo_out = (255, 26, 26)
        self.blocks['yolo_94'] = DarknetBlock([
            {'id': 'layer_92', 'in_channels': 256, 'out_channels': 512, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_93', 'in_channels': 512, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}], skip=False)

        # layer95 -> layer98, input = (256, 26, 26) -> (128, 26, 26) -> upsample and concatenate with layer36 (256, 52, 52), flow_out = (384, 52, 52)
        self.blocks['block95_98'] = DarknetBlock([
            {'id': 'layer_96', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True}], skip=False)

        # layer99 -> layer106, input = (384, 52, 52), yolo_out = (255, 52, 52)
        self.blocks['yolo_106'] = DarknetBlock([
            {'id': 'layer_99', 'in_channels': 384, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_100', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_101', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_102', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_103', 'in_channels': 256, 'out_channels': 128, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': True, 'leaky': True},
            {'id': 'layer_104', 'in_channels': 128, 'out_channels': 256, 'kernel_size': 3, 'stride': 1, 'padding': 1, 'bnorm': True, 'leaky': True},
            {'id': 'layer_105', 'in_channels': 256, 'out_channels': 255, 'kernel_size': 1, 'stride': 1, 'padding': 0, 'bnorm': False, 'leaky': False}], skip=False)

    def forward(self, x):
        x = self.blocks['block0_4'](x)
        x = self.blocks['block5_8'](x)
        x = self.blocks['block9_11'](x)
        x = self.blocks['block12_15'](x)
        x = self.blocks['block16_18'](x)
        x = self.blocks['block19_21'](x)
        x = self.blocks['block22_24'](x)
        x = self.blocks['block25_27'](x)
        x = self.blocks['block28_30'](x)
        x = self.blocks['block31_33'](x)
        x = self.blocks['block34_36'](x)
        skip36 = x
        x = self.blocks['block37_40'](x)
        x = self.blocks['block41_43'](x)
        x = self.blocks['block44_46'](x)
        x = self.blocks['block47_49'](x)
        x = self.blocks['block50_52'](x)
        x = self.blocks['block53_55'](x)
        x = self.blocks['block56_58'](x)
        x = self.blocks['block59_61'](x)
        skip61 = x
        x = self.blocks['block62_65'](x)
        x = self.blocks['block66_68'](x)
        x = self.blocks['block69_71'](x)
        x = self.blocks['block72_74'](x)
        x = self.blocks['block75_79'](x)
        # First output, stride 32
        yolo_82 = self.blocks['yolo_82'](x)
        x = self.blocks['block83_86'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip61), dim=1)
        x = self.blocks['block87_91'](x)
        # Second output, stride 16
        yolo_94 = self.blocks['yolo_94'](x)
        x = self.blocks['block95_98'](x)
        x = self.upsample(x)
        x = torch.cat((x, skip36), dim=1)
        # Third output, stride 8
        yolo_106 = self.blocks['yolo_106'](x)
        return yolo_82, yolo_94, yolo_106

Instantiate the model

model = Yolov3()

At this point you can print the model structure with print.

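Loading the pretrained weights

By now the weight file should have finished downloading. We can load its parameters into our model with a weight reader class: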

# Weight reader
class WeightReader():
    def __init__(self, weight_file):
        # The weight file is binary, so it must be opened in 'rb' mode
        with open(weight_file, 'rb') as fp:
            # The first 5 int32 values are the header
            header = np.fromfile(fp, dtype=np.int32, count=5)
            self.header = torch.from_numpy(header)
            self.seen = self.header[3]
            # The rest of the values are the weights; load them all
            self.weights = np.fromfile(fp, dtype=np.float32)

    # Load the weights into the model
    def load_weights(self, model):
        ptr = 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    # Number of values per BN parameter (one per channel)
                    num_bn_biases = bn.bias.numel()
                    # BN bias
                    bn_biases = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # BN weight
                    bn_weights = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # BN running mean
                    bn_running_mean = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # BN running variance
                    bn_running_var = torch.from_numpy(self.weights[ptr: ptr + num_bn_biases])
                    ptr += num_bn_biases
                    # Cast the loaded values into the shapes of the model parameters
                    bn_biases = bn_biases.view_as(bn.bias.data)
                    bn_weights = bn_weights.view_as(bn.weight.data)
                    bn_running_mean = bn_running_mean.view_as(bn.running_mean)
                    bn_running_var = bn_running_var.view_as(bn.running_var)
                    # Copy the data into the model
                    bn.bias.data.copy_(bn_biases)
                    bn.weight.data.copy_(bn_weights)
                    bn.running_mean.copy_(bn_running_mean)
                    bn.running_var.copy_(bn_running_var)
                else:
                    # No BN layer: the convolution has its own bias
                    num_biases = conv.bias.numel()
                    conv_biases = torch.from_numpy(self.weights[ptr: ptr + num_biases])
                    ptr = ptr + num_biases
                    # Reshape the loaded values according to the shape of the model parameter
                    conv_biases = conv_biases.view_as(conv.bias.data)
                    conv.bias.data.copy_(conv_biases)
                # Load the weights of the convolution in the same way
                num_weights = conv.weight.numel()
                conv_weights = torch.from_numpy(self.weights[ptr: ptr + num_weights])
                ptr = ptr + num_weights
                conv_weights = conv_weights.view_as(conv.weight.data)
                conv.weight.data.copy_(conv_weights)

    # Print a summary of the network parameters
    def weight_summary(self, model):
        train_able, train_disable = 0, 0
        for _, block in model.blocks.items():
            for _, layer in block.layers.items():
                bn = layer.bnorm
                conv = layer.conv
                if bn is not None:
                    train_able += (bn.bias.numel() + bn.weight.numel())
                    train_disable += (bn.running_mean.numel() + bn.running_var.numel())
                else:
                    train_able += conv.bias.numel()
                train_able += conv.weight.numel()
        print("total = %d" % (train_able + train_disable))
        print("count of train_able = %d" % train_able)
        print("count of train_disable = %d" % train_disable)

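The first 5 values of the official pretrained weight file form a header; only what remains can be loaded into the model. Pay attention to the format in which the parameters are stored in the weight file. Here is an official diagram:

The parameter values are stored in the order of the layers in the forward pass. If a DarkNet layer has a BN layer, the file stores, in order, the BN bias, the BN weight, the BN mean, the BN variance, and then the convolution weights. If a DarkNet layer has no BN layer, the file stores the convolution bias followed by the convolution weights.

For a BN layer, the bias and the weight are trainable parameters, while the mean and the variance are not trainable, but all of them have to be loaded into the network.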
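
To make the layout concrete, here is a small sketch that builds a fake weight buffer in memory and reads it back in the order described above. It uses only the standard library; the toy layer size, the placeholder header contents, and the helper name read_floats are assumptions made for the illustration, while real files are read with np.fromfile as in WeightReader.

```python
import struct

# Toy DarkNet layer with BN: 2 output channels, 1 input channel, 1x1 kernel
out_ch, in_ch, k = 2, 1, 1

header = struct.pack('<5i', 0, 0, 0, 0, 0)   # 5 int32 header values (placeholder contents)
bn_bias = [0.5, 0.25]
bn_weight = [1.0, 1.5]
bn_mean = [0.0, 0.5]
bn_var = [1.0, 2.0]
conv_w = [0.25, -0.75]                       # out_ch * in_ch * k * k values
payload = bn_bias + bn_weight + bn_mean + bn_var + conv_w
buffer = header + struct.pack('<%df' % len(payload), *payload)

def read_floats(buf, offset, count):
    """Read `count` float32 values starting at byte `offset`; return them and the new offset."""
    values = struct.unpack_from('<%df' % count, buf, offset)
    return list(values), offset + 4 * count

offset = 20                                  # skip the 5 * 4 header bytes
parsed = {}
for name, count in [('bn_bias', out_ch), ('bn_weight', out_ch), ('bn_mean', out_ch),
                    ('bn_var', out_ch), ('conv_weight', out_ch * in_ch * k * k)]:
    parsed[name], offset = read_floats(buffer, offset, count)

print(parsed['conv_weight'])   # [0.25, -0.75]
print(offset == len(buffer))   # True
```

A layer with BN therefore consumes 4 * out_channels values plus the convolution weights, and a layer without BN consumes out_channels bias values plus the convolution weights.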

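The following code loads the parameters and prints the parameter counts.

# Load the model parameters and print the parameter counts
# Total number of network parameters: 62,001,757
# Trainable (weights and biases of the BN and convolution layers) = 61,949,149; non-trainable (BN mean and variance) = 52,608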
weight_reader = WeightReader('yolov3.weights')
weight_reader.load_weights(model)
weight_reader.weight_summary(model)

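Input processing

Define an image loading function that resizes the input image to the network input size (416), divides every pixel value by 255, and converts the result to a four-dimensional tensor. It returns the image tensor together with the original width and height of the image.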

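# Load an image
def img_loader(photo_file, input_w, input_h):
    # Force 3 channels so that grayscale or RGBA files also match the network input
    img = Image.open(photo_file).convert('RGB')
    img_w, img_h = img.size
    img = img.resize((input_w, input_h))
    # ToTensor scales the pixel values to [0, 1] and returns a (C, H, W) tensor
    img = torchvision.transforms.ToTensor()(img)
    # Add the batch dimension
    img = torch.unsqueeze(img, 0)
    # Return the resized image tensor and the original width and height of the image
    return img, img_w, img_h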

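Next, the model can compute its output for the input image. Switch to evaluation mode first, so that the BN layers use the loaded mean and variance instead of batch statistics, and disable gradient tracking since we only predict.

photo_file = 'zebra.jpg'
input_w, input_h = 416, 416
img, img_w, img_h = img_loader(photo_file, input_w, input_h)
model.eval()
with torch.no_grad():
    y_hat = model(img)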

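y_hat is now a tuple of three elements, each of which is a four-dimensional tensor. What remains is to decode these tensors, filter by IoU, apply non-maximum suppression, draw the bounding boxes, and so on. All the functions involved are listed together here.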
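
The decoding step for a single prediction can be sketched in plain Python. The formulas mirror those in decode_netout: a sigmoid on the center offsets, an exponential on the size, and normalization by the grid size and the network input size. The function name decode_cell and the sample numbers are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_cell(tx, ty, tw, th, col, row, grid_w, grid_h, anchor_w, anchor_h, net_w, net_h):
    """Turn the raw outputs of one anchor in one grid cell into a normalized (xmin, ymin, xmax, ymax)."""
    x = (col + sigmoid(tx)) / grid_w          # box center, as a fraction of the image width
    y = (row + sigmoid(ty)) / grid_h          # box center, as a fraction of the image height
    w = anchor_w * math.exp(tw) / net_w       # box width, as a fraction of the image width
    h = anchor_h * math.exp(th) / net_h       # box height, as a fraction of the image height
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

# Raw zeros put the center in the middle of the cell and keep the anchor size
box = decode_cell(0.0, 0.0, 0.0, 0.0, col=6, row=6, grid_w=13, grid_h=13,
                  anchor_w=116, anchor_h=90, net_w=416, net_h=416)
print([round(v, 4) for v in box])
```

The returned coordinates are fractions of the image size, which is why correct_yolo_boxes later multiplies them by the original image width and height.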

# Bounding box class
class BoundBox:
    def __init__(self, xmin, ymin, xmax, ymax, objness=None, classes=None):
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.objness = objness
        self.classes = classes
        self.label = -1
        self.score = -1

    def get_label(self):
        if self.label == -1:
            self.label = np.argmax(self.classes)
        return self.label

    def get_score(self):
        if self.score == -1:
            self.score = self.classes[self.get_label()]
        return self.score

def _sigmoid(x):
    return 1. / (1. + np.exp(-x))

# Decode the network output
def decode_netout(netout, anchors, obj_thresh, net_w, net_h):
    grid_h, grid_w = netout.shape[1:]
    nb_box = 3
    netout = netout.permute(1, 2, 0).detach().numpy().reshape((grid_h, grid_w, nb_box, -1))
    nb_class = netout.shape[-1] - 5
    boxes = []
    netout[..., :2] = _sigmoid(netout[..., :2])
    netout[..., 4:] = _sigmoid(netout[..., 4:])
    netout[..., 5:] = netout[..., 4][..., np.newaxis] * netout[..., 5:]
    netout[..., 5:] *= netout[..., 5:] > obj_thresh
    for i in range(grid_h * grid_w):
        # Integer division keeps row usable as an index
        row = i // grid_w
        col = i % grid_w
        for b in range(nb_box):
            # 4th element is the objectness score
            objectness = netout[row][col][b][4]
            # Skip boxes whose objectness does not exceed the threshold
            if objectness <= obj_thresh:
                continue
            # first 4 elements are x, y, w, and h
            x, y, w, h = netout[row][col][b][:4]
            x = (col + x) / grid_w  # center position, unit: image width
            y = (row + y) / grid_h  # center position, unit: image height
            w = anchors[2 * b + 0] * np.exp(w) / net_w  # unit: image width
            h = anchors[2 * b + 1] * np.exp(h) / net_h  # unit: image height
            # last elements are class probabilities
            classes = netout[row][col][b][5:]
            box = BoundBox(x - w / 2, y - h / 2, x + w / 2, y + h / 2, objectness, classes)
            boxes.append(box)
    return boxes

# Convert the bounding box coordinates. Takes the list of boxes, the original shape
# of the loaded photo, and the shape of the network input as arguments.
# The box coordinates are updated in place.
def correct_yolo_boxes(boxes, image_w, image_h, net_w, net_h):
    new_w, new_h = net_w, net_h
    for i in range(len(boxes)):
        x_offset, x_scale = (net_w - new_w) / 2. / net_w, float(new_w) / net_w
        y_offset, y_scale = (net_h - new_h) / 2. / net_h, float(new_h) / net_h
        boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
        boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
        boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
        boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

# Helper for the IoU computation: overlap length of two 1-D intervals
def _interval_overlap(interval_a, interval_b):
    x1, x2 = interval_a
    x3, x4 = interval_b
    if x3 < x1:
        if x4 < x1:
            return 0
        else:
            return min(x2, x4) - x1
    else:
        if x2 < x3:
            return 0
        else:
            return min(x2, x4) - x3

# Compute the IoU of two boxes
def bbox_iou(box1, box2):
    intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
    intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
    intersect = intersect_w * intersect_h
    w1, h1 = box1.xmax - box1.xmin, box1.ymax - box1.ymin
    w2, h2 = box2.xmax - box2.xmin, box2.ymax - box2.ymin
    union = w1 * h1 + w2 * h2 - intersect
    # Degenerate boxes (zero area after rounding) would otherwise divide by zero
    if union <= 0:
        return 0.0
    return float(intersect) / union

# Non-maximum suppression
def do_nms(boxes, nms_thresh):
    if len(boxes) > 0:
        nb_class = len(boxes[0].classes)
    else:
        return
    for c in range(nb_class):
        sorted_indices = np.argsort([-box.classes[c] for box in boxes])
        for i in range(len(sorted_indices)):
            index_i = sorted_indices[i]
            if boxes[index_i].classes[c] == 0:
                continue
            for j in range(i + 1, len(sorted_indices)):
                index_j = sorted_indices[j]
                if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
                    boxes[index_j].classes[c] = 0

# Retrieve the boxes that strongly predict the presence of an object: their confidence exceeds thresh
def get_boxes(boxes, labels, thresh):
    v_boxes, v_labels, v_scores = list(), list(), list()
    # enumerate all boxes
    for box in boxes:
        # enumerate all possible labels
        for i in range(len(labels)):
            # check if the threshold for this label is high enough
            if box.classes[i] > thresh:
                v_boxes.append(box)
                v_labels.append(labels[i])
                v_scores.append(box.classes[i] * 100)
                # don't break, many labels may trigger for one box
    return v_boxes, v_labels, v_scores

# Draw the bounding boxes
def draw_boxes(photo_file, v_boxes, v_labels, v_scores):
    # load the image
    data = plt.imread(photo_file)
    # plot the image
    plt.imshow(data)
    # get the context for drawing boxes
    ax = plt.gca()
    # plot each box
    for i in range(len(v_boxes)):
        box = v_boxes[i]
        # get coordinates
        y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
        # calculate width and height of the box
        width, height = x2 - x1, y2 - y1
        # create the shape
        rect = plt.Rectangle((x1, y1), width, height, fill=False, color='white')
        # draw the box
        ax.add_patch(rect)
        # draw text and score in top left corner
        label = "%s (%.1f)" % (v_labels[i], v_scores[i])
        plt.text(x1, y1, label, color='white', bbox=dict(facecolor='red'))
    # show the plot
    plt.show()
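
As a quick sanity check of the IoU and NMS logic above, the following standalone sketch works on plain (xmin, ymin, xmax, ymax, score) tuples instead of BoundBox objects. It is a simplified single-class variant written for illustration, not a drop-in replacement for do_nms; the function names and the sample boxes are made up.

```python
def iou(a, b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax, ...)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, thresh):
    """Keep the highest-scoring boxes, dropping any box that overlaps a kept one by >= thresh."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept

boxes = [
    (0, 0, 10, 10, 0.9),
    (1, 1, 11, 11, 0.8),    # overlaps the first box heavily
    (20, 20, 30, 30, 0.7),  # far away from both
]
print(round(iou(boxes[0], boxes[1]), 3))  # 0.681
print(nms(boxes, 0.5))
```

With a threshold of 0.5, the second box is suppressed by the first, and the third box survives because it overlaps nothing.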

Write a function that wraps the steps above.

def make_predict(photo_file):
    img, img_w, img_h = img_loader(photo_file, input_w, input_h)
    # Inference only: use the loaded BN statistics and skip gradient tracking
    model.eval()
    with torch.no_grad():
        y_hat = model(img)
    boxes = []
    for i in range(len(y_hat)):
        # decode the output of the network
        boxes += decode_netout(y_hat[i][0], anchors[i], class_threshold, input_w, input_h)
    # correct the sizes of the bounding boxes for the shape of the image
    correct_yolo_boxes(boxes, img_w, img_h, input_w, input_h)
    # suppress non-maximal boxes
    do_nms(boxes, 0.5)
    # get the details of the detected objects
    v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
    # summarize what we found
    for i in range(len(v_boxes)):
        print(v_labels[i], v_scores[i])
    # draw what we found
    draw_boxes(photo_file, v_boxes, v_labels, v_scores)

In addition, the class indices output by the network have to be mapped to human-readable names. The labels that the weight file can predict are:

# Labels that the weight file can predict
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
          "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
          "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
          "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
          "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
          "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
          "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
          "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
          "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
          "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]

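Finally, run the function we wrapped up.

# Predefined anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# Width and height of the network input
input_w, input_h = 416, 416
# Confidence threshold
class_threshold = 0.75
# Read the image and start predicting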
photo_file = 'zebra.jpg'
make_predict(photo_file)
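
Each entry of anchors is a flat list that decode_netout reads as (width, height) pairs via anchors[2 * b + 0] and anchors[2 * b + 1]. The sketch below only regroups the lists to make the pairing with the three outputs visible; the helper name anchor_pairs is illustrative, and the grid sizes assume the 416x416 input used here.

```python
anchors = [[116, 90, 156, 198, 373, 326], [30, 61, 62, 45, 59, 119], [10, 13, 16, 30, 33, 23]]
grid_sizes = [13, 26, 52]   # outputs yolo_82, yolo_94, yolo_106 for a 416x416 input

def anchor_pairs(flat):
    """Group a flat [w0, h0, w1, h1, ...] list into (w, h) tuples."""
    return [(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]

paired = {grid: anchor_pairs(flat) for grid, flat in zip(grid_sizes, anchors)}
for grid, pairs in paired.items():
    print(grid, pairs)
# The coarsest grid (13x13) gets the largest anchors, the finest (52x52) the smallest
```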

The result is shown below:

References:

YOLOv3 paper

How to Perform Object Detection With YOLOv3 in Keras

How to implement a YOLO (v3) object detector from scratch in PyTorch

YOLOv3 network structure and analysis