YOLOv5 Code and Principle Walkthrough


Table of Contents

  • YOLOv5 Code and Principle Walkthrough
  • I. Code and principle walkthrough
    • 1. Input stage
      • (1) letterbox
      • (2) Mosaic augmentation
      • (3) Anchors
        • 1) With AutoAnchor disabled
        • 2) With AutoAnchor enabled
    • 2. Backbone
      • (1) Focus module
      • (2) CSP
      • (3) SPP
    • 3. Neck
    • 4. Output stage
      • (1) Output channel count
      • (2) Loss computation
        • 1) Anchor selection
        • 2) IoU loss
        • 3) BCELoss
      • (3) NMS (non-maximum suppression)
    • 5. Evaluation metrics
  • II. YOLOv5 models of different complexity
    • 1. Parameters of each model
      • (1) yolov5s
      • (2) yolov5m
      • (3) yolov5l
      • (4) yolov5x
    • 2. Effect of the two parameters
      • (1) depth_multiple
      • (2) width_multiple
  • Main references (articles and videos)

I. Code and principle walkthrough

1. Input stage

(1) letterbox

This module rescales raw images of arbitrary size to a standard input size before they are fed into the detection network.

def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

The steps are (a worked numeric example follows this list):
1) Compute new_shape[0] / shape[0] and new_shape[1] / shape[1] and take the smaller of the two as the scale ratio r.
2) Resize the original image to (int(round(shape[1] * r)), int(round(shape[0] * r))), i.e. new_unpad.
3) Compute dw = new_shape[1] - new_unpad[0] and dh = new_shape[0] - new_unpad[1], then (with auto=True) reduce each modulo stride (default 32), so only the minimal padding needed to reach a stride multiple remains.
4) Pad the resized image with dw and dh pixels of border, split evenly between the two sides and filled with the constant color 114.
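As a worked example, here is a minimal sketch of the arithmetic above for a hypothetical 720×1280 input; the image size and the resulting numbers are ours, chosen only to trace the steps:

import numpy as np

shape = (720, 1280)                 # original (height, width) of a hypothetical input
new_shape, stride = (640, 640), 32
r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])           # min(0.89, 0.50) = 0.50
new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))      # (640, 360) as (width, height)
dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]   # 0, 280
dw, dh = np.mod(dw, stride), np.mod(dh, stride)                     # 0, 24: minimal padding to a stride multiple
print(new_unpad, dw / 2, dh / 2)    # (640, 360) 0.0 12.0 -> pad 12 px on top and on bottom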

(2) Mosaic augmentation

First, a look at the load_image helper:

def load_image(self, i):
    # Loads 1 image from dataset index 'i', returns (im, original hw, resized hw)
    im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i],
    if im is None:  # not cached in RAM
        if fn.exists():  # load npy
            im = np.load(fn)
        else:  # read image
            im = cv2.imread(f)  # BGR
            assert im is not None, f'Image Not Found {f}'
        h0, w0 = im.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # ratio
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im,
                            (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_LINEAR if (self.augment or r > 1) else cv2.INTER_AREA)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
    else:
        return self.ims[i], self.im_hw0[i], self.im_hw[i]  # im, hw_original, hw_resized

load_image is similar to letterbox but without the padding step: it resizes the image so that its longer side equals img_size and returns the image together with its shapes before and after resizing.

Next, load_mosaic, which stitches four images into one mosaic:

def load_mosaic(self, index):
    # YOLOv5 4-mosaic loader. Loads 1 image + 3 random images into a 4-image mosaic
    labels4, segments4 = [], []
    s = self.img_size
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    random.shuffle(indices)
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = self.load_image(index)

        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        padw = x1a - x1b
        padh = y1a - y1b

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
        labels4.append(labels)
        segments4.extend(segments)

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # img4, labels4 = replicate(img4, labels4)  # replicate

    # Augment
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

    return img4, labels4

The steps are as follows:
1) Choose the stitching point (yc, xc) where the four images meet; each coordinate is a random point in (img_size//2, 3*img_size//2), as sketched below this list.
2) Pick the image with index index plus three random indices, resize each via load_image, and record the resized h and w.
3) Create a blank (2*img_size, 2*img_size) canvas and copy each resized image into its quadrant of the canvas.
Note: because (yc, xc) is random, the final mosaic may contain large blank regions.
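A minimal sketch of step 1, assuming mosaic_border = [-img_size // 2, -img_size // 2] as set in the YOLOv5 dataloader:

import random

s = 640                           # self.img_size
mosaic_border = [-s // 2, -s // 2]
# uniform(-x, 2*s + x) with x = -s//2 gives uniform(s/2, 3*s/2), i.e. (320, 960) for s = 640
yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in mosaic_border)
print(yc, xc)                     # e.g. 512 871: the seam point inside the 1280x1280 canvas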

As the code above shows, load_mosaic also calls random_perspective, which bundles several further augmentations: degrees (rotation), translate (horizontal and vertical shifts), scale (zoom), shear, and perspective (perspective transform).
The corresponding code is shown below:

def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2

    # Center
    C = np.eye(3)
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)

    # Perspective
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))

    # Visualize
    # import matplotlib.pyplot as plt
    # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()
    # ax[0].imshow(im[:, :, ::-1])  # base
    # ax[1].imshow(im2[:, :, ::-1])  # warped

    # Transform label coordinates
    n = len(targets)
    if n:
        use_segments = any(x.any() for x in segments)
        new = np.zeros((n, 4))
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine

                # clip
                new[i] = segment2box(xy, width, height)

        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return im, targets

Besides the four-image mosaic above, the code also provides a nine-image version, load_mosaic9:

def load_mosaic9(self, index):
    # YOLOv5 9-mosaic loader. Loads 1 image + 8 random images into a 9-image mosaic
    labels9, segments9 = [], []
    s = self.img_size
    indices = [index] + random.choices(self.indices, k=8)  # 8 additional image indices
    random.shuffle(indices)
    hp, wp = -1, -1  # height, width previous
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = self.load_image(index)

        # place img in img9
        if i == 0:  # center
            img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            h0, w0 = h, w
            c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
        elif i == 1:  # top
            c = s, s - h, s + w, s
        elif i == 2:  # top right
            c = s + wp, s - h, s + wp + w, s
        elif i == 3:  # right
            c = s + w0, s, s + w0 + w, s + h
        elif i == 4:  # bottom right
            c = s + w0, s + hp, s + w0 + w, s + hp + h
        elif i == 5:  # bottom
            c = s + w0 - w, s + h0, s + w0, s + h0 + h
        elif i == 6:  # bottom left
            c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
        elif i == 7:  # left
            c = s - w, s + h0 - h, s, s + h0
        elif i == 8:  # top left
            c = s - w, s + h0 - hp - h, s, s + h0 - hp

        padx, pady = c[:2]
        x1, y1, x2, y2 = (max(x, 0) for x in c)  # allocate coords

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padx, pady)  # normalized xywh to pixel xyxy format
            segments = [xyn2xy(x, w, h, padx, pady) for x in segments]
        labels9.append(labels)
        segments9.extend(segments)

        # Image
        img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]  # img9[ymin:ymax, xmin:xmax]
        hp, wp = h, w  # height, width previous

    # Offset
    yc, xc = (int(random.uniform(0, s)) for _ in self.mosaic_border)  # mosaic center x, y
    img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]

    # Concat/clip labels
    labels9 = np.concatenate(labels9, 0)
    labels9[:, [1, 3]] -= xc
    labels9[:, [2, 4]] -= yc
    c = np.array([xc, yc])  # centers
    segments9 = [x - c for x in segments9]

    for x in (labels9[:, 1:], *segments9):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # img9, labels9 = replicate(img9, labels9)  # replicate

    # Augment
    img9, labels9 = random_perspective(img9, labels9, segments9,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

    return img9, labels9

(3) Anchors

YOLOv5 ships with an AutoAnchor feature, controlled by the following disable flag:

parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')

Note that --noautoanchor is a store_true flag: AutoAnchor runs by default, and passing the flag turns it off.
Below is how YOLOv5 trains its anchors in each of the two cases.

1) With AutoAnchor disabled

The yaml parameter files for the YOLOv5 model family live in \yolov5-master\models\. Taking yolov5s as an example, its anchor section reads:

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

There are three rows with three anchors each; each row corresponds to one feature-map scale, so in total there are three anchors on each of three feature maps. Each pair of numbers is an anchor's w and h in pixels.
Large feature maps get the small anchors; small feature maps get the large anchors.
Example: with the default 640 input, the three feature maps are:
3 downsamplings: 640/8 = 80; 4 downsamplings: 640/16 = 40; 5 downsamplings: 640/32 = 20.
An anchor is described by:
position: every grid cell of the feature map;
size: taken from the preset anchors in the yaml.
For the 80×80 feature map, the initial anchor sizes are the first row divided by 8;
for the 40×40 feature map, the second row divided by 16;
for the 20×20 feature map, the third row divided by 32.
The difference between the final anchor-based predictions and the ground truth then flows into the loss, so the predictions around these anchors are refined through the network weights during training.
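A small sketch of this pixel-to-grid conversion (the same division by stride that the model applies when registering anchors); the values are the yolov5s defaults listed above:

import torch

# Anchors from yolov5s.yaml, one row per detection layer (P3/8, P4/16, P5/32)
anchors = torch.tensor([[10, 13, 16, 30, 33, 23],
                        [30, 61, 62, 45, 59, 119],
                        [116, 90, 156, 198, 373, 326]], dtype=torch.float32).view(3, 3, 2)
stride = torch.tensor([8., 16., 32.]).view(-1, 1, 1)
print(anchors / stride)  # grid-unit anchors, e.g. the first P3 anchor (10, 13) -> (1.25, 1.625)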

2) With AutoAnchor enabled

The AutoAnchor code lives in \yolov5-master\utils\autoanchor.py and consists of three functions:

def check_anchor_order(m):
def check_anchors(dataset, model, thr=4.0, imgsz=640):
def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):

The second and third functions do the real work. check_anchors measures how well the current anchors fit the ground-truth boxes (the Best Possible Recall, BPR); if BPR is above 0.98, training proceeds with the existing anchors, otherwise kmean_anchors is called to generate new ones.

def check_anchors(dataset, model, thr=4.0, imgsz=640):
    # Check anchor fit to data, recompute if necessary
    m = model.module.model[-1] if hasattr(model, 'module') else model.model[-1]  # Detect()
    shapes = imgsz * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    scale = np.random.uniform(0.9, 1.1, size=(shapes.shape[0], 1))  # augment scale
    wh = torch.tensor(np.concatenate([l[:, 3:5] * s for s, l in zip(shapes * scale, dataset.labels)])).float()  # wh

    def metric(k):  # compute metric
        r = wh[:, None] / k[None]
        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric
        best = x.max(1)[0]  # best_x
        aat = (x > 1 / thr).float().sum(1).mean()  # anchors above threshold
        bpr = (best > 1 / thr).float().mean()  # best possible recall
        return bpr, aat

    anchors = m.anchors.clone() * m.stride.to(m.anchors.device).view(-1, 1, 1)  # current anchors
    bpr, aat = metric(anchors.cpu().view(-1, 2))
    s = f'\n{PREFIX}{aat:.2f} anchors/target, {bpr:.3f} Best Possible Recall (BPR). '
    if bpr > 0.98:  # threshold to recompute
        LOGGER.info(emojis(f'{s}Current anchors are a good fit to dataset ✅'))
    else:
        LOGGER.info(emojis(f'{s}Anchors are a poor fit to dataset ⚠️, attempting to improve...'))
        na = m.anchors.numel() // 2  # number of anchors
        try:
            anchors = kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
        except Exception as e:
            LOGGER.info(f'{PREFIX}ERROR: {e}')
        new_bpr = metric(anchors)[0]
        if new_bpr > bpr:  # replace anchors
            anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)
            m.anchors[:] = anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1)  # loss
            check_anchor_order(m)
            s = f'{PREFIX}Done ✅ (optional: update model *.yaml to use these anchors in the future)'
        else:
            s = f'{PREFIX}Done ⚠️ (original anchors better than new anchors, proceeding with original anchors)'
        LOGGER.info(emojis(s))
def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):
    """ Creates kmeans-evolved anchors from training dataset

        Arguments:
            dataset: path to data.yaml, or a loaded dataset
            n: number of anchors
            img_size: image size used for training
            thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0
            gen: generations to evolve anchors using genetic algorithm
            verbose: print all results

        Return:
            k: kmeans evolved anchors

        Usage:
            from utils.autoanchor import *; _ = kmean_anchors()
    """
    from scipy.cluster.vq import kmeans

    npr = np.random
    thr = 1 / thr

    def metric(k, wh):  # compute metrics
        r = wh[:, None] / k[None]
        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric
        # x = wh_iou(wh, torch.tensor(k))  # iou metric
        return x, x.max(1)[0]  # x, best_x

    def anchor_fitness(k):  # mutation fitness
        _, best = metric(torch.tensor(k, dtype=torch.float32), wh)
        return (best * (best > thr).float()).mean()  # fitness

    def print_results(k, verbose=True):
        k = k[np.argsort(k.prod(1))]  # sort small to large
        x, best = metric(k, wh0)
        bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n  # best possible recall, anch > thr
        s = f'{PREFIX}thr={thr:.2f}: {bpr:.4f} best possible recall, {aat:.2f} anchors past thr\n' \
            f'{PREFIX}n={n}, img_size={img_size}, metric_all={x.mean():.3f}/{best.mean():.3f}-mean/best, ' \
            f'past_thr={x[x > thr].mean():.3f}-mean: '
        for i, x in enumerate(k):
            s += '%i,%i, ' % (round(x[0]), round(x[1]))
        if verbose:
            LOGGER.info(s[:-2])
        return k

    if isinstance(dataset, str):  # *.yaml file
        with open(dataset, errors='ignore') as f:
            data_dict = yaml.safe_load(f)  # model dict
        from utils.datasets import LoadImagesAndLabels
        dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)

    # Get label wh
    shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)])  # wh

    # Filter
    i = (wh0 < 3.0).any(1).sum()
    if i:
        LOGGER.info(f'{PREFIX}WARNING: Extremely small objects found: {i} of {len(wh0)} labels are < 3 pixels in size')
    wh = wh0[(wh0 >= 2.0).any(1)]  # filter > 2 pixels
    # wh = wh * (npr.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1

    # Kmeans init
    try:
        LOGGER.info(f'{PREFIX}Running kmeans for {n} anchors on {len(wh)} points...')
        assert n <= len(wh)  # apply overdetermined constraint
        s = wh.std(0)  # sigmas for whitening
        k = kmeans(wh / s, n, iter=30)[0] * s  # points
        assert n == len(k)  # kmeans may return fewer points than requested if wh is insufficient or too similar
    except Exception:
        LOGGER.warning(f'{PREFIX}WARNING: switching strategies from kmeans to random init')
        k = np.sort(npr.rand(n * 2)).reshape(n, 2) * img_size  # random init
    wh, wh0 = (torch.tensor(x, dtype=torch.float32) for x in (wh, wh0))
    k = print_results(k, verbose=False)

    # Plot
    # k, d = [None] * 20, [None] * 20
    # for i in tqdm(range(1, 21)):
    #     k[i-1], d[i-1] = kmeans(wh / s, i)  # points, mean distance
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7), tight_layout=True)
    # ax = ax.ravel()
    # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7))  # plot wh
    # ax[0].hist(wh[wh[:, 0]<100, 0], 400)
    # ax[1].hist(wh[wh[:, 1]<100, 1], 400)
    # fig.savefig('wh.png', dpi=200)

    # Evolve
    f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, generations, mutation prob, sigma
    pbar = tqdm(range(gen), desc=f'{PREFIX}Evolving anchors with Genetic Algorithm:')  # progress bar
    for _ in pbar:
        v = np.ones(sh)
        while (v == 1).all():  # mutate until a change occurs (prevent duplicates)
            v = ((npr.random(sh) < mp) * random.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)
        kg = (k.copy() * v).clip(min=2.0)
        fg = anchor_fitness(kg)
        if fg > f:
            f, k = fg, kg.copy()
            pbar.desc = f'{PREFIX}Evolving anchors with Genetic Algorithm: fitness = {f:.4f}'
            if verbose:
                print_results(k, verbose)

    return print_results(k)

Overall procedure:
1) Compute the BPR metric and check whether it exceeds 0.98; if so, skip anchor adaptation, otherwise go to step 2.
2) Run k-means clustering on the label widths and heights to obtain new initial anchor w and h values.
3) Mutate w and h with a genetic algorithm for 1000 generations, keeping any mutation that improves the fitness.
This technique existed in earlier YOLO versions as well, but YOLOv5 embeds it directly in the main code path so it is invoked automatically.
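To make the metric in step 1) concrete, here is a self-contained sketch of the BPR computation used by check_anchors, evaluated on a few made-up label sizes:

import torch

def bpr(wh, k, thr=4.0):
    # Best Possible Recall: fraction of label wh's whose best anchor passes the ratio test
    r = wh[:, None] / k[None]                      # (n_labels, n_anchors, 2) wh ratios
    x = torch.min(r, 1 / r).min(2)[0]              # worst of the w and h ratios per pair
    return (x.max(1)[0] > 1 / thr).float().mean()  # best anchor per label vs 1/thr

wh = torch.tensor([[12., 15.], [100., 60.], [400., 300.]])   # made-up label sizes
k = torch.tensor([[10., 13.], [116., 90.], [373., 326.]])    # three of the default anchors
print(bpr(wh, k))  # tensor(1.) here; AutoAnchor only re-clusters if this drops below 0.98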

Next, the overall YOLOv5 architecture:

The overall architecture consists of a Backbone and a Neck, discussed in turn below. The model building blocks are defined in \yolov5-master\models\common.py.

2. Backbone

(1) Focus module


Before the image enters the backbone, the Focus module slices it: every second pixel is sampled, much like nearest-neighbor downsampling, producing four complementary half-resolution images with no information lost. This moves width and height information into the channel dimension: the concatenated result has 12 channels instead of the original 3 RGB channels, i.e. the input channels are expanded 4x. A convolution then processes this new image, yielding a 2x-downsampled feature map obtained without any information loss.

Taking yolov5s as an example, an original 640 × 640 × 3 image entering the Focus module becomes a 320 × 320 × 12 feature map after slicing, then a 320 × 320 × 32 feature map after one convolution.

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        # self.contract = Contract(gain=2)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        # return self.conv(self.contract(x))
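The slicing step alone (without the trailing Conv) can be verified with a dummy tensor; a quick sketch:

import torch

x = torch.randn(1, 3, 640, 640)  # a dummy 640x640 RGB batch
y = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
print(y.shape)  # torch.Size([1, 12, 320, 320]): 4x channels, half resolution, nothing discarded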

(2) CSP

The Backbone uses CSP1_X blocks and the Neck uses CSP2_X blocks; their structure is shown in the architecture diagram above.
The Res unit inside them is shown in the figure below.
class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.SiLU()
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))

(3) SPP


SPP pools the same feature map at several receptive-field sizes and concatenates the results, so the output carries richer multi-scale context.

class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

Each max-pooling layer uses stride 1 with padding, so the spatial size is unchanged after pooling; the three pooling kernels differ in size: 5, 9, and 13.
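A quick sketch confirming that kernel_size=k, stride=1, padding=k//2 preserves the spatial size, which is what allows the four branches to be concatenated:

import torch
import torch.nn as nn

x = torch.randn(1, 256, 20, 20)
for k in (5, 9, 13):
    m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
    print(k, m(x).shape)  # all three stay (1, 256, 20, 20)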

3. Neck


YOLOv5 uses an FPN+PAN neck. In the overall architecture diagram, the 1st, 2nd, 3rd, 4th, 7th and 8th CBL blocks downsample (their conv has stride 2), which brings the feature maps to matching sizes for the concatenations shown in the figure; this structure fuses information from feature maps of different scales.
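As a sketch of such a downsampling CBL (its core is a stride-2 3×3 convolution; the channel counts here are illustrative, not taken from a specific layer):

import torch
import torch.nn as nn

# A stride-2 3x3 convolution halves the feature-map size, e.g. 80x80 -> 40x40,
# which is what lets the PAN path concat with the 40x40 FPN feature map.
conv = nn.Conv2d(128, 256, kernel_size=3, stride=2, padding=1)
print(conv(torch.randn(1, 128, 80, 80)).shape)  # torch.Size([1, 256, 40, 40])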

4. Output stage

(1) Output channel count

Each output feature map finally passes through a 1×1 convolution down to (5 + num_classes) × 3 channels (× 3 because there are 3 anchors per scale): 4 channels for xywh, 1 for the objectness confidence (the probability that an object is present), and num_classes channels for the per-class confidences.
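For example, with the 80 COCO classes:

nc = 80               # e.g. the 80 COCO classes
na = 3                # anchors per detection layer
no = na * (nc + 5)    # channels after the final 1x1 conv
print(no)             # 255 = 3 * (4 box + 1 objectness + 80 class scores)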

(2) Loss computation

The YOLOv5 loss has three terms: the anchor/bbox regression loss, the classification loss, and the confidence (objectness) loss; the total is their weighted sum (by default 0.05*box_loss + 0.5*cls_loss + 1.0*obj_loss). Some details of the computation follow; the code is in yolov5-master\utils\loss.py.
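A minimal sketch of this weighted sum, assuming the default gains from the hyperparameter yaml (the real ComputeLoss additionally scales by batch size and balances the objectness term across the three layers):

import torch

hyp = {'box': 0.05, 'cls': 0.5, 'obj': 1.0}                  # default loss gains
lbox, lcls, lobj = torch.tensor(0.08), torch.tensor(0.6), torch.tensor(0.3)  # made-up component values
loss = hyp['box'] * lbox + hyp['cls'] * lcls + hyp['obj'] * lobj
print(loss)  # tensor(0.6040)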

1) Anchor selection

The relevant function is build_targets:

def build_targets(self, p, targets):
    # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
    na, nt = self.na, targets.shape[0]  # number of anchors, targets
    tcls, tbox, indices, anch = [], [], [], []
    gain = torch.ones(7, device=targets.device)  # normalized to gridspace gain
    ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
    targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices

    g = 0.5  # bias
    off = torch.tensor([[0, 0],
                        [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                        # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                        ], device=targets.device).float() * g  # offsets

    for i in range(self.nl):
        anchors = self.anchors[i]
        gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

        # Match targets to anchors
        t = targets * gain
        if nt:
            # Matches
            r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
            j = torch.max(r, 1 / r).max(2)[0] < self.hyp['anchor_t']  # compare
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
            t = t[j]  # filter

            # Offsets
            gxy = t[:, 2:4]  # grid xy
            gxi = gain[[2, 3]] - gxy  # inverse
            j, k = ((gxy % 1 < g) & (gxy > 1)).T
            l, m = ((gxi % 1 < g) & (gxi > 1)).T
            j = torch.stack((torch.ones_like(j), j, k, l, m))
            t = t.repeat((5, 1, 1))[j]
            offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
        else:
            t = targets[0]
            offsets = 0

        # Define
        b, c = t[:, :2].long().T  # image, class
        gxy = t[:, 2:4]  # grid xy
        gwh = t[:, 4:6]  # grid wh
        gij = (gxy - offsets).long()
        gi, gj = gij.T  # grid xy indices

        # Append
        a = t[:, 6].long()  # anchor indices
        indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
        tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
        anch.append(anchors[a])  # anchors
        tcls.append(c)  # class

    return tcls, tbox, indices, anch

build_targets produces the target boxes needed to compute the loss during training, i.e. the boxes treated as positive samples.

Difference from YOLOv3/v4: YOLOv5 supports cross-grid prediction.

For any ground-truth bbox, anchors on all three output feature layers may match it,

so the function outputs more positive sample boxes than the number of GT boxes passed in.

Concretely:

(1) For each layer, the match between a bbox and that layer's anchors is measured not by IoU but by the shape ratio: if the width/height ratio between anchor and bbox differs by more than a factor of 4, they are considered unmatched and the bbox is ignored for that anchor, i.e. treated as background.

(2) The loss is then computed for all matching anchors in the grid cells the bbox falls into (not simply by comparing against the GT box alone).

Note that "the cells the bbox falls into" now means not just one cell but several nearby ones, which increases the number of positive samples; a bbox may even be predicted at all three scales. (In the end, two of the four cells above, below, left and right of the bbox-center cell are selected and, together with the center cell itself, contribute anchors: three cells in total, as the sketch after this paragraph illustrates.)

Also, YOLOv5 has no ignore-threshold operation on the confidence branch, whereas YOLOv3/v4 do.

The loss is then computed over these selected anchors.
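A small sketch of the two key tests described above, using made-up grid-unit values (anchor_t = 4 and the 0.5-cell bias g, as in the code):

import torch

# (1) Shape-ratio matching: keep the anchor if max of the wh ratios and their inverses is < 4
anchor = torch.tensor([1.25, 1.625])       # a P3 anchor in grid units
gwh = torch.tensor([2.0, 2.5])             # GT wh in grid units
r = gwh / anchor
print(torch.max(r, 1 / r).max() < 4.0)     # tensor(True) -> this anchor is a positive match

# (2) Neighbouring-cell selection from the fractional part of the GT centre
gxy = torch.tensor([13.4, 27.8])           # GT centre in grid units; its own cell is (13, 27)
j, k = (gxy % 1 < 0.5) & (gxy > 1)         # near the left cell edge? near the top edge?
print(j.item(), k.item())                  # True False -> the left neighbour (12, 27) is added;
                                           # the mirrored test on (1 - frac) then adds (13, 28)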

2) IoU loss

The anchor/bbox regression loss is an IoU-based loss.
For the different IoU loss variants, see section 4.3.4 (Prediction innovations) of 深入浅出Yolo系列之Yolov3&Yolov4&Yolov5&Yolox核心基础知识完整讲解.
The code is in yolov5-master\utils\metrics.py;
bbox_iou computes the IoU and its GIoU/DIoU/CIoU variants.

def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
    box2 = box2.T

    # Get the coordinates of bounding boxes
    if x1y1x2y2:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:  # transform from xywh to xyxy
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union Area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps

    iou = inter / union
    if CIoU or DIoU or GIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height
        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
                    (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha)  # CIoU
            return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou  # IoU
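A hypothetical usage sketch (the import assumes the file location given above; the box values are made up):

import torch
from utils.metrics import bbox_iou  # assuming the function above is importable from here

pbox = torch.tensor([50., 50., 20., 40.])        # one predicted box as (cx, cy, w, h)
tboxes = torch.tensor([[55., 48., 22., 38.],
                       [300., 300., 10., 10.]])  # n x 4 target boxes
ciou = bbox_iou(pbox, tboxes, x1y1x2y2=False, CIoU=True)
print(ciou)  # one CIoU value per target; in ComputeLoss the box loss is (1.0 - iou).mean()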

3) BCELoss

The classification and confidence losses are both binary cross-entropy losses on logits. loss.py also defines BCEBlurWithLogitsLoss, a BCEWithLogitsLoss variant that down-weights the effect of missing labels:

class BCEBlurWithLogitsLoss(nn.Module):
    # BCEwithLogitLoss() with reduced missing label effects.
    def __init__(self, alpha=0.05):
        super().__init__()
        self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none')  # must be nn.BCEWithLogitsLoss()
        self.alpha = alpha

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        pred = torch.sigmoid(pred)  # prob from logits
        dx = pred - true  # reduce only missing label effects
        # dx = (pred - true).abs()  # reduce missing label and false label effects
        alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4))
        loss *= alpha_factor
        return loss.mean()

(3) NMS (non-maximum suppression)

For the principle behind NMS, see NMS原理大总结. Note that YOLOv5 runs a batched, class-aware NMS: box coordinates are offset by class index (the line computing c below), so boxes of different classes never suppress each other, and an optional merge-NMS averages overlapping boxes weighted by their IoU.
The code is in yolov5-master\utils\general.py:

def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 7680  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output

5. Evaluation metrics

The following three links cover both the underlying theory and how to interpret the training results:
YoloV5相关性能指标解析
YOLOv5基础知识点——性能指标
深度学习评估指标之目标检测——(yolov5 可视化训练结果以及result.txt解析)

II. YOLOv5 models of different complexity

The models differ only in the two parameters depth_multiple and width_multiple.

1. Parameters of each model

(1) yolov5s

depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

(2) yolov5m

depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple

(3) yolov5l

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple

(4) yolov5x

depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple

2. Effect of the two parameters

(1) depth_multiple

The CSP1 and CSP2 structures from above are reproduced separately below.


depth_multiple controls the depth of CSP1 and CSP2, i.e. the number of residual units in CSP1 and the number of CBL blocks in CSP2.
How is that number computed?

The figure above shows the yolov5s backbone parameters; the entries circled in red are the CSP1 rows, and the number of residual units in each CSP1 is the second number in that row multiplied by depth_multiple (rounded, with a minimum of 1).
Example: the first CSP1 has 3 × depth_multiple ≈ 1 residual unit, the second has 9 × depth_multiple ≈ 3, and the third also has 9 × depth_multiple ≈ 3.
The same applies to CSP2 in the head:

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

Each CSP2 in the head then contains 2 × 3 × depth_multiple ≈ 2 CBL blocks. A sketch of the rounding rule follows.
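The rounding is done by parse_model in models/yolo.py via n = max(round(n * gd), 1); a quick sketch for yolov5s (gd = 0.33):

gd = 0.33  # depth_multiple for yolov5s
for n in (3, 9):                     # the "number" column in the yaml
    print(n, max(round(n * gd), 1))  # 3 -> 1, 9 -> 3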

(2) width_multiple

This parameter controls the number of convolution kernels in the Focus and CBL blocks of the backbone.

The kernel count is the number highlighted in the figure above multiplied by width_multiple (then rounded up to a multiple of 8).
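The channel scaling goes through make_divisible, which rounds up to a multiple of 8; a sketch with the yolov5s width gain (make_divisible reproduced here as defined in utils/general.py):

import math

def make_divisible(x, divisor=8):
    # round channel counts up to the nearest multiple of `divisor`
    return math.ceil(x / divisor) * divisor

gw = 0.50  # width_multiple for yolov5s
for c2 in (64, 128, 256, 512, 1024):        # channel entries from the yaml
    print(c2, make_divisible(c2 * gw, 8))   # 64 -> 32, 128 -> 64, ..., 1024 -> 512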


Main references (articles and videos)

深入浅出Yolo系列之Yolov5核心基础知识完整讲解

【最适合新手入门的【YOLOV5目标实战】教程!基于Pytorch搭建YOLOV5目标检测平台!环境部署+项目实战(深度学习/计算机视觉)】 (Bilibili video)
