

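  • YOLOv5 code and principles explained
  • I. Code and principles
    • 1. Input
      • (1) letterbox
      • (2) Mosaic augmentation
      • (3) anchor
        • 1) When disabled
        • 2) When enabled
    • 2. Backbone
      • (1) Focus structure
      • (2) CSP
      • (3) SPP
    • 3. Neck
    • 4. Output
      • (1) Number of output channels
      • (2) Loss computation
        • 1) Anchor selection
        • 2) IoU_loss computation
        • 3) BCELoss
      • (3) NMS (non-maximum suppression)
    • 5. Evaluation metrics
  • II. YOLOv5 models of different complexity
    • 1. Parameters of the different models
      • (1) yolov5s
      • (2) yolov5m
      • (3) yolov5l
      • (4) yolov5x
    • 2. Effect of the parameters
      • (1) depth_multiple
      • (2) width_multiple
  • Main reference articles and videos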



(1) letterbox


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

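1) Compute new_shape[0] / shape[0] and new_shape[1] / shape[1], and take the smaller of the two as the scale ratio r.
2) Resize the original image to (int(round(shape[1] * r)), int(round(shape[0] * r))), i.e. (width, height).
3) Compute new_shape[1] - new_unpad[0] modulo stride (32 by default) and new_shape[0] - new_unpad[1] modulo stride, and call the results dw and dh. Each is then halved, and the image is padded on both sides with the fill color (114, 114, 114), so the final width and height are multiples of stride.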
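The three steps above can be checked by hand with a small sketch that repeats only the arithmetic of letterbox (no image is touched). The helper name letterbox_geometry is hypothetical and not part of YOLOv5; it assumes the auto=True branch with scaleup enabled.

```python
def letterbox_geometry(h, w, new_shape=(640, 640), stride=32):
    """Return (r, new_unpad, (left, top, right, bottom), output (h, w)).

    Hypothetical helper mirroring the arithmetic of letterbox() for auto=True.
    """
    # Step 1: scale ratio is the smaller of the two new/old ratios
    r = min(new_shape[0] / h, new_shape[1] / w)
    # Step 2: resized size, stored as (width, height)
    new_unpad = (int(round(w * r)), int(round(h * r)))
    # Step 3: remaining padding, reduced modulo stride
    dw = (new_shape[1] - new_unpad[0]) % stride
    dh = (new_shape[0] - new_unpad[1]) % stride
    # Split the padding between the two sides
    dw, dh = dw / 2, dh / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    out_hw = (new_unpad[1] + top + bottom, new_unpad[0] + left + right)
    return r, new_unpad, (left, top, right, bottom), out_hw


# A 1280 x 720 (w x h) frame: r = 0.5, resized to 640 x 360,
# dh = 280 % 32 = 24, so 12 rows are added at the top and at the bottom.
print(letterbox_geometry(720, 1280))
# (0.5, (640, 360), (0, 12, 0, 12), (384, 640))
```

The output height 384 is a multiple of 32, which is the point of taking the padding modulo stride instead of padding all the way to 640.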

(2) Mosaic增强


def load_image(self, i):
    # Loads 1 image from dataset index 'i', returns (im, original hw, resized hw)
    im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i],
    if im is None:  # not cached in RAM
        if fn.exists():  # load npy
            im = np.load(fn)
        else:  # read image
            im = cv2.imread(f)  # BGR
            assert im is not None, f'Image Not Found {f}'
        h0, w0 = im.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # ratio
        if r != 1:  # if sizes are not equal
            im = cv2.resize(im,
                            (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_LINEAR if (self.augment or r > 1) else cv2.INTER_AREA)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized
    else:
        return self.ims[i], self.im_hw0[i], self.im_hw[i]  # im, hw_original, hw_resized



def load_mosaic(self, index):
    # YOLOv5 4-mosaic loader. Loads 1 image + 3 random images into a 4-image mosaic
    labels4, segments4 = [], []
    s = self.img_size
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    random.shuffle(indices)
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = self.load_image(index)

        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        padw = x1a - x1b
        padh = y1a - y1b

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
        labels4.append(labels)
        segments4.extend(segments)

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # img4, labels4 = replicate(img4, labels4)  # replicate

    # Augment
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

    return img4, labels4

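1) Choose the point (yc, xc) at which the four images meet; each coordinate is drawn at random from the range (img_size // 2, 3 * img_size // 2).
Note: because yc and xc are random, the final mosaic may contain large blank regions (filled with the gray value 114).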
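To make the placement rule concrete, here is a minimal sketch that isolates the coordinate arithmetic of load_mosaic. The function names sample_mosaic_center and tile_coords are hypothetical; the sketch assumes mosaic_border is (-img_size // 2, -img_size // 2), which is what yields the range stated above.

```python
import random


def sample_mosaic_center(s):
    """Draw (yc, xc) from [s/2, 3s/2), assuming mosaic_border = (-s // 2, -s // 2)."""
    border = (-s // 2, -s // 2)
    yc, xc = (int(random.uniform(-b, 2 * s + b)) for b in border)
    return yc, xc


def tile_coords(i, xc, yc, h, w, s):
    """Return (dst, src) boxes as (x1, y1, x2, y2) for tile i of the 2s x 2s canvas.

    dst is the region written on the canvas, src the region cut from the image.
    """
    if i == 0:  # top left
        dst = max(xc - w, 0), max(yc - h, 0), xc, yc
        src = w - (dst[2] - dst[0]), h - (dst[3] - dst[1]), w, h
    elif i == 1:  # top right
        dst = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        src = 0, h - (dst[3] - dst[1]), min(w, dst[2] - dst[0]), h
    elif i == 2:  # bottom left
        dst = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        src = w - (dst[2] - dst[0]), 0, w, min(dst[3] - dst[1], h)
    else:  # bottom right
        dst = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        src = 0, 0, min(w, dst[2] - dst[0]), min(dst[3] - dst[1], h)
    return dst, src


# Fixed center for a reproducible example: s = 640, image of h = 480, w = 640
for i in range(4):
    print(i, tile_coords(i, xc=400, yc=500, h=480, w=640, s=640))
```

In this example the top-left tile only has room for a 400-pixel-wide strip, so the left 240 columns of the source image are cut off, while rows 0 to 20 of the canvas stay gray because the image is only 480 pixels high.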


def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2

    # Center
    C = np.eye(3)
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)

    # Perspective
    P = np.eye(3)
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)
    a = random.uniform(-degrees, degrees)
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)

    # Shear
    S = np.eye(3)
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        if perspective:
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
        else:  # affine
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))

    # Visualize
    # import matplotlib.pyplot as plt
    # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()
    # ax[0].imshow(im[:, :, ::-1])  # base
    # ax[1].imshow(im2[:, :, ::-1])  # warped

    # Transform label coordinates
    n = len(targets)
    if n:
        use_segments = any(x.any() for x in segments)
        new = np.zeros((n, 4))
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine

                # clip
                new[i] = segment2box(xy, width, height)

        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return im, targets


def load_mosaic9(self, index):
    # YOLOv5 9-mosaic loader. Loads 1 image + 8 random images into a 9-image mosaic
    labels9, segments9 = [], []
    s = self.img_size
    indices = [index] + random.choices(self.indices, k=8)  # 8 additional image indices
    random.shuffle(indices)
    hp, wp = -1, -1  # height, width previous
    for i, index in enumerate(indices):
        # Load image
        img, _, (h, w) = self.load_image(index)

        # place img in img9
        if i == 0:  # center
            img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            h0, w0 = h, w
            c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates
        elif i == 1:  # top
            c = s, s - h, s + w, s
        elif i == 2:  # top right
            c = s + wp, s - h, s + wp + w, s
        elif i == 3:  # right
            c = s + w0, s, s + w0 + w, s + h
        elif i == 4:  # bottom right
            c = s + w0, s + hp, s + w0 + w, s + hp + h
        elif i == 5:  # bottom
            c = s + w0 - w, s + h0, s + w0, s + h0 + h
        elif i == 6:  # bottom left
            c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h
        elif i == 7:  # left
            c = s - w, s + h0 - h, s, s + h0
        elif i == 8:  # top left
            c = s - w, s + h0 - hp - h, s, s + h0 - hp

        padx, pady = c[:2]
        x1, y1, x2, y2 = (max(x, 0) for x in c)  # allocate coords

        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padx, pady)  # normalized xywh to pixel xyxy format
            segments = [xyn2xy(x, w, h, padx, pady) for x in segments]
        labels9.append(labels)
        segments9.extend(segments)

        # Image
        img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]  # img9[ymin:ymax, xmin:xmax]
        hp, wp = h, w  # height, width previous

    # Offset
    yc, xc = (int(random.uniform(0, s)) for _ in self.mosaic_border)  # mosaic center x, y
    img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]

    # Concat/clip labels
    labels9 = np.concatenate(labels9, 0)
    labels9[:, [1, 3]] -= xc
    labels9[:, [2, 4]] -= yc
    c = np.array([xc, yc])  # centers
    segments9 = [x - c for x in segments9]

    for x in (labels9[:, 1:], *segments9):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
    # img9, labels9 = replicate(img9, labels9)  # replicate

    # Augment
    img9, labels9 = random_perspective(img9, labels9, segments9,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

    return img9, labels9

(3) anchor


parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')




anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32




def check_anchor_order(m):
def check_anchors(dataset, model, thr=4.0, imgsz=640):
def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):


def check_anchors(dataset, model, thr=4.0, imgsz=640):
    # Check anchor fit to data, recompute if necessary
    m = model.module.model[-1] if hasattr(model, 'module') else model.model[-1]  # Detect()
    shapes = imgsz * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    scale = np.random.uniform(0.9, 1.1, size=(shapes.shape[0], 1))  # augment scale
    wh = torch.tensor(np.concatenate([l[:, 3:5] * s for s, l in zip(shapes * scale, dataset.labels)])).float()  # wh

    def metric(k):  # compute metric
        r = wh[:, None] / k[None]
        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric
        best = x.max(1)[0]  # best_x
        aat = (x > 1 / thr).float().sum(1).mean()  # anchors above threshold
        bpr = (best > 1 / thr).float().mean()  # best possible recall
        return bpr, aat

    anchors = m.anchors.clone() * m.stride.to(m.anchors.device).view(-1, 1, 1)  # current anchors
    bpr, aat = metric(anchors.cpu().view(-1, 2))
    s = f'\n{PREFIX}{aat:.2f} anchors/target, {bpr:.3f} Best Possible Recall (BPR). '
    if bpr > 0.98:  # threshold to recompute
        LOGGER.info(emojis(f'{s}Current anchors are a good fit to dataset ✅'))
    else:
        LOGGER.info(emojis(f'{s}Anchors are a poor fit to dataset ⚠️, attempting to improve...'))
        na = m.anchors.numel() // 2  # number of anchors
        try:
            anchors = kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)
        except Exception as e:
            LOGGER.info(f'{PREFIX}ERROR: {e}')
        new_bpr = metric(anchors)[0]
        if new_bpr > bpr:  # replace anchors
            anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)
            m.anchors[:] = anchors.clone().view_as(m.anchors) / m.stride.to(m.anchors.device).view(-1, 1, 1)  # loss
            check_anchor_order(m)
            s = f'{PREFIX}Done ✅ (optional: update model *.yaml to use these anchors in the future)'
        else:
            s = f'{PREFIX}Done ⚠️ (original anchors better than new anchors, proceeding with original anchors)'
        LOGGER.info(emojis(s))
def kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):
    """ Creates kmeans-evolved anchors from training dataset

        Arguments:
            dataset: path to data.yaml, or a loaded dataset
            n: number of anchors
            img_size: image size used for training
            thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0
            gen: generations to evolve anchors using genetic algorithm
            verbose: print all results

        Return:
            k: kmeans evolved anchors

        Usage:
            from utils.autoanchor import *; _ = kmean_anchors()
    """
    from scipy.cluster.vq import kmeans

    npr = np.random
    thr = 1 / thr

    def metric(k, wh):  # compute metrics
        r = wh[:, None] / k[None]
        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric
        # x = wh_iou(wh, torch.tensor(k))  # iou metric
        return x, x.max(1)[0]  # x, best_x

    def anchor_fitness(k):  # mutation fitness
        _, best = metric(torch.tensor(k, dtype=torch.float32), wh)
        return (best * (best > thr).float()).mean()  # fitness

    def print_results(k, verbose=True):
        k = k[np.argsort(k.prod(1))]  # sort small to large
        x, best = metric(k, wh0)
        bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n  # best possible recall, anch > thr
        s = f'{PREFIX}thr={thr:.2f}: {bpr:.4f} best possible recall, {aat:.2f} anchors past thr\n' \
            f'{PREFIX}n={n}, img_size={img_size}, metric_all={x.mean():.3f}/{best.mean():.3f}-mean/best, ' \
            f'past_thr={x[x > thr].mean():.3f}-mean: '
        for i, x in enumerate(k):
            s += '%i,%i, ' % (round(x[0]), round(x[1]))
        if verbose:
            LOGGER.info(s[:-2])
        return k

    if isinstance(dataset, str):  # *.yaml file
        with open(dataset, errors='ignore') as f:
            data_dict = yaml.safe_load(f)  # model dict
        from utils.datasets import LoadImagesAndLabels
        dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)

    # Get label wh
    shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)
    wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)])  # wh

    # Filter
    i = (wh0 < 3.0).any(1).sum()
    if i:
        LOGGER.info(f'{PREFIX}WARNING: Extremely small objects found: {i} of {len(wh0)} labels are < 3 pixels in size')
    wh = wh0[(wh0 >= 2.0).any(1)]  # filter > 2 pixels
    # wh = wh * (npr.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1

    # Kmeans init
    try:
        LOGGER.info(f'{PREFIX}Running kmeans for {n} anchors on {len(wh)} points...')
        assert n <= len(wh)  # apply overdetermined constraint
        s = wh.std(0)  # sigmas for whitening
        k = kmeans(wh / s, n, iter=30)[0] * s  # points
        assert n == len(k)  # kmeans may return fewer points than requested if wh is insufficient or too similar
    except Exception:
        LOGGER.warning(f'{PREFIX}WARNING: switching strategies from kmeans to random init')
        k = np.sort(npr.rand(n * 2)).reshape(n, 2) * img_size  # random init
    wh, wh0 = (torch.tensor(x, dtype=torch.float32) for x in (wh, wh0))
    k = print_results(k, verbose=False)

    # Plot
    # k, d = [None] * 20, [None] * 20
    # for i in tqdm(range(1, 21)):
    #     k[i-1], d[i-1] = kmeans(wh / s, i)  # points, mean distance
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7), tight_layout=True)
    # ax = ax.ravel()
    # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')
    # fig, ax = plt.subplots(1, 2, figsize=(14, 7))  # plot wh
    # ax[0].hist(wh[wh[:, 0]<100, 0],400)
    # ax[1].hist(wh[wh[:, 1]<100, 1],400)
    # fig.savefig('wh.png', dpi=200)

    # Evolve
    f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, generations, mutation prob, sigma
    pbar = tqdm(range(gen), desc=f'{PREFIX}Evolving anchors with Genetic Algorithm:')  # progress bar
    for _ in pbar:
        v = np.ones(sh)
        while (v == 1).all():  # mutate until a change occurs (prevent duplicates)
            v = ((npr.random(sh) < mp) * random.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)
        kg = (k.copy() * v).clip(min=2.0)
        fg = anchor_fitness(kg)
        if fg > f:
            f, k = fg, kg.copy()
            pbar.desc = f'{PREFIX}Evolving anchors with Genetic Algorithm: fitness = {f:.4f}'
            if verbose:
                print_results(k, verbose)

    return print_results(k)
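The quantity that decides whether the anchors are recomputed is the best possible recall (BPR) from the metric function in check_anchors. The following is a NumPy re-statement of that metric on a toy set of label sizes, meant only as an illustration; the name anchor_metric and the sample values are made up for the example.

```python
import numpy as np


def anchor_metric(wh, anchors, thr=4.0):
    """Return (bpr, aat) for label sizes wh (n, 2) and anchors (na, 2), both in pixels."""
    wh = np.asarray(wh, dtype=float)
    k = np.asarray(anchors, dtype=float)
    r = wh[:, None, :] / k[None, :, :]       # (n, na, 2) width and height ratios
    x = np.minimum(r, 1 / r).min(axis=2)     # (n, na) worse of the two sides, in (0, 1]
    best = x.max(axis=1)                     # best anchor for each label
    aat = (x > 1 / thr).sum(axis=1).mean()   # anchors above threshold per label
    bpr = (best > 1 / thr).mean()            # fraction of labels with at least one usable anchor
    return bpr, aat


labels_wh = [[12, 15], [100, 100], [500, 20]]  # toy label sizes
anchors = [[10, 13], [116, 90]]
bpr, aat = anchor_metric(labels_wh, anchors)
print(f'{bpr:.3f} {aat:.3f}')  # 0.667 0.667
```

The 500 x 20 label matches neither anchor because one side differs by more than a factor of 4, so BPR drops to 2/3; in check_anchors a BPR of 0.98 or lower triggers kmean_anchors.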







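Taking yolov5s as an example, the original 640 × 640 × 3 image is fed into the Focus structure. A slicing operation first turns it into a 320 × 320 × 12 feature map, and a subsequent convolution produces the final 320 × 320 × 32 feature map.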

class Focus(nn.Module):
    # Focus wh information into c-space
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = Conv(c1 * 4, c2, k, s, p, g, act)
        # self.contract = Contract(gain=2)

    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1))
        # return self.conv(self.contract(x))
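The slicing step on its own can be reproduced without the convolution. The sketch below uses NumPy instead of PyTorch purely to stay dependency-light; focus_slice is a hypothetical name, and the indexing is the same as in Focus.forward.

```python
import numpy as np


def focus_slice(x):
    """Space-to-depth slicing: (b, c, h, w) -> (b, 4c, h/2, w/2), no values are lost."""
    return np.concatenate(
        [x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]],
        axis=1,
    )


x = np.zeros((1, 3, 640, 640), dtype=np.float32)
print(focus_slice(x).shape)  # (1, 12, 320, 320)
```

Every second pixel goes to a different channel group, so the spatial resolution halves while the channel count quadruples; the following Conv then maps the 12 channels to 32 for yolov5s.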


The Res unit is shown in the figure below.

class BottleneckCSP(nn.Module):
    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks
    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)
        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)
        self.cv4 = Conv(2 * c_, c2, 1, 1)
        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)
        self.act = nn.SiLU()
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))

    def forward(self, x):
        y1 = self.cv3(self.m(self.cv1(x)))
        y2 = self.cv2(x)
        return self.cv4(self.act(self.bn(torch.cat((y1, y2), dim=1))))



class SPP(nn.Module):
    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729
    def __init__(self, c1, c2, k=(5, 9, 13)):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)
        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning
            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))

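SPP pads each max-pooling layer (padding = kernel_size // 2) and uses a stride of 1, so the spatial size is unchanged after pooling. The max-pooling layers use different kernel sizes: 5, 9 and 13. Their outputs are concatenated with the unpooled input along the channel dimension.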
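Why the size stays the same follows from the standard pooling output-size formula, out = (n + 2p - k) // s + 1. The small check below uses hypothetical helper names, and takes a 20 × 20 feature map with 512 input channels as an assumed example rather than a value from the text.

```python
def pooled_size(n, k, stride=1):
    """Output length of max pooling with kernel k, padding k // 2 and the given stride."""
    p = k // 2
    return (n + 2 * p - k) // stride + 1


def spp_concat_shape(c1, h, w, k=(5, 9, 13)):
    """Shape (channels, h, w) of the tensor fed to cv2 inside SPP."""
    c_ = c1 // 2  # hidden channels after cv1
    sizes = {(pooled_size(h, x), pooled_size(w, x)) for x in k}
    assert sizes == {(h, w)}, 'every pooling branch must keep the spatial size'
    return c_ * (len(k) + 1), h, w


for k in (5, 9, 13):
    print(k, pooled_size(20, k))  # each kernel keeps the length at 20
print(spp_concat_shape(512, 20, 20))  # (1024, 20, 20)
```

With odd kernel sizes and stride 1, 2 * (k // 2) equals k - 1, so the formula always returns n; that is what allows the four branches to be concatenated.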










def build_targets(self, p, targets):
    # Build targets for compute_loss(), input targets(image,class,x,y,w,h)
    na, nt = self.na, targets.shape[0]  # number of anchors, targets
    tcls, tbox, indices, anch = [], [], [], []
    gain = torch.ones(7, device=targets.device)  # normalized to gridspace gain
    ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)
    targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices

    g = 0.5  # bias
    off = torch.tensor([[0, 0],
                        [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m
                        # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm
                        ], device=targets.device).float() * g  # offsets

    for i in range(self.nl):
        anchors = self.anchors[i]
        gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain

        # Match targets to anchors
        t = targets * gain
        if nt:
            # Matches
            r = t[:, :, 4:6] / anchors[:, None]  # wh ratio
            j = torch.max(r, 1 / r).max(2)[0] < self.hyp['anchor_t']  # compare
            # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))
            t = t[j]  # filter

            # Offsets
            gxy = t[:, 2:4]  # grid xy
            gxi = gain[[2, 3]] - gxy  # inverse
            j, k = ((gxy % 1 < g) & (gxy > 1)).T
            l, m = ((gxi % 1 < g) & (gxi > 1)).T
            j = torch.stack((torch.ones_like(j), j, k, l, m))
            t = t.repeat((5, 1, 1))[j]
            offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]
        else:
            t = targets[0]
            offsets = 0

        # Define
        b, c = t[:, :2].long().T  # image, class
        gxy = t[:, 2:4]  # grid xy
        gwh = t[:, 4:6]  # grid wh
        gij = (gxy - offsets).long()
        gi, gj = gij.T  # grid xy indices

        # Append
        a = t[:, 6].long()  # anchor indices
        indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices
        tbox.append(torch.cat((gxy - gij, gwh), 1))  # box
        anch.append(anchors[a])  # anchors
        tcls.append(c)  # class

    return tcls, tbox, indices, anch

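The build_targets function produces the target boxes needed to compute the loss during training, i.e. the boxes that are treated as positive samples.

Difference from yolov3/v4: yolov5 supports cross-grid prediction. A target is assigned not only to the grid cell that contains its center but also to the nearest neighboring cells (the j, k, l, m offsets in the code above).


As a result, the function outputs more positive-sample boxes than the number of targets (GT boxes) passed in.


(1) At every detection layer, the match between a bbox and that layer's anchors is measured by shape ratio rather than by IoU. If the width ratio or the height ratio between the anchor and the bbox exceeds 4 (the anchor_t hyperparameter) in either direction, the pair is considered unmatched and the bbox is ignored for that anchor, i.e. treated as background;

(2) For each remaining bbox, the grid cell it falls into is determined, and the loss is computed for every matched anchor of that cell (the loss is not computed by comparing directly against the GT box).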
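The two rules can be followed for a single target with a small NumPy sketch. It is a simplified re-statement of the matching and offset logic of build_targets for one box on one layer, not the original function; match_target and the anchor values (given in grid units) are hypothetical.

```python
import numpy as np


def match_target(gxy, gwh, anchors, grid_wh, anchor_t=4.0, g=0.5):
    """Return (matched anchor indices, grid cells) for one target in grid coordinates."""
    gxy = np.asarray(gxy, dtype=float)
    gwh = np.asarray(gwh, dtype=float)
    anchors = np.asarray(anchors, dtype=float)

    # Rule (1): shape-ratio test instead of IoU
    r = gwh[None, :] / anchors                          # (na, 2)
    matched = np.maximum(r, 1 / r).max(axis=1) < anchor_t
    anchor_ids = np.flatnonzero(matched).tolist()

    # Cross-grid assignment: own cell plus the neighbors nearest to the center
    off = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1]], dtype=float) * g
    gxi = np.asarray(grid_wh, dtype=float) - gxy        # distance to the far edges
    j, k = (gxy % 1 < g) & (gxy > 1)                    # near left / near top
    l, m = (gxi % 1 < g) & (gxi > 1)                    # near right / near bottom
    keep = np.array([True, j, k, l, m])
    cells = (gxy - off[keep]).astype(int)               # truncation, like .long()
    return anchor_ids, [tuple(c) for c in cells.tolist()]


ids, cells = match_target(gxy=(10.3, 7.8), gwh=(2.0, 3.0),
                          anchors=[[1.25, 1.625], [2.0, 3.75], [10.0, 1.0]],
                          grid_wh=(80, 80))
print(ids)    # [0, 1]
print(cells)  # [(10, 7), (9, 7), (10, 8)]
```

Here one GT box yields 2 anchors × 3 cells = 6 positive samples, which illustrates why the output contains more boxes than the input targets.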


In addition, yolov5 does not apply an ignore threshold (ignore thresh) in the conf branch, whereas yolov3/v4 do.



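For how the various IoU losses are computed, see section 4.3.4, "Prediction innovations", of the article "Yolo series made accessible: a complete explanation of the core fundamentals of Yolov3, Yolov4, Yolov5 and Yolox".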

def bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):
    # Returns the IoU of box1 to box2. box1 is 4, box2 is nx4
    box2 = box2.T

    # Get the coordinates of bounding boxes
    if x1y1x2y2:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]
    else:  # transform from xywh to xyxy
        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2
        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2
        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2
        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2

    # Intersection area
    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \
            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)

    # Union Area
    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
    union = w1 * h1 + w2 * h2 - inter + eps

    iou = inter / union
    if CIoU or DIoU or GIoU:
        cw = torch.max(b1_x2, b2_x2) - torch.min(b1_x1, b2_x1)  # convex (smallest enclosing box) width
        ch = torch.max(b1_y2, b2_y2) - torch.min(b1_y1, b2_y1)  # convex height
        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 +
                    (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center distance squared
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * torch.pow(torch.atan(w2 / h2) - torch.atan(w1 / h1), 2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return iou - (rho2 / c2 + v * alpha)  # CIoU
            return iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return iou  # IoU
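For a feel of how the four variants differ numerically, here is a scalar sketch of the same formulas for two boxes in (x1, y1, x2, y2) form. It is a simplified, plain-Python illustration with a hypothetical name (iou_variants), and it keeps an eps only in the alpha term, so it assumes non-degenerate boxes.

```python
import math


def iou_variants(b1, b2, eps=1e-7):
    """Return a dict with IoU, GIoU, DIoU and CIoU for two xyxy boxes."""
    x1, y1, x2, y2 = b1
    X1, Y1, X2, Y2 = b2
    w1, h1, w2, h2 = x2 - x1, y2 - y1, X2 - X1, Y2 - Y1

    inter = max(0.0, min(x2, X2) - max(x1, X1)) * max(0.0, min(y2, Y2) - max(y1, Y1))
    union = w1 * h1 + w2 * h2 - inter
    iou = inter / union

    cw = max(x2, X2) - min(x1, X1)  # enclosing box width
    ch = max(y2, Y2) - min(y1, Y1)  # enclosing box height
    c_area = cw * ch
    giou = iou - (c_area - union) / c_area

    c2 = cw ** 2 + ch ** 2  # enclosing box diagonal squared
    rho2 = ((X1 + X2 - x1 - x2) ** 2 + (Y1 + Y2 - y1 - y2) ** 2) / 4  # center distance squared
    diou = iou - rho2 / c2

    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2  # aspect-ratio term
    alpha = v / (v - iou + 1 + eps)
    ciou = iou - (rho2 / c2 + v * alpha)
    return {'iou': iou, 'giou': giou, 'diou': diou, 'ciou': ciou}


print(iou_variants((0, 0, 2, 2), (1, 1, 3, 3)))
```

For these two 2 × 2 boxes shifted by one unit, IoU = 1/7, GIoU = 1/7 - 2/9, and DIoU = CIoU = 1/7 - 1/9, because identical aspect ratios make the CIoU penalty v vanish.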



class BCEBlurWithLogitsLoss(nn.Module):
    # BCEwithLogitLoss() with reduced missing label effects.
    def __init__(self, alpha=0.05):
        super().__init__()
        self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none')  # must be nn.BCEWithLogitsLoss()
        self.alpha = alpha

    def forward(self, pred, true):
        loss = self.loss_fcn(pred, true)
        pred = torch.sigmoid(pred)  # prob from logits
        dx = pred - true  # reduce only missing label effects
        # dx = (pred - true).abs()  # reduce missing label and false label effects
        alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4))
        loss *= alpha_factor
        return loss.mean()



def non_max_suppression(prediction, conf_thres=0.25, iou_thres=0.45, classes=None, agnostic=False, multi_label=False,
                        labels=(), max_det=300):
    """Runs Non-Maximum Suppression (NMS) on inference results

    Returns:
         list of detections, on (n,6) tensor per image [xyxy, conf, cls]
    """

    nc = prediction.shape[2] - 5  # number of classes
    xc = prediction[..., 4] > conf_thres  # candidates

    # Checks
    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'
    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'

    # Settings
    min_wh, max_wh = 2, 7680  # (pixels) minimum and maximum box width and height
    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()
    time_limit = 10.0  # seconds to quit after
    redundant = True  # require redundant detections
    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)
    merge = False  # use merge-NMS

    t = time.time()
    output = [torch.zeros((0, 6), device=prediction.device)] * prediction.shape[0]
    for xi, x in enumerate(prediction):  # image index, image inference
        # Apply constraints
        x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height
        x = x[xc[xi]]  # confidence

        # Cat apriori labels if autolabelling
        if labels and len(labels[xi]):
            lb = labels[xi]
            v = torch.zeros((len(lb), nc + 5), device=x.device)
            v[:, :4] = lb[:, 1:5]  # box
            v[:, 4] = 1.0  # conf
            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls
            x = torch.cat((x, v), 0)

        # If none remain process next image
        if not x.shape[0]:
            continue

        # Compute conf
        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf

        # Box (center x, center y, width, height) to (x1, y1, x2, y2)
        box = xywh2xyxy(x[:, :4])

        # Detections matrix nx6 (xyxy, conf, cls)
        if multi_label:
            i, j = (x[:, 5:] > conf_thres).nonzero(as_tuple=False).T
            x = torch.cat((box[i], x[i, j + 5, None], j[:, None].float()), 1)
        else:  # best class only
            conf, j = x[:, 5:].max(1, keepdim=True)
            x = torch.cat((box, conf, j.float()), 1)[conf.view(-1) > conf_thres]

        # Filter by class
        if classes is not None:
            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]

        # Apply finite constraint
        # if not torch.isfinite(x).all():
        #     x = x[torch.isfinite(x).all(1)]

        # Check shape
        n = x.shape[0]  # number of boxes
        if not n:  # no boxes
            continue
        elif n > max_nms:  # excess boxes
            x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence

        # Batched NMS
        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS
        if i.shape[0] > max_det:  # limit detections
            i = i[:max_det]
        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)
            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)
            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix
            weights = iou * scores[None]  # box weights
            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes
            if redundant:
                i = i[iou.sum(1) > 1]  # require redundancy

        output[xi] = x[i]
        if (time.time() - t) > time_limit:
            LOGGER.warning(f'WARNING: NMS time limit {time_limit}s exceeded')
            break  # time limit exceeded

    return output
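The core step delegated to torchvision.ops.nms above is greedy suppression: keep the highest-scoring box, drop every remaining box whose IoU with it exceeds iou_thres, and repeat. A minimal NumPy sketch of that idea follows; greedy_nms is a hypothetical name, and it covers only the suppression loop, without the confidence filtering or the per-class offset that non_max_suppression adds.

```python
import numpy as np


def greedy_nms(boxes, scores, iou_thres=0.45):
    """Return indices of kept boxes (xyxy), highest score first."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        # IoU of the current best box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thres]  # survivors go to the next round
    return keep


boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores))  # [0, 2]
```

The first two boxes overlap with IoU 81 / 119 ≈ 0.68, above the 0.45 threshold, so the lower-scoring one is suppressed. In the original function, adding class_index * max_wh to the coordinates moves boxes of different classes far apart, so a single NMS call never suppresses across classes unless agnostic is set.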


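For the evaluation metrics, see the article "Deep learning evaluation metrics for object detection (visualizing yolov5 training results and interpreting result.txt)".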





(1) yolov5s

depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple


(2) yolov5m

depth_multiple: 0.67  # model depth multiple
width_multiple: 0.75  # layer channel multiple


(3) yolov5l

depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple


(4) yolov5x

depth_multiple: 1.33  # model depth multiple
width_multiple: 1.25  # layer channel multiple
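How the two multiples change a layer can be sketched as follows. This assumes the scaling rule applied when the model yaml is parsed is "round the repeat count, with a minimum of 1" for depth and "round the channel count up to a multiple of 8" for width; the function scale_layer is a hypothetical helper written for this illustration, so treat the rule as an assumption to verify against the repository version in use.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(n, c2, depth_multiple, width_multiple):
    """Return (repeats, output channels) after applying the two multiples."""
    n_scaled = max(round(n * depth_multiple), 1) if n > 1 else n  # depth gain
    c_scaled = make_divisible(c2 * width_multiple, 8)             # width gain
    return n_scaled, c_scaled


# A yaml entry with 3 repeats and 512 output channels under each model size
for name, gd, gw in [('yolov5s', 0.33, 0.50), ('yolov5m', 0.67, 0.75),
                     ('yolov5l', 1.0, 1.0), ('yolov5x', 1.33, 1.25)]:
    print(name, scale_layer(3, 512, gd, gw))
# yolov5s (1, 256)
# yolov5m (2, 384)
# yolov5l (3, 512)
# yolov5x (4, 640)
```

The same rule is consistent with the Focus example earlier: a nominal 64 output channels times 0.50 gives the 32 channels quoted for yolov5s.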






# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]








