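Preface

Over the next few days I will be walking through YOLOv4. The YOLO family has long been among the most popular object detection algorithms, and YOLOv4 is a personal favorite. Today's topic is data augmentation.

Data augmentation

Image augmentation in computer vision is a way of hand-injecting prior knowledge about visual invariance (semantic invariance). It has also become one of the simplest and most direct ways to improve model performance. The augmented samples are strongly correlated with the originals: they come from geometric transforms such as cropping, flipping, rotation, scaling and warping, as well as pixel perturbation, added noise, lighting and contrast adjustment, sample blending or interpolation, patch cutting, and so on. A handful of simple operations like these improves the final performance.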

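Data augmentation steps

1. Flip the image horizontally

Flip the x-coordinates of the bounding boxes along with the image:

    # Image size
    iw, ih = image.size
    image = image.transpose(Image.FLIP_LEFT_RIGHT)
    # Mirror the boxes: the old right edge becomes the new left edge and vice versa
    box[:, [0, 2]] = iw - box[:, [2, 0]]
    image.show()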
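To make the index swap in the flip concrete, here is a minimal sketch that mirrors boxes without NumPy or Pillow. The function name flip_boxes_horizontally and the [x1, y1, x2, y2, class_id] box layout are assumptions made for illustration, not names from the original code.

```python
def flip_boxes_horizontally(boxes, image_width):
    """Mirror [x1, y1, x2, y2, class_id] boxes around the vertical centre line."""
    flipped = []
    for x1, y1, x2, y2, class_id in boxes:
        # The old right edge becomes the new left edge (and vice versa),
        # so x1 <= x2 still holds after the flip.
        flipped.append([image_width - x2, y1, image_width - x1, y2, class_id])
    return flipped


boxes = [[10, 20, 50, 80, 0], [60, 30, 90, 70, 1]]
print(flip_boxes_horizontally(boxes, image_width=100))
# prints [[50, 20, 90, 80, 0], [10, 30, 40, 70, 1]]
```

Flipping twice returns the original boxes, which is a quick sanity check when wiring this into a data pipeline.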

2. Scale the image

Code:

    # Scale the input image; (w, h) is the size of the target canvas
    new_ar = w / h
    scale = rand(scale_low, scale_high)
    if new_ar < 1:
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        nw = int(scale * w)
        nh = int(nw / new_ar)
    image = image.resize((nw, nh), Image.BICUBIC)
    image.show()
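The size computation can be tried on its own. The sketch below is a standalone version under a few assumptions: the helper name scaled_size is made up, the defaults 0.6 and 0.8 are the scale_low and scale_high values that follow from min_offset = 0.4 in the full listing, and random.uniform stands in for the article's rand helper.

```python
import random


def scaled_size(w, h, scale_low=0.6, scale_high=0.8, rng=random):
    """Return (nw, nh): a random fraction of the canvas with the canvas aspect ratio."""
    new_ar = w / h
    scale = rng.uniform(scale_low, scale_high)
    if new_ar < 1:
        # Tall canvas: pick the height first, derive the width from the ratio.
        nh = int(scale * h)
        nw = int(nh * new_ar)
    else:
        # Wide or square canvas: pick the width first.
        nw = int(scale * w)
        nh = int(nw / new_ar)
    return nw, nh


print(scaled_size(416, 416, rng=random.Random(0)))
```

Note that the new size depends only on the canvas, not on the source image, so every input is stretched to the canvas aspect ratio.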

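3. Transform the image in HSV color space

HSV is a color model built around how people perceive color. It describes a color by which hue it is, how saturated it is, and how bright it is.

H is the hue. S is the saturation; at S = 0 only gray levels remain. V is the value, that is, how bright the color is.
Code:

    # Color-space jitter. The sampled factors get their own names so that the
    # hue, sat and val ranges are not overwritten from one image to the next.
    hue_shift = rand(-hue, hue)
    sat_scale = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
    val_scale = rand(1, val) if rand() < .5 else 1 / rand(1, val)
    x = rgb_to_hsv(np.array(image) / 255.)
    # Hue is cyclic, so wrap it back into [0, 1]
    x[..., 0] += hue_shift
    x[..., 0][x[..., 0] > 1] -= 1
    x[..., 0][x[..., 0] < 0] += 1
    # Saturation and value are scaled, then clipped to [0, 1]
    x[..., 1] *= sat_scale
    x[..., 2] *= val_scale
    x[x > 1] = 1
    x[x < 0] = 0
    image = hsv_to_rgb(x)
    image = Image.fromarray((image * 255).astype(np.uint8))
    image.show()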
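To see what the three factors do, it can help to apply them to a single pixel. The following is a sketch using the standard-library colorsys module instead of matplotlib; jitter_pixel is a hypothetical helper, and it wraps the hue with a modulo rather than the add-or-subtract-one step used above (the two agree for shifts between -1 and 1).

```python
import colorsys


def jitter_pixel(rgb, hue_shift, sat_scale, val_scale):
    """Shift hue and scale saturation/value of one RGB pixel given in 0..255."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + hue_shift) % 1.0              # hue is cyclic: wrap, do not clip
    s = min(max(s * sat_scale, 0.0), 1.0)  # saturation and value are clipped
    v = min(max(v * val_scale, 0.0), 1.0)
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return tuple(int(round(c * 255)) for c in (r, g, b))


print(jitter_pixel((255, 0, 0), hue_shift=0.5, sat_scale=1, val_scale=1))  # red to cyan
print(jitter_pixel((200, 50, 50), hue_shift=0, sat_scale=0, val_scale=1))  # S = 0 gives gray
```

The second call illustrates the remark about S = 0: with zero saturation all three channels collapse to the same gray level.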

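4. Mosaic data augmentation

YOLOv4's mosaic augmentation builds on CutMix, and the two are similar in principle. CutMix stitches two images together, as in the fourth picture of the figure below.

Mosaic, however, uses four images. According to the paper, one major advantage is that it enriches the backgrounds of the detected objects. In addition, batch normalization computes its statistics over four different images at once.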
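As a rough picture of how four images end up in one canvas, the sketch below maps every pixel of a small canvas to the index of the image that supplies it. The quadrant order (0 top-left, 1 bottom-left, 2 bottom-right, 3 top-right) follows the stitching code in this article; the function names source_index and layout are made up for the illustration.

```python
def source_index(x, y, cutx, cuty):
    """Index of the source image that fills pixel (x, y) of the mosaic."""
    if x < cutx:
        return 0 if y < cuty else 1   # left half: top-left, bottom-left
    return 3 if y < cuty else 2       # right half: top-right, bottom-right


def layout(w, h, cutx, cuty):
    """Return an h-by-w grid of source indices for a w x h canvas."""
    return [[source_index(x, y, cutx, cuty) for x in range(w)] for y in range(h)]


for row in layout(6, 4, cutx=2, cuty=1):
    print(row)
```

Because cutx and cuty are drawn at random for every mosaic, the four quadrants generally have different sizes.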

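The box coordinates in the annotations have to be adjusted to the composite image, and boxes that extend past a boundary have to be clipped. The result is shown below.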

    # --- inside the loop over the four input images ---
    # Paste the image at the position of its quadrant
    dx = place_x[index]
    dy = place_y[index]
    new_image = Image.new('RGB', (w, h), (128, 128, 128))
    new_image.paste(image, (dx, dy))
    image_data = np.array(new_image) / 255
    index = index + 1

    box_data = []
    # Map the boxes onto the canvas
    if len(box) > 0:
        np.random.shuffle(box)
        # Rescale to the resized image, then shift by the paste offset
        box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
        box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy
        # Clip to the canvas
        box[:, 0:2][box[:, 0:2] < 0] = 0
        box[:, 2][box[:, 2] > w] = w
        box[:, 3][box[:, 3] > h] = h
        box_w = box[:, 2] - box[:, 0]
        box_h = box[:, 3] - box[:, 1]
        # np.logical_and is element-wise: keep only boxes wider and taller than 1 px
        box = box[np.logical_and(box_w > 1, box_h > 1)]
        box_data = np.zeros((len(box), 5))
        box_data[:len(box)] = box

    image_datas.append(image_data)
    box_datas.append(box_data)

    # Draw the boxes on this image for inspection
    img = Image.fromarray((image_data * 255).astype(np.uint8))
    for j in range(len(box_data)):
        thickness = 3
        left, top, right, bottom = box_data[j][0:4]
        draw = ImageDraw.Draw(img)
        for i in range(thickness):
            draw.rectangle([left + i, top + i, right - i, bottom - i], outline=(255, 255, 255))
    # img.show()

    # --- after the loop ---
    # Cut the four canvases at a random point and stitch the quadrants together
    cutx = np.random.randint(int(w * min_offset_x), int(w * (1 - min_offset_x)))
    cuty = np.random.randint(int(h * min_offset_y), int(h * (1 - min_offset_y)))

    new_image = np.zeros([h, w, 3])
    new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
    new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
    new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
    new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]
    img = Image.fromarray((new_image * 255).astype(np.uint8))
    img.show()

    # Clip the boxes to their quadrants
    new_boxes = merge_bboxes(box_datas, cutx, cuty)

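The box handling in the placement step (rescale, shift, clip, drop degenerate boxes) can be sketched in plain Python as follows. place_boxes is a hypothetical helper; unlike the article's NumPy version it keeps float coordinates and skips the shuffle.

```python
def place_boxes(boxes, iw, ih, nw, nh, dx, dy, w, h):
    """Map [x1, y1, x2, y2, class_id] boxes from the source image onto the canvas.

    (iw, ih) is the source size, (nw, nh) the resized size, (dx, dy) the paste
    offset and (w, h) the canvas size.
    """
    placed = []
    for x1, y1, x2, y2, class_id in boxes:
        # Rescale to the resized image, then shift by the paste offset.
        x1, x2 = x1 * nw / iw + dx, x2 * nw / iw + dx
        y1, y2 = y1 * nh / ih + dy, y2 * nh / ih + dy
        # Clip to the canvas.
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w), min(y2, h)
        # Keep only boxes that are still wider and taller than one pixel.
        if x2 - x1 > 1 and y2 - y1 > 1:
            placed.append([x1, y1, x2, y2, class_id])
    return placed


print(place_boxes([[100, 100, 300, 300, 2]], 500, 500, 250, 250, 166, 0, 416, 416))
```

A box that lands entirely outside the canvas ends up with a negative width after clipping and is filtered out by the same size test.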
    def merge_bboxes(bboxes, cutx, cuty):
        merge_bbox = []
        for i in range(len(bboxes)):
            for box in bboxes[i]:
                tmp_box = []
                x1, y1, x2, y2 = box[0], box[1], box[2], box[3]

                # Image 0: top-left quadrant
                if i == 0:
                    if y1 > cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 1: bottom-left quadrant
                if i == 1:
                    if y2 < cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 2: bottom-right quadrant
                if i == 2:
                    if y2 < cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 3: top-right quadrant
                if i == 3:
                    if y1 > cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                tmp_box.append(x1)
                tmp_box.append(y1)
                tmp_box.append(x2)
                tmp_box.append(y2)
                tmp_box.append(box[-1])
                merge_bbox.append(tmp_box)
        return merge_bbox
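The four branches of merge_bboxes all follow one pattern: intersect the box with the rectangle of its quadrant and drop what is too small. The sketch below expresses that pattern once. It is a simplified variant rather than a drop-in replacement: clip_to_quadrant is a made-up name, and it applies the minimum-size test to every box, whereas merge_bboxes only applies it to boxes that cross a cut line.

```python
def clip_to_quadrant(box, quadrant, min_size=5):
    """Intersect an [x1, y1, x2, y2, class_id] box with quadrant (qx1, qy1, qx2, qy2).

    Returns the clipped box, or None if less than min_size pixels remain
    along either axis (this also covers boxes outside the quadrant).
    """
    x1, y1, x2, y2, class_id = box
    qx1, qy1, qx2, qy2 = quadrant
    cx1, cy1 = max(x1, qx1), max(y1, qy1)
    cx2, cy2 = min(x2, qx2), min(y2, qy2)
    if cx2 - cx1 < min_size or cy2 - cy1 < min_size:
        return None
    return [cx1, cy1, cx2, cy2, class_id]


w, h, cutx, cuty = 416, 416, 200, 150
top_left = (0, 0, cutx, cuty)
print(clip_to_quadrant([150, 100, 260, 190, 7], top_left))
print(clip_to_quadrant([198, 10, 300, 100, 1], top_left))
```

For image i, the quadrant rectangles would be (0, 0, cutx, cuty), (0, cuty, cutx, h), (cutx, cuty, w, h) and (cutx, 0, w, cuty) for i = 0, 1, 2, 3.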

5. Complete code

    from PIL import Image, ImageDraw
    import numpy as np
    from matplotlib.colors import rgb_to_hsv, hsv_to_rgb


    def rand(a=0, b=1):
        return np.random.rand() * (b - a) + a


    def merge_bboxes(bboxes, cutx, cuty):
        merge_bbox = []
        for i in range(len(bboxes)):
            for box in bboxes[i]:
                tmp_box = []
                x1, y1, x2, y2 = box[0], box[1], box[2], box[3]

                # Image 0: top-left quadrant
                if i == 0:
                    if y1 > cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 1: bottom-left quadrant
                if i == 1:
                    if y2 < cuty or x1 > cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x2 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 2: bottom-right quadrant
                if i == 2:
                    if y2 < cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y1 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                # Image 3: top-right quadrant
                if i == 3:
                    if y1 > cuty or x2 < cutx:
                        continue
                    if y2 >= cuty and y1 <= cuty:
                        y2 = cuty
                        if y2 - y1 < 5:
                            continue
                    if x2 >= cutx and x1 <= cutx:
                        x1 = cutx
                        if x2 - x1 < 5:
                            continue

                tmp_box.append(x1)
                tmp_box.append(y1)
                tmp_box.append(x2)
                tmp_box.append(y2)
                tmp_box.append(box[-1])
                merge_bbox.append(tmp_box)
        return merge_bbox


    def get_random_data(annotation_line, input_shape, random=True, hue=.1, sat=1.5, val=1.5, proc_img=True):
        '''random preprocessing for real-time data augmentation'''
        h, w = input_shape
        min_offset_x = 0.4
        min_offset_y = 0.4
        scale_low = 1 - min(min_offset_x, min_offset_y)
        scale_high = scale_low + 0.2

        image_datas = []
        box_datas = []
        index = 0

        # Paste offsets for the four images. The third y offset used w in the
        # original listing; h is the consistent choice (identical for square inputs).
        place_x = [0, 0, int(w * min_offset_x), int(w * min_offset_x)]
        place_y = [0, int(h * min_offset_y), int(h * min_offset_y), 0]
        for line in annotation_line:
            # Split the annotation line
            line_content = line.split()
            # Open the image
            image = Image.open(line_content[0])
            image = image.convert("RGB")
            image.show()
            # Image size
            iw, ih = image.size
            # Box positions
            box = np.array([np.array(list(map(int, box.split(',')))) for box in line_content[1:]])

            # Randomly flip the image
            flip = rand() < .5
            if flip and len(box) > 0:
                image = image.transpose(Image.FLIP_LEFT_RIGHT)
                box[:, [0, 2]] = iw - box[:, [2, 0]]

            # Scale the input image
            new_ar = w / h
            scale = rand(scale_low, scale_high)
            if new_ar < 1:
                nh = int(scale * h)
                nw = int(nh * new_ar)
            else:
                nw = int(scale * w)
                nh = int(nw / new_ar)
            image = image.resize((nw, nh), Image.BICUBIC)

            # Color-space jitter. The sampled factors get their own names so that
            # the hue, sat and val ranges are not overwritten between images.
            hue_shift = rand(-hue, hue)
            sat_scale = rand(1, sat) if rand() < .5 else 1 / rand(1, sat)
            val_scale = rand(1, val) if rand() < .5 else 1 / rand(1, val)
            x = rgb_to_hsv(np.array(image) / 255.)
            x[..., 0] += hue_shift
            x[..., 0][x[..., 0] > 1] -= 1
            x[..., 0][x[..., 0] < 0] += 1
            x[..., 1] *= sat_scale
            x[..., 2] *= val_scale
            x[x > 1] = 1
            x[x < 0] = 0
            image = hsv_to_rgb(x)

            image = Image.fromarray((image * 255).astype(np.uint8))
            image.show()
            # Paste the image at the position of its quadrant
            dx = place_x[index]
            dy = place_y[index]
            new_image = Image.new('RGB', (w, h), (128, 128, 128))
            new_image.paste(image, (dx, dy))
            image_data = np.array(new_image) / 255
            # Image.fromarray((image_data*255).astype(np.uint8)).save(str(index)+"distort.jpg")

            index = index + 1
            box_data = []
            # Map the boxes onto the canvas
            if len(box) > 0:
                np.random.shuffle(box)
                box[:, [0, 2]] = box[:, [0, 2]] * nw / iw + dx
                box[:, [1, 3]] = box[:, [1, 3]] * nh / ih + dy
                box[:, 0:2][box[:, 0:2] < 0] = 0
                box[:, 2][box[:, 2] > w] = w
                box[:, 3][box[:, 3] > h] = h
                box_w = box[:, 2] - box[:, 0]
                box_h = box[:, 3] - box[:, 1]
                # np.logical_and is element-wise: keep only boxes wider and taller than 1 px
                box = box[np.logical_and(box_w > 1, box_h > 1)]
                box_data = np.zeros((len(box), 5))
                box_data[:len(box)] = box

            image_datas.append(image_data)
            box_datas.append(box_data)

            # Draw the boxes on this image for inspection
            img = Image.fromarray((image_data * 255).astype(np.uint8))
            for j in range(len(box_data)):
                thickness = 3
                left, top, right, bottom = box_data[j][0:4]
                draw = ImageDraw.Draw(img)
                for i in range(thickness):
                    draw.rectangle([left + i, top + i, right - i, bottom - i], outline=(255, 255, 255))
            # img.show()

        # Cut the four canvases at a random point and stitch the quadrants together
        cutx = np.random.randint(int(w * min_offset_x), int(w * (1 - min_offset_x)))
        cuty = np.random.randint(int(h * min_offset_y), int(h * (1 - min_offset_y)))

        new_image = np.zeros([h, w, 3])
        new_image[:cuty, :cutx, :] = image_datas[0][:cuty, :cutx, :]
        new_image[cuty:, :cutx, :] = image_datas[1][cuty:, :cutx, :]
        new_image[cuty:, cutx:, :] = image_datas[2][cuty:, cutx:, :]
        new_image[:cuty, cutx:, :] = image_datas[3][:cuty, cutx:, :]
        img = Image.fromarray((new_image * 255).astype(np.uint8))
        img.show()

        # Clip the boxes to their quadrants
        new_boxes = merge_bboxes(box_datas, cutx, cuty)

        return new_image, new_boxes


    def normal_(annotation_line, input_shape):
        '''horizontal flip only, for comparison with the mosaic output'''
        line = annotation_line.split()
        image = Image.open(line[0])
        box = np.array([np.array(list(map(int, box.split(',')))) for box in line[1:]])

        iw, ih = image.size
        image = image.transpose(Image.FLIP_LEFT_RIGHT)
        box[:, [0, 2]] = iw - box[:, [2, 0]]

        return image, box


    if __name__ == "__main__":
        with open("2007_train.txt") as f:
            lines = f.readlines()
        # Pick four consecutive annotation lines. The upper bound keeps the slice
        # from running past the end of the file (needs at least four lines).
        a = np.random.randint(0, len(lines) - 3)
        line = lines[a:a + 4]
        image_data, box_data = get_random_data(line, [416, 416])
        img = Image.fromarray((image_data * 255).astype(np.uint8))
        for j in range(len(box_data)):
            thickness = 3
            left, top, right, bottom = box_data[j][0:4]
            draw = ImageDraw.Draw(img)
            for i in range(thickness):
                draw.rectangle([left + i, top + i, right - i, bottom - i], outline=(255, 255, 255))
        img.show()
        # img.save("box_all.jpg")
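The listing reads annotations with line.split() and box.split(','), which implies one line per image: a path followed by space-separated x1,y1,x2,y2,class_id groups. A minimal parser under that assumption is sketched below; parse_annotation_line is a hypothetical name and the sample line is invented for illustration.

```python
def parse_annotation_line(line):
    """Split 'path x1,y1,x2,y2,cls x1,y1,x2,y2,cls ...' into a path and int boxes."""
    parts = line.split()
    image_path = parts[0]
    boxes = [[int(value) for value in item.split(",")] for item in parts[1:]]
    return image_path, boxes


# Invented sample line; real lines come from a file such as 2007_train.txt.
sample = "images/000005.jpg 263,211,324,339,8 165,264,253,372,8\n"
path, boxes = parse_annotation_line(sample)
print(path)
print(boxes)
```

Because the line is split on whitespace, image paths that contain spaces would not survive this format.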

