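1. Overview

Training a keypoint detection model such as Keypoint RCNN requires a dataset of images that contain the objects of interest, together with annotations (text files holding the keypoint and bounding-box coordinates of each object).

For example, the image below shows visualized keypoints and bounding boxes. Each object (a glue tube) has two keypoints: head and tail.

The more images the dataset contains, the better the model tends to train, because it sees more examples during training. A dataset of 200+ images is acceptable, 1000+ images is much better, and an excellent dataset contains 5000+ images.

Note that a dataset should not merely contain many images; the images should also be as varied as possible. The objects of interest should be mixed with other objects and appear in different environments, on different backgrounds, in different positions, and so on.

One way to create a dataset is to build it manually. This means taking many photos like the one above and annotating them by hand. This approach gives the best data, because all photos are real, but creating such a dataset takes a lot of time.

Another way is to create a synthetic dataset automatically. With this approach, cropped objects of interest are randomly scaled, rotated, and pasted onto backgrounds by a Python script. The annotations are created by the same script. The resulting images are not real photos, but the objects in them are cropped from real photos, so they look realistic.

An example image from the synthetic dataset is shown below:

Compared with the manual process, the automated process takes far less time. For example, generating 1000 synthetic images with annotations can take less than an hour. This is much faster than taking 1000 different photos and annotating them by hand.

Below, I describe all the steps for creating a synthetic dataset for keypoint detection.

I will show how to create a synthetic dataset of glue tubes for training Keypoint RCNN. For this we need the following data:

Cropped photos of the object of interest (a glue tube) in different positions, with a mask and keypoint coordinates for each photo (the first keypoint is the head, the second is the tail);
Background images (just different photos from the internet);
Cropped photos and masks of other objects (cars, chairs, guitars, etc.), which will be used as background noise to make the backgrounds more complex.

I took 14 photos of glue tubes and made masks for them:

I also created 14 json files, one per photo, with the coordinates of the keypoints (head and tail). Besides the coordinates, each json file also stores the visibility of the keypoints. That is, each glue tube has 2 keypoints, head and tail, described in the format [x, y, visibility]. All keypoints in this dataset are visible (visibility = 1).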
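To make the [x, y, visibility] layout concrete, here is a minimal sketch of reading one such file. It assumes the top-level key is named keypoints (the key the code later in this article reads); the inline JSON string stands in for a file on disk, and the parsed dictionary is purely illustrative.

```python
import json

# Stand-in for the content of one keypoints file (assumption: top-level key "keypoints").
# The values are the ones printed later in this article for one of the tubes.
raw = '{"keypoints": [[979, 103, 1], [132, 594, 1]]}'

keypoints = json.loads(raw)["keypoints"]
names = ["Head", "Tail"]  # order: first keypoint is the head, second is the tail

parsed = {}
for name, (x, y, visibility) in zip(names, keypoints):
    parsed[name] = {"x": x, "y": y, "visible": visibility == 1}

print(parsed)
```

Keeping visibility as the third element matters later: the geometric transforms only touch x and y, and the visibility flag is re-attached afterwards.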

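I collected 60 images to use as backgrounds. Here are some of them:

I also collected 107 images of different objects to use as background noise. These can be practically any objects that are not glue tubes:

Download the data described above here.

Here is how a synthetic scene is created from the downloaded data:

First, we randomly pick a background image from the folder bg/ and resize it to, for example, 1920x1080.
Second, we randomly pick a background noise object from the folder bg_noise/. We then randomly resize it, rotate it, and add it to the background image.
We repeat the second step several times.
Third, we randomly pick an object of interest from the folder images/. We then randomly resize it, rotate it, and add it to the background image on top of the background noise objects.
We repeat the third step several times.

The resulting random combination of objects is a synthetic scene.
A synthetic dataset consists of many synthetic scenes.

2. Code implementation

Let's write a script that creates a synthetic dataset.

2.1 Importing the libraries

Create a new notebook in Jupyter Notebook. First, import the necessary modules: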

import os
import cv2
import json
import numpy as np
import matplotlib.pyplot as plt
import albumentations as A
import time
from tqdm import tqdm

2.2 File paths

Unzip the downloaded data into the folder data/ and create lists with the paths to the images, masks, and keypoints:

PATH_MAIN = "data"

files_imgs = sorted(os.listdir(os.path.join(PATH_MAIN, 'images')))
files_imgs = [os.path.join(PATH_MAIN, 'images', f) for f in files_imgs]

files_masks = sorted(os.listdir(os.path.join(PATH_MAIN, 'masks')))
files_masks = [os.path.join(PATH_MAIN, 'masks', f) for f in files_masks]

files_keypoints = sorted(os.listdir(os.path.join(PATH_MAIN, 'keypoints')))
files_keypoints = [os.path.join(PATH_MAIN, 'keypoints', f) for f in files_keypoints]

print("The first five files from the sorted list of object images:", files_imgs[:5])
print("\nThe first five files from the sorted list of object masks:", files_masks[:5])
print("\nThe first five files from the sorted list of object keypoints:", files_keypoints[:5])

# os.listdir() gives no ordering guarantee, so sort these lists as well.
# This keeps every noise image and its mask at the same index.
files_bg_imgs = sorted(os.listdir(os.path.join(PATH_MAIN, 'bg')))
files_bg_imgs = [os.path.join(PATH_MAIN, 'bg', f) for f in files_bg_imgs]

files_bg_noise_imgs = sorted(os.listdir(os.path.join(PATH_MAIN, "bg_noise", "images")))
files_bg_noise_imgs = [os.path.join(PATH_MAIN, "bg_noise", "images", f) for f in files_bg_noise_imgs]

files_bg_noise_masks = sorted(os.listdir(os.path.join(PATH_MAIN, "bg_noise", "masks")))
files_bg_noise_masks = [os.path.join(PATH_MAIN, "bg_noise", "masks", f) for f in files_bg_noise_masks]

print("\nThe first five files from the sorted list of background images:", files_bg_imgs[:5])
print("\nThe first five files from the sorted list of background noise images:", files_bg_noise_imgs[:5])
print("\nThe first five files from the sorted list of background noise masks:", files_bg_noise_masks[:5])

Look at the output to better understand the structure of the created lists:

The first five files from the sorted list of object images: ['data\images\1.jpg', 'data\images\10.jpg', 'data\images\11.jpg', 'data\images\12.jpg', 'data\images\13.jpg']

The first five files from the sorted list of object masks: ['data\masks\1.png', 'data\masks\10.png', 'data\masks\11.png', 'data\masks\12.png', 'data\masks\13.png']

The first five files from the sorted list of object keypoints: ['data\keypoints\1.json', 'data\keypoints\10.json', 'data\keypoints\11.json', 'data\keypoints\12.json', 'data\keypoints\13.json']

The first five files from the sorted list of background images: ['data\bg\bg_1.jpg', 'data\bg\bg_10.jpg', 'data\bg\bg_11.jpg', 'data\bg\bg_12.jpg', 'data\bg\bg_13.jpg']

The first five files from the sorted list of background noise images: ['data\bg_noise\images\1.png', 'data\bg_noise\images\10.jpg', 'data\bg_noise\images\100.jpg', 'data\bg_noise\images\101.png', 'data\bg_noise\images\102.png']

The first five files from the sorted list of background noise masks: ['data\bg_noise\masks\1.png', 'data\bg_noise\masks\10.png', 'data\bg_noise\masks\100.png', 'data\bg_noise\masks\101.png', 'data\bg_noise\masks\102.png']

Later, our script will contain code that randomly picks an object image from these lists, resizes it, augments it, and adds it to the background.

2.3 Images, masks, and keypoints

There are several types of masks:

An original mask is a mask in which the object area is filled with black (0,0,0) and the background area with white (255,255,255).
A boolean mask is a mask in which the object area is filled with True and the background area with False.
A binary mask is a mask in which the object area is filled with 1 and the background area with 0.
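The relationship between the three mask types can be shown on a tiny array. This is only an illustrative sketch with a made-up 2x3 mask, not data from the article; it assumes a single channel, whereas the mask files are read as three identical channels.

```python
import numpy as np

# A hypothetical 2x3 "original" mask: black (0) marks the object, white (255) the background.
# Only one channel is shown; in an RGB mask all three channels carry the same values.
original = np.array([[255,   0,   0],
                     [255, 255,   0]], dtype=np.uint8)

boolean_mask = original == 0                 # True where the object is
binary_mask = boolean_mask.astype(np.uint8)  # 1 where the object is, 0 elsewhere

print(boolean_mask)
print(binary_mask)
```

The binary form is convenient because summing it gives the object area in pixels, and multiplying it by an integer id labels the object in a composition mask.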

For the purposes of this script, we will convert the original masks to binary masks.
Here we define the function get_img_and_mask(), which returns the object's image as an OpenCV array converted to RGB and the object's mask as a binary mask:

def get_img_and_mask(img_path, mask_path):
    img = cv2.imread(img_path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    mask = cv2.imread(mask_path)
    mask = cv2.cvtColor(mask, cv2.COLOR_BGR2RGB)

    mask_b = mask[:,:,0] == 0 # This is boolean mask
    mask = mask_b.astype(np.uint8) # This is binary mask

    return img, mask

We also define the function visualize_single_img_with_keypoints(), which displays the image of the object of interest and draws the object's keypoints on it:

def visualize_single_img_with_keypoints(img, mask, keypoints, keypoints_names, title, draw_bboxes=False):
    # The bounding box is the tightest rectangle around the nonzero mask pixels
    xmin = np.min(np.where(mask)[1])
    xmax = np.max(np.where(mask)[1])
    ymin = np.min(np.where(mask)[0])
    ymax = np.max(np.where(mask)[0])
    bbox = np.array([xmin, ymin, xmax, ymax])

    # Plain Python ints, so OpenCV accepts the points regardless of NumPy version
    start_point = (int(bbox[0]), int(bbox[1]))
    end_point = (int(bbox[2]), int(bbox[3]))

    if draw_bboxes:
        img = cv2.rectangle(img.copy(), start_point, end_point, (255,0,0), 2)

    for idx, kp in enumerate(keypoints):
        img = cv2.circle(img.copy(), tuple(kp[:2]), 3, (255,0,0), 6)
        img = cv2.putText(img.copy(), " " + keypoints_names[idx], tuple(kp[:2]), cv2.FONT_HERSHEY_SIMPLEX, 1, (255,0,0), 2, cv2.LINE_AA)

    plt.figure(figsize=(16,16))
    plt.title(title, fontsize=18)
    plt.imshow(img)

keypoints_names = ['Head', 'Tail']

2.3.1 Object of interest (with keypoints)

Let's see how the function get_img_and_mask() works:

# Let's look at a random object and its binary mask
img_path = files_imgs[9]
mask_path = files_masks[9]

img, mask = get_img_and_mask(img_path, mask_path)

print("Image file:", img_path)
print("Mask file:", mask_path)
print("\nShape of the image of the object:", img.shape)
print("Shape of the binary mask:", mask.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img)
ax[0].set_title('Object', fontsize=18)
ax[1].imshow(mask)
ax[1].set_title('Binary mask', fontsize=18);

# Image file: data\images\5.jpg
# Mask file: data\masks\5.png
#
# Shape of the image of the object: (735, 1111, 3)
# Shape of the binary mask: (735, 1111)

Note that the image is 1111 pixels wide and 735 pixels high, and it has 3 channels. That is why the shape of the image is (735, 1111, 3). The binary mask has the same width and height but only one channel, which is why its shape is (735, 1111).

Let's visualize the keypoints on this image:

with open(files_keypoints[9]) as f:
    data = json.load(f)
    keypoints = data['keypoints']

print("Keypoints:", keypoints)
visualize_single_img_with_keypoints(img, mask, keypoints, keypoints_names, title="Keypoints of the object")

# Keypoints: [[979, 103, 1], [132, 594, 1]]

The first keypoint, the head, has x = 979, y = 103, and visibility = 1. The second keypoint, the tail, has x = 132, y = 594, and visibility = 1.

2.3.2 Background noise object (without keypoints)

Let's use the function get_img_and_mask() to get the image and mask of a random noise object:

bg_img_path = files_bg_noise_imgs[17]
bg_mask_path = files_bg_noise_masks[17]

bg_img, bg_mask = get_img_and_mask(bg_img_path, bg_mask_path)

print("Image file:", bg_img_path)
print("Mask file:", bg_mask_path)
print("\nShape of the image of the object:", bg_img.shape)
print("Shape of the binary mask:", bg_mask.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(bg_img)
ax[0].set_title('Object', fontsize=18)
ax[1].imshow(bg_mask)
ax[1].set_title('Binary mask', fontsize=18);

# Image file: data\bg_noise\images\18.jpg
# Mask file: data\bg_noise\masks\18.png
#
# Shape of the image of the object: (1280, 1073, 3)
# Shape of the binary mask: (1280, 1073)

2.4 Resizing the background images

The images that will be used as backgrounds have different sizes, for example 2114x1398, 3456x5184, 1920x1440, 3264x4080, and so on. Some of them are horizontal (width > height), others are vertical (height > width).

We may want all images in the synthetic dataset to have fixed dimensions: 1920x1080 for horizontal images and 1080x1920 for vertical ones. To achieve this, we resize the background images with the function resize_img():

def resize_img(img, desired_max, desired_min=None):
    h, w = img.shape[0], img.shape[1]

    longest, shortest = max(h, w), min(h, w)
    longest_new = desired_max
    if desired_min:
        shortest_new = desired_min
    else:
        shortest_new = int(shortest * (longest_new / longest))

    if h > w:
        h_new, w_new = longest_new, shortest_new
    else:
        h_new, w_new = shortest_new, longest_new

    transform_resize = A.Compose([
        A.Sequential([
            A.Resize(h_new, w_new, interpolation=1, always_apply=False, p=1)
        ], p=1)
    ])

    transformed = transform_resize(image=img)
    img_r = transformed["image"]

    return img_r

Let's see how this function works:

# Let's look how a random background image can be resized with resize_img() function
img_bg_path = files_bg_imgs[5]
img_bg = cv2.imread(img_bg_path)
img_bg = cv2.cvtColor(img_bg, cv2.COLOR_BGR2RGB)

img_bg_resized_1 = resize_img(img_bg, desired_max=1920, desired_min=None)
img_bg_resized_2 = resize_img(img_bg, desired_max=1920, desired_min=1080)

print("Shape of the original background image:", img_bg.shape)

print("Shape of the resized background image (desired_max=1920, desired_min=None):", img_bg_resized_1.shape)
print("Shape of the resized background image (desired_max=1920, desired_min=1080):", img_bg_resized_2.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_bg_resized_1)
ax[0].set_title('Resized (desired_max=1920, desired_min=None)', fontsize=18)
ax[1].imshow(img_bg_resized_2)
ax[1].set_title('Resized (desired_max=1920, desired_min=1080)', fontsize=18);

# Shape of the original background image: (3068, 2454, 3)
# Shape of the resized background image (desired_max=1920, desired_min=None): (1920, 1535, 3)
# Shape of the resized background image (desired_max=1920, desired_min=1080): (1920, 1080, 3)

You can see that the function finds out which side of the image (width or height) is the longest and resizes the image to desired_max along that side. If desired_min is not set, the shortest side is resized proportionally, so the aspect ratio is preserved; otherwise the shortest side is resized to exactly desired_min, which may change the aspect ratio.
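The sizing rule can be checked without loading any image. The helper below is a hypothetical, arithmetic-only sketch (the name target_size is not part of the article's code); it reproduces just the size computation and leaves the actual resampling to albumentations.

```python
def target_size(h, w, desired_max, desired_min=None):
    """Return (h_new, w_new) following the sizing rule described above."""
    longest, shortest = max(h, w), min(h, w)
    longest_new = desired_max
    if desired_min:
        shortest_new = desired_min  # forced size, aspect ratio may change
    else:
        shortest_new = int(shortest * (longest_new / longest))  # keep aspect ratio
    # Portrait images keep the long side vertical, landscape ones horizontal
    return (longest_new, shortest_new) if h > w else (shortest_new, longest_new)

# The 3068x2454 (height x width) background from the example above
print(target_size(3068, 2454, 1920))        # (1920, 1535)
print(target_size(3068, 2454, 1920, 1080))  # (1920, 1080)
```

Both results match the shapes printed for the resized background image.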

2.5 Resizing and transforming the object of interest (with keypoints)

The function resize_transform_obj(), which resizes and transforms an object, is similar to the function that resizes background images, with some additions.

resize_transform_obj() resizes both the object's image and its binary mask. In addition, transforms from the albumentations library can be passed to the function as an argument. The keypoint coordinates are updated along with the image during resizing and transformation.

def resize_transform_obj(img, mask, longest_min, longest_max, keypoints, transforms=False):
    h, w = mask.shape[0], mask.shape[1]

    longest, shortest = max(h, w), min(h, w)
    longest_new = np.random.randint(longest_min, longest_max)
    shortest_new = int(shortest * (longest_new / longest))

    if h > w:
        h_new, w_new = longest_new, shortest_new
    else:
        h_new, w_new = shortest_new, longest_new

    # albumentations works with (x, y) pairs here, so visibility is set aside
    keypoints_2 = [kp[0:2] for kp in keypoints]

    transform_resize = A.Compose(
        [A.Resize(h_new, w_new, interpolation=1, always_apply=False, p=1)],
        keypoint_params=A.KeypointParams(format='xy')
    )

    transformed_resized = transform_resize(image=img, mask=mask, keypoints=keypoints_2)
    img_t = transformed_resized["image"]
    mask_t = transformed_resized["mask"]
    keypoints_2_t = transformed_resized["keypoints"]

    if transforms:
        transformed = transforms(image=img_t, mask=mask_t, keypoints=keypoints_2_t)
        img_t = transformed["image"]
        mask_t = transformed["mask"]
        keypoints_2_t = transformed["keypoints"]

    # Re-attach the original visibility flag to every transformed keypoint
    keypoints_t = []
    for idx, kp in enumerate(keypoints_2_t):
        keypoints_t.append(list(map(int, kp)) + [keypoints[idx][2]])

    return img_t, mask_t, keypoints_t

transforms_obj = A.Compose([
    A.RandomRotate90(p=1),
    A.RandomBrightnessContrast(brightness_limit=(-0.1, 0.2),
                               contrast_limit=0.1,
                               brightness_by_max=True,
                               always_apply=False,
                               p=1)
],
keypoint_params=A.KeypointParams(format='xy'))

The code above defines a composite transform, transforms_obj. It rotates the image by a random multiple of 90 degrees and changes contrast and brightness within a narrow range. It will be used to transform the objects of interest.
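To see what happens to keypoint coordinates during a resize, here is a hand-written sketch of the scaling step alone. The helper scale_keypoints is hypothetical and only approximates what a library does internally: albumentations may round or treat pixel centers slightly differently, and rotation is not covered here.

```python
def scale_keypoints(keypoints, old_size, new_size):
    """Scale [x, y, visibility] keypoints from old_size=(h, w) to new_size=(h, w)."""
    (h, w), (h_new, w_new) = old_size, new_size
    sx, sy = w_new / w, h_new / h
    # x scales with the width ratio, y with the height ratio; visibility is unchanged
    return [[int(x * sx), int(y * sy), v] for x, y, v in keypoints]

keypoints = [[979, 103, 1], [132, 594, 1]]
# Doubling both sides of a 735x1111 (height x width) image doubles every coordinate
print(scale_keypoints(keypoints, old_size=(735, 1111), new_size=(1470, 2222)))
# [[1958, 206, 1], [264, 1188, 1]]
```

The point of passing keypoints through the same transform pipeline as the image is exactly this bookkeeping: the annotation stays aligned with the pixels without any manual arithmetic.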

Let's see how the function resize_transform_obj() works:

img_path = files_imgs[9]
mask_path = files_masks[9]
img, mask = get_img_and_mask(img_path, mask_path)

with open(files_keypoints[9]) as f:
    data = json.load(f)
    keypoints = data['keypoints']

img_t, mask_t, keypoints_t = resize_transform_obj(img,
                                                  mask,
                                                  longest_min=900,
                                                  longest_max=1000,
                                                  keypoints=keypoints,
                                                  transforms=transforms_obj)

print("\nShape of the image of the transformed object:", img_t.shape)
print("Shape of the transformed binary mask:", mask_t.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_t)
ax[0].set_title('Transformed object', fontsize=18)
ax[1].imshow(mask_t)
ax[1].set_title('Transformed binary mask', fontsize=18);

# Shape of the image of the transformed object: (983, 650, 3)
# Shape of the transformed binary mask: (983, 650)

You have seen this image and mask before, but now the shape of the image is (983, 650, 3) instead of (735, 1111, 3). In addition, the image is rotated and brighter than before. This is how the transform works.

Let's visualize the keypoints on the transformed image:

visualize_single_img_with_keypoints(img_t, mask_t, keypoints_t, keypoints_names, title="Keypoints of the transformed object")

2.6 Resizing and transforming background noise objects (without keypoints)

Here we define the function resize_transform_bg_obj() for transforming noise objects. The difference between resize_transform_obj() and the new function is that the new one does not transform keypoints, because background noise objects have none.

def resize_transform_bg_obj(img, mask, longest_min, longest_max, transforms=False):
    h, w = mask.shape[0], mask.shape[1]

    longest, shortest = max(h, w), min(h, w)
    longest_new = np.random.randint(longest_min, longest_max)
    shortest_new = int(shortest * (longest_new / longest))

    if h > w:
        h_new, w_new = longest_new, shortest_new
    else:
        h_new, w_new = shortest_new, longest_new

    transform_resize = A.Resize(h_new, w_new, interpolation=1, always_apply=False, p=1)

    transformed_resized = transform_resize(image=img, mask=mask)
    img_t = transformed_resized["image"]
    mask_t = transformed_resized["mask"]

    if transforms:
        transformed = transforms(image=img_t, mask=mask_t)
        img_t = transformed["image"]
        mask_t = transformed["mask"]

    return img_t, mask_t

transforms_bg_obj = A.Compose([
    A.RandomRotate90(p=1),
    A.HorizontalFlip(p=0.5),
    A.VerticalFlip(p=0.5),
    A.ColorJitter(brightness=0.3, contrast=0.3, saturation=0.3, hue=0.07, always_apply=False, p=1),
    A.Blur(blur_limit=(3,15), always_apply=False, p=0.5)
])

The code above defines a composite transform, transforms_bg_obj. It can rotate and flip the image, add blur, and change color, contrast, and brightness. It will be used to transform the background noise objects.

Let's see how the function resize_transform_bg_obj() works:

bg_img_t, bg_mask_t = resize_transform_bg_obj(bg_img,
                                              bg_mask,
                                              longest_min=900,
                                              longest_max=1000,
                                              transforms=transforms_bg_obj)

print("\nShape of the image of the transformed object:", bg_img_t.shape)
print("Shape of the transformed binary mask:", bg_mask_t.shape)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(bg_img_t)
ax[0].set_title('Transformed object', fontsize=18)
ax[1].imshow(bg_mask_t)
ax[1].set_title('Transformed binary mask', fontsize=18);

You have seen this image and mask before, but now the image is rotated and brighter than before.

2.7 Adding objects of interest (with keypoints) to the background

2.7.1 Adding a single object

Here, the function add_obj() adds an object of interest to the background:

def add_obj(img_comp, mask_comp, keypoints_comp, img, mask, keypoints, x, y, idx):
    '''
    img_comp - composition of objects
    mask_comp - composition of objects' masks
    keypoints_comp - composition of keypoints
    img - image of object
    mask - mask of object
    keypoints - keypoints of object
    x, y - coordinates where left top corner of img is placed
    Function returns img_comp in CV2 RGB format + mask_comp + keypoints_comp as a list
    '''
    h_comp, w_comp = img_comp.shape[0], img_comp.shape[1]

    h, w = img.shape[0], img.shape[1]

    mask_b = mask == 1
    mask_rgb_b = np.stack([mask_b, mask_b, mask_b], axis=2)

    # Keep the old pixels outside the object and take the object's pixels inside it
    img_comp[y:y+h, x:x+w, :] = img_comp[y:y+h, x:x+w, :] * ~mask_rgb_b + (img * mask_rgb_b)
    mask_comp[y:y+h, x:x+w] = mask_comp[y:y+h, x:x+w] * ~mask_b + (idx * mask_b)
    # Shift the keypoints from object coordinates to composition coordinates
    keypoints_comp.append([[kp[0] + x, kp[1] + y, kp[2]] for kp in keypoints])

    return img_comp, mask_comp, keypoints_comp

The function add_obj() returns the image composition (background + added objects), the mask composition (the combined masks of the added objects), and the keypoint composition (a list with the keypoints of the added objects). We also define the function visualize_composition_with_keypoints(), which displays the composition of objects of interest and draws their keypoints:

def visualize_composition_with_keypoints(img_comp, keypoints_comp, keypoints_names, bboxes_comp=None):
    if bboxes_comp:
        for bbox in bboxes_comp:
            start_point, end_point = tuple([bbox[0], bbox[1]]), tuple([bbox[2], bbox[3]])
            img_comp = cv2.rectangle(img_comp.copy(), start_point, end_point, (255,0,0), 2)

    for keypoints in keypoints_comp:
        for idx, kp in enumerate(keypoints):
            img_comp = cv2.circle(img_comp.copy(), tuple(kp[:2]), 3, (255,0,0), 6)
            img_comp = cv2.putText(img_comp.copy(), " " + keypoints_names[idx], tuple(kp[:2]), cv2.FONT_HERSHEY_SIMPLEX, 2, (255,0,0), 4, cv2.LINE_AA)

    plt.figure(figsize=(40,40))
    plt.imshow(img_comp)

Let's add a glue tube to a background:

img_bg_path = files_bg_imgs[44]
img_bg = cv2.imread(img_bg_path)
img_bg = cv2.cvtColor(img_bg, cv2.COLOR_BGR2RGB)

h, w = img_bg.shape[0], img_bg.shape[1]
mask_comp = np.zeros((h,w), dtype=np.uint8)
keypoints_comp = []

img_comp, mask_comp, keypoints_comp = add_obj(img_bg,
                                              mask_comp,
                                              keypoints_comp,
                                              img,
                                              mask,
                                              keypoints,
                                              x=100,
                                              y=100,
                                              idx=1)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_comp)
ax[0].set_title('Composition', fontsize=18)
ax[1].imshow(mask_comp)
ax[1].set_title('Composition mask', fontsize=18);

The initial composition here is the background image img_bg.

The array mask_comp = np.zeros((h,w), dtype=np.uint8) is the mask of the initial composition. Since the initial composition is just a background image with no objects on it, its mask contains only zeros.

After the glue tube is added to img_bg, its mask is added to mask_comp by overwriting the initial values with 1 in the pixels that the added tube occupies in the image composition. We assigned the number 1 to the mask of the added tube by passing idx=1 to add_obj(). The right image above shows the composition mask: 0 is shown in dark purple and 1 in yellow.
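The way object ids accumulate in the composition mask is easier to follow on a toy array. The sketch below uses made-up 3x4 and 2x2 arrays and a hypothetical helper paste_mask that applies the same overwrite formula to the mask only, ignoring the image and the keypoints.

```python
import numpy as np

# Hypothetical 3x4 composition mask (all background) and a 2x2 object mask
mask_comp = np.zeros((3, 4), dtype=np.uint8)
obj_mask = np.array([[1, 0],
                     [1, 1]], dtype=np.uint8)

def paste_mask(mask_comp, obj_mask, x, y, idx):
    """Write idx into mask_comp wherever obj_mask is 1, with top-left corner at (x, y)."""
    h, w = obj_mask.shape
    mask_b = obj_mask == 1
    region = mask_comp[y:y+h, x:x+w]
    # Keep old values outside the object, overwrite with idx inside it
    mask_comp[y:y+h, x:x+w] = region * ~mask_b + idx * mask_b
    return mask_comp

paste_mask(mask_comp, obj_mask, x=0, y=0, idx=1)
paste_mask(mask_comp, obj_mask, x=1, y=1, idx=2)
print(mask_comp)
# [[1 0 0 0]
#  [1 2 0 0]
#  [0 2 2 0]]
```

Note that the second object overwrote one pixel of the first. The number of remaining pixels per id is what makes it possible to measure how much of an earlier object is still visible.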

Let's look at the keypoints:

print("Keypoints:", keypoints_comp)
visualize_composition_with_keypoints(img_comp, keypoints_comp, keypoints_names)
# Keypoints: [[[1079, 203, 1], [232, 694, 1]]]

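Let's add the transformed glue tube:

img_comp, mask_comp, keypoints_comp = add_obj(img_comp,
                                              mask_comp,
                                              keypoints_comp,
                                              img_t,
                                              mask_t,
                                              keypoints_t,
                                              x=400,
                                              y=250,
                                              idx=2)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_comp)
ax[0].set_title('Composition', fontsize=18)
ax[1].imshow(mask_comp)
ax[1].set_title('Composition mask', fontsize=18);

This time the initial composition img_comp already contains one glue tube, so the initial composition mask mask_comp contains the numbers 0 and 1.

When another glue tube is added to the composition, its mask is added to mask_comp by overwriting the initial values with 2 in the pixels that the added tube occupies in the image composition. This time we assigned the number 2 to the mask of the added tube by passing idx=2 to add_obj().

The right image above shows the composition mask: 0 is shown in dark purple, 1 in a blue-green color, and 2 in yellow.

Let's look at the keypoints: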

print("Keypoints:", keypoints_comp)
visualize_composition_with_keypoints(img_comp, keypoints_comp, keypoints_names)

2.8 Adding noise objects (without keypoints) to the background

2.8.1 Adding a single object

Here we define the function add_bg_obj(), which adds a noise object to the background. For a detailed explanation of how this function works, I recommend reading the article on adding objects to images with Python.

def add_bg_obj(img_comp, mask_comp, img, mask, x, y, idx):
    '''
    img_comp - composition of objects
    mask_comp - composition of objects' masks
    img - image of object
    mask - binary mask of object
    x, y - coordinates where center of img is placed
    Function returns img_comp in CV2 RGB format + mask_comp
    '''
    h_comp, w_comp = img_comp.shape[0], img_comp.shape[1]

    h, w = img.shape[0], img.shape[1]

    # Convert the center coordinates to the top-left corner of the object
    x = x - int(w/2)
    y = y - int(h/2)

    mask_b = mask == 1
    mask_rgb_b = np.stack([mask_b, mask_b, mask_b], axis=2)

    if x >= 0 and y >= 0:

        h_part = h - max(0, y+h-h_comp) # h_part - part of the image which gets into the frame of img_comp along y-axis
        w_part = w - max(0, x+w-w_comp) # w_part - part of the image which gets into the frame of img_comp along x-axis

        img_comp[y:y+h_part, x:x+w_part, :] = img_comp[y:y+h_part, x:x+w_part, :] * ~mask_rgb_b[0:h_part, 0:w_part, :] + (img * mask_rgb_b)[0:h_part, 0:w_part, :]
        mask_comp[y:y+h_part, x:x+w_part] = mask_comp[y:y+h_part, x:x+w_part] * ~mask_b[0:h_part, 0:w_part] + (idx * mask_b)[0:h_part, 0:w_part]

    elif x < 0 and y < 0:

        h_part = h + y
        w_part = w + x

        img_comp[0:0+h_part, 0:0+w_part, :] = img_comp[0:0+h_part, 0:0+w_part, :] * ~mask_rgb_b[h-h_part:h, w-w_part:w, :] + (img * mask_rgb_b)[h-h_part:h, w-w_part:w, :]
        mask_comp[0:0+h_part, 0:0+w_part] = mask_comp[0:0+h_part, 0:0+w_part] * ~mask_b[h-h_part:h, w-w_part:w] + (idx * mask_b)[h-h_part:h, w-w_part:w]

    elif x < 0 and y >= 0:

        h_part = h - max(0, y+h-h_comp)
        w_part = w + x

        img_comp[y:y+h_part, 0:0+w_part, :] = img_comp[y:y+h_part, 0:0+w_part, :] * ~mask_rgb_b[0:h_part, w-w_part:w, :] + (img * mask_rgb_b)[0:h_part, w-w_part:w, :]
        mask_comp[y:y+h_part, 0:0+w_part] = mask_comp[y:y+h_part, 0:0+w_part] * ~mask_b[0:h_part, w-w_part:w] + (idx * mask_b)[0:h_part, w-w_part:w]

    elif x >= 0 and y < 0:

        h_part = h + y
        w_part = w - max(0, x+w-w_comp)

        img_comp[0:0+h_part, x:x+w_part, :] = img_comp[0:0+h_part, x:x+w_part, :] * ~mask_rgb_b[h-h_part:h, 0:w_part, :] + (img * mask_rgb_b)[h-h_part:h, 0:w_part, :]
        mask_comp[0:0+h_part, x:x+w_part] = mask_comp[0:0+h_part, x:x+w_part] * ~mask_b[h-h_part:h, 0:w_part] + (idx * mask_b)[h-h_part:h, 0:w_part]

    return img_comp, mask_comp

The function add_bg_obj() returns the image composition (background + added objects) and the mask composition (the combined masks of the added objects).
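Unlike objects of interest, noise objects are placed by their center and may stick out of the frame, so the part outside the background has to be cut off. The following sketch shows the clipping idea on the mask only. It is an alternative formulation with a hypothetical helper paste_centered, not the article's add_bg_obj(): it intersects the object rectangle with the canvas rectangle instead of handling the four corner cases separately.

```python
import numpy as np

def paste_centered(canvas, obj_mask, cx, cy, idx):
    """Write idx into canvas where obj_mask is 1, centering the object at (cx, cy).

    Parts of the object that fall outside the canvas are cut off.
    """
    H, W = canvas.shape
    h, w = obj_mask.shape
    x0, y0 = cx - w // 2, cy - h // 2          # top-left corner, may be negative

    # Visible rectangle in canvas coordinates
    xa, ya = max(x0, 0), max(y0, 0)
    xb, yb = min(x0 + w, W), min(y0 + h, H)
    if xa >= xb or ya >= yb:
        return canvas                          # object is completely outside

    # Matching rectangle in object coordinates
    part = obj_mask[ya - y0:yb - y0, xa - x0:xb - x0] == 1
    canvas[ya:yb, xa:xb][part] = idx
    return canvas

canvas = np.zeros((4, 5), dtype=np.uint8)
obj = np.ones((3, 3), dtype=np.uint8)
paste_centered(canvas, obj, cx=0, cy=0, idx=1)   # clipped at the top-left corner
paste_centered(canvas, obj, cx=4, cy=3, idx=2)   # clipped at the bottom-right corner
print(canvas)
# [[1 1 0 0 0]
#  [1 1 0 0 0]
#  [0 0 0 2 2]
#  [0 0 0 2 2]]
```

Only a 2x2 corner of each 3x3 object lands on the canvas, which is the same effect the h_part and w_part variables achieve in the branch-based version.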

Let's see how it works by adding a chair to a background:

img_bg_path = files_bg_imgs[44]
img_bg = cv2.imread(img_bg_path)
img_bg = cv2.cvtColor(img_bg, cv2.COLOR_BGR2RGB)

h, w = img_bg.shape[0], img_bg.shape[1]
mask_comp = np.zeros((h,w), dtype=np.uint8)

img_comp, mask_comp = add_bg_obj(img_bg, mask_comp, bg_img, bg_mask, x=1700, y=600, idx=1)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_comp)
ax[0].set_title('Composition', fontsize=18)
ax[1].imshow(mask_comp)
ax[1].set_title('Composition mask', fontsize=18);

Let's add the transformed chair:

img_comp, mask_comp = add_bg_obj(img_comp, mask_comp, bg_img_t, bg_mask_t, x=1500, y=100, idx=2)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_comp)
ax[0].set_title('Composition', fontsize=18)
ax[1].imshow(mask_comp)
ax[1].set_title('Composition mask', fontsize=18);

2.8.2 Adding multiple objects

We want the backgrounds in the dataset to be as diverse as possible. Varied backgrounds help the training of a keypoint detection neural network. But we only have 60 background images, which is not many if we want to create a dataset of 1000 or more images.

To make the backgrounds more diverse, we add noise objects at random. The noise objects are added with the function create_bg_with_noise():

def create_bg_with_noise(files_bg_imgs,
                         files_bg_noise_imgs,
                         files_bg_noise_masks,
                         bg_max=1920,
                         bg_min=1080,
                         max_objs_to_add=60,
                         longest_bg_noise_max=1000,
                         longest_bg_noise_min=200,
                         blank_bg=False):

    if blank_bg:
        img_comp_bg = np.ones((bg_min, bg_max, 3), dtype=np.uint8) * 255
        mask_comp_bg = np.zeros((bg_min, bg_max), dtype=np.uint8)
    else:
        idx = np.random.randint(len(files_bg_imgs))
        img_bg = cv2.imread(files_bg_imgs[idx])
        img_bg = cv2.cvtColor(img_bg, cv2.COLOR_BGR2RGB)
        img_comp_bg = resize_img(img_bg, bg_max, bg_min)
        mask_comp_bg = np.zeros((img_comp_bg.shape[0], img_comp_bg.shape[1]), dtype=np.uint8)

    for i in range(1, np.random.randint(max_objs_to_add) + 2):

        idx = np.random.randint(len(files_bg_noise_imgs))
        img, mask = get_img_and_mask(files_bg_noise_imgs[idx], files_bg_noise_masks[idx])
        x, y = np.random.randint(img_comp_bg.shape[1]), np.random.randint(img_comp_bg.shape[0])
        img_t, mask_t = resize_transform_bg_obj(img, mask, longest_bg_noise_min, longest_bg_noise_max, transforms=transforms_bg_obj)
        img_comp_bg, _ = add_bg_obj(img_comp_bg, mask_comp_bg, img_t, mask_t, x, y, i)

    return img_comp_bg

Here is a description of the parameters:

files_bg_imgs is a list with the paths to the background images;
files_bg_noise_imgs is a list with the paths to the noise object images;
files_bg_noise_masks is a list with the paths to the noise object masks;
bg_max and bg_min are the target sizes of the longest and shortest sides of the background image;
max_objs_to_add is the maximum number of noise objects to add to the background;
longest_bg_noise_min and longest_bg_noise_max are the minimum and maximum sizes of the longest side of a noise object. longest_bg_noise_max should be smaller than bg_min, and longest_bg_noise_min should be at least 30;
blank_bg should be True if we want a white background instead of a random image.

Let's see how this function works with a white background:

img_comp_bg = create_bg_with_noise(files_bg_imgs,files_bg_noise_imgs,files_bg_noise_masks,max_objs_to_add=20,blank_bg=True)
plt.figure(figsize=(15,15))
plt.imshow(img_comp_bg)

This time a randomly chosen image is used as the background:

img_comp_bg = create_bg_with_noise(files_bg_imgs,files_bg_noise_imgs,files_bg_noise_masks,max_objs_to_add=20)
plt.figure(figsize=(15,15))
plt.imshow(img_comp_bg)

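Note that every call to create_bg_with_noise() gives a new combination of noise objects, because they are chosen and placed on the background at random.

2.9 Controlling the degree of overlap

A newly added object of interest can partly overlap previously added objects of interest. Sometimes it covers a significant part of another object, for example 60% or 70% of its area, or even covers it completely. We do not want that to happen.

We may want to limit the overlap to less than 20% or 30%, or we may want the objects of interest not to overlap at all.

Let's define the function check_overlapping(), which checks whether any previously added object is overlapped by more than the overlap_degree threshold:

def check_overlapping(mask_comp, obj_areas, overlap_degree=0):
    # Ids of the previously added objects: skip the background (0) and the newest object (last id)
    obj_ids = np.unique(mask_comp).astype(np.uint8)[1:-1]
    masks = mask_comp == obj_ids[:, None, None]

    ok = True

    # If some id is missing from the mask, an object has been covered completely
    if len(np.unique(mask_comp)) != np.max(mask_comp) + 1:
        ok = False
        return ok

    for idx, mask in enumerate(masks):
        if np.count_nonzero(mask) / obj_areas[idx] < 1 - overlap_degree:
            ok = False
            break

    return ok

After a new object is added to the composition, this function compares the area of the still visible part of each previously added object with that object's original area. If any previously added object is overlapped by more than overlap_degree, the function returns False. If all previously added objects are overlapped by no more than overlap_degree, or not at all, the function returns True.

The parameter mask_comp is the mask composition after the new object has been added.

The parameter obj_areas is a list of the original areas of the objects, in the order they were added, as if they did not overlap. When it is passed to check_overlapping(), this list should not contain the newly added object.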
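The comparison at the heart of this check can be reproduced on a small hand-made mask. In this sketch the array values, the helper visible_fractions, and the threshold 0.3 are all illustrative assumptions; it shows only the area ratio test, not the additional check for completely covered objects.

```python
import numpy as np

# Composition mask after adding object 2 on top of object 1 (hypothetical values)
mask_comp = np.array([[1, 0, 0, 0],
                      [1, 2, 0, 0],
                      [0, 2, 2, 0]], dtype=np.uint8)
obj_areas = [3]  # original pixel area of object 1; the new object 2 is not listed

def visible_fractions(mask_comp, obj_areas):
    """Fraction of each previously added object that is still visible."""
    return [np.count_nonzero(mask_comp == obj_id) / area
            for obj_id, area in enumerate(obj_areas, start=1)]

fractions = visible_fractions(mask_comp, obj_areas)
overlap_degree = 0.3
ok = all(f >= 1 - overlap_degree for f in fractions)
print(fractions, ok)
```

Object 1 originally covered 3 pixels and 2 are still visible, so about 67% of it remains. With overlap_degree = 0.3 at least 70% must remain visible, so this placement would be rejected.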

2.10 Creating a synthetic composition

Here we define the function create_composition(), which creates a synthetic composition of objects:

def create_composition(img_comp_bg,
                       max_objs=15,
                       longest_min=300,
                       longest_max=700,
                       overlap_degree=0,
                       max_attempts_per_obj=10):

    img_comp = img_comp_bg.copy()
    h, w = img_comp.shape[0], img_comp.shape[1]
    mask_comp = np.zeros((h,w), dtype=np.uint8)
    keypoints_comp = []

    obj_areas = []
    num_objs = np.random.randint(max_objs) + 2

    i = 1

    for _ in range(1, num_objs):
        for _ in range(max_attempts_per_obj):

            imgs_number = len(files_imgs)
            idx = np.random.randint(imgs_number)

            img_path = files_imgs[idx]
            mask_path = files_masks[idx]
            keypoints_path = files_keypoints[idx]

            img, mask = get_img_and_mask(img_path, mask_path)
            with open(keypoints_path) as f:
                data = json.load(f)
                keypoints = data['keypoints']

            img_t, mask_t, keypoints_t = resize_transform_obj(img,
                                                              mask,
                                                              longest_min,
                                                              longest_max,
                                                              keypoints=keypoints,
                                                              transforms=transforms_obj)

            x_max, y_max = img_comp.shape[1] - img_t.shape[1], img_comp.shape[0] - img_t.shape[0]
            x, y = np.random.randint(x_max), np.random.randint(y_max)

            if i == 1:
                # The first object cannot overlap anything, so it is always accepted
                img_comp, mask_comp, keypoints_comp = add_obj(img_comp, mask_comp, keypoints_comp, img_t, mask_t, keypoints_t, x, y, i)
                obj_areas.append(np.count_nonzero(mask_t))
                i += 1
                break
            else:
                # Keep a snapshot so the composition can be restored if the overlap is too large
                img_comp_prev, mask_comp_prev, keypoints_comp_prev = img_comp.copy(), mask_comp.copy(), keypoints_comp.copy()
                img_comp, mask_comp, keypoints_comp = add_obj(img_comp, mask_comp, keypoints_comp, img_t, mask_t, keypoints_t, x, y, i)
                ok = check_overlapping(mask_comp, obj_areas, overlap_degree)
                if ok:
                    obj_areas.append(np.count_nonzero(mask_t))
                    i += 1
                    break
                else:
                    img_comp, mask_comp, keypoints_comp = img_comp_prev.copy(), mask_comp_prev.copy(), keypoints_comp_prev.copy()

    return img_comp, mask_comp, keypoints_comp

Here is a description of the parameters:

img_comp_bg is the background to which the objects of interest will be added.
max_objs is the maximum number of objects to add.
longest_min and longest_max are the minimum and maximum sizes of the longest side of an object of interest.
overlap_degree is the threshold for how much a randomly added object of interest may overlap any previously added object of interest. If at least one object is overlapped too much, the function restores the previous composition and tries to add the object again.
max_attempts_per_obj is the number of attempts to add an object without overlapping other objects by more than overlap_degree.

The function returns:

img_comp: the image with the added objects of interest. In our case, the objects of interest are glue tubes.
mask_comp: the mask composition of the added objects. Background pixels have the value 0, pixels of the first added object have the value 1, pixels of the second added object have the value 2, and so on.
keypoints_comp: a list with the keypoints of the added objects.
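The retry logic controlled by max_attempts_per_obj is a form of rejection sampling: propose a random position, keep it if the overlap test passes, otherwise discard it and try again. The sketch below illustrates that loop under simplifying assumptions that are not in the article: objects are plain rectangles rather than pixel masks, no overlap at all is allowed, and the names overlaps and place_boxes are hypothetical.

```python
import random

def overlaps(a, b):
    """True if axis-aligned boxes a and b, given as (x1, y1, x2, y2), intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def place_boxes(canvas_w, canvas_h, sizes, max_attempts_per_obj=10, seed=0):
    """Place boxes of the given (w, h) sizes at random, rejecting overlapping positions."""
    rng = random.Random(seed)
    placed = []
    for w, h in sizes:
        for _ in range(max_attempts_per_obj):
            x = rng.randrange(canvas_w - w + 1)
            y = rng.randrange(canvas_h - h + 1)
            candidate = (x, y, x + w, y + h)
            if not any(overlaps(candidate, p) for p in placed):
                placed.append(candidate)   # accept and move on to the next object
                break
            # otherwise discard this position and try again
    return placed

boxes = place_boxes(1920, 1080, sizes=[(600, 300), (500, 400), (700, 200)])
print(len(boxes), boxes)
```

As in create_composition(), an object that cannot be placed within the allowed number of attempts is simply skipped, so the final number of objects can be smaller than requested.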

The bounding box of each object can be derived from the mask. We define the function create_bboxes_from_mask_comp(), which returns the bounding-box coordinates of the objects of interest as a list:

def create_bboxes_from_mask_comp(mask_comp):
    height, width = mask_comp.shape[0], mask_comp.shape[1]
    obj_ids = np.unique(mask_comp)[1:]
    masks = mask_comp == obj_ids[:, None, None]

    bboxes_comp = []
    for i in range(len(obj_ids)):
        pos = np.where(masks[i])
        xmin = np.min(pos[1])
        xmax = np.max(pos[1])
        ymin = np.min(pos[0])
        ymax = np.max(pos[0])
        bboxes_comp.append(list(map(int, [xmin, ymin, xmax, ymax])))

    return bboxes_comp

Now we are ready to generate synthetic data and visualize it (here we set overlap_degree=0, so the glue tubes do not overlap at all):

img_comp, mask_comp, keypoints_comp = create_composition(img_comp_bg,
                                                         max_objs=4,
                                                         overlap_degree=0,
                                                         max_attempts_per_obj=10)

fig, ax = plt.subplots(1, 2, figsize=(16, 7))
ax[0].imshow(img_comp)
ax[0].set_title('Composition', fontsize=18)
ax[1].imshow(mask_comp)
ax[1].set_title('Composition mask', fontsize=18);

Let's visualize the keypoints and bounding boxes:

print("Keypoints:", keypoints_comp)

bboxes_comp = create_bboxes_from_mask_comp(mask_comp)

visualize_composition_with_keypoints(img_comp, keypoints_comp, keypoints_names, bboxes_comp)

# Keypoints: [[[473, 652, 1], [266, 72, 1]], [[1564, 716, 1], [1571, 283, 1]], [[862, 423, 1], [1164, 745, 1]]]

2.11 Creating and saving the synthetic data

We have written Python code that creates synthetic images and masks. Now we will write the code that creates annotations for the images.

First, create the folders dataset/train/images/, dataset/train/annotations/, dataset/valid/images/, and dataset/valid/annotations/, where the function generate_dataset() will save the images and annotations.
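The folders can be created by hand or from the notebook. A minimal sketch of the second option follows; the helper make_dataset_dirs is a hypothetical name, and the demonstration runs in a temporary directory so that it does not touch the working directory.

```python
import os
import tempfile

def make_dataset_dirs(root):
    """Create <root>/{train,valid}/{images,annotations} and return the created paths."""
    created = []
    for split in ("train", "valid"):
        for sub in ("images", "annotations"):
            path = os.path.join(root, split, sub)
            os.makedirs(path, exist_ok=True)  # no error if the folder already exists
            created.append(path)
    return created

# Demonstrated in a temporary directory; in the notebook, root would be "dataset"
demo_root = os.path.join(tempfile.mkdtemp(), "dataset")
dirs = make_dataset_dirs(demo_root)
print(dirs)
```

Because exist_ok=True is used, the call can be repeated safely before each generation run.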

Here is the function that creates the dataset:

def generate_dataset(imgs_number, folder, split='train'):
    time_start = time.time()
    for j in tqdm(range(imgs_number)):
        img_comp_bg = create_bg_with_noise(files_bg_imgs,
                                           files_bg_noise_imgs,
                                           files_bg_noise_masks,
                                           max_objs_to_add=60)

        img_comp, mask_comp, keypoints_comp = create_composition(img_comp_bg,
                                                                 max_objs=3,
                                                                 overlap_degree=0,
                                                                 max_attempts_per_obj=10)

        bboxes_comp = create_bboxes_from_mask_comp(mask_comp)

        # OpenCV writes images in BGR order, so convert back before saving
        img_comp = cv2.cvtColor(img_comp, cv2.COLOR_RGB2BGR)
        cv2.imwrite(os.path.join(folder, split, 'images/{}.jpg').format(j), img_comp)

        annotations = {}
        annotations['bboxes'], annotations['keypoints'] = bboxes_comp, keypoints_comp
        with open(os.path.join(folder, split, 'annotations/{}.json').format(j), "w") as f:
            json.dump(annotations, f)

    time_end = time.time()
    time_total = round(time_end - time_start)
    time_per_img = round((time_end - time_start) / imgs_number, 1)

    print("Generation of {} synthetic images is completed. It took {} seconds, or {} seconds per image".format(imgs_number, time_total, time_per_img))
    print("Images are stored in '{}'".format(os.path.join(folder, split, 'images')))
    print("Annotations are stored in '{}'".format(os.path.join(folder, split, 'annotations')))

Let's create a dataset of 1000 training images and 200 validation images:

generate_dataset(1000, folder='dataset', split='train')
generate_dataset(200, folder='dataset', split='valid')

100%|██████████████████████████████████████████████████████████████████████████████| 1000/1000 [17:13<00:00,  1.03s/it]
Generation of 1000 synthetic images is completed. It took 1033 seconds, or 1.0 seconds per image
Images are stored in 'dataset\train\images'
Annotations are stored in 'dataset\train\annotations'
100%|████████████████████████████████████████████████████████████████████████████████| 200/200 [03:19<00:00,  1.00it/s]
Generation of 200 synthetic images is completed. It took 199 seconds, or 1.0 seconds per image
Images are stored in 'dataset\valid\images'
Annotations are stored in 'dataset\valid\annotations'

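Now we have a synthetic dataset and are ready to train a keypoint detection model!

In my case, generating the dataset of 1200 images took about 20 minutes on a PC with an Intel Core i7-10700K processor and 32 GB of RAM. One synthetic image is generated in about 1 second.

I also took 23 photos of glue tubes and annotated them by hand. These real photos can be used to test the quality of the trained model.

Here you can download the whole dataset of 1000 synthetic training images, 200 synthetic validation images, and 23 real test images.

Here is a GitHub repository with a notebook that contains all the steps described above.

2.12 Examples from the synthetic dataset

Let's look at a random image from the generated synthetic dataset:

Here is what the corresponding json file with annotations looks like:

{"bboxes": [[1257, 475, 1901, 603], [199, 154, 637, 463]], "keypoints": [[[1318, 530, 1], [1874, 547, 1]], [[249, 413, 1], [597, 198, 1]]]}

We see that there are two glue tubes in the image. The annotation file for this image contains the coordinates of two bounding boxes. In addition, it contains the coordinates of two keypoints for each glue tube, so there are four keypoints in this image in total.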
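A quick sanity check on such a file is to confirm that every keypoint falls inside the bounding box of the same object. The sketch below runs that check on the annotation shown above; the helper keypoints_inside_bboxes is a hypothetical name, and the check assumes boxes are stored as [xmin, ymin, xmax, ymax].

```python
import json

raw = ('{"bboxes": [[1257, 475, 1901, 603], [199, 154, 637, 463]], '
       '"keypoints": [[[1318, 530, 1], [1874, 547, 1]], [[249, 413, 1], [597, 198, 1]]]}')
annotation = json.loads(raw)

def keypoints_inside_bboxes(annotation):
    """Check that every keypoint lies inside the bounding box of the same object."""
    for (xmin, ymin, xmax, ymax), kps in zip(annotation["bboxes"], annotation["keypoints"]):
        for x, y, _visibility in kps:
            if not (xmin <= x <= xmax and ymin <= y <= ymax):
                return False
    return True

print(len(annotation["bboxes"]), keypoints_inside_bboxes(annotation))
# 2 True
```

Running a check like this over all generated files is a cheap way to catch a mismatch between boxes and keypoints before training.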

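Let's visualize the annotation file:

The keypoints and bounding boxes are in the right places. This visually confirms that our script works correctly.

2.13 Training and testing a keypoint detection model

I used the generated synthetic dataset to train a Keypoint RCNN model.

Next, I used the trained model to detect the keypoints of glue tubes in real time in a video stream from a camera. Here is the result: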

BONUS

Python code for compressing a GIF file

#!/usr/local/bin/python3
# -*- coding: utf-8 -*-
from PIL import Image
import os
import imageio
from skimage.transform import resize


def AnalysisGif(gifPath):
    # Split the GIF into PNG frames stored in a folder named after the file
    image = Image.open(gifPath)
    pngDir = gifPath[:-4]
    if os.path.exists(pngDir):
        files = os.listdir(pngDir)
        for file in files:
            file = pngDir + "/" + file
            os.remove(file)
        os.rmdir(pngDir)
    os.mkdir(pngDir)
    try:
        while True:
            current = image.tell()
            pngPath = pngDir + '/' + str(current) + '.png'
            image.save(pngPath, quality=85)
            image.seek(current + 1)
    except EOFError:
        # Raised by seek() after the last frame: all frames have been saved
        pass


def Combine2Gif(folderPath, gifFilePath):
    # Keep every 15th frame to reduce the file size
    files = os.listdir(folderPath)
    pngFiles = []
    for i in range(0, len(files), 15):
        pngFiles.append(folderPath + "/" + ('%d.png' % i))
    GenerateGif(0.1, gifFilePath, pngFiles)


def GenerateGif(step, gifPath, filterPngs):
    images = []
    for filePath in filterPngs:
        # Read image
        img_io = imageio.imread(filePath)
        H, W = img_io.shape[:2]
        # Resize image to half of its width and height
        img_io = resize(img_io, (int(H//2), int(W//2)))
        images.append(img_io)
    imageio.mimsave(gifPath, images, duration=step)


if __name__ == "__main__":
    gifPath = r"1_tHL_PIHhirfXylP6KGRB9A.gif"
    AnalysisGif(gifPath)
    Combine2Gif(gifPath[:-4], gifPath[:-4] + "_result.gif")
    print("== finished ==")

References

https://medium.com/@alexppppp/how-to-create-synthetic-dataset-for-computer-vision-keypoint-detection-78ba481cdafd
