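Selective Search Algorithm: Generating Candidate Boxes

Compared with a sliding-window search strategy, the Selective Search algorithm takes a heuristic approach. It filters out many fragmented sub-regions of the image and generates only the candidate object regions that are needed (region proposals), which greatly improves computational efficiency.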


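Introduction: Questions to Think About Before Learning the Algorithm

Question: how can we roughly measure the similarity of two images?

Suppose we have five images. What mathematical measure could we use to compute how similar they are? How can we roughly compute how similar each of the last four images is to the first one?

(1) Measuring by color. Intuitively, the third image is dark overall and looks most similar to the first image, while the fourth image is light blue throughout and differs most from the first.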

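One approach: for each channel, compute the distribution of pixel values over 0-255 and bin it into a histogram. With 6 bins per channel, concatenating the three channels gives an 18-dimensional color histogram. To compare two images, take the bin-wise minimum of their normalized 18-dimensional color histograms and sum these minima. The result (histogram intersection) is a mathematical measure of color similarity; because each normalized histogram sums to 1, it ranges from 0 (no overlap) to 1 (identical distributions).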

import numpy as np


def metric_color_similarity(image1, image2):
    """Histogram-intersection color similarity of two H x W x 3 uint8 images."""
    color_bin = 6  # 6 bins per channel -> 18-dimensional histogram
    color_hist1 = np.array([])
    color_hist2 = np.array([])
    for colour_channel in (0, 1, 2):
        c1 = image1[:, :, colour_channel]
        color_hist1 = np.concatenate([color_hist1, np.histogram(c1, color_bin, (0.0, 255.0))[0]])
        c2 = image2[:, :, colour_channel]
        color_hist2 = np.concatenate([color_hist2, np.histogram(c2, color_bin, (0.0, 255.0))[0]])
    # normalize each histogram so that it sums to 1
    color_hist1 = color_hist1 / color_hist1.sum()
    color_hist2 = color_hist2 / color_hist2.sum()
    # histogram intersection: sum of the bin-wise minima
    color_sim = 0
    for i in range(len(color_hist1)):
        color_sim = color_sim + min(color_hist1[i], color_hist2[i])
    print(color_sim)
    return color_sim
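The loop at the end of metric_color_similarity computes a histogram intersection. Below is a minimal vectorized sketch of the same measure (histogram_intersection is a hypothetical helper, not part of the original code), with a small worked example on hand-made 3-bin histograms instead of real images:

```python
import numpy as np


def histogram_intersection(h1, h2):
    """Sum of the bin-wise minima of two histograms, each normalized to sum to 1."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.minimum(h1, h2).sum())


# [2, 2, 0] -> [0.5, 0.5, 0]; [1, 1, 2] -> [0.25, 0.25, 0.5]
# overlap = min(0.5, 0.25) + min(0.5, 0.25) + min(0, 0.5) = 0.5
print(histogram_intersection([2, 2, 0], [1, 1, 2]))  # 0.5
print(histogram_intersection([3, 1, 0], [3, 1, 0]))  # 1.0 (identical distributions)
```

np.minimum compares the two arrays element by element, which replaces the explicit Python loop over bins.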

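The color similarities of images 2, 3, 4 and 5 to image 1 are as follows:

(2) Measuring by texture. Convert the RGB image to grayscale and extract its LBP (Local Binary Pattern) texture map. Compute the distribution of the texture map's values over 0-255 and bin it into a histogram. Then compare the texture histograms of the two images in the same way: take the bin-wise minimum and sum, which gives a mathematical measure of texture similarity.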

import cv2
import numpy as np
import skimage.feature


def metric_texture_similarity(image1, image2):
    """Histogram-intersection texture similarity of two BGR images (as read by cv2.imread)."""
    gray_image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2GRAY)
    gray_image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2GRAY)
    # LBP with 8 sampling points on a circle of radius 1 -> codes in 0-255
    tex_img1 = skimage.feature.local_binary_pattern(gray_image1, 8, 1.0)
    tex_img2 = skimage.feature.local_binary_pattern(gray_image2, 8, 1.0)
    texture_bin = 20
    texture_hist1 = np.histogram(tex_img1.flatten(), texture_bin, (0.0, 255.0))[0]
    texture_hist2 = np.histogram(tex_img2.flatten(), texture_bin, (0.0, 255.0))[0]
    # normalize, then sum the bin-wise minima (histogram intersection)
    p_hist1 = texture_hist1 / texture_hist1.sum()
    p_hist2 = texture_hist2 / texture_hist2.sum()
    similarity = 0
    for i in range(texture_bin):
        similarity = similarity + min(p_hist1[i], p_hist2[i])
    print(similarity)
    return similarity
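To show what an LBP code actually is, here is a minimal NumPy sketch of the basic 8-neighbor LBP on a 3 × 3 square window (simple_lbp is a hypothetical helper). It is only an approximation of skimage.feature.local_binary_pattern(gray, 8, 1.0): the skimage version samples 8 points on a circle of radius 1 and interpolates the diagonal neighbors, so its values can differ. The clockwise-from-top-left bit order is also an assumption made for this sketch.

```python
import numpy as np


def simple_lbp(gray):
    """Basic 8-neighbor LBP on a 3 x 3 square window; border pixels are dropped."""
    g = np.asarray(gray, dtype=np.int32)
    h, w = g.shape
    centre = g[1:-1, 1:-1]
    # neighbor offsets (dy, dx), clockwise from the top-left pixel; bit k has weight 2**k
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(centre)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (neighbour >= centre).astype(np.int32) << bit  # set bit if neighbor >= center
    return code


patch = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(simple_lbp(patch))  # [[120]]: neighbors 6, 9, 8, 7 set bits 3-6 -> 8 + 16 + 32 + 64
```

Each pixel's code summarizes which of its neighbors are at least as bright as itself, so a flat region always yields 255 and edges or corners yield other codes; the histogram of these codes is what the texture similarity compares.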

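The LBP texture maps extracted from the five images above are shown below:

The texture similarities of images 2, 3, 4 and 5 to image 1 are as follows:

Although these measures are crude, they do work as a simple way to compute the similarity of two images.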

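1. Implementation Steps of the Selective Search Algorithm

Step 1: Over-segment the RGB image with the Felzenszwalb algorithm.

Suppose the input is an RGB image of shape (250, 250, 3), shown below:

Calling skimage.segmentation.felzenszwalb to pre-segment it gives the result below:

The original 250 × 250 = 62,500 pixels are partitioned into 915 segments.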

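Step 2: Build the dictionary region with 915 entries. Each entry is keyed by its label and stores 8 values: the min_x, min_y, max_x and max_y of all pixels with that label, the label itself, the pixel count size, the color histogram, and the texture histogram.

To compute min_x, min_y, max_x and max_y, initialize the minima to infinity and the maxima to 0 (the code uses 0xffff as a stand-in for infinity), then iterate over every pixel and update them.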
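The per-pixel loop in first_calc_fel_category can also be written with NumPy. A minimal sketch, assuming an integer label map like the one felzenszwalb returns (region_boxes is a hypothetical helper, and the histograms are left out), that computes the bounding box and pixel count for each label of a small hand-made mask:

```python
import numpy as np


def region_boxes(label_mask):
    """Return {label: (min_x, min_y, max_x, max_y, size)} for an H x W integer label map."""
    regions = {}
    for label in np.unique(label_mask):
        ys, xs = np.nonzero(label_mask == label)  # row (y) and column (x) indices of this label
        regions[int(label)] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()), int(xs.size))
    return regions


mask = np.array([[0, 0, 1],
                 [0, 2, 1],
                 [2, 2, 1]])
print(region_boxes(mask))
# {0: (0, 0, 1, 1, 3), 1: (2, 0, 2, 2, 3), 2: (0, 1, 1, 2, 3)}
```

np.nonzero returns the row and column indices of the selected pixels, so the extremes of those index arrays are exactly the bounding box, with no need for an "infinity" starting value.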

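To compute the color and texture histograms, bin the segment's RGB pixel values and its LBP texture values into histograms, then normalize them.

Each entry of the resulting dictionary has the following form: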

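Step 3: Build the list of adjacent pairs neighbour_couple, which contains 2429 pairs.

Compare the 915 entries of region pairwise and use each segment's min_x, min_y, max_x and max_y to decide whether the two segments are adjacent. If they are, append the entries r1 and r2 to neighbour_couple as the pair (r1, r2). Adjacency is thus approximated by bounding boxes: in the intersect function, two segments count as neighbors when a corner of the second segment's bounding box lies strictly inside the first segment's bounding box.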
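The bounding-box test is cheap, but it can miss segments that really touch (for example, boxes that only share an edge) and can pair segments that do not touch at all (for example, an L-shaped segment whose box overlaps another). As an alternative, which is not what the original code does, true pixel adjacency can be read directly from the label mask. A minimal sketch with a hypothetical helper adjacent_label_pairs:

```python
import numpy as np


def adjacent_label_pairs(label_mask):
    """Pairs (a, b), a < b, of labels whose pixels touch horizontally or vertically."""
    pairs = set()
    # compare each pixel with its right-hand neighbor, then with the pixel below it
    for first, second in ((label_mask[:, :-1], label_mask[:, 1:]),
                          (label_mask[:-1, :], label_mask[1:, :])):
        boundary = first != second
        for a, b in zip(first[boundary], second[boundary]):
            pairs.add((int(min(a, b)), int(max(a, b))))
    return pairs


mask = np.array([[0, 0, 1],
                 [0, 0, 1],
                 [2, 2, 2]])
print(sorted(adjacent_label_pairs(mask)))  # [(0, 1), (0, 2), (1, 2)]
```

Shifting the label map by one pixel and comparing it with itself finds every boundary pixel pair in a single vectorized comparison, instead of comparing all pairs of the 915 regions.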

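The resulting neighbour_couple looks like this:

Step 4: Build the similarity dictionary sim_dictionary. For each of the 2429 adjacent pairs in neighbour_couple, compute the similarity and add it to sim_dictionary as an entry (i, j): sim.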

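The similarity between regions i and j combines the following four measures:

(1) Color similarity

(2) Texture similarity

(3) Size similarity

Size is the number of pixels in a region. The size similarity is 1 minus the fraction of the whole image covered by the two regions together. This lets small regions merge first and keeps one large region from swallowing the small regions around it.

(4) Shape similarity

Shape similarity measures how well two regions "fit" each other. It is based on the difference between the area of the smallest rectangle enclosing both regions and the sum of the two regions' sizes: the smaller this difference, the better the fit. (In the code, the difference is divided by the image size and subtracted from 1.)

Finally, the four similarities are summed to give the similarity between regions (i, j).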
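Reconstructed from calc_similarity in the code section, the four measures and their sum read as follows. Here $c_i^k$ and $t_i^k$ are the k-th bins of region $r_i$'s normalized color and texture histograms ($n_c = 18$ color bins and $n_t = 10$ texture bins in this code), $\operatorname{size}(\cdot)$ is a pixel count, $im$ is the whole image, and $BB_{ij}$ is the bounding box enclosing both regions:

```latex
\begin{aligned}
s_{\text{colour}}(r_i, r_j) &= \sum_{k=1}^{n_c} \min\left(c_i^k, c_j^k\right) \\
s_{\text{texture}}(r_i, r_j) &= \sum_{k=1}^{n_t} \min\left(t_i^k, t_j^k\right) \\
s_{\text{size}}(r_i, r_j) &= 1 - \frac{\operatorname{size}(r_i) + \operatorname{size}(r_j)}{\operatorname{size}(im)} \\
s_{\text{fill}}(r_i, r_j) &= 1 - \frac{\operatorname{size}(BB_{ij}) - \operatorname{size}(r_i) - \operatorname{size}(r_j)}{\operatorname{size}(im)} \\
s(r_i, r_j) &= s_{\text{colour}}(r_i, r_j) + s_{\text{texture}}(r_i, r_j) + s_{\text{size}}(r_i, r_j) + s_{\text{fill}}(r_i, r_j)
\end{aligned}
```

Each term lies roughly in [0, 1], so their unweighted sum gives every cue about equal influence.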

Below are the similarity values for some neighboring regions; they look fairly reasonable.

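Step 5: Find the pair (i, j) with the highest similarity in sim_dictionary, merge the two regions into a new region t, and add t to region. Then remove every pair involving i or j from sim_dictionary, and for each former neighbor n of i or j add the pair (t, n) with a freshly computed similarity.

The key of the new region t is the largest existing label plus 1, and its 8 values are updated as follows: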
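Reconstructed from merge_region in the code section, with $C$ and $T$ denoting the normalized color and texture histograms, the updates for $r_t = r_i \cup r_j$ are:

```latex
\begin{aligned}
\text{min\_x}(r_t) &= \min\left(\text{min\_x}(r_i), \text{min\_x}(r_j)\right), \quad
\text{min\_y}(r_t) = \min\left(\text{min\_y}(r_i), \text{min\_y}(r_j)\right) \\
\text{max\_x}(r_t) &= \max\left(\text{max\_x}(r_i), \text{max\_x}(r_j)\right), \quad
\text{max\_y}(r_t) = \max\left(\text{max\_y}(r_i), \text{max\_y}(r_j)\right) \\
\operatorname{size}(r_t) &= \operatorname{size}(r_i) + \operatorname{size}(r_j) \\
C_t &= \frac{\operatorname{size}(r_i)\, C_i + \operatorname{size}(r_j)\, C_j}{\operatorname{size}(r_t)}, \quad
T_t = \frac{\operatorname{size}(r_i)\, T_i + \operatorname{size}(r_j)\, T_j}{\operatorname{size}(r_t)} \\
\text{label}(r_t) &= t
\end{aligned}
```

Because $C_i$ and $C_j$ each sum to 1, their size-weighted average also sums to 1, so the merged region can be compared with the same intersection formulas without re-normalizing or revisiting its pixels.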

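Once every neighboring pair in sim_dictionary has been merged, no new regions are added to region and the merging process ends. In this example, the Felzenszwalb segmentation produced only 915 regions; after repeated merging added new regions, region ended up with 2429 regions in total.

Step 6: From the merged region set, take each region's min_x, min_y, max_x and max_y and apply a second round of screening; what remains are the candidate regions. In calc_candidate_box, a box is kept only if its region covers between 1/36 and 1/4 of the image, its width and height are non-zero, its aspect ratio is at most 1.5 in either direction, and the same rectangle has not already been kept.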
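A minimal standalone sketch of the same screening rules for a single region (keep_box is a hypothetical helper; the thresholds 1/36, 1/4 and 1.5 are the values hard-coded in calc_candidate_box), applied to a few hand-made boxes on a 250 × 250 image:

```python
def keep_box(rect, size, total_size, max_ratio=1.5):
    """Apply the screening rules of calc_candidate_box to one region."""
    x1, y1, x2, y2 = rect
    if size > total_size / 4 or size < total_size / 36:
        return False  # region too large or too small
    w, h = x2 - x1, y2 - y1
    if w == 0 or h == 0:
        return False  # degenerate box
    return h / w <= max_ratio and w / h <= max_ratio  # reject elongated boxes


total = 250 * 250
print(keep_box((10, 10, 90, 110), 5000, total))  # True: size and aspect ratio are acceptable
print(keep_box((0, 0, 200, 40), 5000, total))    # False: 5:1 aspect ratio
print(keep_box((0, 0, 20, 20), 300, total))      # False: smaller than 1/36 of the image
```

The aspect-ratio limit suits roughly square targets such as faces; for other object classes these thresholds would likely need retuning.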

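Cropping the image at these positions gives the following candidate regions:

The results show that for a face detection task, Selective Search does produce the target face region.

2. Selective Search Algorithm Flowchart

3. Code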

import random  # only used by the commented-out visualization below

import cv2
import numpy as np
import skimage.feature
import skimage.segmentation

# Selective Search algorithm
# step 1: calculate the first fel_segment region
# step 2: calculate the neighbour couple
# step 3: calculate the similarity dictionary
# step 4: merge regions and calculate the second merged region
# step 5: obtain the target candidate regions by secondary screening


def intersect(a, b):
    # Bounding-box adjacency heuristic: True if a corner of b's box lies strictly inside a's box.
    # The test is not symmetric and ignores boxes that only share an edge.
    if (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["min_x"] < a["max_x"] and a["min_y"] < b["max_y"] < a["max_y"]) or \
            (a["min_x"] < b["max_x"] < a["max_x"] and a["min_y"] < b["min_y"] < a["max_y"]):
        return True
    return False


def calc_similarity(r1, r2, size):
    # color similarity: histogram intersection
    sim1 = 0
    for a, b in zip(r1["hist_c"], r2["hist_c"]):
        sim1 = sim1 + min(a, b)
    # texture similarity: histogram intersection
    sim2 = 0
    for a, b in zip(r1["hist_t"], r2["hist_t"]):
        sim2 = sim2 + min(a, b)
    # size similarity: favors merging small regions first
    sim3 = 1.0 - (r1["size"] + r2["size"]) / size
    # shape (fill) similarity: how well the two regions fill their joint bounding box
    rect_size = (max(r1["max_x"], r2["max_x"]) - min(r1["min_x"], r2["min_x"])) * \
                (max(r1["max_y"], r2["max_y"]) - min(r1["min_y"], r2["min_y"]))
    sim4 = 1.0 - (rect_size - r1["size"] - r2["size"]) / size
    similarity = sim1 + sim2 + sim3 + sim4
    return similarity


def merge_region(r1, r2, t):
    new_size = r1["size"] + r2["size"]
    r_new = {
        "min_x": min(r1["min_x"], r2["min_x"]),
        "min_y": min(r1["min_y"], r2["min_y"]),
        "max_x": max(r1["max_x"], r2["max_x"]),
        "max_y": max(r1["max_y"], r2["max_y"]),
        "size": new_size,
        # size-weighted averages keep the histograms normalized
        "hist_c": (r1["hist_c"] * r1["size"] + r2["hist_c"] * r2["size"]) / new_size,
        "hist_t": (r1["hist_t"] * r1["size"] + r2["hist_t"] * r2["size"]) / new_size,
        "labels": t,
    }
    return r_new


# Step 1: Calculate the different categories segmented by felzenszwalb algorithm
def first_calc_fel_category(image, scale, sigma, min_size):
    fel_mask = skimage.segmentation.felzenszwalb(image, scale=scale, sigma=sigma, min_size=min_size)
    # labels run from 0 to max, so the number of segments is max + 1
    print('The picture has been segmented in these categories : ', np.max(fel_mask) + 1)
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # (250, 250)
    texture_img = skimage.feature.local_binary_pattern(gray_image, 8, 1.0)  # (250, 250)

    # Optional: visualize the segmentation with a random color per segment
    # fel_img = np.zeros((fel_mask.shape[0], fel_mask.shape[1], 3))
    # for i in range(np.max(fel_mask) + 1):
    #     a = random.randint(0, 255)
    #     b = random.randint(0, 255)
    #     c = random.randint(0, 255)
    #     fel_img[fel_mask == i] = (a, b, c)
    # cv2.namedWindow("image")
    # cv2.imshow('image', fel_img / 255)
    # cv2.waitKey(0)
    # cv2.imwrite('felzenszwalb_img.jpg', fel_img)

    img_append = np.zeros((fel_mask.shape[0], fel_mask.shape[1], 4))  # (250, 250, 4)
    img_append[:, :, 0:3] = image
    img_append[:, :, 3] = fel_mask  # 4th channel holds the segment label

    region = {}
    # calc the min_x, min_y, max_x, max_y, label in every category
    for y, i in enumerate(img_append):
        for x, (r, g, b, l) in enumerate(i):
            if l not in region:
                # 0xffff serves as "infinity" for the minima
                region[l] = {"min_x": 0xffff, "min_y": 0xffff, "max_x": 0, "max_y": 0, "labels": l}
            if region[l]["min_x"] > x:
                region[l]["min_x"] = x
            if region[l]["min_y"] > y:
                region[l]["min_y"] = y
            if region[l]["max_x"] < x:
                region[l]["max_x"] = x
            if region[l]["max_y"] < y:
                region[l]["max_y"] = y

    for k, v in list(region.items()):
        # calc the size feature in every category
        masked_color = image[:, :, :][img_append[:, :, 3] == k]
        region[k]["size"] = len(masked_color)
        # calc the color feature in every category
        color_bin = 6
        color_hist = np.array([])
        for colour_channel in (0, 1, 2):
            c = masked_color[:, colour_channel]
            color_hist = np.concatenate([color_hist, np.histogram(c, color_bin, (0.0, 255.0))[0]])
        color_hist = color_hist / sum(color_hist)
        region[k]["hist_c"] = color_hist
        # calc the texture feature in every category
        texture_bin = 10
        masked_texture = texture_img[:, :][img_append[:, :, 3] == k]
        texture_hist = np.histogram(masked_texture, texture_bin, (0.0, 255.0))[0]
        texture_hist = texture_hist / sum(texture_hist)
        region[k]["hist_t"] = texture_hist
    return region


# Step 2: Calculate the neighbour couple in the first fel_segment region
def calc_neighbour_couple(region):
    r = list(region.items())
    couples = []
    for cur, a in enumerate(r[:-1]):
        for b in r[cur + 1:]:
            if intersect(a[1], b[1]):
                couples.append((a, b))
    return couples


# Step 3: Calculate the sim_dictionary in the neighbour couple
def calc_sim_dictionary(couple, total_size):
    sim_dictionary = {}
    for (ai, ar), (bi, br) in couple:
        sim_dictionary[(ai, bi)] = calc_similarity(ar, br, total_size)
    return sim_dictionary


# Step 4: merge the small regions and calculate the second merged region
def second_calc_merge_category(sim_dictionary, region, total_size):
    while sim_dictionary:
        # the pair with the highest similarity
        i, j = max(sim_dictionary, key=sim_dictionary.get)
        t = max(region.keys()) + 1.0
        region[t] = merge_region(region[i], region[j], t)
        # remove every pair that involves i or j
        key_to_delete = [k for k in sim_dictionary if (i in k) or (j in k)]
        for k in key_to_delete:
            del sim_dictionary[k]
        # reconnect the former neighbours of i and j to the new region t
        for k in [a for a in key_to_delete if a != (i, j)]:
            n = k[1] if k[0] in (i, j) else k[0]
            sim_dictionary[(t, n)] = calc_similarity(region[t], region[n], total_size)
    return region


# Step 5: obtain the target candidate regions by secondary screening
def calc_candidate_box(second_region, total_size):
    category = []
    for k, r in list(second_region.items()):
        category.append({'rect': (r['min_x'], r['min_y'], r['max_x'], r['max_y']), 'size': r['size']})
    candidate_box = set()
    for r in category:
        if r['rect'] in candidate_box:
            continue  # duplicate rectangle
        if r['size'] > total_size / 4:
            continue  # too large
        if r['size'] < total_size / 36:
            continue  # too small
        x1, y1, x2, y2 = r['rect']
        if (x2 - x1) == 0 or (y2 - y1) == 0:
            continue  # degenerate box
        if (y2 - y1) / (x2 - x1) > 1.5 or (x2 - x1) / (y2 - y1) > 1.5:
            continue  # too elongated
        candidate_box.add(r['rect'])
    return candidate_box


if __name__ == '__main__':
    img = cv2.imread('/home/archer/CODE/PF/162.jpg')
    if img is None:
        raise FileNotFoundError('could not read the input image')
    total_size = img.shape[0] * img.shape[1]
    print('The shape of the image is : ', img.shape)  # (250, 250, 3)
    first_region = first_calc_fel_category(img, scale=20, sigma=0.9, min_size=10)
    print('first segment categories: ', len(first_region))
    neighbour_couple = calc_neighbour_couple(first_region)
    print('first neighbour_couple : ', len(neighbour_couple))
    sim_dictionary = calc_sim_dictionary(neighbour_couple, total_size)
    second_region = second_calc_merge_category(sim_dictionary, first_region, total_size)
    print('second merge categories: ', len(second_region))
    candidate_box = calc_candidate_box(second_region, total_size)
    print('the candidate box we got by the selective search algorithm : ')
    # the output directory must already exist, otherwise cv2.imwrite returns False
    for flag, (x1, y1, x2, y2) in enumerate(candidate_box, start=1):
        # max_x / max_y are inclusive pixel coordinates, hence the + 1
        select_img = img[y1:y2 + 1, x1:x2 + 1]
        print(x1, y1, x2, y2)
        # cv2.namedWindow("select_image")
        # cv2.imshow("select_image", select_img)
        # cv2.waitKey(0)
        img_path = '/home/archer/CODE/PF/selective/' + str(flag) + '.jpg'
        cv2.imwrite(img_path, select_img)

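4. Further Thoughts

Thought 1: Does Selective Search really generate the desired candidate regions?

When using R-CNN for object detection, the key is to make sure two steps work well: first, Selective Search must generate the object regions we want to detect; second, the CNN classifier must correctly classify those regions as positives.

Real scenes are complex. Sometimes the target region is too small, and segmentation plus merging never produces it. This is a limitation the algorithm cannot overcome, and indeed a common weakness of most object detection algorithms.

On the LFW dataset, at least, Selective Search generates face regions well: it produces the desired target region for almost every image. When it does not, we can sometimes tune the Felzenszwalb parameters to over-segment the image more, in the hope that merging will then produce the region we want.

Thought 2: What can we learn from this code?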

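The Selective Search implementation is fairly complex. Compared with algorithms I had implemented before, it was somewhat hard going for me, as someone without formal training in algorithms. There are many programming techniques to learn from it.

(1) Much like a custom class in C++, a Python dictionary can bundle all the data we want to store together: each region is a dictionary whose keys name different fields, and each field can hold data of a different type.

(2) The final region merging in Selective Search closely resembles hierarchical (agglomerative) clustering in statistics: it repeatedly merges the most similar pair and updates the neighbor relationships. This can serve as a template for implementing similar algorithms.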
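To make the analogy concrete, here is a minimal sketch of the same greedy pattern on a toy region graph. The function greedy_merge and the size-only similarity are illustrative assumptions, not the original code: keep a set of adjacent pairs, merge the most similar pair into a new id, and reconnect the old neighbors to it.

```python
def greedy_merge(sizes, edges, similarity):
    """Repeatedly merge the most similar adjacent pair; return the merge history.

    sizes: {region_id: size}; edges: set of frozenset({a, b}) adjacent pairs;
    similarity: function (size_a, size_b) -> float.
    """
    sizes = dict(sizes)
    edges = set(edges)
    history = []
    next_id = max(sizes) + 1
    while edges:
        best = max(edges, key=lambda e: similarity(*(sizes[r] for r in e)))
        a, b = sorted(best)
        t = next_id
        next_id += 1
        sizes[t] = sizes[a] + sizes[b]
        # every region that touched a or b now touches the merged region t
        neighbours = {r for e in edges if e & best for r in e} - {a, b}
        edges = {e for e in edges if not (e & best)}
        edges |= {frozenset((t, n)) for n in neighbours}
        history.append((a, b, t))
    return history


chain = {frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))}
# smaller combined size = more similar, mimicking the size term
print(greedy_merge({0: 1, 1: 2, 2: 4, 3: 8}, chain, lambda sa, sb: -(sa + sb)))
# [(0, 1, 4), (2, 4, 5), (3, 5, 6)]
```

Every merge removes two active regions and adds one, so four connected regions are fully merged after three steps; recording the history is what yields the hierarchy of regions at all scales.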

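(3) masked_color = image[:, :, :][img_append[:, :, 3] == k] selects every pixel of image whose label in the fourth channel of img_append equals k. This kind of boolean-mask indexing is useful in many situations.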
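A tiny example of the same boolean-mask indexing on made-up toy arrays: a 2-D boolean mask applied to an H × W × 3 array returns an N × 3 array, one row of channel values per selected pixel, in row-major order.

```python
import numpy as np

image = np.arange(2 * 3 * 3).reshape(2, 3, 3)  # toy 2 x 3 "image" with 3 channels
labels = np.array([[0, 1, 1],
                   [1, 0, 2]])                  # toy label map with the same height and width
masked = image[labels == 1]                     # the mask indexes the first two axes
print(masked.shape)  # (3, 3): three selected pixels, three channels each
print(masked.tolist())  # [[3, 4, 5], [6, 7, 8], [9, 10, 11]]
```

Since the mask covers only the first two axes, the channel axis is kept intact; len(masked) is then the pixel count of label 1, which is exactly how the code obtains each region's size.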

5. Project Link

If the code does not run, or you want to use the trained model directly, you can download the project from:
https://blog.csdn.net/Twilight737