Python+OpenCV: K-Means Clustering

Goal

  • Learn to use the cv.kmeans() function in OpenCV for data clustering.

Understanding Parameters

Input parameters:

  1. samples : Data of np.float32 type; each feature should be put in a single column.
  2. nclusters(K) : The number of clusters required at the end.
  3. criteria : The iteration termination criterion. When this criterion is satisfied, the algorithm iteration stops. It is a tuple of three parameters, `( type, max_iter, epsilon )`:
    1. type of termination criterion. It has 3 flags as below:

      • cv.TERM_CRITERIA_EPS - stop the algorithm iteration if the specified accuracy, epsilon, is reached.
      • cv.TERM_CRITERIA_MAX_ITER - stop the algorithm after the specified number of iterations, max_iter.
      • cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER - stop the iteration when either of the above conditions is met.
    2. max_iter - An integer specifying the maximum number of iterations.
    3. epsilon - The required accuracy.
  4. attempts : The number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness, and this compactness is returned as output.
  5. flags : Specifies how the initial centers are taken. Normally two flags are used for this: 
    cv.KMEANS_PP_CENTERS (use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007]) and cv.KMEANS_RANDOM_CENTERS (select random initial centers in each attempt).

Output parameters:

  1. compactness : The sum of squared distances from each point to its corresponding center.
  2. labels : The label array (same as 'code' in the previous article) where each element is marked '0', '1', and so on.
  3. centers : The array of cluster centers.

Now we will see how to apply the K-Means algorithm with three examples.

Data with Only One Feature

Consider a set of data with only one feature, i.e. one-dimensional.

For example, take our t-shirt problem where only people's heights are used to decide the t-shirt size.

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 0: Data with Only One Feature with K-Means Clustering in OpenCV."""
    # 0: Data with Only One Feature with K-Means Clustering in OpenCV.
    if 0 == method:
        x = np.random.randint(25, 100, 25)
        y = np.random.randint(175, 255, 25)
        z = np.hstack((x, y))
        z = z.reshape((50, 1))
        z = np.float32(z)
        pyplot.figure('Data Histogram', figsize=(16, 9))
        pyplot.hist(z, 256, [0, 256])
        pyplot.show()
        # Define criteria = ( type, max_iter = 10, epsilon = 1.0 )
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        # Set flags (just to avoid a line break in the kmeans() call)
        flags = lmc_cv.KMEANS_RANDOM_CENTERS
        # Apply KMeans
        compactness, labels, centers = lmc_cv.kmeans(z, 2, None, criteria, 10, flags)
        # Split the data into clusters depending on their labels.
        cluster_a = z[labels == 0]
        cluster_b = z[labels == 1]
        # Plot cluster A in red, cluster B in blue, and the centers in yellow.
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.hist(cluster_a, 256, [0, 256], color='r')
        pyplot.hist(cluster_b, 256, [0, 256], color='b')
        pyplot.hist(centers, 32, [0, 256], color='y')
        pyplot.show()

Data with Multiple Features

In the previous example, we took only height for the t-shirt problem. Here we will take both height and weight, i.e. two features.

Remember, in the previous case, we arranged our data into a single column vector. Each feature is arranged in a column, while each row corresponds to an input sample.

For example, in this case we set up test data of size 50x2: the heights and weights of 50 people.

The first column corresponds to the heights of all 50 people and the second column to their weights.

The first row contains two elements, where the first is the height of the first person and the second is his weight.

Similarly, the remaining rows correspond to the heights and weights of the other people.
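This row/column layout can be seen with a tiny example (three people instead of 50; the numbers are made up for illustration):

```python
import numpy as np

# Each row is one person: [height, weight]. Column 0 holds all heights,
# column 1 holds all weights.
heights = np.float32([150.0, 165.0, 180.0])
weights = np.float32([50.0, 60.0, 75.0])
samples = np.column_stack((heights, weights))  # shape (3, 2)

print(samples[:, 0])  # first column, all heights: [150. 165. 180.]
print(samples[0])     # first row, first person's [height, weight]: [150.  50.]
```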

The code below demonstrates this layout:

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 1: Data with Multiple Features with K-Means Clustering in OpenCV."""
    # 1: Data with Multiple Features with K-Means Clustering in OpenCV.
    if 1 == method:
        x = np.random.randint(25, 50, (25, 2))
        y = np.random.randint(60, 85, (25, 2))
        z = np.vstack((x, y))
        # Convert to np.float32
        z = np.float32(z)
        # Define criteria and apply kmeans()
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        ret, label, center = lmc_cv.kmeans(z, 2, None, criteria, 10, lmc_cv.KMEANS_RANDOM_CENTERS)
        # Now separate the data; note the ravel() to flatten the label array.
        cluster_a = z[label.ravel() == 0]
        cluster_b = z[label.ravel() == 1]
        # Plot the data
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.scatter(cluster_a[:, 0], cluster_a[:, 1])
        pyplot.scatter(cluster_b[:, 0], cluster_b[:, 1], c='r')
        pyplot.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
        pyplot.xlabel('Height')
        pyplot.ylabel('Weight')
        pyplot.show()

Color Quantization

Color quantization is the process of reducing the number of colors in an image.

One reason to do so is to reduce memory usage. Some devices are limited such that they can produce only a limited number of colors.

In those cases too, color quantization is performed. Here we use k-means clustering for color quantization.

There is nothing new to explain here. There are three features, say R, G, B. So we need to reshape the image into an array of Mx3 size (M is the number of pixels in the image).

After the clustering, we apply the centroid values (also R, G, B) to all pixels, so that the resulting image has the specified number of colors.

Then we reshape the result back to the shape of the original image.
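The "apply centroid values, then reshape back" step is just NumPy fancy indexing. Here is a sketch on a toy 2x2 image (pure NumPy, with hand-made labels and centers standing in for the kmeans output):

```python
import numpy as np

# Toy 2x2 RGB image flattened to a (M, 3) array of pixels, M = 4.
image = np.array([[[250, 0, 0], [245, 5, 5]],
                  [[0, 0, 250], [5, 5, 245]]], dtype=np.uint8)
pixels = image.reshape((-1, 3))

# Suppose clustering produced 2 centers and one label per pixel.
centers = np.uint8([[247, 2, 2], [2, 2, 247]])
labels = np.array([0, 0, 1, 1])

# Replace every pixel with its cluster center, then restore the image shape.
quantized = centers[labels].reshape(image.shape)
print(np.unique(quantized.reshape(-1, 3), axis=0).shape[0])  # → 2 distinct colors
```

The indexing expression `centers[labels]` builds an (M, 3) array where pixel i gets the row `centers[labels[i]]`, which is exactly what `center[label.flatten()]` does in the listing below.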

Below is the code:

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 2: Color Quantization with K-Means Clustering in OpenCV."""
    # 2: Color Quantization with K-Means Clustering in OpenCV.
    if 2 == method:
        stacking_images = []
        image_file_name = ['D:/99-Research/TestData/image/Castle01.jpg',
                           'D:/99-Research/TestData/image/Castle02.jpg',
                           'D:/99-Research/TestData/image/Castle03.jpg',
                           'D:/99-Research/TestData/image/Castle04.jpg']
        for i in range(len(image_file_name)):
            image = lmc_cv.imread(image_file_name[i])
            image = lmc_cv.cvtColor(image, lmc_cv.COLOR_BGR2RGB)
            stacking_image = image.copy()
            z = image.reshape((-1, 3))
            # Convert to np.float32
            z = np.float32(z)
            # Define criteria, number of clusters and apply kmeans()
            criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
            for clusters_number in range(1, 4):
                ret, label, center = lmc_cv.kmeans(z, 2 ** clusters_number, None, criteria, 10,
                                                   lmc_cv.KMEANS_RANDOM_CENTERS)
                # Now convert back into uint8 and rebuild the quantized image.
                center = np.uint8(center)
                res = center[label.flatten()]
                result_image = res.reshape(image.shape)
                # Stack the original and quantized images side by side.
                stacking_image = np.hstack((stacking_image, result_image))
            stacking_images.append(stacking_image)
        # Display the images
        for i in range(len(stacking_images)):
            pyplot.figure('Color Quantization with K-Means Clustering %d' % (i + 1))
            pyplot.subplot(1, 1, 1)
            pyplot.imshow(stacking_images[i], 'gray')
            pyplot.title('Color Quantization with K-Means Clustering: k=2 k=4 k=8')
            pyplot.xticks([])
            pyplot.yticks([])
            pyplot.savefig('%02d.png' % (i + 1))
            pyplot.show()
        # Close all figures when the user presses 'q'.
        if ord("q") == (lmc_cv.waitKey(0) & 0xFF):
            pyplot.close('all')
    return

