Python+OpenCV: K-Means Clustering

Goal

  • Learn to use the cv.kmeans() function in OpenCV for data clustering.

Understanding Parameters

Input parameters:

  1. samples : Data of np.float32 type; each feature should be put in a single column.
  2. nclusters(K) : The number of clusters required at the end.
  3. criteria : The iteration termination criterion. When this criterion is satisfied, the algorithm iteration stops. It is a tuple of three parameters, `( type, max_iter, epsilon )`:
    1. type of termination criterion. It has 3 flags as below:

      • cv.TERM_CRITERIA_EPS - stop the algorithm iteration if the specified accuracy, epsilon, is reached.
      • cv.TERM_CRITERIA_MAX_ITER - stop the algorithm after the specified number of iterations, max_iter.
      • cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER - stop the iteration when either of the above conditions is met.
    2. max_iter - An integer specifying the maximum number of iterations.
    3. epsilon - The required accuracy.
  4. attempts : The number of times the algorithm is executed using different initial labellings. The algorithm returns the labels that yield the best compactness, and this compactness is returned as output.
  5. flags : Specifies how the initial centers are taken. Normally two flags are used for this: 
    cv.KMEANS_PP_CENTERS (use kmeans++ center initialization by Arthur and Vassilvitskii [Arthur2007]) and cv.KMEANS_RANDOM_CENTERS (select random initial centers in each attempt).

Output parameters:

  1. compactness : The sum of squared distances from each point to its corresponding center.
  2. labels : The label array (same as 'code' in the previous article) where each element is marked '0', '1', and so on.
  3. centers : The array of cluster centers.

Now we will see how to apply the K-Means algorithm with three examples.

Data with Only One Feature

Consider a set of data with only one feature, i.e. one-dimensional.

For example, take our t-shirt problem where only people's heights are used to decide the t-shirt size.

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 0: Data with Only One Feature with K-Means Clustering in OpenCV."""
    # 0: Data with Only One Feature with K-Means Clustering in OpenCV.
    if 0 == method:
        x = np.random.randint(25, 100, 25)
        y = np.random.randint(175, 255, 25)
        z = np.hstack((x, y))
        z = z.reshape((50, 1))
        z = np.float32(z)
        pyplot.figure('Data Histogram', figsize=(16, 9))
        pyplot.hist(z, 256, [0, 256])
        pyplot.show()
        # Define criteria = ( type, max_iter = 10, epsilon = 1.0 )
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        # Set flags (just to avoid a line break in the kmeans() call)
        flags = lmc_cv.KMEANS_RANDOM_CENTERS
        # Apply KMeans
        compactness, labels, centers = lmc_cv.kmeans(z, 2, None, criteria, 10, flags)
        # Split the data into clusters depending on their labels.
        cluster_a = z[labels == 0]
        cluster_b = z[labels == 1]
        # Plot cluster A in red, cluster B in blue, and the centers in yellow.
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.hist(cluster_a, 256, [0, 256], color='r')
        pyplot.hist(cluster_b, 256, [0, 256], color='b')
        pyplot.hist(centers, 32, [0, 256], color='y')
        pyplot.show()

Data with Multiple Features

In the previous example, we took only height for the t-shirt problem. Here we will take both height and weight, i.e. two features.

Remember, in the previous case, we arranged our data into a single column vector. Each feature is arranged in a column, while each row corresponds to an input sample.

For example, in this case we set up test data of size 50x2: the heights and weights of 50 people.

The first column corresponds to the heights of all 50 people and the second column to their weights.

The first row contains two elements, where the first is the height of the first person and the second is his weight.

Similarly, the remaining rows correspond to the heights and weights of the other people.
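This row/column layout can be seen with a tiny example (three people instead of 50; the numbers are made up for illustration):

```python
import numpy as np

# Each row is one person: [height, weight]. Column 0 holds all heights,
# column 1 holds all weights.
heights = np.float32([150.0, 165.0, 180.0])
weights = np.float32([50.0, 60.0, 75.0])
samples = np.column_stack((heights, weights))  # shape (3, 2)

print(samples[:, 0])  # first column, all heights: [150. 165. 180.]
print(samples[0])     # first row, first person's [height, weight]: [150.  50.]
```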

The code below demonstrates this layout:

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 1: Data with Multiple Features with K-Means Clustering in OpenCV."""
    # 1: Data with Multiple Features with K-Means Clustering in OpenCV.
    if 1 == method:
        x = np.random.randint(25, 50, (25, 2))
        y = np.random.randint(60, 85, (25, 2))
        z = np.vstack((x, y))
        # Convert to np.float32
        z = np.float32(z)
        # Define criteria and apply kmeans()
        criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
        ret, label, center = lmc_cv.kmeans(z, 2, None, criteria, 10, lmc_cv.KMEANS_RANDOM_CENTERS)
        # Now separate the data; note the ravel() to flatten the label array.
        cluster_a = z[label.ravel() == 0]
        cluster_b = z[label.ravel() == 1]
        # Plot the data
        pyplot.figure('Result', figsize=(16, 9))
        pyplot.scatter(cluster_a[:, 0], cluster_a[:, 1])
        pyplot.scatter(cluster_b[:, 0], cluster_b[:, 1], c='r')
        pyplot.scatter(center[:, 0], center[:, 1], s=80, c='y', marker='s')
        pyplot.xlabel('Height')
        pyplot.ylabel('Weight')
        pyplot.show()

Color Quantization

Color quantization is the process of reducing the number of colors in an image.

One reason to do so is to reduce memory usage. Some devices are limited such that they can produce only a limited number of colors.

In those cases too, color quantization is performed. Here we use k-means clustering for color quantization.

There is nothing new to explain here. There are three features, say R, G, B. So we need to reshape the image into an array of Mx3 size (M is the number of pixels in the image).

After the clustering, we apply the centroid values (also R, G, B) to all pixels, so that the resulting image has the specified number of colors.

Then we reshape the result back to the shape of the original image.
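The "apply centroid values, then reshape back" step is just NumPy fancy indexing. Here is a sketch on a toy 2x2 image (pure NumPy, with hand-made labels and centers standing in for the kmeans output):

```python
import numpy as np

# Toy 2x2 RGB image flattened to a (M, 3) array of pixels, M = 4.
image = np.array([[[250, 0, 0], [245, 5, 5]],
                  [[0, 0, 250], [5, 5, 245]]], dtype=np.uint8)
pixels = image.reshape((-1, 3))

# Suppose clustering produced 2 centers and one label per pixel.
centers = np.uint8([[247, 2, 2], [2, 2, 247]])
labels = np.array([0, 0, 1, 1])

# Replace every pixel with its cluster center, then restore the image shape.
quantized = centers[labels].reshape(image.shape)
print(np.unique(quantized.reshape(-1, 3), axis=0).shape[0])  # → 2 distinct colors
```

The indexing expression `centers[labels]` builds an (M, 3) array where pixel i gets the row `centers[labels[i]]`, which is exactly what `center[label.flatten()]` does in the listing below.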

Below is the code:

####################################################################################################
# K-Means Clustering
import numpy as np
import cv2 as lmc_cv
from matplotlib import pyplot


def lmc_cv_k_means_demo(method):
    """Demo function. method 2: Color Quantization with K-Means Clustering in OpenCV."""
    # 2: Color Quantization with K-Means Clustering in OpenCV.
    if 2 == method:
        stacking_images = []
        image_file_name = ['D:/99-Research/TestData/image/Castle01.jpg',
                           'D:/99-Research/TestData/image/Castle02.jpg',
                           'D:/99-Research/TestData/image/Castle03.jpg',
                           'D:/99-Research/TestData/image/Castle04.jpg']
        for i in range(len(image_file_name)):
            image = lmc_cv.imread(image_file_name[i])
            image = lmc_cv.cvtColor(image, lmc_cv.COLOR_BGR2RGB)
            stacking_image = image.copy()
            z = image.reshape((-1, 3))
            # Convert to np.float32
            z = np.float32(z)
            # Define criteria, number of clusters and apply kmeans()
            criteria = (lmc_cv.TERM_CRITERIA_EPS + lmc_cv.TERM_CRITERIA_MAX_ITER, 10, 1.0)
            for clusters_number in range(1, 4):
                ret, label, center = lmc_cv.kmeans(z, 2 ** clusters_number, None, criteria, 10,
                                                   lmc_cv.KMEANS_RANDOM_CENTERS)
                # Now convert back into uint8 and rebuild the quantized image.
                center = np.uint8(center)
                res = center[label.flatten()]
                result_image = res.reshape(image.shape)
                # Stack the original and quantized images side by side.
                stacking_image = np.hstack((stacking_image, result_image))
            stacking_images.append(stacking_image)
        # Display the images
        for i in range(len(stacking_images)):
            pyplot.figure('Color Quantization with K-Means Clustering %d' % (i + 1))
            pyplot.subplot(1, 1, 1)
            pyplot.imshow(stacking_images[i], 'gray')
            pyplot.title('Color Quantization with K-Means Clustering: k=2 k=4 k=8')
            pyplot.xticks([])
            pyplot.yticks([])
            pyplot.savefig('%02d.png' % (i + 1))
            pyplot.show()
        # Close all figures when the user presses 'q'.
        if ord("q") == (lmc_cv.waitKey(0) & 0xFF):
            pyplot.close('all')
    return

