Implementing the K-means Algorithm in Python

  Implement the K-means algorithm in Python. Use a random-number generator to place 100 points on a 2D plane, then cluster those 100 points with the program. The SSE (Sum of the Squared Errors) can be used to determine the best number of clusters, i.e. the value of K.

  Analysis of the clustering algorithm:

① The program first generates 100 random points on the 2D plane, then randomly selects k of them as the initial centroids;

② For each point, compute its distance to every centroid and assign the point to the cluster of the nearest centroid;

③ For each cluster, take the mean of the coordinates of all its points as the cluster's new centroid;

④ Compare the new centroids with the old ones. If none has changed, the clustering is stable; otherwise repeat steps ②-④, up to a specified limit of n iterations to guard against non-convergence. A compact NumPy sketch of one such iteration follows this list.
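The assignment and update steps above can also be written compactly with NumPy. The following is a minimal illustrative sketch of a single iteration, not the program used for the experiment (the full program appears in the source-code section below); the function name kmeans_step and the (n, 2)/(k, 2) array layout are assumptions made for the example.

import numpy as np

def kmeans_step(points, centroids):
    """points: (n, 2) array of samples; centroids: (k, 2) array. One assignment/update pass."""
    # Step 2: distance from every point to every centroid, then the index of the nearest centroid
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)  # shape (n, k)
    labels = dists.argmin(axis=1)
    # Step 3: each new centroid is the mean of the points assigned to it;
    # an empty cluster keeps its previous centroid
    new_centroids = np.array([
        points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
        for j in range(len(centroids))
    ])
    return labels, new_centroids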

  Analysis of how K is chosen:

  A for loop iterates over a range of K values and computes the SSE for each one. Plotting SSE against K (as the program does at the end) gives an elbow curve; the K at the elbow, where the decrease in SSE flattens out, is taken as the final value. For this data set K = 4 is a reasonable choice. A small heuristic for locating the elbow programmatically is sketched after this paragraph.
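As an optional aside (not part of the original program), the elbow can also be located programmatically by checking where the relative drop in SSE flattens out. The function name pick_k_by_elbow and the 0.25 threshold below are illustrative assumptions; applied to the SSE values printed in the results, this heuristic also points to K = 4.

def pick_k_by_elbow(sse_values, threshold=0.25):
    """sse_values[i] is the SSE obtained with k = i + 1."""
    for i in range(1, len(sse_values)):
        # relative improvement gained by moving from k = i to k = i + 1
        improvement = (sse_values[i - 1] - sse_values[i]) / sse_values[i - 1]
        if improvement < threshold:
            return i  # the last k before the improvement flattens out
    return len(sse_values)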

  Experimental results:

****************************************
k: 1
center_point_new:{0: [43.46, 47.88]}
cluster:{0: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]}
estimator: 171203.39999999997
****************************************
k: 2
center_point_new:{0: [42.86538461538461, 23.692307692307693], 1: [44.104166666666664, 74.08333333333333]}
cluster:{0: [0, 2, 4, 5, 6, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 21, 23, 26, 27, 31, 36, 40, 41, 42, 44, 46, 47, 49, 51, 52, 56, 58, 60, 61, 62, 64, 65, 66, 69, 70, 71, 73, 80, 81, 82, 83, 87, 88, 89, 90, 91, 93], 1: [1, 3, 7, 10, 11, 20, 22, 24, 25, 28, 29, 30, 32, 33, 34, 35, 37, 38, 39, 43, 45, 48, 50, 53, 54, 55, 57, 59, 63, 67, 68, 72, 74, 75, 76, 77, 78, 79, 84, 85, 86, 92, 94, 95, 96, 97, 98, 99]}
estimator: 107785.28044871795
****************************************
k: 3
center_point_new:{0: [77.28571428571429, 18.523809523809526], 1: [57.84848484848485, 73.24242424242425], 2: [17.695652173913043, 43.08695652173913]}
cluster:{0: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 83, 89, 90, 91], 1: [3, 7, 10, 11, 20, 28, 29, 30, 34, 35, 37, 38, 39, 48, 50, 54, 55, 63, 68, 72, 74, 76, 78, 79, 84, 85, 92, 94, 95, 96, 97, 98, 99], 2: [0, 1, 4, 5, 9, 13, 15, 18, 19, 21, 22, 24, 25, 27, 31, 32, 33, 40, 41, 42, 43, 44, 45, 46, 47, 49, 51, 52, 53, 56, 57, 59, 60, 66, 67, 69, 70, 75, 77, 80, 81, 82, 86, 87, 88, 93]}
estimator: 69427.21814417467
****************************************
k: 4
center_point_new:{0: [78.75, 18.3], 1: [64.6923076923077, 69.26923076923077], 2: [20.333333333333332, 81.04761904761905], 3: [20.060606060606062, 27.848484848484848]}
cluster:{0: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 89, 90, 91], 1: [3, 7, 10, 11, 20, 25, 28, 29, 30, 34, 35, 39, 48, 50, 54, 55, 72, 74, 78, 79, 84, 85, 92, 94, 95, 98], 2: [1, 24, 32, 33, 37, 38, 43, 45, 53, 57, 59, 63, 67, 68, 75, 76, 77, 86, 96, 97, 99], 3: [0, 4, 5, 9, 13, 15, 18, 19, 21, 22, 27, 31, 40, 41, 42, 44, 46, 47, 49, 51, 52, 56, 60, 66, 69, 70, 80, 81, 82, 83, 87, 88, 93]}
estimator: 39536.34410589411
****************************************
k: 5
center_point_new:{0: [73.0, 74.33333333333333], 1: [77.28571428571429, 18.523809523809526], 2: [15.75, 14.6875], 3: [20.77777777777778, 85.33333333333333], 4: [29.0, 47.77777777777778]}
cluster:{0: [3, 7, 11, 20, 28, 29, 30, 34, 48, 50, 55, 78, 79, 85, 92, 94, 95, 98], 1: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 58, 61, 62, 64, 65, 71, 73, 83, 89, 90, 91], 2: [4, 15, 18, 19, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 3: [1, 24, 32, 37, 38, 43, 45, 53, 59, 63, 67, 68, 75, 76, 84, 96, 97, 99], 4: [0, 5, 9, 10, 13, 22, 25, 33, 35, 39, 40, 41, 46, 47, 51, 52, 54, 57, 60, 70, 72, 74, 77, 81, 82, 86, 88]}
estimator: 30705.73908730159
****************************************
k: 6
center_point_new:{0: [44.42857142857143, 47.0], 1: [88.0, 34.125], 2: [77.36363636363636, 8.0], 3: [20.952380952380953, 82.04761904761905], 4: [14.16, 24.92], 5: [76.0, 78.14285714285714]}
cluster:{0: [0, 3, 9, 10, 11, 25, 35, 39, 40, 46, 47, 54, 58, 72, 73, 74, 77, 81, 82, 83, 92], 1: [6, 8, 23, 65, 71, 78, 89, 90], 2: [2, 12, 14, 16, 17, 26, 36, 61, 62, 64, 91], 3: [1, 24, 32, 33, 37, 38, 43, 45, 53, 57, 59, 63, 67, 68, 75, 76, 84, 86, 96, 97, 99], 4: [4, 5, 13, 15, 18, 19, 21, 22, 27, 31, 41, 42, 44, 49, 51, 52, 56, 60, 66, 69, 70, 80, 87, 88, 93], 5: [7, 20, 28, 29, 30, 34, 48, 50, 55, 79, 85, 94, 95, 98]}
estimator: 26203.382359307354
****************************************
k: 7
center_point_new:{0: [46.857142857142854, 41.5], 1: [81.22222222222223, 17.0], 2: [47.84615384615385, 72.92307692307692], 3: [83.9090909090909, 78.36363636363636], 4: [13.615384615384615, 86.76923076923077], 5: [15.666666666666666, 13.533333333333333], 6: [16.9375, 47.5]}
cluster:{0: [9, 25, 39, 40, 46, 47, 54, 58, 72, 73, 74, 82, 83, 92], 1: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 61, 62, 64, 65, 71, 89, 90, 91], 2: [3, 10, 11, 20, 30, 35, 37, 48, 68, 76, 84, 85, 99], 3: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98], 4: [1, 24, 32, 38, 43, 45, 53, 59, 63, 67, 75, 96, 97], 5: [4, 15, 18, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 6: [0, 5, 13, 19, 22, 33, 41, 51, 52, 57, 60, 70, 77, 81, 86, 88]}
estimator: 19237.784108946606
****************************************
k: 8
center_point_new:{0: [83.9090909090909, 78.36363636363636], 1: [18.125, 47.9375], 2: [38.44444444444444, 83.22222222222223], 3: [20.105263157894736, 16.63157894736842], 4: [79.94736842105263, 17.42105263157895], 5: [10.0, 74.66666666666667], 6: [8.0, 92.85714285714286], 7: [50.0, 55.5625]}
cluster:{0: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98], 1: [0, 5, 13, 22, 33, 40, 41, 51, 52, 57, 60, 70, 77, 81, 86, 88], 2: [30, 37, 38, 68, 76, 84, 96, 97, 99], 3: [4, 15, 18, 19, 21, 27, 31, 42, 44, 46, 47, 49, 56, 66, 69, 80, 83, 87, 93], 4: [2, 6, 8, 12, 14, 16, 17, 23, 26, 36, 61, 62, 64, 65, 71, 73, 89, 90, 91], 5: [1, 43, 67], 6: [24, 32, 45, 53, 59, 63, 75], 7: [3, 9, 10, 11, 20, 25, 35, 39, 48, 54, 58, 72, 74, 82, 85, 92]}
estimator: 19305.17060644034
****************************************
k: 9
center_point_new:{0: [53.5, 20.5], 1: [15.785714285714286, 12.428571428571429], 2: [26.857142857142858, 54.142857142857146], 3: [12.090909090909092, 40.81818181818182], 4: [35.45454545454545, 82.63636363636364], 5: [85.13333333333334, 18.866666666666667], 6: [49.6, 56.93333333333333], 7: [5.25, 89.25], 8: [83.9090909090909, 78.36363636363636]}
cluster:{0: [16, 36, 46, 47, 58, 73, 83, 91], 1: [15, 18, 21, 27, 31, 42, 44, 49, 56, 66, 69, 80, 87, 93], 2: [0, 33, 40, 57, 77, 81, 86], 3: [4, 5, 13, 19, 22, 41, 51, 52, 60, 70, 88], 4: [1, 30, 37, 38, 63, 68, 76, 84, 96, 97, 99], 5: [2, 6, 8, 12, 14, 17, 23, 26, 61, 62, 64, 65, 71, 89, 90], 6: [3, 9, 10, 11, 20, 25, 35, 39, 48, 54, 72, 74, 82, 85, 92], 7: [24, 32, 43, 45, 53, 59, 67, 75], 8: [7, 28, 29, 34, 50, 55, 78, 79, 94, 95, 98]}
estimator: 14449.77272727272

  Source code:

# coding=utf-8
import numpy as np
from numpy import random
import matplotlib.pyplot as plt


# K-means: assign points to clusters, update the centroids, and recurse until the
# centroids stop changing or the iteration limit is reached
def k_means(matrix, center_point, k, n):
    # 2. For each sample point, find the nearest centroid and label the point with that centroid's cluster
    cluster = {}
    for i in range(0, k):
        cluster[i] = []
    for i in range(0, 100):
        # distances from point i to every centroid, plus a map from distance back to centroid index
        d = []
        index = {}
        for j in range(0, k):
            s = ((matrix[0][i] - center_point[j][0]) ** 2 + (matrix[1][i] - center_point[j][1]) ** 2) ** 0.5
            index[s] = j
            d.append(s)
        d.sort()
        # assign point i to the cluster of the nearest centroid
        cluster[index[d[0]]].append(i)

    # 3. Recompute the centroid of each of the k clusters as the mean of its points;
    #    an empty cluster keeps its previous centroid
    center_point_new = {}
    for i in cluster:
        if len(cluster[i]) != 0:
            x = 0
            y = 0
            for point in cluster[i]:
                x += matrix[0][point]
                y += matrix[1][point]
            center_point_new[i] = [x / len(cluster[i]), y / len(cluster[i])]
        else:
            center_point_new[i] = center_point[i]

    # 4. Check whether any centroid has changed
    is_same = True
    for i in center_point:
        if center_point[i] != center_point_new[i]:
            is_same = False

    # Stop when the centroids are unchanged or after 10 iterations (to guard against non-convergence)
    if is_same or n >= 10:
        estimator = computeSSE(matrix, center_point_new, cluster)
        return center_point_new, cluster, estimator
    else:
        return k_means(matrix, center_point_new, k=k, n=n + 1)


# Compute the SSE: the sum of squared distances from each point to its cluster centroid
def computeSSE(matrix, center_point, cluster):
    estimator = 0
    for i in cluster:
        if len(cluster[i]) != 0:
            for point in cluster[i]:
                estimator += (matrix[0][point] - center_point[i][0]) ** 2 + (matrix[1][point] - center_point[i][1]) ** 2
    return estimator


if __name__ == '__main__':
    # 100 random points on the 2D plane: row 0 holds the x coordinates, row 1 the y coordinates
    matrix = np.array(random.randint(100, size=(2, 100)))

    # Use the SSE to select the best k
    SSE = []  # SSE of each run
    for k in range(1, 10):  # range of k: 1-9
        print('*' * 40)
        print('k:', k)
        # 1. Randomly pick k of the points as the initial centroids
        center_point_index = np.array(random.randint(100, size=k))
        center_point = {}
        for i in range(0, k):
            center_point[i] = [matrix[0][center_point_index[i]], matrix[1][center_point_index[i]]]
        center_point_new, cluster, estimator = k_means(matrix, center_point, k=k, n=1)
        print('center_point_new:\n', center_point_new, '\ncluster:\n', cluster, '\nestimator:', estimator)
        SSE.append(estimator)

    # Plot SSE against k and choose the elbow of the curve as the final k
    X = range(1, 10)
    plt.xlabel('k')
    plt.ylabel('SSE')
    plt.plot(X, SSE, 'o-')
    plt.show()
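As an optional cross-check (not part of the original assignment), scikit-learn's KMeans reports the within-cluster SSE as its inertia_ attribute, so the hand-written program can be sanity-checked against it on a fresh set of random points; the seed values below are arbitrary.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
points = rng.integers(0, 100, size=(100, 2)).astype(float)  # 100 random 2D points

for k in range(1, 10):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    print('k:', k, 'SSE:', model.inertia_)  # inertia_ is the within-cluster sum of squared distances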
