K-means clustering algorithm (NumPy version)

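This version adapts the K-means code from the previous post to NumPy rather than rewriting it from scratch, so it is still somewhat verbose.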

import random
import time

import matplotlib.pyplot as plt
import numpy as np


class KMeans:
    def __init__(self, k=1):
        """
        :param k: number of clusters
        """
        self.__k = k
        self.__data = None         # original data
        self.__pointCenter = None  # current centers; the first ones are drawn at random from __data
        self.__result = [[] for _ in range(k)]  # points assigned to each cluster

    def fit(self, data, threshold, times=50000):
        """
        Train the model.
        :param data: training data, array of shape (n_samples, n_features)
        :param threshold: stop once the mean distance the centers moved is at most this value
        :param times: maximum number of iterations
        :return: (final centers, list of clusters)
        """
        self.__data = data
        self.randomCenter()
        print(self.__pointCenter)
        centerDistance = self.calPointCenterDistance(self.__pointCenter, self.__data)
        # Assign every point to its nearest center.
        for i, temp in enumerate(centerDistance):
            index = np.argmin(temp)
            self.__result[index].append(self.__data[i])
        oldCenterPoint = self.__pointCenter
        newCenterPoint = self.calNewPointCenter(self.__result)
        # Iterate until the centers stop moving or the iteration budget runs out.
        while (times > 0 and
               np.sum(np.sum((oldCenterPoint - newCenterPoint) ** 2, axis=1) ** 0.5) / self.__k > threshold):
            times -= 1
            result = [[] for _ in range(self.__k)]
            # Remember the centers from the previous round.
            oldCenterPoint = newCenterPoint
            centerDistance = self.calPointCenterDistance(newCenterPoint, self.__data)
            # Assign every point to its nearest center.
            for i, temp in enumerate(centerDistance):
                index = np.argmin(temp)
                result[index].append(self.__data[i])
            newCenterPoint = self.calNewPointCenter(result)
            self.__result = result
        self.__pointCenter = newCenterPoint
        return newCenterPoint, self.__result

    def calPointCenterDistance(self, center, data):
        """
        Compute the Euclidean distance between every point and every center.
        :return: array of shape (n_samples, k)
        """
        centerDistance = []
        for temp in data:
            centerDistance.append(np.sum((center - temp) ** 2, axis=1) ** 0.5)
        return np.array(centerDistance)

    def calNewPointCenter(self, result):
        """
        Compute the new centers as the mean of each cluster.
        Assumes every cluster contains at least one point.
        :param result: list of clusters, each a list of points
        :return: array of shape (k, n_features)
        """
        return np.array([np.mean(np.array(temp), axis=0) for temp in result])

    def randomCenter(self):
        """
        Pick the k initial centers at random from the original __data.
        """
        if self.__pointCenter is None:
            index = random.randint(0, len(self.__data) - 1)
            self.__pointCenter = np.array([self.__data[index]])
        while len(self.__pointCenter) < self.__k:
            # Draw a random index.
            index = random.randint(0, len(self.__data) - 1)
            # Add the point only if it is not already a center (compare whole rows).
            if not np.any(np.all(self.__pointCenter == self.__data[index], axis=1)):
                self.__pointCenter = np.vstack((self.__pointCenter, self.__data[index]))


if __name__ == "__main__":
    # Raw data as a NumPy array: 10000 random 2-D integer points in [0, 100)
    data = np.random.randint(0, 100, 20000).reshape(10000, 2)
    startTime = time.time()
    kmeans = KMeans(k=5)
    centerPoint, result = kmeans.fit(data, 0.0001)
    print(time.time() - startTime)
    print(centerPoint)

    plt.title("KMeans Classification")
    tempx = []
    tempy = []
    color = []
    i = 0
    for temp in result:
        temps = np.array(temp)
        color += [i] * len(temps)       # one color value per cluster
        tempx += temps[:, 0].tolist()
        tempy += temps[:, 1].tolist()
        i += 2
    plt.scatter(tempx, tempy, c=color, s=30)
    plt.show()
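
The per-point loop in calPointCenterDistance and the list-based assignment in fit can also be written with NumPy broadcasting. The following is only a sketch and not part of the original post: the function name assign_and_update and the toy data are made up for illustration, and keeping the old center when a cluster is empty is an added assumption.

```python
import numpy as np


def assign_and_update(data, centers):
    """One k-means iteration: assign points to the nearest center, then recompute centers.

    Hypothetical helper for illustration; not a method of the KMeans class above.
    """
    # Broadcasting gives diff the shape (n_points, k, n_features).
    diff = data[:, np.newaxis, :] - centers[np.newaxis, :, :]
    distances = np.sqrt(np.sum(diff ** 2, axis=2))  # shape (n_points, k)
    labels = np.argmin(distances, axis=1)           # index of the nearest center per point
    new_centers = centers.astype(float)             # astype returns a copy
    for j in range(len(centers)):
        members = data[labels == j]
        if len(members) > 0:  # assumption: keep the old center if a cluster is empty
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers


points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]])
start = np.array([[0.0, 0.0], [10.0, 10.0]])
labels, centers = assign_and_update(points, start)
print(labels)
print(centers)
```

Storing one label per point instead of a list of points per cluster avoids rebuilding Python lists in every iteration, which is likely where most of the running time of the loop-based version goes.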

Result plot:
