KMeans Clustering Algorithm (No-NumPy Version)

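KMeans: randomly generate M points in a plane coordinate system and decide on N clusters. Randomly pick N of the M points as the initial cluster centers. Compute the distance from every point to each of these N centers, and assign each point to its nearest center. Then, for each of the N resulting clusters, compute a new center as the mean of all points in that cluster; this point is called the centroid. The assignment and update steps repeat until the centroids stop moving. In the code below, k plays the role of N, and the loop stops once the average centroid shift falls below a threshold.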
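The assign-then-average loop described above fits in a few lines of plain Python. This is a minimal sketch, not the author's code. The helper names `assign`, `update` and `kmeans` are hypothetical, and `math.dist` requires Python 3.8+. The initial centers are passed in explicitly instead of being drawn at random, so the run is reproducible.

```python
import math


def assign(points, centers):
    """For each point, return the index of its nearest center (Euclidean distance)."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c])) for p in points]


def update(points, labels, centers):
    """Move each center to the per-dimension mean of its points; an empty cluster keeps its old center."""
    new_centers = []
    for c, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == c]
        if not members:
            new_centers.append(list(old))
            continue
        new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
    return new_centers


def kmeans(points, centers, threshold=1e-4, max_iter=100):
    """Alternate assign/update until the average center shift is <= threshold (or max_iter is hit)."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        shift = sum(math.dist(a, b) for a, b in zip(centers, new_centers)) / len(centers)
        centers = new_centers
        if shift <= threshold:
            break
    return centers, labels


if __name__ == "__main__":
    points = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]]
    # Both starting centers lie in the lower-left group on purpose
    print(kmeans(points, [[1, 1], [1, 2]]))
    # → ([[1.3333333333333333, 1.3333333333333333], [8.333333333333334, 8.333333333333334]], [0, 0, 0, 1, 1, 1])
```

Even though both starting centers come from the same group, the first update pulls the second center toward the far group. After three rounds the centers settle at the two group means.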

import random
import matplotlib.pyplot as plt


class KMeans:
    def __init__(self, k=1):
        '''
        :param k: number of clusters
        '''
        self.__k = k
        self.__data = []         # raw input points
        self.__pointCenter = []  # current centers; the first ones are drawn at random from __data
        self.__result = [[] for _ in range(k)]  # one list of points per cluster: [[], [], [], [], []]

    def fit(self, data, threshold, times=50000):
        '''
        Train the model.
        :param data: training points, e.g. [[x, y], ...]
        :param threshold: stop once the average center shift is no larger than this value
        :param times: maximum number of update iterations, so the loop always terminates
        :return: (final centers, list of clusters)
        '''
        self.__data = data
        self.__pointCenter = []
        self.randomCenter()
        # Assign every point to its nearest initial center
        self.__result = [[] for _ in range(self.__k)]
        centerDistance = self.calPointCenterDistance(self.__pointCenter, self.__data)
        for point, dists in zip(self.__data, centerDistance):
            self.__result[dists.index(min(dists))].append(point)
        oldCenterPoint = self.__pointCenter
        newCenterPoint = self.calNewPointCenter(self.__result, oldCenterPoint)
        # Re-assign and re-average until the centers stop moving or the iteration budget runs out
        while times > 0 and self.calCenterToCenterDistance(oldCenterPoint, newCenterPoint) > threshold:
            times -= 1
            oldCenterPoint = newCenterPoint  # remember the previous centers
            result = [[] for _ in range(self.__k)]
            centerDistance = self.calPointCenterDistance(newCenterPoint, self.__data)
            for point, dists in zip(self.__data, centerDistance):
                result[dists.index(min(dists))].append(point)
            newCenterPoint = self.calNewPointCenter(result, oldCenterPoint)
            self.__result = result
        self.__pointCenter = newCenterPoint
        return newCenterPoint, self.__result

    def calCenterToCenterDistance(self, old, new):
        '''
        Average distance between the previous centers and the newly computed ones.
        '''
        total = 0
        for point1, point2 in zip(old, new):
            total += self.distance(point1, point2)
        return total / len(old)

    def calPointCenterDistance(self, center, data):
        '''
        Distance from every point to every center: one row per point, one column per center.
        '''
        return [[self.distance(temp, point) for point in center] for temp in data]

    def calNewPointCenter(self, result, oldCenters):
        '''
        New center of each cluster = per-dimension mean of its points.
        An empty cluster keeps its previous center instead of raising IndexError.
        '''
        newCenterPoint = []
        for cluster, old in zip(result, oldCenters):
            if not cluster:
                newCenterPoint.append(old)
                continue
            # zip(*cluster) transposes [[x1, y1], [x2, y2], ...] into (x1, x2, ...), (y1, y2, ...)
            newCenterPoint.append([sum(dim) / len(dim) for dim in zip(*cluster)])
        return newCenterPoint

    def distance(self, pointer1, pointer2):
        '''
        Euclidean distance between two points of any (equal) dimension.
        '''
        return sum((x1 - x2) ** 2 for x1, x2 in zip(pointer1, pointer2)) ** 0.5

    def randomCenter(self):
        '''
        Randomly pick k distinct points from __data as the initial centers.
        '''
        if len({tuple(p) for p in self.__data}) < self.__k:
            raise ValueError("data must contain at least k distinct points")
        while len(self.__pointCenter) < self.__k:
            index = random.randint(0, len(self.__data) - 1)  # random index into __data
            # add the point only if it is not already a center
            if self.__data[index] not in self.__pointCenter:
                self.__pointCenter.append(self.__data[index])


if __name__ == "__main__":
    data = [[random.randint(1, 100), random.randint(1, 100)] for _ in range(1000)]
    for _ in range(10):  # 10 independent runs, each from different random initial centers
        kmeans = KMeans(k=5)
        centerPoint, result = kmeans.fit(data, 0.0001)
        print(centerPoint)
        plt.title("KMeans Classification")
        tempx, tempy, color = [], [], []
        for label, cluster in enumerate(result):
            tempx += [p[0] for p in cluster]
            tempy += [p[1] for p in cluster]
            color += [label] * len(cluster)  # one color value per cluster
        plt.scatter(tempx, tempy, c=color, s=30)
        plt.show()
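The initial centers are random, so the main block runs the clustering 10 times, and the plots can differ from run to run. One common way to choose among such runs is to keep the run with the smallest within-cluster sum of squared distances (SSE). This is an assumption, not part of the original post, and the sketch below is standalone plain Python (3.8+ for `math.dist`). The helper names `run_once`, `sse` and `best_of` are hypothetical. A seeded `random.Random` keeps the result reproducible.

```python
import math
import random


def sse(clusters, centers):
    """Within-cluster sum of squared Euclidean distances (lower means tighter clusters)."""
    return sum(math.dist(p, c) ** 2 for cluster, c in zip(clusters, centers) for p in cluster)


def run_once(data, k, rng, threshold=1e-4, max_iter=300):
    """One k-means run starting from k distinct random data points; returns (centers, clusters)."""
    distinct = [list(p) for p in sorted({tuple(p) for p in data})]
    centers = rng.sample(distinct, k)  # raises ValueError if there are fewer than k distinct points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in data:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its old center
        new_centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else old
            for cl, old in zip(clusters, centers)
        ]
        shift = sum(math.dist(a, b) for a, b in zip(centers, new_centers)) / k
        centers = new_centers
        if shift <= threshold:
            break
    return centers, clusters


def best_of(data, k, runs=10, seed=0):
    """Repeat k-means from different random starts and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        centers, clusters = run_once(data, k, rng)
        score = sse(clusters, centers)
        if best is None or score < best[0]:
            best = (score, centers, clusters)
    return best


if __name__ == "__main__":
    data = [[random.randint(1, 100), random.randint(1, 100)] for _ in range(1000)]
    score, centers, clusters = best_of(data, k=5)
    print(round(score, 1), centers)
```

The same idea carries over to the class in this post: call `fit` several times and plot only the result whose clusters have the lowest SSE.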

[Figure: KMeans clustering result plot]
