kmeans算法练习

在 http://taoblog421.cn/posts/27782ca8/#more的基础上完成一个练习

现在有部分餐饮客户的消费数据存于数据文件consumption. csv，其中R表示最近一次消费时间间隔，F表示消费频率，M表示消费总金额。编程实现K-Means聚类算法，将客户分类成3类客户群，并评价这些客户群的价值。

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-RRIQbEQ5-1609490837677)(https://i.loli.net/2021/01/01/sgInCyRraW6itLm.png)]
（图片有问题可直接复制到浏览器打开）

话不多说，直接上代码：

import utils.FileRead;import java.util.*;/*** @Author liaotao* @Date 2021/1/1 15:27*/
public class Kmeans {// 节点列表private static List<Node> nodeList = new ArrayList<>();// 质心private static Map<Integer,Node> centroid = new HashMap<>();public static void main(String[] args) throws Exception {initNodes("f:/consumption.csv");centroid = randomCentroidByKey(3);doIteration();printResult();}/*** 输出聚类结果*/public static void printResult() {for (Node node : nodeList) {System.out.println(Arrays.toString(node.getAttributes()) + "belongs to cluster " + node.getLabel());}for (Integer integer : centroid.keySet()) {List<String> ids = new ArrayList<>();for (Node node : nodeList) {if (node.getLabel() == integer) {ids.add(node.getId());}}System.out.println("属于cluster" + integer + "的id有"+ ids.toString());}}/*** 关键代码，迭代进行聚类*/public static void doIteration() {while (true) {// 1.计算各个节点到质心的距离，最近的就属于哪个簇for (Node node : nodeList) {// 暂时存储一个节点与所有质心的距离，方便取最小值Map<Double,Integer> distance = new HashMap<>();for (Map.Entry<Integer, Node> entry : centroid.entrySet()) {distance.put(getDistance(node,entry.getValue()), entry.getKey());}//找最小值double min = 0;for (Double value : distance.keySet()) {if (min < value) {min = value;}}node.setLabel(distance.get(min));}// 2.计算同一 cluster 中，也就是相同 label 的点向量的平均值，作为新的质心；// 保留旧的质心用于比较Map<Integer,Node> oldCentroid = centroid;// 清空旧的质心centroid = new HashMap<>();int count = 1;List<Integer> labelList = new ArrayList<>();List<CentroidSupport> centroidSupportList = new ArrayList<>();for (Node node : nodeList) {if (! labelList.contains(node.getLabel())) {CentroidSupport centroidSupport = new CentroidSupport(node.getLabel());labelList.add(node.getLabel());centroidSupport.getNodeList().add(node);centroidSupportList.add(centroidSupport);} else {for (CentroidSupport centroidSupport : centroidSupportList) {if (centroidSupport.getLabel() == node.getLabel()) {centroidSupport.getNodeList().add(node);}}}}for (CentroidSupport centroidSupport : centroidSupportList) {Node avg = CentroidSupport.getAvg(centroidSupport.getNodeList());centroid.put(count,avg);count ++;}// 判断质心是否变化，若无变化则退出循环boolean falg = false;for (Map.Entry<Integer, Node> entry : centroid.entrySet()) {Node node1 = centroid.get(entry.getKey());Node node2 = oldCentroid.get(entry.getKey());if (node1.getLabel() == node2.getLabel() && node1.getAttributes() != node2.getAttributes()) {falg = true;}}if (falg) {break;}}}/*** 初始化数据* @param filepath 文件目录*/public static void initNodes(String filepath) throws Exception {List<String> list = FileRead.read(filepath);for (int i = 1; i < list.size(); i++) {List<String> asList = Arrays.asList(list.get(i).split(","));Node node = new Node();node.setId(asList.get(0));double[] attArray = new double[6];attArray[0] = Double.parseDouble(asList.get(1));attArray[1] = Double.parseDouble(asList.get(2));attArray[2] = Double.parseDouble(asList.get(3));node.setAttributes(attArray);nodeList.add(node);}}/*** 估计k值和初始质心* @param nodeList* @return*/public static Map<Integer,Node> computeK(List<Node> nodeList) {// 计算所有点之间距离的平均值// 辅助计算节点列表List<NodeSupport> nodeSupportList = new ArrayList<>();double distanceSum = 0;double distanceAvg;int count = 0;//最终返回结果mapMap<Integer,Node> resultMap = new HashMap<>();for (int i = 0; i < nodeList.size(); i++) {for (int j = i+1; j < nodeList.size(); j++) {distanceSum += getDistance(nodeList.get(i),nodeList.get(j));// 填充nodeSupportListnodeSupportList.add(new NodeSupport(nodeList.get(i),nodeList.get(j),getDistance(nodeList.get(i),nodeList.get(j))));count ++;}}distanceAvg = distanceSum/count;//还要用到计数器count = 3;// 选择初始质心的时候，先选择最远的两个点// 这时想到应该封装一个类，类属性有节点1，节点2，他们之间的距离 方便后续操作//得到距离最远的两个点NodeSupport max = Collections.max(nodeSupportList, (n1, n2) -> (int) (n1.getDistance() - n2.getDistance()));resultMap.put(1,max.getNode1());resultMap.put(2,max.getNode2());// 接下来从这最两个点开始，与这最两个点距离都大于平均距离的点可视为新发现的质心，否则不视之为质心for (Node node : nodeList) {if (getDistance(node,max.getNode1()) > distanceAvg && getDistance(node,max.getNode2()) > distanceAvg) {//新的质心resultMap.put(count,node);count ++;}}return resultMap;}/*** 根据key生成质心* @param k* @return*/public static Map<Integer,Node> randomCentroidByKey(int k) {Map<Integer,Node> resultMap = new HashMap<>();Random random = new Random();for (int i = 1; i <= k; i++) {int index = random.nextInt(nodeList.size());resultMap.put(i,nodeList.get(index));}return resultMap;}/*** 计算两个结点之间得到欧式距离的平方（为了避免比较时出现奇怪的问题，直接用平方来比较）* @param n1 节点1* @param n2 节点2* @return 欧氏距离的平方*/public static double getDistance(Node n1,Node n2) {double distance = 0;for (int i = 0; i < n1.getAttributes().length; i++) {distance += (n1.getAttributes()[i] - n2.getAttributes()[i]) * (n1.getAttributes()[i] - n2.getAttributes()[i]);}return distance;}
}/*** 定义分类的节点*/
class Node {private String id; //标识每条数据private int label; //label(标签)用来记录这个点属于哪个cluster(簇)private double[] attributes = new double[6]; //存放属性，使用数组是可以存放多维的属性public String getId() {return id;}public void setId(String id) {this.id = id;}public int getLabel() {return label;}public void setLabel(int label) {this.label = label;}public double[] getAttributes() {return attributes;}public void setAttributes(double[] attributes) {this.attributes = attributes;}@Overridepublic String toString() {return "Node{" +"id='" + id + '\'' +", label=" + label +", attributes=" + Arrays.toString(attributes) +'}';}
}/*** 辅助完成功能的类（完成质点的选择）*/
class NodeSupport{private Node node1;private Node node2;private double distance;public NodeSupport(Node node1, Node node2, double distance) {this.node1 = node1;this.node2 = node2;this.distance = distance;}public Node getNode1() {return node1;}public void setNode1(Node node1) {this.node1 = node1;}public Node getNode2() {return node2;}public void setNode2(Node node2) {this.node2 = node2;}public double getDistance() {return distance;}public void setDistance(double distance) {this.distance = distance;}@Overridepublic String toString() {return "NodeSupport{" +"node1=" + node1 +", node2=" + node2 +", distance=" + distance +'}';}
}/*** 辅助完成质点划分（迭代时算出平均值）*/
class CentroidSupport{private Integer label;private List<Node> nodeList = new ArrayList<>();/*** 根据节点列表算出平均值* @param list* @return*/public static Node getAvg(List<Node> list) {Node sum = new Node();for (Node node : list) {for (int i = 0; i < node.getAttributes().length; i++) {sum.getAttributes()[i] += node.getAttributes()[i];}}for (int i = 0; i < sum.getAttributes().length; i++) {sum.getAttributes()[i] /= list.size();}return sum;}public Integer getLabel() {return label;}public void setLabel(Integer label) {this.label = label;}public List<Node> getNodeList() {return nodeList;}public void setNodeList(List<Node> nodeList) {this.nodeList = nodeList;}public CentroidSupport(Integer label) {this.label = label;}@Overridepublic String toString() {return "CentroidSupport{" +"label=" + label +", nodeList=" + nodeList +'}';}
}

运行结果：

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-etKoSc4I-1609490837680)(https://i.loli.net/2021/01/01/TJyXfomLld25Qsk.png)]

离谱的怎么分都分不成三类

kmeans算法练习相关推荐

机器学习中的聚类算法（1）：k-means算法
一文详解激光点云的物体聚类:https://mp.weixin.qq.com/s/FmMJn2qjtylUMRGrD5telw 引言: Q:什么是聚类算法? 现在我们在做的深度学习当中,比如图像的识别 ...
python实现K-means算法
K-means算法流程: 随机选k个样本作为初始聚类中心计算数据集中每个样本到k个聚类中心距离,并将其分配到距离最小的聚类中心对于每个聚类,重新计算中心回到2,至得到局部最优解 python代码 ...
Python之机器学习K-means算法实现
一.前言: 今天在宿舍弄了一个下午的代码,总算还好,把这个东西算是熟悉了,还不算是力竭,只算是知道了怎么回事.今天就给大家分享一下我的代码.代码可以运行,运行的Python环境是Python3.6以上 ...
matlab 职坐标,机器学习入门之机器学习实战ByMatlab（四）二分K-means算法
本文主要向大家介绍了机器学习入门之机器学习实战ByMatlab(四)二分K-means算法,通过具体的内容向大家展现,希望对大家学习机器学习入门有所帮助.前面我们在是实现K-means算法的时候,提到 ...
一文详尽系列之K-means算法
点击上方"Datawhale",选择"星标"公众号第一时间获取价值内容 K-means 是我们最常用的基于距离的聚类算法,其认为两个目标的距离越近,相似度越大 ...
标准K-means算法的缺陷、K-mean++初始化算法、初始化算法步骤、Kmeans++算法实现
标准K-means算法的缺陷.K-mean++初始化算法.初始化算法步骤.Kmeans++算法实现目录标准K-means算法的缺陷.K-mean&
Kmeans算法的过程是什么？Kmeans算法的缺陷主要有哪些？
Kmeans算法的过程是什么?Kmeans算法的缺陷主要有哪些? 目录 Kmeans算法的过程是什么?Kmeans算法的缺陷主要有哪些?
AI K-means算法对数据进行聚类分析-实验报告
1. 问题描述及实验要求 K-means算法对data中数据进行聚类分析 (1)算法原理描述 (2)算法结构 (3)写出K-means具体功能函数(不能直接调用sklearn.cluster(Mean ...
「AI科技」机器学习算法之K-means算法原理及缺点改进思路
https://www.toutiao.com/a6641916717624721933/ 2019-01-03 08:00:00 K-means算法是使用得最为广泛的一个算法,本文将介绍K-mean ...
机器学习里如何确定K-Means算法的K值？
[问题] Kmeans算法中,K值所决定的是在该聚类算法中,所要分配聚类的簇的多少.Kmeans算法对初始值是比较敏感的,对于同样的k值,选取的点不同,会影响算法的聚类效果和迭代的次数. [解决方案] ...

kmeans算法练习

kmeans算法练习相关推荐

最新文章

热门文章