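To get a better feel for clustering algorithms, I went looking for ready-made code online and came across a machine learning library written in Java: http://java-ml.sourceforge.net/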

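If you are interested, download it and read the source to understand the basic ML algorithms. For DBSCAN, I found a Java implementation online, which I use here mainly to understand how the algorithm proceeds. The reference code follows: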

1. The Point class: the data object

package sk.cluster;

public class Point {
    private double x;          // x coordinate
    private double y;          // y coordinate
    private boolean isVisit;   // whether the point has been visited
    private int cluster;       // id of the cluster the point belongs to (0 = none)
    private boolean isNoised;  // whether the point is marked as noise

    public Point(double x, double y) {
        this.x = x;
        this.y = y;
        this.isVisit = false;
        this.cluster = 0;
        this.isNoised = false;
    }

    // Euclidean distance between this point and another point
    public double getDistance(Point point) {
        return Math.sqrt((x - point.x) * (x - point.x) + (y - point.y) * (y - point.y));
    }

    public void setX(double x) { this.x = x; }

    public double getX() { return x; }

    public void setY(double y) { this.y = y; }

    public double getY() { return y; }

    public void setVisit(boolean isVisit) { this.isVisit = isVisit; }

    public boolean getVisit() { return isVisit; }

    public int getCluster() { return cluster; }

    public void setNoised(boolean isNoised) { this.isNoised = isNoised; }

    public void setCluster(int cluster) { this.cluster = cluster; }

    public boolean getNoised() { return this.isNoised; }

    // Output format: "x y cluster noiseFlag", where noiseFlag is 1 for noise and 0 otherwise
    @Override
    public String toString() {
        return x + " " + y + " " + cluster + " " + (isNoised ? 1 : 0);
    }
}

2. The Data class: the data sets

package sk.cluster;

import java.io.*;
import java.text.DecimalFormat;
import java.text.NumberFormat;
import java.util.ArrayList;

public class Data {
    // Number format of the JVM's default locale; used by format() to round values
    private static DecimalFormat df = (DecimalFormat) NumberFormat.getInstance();

    // Generate sample data: size/2 points on a sine curve and size/2 points on a cosine curve
    public static ArrayList<Point> generateSinData(int size) {
        ArrayList<Point> points = new ArrayList<Point>(size);
        for (int i = 0; i < size / 2; i++) {
            double x = format(Math.PI / (size / 2) * (i + 1));
            double y = format(Math.sin(x));
            points.add(new Point(x, y));
        }
        for (int i = 0; i < size / 2; i++) {
            double x = format(1.5 + Math.PI / (size / 2) * (i + 1));
            double y = format(Math.cos(x));
            points.add(new Point(x, y));
        }
        return points;
    }

    // A small hand-picked data set
    public static ArrayList<Point> generateSpecialData() {
        ArrayList<Point> points = new ArrayList<Point>();
        points.add(new Point(2, 2));
        points.add(new Point(3, 1));
        points.add(new Point(3, 4));
        points.add(new Point(3, 14));
        points.add(new Point(5, 3));
        points.add(new Point(8, 3));
        points.add(new Point(8, 6));
        points.add(new Point(9, 8));
        points.add(new Point(10, 4));
        points.add(new Point(10, 7));
        points.add(new Point(10, 10));
        points.add(new Point(10, 14));
        points.add(new Point(11, 13));
        points.add(new Point(12, 7));
        points.add(new Point(12, 15));
        points.add(new Point(14, 7));
        points.add(new Point(14, 9));
        points.add(new Point(14, 15));
        points.add(new Point(15, 8));
        return points;
    }

    // Read points from a comma-separated file; x and y are taken from the 4th and 5th fields
    public static ArrayList<Point> getData(String sourcePath) {
        ArrayList<Point> points = new ArrayList<Point>();
        File fileIn = new File(sourcePath);
        // try-with-resources closes the reader even if parsing fails
        try (BufferedReader br = new BufferedReader(new FileReader(fileIn))) {
            String line = br.readLine();
            while (line != null) {
                String[] fields = line.split(",");
                double x = Double.parseDouble(fields[3]);
                double y = Double.parseDouble(fields[4]);
                points.add(new Point(x, y));
                line = br.readLine();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return points;
    }

    // Write the points to a file, one point per line
    public static void writeData(ArrayList<Point> points, String path) {
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(path))) {
            for (Point point : points) {
                bw.write(point.toString() + "\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Round x by formatting it and parsing the result back into a double
    private static double format(double x) {
        return Double.valueOf(df.format(x));
    }
}
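One thing worth noting about the `format` helper in `Data`: `NumberFormat.getInstance()` follows the JVM's default locale, so under a locale that writes decimals with a comma the formatted string may no longer be parseable by `Double.valueOf`. The following is a minimal sketch of a locale-independent alternative, assuming the intent is simply to round to three decimal places (the default maximum for that formatter). The class name `Rounding` and the method `round3` are hypothetical and are not part of the original code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper, not part of the original Data class.
final class Rounding {
    private Rounding() {}

    // Round to three decimal places without going through a locale-sensitive string.
    // HALF_EVEN is used here because it is also DecimalFormat's default rounding mode.
    static double round3(double x) {
        return BigDecimal.valueOf(x).setScale(3, RoundingMode.HALF_EVEN).doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(round3(Math.PI));
        System.out.println(round3(Math.sin(1.0)));
    }
}
```

If this sketch were adopted, `format(x)` in `Data` could delegate to `Rounding.round3(x)`; the generated sine and cosine points would then be the same on any machine regardless of locale.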

3. The DBScan class: the DBSCAN implementation

package sk.cluster;

import java.util.ArrayList;

public class DBScan {
    private double radius;
    private int minPts;

    public DBScan(double radius, int minPts) {
        this.radius = radius;  // neighbourhood radius
        this.minPts = minPts;  // density threshold: minimum number of samples in a neighbourhood
    }

    public void process(ArrayList<Point> points) {
        int size = points.size();  // total number of samples
        int idx = 0;
        int cluster = 1;
        while (idx < size) {
            Point p = points.get(idx++);
            // choose an unvisited point
            if (!p.getVisit()) {
                p.setVisit(true);
                // all points within radius of p, including p itself
                ArrayList<Point> adjacentPoints = getAdjacentPoints(p, points);
                if (adjacentPoints.size() < minPts) {
                    // fewer than minPts neighbours: mark p as noise (a cluster may still claim it later)
                    p.setNoised(true);
                } else {
                    // p is a core object: start a new cluster from it
                    p.setCluster(cluster);
                    // adjacentPoints grows inside the loop, so iterate by index
                    for (int i = 0; i < adjacentPoints.size(); i++) {
                        Point adjacentPoint = adjacentPoints.get(i);
                        // only unvisited points can contribute new neighbours
                        if (!adjacentPoint.getVisit()) {
                            adjacentPoint.setVisit(true);
                            ArrayList<Point> adjacentAdjacentPoints = getAdjacentPoints(adjacentPoint, points);
                            // if the neighbour is itself a core object, merge its neighbourhood
                            if (adjacentAdjacentPoints.size() >= minPts) {
                                for (Point pp : adjacentAdjacentPoints) {
                                    if (!adjacentPoints.contains(pp)) {
                                        adjacentPoints.add(pp);
                                    }
                                }
                            }
                        }
                        // add the point to this cluster if it does not belong to one yet
                        if (adjacentPoint.getCluster() == 0) {
                            adjacentPoint.setCluster(cluster);
                            // a point marked as noise earlier is no longer noise
                            if (adjacentPoint.getNoised()) {
                                adjacentPoint.setNoised(false);
                            }
                        }
                    }
                    cluster++;
                }
            }
            // progress output for large data sets
            if (idx % 1000 == 0) {
                System.out.println(idx);
            }
        }
    }

    // Return every point within radius of centerPoint, including centerPoint itself
    private ArrayList<Point> getAdjacentPoints(Point centerPoint, ArrayList<Point> points) {
        ArrayList<Point> adjacentPoints = new ArrayList<Point>();
        for (Point p : points) {
            double distance = centerPoint.getDistance(p);
            if (distance <= radius) {
                adjacentPoints.add(p);
            }
        }
        return adjacentPoints;
    }
}
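/*
## DBScan algorithm flow
Algorithm: DBScan, a density-based clustering algorithm
Input:  D: a data set containing n points
        r: the radius parameter
        minPts: the neighbourhood density threshold
Output: the set of density-based clusters

mark all points in D as unvisited
for each p in D
    if p.visit == unvisited
        mark p as visited
        find the set N of all points whose distance to p is at most r
        if N.size() < minPts
            mark p as noise
        else
            assign p to a new cluster
            for each p' in N
                if p'.visit == unvisited
                    mark p' as visited
                    find the set N' of all points whose distance to p' is at most r
                    if N'.size() >= minPts
                        add N' to N
                    end if
                end if
                if p' does not belong to any cluster
                    add p' to the current cluster
                    if p' is marked as noise
                        unmark p' as noise
                    end if
                end if
            end for
        end if
    end if
end for
*/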
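To see the flow above on a data set small enough to check by hand, here is a compact, self-contained sketch of the same idea on plain coordinate arrays. It is not the code from the post: the class name `DbscanSketch`, the `double[][]` input, and the use of label 0 for "noise or unassigned" (mirroring `cluster == 0` in `Point`) are assumptions made for illustration, and a queue stands in for the growing neighbour list.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-alone sketch; names and data layout are assumptions.
final class DbscanSketch {
    static final int NOISE = 0; // label 0 = noise / not assigned to any cluster

    static int[] cluster(double[][] pts, double eps, int minPts) {
        int n = pts.length;
        int[] label = new int[n];
        boolean[] visited = new boolean[n];
        int current = 0;
        for (int i = 0; i < n; i++) {
            if (visited[i]) continue;
            visited[i] = true;
            List<Integer> seeds = neighbours(pts, i, eps);
            if (seeds.size() < minPts) continue; // not a core point; stays 0 unless claimed later
            current++;
            label[i] = current;
            Deque<Integer> queue = new ArrayDeque<>(seeds);
            while (!queue.isEmpty()) {
                int j = queue.poll();
                if (!visited[j]) {
                    visited[j] = true;
                    List<Integer> more = neighbours(pts, j, eps);
                    // only core points extend the cluster further
                    if (more.size() >= minPts) queue.addAll(more);
                }
                if (label[j] == NOISE) label[j] = current; // border or former noise point
            }
        }
        return label;
    }

    // Indices of all points within eps of pts[i], including i itself.
    static List<Integer> neighbours(double[][] pts, int i, double eps) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < pts.length; j++) {
            if (Math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1]) <= eps) out.add(j);
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] pts = {
            {0, 0}, {0, 1}, {1, 0}, {1, 1},          // first clump
            {10, 10}, {10, 11}, {11, 10}, {11, 11},  // second clump
            {50, 50}                                 // isolated point
        };
        System.out.println(Arrays.toString(cluster(pts, 1.5, 3)));
    }
}
```

With a radius of 1.5 and a threshold of 3, every point in a unit-square clump has all four clump members (itself included) in its neighbourhood, so each clump becomes one cluster, while the isolated point has only itself as a neighbour and keeps label 0.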

4. The Client test class

package sk.cluster;

import java.util.ArrayList;

public class Client {
    public static void main(String[] args) {
        // generate 200 sample points on the sine and cosine curves
        ArrayList<Point> points = Data.generateSinData(200);
        // r: neighbourhood radius; minPts: neighbourhood density threshold
        DBScan dbScan = new DBScan(0.6, 4);
        // alternative inputs:
        //ArrayList<Point> points = Data.generateSpecialData();
        //ArrayList<Point> points = Data.getData("D:\\tmp\\testData.txt");
        //DBScan dbScan = new DBScan(0.1,1000);
        dbScan.process(points);
        for (Point p : points) {
            System.out.println(p);
        }
        Data.writeData(points, "D:\\tmp\\data.txt");
    }
}
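The client prints one line per point in the `Point.toString()` format, `x y cluster noiseFlag`, which is hard to read for 200 points. As a rough sketch of how that output could be summarised, the helper below counts points per cluster id from such lines. The class `ClusterSummary` and the method `countByCluster` are hypothetical names introduced here, not part of the original code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for inspecting the client's output; not part of the original code.
final class ClusterSummary {
    private ClusterSummary() {}

    // Count lines per cluster id. Each line is expected to be "x y cluster noiseFlag".
    static Map<Integer, Integer> countByCluster(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 4) continue; // skip blank or malformed lines
            int cluster = Integer.parseInt(parts[2]);
            counts.merge(cluster, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1.0 2.0 1 0", "2.0 3.0 1 0", "9.0 9.0 0 1");
        System.out.println(countByCluster(lines));
    }
}
```

Cluster id 0 in the summary corresponds to points that were never assigned to a cluster, which in this implementation are the ones still flagged as noise.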
