Cross-validation experiments with Weka
Generating cross-validation folds (Java approach)
Reference:
http://weka.wikispaces.com/Generating+cross-validation+folds+%28Java+approach%29
This article describes how to generate train/test splits for cross-validation using the Weka API directly.
The following variables are given:
Instances data = ...; // contains the full dataset we want to create train/test sets from
int seed = ...; // the seed for randomizing the data
int folds = ...; // the number of folds to generate, >=2
Randomize the data
First, randomize your data:
Random rand = new Random(seed); // create seeded number generator
Instances randData = new Instances(data); // create copy of original data
randData.randomize(rand); // randomize data with number generator
In case your data has a nominal class and you want to perform stratified cross-validation:
randData.stratify(folds);
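Stratification keeps the class distribution roughly the same in every fold. The idea can be sketched in plain Java (this is an illustration of the concept, not Weka's actual implementation, which sorts instances by class value and then interleaves them):

```java
import java.util.HashMap;
import java.util.Map;

public class StratifySketch {
    // Assign labeled items to folds round-robin within each class,
    // so every fold receives a near-equal share of each class.
    static int[] stratifiedFolds(int[] labels, int folds) {
        int[] fold = new int[labels.length];
        Map<Integer, Integer> nextFold = new HashMap<>();
        for (int i = 0; i < labels.length; i++) {
            int f = nextFold.getOrDefault(labels[i], 0);
            fold[i] = f;
            nextFold.put(labels[i], (f + 1) % folds);
        }
        return fold;
    }

    public static void main(String[] args) {
        // 6 instances of class 0 and 4 of class 1, split into 2 folds
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        int[] fold = stratifiedFolds(labels, 2);
        int c0fold0 = 0, c1fold0 = 0;
        for (int i = 0; i < labels.length; i++)
            if (fold[i] == 0) { if (labels[i] == 0) c0fold0++; else c1fold0++; }
        // each fold ends up with 3 instances of class 0 and 2 of class 1
        System.out.println(c0fold0 + " " + c1fold0); // prints "3 2"
    }
}
```

Without this balancing, an unlucky shuffle could leave a fold with almost no instances of a rare class, distorting the per-fold evaluation.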
Generate the folds
Single run
Next, we create the train and test set for each fold:
for (int n = 0; n < folds; n++) {
  Instances train = randData.trainCV(folds, n);
  Instances test = randData.testCV(folds, n);
  // further processing, classification, etc.
  ...
}
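The partitioning behind trainCV/testCV follows a simple scheme: with N instances and k folds, every fold holds N/k instances and the first N%k folds take one extra, so the folds cover the dataset exactly once. A plain-Java sketch of that arithmetic (an illustration of the scheme, not Weka's code):

```java
public class FoldSizes {
    // Size of test fold `index` when n instances are split into `folds` folds:
    // the first (n % folds) folds each receive one extra instance.
    static int testFoldSize(int n, int folds, int index) {
        return n / folds + (index < n % folds ? 1 : 0);
    }

    public static void main(String[] args) {
        int n = 6497, folds = 10; // size of the combined red/white wine dataset
        int total = 0;
        for (int i = 0; i < folds; i++) {
            total += testFoldSize(n, folds, i);
        }
        // the ten test folds together cover every instance exactly once
        System.out.println(total); // prints "6497"
    }
}
```

For 6497 instances and 10 folds this gives seven folds of 650 and three of 649; the corresponding training set for fold n is simply the other k-1 folds.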
Note:
- the above code is used by the weka.filters.supervised.instance.StratifiedRemoveFolds filter
- the weka.classifiers.Evaluation class and the Explorer/Experimenter would use this method for obtaining the train set:
Instances train = randData.trainCV(folds, n, rand);
Multiple runs
The example above performs only a single run of cross-validation. To perform, e.g., 10 runs of 10-fold cross-validation, use the following loop:
Instances data = ...; // our dataset again, obtained from somewhere
int runs = 10;
for (int i = 0; i < runs; i++) {
  seed = i + 1; // every run gets a new, but defined, seed value
  // see: randomize the data
  ...
  // see: generate the folds
  ...
}
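The point of seed = i + 1 is that each run shuffles the data differently, yet every run remains reproducible because its seed is fixed. The same pattern can be sketched with plain Java collections, using Collections.shuffle in place of randData.randomize(rand):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SeededRuns {
    // Shuffle the indices 0..n-1 with a seeded generator;
    // a fixed seed always yields the same order.
    static List<Integer> shuffled(int n, long seed) {
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < n; i++) idx.add(i);
        Collections.shuffle(idx, new Random(seed)); // deterministic for a fixed seed
        return idx;
    }

    public static void main(String[] args) {
        // Same seed: identical order, so the run is reproducible.
        System.out.println(shuffled(10, 1).equals(shuffled(10, 1))); // prints "true"
        // Different seeds: (almost surely) a different order per run.
        System.out.println(shuffled(10, 1).equals(shuffled(10, 2)));
    }
}
```

Reporting the seed alongside each run's results (as the program below does) lets anyone regenerate exactly the same folds.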
A simple little experiment:
We continue classifying the red and white wines from the previous section. The classifier is unchanged; we only add a loop for repeated runs.
package assignment2;

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.Utils;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;
import java.io.FileReader;
import java.util.Random;

public class cv_rw {

  public static Instances getFileInstances(String filename) throws Exception {
    FileReader frData = new FileReader(filename);
    Instances data = new Instances(frData);
    int length = data.numAttributes();
    String[] options = new String[2];
    options[0] = "-R";
    options[1] = Integer.toString(length);
    Remove remove = new Remove();
    remove.setOptions(options);
    remove.setInputFormat(data);
    Instances newData = Filter.useFilter(data, remove);
    return newData;
  }

  public static void main(String[] args) throws Exception {
    // load data and set class index
    Instances data = getFileInstances("D://Weka_tutorial//WineQuality//RedWhiteWine.arff");
    // System.out.println(instances);
    data.setClassIndex(data.numAttributes() - 1);

    // classifier
    // String[] tmpOptions;
    // String classname;
    // tmpOptions = Utils.splitOptions(Utils.getOption("W", args));
    // classname = tmpOptions[0];
    // tmpOptions[0] = "";
    // Classifier cls = (Classifier) Utils.forName(Classifier.class, classname, tmpOptions);
    //
    // other options
    // int runs = Integer.parseInt(Utils.getOption("r", args));  // number of repeated runs
    // int folds = Integer.parseInt(Utils.getOption("x", args));
    int runs = 1;
    int folds = 10;
    J48 j48 = new J48();
    // j48.buildClassifier(instances);

    // perform cross-validation
    for (int i = 0; i < runs; i++) {
      // randomize data
      int seed = i + 1;
      Random rand = new Random(seed);
      Instances randData = new Instances(data);
      randData.randomize(rand);
      // stratify() balances the class distribution across the folds;
      // enable it when the class attribute is nominal:
      // if (randData.classAttribute().isNominal())
      //   randData.stratify(folds);

      Evaluation eval = new Evaluation(randData);
      for (int n = 0; n < folds; n++) {
        Instances train = randData.trainCV(folds, n);
        Instances test = randData.testCV(folds, n);
        // the above code is used by the StratifiedRemoveFolds filter,
        // the code below by the Explorer/Experimenter:
        // Instances train = randData.trainCV(folds, n, rand);

        // build and evaluate classifier
        Classifier j48Copy = Classifier.makeCopy(j48);
        j48Copy.buildClassifier(train);
        eval.evaluateModel(j48Copy, test);
      }

      // output evaluation
      System.out.println();
      System.out.println("=== Setup run " + (i + 1) + " ===");
      System.out.println("Classifier: " + j48.getClass().getName());
      System.out.println("Dataset: " + data.relationName());
      System.out.println("Folds: " + folds);
      System.out.println("Seed: " + seed);
      System.out.println();
      System.out.println(eval.toSummaryString("=== " + folds + "-fold Cross-validation run " + (i + 1) + " ===", false));
    }
  }
}
Running the program produces the following results:
=== Setup run 1 ===
Classifier: weka.classifiers.trees.J48
Dataset: RedWhiteWine-weka.filters.unsupervised.instance.Randomize-S42-weka.filters.unsupervised.instance.Randomize-S42-weka.filters.unsupervised.attribute.Remove-R13
Folds: 10
Seed: 1
=== 10-fold Cross-validation run 1===
Correctly Classified Instances 6415 98.7379 %
Incorrectly Classified Instances 82 1.2621 %
Kappa statistic 0.9658
Mean absolute error 0.0159
Root mean squared error 0.1109
Relative absolute error 4.2898 %
Root relative squared error 25.7448 %
Total Number of Instances 6497
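With runs set higher than 1, the program prints one summary per run; these are usually condensed into a single averaged figure. A minimal sketch of that last step, using made-up per-run accuracies purely for illustration (the values below are hypothetical, not program output):

```java
public class AverageRuns {
    // Arithmetic mean of the per-run accuracies.
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    public static void main(String[] args) {
        // Hypothetical per-run accuracy (%) from 10 runs of 10-fold CV.
        double[] acc = {98.7, 98.6, 98.8, 98.7, 98.5, 98.7, 98.6, 98.8, 98.7, 98.6};
        System.out.printf("mean accuracy = %.2f%%%n", mean(acc));
    }
}
```

Reporting the spread across runs (for example the standard deviation) alongside the mean gives a better picture of how sensitive the classifier is to the shuffling.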
Reposted from: https://www.cnblogs.com/7899-89/p/3667330.html