Configuring MATLAB to Call Weka (repost), with Examples
This article is reposted from:
http://blog.sina.com.cn/s/blog_890c6aa30101av9x.html
Check the Java version from the MATLAB command line:
version -java
Configuring MATLAB to call a Java library
- Finish the Java code.
- Create the Java library file, i.e., the .jar file.
- Put the created .jar file into one of the directories MATLAB uses for storing libraries, and add the corresponding path to
the MATLAB configuration file, $MATLABINSTALLDIR\$MatlabVersion\toolbox\local\classpath.txt.
Configuring MATLAB to call Weka
- Download Weka.
- Install Weka.
- Add the absolute path of the bin folder of jre6 (or another JRE) to the Path system environment variable, e.g.:
C:\Program Files\Java\jre1.8.0_77\bin;
- Locate the MATLAB configuration file classpath.txt:
which classpath.txt % this command reports the location of classpath.txt
- Edit the configuration file classpath.txt:
edit classpath.txt
In classpath.txt, add the absolute path of weka.jar under the Weka installation directory, e.g.:
C:\Program Files\Weka-3-8\weka.jar
- Restart MATLAB and run the following command:
attributes = javaObject('weka.core.FastVector');
% If MATLAB reports no error, the configuration succeeded.
When MATLAB calls Weka classes it often runs out of Java heap space, so a larger heap is needed. Set it via
Matlab->File->Preferences->General->Java Heap Memory, and choose a suitably large value.
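As an alternative to editing classpath.txt (which requires restarting MATLAB), jars can be added to the dynamic Java class path at runtime with javaaddpath. A minimal sketch, assuming Weka is installed at the example path below (adjust it to your own installation):

```matlab
% Add weka.jar to the dynamic Java class path for the current session only.
% The path is an assumption; point it at your own Weka installation.
javaaddpath('C:\Program Files\Weka-3-8\weka.jar');

% Verify that Weka classes are now visible.
attributes = javaObject('weka.core.FastVector');
```

Note that classes on the dynamic path are loaded by a different class loader than those on the static path, so for long-running work the classpath.txt approach described above is the safer choice.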
Example: Calling Weka from Matlab
The code comes from:
http://cn.mathworks.com/matlabcentral/fileexchange/37311-smoteboost
http://www.mathworks.com/matlabcentral/fileexchange/37315-rusboost
clc;
clear all;
close all;

file = 'data.csv';    % Dataset

% Reading training file
data = dlmread(file);
label = data(:,end);

% Extracting positive data points
idx = (label==1);
pos_data = data(idx,:);
row_pos = size(pos_data,1);

% Extracting negative data points
neg_data = data(~idx,:);
row_neg = size(neg_data,1);

% Random permutation of positive and negative data points
p = randperm(row_pos);
n = randperm(row_neg);

% 80-20 split for training and test
tstpf = p(1:round(row_pos/5));
tstnf = n(1:round(row_neg/5));
trpf = setdiff(p, tstpf);
trnf = setdiff(n, tstnf);
train_data = [pos_data(trpf,:);neg_data(trnf,:)];
test_data = [pos_data(tstpf,:);neg_data(tstnf,:)];

% Decision Tree
prediction = SMOTEBoost(train_data,test_data,'tree',false);
disp ('    Label   Probability');
disp ('-----------------------------');
disp (prediction);
function prediction = SMOTEBoost (TRAIN,TEST,WeakLearn,ClassDist)
% This function implements the SMOTEBoost algorithm. For the theoretical
% description of the algorithm, please refer to the following paper:
% N.V. Chawla, A. Lazarevic, L.O. Hall, K. Bowyer, "SMOTEBoost: Improving
% Prediction of the Minority Class in Boosting," Knowledge Discovery in
% Databases: PKDD, 2003.
% Input:  TRAIN = Training data as a matrix
%         TEST = Test data as a matrix
%         WeakLearn = String choosing the weak learner. Choices are
%                     'svm','tree','knn' and 'logistic'.
%         ClassDist = true or false. true indicates that the class
%                     distribution is maintained while doing weighted
%                     resampling and before SMOTE is called at each
%                     iteration. false indicates that the class
%                     distribution is not maintained while resampling.
% Output: prediction = size(TEST,1) x 2 matrix. Col 1 is the class label
%                      for each instance; Col 2 is the probability of the
%                      instance being classified as the positive class.

javaaddpath('weka.jar');

%% Training SMOTEBoost
% Total number of instances in the training set
m = size(TRAIN,1);
POS_DATA = TRAIN(TRAIN(:,end)==1,:);
NEG_DATA = TRAIN(TRAIN(:,end)==0,:);
pos_size = size(POS_DATA,1);
neg_size = size(NEG_DATA,1);

% Reorganize TRAIN by putting all the positive and negative examples
% together, respectively.
TRAIN = [POS_DATA;NEG_DATA];

% Converting the training set into Weka-compatible format
CSVtoARFF (TRAIN, 'train', 'train');
train_reader = javaObject('java.io.FileReader', 'train.arff');
train = javaObject('weka.core.Instances', train_reader);
train.setClassIndex(train.numAttributes() - 1);

% Total number of iterations of the boosting method
T = 10;

% W stores the weights of the instances in each row for every iteration of
% boosting. Weights for all the instances are initialized to 1/m for the
% first iteration.
W = zeros(1,m);
for i = 1:m
    W(1,i) = 1/m;
end

% L stores pseudo-loss values, H stores hypotheses, and B stores log(1/beta)
% values that are used as the weights of the hypotheses while forming the
% final hypothesis. All of the following are of length <= T and store
% values for every iteration of the boosting process.
L = [];
H = {};
B = [];

% Loop counter
t = 1;

% Keeps count of the number of times the same boosting iteration has been
% repeated
count = 0;

% Boosting T iterations
while t <= T
    % LOG MESSAGE
    disp (['Boosting iteration #' int2str(t)]);

    if ClassDist == true
        % Resampling POS_DATA with the weights of the positive examples
        POS_WT = zeros(1,pos_size);
        sum_POS_WT = sum(W(t,1:pos_size));
        for i = 1:pos_size
            POS_WT(i) = W(t,i)/sum_POS_WT;
        end
        RESAM_POS = POS_DATA(randsample(1:pos_size,pos_size,true,POS_WT),:);

        % Resampling NEG_DATA with the weights of the negative examples
        NEG_WT = zeros(1,neg_size);
        sum_NEG_WT = sum(W(t,pos_size+1:m));
        for i = 1:neg_size
            NEG_WT(i) = W(t,pos_size+i)/sum_NEG_WT;
        end
        RESAM_NEG = NEG_DATA(randsample(1:neg_size,neg_size,true,NEG_WT),:);

        % The resampled TRAIN is stored in RESAMPLED
        RESAMPLED = [RESAM_POS;RESAM_NEG];

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pert = ((neg_size-pos_size)/pos_size)*100;
    else
        % Indices of the resampled train
        RND_IDX = randsample(1:m,m,true,W(t,:));

        % The resampled TRAIN is stored in RESAMPLED
        RESAMPLED = TRAIN(RND_IDX,:);

        % Calculating the percentage of boosting the positive class. 'pert'
        % is used as a parameter of SMOTE
        pos_size = sum(RESAMPLED(:,end)==1);
        neg_size = sum(RESAMPLED(:,end)==0);
        pert = ((neg_size-pos_size)/pos_size)*100;
    end

    % Converting the resampled training set into Weka-compatible format
    CSVtoARFF (RESAMPLED,'resampled','resampled');
    reader = javaObject('java.io.FileReader','resampled.arff');
    resampled = javaObject('weka.core.Instances',reader);
    resampled.setClassIndex(resampled.numAttributes()-1);

    % New SMOTE-boosted data gets stored in S
    smote = javaObject('weka.filters.supervised.instance.SMOTE');
    smote.setPercentage(pert);
    smote.setInputFormat(resampled);
    S = weka.filters.Filter.useFilter(resampled, smote);

    % Training a weak learner. 'pred' is the weak hypothesis. However, the
    % hypothesis function is encoded in 'model'.
    switch WeakLearn
        case 'svm'
            model = javaObject('weka.classifiers.functions.SMO');
        case 'tree'
            model = javaObject('weka.classifiers.trees.J48');
        case 'knn'
            model = javaObject('weka.classifiers.lazy.IBk');
            model.setKNN(5);
        case 'logistic'
            model = javaObject('weka.classifiers.functions.Logistic');
    end
    model.buildClassifier(S);
    pred = zeros(m,1);
    for i = 0 : m - 1
        pred(i+1) = model.classifyInstance(train.instance(i));
    end

    % Computing the pseudo-loss of hypothesis 'model'
    loss = 0;
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            continue;
        else
            loss = loss + W(t,i);
        end
    end

    % If count exceeds a pre-defined threshold (5 in the current
    % implementation), the loop is broken and rolled back to the state
    % where loss > 0.5 was not encountered.
    if count > 5
        L = L(1:t-1);
        H = H(1:t-1);
        B = B(1:t-1);
        disp ('  Too many iterations have loss > 0.5');
        disp ('  Aborting boosting...');
        break;
    end

    % If the loss is greater than 1/2, an inverted hypothesis would perform
    % better. In such cases, do not take that hypothesis into consideration
    % and repeat the same iteration. 'count' keeps count of the number of
    % times the same boosting iteration has been repeated.
    if loss > 0.5
        count = count + 1;
        continue;
    else
        count = 1;
    end

    L(t) = loss;            % Pseudo-loss at each iteration
    H{t} = model;           % Hypothesis function
    beta = loss/(1-loss);   % Setting the weight update parameter 'beta'
    B(t) = log(1/beta);     % Weight of the hypothesis

    % At the final iteration there is no need to update the weights any
    % further
    if t==T
        break;
    end

    % Updating the weights
    for i = 1:m
        if TRAIN(i,end)==pred(i)
            W(t+1,i) = W(t,i)*beta;
        else
            W(t+1,i) = W(t,i);
        end
    end

    % Normalizing the weights for the next iteration
    sum_W = sum(W(t+1,:));
    for i = 1:m
        W(t+1,i) = W(t+1,i)/sum_W;
    end

    % Incrementing the loop counter
    t = t + 1;
end

% The final hypothesis is calculated and tested on the test set
% simultaneously.

%% Testing SMOTEBoost
n = size(TEST,1);   % Total number of instances in the test set

CSVtoARFF(TEST,'test','test');
test = 'test.arff';
test_reader = javaObject('java.io.FileReader', test);
test = javaObject('weka.core.Instances', test_reader);
test.setClassIndex(test.numAttributes() - 1);

% Normalizing B
sum_B = sum(B);
for i = 1:size(B,2)
    B(i) = B(i)/sum_B;
end

prediction = zeros(n,2);
for i = 1:n
    % Calculating the total weight of the class labels from all the models
    % produced during boosting
    wt_zero = 0;
    wt_one = 0;
    for j = 1:size(H,2)
        p = H{j}.classifyInstance(test.instance(i-1));
        if p==1
            wt_one = wt_one + B(j);
        else
            wt_zero = wt_zero + B(j);
        end
    end

    if (wt_one > wt_zero)
        prediction(i,:) = [1 wt_one];
    else
        prediction(i,:) = [0 wt_one];
    end
end
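To make the 'pert' parameter above concrete, here is a worked sketch of the SMOTE percentage calculation, using assumed class counts (20 positives, 80 negatives — not values from the original dataset):

```matlab
% Assumed example counts: 20 minority (positive) and 80 majority (negative).
pos_size = 20;
neg_size = 80;
pert = ((neg_size - pos_size)/pos_size)*100;   % (80-20)/20*100 = 300

% A SMOTE percentage of 300 asks the filter to synthesize 300% new minority
% instances: 3 * 20 = 60 synthetic positives, giving 80 positives in total
% and balancing the two classes.
```

This is why the formula uses the gap between the class sizes relative to the minority size: after SMOTE runs, the minority class roughly matches the majority class.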
function r = CSVtoARFF (data, relation, type)
% CSV to ARFF file converter

% load the csv data
[rows, cols] = size(data);

% open the arff file for writing
farff = fopen(strcat(type,'.arff'), 'w');

% print the relation part of the header
fprintf(farff, '@relation %s\n', relation);

% Copy the attribute declarations from the ARFF header template
% (the template's first line is skipped, as in the original code)
fid = fopen('ARFFheader.txt','r');
tline = fgets(fid);
tline = fgets(fid);
while ischar(tline)
    fprintf(farff,'%s',tline);
    tline = fgets(fid);
end
fclose(fid);

% Converting the data
for i = 1 : rows
    % print the attribute values for the data point
    for j = 1 : cols - 1
        if data(i,j) ~= -1   % check if it is a missing value
            fprintf(farff, '%d,', data(i,j));
        else
            fprintf(farff, '?,');
        end
    end
    % print the label for the data point
    fprintf(farff, '%d\n', data(i,end));
end

% close the file
fclose(farff);
r = 0;
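For reference, a file produced by this converter follows the standard ARFF layout. A minimal sketch is shown below; the attribute names are purely illustrative assumptions, since the real declarations come from ARFFheader.txt:

```
@relation train
@attribute attr1 numeric
@attribute attr2 numeric
@attribute class {0,1}
@data
1,3,1
0,?,0
```

The '?' marks a missing value, which is why the converter maps -1 entries to '?' in the data section.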
function model = ClassifierTrain(data,type)
% Training the classifier that will do the sample selection

javaaddpath('weka.jar');

CSVtoARFF(data,'train','train');
train_file = 'train.arff';
reader = javaObject('java.io.FileReader', train_file);
train = javaObject('weka.core.Instances', reader);
train.setClassIndex(train.numAttributes() - 1);

% options = javaObject('java.lang.String');
switch type
    case 'svm'
        model = javaObject('weka.classifiers.functions.SMO');
        kernel = javaObject('weka.classifiers.functions.supportVector.RBFKernel');
        model.setKernel(kernel);
    case 'tree'
        model = javaObject('weka.classifiers.trees.J48');
        % options = weka.core.Utils.splitOptions('-C 0.2');
        % model.setOptions(options);
    case 'knn'
        model = javaObject('weka.classifiers.lazy.IBk');
        model.setKNN(5);
    case 'logistic'
        model = javaObject('weka.classifiers.functions.Logistic');
end
model.buildClassifier(train);
function prediction = ClassifierPredict(data,model)
% Predicting the labels of the test instances
% Input:  data = test data
%         model = the trained model
% Output: prediction = predicted labels

javaaddpath('weka.jar');

CSVtoARFF(data,'test','test');
test_file = 'test.arff';
reader = javaObject('java.io.FileReader', test_file);
test = javaObject('weka.core.Instances', reader);
test.setClassIndex(test.numAttributes() - 1);

prediction = zeros(size(data,1),1);
for i = 0 : size(data,1) - 1
    prediction(i+1) = model.classifyInstance(test.instance(i));
end
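A minimal usage sketch for the two helpers above. The file name, the 80/20 split, and the assumption that labels sit in the last column are all illustrative; weka.jar and ARFFheader.txt must be on the current path:

```matlab
% Assumed: 'data.csv' holds a numeric matrix whose last column is a 0/1 label.
data = dlmread('data.csv');
cut = round(0.8*size(data,1));
train_data = data(1:cut,:);
test_data  = data(cut+1:end,:);

model  = ClassifierTrain(train_data,'knn');    % train a 5-NN Weka model
labels = ClassifierPredict(test_data,model);   % predicted 0/1 labels

acc = mean(labels == test_data(:,end));        % simple accuracy check
fprintf('accuracy: %.3f\n', acc);
```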