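Mahout Installation, Configuration, and a Simple Test
Introduction to Mahout
Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms, including a distributed collaborative filtering implementation known as Taste, classification, clustering, and more. Its main advantage is that it is built on Hadoop: many algorithms that used to run on a single machine have been converted to the MapReduce model, which greatly increases the amount of data they can handle and their processing performance.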
Hadoop installation guide:
http://blog.csdn.net/fenglailea/article/details/53318459
风.fox
Environment
CentOS 7 server
Mahout 0.12.2 (the latest version at the time of writing)
Mahout download
http://archive.apache.org/dist/mahout/
http://archive.apache.org/dist/mahout/0.12.2/
wget http://archive.apache.org/dist/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz
tar -zxvf apache-mahout-distribution-0.12.2.tar.gz
Here the extracted directory is moved into the Hadoop directory
mv apache-mahout-distribution-0.12.2 /home/hadoop/hadoop/mahout
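Setting the Mahout environment variables
The variables can go in the global /etc/bashrc or in the current user's ~/.bashrc.
Here the current user's file is used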
vim ~/.bashrc
Mahout environment variables
export MAHOUT_HOME=/home/hadoop/hadoop/mahout
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
Hadoop environment variables
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null
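Once these variables have been loaded, a quick sanity check can catch a mistyped path before Mahout is run. The following is a minimal sketch, not part of Mahout or Hadoop: the helper name check_dir_var is hypothetical, and it only checks that each variable is set and points at an existing directory.

```shell
#!/usr/bin/env bash
# check_dir_var NAME (hypothetical helper): succeed only if the variable
# called NAME is set and its value is an existing directory.
check_dir_var() {
  name="$1"
  eval "value=\${$name:-}"        # look up the variable by name
  if [ -z "$value" ]; then
    echo "$name is not set"
    return 1
  fi
  if [ ! -d "$value" ]; then
    echo "$name=$value is not a directory"
    return 1
  fi
  echo "$name=$value looks fine"
}

# Report on the four directory variables exported above; keep going on failure.
for var in MAHOUT_HOME MAHOUT_CONF_DIR HADOOP_HOME HADOOP_CONF_DIR; do
  check_dir_var "$var" || true
done
```

If any line reports "is not set" or "is not a directory", the corresponding export in ~/.bashrc is the first thing to recheck.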
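Apply the environment variables
source ~/.bashrc # recommended
or
. ~/.bashrc
Both source and the dot command (.) execute the file in the current shell, so the exported variables take effect immediately; the dot is the POSIX form and source is the bash synonym. Running the file as a separate script (for example, bash ~/.bashrc) would execute it in a child process and leave the current shell unchanged.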
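The difference between loading a file into the current shell and running it as a separate process can be checked directly. The sketch below assumes bash; the temporary file and the variable name DEMO_VAR are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Create a throwaway file that sets one variable (hypothetical name DEMO_VAR).
demo=$(mktemp)
echo 'DEMO_VAR=loaded' > "$demo"

unset DEMO_VAR
bash "$demo"                      # child process: the assignment is lost on exit
after_child="${DEMO_VAR:-unset}"

. "$demo"                         # dot command: runs in the current shell
after_dot="${DEMO_VAR:-unset}"

unset DEMO_VAR
source "$demo"                    # bash synonym for the dot command
after_source="${DEMO_VAR:-unset}"

echo "child=$after_child dot=$after_dot source=$after_source"
# prints: child=unset dot=loaded source=loaded
rm -f "$demo"
```

Only the two in-shell forms leave the variable defined afterwards, which is why either of them works for reloading ~/.bashrc.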
Check whether the installation succeeded:
mahout
If a list of programs like the following is printed, the installation succeeded
arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
describe: : Describe the fields and target variable in a data set
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
Mahout and Hadoop integration test
First, Hadoop must be installed and running
http://blog.csdn.net/fenglailea/article/details/53318459
Download the test data
http://archive.ics.uci.edu/ml/databases/synthetic_control/
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
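Before uploading, it can be worth confirming that the download is complete rather than truncated or an HTML error page. The sketch below is one way to do that with awk; the function name data_shape is hypothetical. The UCI description of this dataset lists 600 series of 60 values each, so an intact file would be expected to report 600 rows with 60 fields per row, but treat that figure as an assumption to verify against the dataset page.

```shell
#!/usr/bin/env bash
# data_shape FILE (hypothetical helper): print "<rows> <min fields> <max fields>"
# for a whitespace-separated text file, ignoring blank lines.
data_shape() {
  awk 'NF > 0 {
         rows++
         if (min == "" || NF < min) min = NF
         if (NF > max) max = NF
       }
       END { print rows + 0, min + 0, max + 0 }' "$1"
}

# Only inspect the download if it is present in the current directory.
if [ -f synthetic_control.data ]; then
  data_shape synthetic_control.data
fi
```

If the minimum and maximum field counts differ, some rows are malformed and the file is worth downloading again before it is put into HDFS.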
Upload the test data to Hadoop
hadoop fs -mkdir -p ./testdata
hadoop fs -put synthetic_control.data ./testdata
List the directory and the uploaded file
hadoop fs -ls
hadoop fs -ls ./testdata
Run the test with Mahout's k-means clustering algorithm
mahout -core org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
When the job finishes, the last few lines of output look like this
1.0 : [distance=55.039831561905785]: [33.67,38.675,39.742,41.989,37.291,43.975,31.909,25.878,31.08,15.858,13.95,23.097,19.983,21.692,31.579,38.57,33.376,38.843,41.936,33.534,39.195,32.897,25.343,18.523,15.089,17.771,22.614,25.313,23.687,29.01,41.995,35.712,40.872,41.669,32.156,25.162,24.98,23.705,18.413,20.975,14.906,26.171,30.165,27.818,35.083,39.514,37.851,33.967,32.338,34.977,26.589,28.079,19.597,24.669,23.098,25.685,28.215,34.94,36.91,39.749]
16/11/24 16:47:52 INFO ClusterDumper: Wrote 6 clusters
16/11/24 16:47:52 INFO MahoutDriver: Program took 22175 ms (Minutes: 0.3695833333333333)
Inspect the output
hadoop fs -ls ./output
Found 15 items
-rw-r--r-- 1 hadoop supergroup 194 2016-11-24 16:47 output/_policy
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusteredPoints
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-0
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-1
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-10-final
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-2
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-3
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-4
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-5
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-6
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-7
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-8
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-9
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/data
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/random-seeds
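In the listing above, the directory whose name ends in -final holds the clusters from the last iteration. The number in that name depends on how many iterations ran, so it can be convenient to pick it out of the listing instead of hard-coding it. This is a small sketch using awk; the function name final_clusters_dir is hypothetical, and the sample input is three lines copied from the listing above.

```shell
#!/usr/bin/env bash
# final_clusters_dir (hypothetical helper): read `hadoop fs -ls` output on
# stdin and print the path whose name ends in "clusters-<n>-final".
final_clusters_dir() {
  awk '$NF ~ /clusters-[0-9]+-final$/ { print $NF }'
}

# Sample input: three lines from the listing shown above.
sample='drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-1
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-10-final
drwxr-xr-x - hadoop supergroup 0 2016-11-24 16:47 output/clusters-2'

final_dir=$(printf '%s\n' "$sample" | final_clusters_dir)
echo "$final_dir"
# prints: output/clusters-10-final
```

On a live cluster the same function can be fed directly, for example hadoop fs -ls ./output | final_clusters_dir.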
Inspect the data
mahout vectordump -i ./output/data/part-m-00000
See also
http://itindex.net/detail/51681-mahout
http://blog.csdn.net/wind520/article/details/38851367