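Introduction to Mahout

Mahout is a powerful data mining tool: a collection of distributed machine learning algorithms that includes a distributed collaborative filtering implementation known as Taste, as well as classification and clustering. Its main advantage is that it is built on Hadoop. Many algorithms that previously ran on a single machine have been converted to the MapReduce model, which greatly increases both the volume of data they can handle and their processing performance.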

Hadoop

http://blog.csdn.net/fenglailea/article/details/53318459
风.fox

Environment

CentOS 7 server
Mahout 0.12.2 (the latest release at the time of writing)

Mahout download location

http://archive.apache.org/dist/mahout/
http://archive.apache.org/dist/mahout/0.12.2/

wget http://archive.apache.org/dist/mahout/0.12.2/apache-mahout-distribution-0.12.2.tar.gz
tar -zxvf apache-mahout-distribution-0.12.2.tar.gz

Here the extracted directory is moved into the Hadoop directory:

mv apache-mahout-distribution-0.12.2 /home/hadoop/hadoop/mahout
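
The download, extract, and move steps above all depend on the same version string. As a minimal sketch, the three commands can be derived from a single variable so that a different release only needs one change. The variable names are hypothetical; only the URL layout and target path come from the commands above. The script prints the commands instead of running them, so they can be reviewed first.

```shell
# Hypothetical helper variables; only the URL layout comes from the text above.
MAHOUT_VERSION="0.12.2"
MAHOUT_MIRROR="http://archive.apache.org/dist/mahout"
MAHOUT_ARCHIVE="apache-mahout-distribution-${MAHOUT_VERSION}.tar.gz"
MAHOUT_URL="${MAHOUT_MIRROR}/${MAHOUT_VERSION}/${MAHOUT_ARCHIVE}"
MAHOUT_DIR="${MAHOUT_ARCHIVE%.tar.gz}"   # directory name produced by tar

# Print the commands rather than executing them.
echo "wget ${MAHOUT_URL}"
echo "tar -zxvf ${MAHOUT_ARCHIVE}"
echo "mv ${MAHOUT_DIR} /home/hadoop/hadoop/mahout"
```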

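Setting the Mahout environment variables

For a system-wide setting, edit /etc/bashrc; for the current user only, edit ~/.bashrc.
This guide uses the current user's file.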

vim ~/.bashrc

Mahout environment variables

export MAHOUT_HOME=/home/hadoop/hadoop/mahout
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH

Hadoop environment variables

export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_HOME_WARN_SUPPRESS=not_null
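
If this setup is scripted rather than typed into vim, re-running it should not add the same export lines twice. The following is a minimal sketch of one way to do that; append_once is a hypothetical helper, not part of Mahout or Hadoop, and the demonstration writes to a scratch file instead of the real ~/.bashrc.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line already exists.
# (hypothetical helper, not part of Mahout or Hadoop)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file rather than the real ~/.bashrc.
rc_file=$(mktemp)
append_once "$rc_file" 'export MAHOUT_HOME=/home/hadoop/hadoop/mahout'
append_once "$rc_file" 'export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf'
append_once "$rc_file" 'export MAHOUT_HOME=/home/hadoop/hadoop/mahout'   # duplicate, ignored
rc_lines=$(wc -l < "$rc_file" | tr -d ' ')
echo "lines written: $rc_lines"
```

To apply it for real, the scratch file would be replaced with "$HOME/.bashrc" and the remaining export lines added in the same way.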

Apply the environment variables

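source ~/.bashrc  # recommended
or
. ~/.bashrc

Both forms run the file in the current shell; in bash, source is a synonym for the dot command. Executing the file as a script instead (for example, bash ~/.bashrc) runs it in a child process, so the exported variables would not reach the current shell.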
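
A quick way to see which shell a command affects is to try it on a scratch file. The sketch below uses a hypothetical variable, DEMO_VAR, and compares running a file as a child process with reading it through the dot command. It uses the POSIX dot form so that it also works in shells that lack the source builtin.

```shell
# Scratch script that sets one variable (hypothetical name DEMO_VAR).
demo_script=$(mktemp)
echo 'DEMO_VAR=set-by-script' > "$demo_script"

unset DEMO_VAR
sh "$demo_script"                       # runs in a child process
after_child=${DEMO_VAR:-unset}          # the parent shell is unaffected

. "$demo_script"                        # runs in the current shell
after_dot=${DEMO_VAR:-unset}            # the variable is now visible

echo "after child process: $after_child"
echo "after dot command:   $after_dot"
rm -f "$demo_script"
```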

Check whether the installation succeeded:

mahout

If a list of programs like the following appears, the installation succeeded:

arff.vector: : Generate Vectors from an ARFF file or directory
baumwelch: : Baum-Welch algorithm for unsupervised HMM training
canopy: : Canopy clustering
cat: : Print a file or resource as the logistic regression models would see it
cleansvd: : Cleanup and verification of SVD output
clusterdump: : Dump cluster output to text
clusterpp: : Groups Clustering Output In Clusters
cmdump: : Dump confusion matrix in HTML or text formats
cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
describe: : Describe the fields and target variable in a data set
evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
fkmeans: : Fuzzy K-means clustering
hmmpredict: : Generate random sequence of observations by given HMM
itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
kmeans: : K-means clustering
lucene.vector: : Generate Vectors from a Lucene index
matrixdump: : Dump matrix in CSV format
matrixmult: : Take the product of two matrices
parallelALS: : ALS-WR factorization of a rating matrix
qualcluster: : Runs clustering experiments and summarizes results in a CSV
recommendfactorized: : Compute recommendations using the factorization of a rating matrix
recommenditembased: : Compute recommendations using item-based collaborative filtering
regexconverter: : Convert text files on a per line basis based on regular expressions
resplit: : Splits a set of SequenceFiles into a number of equal splits
rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
runlogistic: : Run a logistic regression model against CSV data
seq2encoded: : Encoded Sparse Vector generation from Text sequence files
seq2sparse: : Sparse Vector generation from Text sequence files
seqdirectory: : Generate sequence files (of Text) from a directory
seqdumper: : Generic Sequence File dumper
seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
seqwiki: : Wikipedia xml dump to sequence file
spectralkmeans: : Spectral k-means clustering
split: : Split Input data into test and train sets
splitDataset: : split a rating dataset into training and probe parts
ssvd: : Stochastic SVD
streamingkmeans: : Streaming k-means clustering
svd: : Lanczos Singular Value Decomposition
testnb: : Test the Vector-based Bayes classifier
trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
trainlogistic: : Train a logistic regression using stochastic gradient descent
trainnb: : Train the Vector-based Bayes classifier
transpose: : Take the transpose of a matrix
validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
vectordump: : Dump vectors from a sequence file to text
viterbi: : Viterbi decoding of hidden states from given output states sequence
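
If the program list does not appear, the usual cause is that the launcher is not on PATH. As a small sketch, the check below reports where each launcher resolves; require_cmd is a hypothetical helper, and the loop is written so that a missing command is reported rather than aborting the script.

```shell
# require_cmd NAME: succeed if NAME resolves to an executable on PATH.
# (hypothetical helper, not part of Mahout or Hadoop)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1 (check PATH in ~/.bashrc)" >&2
        return 1
    fi
}

# Report on the two launchers used in this guide without aborting on failure.
for cmd in mahout hadoop; do
    require_cmd "$cmd" || true
done
```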

Testing Mahout together with Hadoop

First, Hadoop must be installed and running:

http://blog.csdn.net/fenglailea/article/details/53318459

Download the test data

http://archive.ics.uci.edu/ml/databases/synthetic_control/

wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
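
Before uploading, it can help to confirm that the download is complete and not an error page. The sketch below checks that a file has a given number of rows and whitespace-separated columns. check_shape is a hypothetical helper, and the figures 600 rows by 60 columns for synthetic_control.data are an assumption based on the commonly documented shape of this dataset, not something stated above. The demonstration runs on a small generated file.

```shell
# check_shape FILE ROWS COLS: succeed if FILE has ROWS lines of COLS fields each.
# (hypothetical helper)
check_shape() {
    awk -v rows="$2" -v cols="$3" '
        NF != cols { bad++ }            # count lines with the wrong field count
        END { exit !(NR == rows && bad == 0) }
    ' "$1"
}

# Self-contained demonstration on a generated 3x4 file, not the real dataset.
sample=$(mktemp)
printf '1 2 3 4\n5 6 7 8\n9 10 11 12\n' > "$sample"
if check_shape "$sample" 3 4; then shape_ok=yes; else shape_ok=no; fi
echo "sample has expected shape: $shape_ok"
rm -f "$sample"

# For the real file the call would be (600 and 60 are assumptions, see above):
#   check_shape synthetic_control.data 600 60
```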

Upload the test data to HDFS

hadoop fs -mkdir -p ./testdata
hadoop fs -put synthetic_control.data ./testdata

List the directory and the uploaded file

hadoop fs -ls
hadoop fs -ls ./testdata

Run the test with Mahout's k-means clustering algorithm

mahout -core  org.apache.mahout.clustering.syntheticcontrol.kmeans.Job

When the job finishes, the last few lines of output look like this:

        1.0 : [distance=55.039831561905785]: [33.67,38.675,39.742,41.989,37.291,43.975,31.909,25.878,31.08,15.858,13.95,23.097,19.983,21.692,31.579,38.57,33.376,38.843,41.936,33.534,39.195,32.897,25.343,18.523,15.089,17.771,22.614,25.313,23.687,29.01,41.995,35.712,40.872,41.669,32.156,25.162,24.98,23.705,18.413,20.975,14.906,26.171,30.165,27.818,35.083,39.514,37.851,33.967,32.338,34.977,26.589,28.079,19.597,24.669,23.098,25.685,28.215,34.94,36.91,39.749]
16/11/24 16:47:52 INFO ClusterDumper: Wrote 6 clusters
16/11/24 16:47:52 INFO MahoutDriver: Program took 22175 ms (Minutes: 0.3695833333333333) 

Inspect the output

hadoop fs -ls ./output
Found 15 items
-rw-r--r--   1 hadoop supergroup        194 2016-11-24 16:47 output/_policy
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusteredPoints
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-0
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-1
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-10-final
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-2
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-3
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-4
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-5
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-6
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-7
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-8
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-9
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/data
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/random-seeds
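
In this listing, the directory whose name ends in -final holds the clusters from the last iteration, and its iteration number can differ between runs. As a rough sketch, the path can be picked out of the listing with awk rather than read by eye. final_clusters_dir is a hypothetical helper, and the demonstration uses two lines copied from the listing above instead of a live cluster.

```shell
# final_clusters_dir: read `hadoop fs -ls` output on stdin and print the path
# of the directory whose name ends in "-final". (hypothetical helper)
final_clusters_dir() {
    awk '$NF ~ /clusters-[0-9]+-final$/ { print $NF }'
}

# Demonstration on two lines copied from the listing above.
listing='drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-1
drwxr-xr-x   - hadoop supergroup          0 2016-11-24 16:47 output/clusters-10-final'
final_dir=$(printf '%s\n' "$listing" | final_clusters_dir)
echo "$final_dir"

# Against a live cluster the call would be:
#   hadoop fs -ls ./output | final_clusters_dir
```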

View the data

mahout vectordump -i ./output/data/part-m-00000

See also
http://itindex.net/detail/51681-mahout
http://blog.csdn.net/wind520/article/details/38851367
