Testing the Bayes classifier example that ships with Mahout 0.9 is very simple: run the command below and choose option 2.

$MAHOUT_HOME/examples/bin/classify-20newsgroups.sh
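Before launching it, you can check which branch the script will take. Judging from the trace lines `+ '[' /home/jifeng/hadoop/hadoop-1.2.1 '!=' '' ']'` and `+ '[' '' == '' ']'` in the output below, the script copies the data to HDFS only when `HADOOP_HOME` is non-empty and `MAHOUT_LOCAL` is empty. The helper below is a minimal sketch of that same test; the name `hdfs_copy_expected` is hypothetical and is not part of Mahout.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of Mahout). It mirrors the condition
# visible in the script's trace: [ "$HADOOP_HOME" != "" ] && [ "$MAHOUT_LOCAL" == "" ]
# Returns 0 if the HDFS copy branch is expected to run, 1 otherwise.
hdfs_copy_expected() {
  if [ -n "${HADOOP_HOME:-}" ] && [ -z "${MAHOUT_LOCAL:-}" ]; then
    echo "yes: data will be copied to HDFS using $HADOOP_HOME/bin/hadoop"
    return 0
  fi
  echo "no: HADOOP_HOME is empty or MAHOUT_LOCAL is set, so the HDFS copy is skipped"
  return 1
}
```

Run `hdfs_copy_expected` in the same shell you will start the example from, since it only inspects that shell's variables.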

[jifeng@jifeng01 hadoop]$ $MAHOUT_HOME/examples/bin/classify-20newsgroups.sh
Please select a number to choose the corresponding task to run
1. cnaivebayes
2. naivebayes
3. sgd
4. clean -- cleans up the work area in /tmp/mahout-work-jifeng
Enter your choice :
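The menu waits for input at `Enter your choice :`. Because the answer is read from standard input, piping it in should let the example run unattended, for example over a non-interactive SSH session. A minimal sketch, assuming the prompt reads a single line from stdin; the function name `run_naivebayes_example` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Mahout): answers the menu prompt with "2"
# (naivebayes) so the 20newsgroups example can run without a terminal.
run_naivebayes_example() {
  local script="${MAHOUT_HOME:-}/examples/bin/classify-20newsgroups.sh"
  if [ -z "${MAHOUT_HOME:-}" ] || [ ! -x "$script" ]; then
    echo "MAHOUT_HOME is unset or $script is not executable" >&2
    return 1
  fi
  # Feed the menu choice on stdin; the script's own output passes through unchanged.
  echo 2 | "$script"
}
```

Usage: `run_naivebayes_example 2>&1 | tee naivebayes.log` keeps a copy of the full log for later inspection.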

Choose option 2 and wait for the run to finish:

Enter your choice : 2
ok. You chose 2 and we'll use naivebayes
creating work directory at /tmp/mahout-work-jifeng
Downloading 20news-bydate
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.7M  100 13.7M    0     0  96135      0  0:02:30  0:02:30 --:--:--  103k
Extracting...
+ echo 'Preparing 20newsgroups data'
Preparing 20newsgroups data
+ rm -rf /tmp/mahout-work-jifeng/20news-all
+ mkdir /tmp/mahout-work-jifeng/20news-all
+ cp -R /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/alt.atheism /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/comp.graphics /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/comp.sys.mac.hardware /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/comp.windows.x /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/misc.forsale /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/rec.autos /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/rec.motorcycles /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/rec.sport.baseball /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/rec.sport.hockey /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/sci.crypt /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/sci.electronics /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/sci.med /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/sci.space /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/soc.religion.christian /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/talk.politics.guns /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/talk.politics.mideast /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/talk.politics.misc /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-test/talk.religion.misc /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/alt.atheism /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/comp.graphics /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/comp.sys.mac.hardware /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/comp.windows.x 
/tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/misc.forsale /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/rec.autos /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/rec.motorcycles /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/rec.sport.baseball /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/rec.sport.hockey /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/sci.crypt /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/sci.electronics /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/sci.med /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/sci.space /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/soc.religion.christian /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/talk.politics.guns /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/talk.politics.mideast /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/talk.politics.misc /tmp/mahout-work-jifeng/20news-bydate/20news-bydate-train/talk.religion.misc /tmp/mahout-work-jifeng/20news-all
+ '[' /home/jifeng/hadoop/hadoop-1.2.1 '!=' '' ']'
+ '[' '' == '' ']'
+ echo 'Copying 20newsgroups data to HDFS'
Copying 20newsgroups data to HDFS
+ set +e
+ /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop dfs -rmr /tmp/mahout-work-jifeng/20news-all
Warning: $HADOOP_HOME is deprecated.
rmr: cannot remove /tmp/mahout-work-jifeng/20news-all: No such file or directory.
+ set -e
+ /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop dfs -put /tmp/mahout-work-jifeng/20news-all /tmp/mahout-work-jifeng/20news-all
Warning: $HADOOP_HOME is deprecated.
+ echo 'Creating sequence files from 20newsgroups data'
Creating sequence files from 20newsgroups data
+ ./bin/mahout seqdirectory -i /tmp/mahout-work-jifeng/20news-all -o /tmp/mahout-work-jifeng/20news-seq -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:35:58 WARN driver.MahoutDriver: No seqdirectory.props found on classpath, will use command-line arguments only
14/09/20 15:35:59 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[/tmp/mahout-work-jifeng/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[/tmp/mahout-work-jifeng/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/09/20 15:36:05 INFO input.FileInputFormat: Total input paths to process : 18846
14/09/20 15:36:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/20 15:36:06 WARN snappy.LoadSnappy: Snappy native library not loaded
14/09/20 15:36:20 INFO mapred.JobClient: Running job: job_201409201505_0001
14/09/20 15:36:21 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:36:34 INFO mapred.JobClient:  map 10% reduce 0%
14/09/20 15:36:37 INFO mapred.JobClient:  map 17% reduce 0%
14/09/20 15:36:41 INFO mapred.JobClient:  map 20% reduce 0%
14/09/20 15:36:44 INFO mapred.JobClient:  map 24% reduce 0%
14/09/20 15:36:47 INFO mapred.JobClient:  map 29% reduce 0%
14/09/20 15:36:50 INFO mapred.JobClient:  map 33% reduce 0%
14/09/20 15:36:53 INFO mapred.JobClient:  map 38% reduce 0%
14/09/20 15:36:56 INFO mapred.JobClient:  map 43% reduce 0%
14/09/20 15:36:59 INFO mapred.JobClient:  map 49% reduce 0%
14/09/20 15:37:01 INFO mapred.JobClient:  map 55% reduce 0%
14/09/20 15:37:04 INFO mapred.JobClient:  map 59% reduce 0%
14/09/20 15:37:07 INFO mapred.JobClient:  map 66% reduce 0%
14/09/20 15:37:10 INFO mapred.JobClient:  map 73% reduce 0%
14/09/20 15:37:13 INFO mapred.JobClient:  map 81% reduce 0%
14/09/20 15:37:16 INFO mapred.JobClient:  map 92% reduce 0%
14/09/20 15:37:20 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:37:20 INFO mapred.JobClient: Job complete: job_201409201505_0001
14/09/20 15:37:20 INFO mapred.JobClient: Counters: 18
14/09/20 15:37:20 INFO mapred.JobClient:   Job Counters
14/09/20 15:37:20 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=56433
14/09/20 15:37:20 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:37:20 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:37:20 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:37:20 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/09/20 15:37:20 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:37:20 INFO mapred.JobClient:     Bytes Written=19202391
14/09/20 15:37:20 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:37:20 INFO mapred.JobClient:     HDFS_BYTES_READ=37622181
14/09/20 15:37:20 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=59535
14/09/20 15:37:20 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=19202391
14/09/20 15:37:20 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:37:20 INFO mapred.JobClient:     Bytes Read=0
14/09/20 15:37:20 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:37:20 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:37:20 INFO mapred.JobClient:     Physical memory (bytes) snapshot=84213760
14/09/20 15:37:20 INFO mapred.JobClient:     Spilled Records=0
14/09/20 15:37:20 INFO mapred.JobClient:     CPU time spent (ms)=21400
14/09/20 15:37:20 INFO mapred.JobClient:     Total committed heap usage (bytes)=57679872
14/09/20 15:37:20 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=349954048
14/09/20 15:37:20 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:37:20 INFO mapred.JobClient:     SPLIT_RAW_BYTES=1767178
14/09/20 15:37:20 INFO driver.MahoutDriver: Program took 81580 ms (Minutes: 1.3596666666666666)
+ echo 'Converting sequence files to vectors'
Converting sequence files to vectors
+ ./bin/mahout seq2sparse -i /tmp/mahout-work-jifeng/20news-seq -o /tmp/mahout-work-jifeng/20news-vectors -lnorm -nv -wt tfidf
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:37:23 WARN driver.MahoutDriver: No seq2sparse.props found on classpath, will use command-line arguments only
14/09/20 15:37:23 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
14/09/20 15:37:23 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
14/09/20 15:37:23 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
14/09/20 15:37:23 INFO vectorizer.SparseVectorsFromSequenceFiles: Tokenizing documents in /tmp/mahout-work-jifeng/20news-seq
14/09/20 15:37:27 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:37:28 INFO mapred.JobClient: Running job: job_201409201505_0002
14/09/20 15:37:29 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:37:48 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:37:49 INFO mapred.JobClient: Job complete: job_201409201505_0002
14/09/20 15:37:49 INFO mapred.JobClient: Counters: 19
14/09/20 15:37:49 INFO mapred.JobClient:   Job Counters
14/09/20 15:37:49 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16049
14/09/20 15:37:49 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:37:49 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:37:49 INFO mapred.JobClient:     Rack-local map tasks=1
14/09/20 15:37:49 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:37:49 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/09/20 15:37:49 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:37:49 INFO mapred.JobClient:     Bytes Written=27503580
14/09/20 15:37:49 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:37:49 INFO mapred.JobClient:     HDFS_BYTES_READ=19202523
14/09/20 15:37:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=57471
14/09/20 15:37:49 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=27503580
14/09/20 15:37:49 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:37:49 INFO mapred.JobClient:     Bytes Read=19202391
14/09/20 15:37:49 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:37:49 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:37:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=90681344
14/09/20 15:37:49 INFO mapred.JobClient:     Spilled Records=0
14/09/20 15:37:49 INFO mapred.JobClient:     CPU time spent (ms)=5370
14/09/20 15:37:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=31916032
14/09/20 15:37:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=519794688
14/09/20 15:37:49 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:37:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=132
14/09/20 15:37:49 INFO vectorizer.SparseVectorsFromSequenceFiles: Creating Term Frequency Vectors
14/09/20 15:37:49 INFO vectorizer.DictionaryVectorizer: Creating dictionary from /tmp/mahout-work-jifeng/20news-vectors/tokenized-documents and saving at /tmp/mahout-work-jifeng/20news-vectors/wordcount
14/09/20 15:37:52 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:37:52 INFO mapred.JobClient: Running job: job_201409201505_0003
14/09/20 15:37:53 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:38:08 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:38:15 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:38:16 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:38:17 INFO mapred.JobClient: Job complete: job_201409201505_0003
14/09/20 15:38:17 INFO mapred.JobClient: Counters: 29
14/09/20 15:38:17 INFO mapred.JobClient:   Job Counters
14/09/20 15:38:17 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:38:17 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8755
14/09/20 15:38:17 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:38:17 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:38:17 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:38:17 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:38:17 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8294
14/09/20 15:38:17 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:38:17 INFO mapred.JobClient:     Bytes Written=2315037
14/09/20 15:38:17 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:38:17 INFO mapred.JobClient:     FILE_BYTES_READ=11857906
14/09/20 15:38:17 INFO mapred.JobClient:     HDFS_BYTES_READ=27503736
14/09/20 15:38:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15512128
14/09/20 15:38:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2315037
14/09/20 15:38:17 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:38:17 INFO mapred.JobClient:     Bytes Read=27503580
14/09/20 15:38:17 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:38:17 INFO mapred.JobClient:     Map output materialized bytes=3538084
14/09/20 15:38:17 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:38:17 INFO mapred.JobClient:     Reduce shuffle bytes=3538084
14/09/20 15:38:17 INFO mapred.JobClient:     Spilled Records=849345
14/09/20 15:38:17 INFO mapred.JobClient:     Map output bytes=39462740
14/09/20 15:38:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=219021312
14/09/20 15:38:17 INFO mapred.JobClient:     CPU time spent (ms)=6420
14/09/20 15:38:17 INFO mapred.JobClient:     Combine input records=3026242
14/09/20 15:38:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=156
14/09/20 15:38:17 INFO mapred.JobClient:     Reduce input records=192904
14/09/20 15:38:17 INFO mapred.JobClient:     Reduce input groups=192904
14/09/20 15:38:17 INFO mapred.JobClient:     Combine output records=554873
14/09/20 15:38:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=243875840
14/09/20 15:38:17 INFO mapred.JobClient:     Reduce output records=93563
14/09/20 15:38:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=699461632
14/09/20 15:38:17 INFO mapred.JobClient:     Map output records=2664273
14/09/20 15:38:21 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:38:21 INFO mapred.JobClient: Running job: job_201409201505_0004
14/09/20 15:38:22 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:38:32 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:38:39 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:38:42 INFO mapred.JobClient:  map 100% reduce 91%
14/09/20 15:38:44 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:38:44 INFO mapred.JobClient: Job complete: job_201409201505_0004
14/09/20 15:38:44 INFO mapred.JobClient: Counters: 29
14/09/20 15:38:44 INFO mapred.JobClient:   Job Counters
14/09/20 15:38:44 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:38:44 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5169
14/09/20 15:38:44 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:38:44 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:38:44 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:38:44 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:38:44 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=11769
14/09/20 15:38:44 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:38:44 INFO mapred.JobClient:     Bytes Written=29314118
14/09/20 15:38:44 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:38:44 INFO mapred.JobClient:     FILE_BYTES_READ=29226519
14/09/20 15:38:44 INFO mapred.JobClient:     HDFS_BYTES_READ=27503736
14/09/20 15:38:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=54668830
14/09/20 15:38:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
14/09/20 15:38:44 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:38:44 INFO mapred.JobClient:     Bytes Read=27503580
14/09/20 15:38:44 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:38:44 INFO mapred.JobClient:     Map output materialized bytes=27274291
14/09/20 15:38:44 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:38:44 INFO mapred.JobClient:     Reduce shuffle bytes=27274291
14/09/20 15:38:44 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:38:44 INFO mapred.JobClient:     Map output bytes=27199343
14/09/20 15:38:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=258617344
14/09/20 15:38:44 INFO mapred.JobClient:     CPU time spent (ms)=5930
14/09/20 15:38:44 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:38:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=156
14/09/20 15:38:44 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:38:44 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:38:44 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:38:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=285229056
14/09/20 15:38:44 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:38:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=699805696
14/09/20 15:38:44 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:38:46 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:38:46 INFO mapred.JobClient: Running job: job_201409201505_0005
14/09/20 15:38:47 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:38:58 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:39:05 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:39:07 INFO mapred.JobClient: Job complete: job_201409201505_0005
14/09/20 15:39:07 INFO mapred.JobClient: Counters: 29
14/09/20 15:39:07 INFO mapred.JobClient:   Job Counters
14/09/20 15:39:07 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:39:07 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5851
14/09/20 15:39:07 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:39:07 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:39:07 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:39:07 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:39:07 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9887
14/09/20 15:39:07 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:39:07 INFO mapred.JobClient:     Bytes Written=29314118
14/09/20 15:39:07 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:39:07 INFO mapred.JobClient:     FILE_BYTES_READ=29059398
14/09/20 15:39:07 INFO mapred.JobClient:     HDFS_BYTES_READ=29314272
14/09/20 15:39:07 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58235856
14/09/20 15:39:07 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=29314118
14/09/20 15:39:07 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:39:07 INFO mapred.JobClient:     Bytes Read=29314118
14/09/20 15:39:07 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:39:07 INFO mapred.JobClient:     Map output materialized bytes=29059398
14/09/20 15:39:07 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:39:07 INFO mapred.JobClient:     Reduce shuffle bytes=29059398
14/09/20 15:39:07 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:39:07 INFO mapred.JobClient:     Map output bytes=28984080
14/09/20 15:39:07 INFO mapred.JobClient:     Total committed heap usage (bytes)=161316864
14/09/20 15:39:07 INFO mapred.JobClient:     CPU time spent (ms)=3580
14/09/20 15:39:07 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:39:07 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
14/09/20 15:39:07 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:39:07 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:39:07 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:39:07 INFO mapred.JobClient:     Physical memory (bytes) snapshot=213598208
14/09/20 15:39:07 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:39:07 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=698466304
14/09/20 15:39:07 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:39:07 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jifeng/20news-vectors/partial-vectors-0
14/09/20 15:39:07 INFO vectorizer.SparseVectorsFromSequenceFiles: Calculating IDF
14/09/20 15:39:09 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:39:09 INFO mapred.JobClient: Running job: job_201409201505_0006
14/09/20 15:39:10 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:39:23 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:39:30 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:39:31 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:39:31 INFO mapred.JobClient: Job complete: job_201409201505_0006
14/09/20 15:39:31 INFO mapred.JobClient: Counters: 29
14/09/20 15:39:31 INFO mapred.JobClient:   Job Counters
14/09/20 15:39:31 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:39:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10231
14/09/20 15:39:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:39:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:39:31 INFO mapred.JobClient:     Rack-local map tasks=1
14/09/20 15:39:31 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:39:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8205
14/09/20 15:39:31 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:39:31 INFO mapred.JobClient:     Bytes Written=1890073
14/09/20 15:39:31 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:39:31 INFO mapred.JobClient:     FILE_BYTES_READ=4880830
14/09/20 15:39:31 INFO mapred.JobClient:     HDFS_BYTES_READ=29314273
14/09/20 15:39:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6306594
14/09/20 15:39:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1890073
14/09/20 15:39:31 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:39:31 INFO mapred.JobClient:     Bytes Read=29314118
14/09/20 15:39:31 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:39:31 INFO mapred.JobClient:     Map output materialized bytes=1309902
14/09/20 15:39:31 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:39:31 INFO mapred.JobClient:     Reduce shuffle bytes=1309902
14/09/20 15:39:31 INFO mapred.JobClient:     Spilled Records=442190
14/09/20 15:39:31 INFO mapred.JobClient:     Map output bytes=31005336
14/09/20 15:39:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=132190208
14/09/20 15:39:31 INFO mapred.JobClient:     CPU time spent (ms)=5200
14/09/20 15:39:31 INFO mapred.JobClient:     Combine input records=2838840
14/09/20 15:39:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=155
14/09/20 15:39:31 INFO mapred.JobClient:     Reduce input records=93564
14/09/20 15:39:31 INFO mapred.JobClient:     Reduce input groups=93564
14/09/20 15:39:31 INFO mapred.JobClient:     Combine output records=348626
14/09/20 15:39:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=186679296
14/09/20 15:39:31 INFO mapred.JobClient:     Reduce output records=93564
14/09/20 15:39:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=699461632
14/09/20 15:39:31 INFO mapred.JobClient:     Map output records=2583778
14/09/20 15:39:31 INFO vectorizer.SparseVectorsFromSequenceFiles: Pruning
14/09/20 15:39:35 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:39:35 INFO mapred.JobClient: Running job: job_201409201505_0007
14/09/20 15:39:36 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:39:49 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:39:57 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:40:00 INFO mapred.JobClient:  map 100% reduce 67%
14/09/20 15:40:02 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:40:04 INFO mapred.JobClient: Job complete: job_201409201505_0007
14/09/20 15:40:04 INFO mapred.JobClient: Counters: 29
14/09/20 15:40:04 INFO mapred.JobClient:   Job Counters
14/09/20 15:40:04 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:40:04 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11081
14/09/20 15:40:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:40:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:40:04 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:40:04 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:40:04 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=13644
14/09/20 15:40:04 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:40:04 INFO mapred.JobClient:     Bytes Written=28689283
14/09/20 15:40:04 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:40:04 INFO mapred.JobClient:     FILE_BYTES_READ=9646422
14/09/20 15:40:04 INFO mapred.JobClient:     HDFS_BYTES_READ=29314273
14/09/20 15:40:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=15601678
14/09/20 15:40:04 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
14/09/20 15:40:04 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:40:04 INFO mapred.JobClient:     Bytes Read=29314118
14/09/20 15:40:04 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:40:04 INFO mapred.JobClient:     Map output materialized bytes=7741585
14/09/20 15:40:04 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:40:04 INFO mapred.JobClient:     Reduce shuffle bytes=7741585
14/09/20 15:40:04 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:40:04 INFO mapred.JobClient:     Map output bytes=28984080
14/09/20 15:40:04 INFO mapred.JobClient:     Total committed heap usage (bytes)=192774144
14/09/20 15:40:04 INFO mapred.JobClient:     CPU time spent (ms)=8050
14/09/20 15:40:04 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:40:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=155
14/09/20 15:40:04 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:40:04 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:40:04 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:40:04 INFO mapred.JobClient:     Physical memory (bytes) snapshot=306360320
14/09/20 15:40:04 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:40:04 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1039331328
14/09/20 15:40:04 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:40:05 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:40:05 INFO mapred.JobClient: Running job: job_201409201505_0008
14/09/20 15:40:06 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:40:15 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:40:22 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:40:25 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:40:25 INFO mapred.JobClient: Job complete: job_201409201505_0008
14/09/20 15:40:25 INFO mapred.JobClient: Counters: 29
14/09/20 15:40:25 INFO mapred.JobClient:   Job Counters
14/09/20 15:40:25 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:40:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7062
14/09/20 15:40:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:40:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:40:25 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:40:25 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:40:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9478
14/09/20 15:40:25 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:40:25 INFO mapred.JobClient:     Bytes Written=28689283
14/09/20 15:40:25 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:40:25 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
14/09/20 15:40:25 INFO mapred.JobClient:     HDFS_BYTES_READ=28689448
14/09/20 15:40:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56991474
14/09/20 15:40:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
14/09/20 15:40:25 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:40:25 INFO mapred.JobClient:     Bytes Read=28689283
14/09/20 15:40:25 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:40:25 INFO mapred.JobClient:     Map output materialized bytes=28437750
14/09/20 15:40:25 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:40:25 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
14/09/20 15:40:25 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:40:25 INFO mapred.JobClient:     Map output bytes=28362505
14/09/20 15:40:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=160694272
14/09/20 15:40:25 INFO mapred.JobClient:     CPU time spent (ms)=3090
14/09/20 15:40:25 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:40:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=165
14/09/20 15:40:25 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:40:25 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:40:25 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:40:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=212541440
14/09/20 15:40:25 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:40:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=698466304
14/09/20 15:40:25 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:40:25 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jifeng/20news-vectors/tf-vectors-partial
14/09/20 15:40:25 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jifeng/20news-vectors/tf-vectors-toprune
14/09/20 15:40:29 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:40:29 INFO mapred.JobClient: Running job: job_201409201505_0009
14/09/20 15:40:30 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:40:40 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:40:48 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:40:51 INFO mapred.JobClient:  map 100% reduce 89%
14/09/20 15:40:53 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:40:54 INFO mapred.JobClient: Job complete: job_201409201505_0009
14/09/20 15:40:54 INFO mapred.JobClient: Counters: 29
14/09/20 15:40:54 INFO mapred.JobClient:   Job Counters
14/09/20 15:40:54 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:40:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9479
14/09/20 15:40:54 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:40:54 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:40:54 INFO mapred.JobClient:     Rack-local map tasks=1
14/09/20 15:40:54 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:40:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=12699
14/09/20 15:40:54 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:40:54 INFO mapred.JobClient:     Bytes Written=28689283
14/09/20 15:40:54 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:40:54 INFO mapred.JobClient:     FILE_BYTES_READ=30342579
14/09/20 15:40:54 INFO mapred.JobClient:     HDFS_BYTES_READ=28689430
14/09/20 15:40:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56995482
14/09/20 15:40:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
14/09/20 15:40:54 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:40:54 INFO mapred.JobClient:     Bytes Read=28689283
14/09/20 15:40:54 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:40:54 INFO mapred.JobClient:     Map output materialized bytes=28437750
14/09/20 15:40:54 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:40:54 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
14/09/20 15:40:54 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:40:54 INFO mapred.JobClient:     Map output bytes=28362505
14/09/20 15:40:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=192151552
14/09/20 15:40:54 INFO mapred.JobClient:     CPU time spent (ms)=6690
14/09/20 15:40:54 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:40:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=147
14/09/20 15:40:54 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:40:54 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:40:54 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:40:54 INFO mapred.JobClient:     Physical memory (bytes) snapshot=294711296
14/09/20 15:40:54 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:40:54 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1024688128
14/09/20 15:40:54 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:40:55 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:40:55 INFO mapred.JobClient: Running job: job_201409201505_0010
14/09/20 15:40:56 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:41:06 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:41:13 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:41:15 INFO mapred.JobClient: Job complete: job_201409201505_0010
14/09/20 15:41:15 INFO mapred.JobClient: Counters: 29
14/09/20 15:41:15 INFO mapred.JobClient:   Job Counters
14/09/20 15:41:15 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:41:15 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=7371
14/09/20 15:41:15 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:41:15 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:41:15 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:41:15 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:41:15 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9851
14/09/20 15:41:15 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:41:15 INFO mapred.JobClient:     Bytes Written=28689283
14/09/20 15:41:15 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:41:15 INFO mapred.JobClient:     FILE_BYTES_READ=28437750
14/09/20 15:41:15 INFO mapred.JobClient:     HDFS_BYTES_READ=28689437
14/09/20 15:41:15 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=56992548
14/09/20 15:41:15 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=28689283
14/09/20 15:41:15 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:41:15 INFO mapred.JobClient:     Bytes Read=28689283
14/09/20 15:41:15 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:41:15 INFO mapred.JobClient:     Map output materialized bytes=28437750
14/09/20 15:41:15 INFO mapred.JobClient:     Map input records=18846
14/09/20 15:41:15 INFO mapred.JobClient:     Reduce shuffle bytes=28437750
14/09/20 15:41:15 INFO mapred.JobClient:     Spilled Records=37692
14/09/20 15:41:15 INFO mapred.JobClient:     Map output bytes=28362505
14/09/20 15:41:15 INFO mapred.JobClient:     Total committed heap usage (bytes)=160694272
14/09/20 15:41:15 INFO mapred.JobClient:     CPU time spent (ms)=3450
14/09/20 15:41:15 INFO mapred.JobClient:     Combine input records=0
14/09/20 15:41:15 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
14/09/20 15:41:15 INFO mapred.JobClient:     Reduce input records=18846
14/09/20 15:41:15 INFO mapred.JobClient:     Reduce input groups=18846
14/09/20 15:41:15 INFO mapred.JobClient:     Combine output records=0
14/09/20 15:41:15 INFO mapred.JobClient:     Physical memory (bytes) snapshot=213041152
14/09/20 15:41:15 INFO mapred.JobClient:     Reduce output records=18846
14/09/20 15:41:15 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=700563456
14/09/20 15:41:15 INFO mapred.JobClient:     Map output records=18846
14/09/20 15:41:15 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jifeng/20news-vectors/partial-vectors-0
14/09/20 15:41:15 INFO driver.MahoutDriver: Program took 232258 ms (Minutes: 3.8709666666666664)
+ echo 'Creating training and holdout set with a random 80-20 split of the generated vector dataset'
Creating training and holdout set with a random 80-20 split of the generated vector dataset
+ ./bin/mahout split -i /tmp/mahout-work-jifeng/20news-vectors/tfidf-vectors --trainingOutput /tmp/mahout-work-jifeng/20news-train-vectors --testOutput /tmp/mahout-work-jifeng/20news-test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:41:17 WARN driver.MahoutDriver: No split.props found on classpath, will use command-line arguments only
14/09/20 15:41:18 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-jifeng/20news-vectors/tfidf-vectors], --method=[sequential], --overwrite=null, --randomSelectionPct=[40], --sequenceFiles=null, --startPhase=[0], --tempDir=[temp], --testOutput=[/tmp/mahout-work-jifeng/20news-test-vectors], --trainingOutput=[/tmp/mahout-work-jifeng/20news-train-vectors]}
14/09/20 15:41:19 INFO utils.SplitInput: part-r-00000 has 162419 lines
14/09/20 15:41:19 INFO utils.SplitInput: part-r-00000 test split size is 64968 based on random selection percentage 40
14/09/20 15:41:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/20 15:41:20 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/09/20 15:41:20 INFO compress.CodecPool: Got brand-new compressor
14/09/20 15:41:21 INFO compress.CodecPool: Got brand-new compressor
14/09/20 15:41:25 INFO utils.SplitInput: file: part-r-00000, input: 162419 train: 11198, test: 7648 starting at 0
14/09/20 15:41:25 INFO driver.MahoutDriver: Program took 7698 ms (Minutes: 0.1283)
+ echo 'Training Naive Bayes model'
Training Naive Bayes model
+ ./bin/mahout trainnb -i /tmp/mahout-work-jifeng/20news-train-vectors -el -o /tmp/mahout-work-jifeng/model -li /tmp/mahout-work-jifeng/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:41:28 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
14/09/20 15:41:28 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/tmp/mahout-work-jifeng/20news-train-vectors], --labelIndex=[/tmp/mahout-work-jifeng/labelindex], --output=[/tmp/mahout-work-jifeng/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/09/20 15:41:28 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/09/20 15:41:28 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/09/20 15:41:28 INFO compress.CodecPool: Got brand-new decompressor
14/09/20 15:41:34 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:41:37 INFO mapred.JobClient: Running job: job_201409201505_0011
14/09/20 15:41:38 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:41:48 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:41:56 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:41:57 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:41:58 INFO mapred.JobClient: Job complete: job_201409201505_0011
14/09/20 15:41:58 INFO mapred.JobClient: Counters: 29
14/09/20 15:41:58 INFO mapred.JobClient:   Job Counters
14/09/20 15:41:58 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:41:58 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=5959
14/09/20 15:41:58 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:41:58 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:41:58 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:41:58 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:41:58 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8506
14/09/20 15:41:58 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:41:58 INFO mapred.JobClient:     Bytes Written=2736814
14/09/20 15:41:58 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:41:58 INFO mapred.JobClient:     FILE_BYTES_READ=1515825
14/09/20 15:41:58 INFO mapred.JobClient:     HDFS_BYTES_READ=12691767
14/09/20 15:41:58 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3149124
14/09/20 15:41:58 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2736814
14/09/20 15:41:58 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:41:58 INFO mapred.JobClient:     Bytes Read=12691625
14/09/20 15:41:58 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:41:58 INFO mapred.JobClient:     Map output materialized bytes=1515143
14/09/20 15:41:58 INFO mapred.JobClient:     Map input records=11198
14/09/20 15:41:58 INFO mapred.JobClient:     Reduce shuffle bytes=1515143
14/09/20 15:41:58 INFO mapred.JobClient:     Spilled Records=40
14/09/20 15:41:58 INFO mapred.JobClient:     Map output bytes=16617381
14/09/20 15:41:58 INFO mapred.JobClient:     Total committed heap usage (bytes)=219086848
14/09/20 15:41:58 INFO mapred.JobClient:     CPU time spent (ms)=2870
14/09/20 15:41:58 INFO mapred.JobClient:     Combine input records=11198
14/09/20 15:41:58 INFO mapred.JobClient:     SPLIT_RAW_BYTES=142
14/09/20 15:41:58 INFO mapred.JobClient:     Reduce input records=20
14/09/20 15:41:58 INFO mapred.JobClient:     Reduce input groups=20
14/09/20 15:41:58 INFO mapred.JobClient:     Combine output records=20
14/09/20 15:41:58 INFO mapred.JobClient:     Physical memory (bytes) snapshot=204357632
14/09/20 15:41:58 INFO mapred.JobClient:     Reduce output records=20
14/09/20 15:41:58 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=701775872
14/09/20 15:41:58 INFO mapred.JobClient:     Map output records=11198
14/09/20 15:41:58 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:41:58 INFO mapred.JobClient: Running job: job_201409201505_0012
14/09/20 15:41:59 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:42:06 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:42:13 INFO mapred.JobClient:  map 100% reduce 33%
14/09/20 15:42:14 INFO mapred.JobClient:  map 100% reduce 100%
14/09/20 15:42:14 INFO mapred.JobClient: Job complete: job_201409201505_0012
14/09/20 15:42:14 INFO mapred.JobClient: Counters: 29
14/09/20 15:42:14 INFO mapred.JobClient:   Job Counters
14/09/20 15:42:14 INFO mapred.JobClient:     Launched reduce tasks=1
14/09/20 15:42:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=3188
14/09/20 15:42:14 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:42:14 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:42:14 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:42:14 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:42:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=8219
14/09/20 15:42:14 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:42:14 INFO mapred.JobClient:     Bytes Written=899207
14/09/20 15:42:14 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:42:14 INFO mapred.JobClient:     FILE_BYTES_READ=444260
14/09/20 15:42:14 INFO mapred.JobClient:     HDFS_BYTES_READ=2736948
14/09/20 15:42:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1007696
14/09/20 15:42:14 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=899207
14/09/20 15:42:14 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:42:14 INFO mapred.JobClient:     Bytes Read=2736814
14/09/20 15:42:14 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:42:14 INFO mapred.JobClient:     Map output materialized bytes=444252
14/09/20 15:42:14 INFO mapred.JobClient:     Map input records=20
14/09/20 15:42:14 INFO mapred.JobClient:     Reduce shuffle bytes=444252
14/09/20 15:42:14 INFO mapred.JobClient:     Spilled Records=4
14/09/20 15:42:14 INFO mapred.JobClient:     Map output bytes=899081
14/09/20 15:42:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=225873920
14/09/20 15:42:14 INFO mapred.JobClient:     CPU time spent (ms)=960
14/09/20 15:42:14 INFO mapred.JobClient:     Combine input records=2
14/09/20 15:42:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=134
14/09/20 15:42:14 INFO mapred.JobClient:     Reduce input records=2
14/09/20 15:42:14 INFO mapred.JobClient:     Reduce input groups=2
14/09/20 15:42:14 INFO mapred.JobClient:     Combine output records=2
14/09/20 15:42:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=224452608
14/09/20 15:42:14 INFO mapred.JobClient:     Reduce output records=2
14/09/20 15:42:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=701775872
14/09/20 15:42:14 INFO mapred.JobClient:     Map output records=2
14/09/20 15:42:15 INFO driver.MahoutDriver: Program took 47200 ms (Minutes: 0.7866666666666666)
+ echo 'Self testing on training set'
Self testing on training set
+ ./bin/mahout testnb -i /tmp/mahout-work-jifeng/20news-train-vectors -m /tmp/mahout-work-jifeng/model -l /tmp/mahout-work-jifeng/labelindex -ow -o /tmp/mahout-work-jifeng/20news-testing
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:42:17 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
14/09/20 15:42:17 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-jifeng/20news-train-vectors], --labelIndex=[/tmp/mahout-work-jifeng/labelindex], --model=[/tmp/mahout-work-jifeng/model], --output=[/tmp/mahout-work-jifeng/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/09/20 15:42:20 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:42:21 INFO mapred.JobClient: Running job: job_201409201505_0013
14/09/20 15:42:22 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:42:35 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:42:35 INFO mapred.JobClient: Job complete: job_201409201505_0013
14/09/20 15:42:35 INFO mapred.JobClient: Counters: 20
14/09/20 15:42:35 INFO mapred.JobClient:   Job Counters
14/09/20 15:42:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9283
14/09/20 15:42:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:42:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:42:35 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:42:35 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:42:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/09/20 15:42:35 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:42:35 INFO mapred.JobClient:     Bytes Written=2109460
14/09/20 15:42:35 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:42:35 INFO mapred.JobClient:     FILE_BYTES_READ=3663744
14/09/20 15:42:35 INFO mapred.JobClient:     HDFS_BYTES_READ=12691767
14/09/20 15:42:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58876
14/09/20 15:42:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=2109460
14/09/20 15:42:35 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:42:35 INFO mapred.JobClient:     Bytes Read=12691625
14/09/20 15:42:35 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:42:35 INFO mapred.JobClient:     Map input records=11198
14/09/20 15:42:35 INFO mapred.JobClient:     Physical memory (bytes) snapshot=57454592
14/09/20 15:42:35 INFO mapred.JobClient:     Spilled Records=0
14/09/20 15:42:35 INFO mapred.JobClient:     CPU time spent (ms)=5310
14/09/20 15:42:35 INFO mapred.JobClient:     Total committed heap usage (bytes)=29954048
14/09/20 15:42:35 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=349515776
14/09/20 15:42:35 INFO mapred.JobClient:     Map output records=11198
14/09/20 15:42:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=142
14/09/20 15:42:36 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :      11119       99.2945%
Incorrectly Classified Instances        :         79        0.7055%
Total Classified Instances              :      11198
=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n        o       p       q       r       s       t       <--Classified as
434     0       0       0       0       0       0       0       0       0       0       0       0       0        0       0       0       0       1       0        |  435         a     = alt.atheism
0       555     0       2       1       3       0       0       0       0       0       1       0       1        0       0       0       0       0       0        |  563         b     = comp.graphics
0       6       553     18      1       1       0       0       0       0       0       0       0       0        0       0       0       0       0       0        |  579         c     = comp.os.ms-windows.misc
0       0       0       564     1       0       3       0       0       0       0       0       0       0        0       0       0       0       0       0        |  568         d     = comp.sys.ibm.pc.hardware
0       0       1       0       573     0       0       0       0       0       0       0       1       0        0       0       0       0       0       0        |  575         e     = comp.sys.mac.hardware
0       1       1       0       0       585     0       0       0       0       0       0       0       0        0       0       0       0       0       0        |  587         f     = comp.windows.x
0       0       0       1       0       0       582     1       0       0       0       0       2       0        0       0       0       0       0       0        |  586         g     = misc.forsale
0       0       0       0       1       0       1       613     0       0       0       0       0       0        0       0       0       0       0       1        |  616         h     = rec.autos
0       0       0       0       0       0       1       0       603     0       0       0       0       0        0       0       0       0       0       0        |  604         i     = rec.motorcycles
0       0       0       0       0       0       0       0       0       595     0       1       0       0        0       0       0       0       0       0        |  596         j     = rec.sport.baseball
0       0       0       0       0       0       0       0       0       0       584     0       0       0        0       0       0       0       0       1        |  585         k     = rec.sport.hockey
0       0       0       0       0       0       0       0       0       0       0       583     1       0        0       0       0       0       0       1        |  585         l     = sci.crypt
0       0       0       2       0       0       0       0       0       0       0       0       584     0        0       0       0       0       0       0        |  586         m     = sci.electronics
0       1       0       0       0       0       0       0       0       0       0       0       1       570      0       0       0       0       0       0        |  572         n     = sci.med
0       0       0       0       0       0       0       0       0       0       0       0       0       1        617     0       0       0       0       0        |  618         o     = sci.space
0       0       0       0       0       0       0       0       0       0       0       0       0       0        0       592     1       0       0       0        |  593         p     = soc.religion.christian
0       0       0       0       0       0       0       0       0       0       0       0       0       0        0       0       565     0       0       0        |  565         q     = talk.politics.mideast
0       0       0       0       0       0       0       0       0       0       0       1       0       0        0       0       0       544     0       0        |  545         r     = talk.politics.guns
7       0       0       0       0       0       0       0       0       0       0       0       0       0        1       1       0       1       359     1        |  370         s     = talk.religion.misc
0       0       0       0       0       0       0       0       0       0       0       1       0       0        0       0       0       5       0       464      |  470         t     = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa                                        0.987
Accuracy                                   99.2945%
Reliability                                94.5236%
Reliability (standard deviation)            0.2169
14/09/20 15:42:36 INFO driver.MahoutDriver: Program took 18730 ms (Minutes: 0.31216666666666665)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-jifeng/20news-test-vectors -m /tmp/mahout-work-jifeng/model -l /tmp/mahout-work-jifeng/labelindex -ow -o /tmp/mahout-work-jifeng/20news-testing
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/jifeng/hadoop/hadoop-1.2.1/bin/hadoop and HADOOP_CONF_DIR=/home/jifeng/hadoop/hadoop-1.2.1/conf
MAHOUT-JOB: /home/jifeng/hadoop/mahout-distribution-0.9/mahout-examples-0.9-job.jar
Warning: $HADOOP_HOME is deprecated.
14/09/20 15:42:39 WARN driver.MahoutDriver: No testnb.props found on classpath, will use command-line arguments only
14/09/20 15:42:39 INFO common.AbstractJob: Command line arguments: {--endPhase=[2147483647], --input=[/tmp/mahout-work-jifeng/20news-test-vectors], --labelIndex=[/tmp/mahout-work-jifeng/labelindex], --model=[/tmp/mahout-work-jifeng/model], --output=[/tmp/mahout-work-jifeng/20news-testing], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
14/09/20 15:42:39 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jifeng/20news-testing
14/09/20 15:42:41 INFO input.FileInputFormat: Total input paths to process : 1
14/09/20 15:42:41 INFO mapred.JobClient: Running job: job_201409201505_0014
14/09/20 15:42:42 INFO mapred.JobClient:  map 0% reduce 0%
14/09/20 15:42:59 INFO mapred.JobClient:  map 100% reduce 0%
14/09/20 15:42:59 INFO mapred.JobClient: Job complete: job_201409201505_0014
14/09/20 15:42:59 INFO mapred.JobClient: Counters: 20
14/09/20 15:42:59 INFO mapred.JobClient:   Job Counters
14/09/20 15:42:59 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=8035
14/09/20 15:42:59 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/09/20 15:42:59 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/09/20 15:42:59 INFO mapred.JobClient:     Launched map tasks=1
14/09/20 15:42:59 INFO mapred.JobClient:     Data-local map tasks=1
14/09/20 15:42:59 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
14/09/20 15:42:59 INFO mapred.JobClient:   File Output Format Counters
14/09/20 15:42:59 INFO mapred.JobClient:     Bytes Written=1440968
14/09/20 15:42:59 INFO mapred.JobClient:   FileSystemCounters
14/09/20 15:42:59 INFO mapred.JobClient:     FILE_BYTES_READ=3663744
14/09/20 15:42:59 INFO mapred.JobClient:     HDFS_BYTES_READ=8662748
14/09/20 15:42:59 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=58876
14/09/20 15:42:59 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1440968
14/09/20 15:42:59 INFO mapred.JobClient:   File Input Format Counters
14/09/20 15:42:59 INFO mapred.JobClient:     Bytes Read=8662607
14/09/20 15:42:59 INFO mapred.JobClient:   Map-Reduce Framework
14/09/20 15:42:59 INFO mapred.JobClient:     Map input records=7648
14/09/20 15:42:59 INFO mapred.JobClient:     Physical memory (bytes) snapshot=58269696
14/09/20 15:42:59 INFO mapred.JobClient:     Spilled Records=0
14/09/20 15:42:59 INFO mapred.JobClient:     CPU time spent (ms)=4120
14/09/20 15:42:59 INFO mapred.JobClient:     Total committed heap usage (bytes)=31711232
14/09/20 15:42:59 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=349515776
14/09/20 15:42:59 INFO mapred.JobClient:     Map output records=7648
14/09/20 15:42:59 INFO mapred.JobClient:     SPLIT_RAW_BYTES=141
14/09/20 15:43:00 INFO test.TestNaiveBayesDriver: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances          :       6913       90.3896%
Incorrectly Classified Instances        :        735        9.6104%
Total Classified Instances              :       7648
=======================================================
Confusion Matrix
-------------------------------------------------------
a       b       c       d       e       f       g       h       i       j       k       l       m       n        o       p       q       r       s       t       <--Classified as
327     0       0       0       1       0       0       0       0       1       0       0       0       1        0       9       0       2       20      3        |  364         a     = alt.atheism
0       350     2       22      7       10      8       0       0       1       0       1       2       1        5       0       0       0       1       0        |  410         b     = comp.graphics
0       25      254     73      17      20      8       0       0       0       0       1       4       2        0       0       0       0       0       2        |  406         c     = comp.os.ms-windows.misc
1       4       2       375     18      3       4       1       0       0       0       0       5       0        0       0       0       0       1       0        |  414         d     = comp.sys.ibm.pc.hardware
0       4       3       16      355     0       5       0       0       0       0       1       4       0        0       0       0       0       0       0        |  388         e     = comp.sys.mac.hardware
0       28      0       7       8       348     2       0       1       1       0       5       0       0        1       0       0       0       0       0        |  401         f     = comp.windows.x
1       5       1       15      9       0       330     11      4       0       3       0       5       2        1       0       0       0       1       1        |  389         g     = misc.forsale
0       1       0       0       1       1       7       357     3       0       0       0       1       0        2       0       0       1       0       0        |  374         h     = rec.autos
0       0       0       0       0       0       0       5       386     0       0       0       0       1        0       0       0       0       0       0        |  392         i     = rec.motorcycles
0       0       0       1       2       0       1       0       1       389     3       0       1       0        0       0       0       0       0       0        |  398         j     = rec.sport.baseball
0       0       0       1       0       0       0       0       0       5       405     0       2       0        0       1       0       0       0       0        |  414         k     = rec.sport.hockey
1       2       1       0       3       2       1       0       0       0       0       386     1       1        0       0       0       4       1       3        |  406         l     = sci.crypt
0       3       0       14      7       1       10      6       0       0       1       2       350     1        1       0       0       1       0       1        |  398         m     = sci.electronics
1       1       1       1       2       0       6       3       0       2       1       2       2       389      2       0       1       3       1       0        |  418         n     = sci.med
0       3       0       0       4       2       1       0       0       0       0       1       3       1        346     0       0       2       3       3        |  369         o     = sci.space
2       3       0       1       0       0       0       0       0       1       0       0       1       4        0       382     0       1       8       1        |  404         p     = soc.religion.christian
1       1       0       0       0       0       0       0       0       0       1       0       0       0        0       3       367     0       0       2        |  375         q     = talk.politics.mideast
0       0       1       0       0       0       0       2       2       0       0       2       0       0        1       0       0       347     1       9        |  365         r     = talk.politics.guns
24      1       0       0       0       1       1       0       0       0       1       0       0       0        0       17      3       7       196     7        |  258         s     = talk.religion.misc
1       0       0       0       2       0       0       1       0       0       1       1       1       0        2       2       3       15      2       274      |  305         t     = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa                                       0.8758
Accuracy                                   90.3896%
Reliability                                85.9085%
Reliability (standard deviation)            0.2138
14/09/20 15:43:00 INFO driver.MahoutDriver: Program took 21261 ms (Minutes: 0.35435)
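The Summary and Statistics blocks can be cross-checked from the confusion matrix itself. Accuracy is the diagonal sum divided by the total N. Cohen's kappa corrects that for chance agreement: kappa = (p_o - p_e) / (1 - p_e), where p_o is the accuracy and p_e is the sum over classes of (row total x column total) divided by N^2. Below is a minimal POSIX shell/awk sketch. The function name `nb_stats` is hypothetical, and it assumes the row layout printed above: the counts, then `|`, then the row total and label. It computes the textbook Cohen's kappa, so its kappa may differ from the Kappa value Mahout prints.

```shell
# nb_stats (hypothetical helper): overall accuracy and Cohen's kappa from the
# confusion-matrix rows printed by `mahout testnb`. Assumed row layout:
#   count_1 count_2 ... count_K  |  rowTotal  letter = label
# Rows are actual labels, columns are predicted labels. Lines without a "|"
# (header, separators, log lines) are ignored.
nb_stats() {
  awk '
    index($0, "|") > 0 {
      split($0, halves, "|")
      k = split(halves[1], counts)        # whitespace-separated counts left of "|"
      rows++
      for (c = 1; c <= k; c++) cell[rows, c] = counts[c] + 0
    }
    END {
      if (rows == 0) { print "no confusion-matrix rows found"; exit 1 }
      for (r = 1; r <= rows; r++) {
        for (c = 1; c <= k; c++) {
          v = cell[r, c]
          total += v
          rowSum[r] += v
          colSum[c] += v
          if (r == c) diag += v             # correctly classified
        }
      }
      po = diag / total                     # observed agreement = accuracy
      for (i = 1; i <= k; i++) pe += rowSum[i] * colSum[i]
      pe /= total * total                   # agreement expected by chance
      printf "total=%d correct=%d accuracy=%.4f%% kappa=%.4f\n",
             total, diag, 100 * po, (po - pe) / (1 - pe)
    }'
}
```

For example, saving the 20 holdout-set rows above to a file and running `nb_stats < holdout-matrix.txt` should report correct=6913 and total=7648 (accuracy 90.3896%), matching the Summary block.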
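Choosing option 2 runs the whole pipeline, including the download and vectorization stages. Once the TF-IDF vectors exist, the last three stages can be rerun by hand with the same commands the script logged above. Although the script's message says "80-20 split", `--randomSelectionPct 40` sends roughly 40% of the vectors to the holdout set. This is consistent with the 11198/7648 counts (7648 / 18846 is about 40.6%). The sketch below wraps those three commands in a hypothetical `rerun_nb` function. The flags are copied from the log, while the `WORK_DIR` default of `/tmp/mahout-work-$USER` and the optional holdout-percentage argument are assumptions.

```shell
# rerun_nb (hypothetical helper): repeat the split / train / test stages
# from the log, reusing the TF-IDF vectors the script already built.
# Assumptions: MAHOUT_HOME points at the Mahout 0.9 install and WORK_DIR
# defaults to /tmp/mahout-work-$USER (the log used /tmp/mahout-work-jifeng).
# Optional argument: percentage of vectors sent to the holdout set (log: 40).
# The body runs in a subshell, so the cd does not affect the caller.
rerun_nb() (
  : "${MAHOUT_HOME:?set MAHOUT_HOME first}"
  work_dir="${WORK_DIR:-/tmp/mahout-work-$USER}"
  test_pct="${1:-40}"
  # The log runs ./bin/mahout from the Mahout home directory.
  cd "$MAHOUT_HOME" || exit 1

  # 1. Random split of the TF-IDF vectors (the log shows --method=[sequential])
  ./bin/mahout split -i "$work_dir/20news-vectors/tfidf-vectors" \
    --trainingOutput "$work_dir/20news-train-vectors" \
    --testOutput "$work_dir/20news-test-vectors" \
    --randomSelectionPct "$test_pct" --overwrite --sequenceFiles \
    -xm sequential || exit 1

  # 2. Train the model; -el extracts the labels and writes the label index (-li)
  ./bin/mahout trainnb -i "$work_dir/20news-train-vectors" -el \
    -o "$work_dir/model" -li "$work_dir/labelindex" -ow || exit 1

  # 3. Classify the holdout set and print the summary and confusion matrix
  ./bin/mahout testnb -i "$work_dir/20news-test-vectors" \
    -m "$work_dir/model" -l "$work_dir/labelindex" \
    -ow -o "$work_dir/20news-testing"
)
```

For example, `rerun_nb 20` would pass 20 to `--randomSelectionPct`, holding out about 20% of the vectors instead of 40%.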
