Running Hadoop's Built-in wordcount Word-Counting Example

1. Counting words with the bundled example program

(1) The wordcount program

The wordcount program ships in Hadoop's share directory:

[root@leaf mapreduce]# pwd

/usr/local/hadoop/share/hadoop/mapreduce

[root@leaf mapreduce]# ls

hadoop-mapreduce-client-app-2.6.5.jar hadoop-mapreduce-client-jobclient-2.6.5-tests.jar

hadoop-mapreduce-client-common-2.6.5.jar hadoop-mapreduce-client-shuffle-2.6.5.jar

hadoop-mapreduce-client-core-2.6.5.jar hadoop-mapreduce-examples-2.6.5.jar

hadoop-mapreduce-client-hs-2.6.5.jar lib

hadoop-mapreduce-client-hs-plugins-2.6.5.jar lib-examples

hadoop-mapreduce-client-jobclient-2.6.5.jar sources

The one we need is hadoop-mapreduce-examples-2.6.5.jar.
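The jar's file name embeds the Hadoop version, so it can help to build its path once and reuse it in later commands. The sketch below assumes the install directory /usr/local/hadoop and version 2.6.5 from the listing above. The variable names HADOOP_INSTALL_DIR, HADOOP_VERSION and EXAMPLES_JAR were chosen for this illustration; Hadoop itself does not define or read them.

```shell
# Assumption: Hadoop is installed under /usr/local/hadoop and is version 2.6.5,
# as in the directory listing above; adjust both values for other installations.
HADOOP_INSTALL_DIR=/usr/local/hadoop
HADOOP_VERSION=2.6.5

# The examples jar sits in share/hadoop/mapreduce and carries the version in its name.
EXAMPLES_JAR="$HADOOP_INSTALL_DIR/share/hadoop/mapreduce/hadoop-mapreduce-examples-$HADOOP_VERSION.jar"
echo "$EXAMPLES_JAR"
```

With the variable set, running `hadoop jar "$EXAMPLES_JAR"` without a program name should print the list of example programs in the jar, wordcount among them. The command in step (4) can then be written as `hadoop jar "$EXAMPLES_JAR" wordcount /data/wordcount /output/wordcount`.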

(2) Create the HDFS data directories

Create a directory to hold the input files of the MapReduce job:

`[root@leaf ~]# hadoop fs -mkdir -p /data/wordcount`

Create a directory to hold the output of the MapReduce job:

`[root@leaf ~]# hadoop fs -mkdir /output`

List the two directories just created:

[root@leaf ~]# hadoop fs -ls /

drwxr-xr-x - root supergroup 0 2017-09-01 20:34 /data

drwxr-xr-x - root supergroup 0 2017-09-01 20:35 /output
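Only the parent directory /output is created here. The job in step (4) creates /output/wordcount itself, and MapReduce's FileOutputFormat refuses to start a job whose output directory already exists. Rerunning the same command therefore needs the old result removed first.

Below is a minimal sketch of that cleanup. `clean_output_dir` is a hypothetical helper name. It relies only on the standard `hadoop fs -test -d` and `hadoop fs -rm -r` shell commands.

```shell
# Hypothetical helper: delete a previous job's output directory so the job can be rerun.
# FileOutputFormat aborts a job whose output directory already exists.
clean_output_dir() {
  local out_dir="$1"
  # "hadoop fs -test -d" exits with status 0 only if the path is an existing directory.
  if hadoop fs -test -d "$out_dir"; then
    hadoop fs -rm -r "$out_dir"
  fi
}

# Usage before rerunning step (4):
# clean_output_dir /output/wordcount
```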

(3) Create a word file and upload it to HDFS

The word file contains:

[root@leaf ~]# cat myword.txt

leaf yyh

yyh xpleaf

katy ling

yeyonghao leaf

xpleaf katy
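One way to create the file without an editor is a heredoc. This sketch writes the five lines shown above and prints the file size. The 57 bytes should match the size `hadoop fs -ls` reports after the upload below: five lines, each ending in a newline.

```shell
# Write the sample input exactly as shown above (the quoted 'EOF' disables expansion).
cat > myword.txt <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF

# Byte count including each line's trailing newline: 9 + 11 + 10 + 15 + 12 = 57.
wc -c < myword.txt
```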

Upload the file to HDFS:

`[root@leaf ~]# hadoop fs -put myword.txt /data/wordcount`

Check the uploaded file and its contents in HDFS:

[root@leaf ~]# hadoop fs -ls /data/wordcount

-rw-r--r-- 1 root supergroup 57 2017-09-01 20:40 /data/wordcount/myword.txt

[root@leaf ~]# hadoop fs -cat /data/wordcount/myword.txt

leaf yyh

yyh xpleaf

katy ling

yeyonghao leaf

xpleaf katy

(4) Run the wordcount program

Run the following command:

[root@leaf ~]# hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar wordcount /data/wordcount /output/wordcount

...

17/09/01 20:48:14 INFO mapreduce.Job: Job job_local1719603087_0001 completed successfully

17/09/01 20:48:14 INFO mapreduce.Job: Counters: 38

File System Counters

FILE: Number of bytes read=585940

FILE: Number of bytes written=1099502

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=114

HDFS: Number of bytes written=48

HDFS: Number of read operations=15

HDFS: Number of large read operations=0

HDFS: Number of write operations=4

Map-Reduce Framework

Map input records=5

Map output records=10

Map output bytes=97

Map output materialized bytes=78

Input split bytes=112

Combine input records=10

Combine output records=6

Reduce input groups=6

Reduce shuffle bytes=78

Reduce input records=6

Reduce output records=6

Spilled Records=12

Shuffled Maps =1

Failed Shuffles=0

Merged Map outputs=1

GC time elapsed (ms)=92

CPU time spent (ms)=0

Physical memory (bytes) snapshot=0

Virtual memory (bytes) snapshot=0

Total committed heap usage (bytes)=241049600

Shuffle Errors

BAD_ID=0

CONNECTION=0

IO_ERROR=0

WRONG_LENGTH=0

WRONG_MAP=0

WRONG_REDUCE=0

File Input Format Counters

Bytes Read=57

File Output Format Counters

Bytes Written=48
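Several of the Map-Reduce Framework counters above can be checked against the input by hand:

- Each line of myword.txt is one map input record (5).
- Each word becomes one (word, 1) map output record (10).
- The wordcount example registers its summing reducer as a combiner, so those 10 records are collapsed to one record per distinct word before the shuffle (Combine output records=6). The reducer therefore receives only 6 records.

The sketch below recomputes those three numbers locally with coreutils. It illustrates the arithmetic only; it is not how Hadoop computes its counters.

```shell
# Recreate the sample input in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF

# One map input record per line of text.
map_input_records=$(wc -l < "$input" | tr -d ' ')
# The mapper emits one (word, 1) pair per whitespace-separated token.
map_output_records=$(wc -w < "$input" | tr -d ' ')
# The combiner sums pairs per key, leaving one record per distinct word.
combine_output_records=$(tr -s ' ' '\n' < "$input" | sort -u | wc -l | tr -d ' ')
rm -f "$input"

echo "Map input records=$map_input_records"
echo "Map output records=$map_output_records"
echo "Combine output records=$combine_output_records"
```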

(5) View the results

Print the job's output file:

[root@leaf ~]# hadoop fs -cat /output/wordcount/part-r-00000

katy 2

leaf 2

ling 1

xpleaf 2

yeyonghao 1

yyh 2
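The same result can be reproduced locally with a classic shell pipeline, which makes the job's phases easy to see:

- `tr` plays the mapper, emitting one word per line.
- `LC_ALL=C sort` stands in for the shuffle's sort by key, using byte order.
- `uniq -c` does the reduce.

This is an analogy under those assumptions, not what Hadoop runs internally. The sketch separates word and count with a tab, which is assumed here to match TextOutputFormat's default separator. The total of 48 bytes is consistent with "Bytes Written=48" in the counters above.

```shell
# Recreate the sample input in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
leaf yyh
yyh xpleaf
katy ling
yeyonghao leaf
xpleaf katy
EOF

# map: one word per line; shuffle: byte-order sort; reduce: count each run of equal words,
# then print "word<TAB>count" like the job's part-r-00000 file.
result=$(tr -s ' ' '\n' < "$input" | LC_ALL=C sort | uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }')
rm -f "$input"

printf '%s\n' "$result"
# Total size of the output, comparable to "Bytes Written=48" in the job counters.
printf '%s\n' "$result" | wc -c
```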

This article is reposted from xpleaf's 51CTO blog. Original link: http://blog.51cto.com/xpleaf/1962271. Please contact the original author before reprinting.
