Create the MapperTask

Create a Java class that extends Hadoop's Mapper class.

Type parameters of Mapper

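Parameter	Description
K1 (KEYIN)	Type of the input key; by default, the byte offset at which the line being read starts
V1 (VALUEIN)	Type of the input value; by default, the line that was read
K2 (KEYOUT)	Type of the key emitted after user processing
V2 (VALUEOUT)	Type of the value emitted after user processing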
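To make K1 and V1 concrete: with the default line-oriented text input, each map() call receives the byte offset at which a line starts as the key and the line itself, without its line terminator, as the value. The plain-Java sketch below only imitates that pairing, assuming UTF-8 text with \n line endings. LineOffsets and Record are hypothetical names, not Hadoop classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: mimics the (offset, line) pairs a line-oriented reader hands to map(). */
final class LineOffsets {

    /** One input record: offset plays the role of K1, line the role of V1. */
    static final class Record {
        final long offset;
        final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return offset + " -> \"" + line + "\"";
        }
    }

    /** Splits UTF-8 text on '\n' and records the byte offset where each line starts. */
    static List<Record> records(String text) {
        List<Record> result = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) {
                end = text.length(); // last line has no trailing newline
            }
            String line = text.substring(start, end);
            result.add(new Record(offset, line));
            // Advance by the line's UTF-8 byte length plus one byte for '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        for (Record r : records("hello java\nhello php\n")) {
            System.out.println(r);
        }
        // prints:
        // 0 -> "hello java"
        // 11 -> "hello php"
    }
}
```

Offsets count bytes, not characters, so each Chinese character in a line advances the next line's offset by three bytes.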

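Note: the data is transferred over the network, so it must be serializable. Hadoop therefore uses its own Writable types in place of the standard Java types: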

Java type	Writable type
Integer	IntWritable
Long	LongWritable
Double	DoubleWritable
Float	FloatWritable
String	Text
null	NullWritable
Boolean	BooleanWritable
…
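Hadoop's Writable types serialize themselves through two methods, write(DataOutput) and readFields(DataInput). The sketch below reproduces that contract with java.io only, so it runs without Hadoop. SimpleWritable, IntBox and WritableRoundTrip are hypothetical stand-ins; IntBox merely behaves like IntWritable by writing its value as a fixed 4-byte int.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical stand-in mirroring the two methods of Hadoop's Writable interface. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;

    void readFields(DataInput in) throws IOException;
}

/** Hypothetical IntWritable look-alike: a mutable int that serializes as 4 bytes. */
final class IntBox implements SimpleWritable {
    private int value;

    IntBox() { } // real Writables need a no-arg constructor because Hadoop creates them reflectively

    IntBox(int value) { this.value = value; }

    int get() { return value; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value); // big-endian, always 4 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        value = in.readInt(); // overwrites state in place, so one object can be reused
    }
}

/** Hypothetical driver showing a serialize/deserialize round trip through a byte array. */
final class WritableRoundTrip {
    static byte[] toBytes(SimpleWritable w) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        }
        return bytes.toByteArray();
    }

    static void fromBytes(byte[] data, SimpleWritable target) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            target.readFields(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = toBytes(new IntBox(42));
        IntBox copy = new IntBox();
        fromBytes(wire, copy);
        System.out.println(wire.length + " bytes, value " + copy.get()); // 4 bytes, value 42
    }
}
```

Because readFields() refills an existing object rather than returning a new one, the framework can reuse a single key object and a single value object across many records.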

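package com.bobo.mr.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

/**
 * The data is transferred over the network, so it must be serializable.
 *
 * KEYIN:    byte offset of each line read (long -> LongWritable)
 * VALUEIN:  the line that was read (String -> Text)
 * KEYOUT:   key emitted after user processing (String -> Text)
 * VALUEOUT: value emitted after user processing (Integer -> IntWritable)
 * @author 波波烤鸭
 *         dengpbs@163.com
 */
public class MyMapperTask extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * The map-phase business logic goes in map().
     * By default it is called once for every line read.
     * @param key   byte offset of the line
     * @param value the line that was read
     */
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // Split the line into words on spaces
        String[] words = line.split(" ");
        for (String word : words) {
            // Emit the word as the key and 1 as the value so the data can be grouped by word later
            context.write(new Text(word), new IntWritable(1));
        }
    }
}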
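What MyMapperTask emits can be previewed without a cluster. The plain-Java sketch below applies the same logic as map() above (split on a single space, emit (word, 1)) to one line, collecting the pairs in a list instead of calling context.write. MapPreview is a hypothetical name. One consequence of split(" ") worth knowing: two consecutive spaces yield an empty-string token, which the mapper would also count as a "word".

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical preview of MyMapperTask.map() using plain Java types. */
final class MapPreview {

    /** Same logic as map(): one (word, 1) pair per space-separated token. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            // Stands in for context.write(new Text(word), new IntWritable(1))
            out.add(new SimpleEntry<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("hello java hello")); // [hello=1, java=1, hello=1]
        System.out.println(map("hello  java"));      // [hello=1, =1, java=1] (double space -> empty token)
    }
}
```

If empty tokens matter for real data, trimming the line and splitting on the regular expression \\s+ is a common alternative.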

Create the ReduceTask

Create a Java class that extends Hadoop's Reducer class.

Parameter	Description
KEYIN	Corresponds to the map phase's KEYOUT
VALUEIN	Corresponds to the map phase's VALUEOUT
KEYOUT	Key type output by the reduce logic
VALUEOUT	Value type output by the reduce logic
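Between the two phases the framework sorts the map output by key and gathers all values for the same key. That is why the reducer's KEYIN/VALUEIN must match the mapper's KEYOUT/VALUEOUT, and why reduce() receives an Iterable of values. The sketch below is only an in-memory model of the data each reduce() call sees: real shuffling is distributed and partitioned across reducers. It uses a TreeMap for the sort; TreeMap orders Strings by UTF-16 code units, whereas Text keys are compared as raw UTF-8 bytes, and the two agree for ASCII words. ShufflePreview is a hypothetical name.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of sort-and-group between map and reduce. */
final class ShufflePreview {

    /** Groups (key, value) pairs by key, in sorted key order: the shape reduce() receives. */
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> mapOutput) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        mapOutput.add(new SimpleEntry<>("java", 1));
        mapOutput.add(new SimpleEntry<>("hello", 1));
        mapOutput.add(new SimpleEntry<>("java", 1));
        System.out.println(group(mapOutput)); // {hello=[1], java=[1, 1]}
    }
}
```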

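package com.bobo.mr.wc;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
 * KEYIN and VALUEIN correspond to the map phase's KEYOUT and VALUEOUT.
 *
 * KEYOUT:   key type output by the reduce logic (the word, Text)
 * VALUEOUT: value type output by the reduce logic (its count, IntWritable)
 * @author 波波烤鸭
 *         dengpbs@163.com
 */
public class MyReducerTask extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
     * @param key     a key output by the map phase
     * @param values  all values the map phase output for that key
     * @param context the task context, used to write the result
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        // Add up the occurrences of this word
        for (IntWritable value : values) {
            count += value.get();
        }
        context.write(key, new IntWritable(count));
    }
}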
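The reduce step for one key can likewise be checked in isolation. The sketch mirrors MyReducerTask.reduce() with plain Integer values instead of IntWritable. It sums the values rather than counting how many there are, which keeps the result correct if a combiner has already pre-summed some counts. ReducePreview is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical preview of MyReducerTask.reduce() using plain Java types. */
final class ReducePreview {

    /** Same logic as reduce(): add up every value seen for one key. */
    static int reduce(String key, Iterable<Integer> values) {
        int count = 0;
        for (int value : values) {
            count += value;
        }
        return count; // the real reducer calls context.write(key, new IntWritable(count))
    }

    public static void main(String[] args) {
        System.out.println(reduce("hello", Arrays.asList(1, 1))); // 2
        // With a combiner, partial sums may arrive instead of 1s; summing still gives the total
        System.out.println(reduce("hello", Arrays.asList(2, 3))); // 5
    }
}
```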

Create the driver (launcher) class

package com.bobo.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WcTest {

    public static void main(String[] args) throws Exception {
        // Create the configuration object (true = also load the default configuration files)
        Configuration conf = new Configuration(true);
        // Get a Job instance
        Job job = Job.getInstance(conf);
        // Locate the jar to ship to the cluster via a class it contains
        job.setJarByClass(WcTest.class);
        // Specify the classes that handle the map and reduce phases
        job.setMapperClass(MyMapperTask.class);
        job.setReducerClass(MyReducerTask.class);
        // Specify the map-phase output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Specify the final (reduce-phase) output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths are passed as command-line arguments
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job, wait for it to finish, and report the result through the exit code
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
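The driver itself only wires the pieces together. The flow it configures is: map each line to (word, 1) pairs, sort and group the pairs by key, reduce each group to a sum, and write one "key TAB value" line per key. A Hadoop-free dry run of that flow on an in-memory string can catch logic errors before the jar is built. The sketch below assumes a single reducer; LocalWordCount is a hypothetical helper, not Hadoop's local job runner, and it skips partitioning, spilling and combiners entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory dry run of the WordCount data flow (single reducer assumed). */
final class LocalWordCount {

    /** Runs map -> sort/group -> reduce and returns lines shaped like part-r-00000. */
    static List<String> run(String input) {
        // Map phase (one call per line) feeding a sorted grouping that stands in for the shuffle
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : input.split("\n")) {
            for (String word : line.split(" ")) {
                grouped.computeIfAbsent(word, k -> new ArrayList<>()).add(1);
            }
        }
        // Reduce phase: one call per key, summing its values
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            int count = 0;
            for (int value : entry.getValue()) {
                count += value;
            }
            // TextOutputFormat separates key and value with a tab by default
            output.add(entry.getKey() + "\t" + count);
        }
        return output;
    }

    public static void main(String[] args) {
        for (String line : run("hello java\nhello shell\n")) {
            System.out.println(line); // hello 2, java 1, shell 1 (tab-separated, one per line)
        }
    }
}
```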

Package and deploy

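Package the project as a jar with Maven (for example, mvn clean package, which writes the jar to the target directory).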

Upload and test

Create the input directory for the wordcount example in HDFS and upload the test files:

hadoop fs -mkdir -p /hdfs/wordcount/input
hadoop fs -put a.txt b.txt /hdfs/wordcount/input/

Run the job (the output directory must not exist yet; if it does, the job is rejected at submission):

hadoop jar hadoop-demo-0.0.1-SNAPSHOT.jar com.bobo.mr.wc.WcTest /hdfs/wordcount/input /hdfs/wordcount/output/

The job completes successfully:

[root@hadoop-node01 ~]# hadoop jar hadoop-demo-0.0.1-SNAPSHOT.jar com.bobo.mr.wc.WcTest /hdfs/wordcount/input /hdfs/wordcount/output/
19/04/03 16:56:43 INFO client.RMProxy: Connecting to ResourceManager at hadoop-node01/192.168.88.61:8032
19/04/03 16:56:46 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/03 16:56:48 INFO input.FileInputFormat: Total input paths to process : 2
19/04/03 16:56:49 INFO mapreduce.JobSubmitter: number of splits:2
19/04/03 16:56:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554281786018_0001
19/04/03 16:56:52 INFO impl.YarnClientImpl: Submitted application application_1554281786018_0001
19/04/03 16:56:53 INFO mapreduce.Job: The url to track the job: http://hadoop-node01:8088/proxy/application_1554281786018_0001/
19/04/03 16:56:53 INFO mapreduce.Job: Running job: job_1554281786018_0001
19/04/03 16:57:14 INFO mapreduce.Job: Job job_1554281786018_0001 running in uber mode : false
19/04/03 16:57:14 INFO mapreduce.Job:  map 0% reduce 0%
19/04/03 16:57:38 INFO mapreduce.Job:  map 100% reduce 0%
19/04/03 16:57:56 INFO mapreduce.Job:  map 100% reduce 100%
19/04/03 16:57:57 INFO mapreduce.Job: Job job_1554281786018_0001 completed successfully
19/04/03 16:57:57 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=181
        FILE: Number of bytes written=321388
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=325
        HDFS: Number of bytes written=87
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=46511
        Total time spent by all reduces in occupied slots (ms)=12763
        Total time spent by all map tasks (ms)=46511
        Total time spent by all reduce tasks (ms)=12763
        Total vcore-milliseconds taken by all map tasks=46511
        Total vcore-milliseconds taken by all reduce tasks=12763
        Total megabyte-milliseconds taken by all map tasks=47627264
        Total megabyte-milliseconds taken by all reduce tasks=13069312
    Map-Reduce Framework
        Map input records=14
        Map output records=14
        Map output bytes=147
        Map output materialized bytes=187
        Input split bytes=234
        Combine input records=0
        Combine output records=0
        Reduce input groups=10
        Reduce shuffle bytes=187
        Reduce input records=14
        Reduce output records=10
        Spilled Records=28
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=1049
        CPU time spent (ms)=5040
        Physical memory (bytes) snapshot=343056384
        Virtual memory (bytes) snapshot=6182891520
        Total committed heap usage (bytes)=251813888
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=91
    File Output Format Counters
        Bytes Written=87

View the result

[root@hadoop-node01 ~]# hadoop fs -cat /hdfs/wordcount/output/part-r-00000
ajax    1
bobo烤鸭  1
hello   2
java    2
mybatis 1
name    1
php 1
shell   2
spring  2
springmvc   1
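Each line of part-r-00000 is a reducer output key and value joined by a tab (TextOutputFormat's default separator), and the lines come out in sorted key order because the framework sorts keys before reduce. If another program needs to consume the result, parsing it back is simple. The sketch assumes exactly that tab-separated layout from a single reducer; WordCountOutput is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical reader for "word<TAB>count" lines as written by the WordCount job. */
final class WordCountOutput {

    /** Parses key/value lines into a map that keeps the file's (sorted) order. */
    static Map<String, Integer> parse(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue;
            }
            // The count never contains a tab, so the last tab is the separator even if a key does
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("not a key<TAB>value line: " + line);
            }
            counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1)));
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = parse(Arrays.asList("hello\t2", "java\t2", "php\t1"));
        System.out.println(counts); // {hello=2, java=2, php=1}
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        System.out.println(total); // 5
    }
}
```

Summing all counts gives the number of (word, 1) pairs the mappers emitted, which is what the Map output records counter reports: in the run above, 1+1+2+2+1+1+1+2+2+1 = 14.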

OK~
