Understanding OutputFormat in MapReduce

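I. What OutputFormat does

1. Validates the job's output specification, for example by checking that the configured output path does not already exist.

2. Writes the job's results to the output files, through the RecordWriter it provides.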
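The validation step can be pictured with a small plain-Java sketch. It is not Hadoop's code: the class name OutputSpecCheck is hypothetical, it works on the local file system through java.nio.file, and it throws java.nio.file.FileAlreadyExistsException rather than Hadoop's own exception type. It only mirrors the idea that a job is rejected up front when its output directory is missing from the configuration or already exists.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

final class OutputSpecCheck {

    // Fail fast, before any task runs, if the output location is unusable.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IllegalArgumentException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                    "Output directory " + outputDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // A freshly created temp directory stands in for a leftover output directory.
        Path existing = Files.createTempDirectory("job-output");
        try {
            checkOutputSpecs(existing);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        } finally {
            Files.delete(existing);
        }
    }
}
```

Rejecting the job at submission time avoids silently overwriting the results of an earlier run.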

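II. OutputFormat implementations

2.1 DBOutputFormat: sends the reduce output to a SQL table.

2.2 FileOutputFormat: writes the reduce output to files.

2.2.1 MapFileOutputFormat: writes MapFile output (a MapFile is a special SequenceFile that is sorted and indexed).

2.2.2 SequenceFileOutputFormat: writes SequenceFile output.

2.2.3 TextOutputFormat: writes plain text, and is the default implementation.

2.3 FilterOutputFormat: a convenience class for wrapping another OutputFormat (I have not used it).

2.4 NullOutputFormat: discards all output, as if writing it to /dev/null (I have not used it).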
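The last two entries are easier to picture with a small model. The following is a plain-Java sketch, not Hadoop's API: KvWriter, discarding and CountingWriter are hypothetical names made up for illustration. One writer throws every record away (the idea behind NullOutputFormat), and another wraps an existing writer and forwards records to it (the idea behind FilterOutputFormat).

```java
import java.util.ArrayList;
import java.util.List;

final class WriterWrapping {

    /** Hypothetical stand-in for a record writer; not Hadoop's RecordWriter class. */
    interface KvWriter<K, V> {
        void write(K key, V value);
    }

    /** Returns a writer that drops every record, in the spirit of NullOutputFormat. */
    static <K, V> KvWriter<K, V> discarding() {
        return (key, value) -> { };
    }

    /** Wraps another writer, counts records, and forwards them unchanged. */
    static final class CountingWriter<K, V> implements KvWriter<K, V> {
        private final KvWriter<K, V> delegate;
        private long records;

        CountingWriter(KvWriter<K, V> delegate) {
            this.delegate = delegate;
        }

        @Override
        public void write(K key, V value) {
            records++;
            delegate.write(key, value);
        }

        long records() {
            return records;
        }
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        CountingWriter<String, String> writer =
                new CountingWriter<>((k, v) -> sink.add(k + "\t" + v));
        writer.write("hadoop", "1");
        writer.write("hive", "2");
        System.out.println(writer.records() + " records forwarded: " + sink);
    }
}
```

The wrapper never needs to know what the inner writer does, which is what makes this kind of class convenient for adding behavior around an existing output format.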

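III. MultipleOutputs

In some scenarios we need to write MapReduce results to several files. The MultipleOutputs class handles this.

Steps for using MultipleOutputs:

3.1 Instantiate MultipleOutputs in the Mapper's setup method.

3.2 In the map method, call write on the MultipleOutputs object and pass the file name you want as an argument.

3.3 When the task finishes, close the MultipleOutputs object in the cleanup method.

3.4 The generated files are named after the name you passed in, followed by -m- (map output) or -r- (reduce output) and a five-digit partition number, for example name-m-00000.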
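The naming pattern in step 3.4 can be reproduced with a few lines of plain Java. This is only an illustrative sketch of the pattern, not Hadoop's implementation; the class OutputFileNames and its method uniqueFile are hypothetical, and any file extension (such as one added by a compression codec) is ignored.

```java
import java.util.Locale;

final class OutputFileNames {

    // Builds "<base>-m-<5 digits>" for map output and "<base>-r-<5 digits>" for reduce output.
    static String uniqueFile(String baseName, boolean mapTask, int partition) {
        return String.format(Locale.ROOT, "%s-%s-%05d",
                baseName, mapTask ? "m" : "r", partition);
    }

    public static void main(String[] args) {
        System.out.println(uniqueFile("primary", true, 0));    // primary-m-00000
        System.out.println(uniqueFile("extended", false, 12)); // extended-r-00012
    }
}
```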

import java.io.IOException;
import java.util.Random;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class OutputMultipleFile extends Configured implements Tool {

    public static class OutputMultipleMapper extends Mapper<LongWritable, Text, Text, Text> {

        private final Text key1 = new Text();
        private final Text value1 = new Text();
        private MultipleOutputs<Text, Text> mos;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
            // Step 3.1: create the MultipleOutputs instance once per task
            mos = new MultipleOutputs<Text, Text>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (value == null) {
                return;
            }
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                String token = tokenizer.nextToken();
                key1.set(token);
                value1.set("=>" + key1);
                // Step 3.2: the third argument selects the output file
                mos.write(key1, value1, generateFileName(key1));
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Step 3.3: close MultipleOutputs so every file is flushed
            mos.close();
            super.cleanup(context);
        }

        // Tokens shorter than 5 characters go to "primary", all others to "extended"
        private String generateFileName(Text key) {
            if (key == null) {
                return "default";
            }
            int len = key.toString().length();
            if (len < 5) {
                return "primary";
            }
            return "extended";
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Validate the argument count (ToolRunner has already consumed generic options)
        if (args.length < 2) {
            System.err.println("Usage: OutputMultipleFile <in> <out>");
            return 2;
        }
        Job job = Job.getInstance(getConf(), getClass().getSimpleName());
        // Set the jar that contains the job
        job.setJarByClass(OutputMultipleFile.class);
        // Set the input path
        FileInputFormat.addInputPath(job, new Path(args[0]));
        // Set the output path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Set the Mapper to run
        job.setMapperClass(OutputMultipleMapper.class);
        // Set the Mapper's output key and value types (the mapper emits Text keys)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Map-only job: no reducers
        job.setNumReduceTasks(0);
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int num = new Random().nextInt(1000);
        if (args == null || args.length == 0) {
            args = new String[] {
                "hdfs://hdfs-cluster/user/hadoop/input",
                "hdfs://hdfs-cluster/user/hadoop/output" + num
            };
        }
        int status = ToolRunner.run(new Configuration(), new OutputMultipleFile(), args);
        System.exit(status);
    }
}
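To see which tokens end up in which file without running a cluster, the routing rule from generateFileName can be exercised on its own. The sketch below is plain Java with no Hadoop dependency; the class RoutingDemo and its route method are hypothetical helpers that copy the mapper's length-based rule and group the tokens of one input line by target file name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

final class RoutingDemo {

    // Same rule as the mapper: short tokens go to "primary", the rest to "extended".
    static String generateFileName(String key) {
        if (key == null) {
            return "default";
        }
        return key.length() < 5 ? "primary" : "extended";
    }

    // Groups the whitespace-separated tokens of a line by the file they would be written to.
    static Map<String, List<String>> route(String line) {
        Map<String, List<String>> byFile = new TreeMap<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            byFile.computeIfAbsent(generateFileName(token), k -> new ArrayList<>()).add(token);
        }
        return byFile;
    }

    public static void main(String[] args) {
        // prints {extended=[hadoop, mapreduce], primary=[big, data]}
        System.out.println(route("big data hadoop mapreduce"));
    }
}
```

In the real map-only job, the "primary" group would land in files such as primary-m-00000 and the "extended" group in extended-m-00000.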
