Using a Custom Counter

Using a custom counter (recording a sensitive word)

package counter;

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCountApp {
    static final String INPUT_PATH = "hdfs://chaoren:9000/hello";
    static final String OUT_PATH = "hdfs://chaoren:9000/out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        Path outPath = new Path(OUT_PATH);
        // Delete the output directory if it already exists; otherwise the job fails.
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        // On Hadoop 2.x and later, Job.getInstance(conf, name) is the
        // non-deprecated equivalent of this constructor.
        Job job = new Job(conf, WordCountApp.class.getSimpleName());
        // Needed when the job is submitted to a cluster rather than run locally.
        job.setJarByClass(WordCountApp.class);

        // 1.1 Specify where the input file is located
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        // Specify how the input is parsed: each line becomes one key-value pair
        //job.setInputFormatClass(TextInputFormat.class);

        // 1.2 Specify the custom map class
        job.setMapperClass(MyMapper.class);
        // Types of the map output <k2,v2>. Can be omitted when they are the
        // same as the reduce output types <k3,v3>.
        //job.setMapOutputKeyClass(Text.class);
        //job.setMapOutputValueClass(LongWritable.class);

        // 1.3 Partitioning
        //job.setPartitionerClass(org.apache.hadoop.mapreduce.lib.partition.HashPartitioner.class);
        // Run a single reduce task
        //job.setNumReduceTasks(1);

        // 1.4 Sorting and grouping

        // 1.5 Combining

        // 2.2 Specify the custom reduce class
        job.setReducerClass(MyReducer.class);
        // Specify the reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        // 2.3 Specify where the output is written
        FileOutputFormat.setOutputPath(job, outPath);
        // Specify the output format class
        //job.setOutputFormatClass(TextOutputFormat.class);

        // Submit the job to the JobTracker and wait for it to finish
        job.waitForCompletion(true);
    }

    /**
     * KEYIN    (K1): byte offset of the line
     * VALUEIN  (V1): text of the line
     * KEYOUT   (K2): a word that appears in the line
     * VALUEOUT (V2): the count for that word, always 1
     */
    static class MyMapper extends
            Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context context)
                throws IOException, InterruptedException {
            // Custom counter: group "Sensitive Words", counter name "hello"
            Counter counter = context.getCounter("Sensitive Words", "hello");
            String line = v1.toString();
            if (line.contains("hello")) {
                // Incremented once per line that contains the sensitive word,
                // not once per occurrence within the line
                counter.increment(1L);
            }
            // Words in the input are separated by tabs
            String[] splited = line.split("\t");
            for (String word : splited) {
                context.write(new Text(word), new LongWritable(1));
            }
        }
    }

    /**
     * KEYIN    (K2): a word that appears in the input
     * VALUEIN  (V2): a count for that word
     * KEYOUT   (K3): each distinct word
     * VALUEOUT (V3): total number of times that word appears
     */
    static class MyReducer extends
            Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text k2, Iterable<LongWritable> v2s,
                Context ctx) throws IOException, InterruptedException {
            long times = 0L;
            for (LongWritable count : v2s) {
                times += count.get();
            }
            ctx.write(k2, new LongWritable(times));
        }
    }
}
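
The mapper increments the counter once for every line that contains hello, so a line with the word twice still adds only 1. If the intent is to count every occurrence, the increment could be computed by a small helper. The following is a minimal sketch in plain Java, with no Hadoop dependency; the class name SensitiveWordCount and the method countOccurrences are hypothetical and not part of the original program.

```java
final class SensitiveWordCount {
    /** Counts non-overlapping occurrences of word inside line. */
    static long countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0L; // an empty word would otherwise match at every position
        }
        long count = 0L;
        int from = line.indexOf(word);
        while (from >= 0) {
            count++;
            // Continue searching just past the match that was found
            from = line.indexOf(word, from + word.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String[] lines = { "hello\tyou", "hello\thello\tme", "bye" };
        for (String line : lines) {
            System.out.println(countOccurrences(line, "hello"));
        }
    }
}
```

Inside map, the contains check could then be replaced by counter.increment(countOccurrences(line, "hello")). Like contains, this matches substrings, so a token such as helloworld would also be counted.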

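After the program is run in Eclipse, the result can be seen in the console: the counter summary printed when the job finishes should include the custom Sensitive Words group with its hello counter.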
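
The value reported for a counter is a total over the whole job: each map task increments its own copy, and the framework adds the per-task values together. The sketch below is only a toy model of that summation in plain Java, not Hadoop's implementation; the class CounterTotals, the method merge, the "group::name" key format, and the sample values 2 and 3 are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class CounterTotals {
    /** Adds one task's counter values into the running job totals. */
    static void merge(Map<String, Long> totals, Map<String, Long> taskCounters) {
        for (Map.Entry<String, Long> entry : taskCounters.entrySet()) {
            // Sum values that share the same "group::name" key
            totals.merge(entry.getKey(), entry.getValue(), Long::sum);
        }
    }

    public static void main(String[] args) {
        // Hypothetical per-task values for the counter used in this article
        Map<String, Long> task1 = new HashMap<>();
        task1.put("Sensitive Words::hello", 2L);
        Map<String, Long> task2 = new HashMap<>();
        task2.put("Sensitive Words::hello", 3L);

        Map<String, Long> totals = new HashMap<>();
        merge(totals, task1);
        merge(totals, task2);
        System.out.println(totals.get("Sensitive Words::hello")); // prints 5
    }
}
```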

Reposted from: https://www.cnblogs.com/ahu-lichang/p/6656303.html
