Hadoop MapReduce: counting with Counters (Java) and with error output (Python)
The Java MR code below uses Counters so that its statistics end up in the job log:
package vitamin.user_static_table;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

public class GetUserStaticMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private LongWritable out = new LongWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        String[] tks = line.split("\t");
        if (tks.length < 5) {
            return;
        }
        // Count every valid record under the "NULL" counter group.
        Counter number1 = context.getCounter("NULL", "all");
        number1.increment(1L);
        context.write(new Text("uid_" + tks[1] + "_" + tks[0]), out);
        // Records where both the sex field and the age field are present.
        if (!(tks[2].equals("_") || tks[3].equals("_"))) {
            Counter number2 = context.getCounter("NULL", "c_" + tks[1]);
            number2.increment(1L);
            if (tks[2].equals("1")) {
                Counter number3 = context.getCounter("NULL", "sex_c_" + tks[1]);
                number3.increment(1L);
            }
            if (tks[3].equals("1")) {
                Counter number4 = context.getCounter("NULL", "age_c_" + tks[1]);
                number4.increment(1L);
            }
        }
        // Sum the tag count (5th field) over records in class 1.
        if (tks[1].equals("1") && !tks[4].equals("_") && !tks[4].equals("0")) {
            Counter number5 = context.getCounter("NULL", "tag_c1_sum");
            number5.increment(1L);
            Counter number6 = context.getCounter("NULL", "tag_c_1");
            // increment() can add more than 1 at a time.
            number6.increment(Integer.parseInt(tks[4]));
        }
    }
}
The counts are written straight to the job log and never pass through a reducer. The relevant part of the log is shown below (the lines under the NULL group are the custom counters):
	Total time spent by all maps in occupied slots (ms)=7766444
	Total time spent by all reduces in occupied slots (ms)=0
	Total time spent by all map tasks (ms)=3883222
	Total vcore-seconds taken by all map tasks=3883222
	Total megabyte-seconds taken by all map tasks=5964628992
	Map output records=21091153
	Failed Shuffles=0
	Merged Map outputs=0
	GC time elapsed (ms)=15595
	CPU time spent (ms)=1460320
	Physical memory (bytes) snapshot=168560390144
	Virtual memory (bytes) snapshot=951342964736
	Total committed heap usage (bytes)=468841398272
NULL
	age_c_0=108143
	age_c_1=7596379
	all=21091153
	c_0=1258386
	c_1=19832601
	sex_c_0=1138055
	sex_c_1=18447951
	tag_c1_sum=19175427
	tag_c_1=6952428947
File Input Format Counters
	Bytes Read=244379652
File Output Format Counters
	Bytes Written=205508096
Job2 done...
All Jobs Finished !
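Since custom counters only surface in the client-side job log, it can be handy to scrape them back out of a saved log afterwards. The helper below is a small sketch of mine (not part of the original job code): it groups `name=value` counter lines under the most recent header line that contains no `=`, which is how the log above is laid out.

```python
import re

def parse_counters(log_text):
    """Parse Hadoop counter lines ("name=value") from saved job-log text,
    grouping them under the most recent header line without an '='."""
    counters = {}
    group = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = re.match(r'^(.*\S)\s*=\s*(-?\d+)$', line)
        if m and group is not None:
            # A counter line inside the current group.
            counters[group][m.group(1)] = int(m.group(2))
        elif '=' not in line:
            # A group header, e.g. "NULL" or "File Input Format Counters".
            group = line
            counters.setdefault(group, {})
    return counters

log = """\
NULL
    age_c_0=108143
    all=21091153
File Input Format Counters
    Bytes Read=244379652
"""
print(parse_counters(log)["NULL"]["all"])   # -> 21091153
```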
Counting via error output in a Hadoop Streaming Python job
# Python 2: Streaming treats stderr lines of the form
# "reporter:counter:<group>,<counter>,<amount>" as counter updates.
import sys, hashlib, struct, os

if __name__ == "__main__":
    for line in sys.stdin:
        line = line.strip()
        if 'uid_0' in line:
            print >> sys.stderr, "reporter:counter:group,keep_0,1"
            #print 'keep_0'+'\t'+'1'
        elif 'uid_1' in line:
            #print 'keep_1'+'\t'+'1'
            print >> sys.stderr, "reporter:counter:group,keep_1,1"
The Hadoop job above needs no reducer. Its log output follows (keep_0 and keep_1 are the custom counters):
17/08/09 09:05:25 INFO mapreduce.Job:  map 100% reduce 100%
17/08/09 09:05:38 INFO mapreduce.Job: Job job_1493284708000_147282 completed successfully
17/08/09 09:05:38 INFO mapreduce.Job: Counters: 53
File System Counters
	FILE: Number of bytes read=768
	FILE: Number of bytes written=28933956
	FILE: Number of read operations=0
	FILE: Number of large read operations=0
	FILE: Number of write operations=0
	HDFS: Number of bytes read=374470086
	HDFS: Number of bytes written=0
	HDFS: Number of read operations=684
	HDFS: Number of large read operations=0
	HDFS: Number of write operations=256
Job Counters
	Killed map tasks=2
	Launched map tasks=102
	Launched reduce tasks=128
	Data-local map tasks=14
	Rack-local map tasks=88
	Total time spent by all maps in occupied slots (ms)=3509175
	Total time spent by all reduces in occupied slots (ms)=9734817
	Total time spent by all map tasks (ms)=701835
	Total time spent by all reduce tasks (ms)=3244939
	Total vcore-seconds taken by all map tasks=701835
	Total vcore-seconds taken by all reduce tasks=3244939
	Total megabyte-seconds taken by all map tasks=3593395200
	Total megabyte-seconds taken by all reduce tasks=9968452608
Map-Reduce Framework
	Map input records=13186141
	Map output records=0
	Map output bytes=0
	Map output materialized bytes=76800
	Input split bytes=14800
	Combine input records=0
	Combine output records=0
	Reduce input groups=0
	Reduce shuffle bytes=76800
	Reduce input records=0
	Shuffled Maps =12800
	Failed Shuffles=0
	CPU time spent (ms)=546840
	Physical memory (bytes) snapshot=137153257472
	Virtual memory (bytes) snapshot=656820338688
	Total committed heap usage (bytes)=335173124096
Shuffle Errors
	BAD_ID=0
	CONNECTION=0
	IO_ERROR=0
	WRONG_LENGTH=0
	WRONG_MAP=0
	WRONG_REDUCE=0
group
	keep_0=149982
	keep_1=13036159
File Input Format Counters
	Bytes Read=374455286
File Output Format Counters
	Bytes Written=0
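The streaming script above is Python 2. For reference, here is a Python 3 sketch of the same mapper logic (the function names and the fake-stdin demo are mine, not from the original job); any stderr line of the form `reporter:counter:<group>,<counter>,<amount>` is picked up by Streaming as a counter update.

```python
import io
import sys

def emit_counter(group, name, amount=1, err=sys.stderr):
    # Hadoop Streaming parses stderr lines of the form
    # "reporter:counter:<group>,<counter>,<amount>" as counter updates.
    err.write("reporter:counter:%s,%s,%d\n" % (group, name, amount))

def count_uids(stdin=sys.stdin, err=sys.stderr):
    # Same logic as the Python 2 mapper above.
    for line in stdin:
        line = line.strip()
        if 'uid_0' in line:
            emit_counter("group", "keep_0", 1, err)
        elif 'uid_1' in line:
            emit_counter("group", "keep_1", 1, err)

# Local smoke test with fake stdin/stderr (on the cluster you would
# simply call count_uids() with the defaults):
fake_in = io.StringIO("uid_0_x\tA\nuid_1_y\tB\n")
fake_err = io.StringIO()
count_uids(fake_in, fake_err)
print(fake_err.getvalue(), end="")
```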