I.    Concepts

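Hadoop MapReduce is a distributed computing framework for processing massive data sets. The framework takes care of hard problems such as distributed data storage, job scheduling, fault tolerance, and inter-machine communication, so that engineers with no experience in parallel processing or distributed computing can still easily write structurally simple parallel programs that process large-scale data on hundreds or thousands of machines.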

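Hadoop MapReduce is based on the idea of divide and conquer. It abstracts a computation into two phases, map and reduce, which can be understood simply as "distribute the computation, then merge the results". A MapReduce program first splits the input into independent sets of key/value pairs (key1/value1), which are processed in parallel by multiple map tasks. MapReduce then sorts the map output (a set of intermediate key/value pairs, key2/value2) by key2. The sort is ascending and, for byte-comparable keys such as Text, compares the keys' in-memory byte arrays memcmp-style. All value2 entries that belong to the same key2 are grouped together as the input of a reduce task, which computes the final result and outputs key3/value3. As an optimization, the key2/value2 pairs produced on the same compute node can be merged locally by a combine step. The basic flow is as follows: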
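As a side note on the sort step, here is a minimal plain-Java sketch of what a memcmp-style, ascending byte comparison does to string keys. It does not use Hadoop; the class and method names (`ByteOrderSortSketch`, `compareBytes`, `sortKeys`) are hypothetical and only illustrate the ordering, they are not Hadoop's own comparator.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ByteOrderSortSketch {
    // memcmp-style comparison: compare unsigned bytes from left to right;
    // if one array is a prefix of the other, the shorter one sorts first.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Returns a copy of the keys sorted ascending by their UTF-8 bytes.
    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(Comparator.comparing(
                (String k) -> k.getBytes(StandardCharsets.UTF_8),
                ByteOrderSortSketch::compareBytes));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical keys, chosen to show that "/a/10" sorts before "/a/9".
        System.out.println(sortKeys(Arrays.asList("/b", "/a/10", "/a/9", "/a")));
    }
}
```

Because the comparison is on bytes rather than on numeric value, "/a/10" comes before "/a/9": the byte for '1' is smaller than the byte for '9'.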

Comparison of the computation flow of a Hadoop program and a single-machine program:

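The input and output of a computation are usually stored in files, and those files live in HDFS (Hadoop Distributed File System). The system tries to schedule each task onto the node that already holds its data rather than moving the data to the compute node. This avoids shipping large amounts of data over the network and saves bandwidth.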

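Application developers normally only need to care about the gray parts of the figure. A single-machine program has to handle reading and writing data as well as processing it, whereas a Hadoop program only has to implement map and reduce. Reading and writing data, transferring data between map and reduce, fault handling, and so on are done automatically by Hadoop MapReduce and HDFS.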

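II.    Setting up the development environment

A Map/Reduce program depends on a Hadoop cluster. In addition, Eclipse needs the Hadoop plugin package installed.

Hadoop cluster setup: see "Hadoop 2.2.0 cluster setup".

1.   Install and configure Eclipse

Download a suitable Eclipse from the official site and copy the plugin jar that Hadoop development depends on into the plugins directory of the Eclipse installation. For the download location, see "jar packages required for Hadoop 2.2.0 development"; you can of course also build the plugin yourself.

Start Eclipse and choose Window -> Preferences. If a Hadoop Map/Reduce entry appears as shown below, the plugin was installed successfully.

2.   Configure DFS, which mainly manages the input and output data files.

Choose Window -> Open Perspective -> Other -> Map/Reduce to show the Map/Reduce view. Click the small elephant icon under Map/Reduce Locations to create a new Hadoop Location, and enter the following:

A DFS Location node then appears in the project view. It is used to manage the input and output data files.

The Hadoop installation directory also has to be configured: create a new Map/Reduce project, click "Configure Hadoop install directory", and enter the Hadoop installation path.

Right-click an empty directory under DFS Location and upload a text file, then refresh. If the file appears, the environment is configured correctly.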

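III.    Programming model

The principle of the MapReduce programming model is to take a set of input key/value pairs and produce a set of output key/value pairs.

Users of the MapReduce library express the computation with two functions: Map and Reduce.

The user-defined Map function takes one input key/value pair and produces a set of intermediate key/value pairs.

The MapReduce library groups together all intermediate values associated with the same intermediate key I and passes them to the Reduce function.

The user-defined Reduce function accepts an intermediate key I and a set of values for that key. It merges these values to form a possibly smaller set of values. Typically, each Reduce invocation produces just zero or one output value.

The intermediate values are usually supplied to the Reduce function through an iterator, which makes it possible to handle sets of values that are too large to fit in memory.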
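The model described above can be sketched in a few lines of plain Java, without Hadoop. Everything here is hypothetical and for illustration only: the `MapFn` and `ReduceFn` interfaces, the `run` driver, and the word-count example are not part of any Hadoop API, the whole data set is held in memory, and each reduce call returns exactly one value for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

class MapReduceModelSketch {
    /** User-defined map: one input record in, zero or more intermediate pairs out. */
    interface MapFn<I, K, V> {
        void map(I input, BiConsumer<K, V> emit);
    }

    /** User-defined reduce: one key plus an iterator over its values in, one result out. */
    interface ReduceFn<K, V, R> {
        R reduce(K key, Iterator<V> values);
    }

    static <I, K extends Comparable<K>, V, R> Map<K, R> run(
            List<I> inputs, MapFn<I, K, V> mapFn, ReduceFn<K, V, R> reduceFn) {
        // Group intermediate values by key; the TreeMap keeps keys sorted.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            mapFn.map(input, (k, v) -> groups.computeIfAbsent(k, x -> new ArrayList<>()).add(v));
        }
        // Call reduce once per key, handing over the values through an iterator.
        Map<K, R> output = new TreeMap<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {
            output.put(e.getKey(), reduceFn.reduce(e.getKey(), e.getValue().iterator()));
        }
        return output;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        return run(
                lines,
                (String line, BiConsumer<String, Integer> emit) -> {
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            emit.accept(word, 1);
                        }
                    }
                },
                (String key, Iterator<Integer> values) -> {
                    int sum = 0;
                    while (values.hasNext()) {
                        sum += values.next();
                    }
                    return sum;
                });
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("a b a", "b c")));
    }
}
```

The reduce side only ever sees an `Iterator`, which mirrors the point made above: a real framework can stream values from disk instead of materializing them as a list.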

IV.    A small example

1.      Data preparation

Take Tomcat logs as an example. The log format is as follows:

127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444
127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:13:42:42 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:13:42:47 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:47 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
127.0.0.1,-,-,[08/May/2014:13:42:48 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:42:48 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
127.0.0.1,-,-,[08/May/2014:13:43:21 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:21 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1 HTTP/1.1,200,597
127.0.0.1,-,-,[08/May/2014:13:43:25 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= HTTP/1.1,200,21
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:26 +0800],GET /jyglFront/graduate_initGraduateBatch HTTP/1.1,200,8766
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:27 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:28 +0800],GET /jyglFront/graduate_initGraduateQulifyCheck HTTP/1.1,200,26397
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:29 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:29 +0800],GET /jyglFront/graduate_initLeaveSchoolInfo HTTP/1.1,200,20125
127.0.0.1,-,-,[08/May/2014:13:43:30 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllStudyCenters HTTP/1.1,200,29089
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/exam/examParameterService/getAllGradeInfo HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:13:43:31 +0800],GET /jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch HTTP/1.1,200,597
0:0:0:0:0:0:0:1,-,-,[08/May/2014:13:43:31 +0800],GET /jyglFront/graduate_initGraduateInfo HTTP/1.1,200,28464
127.0.0.1,-,-,[08/May/2014:14:27:10 +0800],GET / HTTP/1.1,200,11444
127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:12 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
127.0.0.1,-,-,[08/May/2014:14:27:34 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
127.0.0.1,-,-,[08/May/2014:14:27:35 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:35 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:37 +0800],GET /jyglFront/exam_initsubstudentsubscribe HTTP/1.1,500,3900
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:41 +0800],GET /jyglFront/supervisor/intoInitAssignmentDetail HTTP/1.1,200,1808
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/isValidUserByType/1-admin-superadmin HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/right/getUserByLoginName/admin HTTP/1.1,200,198
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 HTTP/1.1,200,2525
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/tree.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/style.css HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/frame.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/js/jquery.js HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/menuList.jsp HTTP/1.1,200,47603
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/leftmenu_bg.gif HTTP/1.1,404,1105
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/allmenu.gif HTTP/1.1,404,1093
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/logo.png HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/style/images/header_bg.jpg HTTP/1.1,304,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/mainView/navigate/images/toggle_menu.gif HTTP/1.1,404,1105
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getArticleList/10-1 HTTP/1.1,200,20913
127.0.0.1,-,-,[08/May/2014:14:27:42 +0800],GET /jygl/jaxrs/article/getTotalArticleRecords HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:42 +0800],GET /jyglFront/baseInfo_articleList?flag=1 HTTP/1.1,200,8989
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:43 +0800],GET /jyglFront/mainView/studentView/style/images/nav_10.png HTTP/1.1,404,1117
127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 HTTP/1.1,200,374
127.0.0.1,-,-,[08/May/2014:14:27:44 +0800],GET /jygl/jaxrs/nationInfo/getTotalNations HTTP/1.1,200,22
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/baseInfo_nationInfoList HTTP/1.1,200,7471
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/menuStyle2.css HTTP/1.1,404,1060
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:44 +0800],GET /jyglFront/common/css/basic.css HTTP/1.1,200,1476
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:45 +0800],GET /jyglFront/common/css/_images/botton2.gif HTTP/1.1,404,1075
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= HTTP/1.1,200,12061
127.0.0.1,-,-,[08/May/2014:14:27:47 +0800],GET /jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject HTTP/1.1,200,6006
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:48 +0800],GET /jyglFront/teaching/openReplaceChooseCourse HTTP/1.1,200,26455
127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
127.0.0.1,-,-,[08/May/2014:14:27:49 +0800],GET /jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 HTTP/1.1,204,-
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:49 +0800],GET /jyglFront/teaching/openChooseCourse HTTP/1.1,200,1611
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo HTTP/1.1,200,473
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/educationLevelService/allEducationLevels HTTP/1.1,200,227
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/enroll/gradeInfoService/allGradeInfos HTTP/1.1,200,3785
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce HTTP/1.1,200,20
127.0.0.1,-,-,[08/May/2014:14:27:51 +0800],GET /jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= HTTP/1.1,200,4849
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/teaching/teachingPlanList HTTP/1.1,200,22794
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:27:52 +0800],GET /jyglFront/js/jquery.form.js HTTP/1.1,200,30330
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest HTTP/1.1,200,43
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd HTTP/1.1,200,16
127.0.0.1,-,-,[08/May/2014:14:28:02 +0800],GET /jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403 HTTP/1.1,200,653
0:0:0:0:0:0:0:1,-,-,[08/May/2014:14:28:02 +0800],GET /jyglFront/exam_initgroupsubscribestatistic HTTP/1.1,200,13551
127.0.0.1,-,-,[08/May/2014:14:28:19 +0800],POST /jygl/jaxrs/right/addUserLog HTTP/1.1,200,-
127.0.0.1,-,-,[08/May/2014:14:31:42 +0800],GET /jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1 HTTP/1.1,200,-

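2.      Problem to solve: count how many times each resource (URL) is accessed.

3.      Implementation

Idea: parse the Tomcat log. The map step extracts the URL from each log line and emits it as the key, with a value of 1 standing for one access. The reduce step merges the records that share the same key, and its output value is the total count.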
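The parsing step on its own can be tried without a cluster. The following plain-Java sketch isolates it; the class and method names (`LogLineSketch`, `extractUrl`) are hypothetical, and it assumes the comma-separated layout of the sample log above, where the fifth field holds the request, for example "GET /path HTTP/1.1".

```java
class LogLineSketch {
    /** Returns the request URL of one log line, or null if the line is malformed. */
    static String extractUrl(String line) {
        String[] fields = line.split(",");
        if (fields.length < 5) {
            return null; // too few fields
        }
        String request = fields[4]; // method, URL, and protocol separated by spaces
        int first = request.indexOf(' ');
        int last = request.lastIndexOf(' ');
        if (first < 0 || last <= first) {
            return null; // fewer than two spaces in the request field
        }
        return request.substring(first + 1, last);
    }

    public static void main(String[] args) {
        String line = "127.0.0.1,-,-,[08/May/2014:13:42:40 +0800],GET / HTTP/1.1,200,11444";
        System.out.println(extractUrl(line));
    }
}
```

One limitation worth keeping in mind: because the line is split on commas, a URL that itself contains a comma would be cut short. None of the sample lines has that shape.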

The code is as follows:

package org.ly.ccnu;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SecondTest extends Configured implements Tool {

    // Counts input lines that could not be parsed.
    enum Counter { LINESKIP }

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable one = new IntWritable(1);

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            try {
                // The fifth field is the request, e.g. "GET /path HTTP/1.1".
                String[] lineSplit = line.split(",");
                String requestUrl = lineSplit[4];
                // Keep only the URL between the first and the last space.
                requestUrl = requestUrl.substring(requestUrl.indexOf(' ') + 1, requestUrl.lastIndexOf(' '));
                context.write(new Text(requestUrl), one);
            } catch (IndexOutOfBoundsException e) {
                // Too few fields, or no spaces in the request field: skip the line.
                context.getCounter(Counter.LINESKIP).increment(1);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the values instead of counting them, so the result stays
            // correct even if the values are partial sums.
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "logAnalysis");
        job.setJarByClass(SecondTest.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        // Keep the same format as the output of Map and Reduce.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new SecondTest(), args);
        System.exit(res);
    }
}
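Section I mentions that map output on the same node can be merged locally by a combine step. The job above does not configure one. If one were added, the reducer would receive partial sums rather than a stream of 1s, so it matters whether the reduce logic sums its values or merely counts them. The following is a minimal plain-Java sketch of that difference, with no Hadoop dependency; the class name, method names, and the two "nodes" are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class CombinerSketch {
    /** Sums partial counts; usable both as a local combine step and as the final reduce. */
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** Counts how many values arrive; only correct if every value is exactly 1. */
    static int countValues(List<Integer> values) {
        return values.size();
    }

    public static void main(String[] args) {
        // Map output for one URL on two hypothetical nodes: three accesses in total.
        List<Integer> nodeA = Arrays.asList(1, 1);
        List<Integer> nodeB = Arrays.asList(1);

        // After a local combine, the reduce side sees one partial sum per node.
        List<Integer> combined = Arrays.asList(sum(nodeA), sum(nodeB));

        System.out.println(sum(combined));         // 3: the true number of accesses
        System.out.println(countValues(combined)); // 2: the number of partial sums
    }
}
```

Summing is therefore the safer choice for a reducer that might later be paired with a combiner.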

4.      Results

/ 2
/jygl/jaxrs/article/getArticleList/10-1 3
/jygl/jaxrs/article/getTotalArticleRecords  3
/jygl/jaxrs/enroll/educationLevelService/allEducationLevels 5
/jygl/jaxrs/enroll/gradeInfoService/allGradeInfos   2
/jygl/jaxrs/enroll/gradeInfoService/currentGradeInfo    1
/jygl/jaxrs/enroll/studyCenterService/allStudyCentersByUtilObject   1
/jygl/jaxrs/exam/examArrangeService/getExamBatchIdByLatest  2
/jygl/jaxrs/exam/examArrangeService/getExamBatchNameByEBId/4af2a0424323412e014327739b1702bd 2
/jygl/jaxrs/exam/examParameterService/getAllGradeInfo   3
/jygl/jaxrs/exam/examParameterService/getAllStudyCenters    3
/jygl/jaxrs/exam/examSubscribeService/getUtilObjectThirExamBatchsByEBNN/201403  2
/jygl/jaxrs/exam/examSubscribeService/groupSubscribe/201403/0/0/201309/1    1
/jygl/jaxrs/graduate/graduateBatchService/getAllGraduateBatch   1
/jygl/jaxrs/graduate/graduateBatchService/getGraduateBatchByConditions?graduateBatchName=&pageSize=10&pageNo=1   1
/jygl/jaxrs/graduate/graduateBatchService/getTotalGraduateBatchByCondition?graduateBatchName= 1
/jygl/jaxrs/nationInfo/getAllNationInPage?pageSize=10&pageNo=1 1
/jygl/jaxrs/nationInfo/getTotalNations 1
/jygl/jaxrs/right/addUserLog 1
/jygl/jaxrs/right/getUserByLoginName/admin 3
/jygl/jaxrs/right/isValidUserByType/1-admin-superadmin 3
/jygl/jaxrs/teaching/teachingPlanService/getSpeicalListByTwo?gradeID=&educationLevelID= 1
/jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a0423f41d66d013f5a1f766c00ce 1
/jygl/jaxrs/teaching/teachingPlanService/hasTeachingPlanInGrade?gradeId=4af2a042437c2c0801437ed1cdea0017 1
/jygl/jaxrs/teaching/teachingPlanService/teachingPlanListByEducationLevelAndGradeId?grade=4af2a042437c2c0801437ed1cdea0017&educationLevel= 1
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassBatchPlanVOList?newClassBatchName=&gradeName=&term=-1 2
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurClassPlanVO 2
/jygl/jaxrs/teachingManage/ClassBatchPlanService/getCurrentClassPlanVO 2
/jyglFront/baseInfo_articleList?flag=1 3
/jyglFront/baseInfo_nationInfoList 1
/jyglFront/common/css/_images/botton2.gif 1
/jyglFront/common/css/basic.css 1
/jyglFront/common/css/menuStyle2.css 1
/jyglFront/exam_initgroupsubscribestatistic 2
/jyglFront/exam_initsubstudentsubscribe 1
/jyglFront/graduate_initGraduateBatch 1
/jyglFront/graduate_initGraduateInfo 1
/jyglFront/graduate_initGraduateQulifyCheck 1
/jyglFront/graduate_initLeaveSchoolInfo 1
/jyglFront/js/jquery.form.js 1
/jyglFront/mainView/navigate/images/allmenu.gif 3
/jyglFront/mainView/navigate/images/leftmenu_bg.gif 3
/jyglFront/mainView/navigate/images/logo.png 3
/jyglFront/mainView/navigate/images/toggle_menu.gif 3
/jyglFront/mainView/navigate/js/frame.js 3
/jyglFront/mainView/navigate/js/jquery.js 3
/jyglFront/mainView/navigate/js/tree.js 3
/jyglFront/mainView/navigate/menuList.jsp 3
/jyglFront/mainView/navigate/style/images/header_bg.jpg 3
/jyglFront/mainView/navigate/style/style.css 3
/jyglFront/mainView/studentView/style/images/nav_10.png 3
/jyglFront/right_login2home?loginName=admin&password=superadmin&type=1 3
/jyglFront/supervisor/intoInitAssignmentDetail 1
/jyglFront/teaching/openChooseCourse 1
/jyglFront/teaching/openReplaceChooseCourse 1
/jyglFront/teaching/teachingPlanList 1
