MapReduce: Computing the Total Salary of Each Department

1. Requirements
2. Test Data
3. Implementation Steps
4. Packaging and Running on the Cluster

1. Requirements

Use a MapReduce program to compute the total salary of each department.

2. Test Data

Employee information table (emp.csv): download link
Field descriptions:

3. Implementation Steps

Create a Maven project in IDEA or Eclipse.

Add the Hadoop dependencies to pom.xml:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-common</artifactId>
    <version>2.7.3</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.7.3</version>
</dependency>

Add a log4j.properties file under the resources directory with the following content:

### Root logger ###
log4j.rootLogger = debug,console,fileAppender
### Console output ###
log4j.appender.console = org.apache.log4j.ConsoleAppender
log4j.appender.console.Target = System.out
log4j.appender.console.layout = org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern = %d{ABSOLUTE} %5p %c:%L - %m%n
### File output ###
log4j.appender.fileAppender = org.apache.log4j.FileAppender
log4j.appender.fileAppender.File = logs/logs.log
log4j.appender.fileAppender.Append = false
log4j.appender.fileAppender.Threshold = DEBUG
log4j.appender.fileAppender.layout = org.apache.log4j.PatternLayout
log4j.appender.fileAppender.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n

Write the Mapper class

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class EmpMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

    // Reused output key (department number) and output value (salary)
    IntWritable key2 = new IntWritable();
    IntWritable value2 = new IntWritable();

    @Override
    protected void map(LongWritable k1, Text v1, Context context) throws IOException, InterruptedException {
        // Record format: 7369,SMITH,CLERK,7902,1980/12/17,800,,20
        // 1. Split the line on commas to get the salary and the department number
        String data = v1.toString();
        // ["7369","SMITH","CLERK","7902","1980/12/17","800","","20"]
        String[] split = data.split(",");

        // 2. Emit <k2,v2> through the context, e.g. <20,800>
        String salary = split[5];
        String deptNo = split[7];
        int salaryInt = Integer.parseInt(salary);
        int deptNoInt = Integer.parseInt(deptNo);
        key2.set(deptNoInt);
        value2.set(salaryInt);
        context.write(key2, value2);
    }
}
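
The mapper above calls Integer.parseInt directly, so a header row or a blank salary field would fail the task with a NumberFormatException. The following is a minimal plain-Java sketch of a more defensive parse step, assuming the same eight-column layout shown in the mapper comment. EmpRecordParser and DeptSalary are hypothetical names that do not appear in the original project, and the header row in main is an assumed example.

```java
import java.util.Optional;

/** Hypothetical helper (not part of the original project). */
class EmpRecordParser {

    /** Holds the two fields the job needs from one record. */
    static final class DeptSalary {
        final int deptNo;
        final int salary;

        DeptSalary(int deptNo, int salary) {
            this.deptNo = deptNo;
            this.salary = salary;
        }
    }

    /** Parses one CSV line; returns empty instead of throwing on a malformed line. */
    static Optional<DeptSalary> parse(String line) {
        // A limit of -1 keeps trailing empty fields, so the column count stays stable
        String[] fields = line.split(",", -1);
        if (fields.length < 8) {
            return Optional.empty();
        }
        try {
            int salary = Integer.parseInt(fields[5].trim());
            int deptNo = Integer.parseInt(fields[7].trim());
            return Optional.of(new DeptSalary(deptNo, salary));
        } catch (NumberFormatException e) {
            // Non-numeric salary or department number: skip the record
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Sample record taken from the mapper comment
        parse("7369,SMITH,CLERK,7902,1980/12/17,800,,20")
                .ifPresent(r -> System.out.println(r.deptNo + "\t" + r.salary));
        // An assumed header row is rejected rather than crashing the parse
        System.out.println(parse("EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,COMM,DEPTNO").isPresent());
    }
}
```

Inside map(), a helper like this would let the mapper write to the context only when a value is present and silently skip everything else.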

Write the Reducer class

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class EmpReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    IntWritable key4;
    IntWritable value4 = new IntWritable();

    @Override
    protected void reduce(IntWritable key3, Iterable<IntWritable> value3, Context context) throws IOException, InterruptedException {
        // 1. Sum the salaries of one department, e.g. <20,[200,4000]>
        int sum = 0;
        for (IntWritable value : value3) {
            sum += value.get();
        }

        // 2. Emit <k4,v4> through the context: department number and total salary
        key4 = key3;
        value4.set(sum);
        context.write(key4, value4);
    }
}
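
To see how the mapper output reaches the reducer, the three phases can be imitated in plain Java without a Hadoop dependency. This is only an illustrative sketch of the data flow, not how the framework is implemented. ShuffleReduceSketch is a hypothetical class name, and the second and third input lines are made-up records; only the first comes from the mapper comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory imitation of map -> shuffle -> reduce. */
class ShuffleReduceSketch {

    /** Map phase: emit a {deptNo, salary} pair per line, mirroring EmpMapper. */
    static List<int[]> mapPhase(List<String> lines) {
        List<int[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.split(",");
            pairs.add(new int[] { Integer.parseInt(f[7]), Integer.parseInt(f[5]) });
        }
        return pairs;
    }

    /** Shuffle phase: group all values by key, with keys in sorted order. */
    static Map<Integer, List<Integer>> shuffle(List<int[]> pairs) {
        Map<Integer, List<Integer>> grouped = new TreeMap<>();
        for (int[] p : pairs) {
            grouped.computeIfAbsent(p[0], k -> new ArrayList<>()).add(p[1]);
        }
        return grouped;
    }

    /** Reduce phase: sum the values of each key, mirroring EmpReducer. */
    static Map<Integer, Integer> reducePhase(Map<Integer, List<Integer>> grouped) {
        Map<Integer, Integer> totals = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            totals.put(e.getKey(), sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "7369,SMITH,CLERK,7902,1980/12/17,800,,20",       // from the mapper comment
                "9001,ALICE,ANALYST,9000,2000/01/01,3000,,20",    // made-up record
                "9002,BOB,SALESMAN,9000,2000/01/02,1500,300,30"); // made-up record
        reducePhase(shuffle(mapPhase(lines)))
                .forEach((dept, total) -> System.out.println(dept + "\t" + total));
    }
}
```

With these three lines, department 20 receives the list [800, 3000] after the shuffle and department 30 receives [1500], so the totals are 3800 and 1500.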

Write the Driver class

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class EmpJob {

    public static void main(String[] args) throws Exception {
        // 1. Create the job and tell Hadoop which jar contains it
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(EmpJob.class);

        // 2. Mapper and the types of its output <k2,v2>
        job.setMapperClass(EmpMapper.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 3. Reducer and the types of the final output <k4,v4>
        job.setReducerClass(EmpReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);

        // 4. Input file and output directory (the output directory must not exist yet)
        FileInputFormat.setInputPaths(job, new Path("F:\\NIIT\\hadoopOnWindow\\input\\emp.csv"));
        FileOutputFormat.setOutputPath(job, new Path("F:\\NIIT\\hadoopOnWindow\\output\\emp001\\"));

        // 5. Submit the job and wait for it to finish
        boolean completion = job.waitForCompletion(true);
        System.out.println("Job succeeded: " + completion);
    }
}

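Run the code locally and check that the result is correct.

4. Packaging and Running on the Cluster

Upload emp.csv to the datas directory in HDFS.
Once the local run gives the correct result, change the input and output paths in the Driver class so that they are read from the command-line arguments: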
FileInputFormat.setInputPaths(job,new Path(args[0]));
FileOutputFormat.setOutputPath(job,new Path(args[1]));
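
Because the two paths now come from args, running the job without both arguments fails with an ArrayIndexOutOfBoundsException. A small check at the start of main gives a clearer message. The sketch below is plain Java and only illustrates the idea; EmpJobArgs and its usage text are hypothetical and not part of the original project.

```java
/** Hypothetical argument holder for the driver (not part of the original project). */
class EmpJobArgs {
    final String inputPath;
    final String outputPath;

    private EmpJobArgs(String inputPath, String outputPath) {
        this.inputPath = inputPath;
        this.outputPath = outputPath;
    }

    /** Validates that exactly two arguments were passed: input path, then output path. */
    static EmpJobArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException("Usage: EmpJob <input path> <output path>");
        }
        return new EmpJobArgs(args[0], args[1]);
    }

    public static void main(String[] args) {
        // Same paths as in the hadoop jar command used later in this post
        EmpJobArgs parsed = parse(new String[] { "/datas/emp.csv", "/output/emp/" });
        System.out.println(parsed.inputPath + " -> " + parsed.outputPath);
    }
}
```

In the driver, the values of inputPath and outputPath would then be passed to new Path(...) in place of args[0] and args[1].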

To build the program into a jar, configure the packaging plugin in pom.xml:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <!-- Use Maven's predefined descriptor -->
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <!-- Bind to the package lifecycle phase -->
                    <phase>package</phase>
                    <goals>
                        <!-- Run only once -->
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

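Package the project as shown in the figure below.

Submit the job to the cluster with the following command. The fully qualified class name assumes the classes are declared in the com.niit.mr package; the listings above omit the package line.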
```
hadoop jar packagedemo-1.0-SNAPSHOT.jar  com.niit.mr.EmpJob /datas/emp.csv /output/emp/
```
That completes all the steps. Give it a try, and good luck!
