java mapreduce程序_简单的java Hadoop MapReduce程序(计算平均成绩)从打包到提交及运行...
[TOC]
简单的java Hadoop MapReduce程序(计算平均成绩)从打包到提交及运行
程序源码
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class Score {
public static class Map extends
Mapper {
// 实现map函数
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
// 将输入的纯文本文件的数据转化成String
String line = value.toString();
// 将输入的数据首先按行进行分割
StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
// 分别对每一行进行处理
while (tokenizerArticle.hasMoreElements()) {
// 每行按空格划分
StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());
String strName = tokenizerLine.nextToken();// 学生姓名部分
String strScore = tokenizerLine.nextToken();// 成绩部分
Text name = new Text(strName);
int scoreInt = Integer.parseInt(strScore);
// 输出姓名和成绩
context.write(name, new IntWritable(scoreInt));
}
}
}
public static class Reduce extends
Reducer {
// 实现reduce函数
public void reduce(Text key, Iterable values,
Context context) throws IOException, InterruptedException {
int sum = 0;
int count = 0;
Iterator iterator = values.iterator();
while (iterator.hasNext()) {
sum += iterator.next().get();// 计算总分
count++;// 统计总的科目数
}
int average = (int) sum / count;// 计算平均成绩
context.write(key, new IntWritable(average));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
// "localhost:9000" 需要根据实际情况设置一下
conf.set("mapred.job.tracker", "localhost:9000");
// 一个hdfs文件系统中的 输入目录 及 输出目录
String[] ioArgs = new String[] { "input/score", "output" };
String[] otherArgs = new GenericOptionsParser(conf, ioArgs).getRemainingArgs();
if (otherArgs.length != 2) {
System.err.println("Usage: Score Average ");
System.exit(2);
}
Job job = new Job(conf, "Score Average");
job.setJarByClass(Score.class);
// 设置Map、Combine和Reduce处理类
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
// 设置输出类型
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 将输入的数据集分割成小数据块splites,提供一个RecordReder的实现
job.setInputFormatClass(TextInputFormat.class);
// 提供一个RecordWriter的实现,负责数据输出
job.setOutputFormatClass(TextOutputFormat.class);
// 设置输入和输出目录
FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
编译
命令
javac Score.java
依赖错误
如果出现如下错误:
mint@lenovo ~/Desktop/hadoop $ javac Score.java
Score.java:4: error: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
^
Score.java:5: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.Path;
^
Score.java:6: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IntWritable;
^
Score.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.LongWritable;
^
Score.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
尝试修改环境变量CLASSPATH
sudo vim /etc/profile
# 添加如下内容
export HADOOP_HOME=/usr/local/hadoop # 如果没设置的话, 路径是hadoop安装目录
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH # 如果没设置的话
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
source /etc/profile
然后重复上述编译命令.
打包
编译之后会生成三个class文件:
mint@lenovo ~/Desktop/hadoop $ ls | grep class
Score.class
Score$Map.class
Score$Reduce.class
使用tar程序打包class文件.
tar -cvf Score.jar ./Score*.class
会生成Score.jar文件.
提交运行
样例输入
mint@lenovo ~/Desktop/hadoop $ ls | grep txt
chinese.txt
english.txt
math.txt
mint@lenovo ~/Desktop/hadoop $ cat chinese.txt
Zhao 98
Qian 9
Sun 67
Li 23
mint@lenovo ~/Desktop/hadoop $ cat english.txt
Zhao 93
Qian 42
Sun 87
Li 54
mint@lenovo ~/Desktop/hadoop $ cat math.txt
Zhao 38
Qian 45
Sun 23
Li 43
上传到HDFS
hdfs dfs -put ./*/txt input/score
mint@lenovo ~/Desktop/hadoop $ hdfs dfs -ls input/score
Found 3 items
-rw-r--r-- 1 mint supergroup 28 2017-01-11 23:25 input/score/chinese.txt
-rw-r--r-- 1 mint supergroup 29 2017-01-11 23:25 input/score/english.txt
-rw-r--r-- 1 mint supergroup 29 2017-01-11 23:25 input/score/math.txt
运行
mint@lenovo ~/Desktop/hadoop $ hadoop jar Score.jar Score input/score output
17/01/11 23:26:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/01/11 23:26:27 INFO input.FileInputFormat: Total input paths to process : 3
17/01/11 23:26:27 INFO mapreduce.JobSubmitter: number of splits:3
17/01/11 23:26:27 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
17/01/11 23:26:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484147224423_0006
17/01/11 23:26:27 INFO impl.YarnClientImpl: Submitted application application_1484147224423_0006
17/01/11 23:26:27 INFO mapreduce.Job: The url to track the job: http://lenovo:8088/proxy/application_1484147224423_0006/
17/01/11 23:26:27 INFO mapreduce.Job: Running job: job_1484147224423_0006
17/01/11 23:26:33 INFO mapreduce.Job: Job job_1484147224423_0006 running in uber mode : false
17/01/11 23:26:33 INFO mapreduce.Job: map 0% reduce 0%
17/01/11 23:26:40 INFO mapreduce.Job: map 67% reduce 0%
17/01/11 23:26:41 INFO mapreduce.Job: map 100% reduce 0%
17/01/11 23:26:46 INFO mapreduce.Job: map 100% reduce 100%
17/01/11 23:26:46 INFO mapreduce.Job: Job job_1484147224423_0006 completed successfully
17/01/11 23:26:47 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=129
FILE: Number of bytes written=471147
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=443
HDFS: Number of bytes written=29
HDFS: Number of read operations=12
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=3
Launched reduce tasks=1
Data-local map tasks=3
Total time spent by all maps in occupied slots (ms)=15538
Total time spent by all reduces in occupied slots (ms)=2551
Total time spent by all map tasks (ms)=15538
Total time spent by all reduce tasks (ms)=2551
Total vcore-milliseconds taken by all map tasks=15538
Total vcore-milliseconds taken by all reduce tasks=2551
Total megabyte-milliseconds taken by all map tasks=15910912
Total megabyte-milliseconds taken by all reduce tasks=2612224
Map-Reduce Framework
Map input records=12
Map output records=12
Map output bytes=99
Map output materialized bytes=141
Input split bytes=357
Combine input records=12
Combine output records=12
Reduce input groups=4
Reduce shuffle bytes=141
Reduce input records=12
Reduce output records=4
Spilled Records=24
Shuffled Maps =3
Failed Shuffles=0
Merged Map outputs=3
GC time elapsed (ms)=462
CPU time spent (ms)=2940
Physical memory (bytes) snapshot=992215040
Virtual memory (bytes) snapshot=7659905024
Total committed heap usage (bytes)=732430336
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=86
File Output Format Counters
Bytes Written=29
输出
mint@lenovo ~/Desktop/hadoop $ hdfs dfs -ls output
Found 2 items
-rw-r--r-- 1 mint supergroup 0 2017-01-11 23:26 output/_SUCCESS
-rw-r--r-- 1 mint supergroup 29 2017-01-11 23:26 output/part-r-00000
mint@lenovo ~/Desktop/hadoop $ hdfs dfs -cat output/part-r-00000
Li 40
Qian 32
Sun 59
Zhao 76
java mapreduce程序_简单的java Hadoop MapReduce程序(计算平均成绩)从打包到提交及运行...相关推荐
- 简单java程序_简单的Java程序
简单java程序 Simple java programs are good for assessing the coding skills of a programmer. You will fin ...
- java 静态块初始化_简单了解java中静态初始化块的执行顺序
这篇文章主要介绍了简单了解java中静态初始化块的执行顺序,文中通过示例代码介绍的非常详细,对大家的学习或者工作具有一定的参考学习价值,需要的朋友可以参考下 在java中,其应该是先于所有的方法执行. ...
- java ear包_简单介绍Java 的JAR包、EAR包、WAR包区别
原标题:简单介绍Java 的JAR包.EAR包.WAR包区别 WAR包 WAR(Web Archive file)网络应用程序文件,是与平台无关的文件格式,它允许将许多文件组合成一个压缩文件.War专 ...
- java ssh客户端_简单的Java SSH客户端
java ssh客户端 可以使用jcabi-ssh在Java中通过几行代码通过SSH执行shell命令: String hello = new Shell.Plain(new SSH("ss ...
- java event事件_简单的Java Event-事件框架
自己写的一个简单的Java事件框架.目前具备以下功能: 1.通过继承Event类,用户可自定义事件. 2.通过EventService 的fireEvent(Event e) 发出一个事件. 3.通过 ...
- java for 死循环_简单的java死循环 java中的死循环问题
java中死循环是什么意思 循环一次不再循环是死循环java中死循环是什么意思 循环一次不再循环是死循环 还是不断循环才是死JAVA中死循环的意思是,不停地循环,不会终止,例如: for (int i ...
- udp java 检测连接_简单的JAVA UDP连接测试
UDP不像TCP那样专门提供了一个SERVER端API,所有的都用DatagramSocket,接受packet数据报.所以说UDP是无连接的,因为所有的链接都是在数据报里,让DatagramSock ...
- java学生通讯录_简单实现Java通讯录系统
本文实例为大家分享了Java通讯录系统的具体代码,供大家参考,具体内容如下 import java.util.Scanner; class Person { String name; String n ...
- java进行抽奖_简单实现java抽奖系统
本文为大家分享了java抽奖系统的具体代码,供大家参考,具体内容如下 用户信息类 /* * 用户信息类 * 1.账号 * 2.密码 * 3.卡号 * 4.是否登录 */ public class Us ...
最新文章
- mysql榨包是什么意思_模块与包 Mysql与Oracle区别
- 微软向开发者推出区块链概念验证框架
- Spring Data JPA 从入门到精通~@Param用法
- mysql用户添加_MySQL用户添加
- java基础英语---第一天
- 视觉SLAM笔记(3) 视觉SLAM框架
- flask request传参
- node sqlite 插入数据_方便且实用,Python内置的轻量级数据库实操
- 联合国超10万名员工记录遭泄露
- Unity简单操作:Unity资源商店 Asset store下载文件夹的位置
- 软件测试中的正交缺陷分析总结,正交缺陷分类(ODC)流程简介及应用经验分享(上)...
- excel合并多个工作表_EXCEL动态合并工作表,操作其实很简单
- vue手把手教你实现论坛bbs——(二)创建组件
- 获取不到Gist id?解决gist.github.com无法访问的办法
- 关于发现宇宙微波背景(CMB)辐射的一则趣闻
- 代谢组学数据处理软件——NormalizeMets
- Java 并发编程之美:并发编程高级篇之一-chat
- 怎样将MathType中的公式加入到iBooks Author
- SyntaxError: Non-UTF-8 code starting with ‘\xbd‘ in file C:\pycharm...Pycharm编译时出现以上提示
- UE4_UE5结合罗技手柄(F710)使用记录
热门文章
- Linux系统json文件打中文,如何在 Linux 终端上漂亮地打印 JSON 文件
- python print用法不换行_python3让print输出不换行的方法
- 深入体验php项目开发.pdf,《深入体验PHP项目开发》.(谭贞军).[PDF]
- c语言程序设计安徽区笔试部分,2021年安徽省二级C语言程序设计笔试样题-20210419093521.doc-原创力文档...
- api.php phpcms,phpcms程序api怎么写接口
- 在微型计算机系统中,打印机一般是通过( ,2013湖南省计算机等级考试试题 二级C试题最新考试试题库...
- linux中循环删除脚本,shell脚本:遍历删除
- 小皮面板有php环境吗,体验phpStudy小皮面板创建LAMP/LNMP系统和建站图文
- mysql聚集索引 myisam_一句话说清聚集索引和非聚集索引以及MySQL的InnoDB和MyISAM
- Android打开谷歌应用,谷歌确认 Android 12 新增剪贴板访问提醒,将在 Beta 2 上线