Submitting Jobs from Eclipse on Windows 7 to a Hadoop Cluster

Reference: http://zy19982004.iteye.com/blog/2031172

My earlier Eclipse-to-Hadoop-2.2.0 setup turned out to be running in local mode, which is why jobs never showed up in the web UI.

1.   The Problem

1.1 The conf setup in main()

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://192.168.178.181:9000");
conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\hdfs-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\mapred-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\core-site.xml"));
conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\yarn-site.xml"));

1.2    Error message

application_1386170530016_0001 failed 2 times due to AM Container for appattempt_1386170530016_0001_000002 exited with exitCode: 1 due to: Exception from container-launch:

org.apache.hadoop.util.Shell$ExitCodeException: /bin/bash: line 0: fg: no job control
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:464)
    at org.apache.hadoop.util.Shell.run(Shell.java:379)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

1.3    Cause

A web search shows this is a known bug in Hadoop 2.2.0 and 2.3.0 triggered by cross-platform job submission: a client on one operating system (here Windows) generates a container launch command using its own platform's conventions, which the cluster's OS cannot execute. See the bug report on the official Apache JIRA: https://issues.apache.org/jira/browse/MAPREDUCE-5655

2.   Solution

2.1 Download the patches

Download the patches from the official JIRA issue.

2.2    Apply the patches

1)        Connect to the master node with SecureCRT and change into the hadoop-src source directory.

2)        Run patch -p0 < MRApps.patch

When prompted for the file to patch, enter: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-common/src/main/java/org/apache/hadoop/mapreduce/v2/util/MRApps.java

3)        Run patch -p0 < YARNRunner.patch

When prompted, enter:

hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/main/java/org/apache/hadoop/mapred/YARNRunner.java

4)        Likewise apply HADOOP-10110.patch, whose effect is the following change.

In <source root>/hadoop-common-project/hadoop-auth/pom.xml, find:

<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <scope>test</scope>
</dependency>

and immediately after it add:

<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <scope>test</scope>
</dependency>
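As a generic, self-contained illustration of the `patch -p0` flow (the file names here are stand-ins, not the real Hadoop sources; naming the target file explicitly on the command line skips the interactive "File to patch:" prompt that the real patches trigger):

```shell
# Build a stand-in source tree and a unified diff, then apply it with -p0.
mkdir -p hadoop-src/util
printf 'return oldValue;\n' > hadoop-src/util/MRApps.java
printf 'return newValue;\n' > fixed.java
# diff exits nonzero when the files differ, hence the || true guard.
diff -u hadoop-src/util/MRApps.java fixed.java > MRApps.patch || true
# -p0 keeps the paths in the diff header untouched; passing the target
# file explicitly avoids the "File to patch:" prompt.
patch -p0 hadoop-src/util/MRApps.java < MRApps.patch
grep newValue hadoop-src/util/MRApps.java
```

With the real patches, the prompt appears because the header paths do not exist relative to your working directory, which is why the steps above have you type the full source paths by hand.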

2.3    Rebuild Hadoop 2.2.0

# mvn clean package -Pdist,native -DskipTests -Dtar -e -X

The -e and -X flags turn on verbose error output.

Network errors during the build are common; just rerun the command until it succeeds. A full build usually takes about an hour.
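Because the usual failure (a dropped dependency download) is transient, rerunning is enough; a small retry wrapper, a sketch rather than part of the original workflow, automates that:

```shell
# retry N CMD... — run CMD up to N times, stopping at the first success.
retry() {
  local attempts="$1"; shift
  local i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    echo "attempt $i failed; retrying..." >&2
    i=$((i + 1))
  done
  return 1
}

# Example (the mvn invocation is the one from the document):
# retry 3 mvn clean package -Pdist,native -DskipTests -Dtar
```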

Screenshot after a successful build: (image omitted)

2.4    Get the Hadoop distribution

The rebuilt Hadoop 2.2.0 tarball we want ends up at:

<source root>/hadoop-dist/target/hadoop-2.2.0.tar.gz

2.5    Get the bug-fixed jar files

1)        Extract hadoop-2.2.0.tar.gz and go into the share\hadoop\mapreduce directory. Find hadoop-mapreduce-client-jobclient-2.2.0.jar and hadoop-mapreduce-client-common-2.2.0.jar; these two jars contained the bug that prevented Eclipse from submitting jobs to the cluster.

2)        Use these two files to replace the same-named files under <hadoop dir>\share\hadoop\mapreduce on the cluster.

3)        Also replace them in the Hadoop installation directory on Windows 7, i.e. the directory set in Eclipse under Window -> Preferences -> Hadoop Map/Reduce -> Hadoop installation directory.
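The jar replacement can be scripted. This is a hedged sketch: the `replace_jars` helper and the example paths are assumptions, and copying to remote cluster nodes would use scp rather than cp.

```shell
# replace_jars SRC_DIR DST_DIR — overwrite the two buggy jars with the
# freshly built jars of the same name.
replace_jars() {
  local src="$1" dst="$2"
  local jar
  for jar in hadoop-mapreduce-client-jobclient-2.2.0.jar \
             hadoop-mapreduce-client-common-2.2.0.jar; do
    cp "$src/$jar" "$dst/$jar"
  done
}

# Example (illustrative local paths):
# replace_jars hadoop-dist/target/hadoop-2.2.0/share/hadoop/mapreduce \
#              /opt/hadoop-2.2.0/share/hadoop/mapreduce
```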

2.6    Modify the mapred-site.xml configuration

1)        On every node of the cluster, add the following to mapred-site.xml.

<property>
  <name>mapreduce.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/share/hadoop/common/*,
    $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
    $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
    $HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/*,
    $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
  </value>
</property>

2)        In the Windows copy of mapred-site.xml, add:

<property>
  <name>mapred.remote.os</name>
  <value>Linux</value>
  <description>Remote MapReduce framework's OS, can be either Linux or Windows</description>
</property>

3.   Summary

3.1 Configuration steps

To submit a job from Windows to a Linux Hadoop cluster:

1)        Set up the Hadoop Eclipse plugin.

2)        In the job configuration, set mapreduce.framework.name to yarn. The other settings must be correct as well.

3)        Run On Hadoop.

3.2 Other issues

If the error below appears, the job's jar has not been set.

Fix: conf.set("mapreduce.job.jar", "D:\\Qing_WordCount.jar");

Here Qing_WordCount.jar is the current project packaged as a jar; conf points the job at that jar's location.

Error message:

No job jar file set. User classes may not be found. See Job or Job#setJar(String).

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1720)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:721)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
Caused by: java.lang.ClassNotFoundException: Class WordCount$TokenizerMapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1626)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1718)
    ... 8 more

4.   Working Demonstration

4.1 Code

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static class TokenizerMapper extends
            Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        // value is already one full line of the input file
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.178.181:9000");
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\hdfs-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\mapred-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\core-site.xml"));
        conf.addResource(new Path("E:\\eclipse\\hadoop-2.2.0\\etc\\hadoop\\yarn-site.xml"));

        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
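Stripped of the Hadoop plumbing, the mapper, combiner, and reducer above simply tokenize each line and sum a 1 per word occurrence. A plain-JDK sketch of that core logic (hypothetical class name, no cluster needed):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountCore {
    // Equivalent of map + combine/reduce: tokenize every line, then
    // sum the counts per word.
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count(Arrays.asList("a b a", "b c"));
        System.out.println(c);  // counts: a=2, b=2, c=1
    }
}
```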

4.2    Eclipse Run Configuration

(Screenshot omitted.)

4.3    Results

1) Output in Eclipse:

2014-04-26 18:53:45,881 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(56)) - Connecting to ResourceManager at master/192.168.178.181:8032
2014-04-26 18:53:46,347 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(287)) - Total input paths to process : 1
2014-04-26 18:53:46,429 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(394)) - number of splits:1
2014-04-26 18:53:46,444 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - user.name is deprecated. Instead, use mapreduce.job.user.name
2014-04-26 18:53:46,444 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.jar is deprecated. Instead, use mapreduce.job.jar
2014-04-26 18:53:46,445 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-26 18:53:46,446 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
2014-04-26 18:53:46,446 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.job.name is deprecated. Instead, use mapreduce.job.name
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
2014-04-26 18:53:46,447 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2014-04-26 18:53:46,448 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
2014-04-26 18:53:46,449 INFO  [main] Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(840)) - mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
2014-04-26 18:53:46,571 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(477)) - Submitting tokens for job: job_1398500824037_0005
2014-04-26 18:53:46,802 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(174)) - Submitted application application_1398500824037_0005 to ResourceManager at master/192.168.178.181:8032
2014-04-26 18:53:46,862 INFO  [main] mapreduce.Job (Job.java:submit(1272)) - The url to track the job: http://master:8088/proxy/application_1398500824037_0005/
2014-04-26 18:53:46,863 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1317)) - Running job: job_1398500824037_0005
2014-04-26 18:53:55,827 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1338)) - Job job_1398500824037_0005 running in uber mode : false
2014-04-26 18:53:55,829 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 0% reduce 0%
2014-04-26 18:54:02,899 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 0%
2014-04-26 18:54:12,982 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1345)) -  map 100% reduce 100%
2014-04-26 18:54:12,992 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1356)) - Job job_1398500824037_0005 completed successfully
2014-04-26 18:54:13,097 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1363)) - Counters: 43

File System Counters
    FILE: Number of bytes read=6216483
    FILE: Number of bytes written=12593661
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=6144044
    HDFS: Number of bytes written=6090981
    HDFS: Number of read operations=6
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
Job Counters
    Launched map tasks=1
    Launched reduce tasks=1
    Data-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=5026
    Total time spent by all reduces in occupied slots (ms)=7932
Map-Reduce Framework
    Map input records=38038
    Map output records=19349
    Map output bytes=6189081
    Map output materialized bytes=6216483
    Input split bytes=121
    Combine input records=19349
    Combine output records=18984
    Reduce input groups=18984
    Reduce shuffle bytes=6216483
    Reduce input records=18984
    Reduce output records=18984
    Spilled Records=37968
    Shuffled Maps =1
    Failed Shuffles=0
    Merged Map outputs=1
    GC time elapsed (ms)=259
    CPU time spent (ms)=4020
    Physical memory (bytes) snapshot=311271424
    Virtual memory (bytes) snapshot=1683406848
    Total committed heap usage (bytes)=164630528
Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
File Input Format Counters
    Bytes Read=6143923
File Output Format Counters
    Bytes Written=6090981

2)        Output in the Web UI: (screenshot omitted)
