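Hive on Tez pitfall notes, part 2: Hive 0.14 on Tez
We ran into quite a few problems while testing Hive 0.14.0 on Tez:
1. Testing with CDH 5.2.0 + Hive 0.14.0 + Tez 0.5.0, the first problem was the following error: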
java.lang.NoSuchMethodError: org.apache.tez.dag.api.client.Progress.getFailedTaskAttemptCount()I
    at org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.printStatusInPlace(TezJobMonitor.java:613)
    at org.apache.hadoop.hive.ql.exec.tez.TezJobMonitor.monitorExecution(TezJobMonitor.java:311)
    at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:167)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1604)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1364)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1177)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1004)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:994)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:247)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:199)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:410)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:783)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:677)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
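The stack trace shows that the error is raised after the Tez job has been submitted. In org.apache.hadoop.hive.ql.exec.tez.TezTask,
once the job has been submitted through the submit method, a TezJobMonitor object is instantiated to track the progress of the Tez job: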
// submit will send the job to the cluster and start executing
client = submit(jobConf, dag, scratchDir, appJarLr, session,
additionalLr, inputOutputJars, inputOutputLocalResources);
// finally monitor will print progress until the job is done
TezJobMonitor monitor = new TezJobMonitor();
rc = monitor.monitorExecution(client, ctx.getHiveTxnManager(), conf, dag);
Inside TezJobMonitor.monitorExecution:
boolean isProfileEnabled = conf.getBoolVar(conf, HiveConf.ConfVars.TEZ_EXEC_SUMMARY); // hive.tez.exec.print.summary, default false
boolean inPlaceUpdates = conf.getBoolVar(conf, HiveConf.ConfVars.TEZ_EXEC_INPLACE_PROGRESS); // hive.tez.exec.inplace.progress, default true
boolean wideTerminal = false;
boolean isTerminal = inPlaceUpdates == true ? isUnixTerminal() : false;
// we need at least 80 chars wide terminal to display in-place updates properly
if (isTerminal) {
  if (getTerminalWidth() >= MIN_TERMINAL_WIDTH) {
    wideTerminal = true;
  }
}
boolean inPlaceEligible = false;
if (inPlaceUpdates && isTerminal && wideTerminal && !console.getIsSilent()) {
  inPlaceEligible = true;
}
// Enter a while loop that checks the job state and calls printStatusInPlace or printStatus (printStatus ultimately calls getReport)
......
case RUNNING:
  if (!running) {
    perfLogger.PerfLogEnd(CLASS_NAME, PerfLogger.TEZ_SUBMIT_TO_RUNNING);
    console.printInfo("Status: Running (" + dagClient.getExecutionContext() + ")\n");
    startTime = System.currentTimeMillis();
    running = true;
  }
  if (inPlaceEligible) {
    printStatusInPlace(progressMap, startTime, false, dagClient);
    // log the progress report to log file as well
    lastReport = logStatus(progressMap, lastReport, console);
  } else {
    lastReport = printStatus(progressMap, lastReport, console);
  }
  break;
For example, in the printStatusInPlace method:
SortedSet<String> keys = new TreeSet<String>(progressMap.keySet());
int idx = 0;
int maxKeys = keys.size();
for (String s : keys) {
  idx++;
  Progress progress = progressMap.get(s);
  final int complete = progress.getSucceededTaskCount();
  final int total = progress.getTotalTaskCount();
  final int running = progress.getRunningTaskCount();
  final int failed = progress.getFailedTaskAttemptCount(); // calls Progress.getFailedTaskAttemptCount to get the number of failed tasks
  final int pending = progress.getTotalTaskCount() - progress.getSucceededTaskCount() -
      progress.getRunningTaskCount();
  final int killed = progress.getKilledTaskCount();
  ......
}
In Tez 0.5.0, the org.apache.tez.dag.api.client.Progress class has no getFailedTaskAttemptCount method.
The method was only added in Tez 0.5.2, so running Hive 0.14.0 requires Tez 0.5.2 or later.
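A version mismatch like this can be caught before submitting a job by probing the Tez jar on the classpath with reflection. The following is only a minimal sketch, not part of Hive or Tez: the class name TezApiProbe and the helper hasNoArgMethod are made up for illustration, and it assumes the Tez client jars are on the classpath when it runs.

```java
public class TezApiProbe {
    /**
     * Returns true if the named class can be loaded and exposes a public
     * no-argument method with the given name; false otherwise.
     */
    static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            // Class missing, method missing, or class could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the stack trace above.
        String cls = "org.apache.tez.dag.api.client.Progress";
        String method = "getFailedTaskAttemptCount";
        System.out.println(cls + "#" + method + " available: "
                + hasNoArgMethod(cls, method));
    }
}
```

Run with the same classpath the Hive CLI uses, it should report false against Tez 0.5.0 and true against Tez 0.5.2, per the analysis above.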
2. After upgrading to Hive 0.14.0 + Tez 0.5.2, the following error appeared:
15/01/13 14:09:21 INFO client.TezClient: The url to track the Tez Session: http://xxxx:8042/proxy/application_1416818587155_0049/
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:457)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:672)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:616)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown
    at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:599)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:212)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:122)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:454)
    ... 7 more
The failure happens during session initialization; the exception is thrown from the TezSessionState.open method:
....
try {
  session.waitTillReady();
} catch(InterruptedException ie) {
  //ignore
}
Here session is an instance of TezClient. In the TezClient.waitTillReady method:
public synchronized void waitTillReady() throws IOException, TezException, InterruptedException {
  if (!isSession) {
    // nothing to wait for in non-session mode
    return;
  }
  verifySessionStateForSubmission();
  while (true) {
    TezAppMasterStatus status = getAppMasterStatus(); // here getAppMasterStatus returned TezAppMasterStatus.SHUTDOWN
    if (status.equals(TezAppMasterStatus.SHUTDOWN)) {
      throw new SessionNotRunning("TezSession has already shutdown");
    }
    if (status.equals(TezAppMasterStatus.READY)) {
      return;
    }
    Thread.sleep(SLEEP_FOR_READY);
  }
}
The TezClient was created in session mode and getAppMasterStatus returned TezAppMasterStatus.SHUTDOWN, so waitTillReady threw the exception. In other words, the Tez AppMaster did not start properly. The NodeManager log shows the following error:
2015-01-13 16:27:58,162 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1416818587155_0060_01_000001 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
The container that launches the AM failed, so the next step is the log of that container:
2015-01-13 17:34:59,731 FATAL [main] app.DAGAppMaster: Error starting DAGAppMaster
java.lang.VerifyError: class org.apache.hadoop.yarn.proto.YarnProtos$ApplicationIdProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
    at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:2389)
    at java.lang.Class.getConstructor0(Class.java:2699)
    at java.lang.Class.getConstructor(Class.java:1657)
    at org.apache.hadoop.yarn.factories.impl.pb.RecordFactoryPBImpl.newRecordInstance(RecordFactoryPBImpl.java:62)
    at org.apache.hadoop.yarn.util.Records.newRecord(Records.java:36)
    at org.apache.hadoop.yarn.api.records.ApplicationId.newInstance(ApplicationId.java:49)
    at org.apache.hadoop.yarn.util.ConverterUtils.toApplicationAttemptId(ConverterUtils.java:137)
    at org.apache.hadoop.yarn.util.ConverterUtils.toContainerId(ConverterUtils.java:177)
    at org.apache.tez.dag.app.DAGAppMaster.main(DAGAppMaster.java:1794)
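This looks like a protobuf compatibility problem.
CDH 5.2.0 ships protobuf-java-2.5.0.jar by default, Hive 0.14.0 also uses protobuf-java-2.5.0.jar, and Tez 0.5.2 is compiled against protobuf 2.5.0, so in theory there should be no protobuf incompatibility. I suspected that protobuf 2.4.0a was being loaded when the Tez AM started, so the next step was to look at the AM launch command and find the classpath it actually uses.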
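When code can be run on the same classpath as the failing process, the JVM itself can report which jar a class came from. The sketch below is an illustration only and is not what was done in this investigation; the class name WhereLoaded and the helper locationOf are hypothetical, and com.google.protobuf.GeneratedMessage is used as the probe class on the assumption that a protobuf 2.x jar is present.

```java
import java.security.CodeSource;

public class WhereLoaded {
    /** Returns the jar or directory a class was loaded from, as a string. */
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader have no code source.
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) {
        // Assumption: a protobuf 2.x jar is somewhere on the classpath.
        String name = args.length > 0 ? args[0] : "com.google.protobuf.GeneratedMessage";
        try {
            Class<?> cls = Class.forName(name);
            System.out.println(name + " loaded from " + locationOf(cls));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

If the printed location is not the protobuf-java-2.5.0 jar, some other jar on the classpath is supplying the protobuf classes.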
To see the shell scripts that launch the AM, I modified the org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor class to add a Thread.sleep, rebuilt the CDH 5.2.0 package (the build requires Java 7, range [1.7.0,1.7.1000}]; native compilation was skipped: mvn package -DskipTests -Pdist -Dtar -e -X),
and replaced ./share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.5.0-cdh5.2.0.jar for testing.
The scripts are invoked in this order:
default_container_executor.sh --> default_container_executor_session.sh --> launch_container.sh
In the launch_container.sh script:
export HADOOP_COMMON_HOME="/home/vipshop/platform/hadoop-2.5.0-cdh5.2.0" # first set the relevant variables
export CLASSPATH="$PWD:$PWD/*:$HADOOP_CONF_DIR:" # CLASSPATH is reset here
export HADOOP_TOKEN_FILE_LOCATION="/home/vipshop/hard_disk/7/yarn/local/usercache/hdfs/appcache/application_1416818587155_0075/container_1416818587155_0075_01_000001/container_tokens"
....
ln -sf "/home/vipshop/hard_disk/10/yarn/local/filecache/42/hadoop-yarn-api-2.5.0.jar" "hadoop-yarn-api-2.5.0.jar" # symlink the required jars into the local directory
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
.....
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx819m -server -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseParallelGC -Dlog4j.configuration=tez-container-log4j.properties -Dyarn.app.container.log.dir=/home/vipshop/hard_disk/9/yarn/logs/application_1416818587155_0075/container_1416818587155_0075_01_000001 -Dtez.root.logger=INFO,CLA -Dsun.nio.ch.bugLevel='' org.apache.tez.dag.app.DAGAppMaster --session 1>/home/vipshop/hard_disk/9/yarn/logs/application_1416818587155_0075/container_1416818587155_0075_01_000001/stdout 2>/home/vipshop/hard_disk/9/yarn/logs/application_1416818587155_0075/container_1416818587155_0075_01_000001/stderr "
# Finally runs java org.apache.tez.dag.app.DAGAppMaster, i.e. the main method of org.apache.tez.dag.app.DAGAppMaster, which starts the DAGAppMaster
CLASSPATH therefore points at the directory the script runs in (the container working directory) plus the configuration directory, for example:
CLASSPATH='/home/vipshop/hard_disk/11/yarn/local/usercache/hdfs/appcache/
application_1416818587155_0079/container_1416818587155_0079_01_000001:
/home/vipshop/hard_disk/11/yarn/local/usercache/hdfs/appcache/
application_1416818587155_0079/container_1416818587155_0079_01_000001/*:
/home/vipshop/conf:'
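Searching the script's working directory for jars that contain protobuf classes turned up a hive-solr jar that bundles protobuf, and the bundled protobuf version turned out to be 2.4.0a: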
for i in `find . -name "*.jar"`; do echo $i `jar -tvf $i|grep GeneratedMessage|wc -l`; done|awk '{if($2>0) print}'
./protobuf-java-2.5.0.jar 31 //2.5.0
./hive-exec-0.14.0-dfffe4217f40bd764977b741ad970a562e07fb9×××f0180620bd13f68a2577b.jar 31 //2.5.0
./hive-solr-0.0.1-SNAPSHOT-jar-with-dependencies.jar //2.4.0a
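As a result, the classloader picked up the protobuf 2.4.0a classes when the container started, which made the container launch fail. After rebuilding that jar against protobuf 2.5.0, Hive on Tez ran normally.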
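The jar search above can also be done in Java, which helps on hosts where the jar tool is not on the PATH. This is a rough sketch under stated assumptions rather than the method used here: the class ProtobufJarScan and its helpers are hypothetical names, it only looks at *.jar files directly inside the given directory (the find command above recurses), and it matches entries by the path prefix com/google/protobuf/GeneratedMessage instead of grepping for the substring.

```java
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class ProtobufJarScan {
    /** Counts the entries in one jar whose path starts with the given prefix. */
    static int countEntries(File jar, String prefix) throws IOException {
        int count = 0;
        try (JarFile jf = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                if (entries.nextElement().getName().startsWith(prefix)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Maps jar file name to match count for every jar in dir with at least one match. */
    static Map<String, Integer> scan(File dir, String prefix) throws IOException {
        Map<String, Integer> hits = new TreeMap<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // not a directory, or not readable
        }
        for (File jar : jars) {
            int n = countEntries(jar, prefix);
            if (n > 0) {
                hits.put(jar.getName(), n);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String prefix = "com/google/protobuf/GeneratedMessage";
        for (Map.Entry<String, Integer> e : scan(dir, prefix).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Any jar listed besides protobuf-java itself carries its own copy of protobuf. Because the container CLASSPATH uses the $PWD/* wildcard, whose expansion order is not specified by the JVM, such a bundled copy may be loaded ahead of the intended one.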
Reposted from: https://blog.51cto.com/caiguangguang/1604100