After successfully configuring Hive-on-Spark, I wanted to verify that a Hive UDF works correctly in the Hive-on-Spark environment:

set hive.execution.engine=spark;
add jar viewfs:///dirs/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies.jar;
create temporary function to_json AS 'brickhouse.udf.json.ToJsonUDF';
select to_json(app_metric) as tt from tbl_name where dt = '20180417' limit 10;

However, running the query in yarn-cluster mode fails with the following error:

org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: brickhouse.udf.json.ToJsonUDF
Serialization trace:
genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc)
colExprMap (org.apache.hadoop.hive.ql.exec.SelectOperator)
childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
invertedWorkGraph (org.apache.hadoop.hive.ql.plan.SparkWork)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:156)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:133)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClass(SerializationUtilities.java:181)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:118)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:176)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:214)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:176)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:134)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:40)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:214)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:176)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:161)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:214)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:790)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readClassAndObject(SerializationUtilities.java:176)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:153)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:39)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:708)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:214)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:125)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:551)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:686)
    at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:206)
    at org.apache.hadoop.hive.ql.exec.spark.KryoSerializer.deserialize(KryoSerializer.java:60)
    at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient$JobStatusJob.call(RemoteHiveSparkClient.java:329)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:358)
    at org.apache.hive.spark.client.RemoteDriver$JobWrapper.call(RemoteDriver.java:323)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: brickhouse.udf.json.ToJsonUDF
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:154)
    ... 47 more

Looking only at the stack trace, the natural guess is that the jar was never distributed to the RemoteDriver, which would explain why the class cannot be found.

NOTE: besides reading the stack trace, another debugging tactic is to inspect the INFO logs that precede it. Some failures are fail-silent, so the INFO output may already contain suspicious entries that point to the real problem.

Pulling a relevant portion of the INFO log:

18/04/23 18:51:39 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
18/04/23 18:51:39 INFO spark.SparkContext: Added JAR viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/hive-exec-2.3.2.jar at viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/hive-exec-2.3.2.jar with timestamp 1524480699144
18/04/23 18:51:39 INFO spark.SparkContext: Added JAR viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/kuaishou-analytics-auth-1.0.0.jar at viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/kuaishou-analytics-auth-1.0.0.jar with timestamp 1524480699162
18/04/23 18:51:39 INFO spark.SparkContext: Added JAR viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies-2.jar at viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies-2.jar with timestamp 1524480699165
18/04/23 18:51:39 INFO storage.BlockManagerMasterEndpoint: Registering block manager bjlt-h1180.sy:37446 with 7.0 GB RAM, BlockManagerId(53, bjlt-h1180.sy, 37446)
18/04/23 18:51:39 INFO client.RemoteDriver: Received job request 2d71a807-c512-4032-8c9e-71a378d3168b
18/04/23 18:51:39 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (10.48.74.35:59232) with ID 41
18/04/23 18:51:39 INFO spark.ExecutorAllocationManager: New executor 41 has registered (new total is 53)
18/04/23 18:51:39 INFO storage.BlockManagerMasterEndpoint: Registering block manager bjlt-h1864.sy:36387 with 7.0 GB RAM, BlockManagerId(41, bjlt-h1864.sy, 36387)
18/04/23 18:51:39 INFO client.SparkClientUtilities: Added jar[file:/media/disk3/yarn_data/usercache/dp/appcache/application_1523431310007_1301182/container_e95_1523431310007_1301182_01_000001/viewfs:/hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/hive-exec-2.3.2.jar] to classpath.
18/04/23 18:51:39 INFO client.SparkClientUtilities: Added jar[file:/media/disk3/yarn_data/usercache/dp/appcache/application_1523431310007_1301182/container_e95_1523431310007_1301182_01_000001/viewfs:/hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies-2.jar] to classpath.
18/04/23 18:51:39 INFO client.SparkClientUtilities: Added jar[file:/media/disk3/yarn_data/usercache/dp/appcache/application_1523431310007_1301182/container_e95_1523431310007_1301182_01_000001/viewfs:/hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/kuaishou-analytics-auth-1.0.0.jar] to classpath.
18/04/23 18:51:39 INFO client.RemoteDriver: Failed to run job 2d71a807-c512-4032-8c9e-71a378d3168b

The paths in SparkContext's "Added JAR" messages are correct, but the paths logged by SparkClientUtilities look like "file:/media/disk3/yarn_data/usercache/dp/appcache/application_1523431310007_1301182/container_e95_1523431310007_1301182_01_000001/viewfs:/hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/bc42b5c8-f183-4088-b238-2c3a75725d06/hive-exec-2.3.2.jar": the viewfs URI has been glued onto the container's local working directory, so path resolution is clearly broken. Reviewing the Hive source code turns up the following snippet:

private static URL urlFromPathString(String path, Long timeStamp,
    Configuration conf, File localTmpDir) {
  URL url = null;
  try {
    if (StringUtils.indexOf(path, "file:/") == 0) {
      url = new URL(path);
    } else if (StringUtils.indexOf(path, "hdfs:/") == 0) {
      // Remote jar: download it to the local tmp dir, then recurse so it is
      // added via a local file path.
      Path remoteFile = new Path(path);
      Path localFile =
          new Path(localTmpDir.getAbsolutePath() + File.separator + remoteFile.getName());
      Long currentTS = downloadedFiles.get(path);
      if (currentTS == null) {
        currentTS = -1L;
      }
      if (!new File(localFile.toString()).exists() || currentTS < timeStamp) {
        LOG.info("Copying " + remoteFile + " to " + localFile);
        FileSystem remoteFS = remoteFile.getFileSystem(conf);
        remoteFS.copyToLocalFile(remoteFile, localFile);
        downloadedFiles.put(path, timeStamp);
      }
      return urlFromPathString(localFile.toString(), timeStamp, conf, localTmpDir);
    } else {
      // Any other scheme (including viewfs://) falls through to here and is
      // treated as a plain local file path.
      url = new File(path).toURL();
    }
  } catch (Exception err) {
    LOG.error("Bad URL " + path + ", ignoring path", err);
  }
  return url;
}
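The malformed path in the INFO log now makes sense: a viewfs:// URI matches neither the "file:/" nor the "hdfs:/" prefix, so it falls into the final else branch, where new File(path).toURL() resolves the entire URI as a relative path under the container's working directory. A minimal, self-contained sketch of that behavior (the path below is hypothetical, shaped like the ones in the log):

import java.io.File;
import java.net.URL;

public class ViewfsUrlDemo {
  public static void main(String[] args) throws Exception {
    // A viewfs URI, like the jars registered via "add jar" above.
    String path = "viewfs://hadoop-lt-cluster/tmp/hive/dp/_spark_session_dir/some-session/hive-exec-2.3.2.jar";
    // Neither "file:/" nor "hdfs:/" matches, so Hive ends up doing this:
    @SuppressWarnings("deprecation")
    URL url = new File(path).toURL();
    // Prints file:/<working-dir>/viewfs:/hadoop-lt-cluster/..., the same
    // malformed shape as the SparkClientUtilities "Added jar" log lines.
    System.out.println(url);
  }
}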

So this version of Hive (2.3.2, judging by the hive-exec-2.3.2.jar in the log above) supports only the hdfs and file schemes and does not handle viewfs. Extending the condition to accept viewfs as well, i.e. else if (StringUtils.indexOf(path, "hdfs:/") == 0 || StringUtils.indexOf(path, "viewfs:/") == 0) {, solves the problem: viewfs jars now take the same download-to-local path as hdfs jars.
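To make the patched predicate explicit, here is a small runnable sketch (the class and helper names are mine for illustration; in Hive the change goes into urlFromPathString, and I am assuming commons-lang's StringUtils as in the snippet above):

import org.apache.commons.lang.StringUtils;

public class RemoteSchemeCheck {
  // Patched check: treat viewfs:// jars the same as hdfs:// jars, i.e.
  // download them to the local tmp dir before adding them to the classpath.
  static boolean needsDownload(String path) {
    return StringUtils.indexOf(path, "hdfs:/") == 0
        || StringUtils.indexOf(path, "viewfs:/") == 0;
  }

  public static void main(String[] args) {
    System.out.println(needsDownload("viewfs:///dirs/brickhouse-0.7.1-SNAPSHOT-jar-with-dependencies.jar")); // true after the patch
    System.out.println(needsDownload("hdfs:///tmp/some.jar"));  // true
    System.out.println(needsDownload("file:/tmp/some.jar"));    // false, loaded directly
  }
}

A more general fix would dispatch any non-file scheme through Hadoop's FileSystem machinery, but the two-scheme check is the minimal change that makes the query above run.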
