Kylo Problem Summary (Part 1)

Spark configuration

cp /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml

# Snappy isn't working well for Spark on Cloudera

echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf

Checks after installing Kylo-NiFi

The firewall is disabled:

systemctl stop firewalld.service

Ports 8400 and 8079 are open:

iptables -A INPUT -p tcp --dport 8400 -j ACCEPT
iptables -A INPUT -p tcp --dport 8079 -j ACCEPT
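
If firewalld is left running instead of being stopped, the equivalent firewall-cmd rules (assuming firewalld is in use):

firewall-cmd --permanent --add-port=8400/tcp
firewall-cmd --permanent --add-port=8079/tcp
firewall-cmd --reload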

MySQL has a kylo database and a kylo user with the correct grants.
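
A minimal sketch of the database and grants, assuming MySQL root access; 'changeme' is a placeholder password, and '%' should be narrowed to the Kylo host if possible:

mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS kylo;
-- '%' allows connections from any host; restrict as needed
GRANT ALL PRIVILEGES ON kylo.* TO 'kylo'@'%' IDENTIFIED BY 'changeme';
FLUSH PRIVILEGES;
SQL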

spark-shell starts successfully.

The relevant users have permission to operate on Hive (in clusters using Sentry).

Problems

Directories and permissions

If the directories below were not created automatically during installation, check for permission problems or create them manually and assign the appropriate ownership:

hdfs dfs -mkdir /user/kylo

hdfs dfs -chown kylo:kylo /user/kylo

hdfs dfs -mkdir /user/nifi

hdfs dfs -chown nifi:nifi /user/nifi

hdfs dfs -mkdir /etl

hdfs dfs -chown nifi:nifi /etl

hdfs dfs -mkdir /model.db

hdfs dfs -chown nifi:nifi /model.db

hdfs dfs -mkdir /archive

hdfs dfs -chown nifi:nifi /archive

hdfs dfs -mkdir -p /app/warehouse

hdfs dfs -chown nifi:nifi /app/warehouse

The local /tmp directory also needs the appropriate permissions; if the /tmp/kylo-nifi/ directory cannot be read and written, errors may occur.
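
A minimal sketch of a fix, assuming the NiFi service account is the one writing there (adjust owner and mode to your environment):

mkdir -p /tmp/kylo-nifi
chown -R nifi:nifi /tmp/kylo-nifi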

Elasticsearch indexes

EsIndexException in Kylo services logs

Problem

Kylo services log contains errors similar to this: org.modeshape.jcr.index.elasticsearch.EsIndexException: java.io.IOException: Not Found

Solution

Pre-create the indexes used by Kylo in Elasticsearch. Execute this script: /opt/kylo/bin/create-kylo-indexes-es.sh

The script takes 4 parameters.

Example values:

host: localhost

rest-port: 9200

num-shards: 1

num-replicas: 1

Note: num-shards and num-replicas can be set to 1 for development environments.
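
With the example values above, in the order listed, the call is:

/opt/kylo/bin/create-kylo-indexes-es.sh localhost 9200 1 1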

Spark default compression codec

Error message: UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I

cp /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml

# Snappy isn't working well for Spark on Cloudera

echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf

return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

In this case the job failed because the submitting user (dladmin) had no HDFS home directory; create it and assign ownership:

hdfs dfs -mkdir /user/dladmin

hdfs dfs -chown dladmin:dladmin /user/dladmin

Default paths are mostly HDP paths

In the NiFi templates, most flows default to HDP paths; change them to the correct paths for the current environment, or you will get Not Found errors.

In the data_transformation template, the script generated by Prepare Script is faulty

This issue is still unconfirmed: either the conditional in Prepare Script is wrong or the environment is misconfigured, causing the generated script to contain both insertInto() and partitionBy(). The current workaround is to comment out the relevant code:

if (!isPreFeed && (sparkVersion == null || sparkVersion == "1")) {
    // this line is commented out as the workaround:
    // script = script + ".partitionBy(\"processing_dttm\")"
}

Importing data with the templates

File Filter's data-type detection is somewhat inaccurate and can break downstream processing, so when creating a Feed, check the real data type of each field.

Source vs. target data types

In NiFi's processing, the data format defined in the Feed is the target format, which is produced from the source data by ETL; the source and target data formats must therefore match, or an error is raised.

[Screenshot unavailable: the original error image failed to upload]

This error is caused by spark-shell failing to start; inspecting the configuration showed that Kerberos authentication of the NiFi service's principal was failing.

This error is caused by the hive2 connection failing:

[root@kylo2 soft]# beeline

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0

Beeline version 1.1.0-cdh5.15.0 by Apache Hive

beeline>

beeline> !connect jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC

scan complete in 2ms

Connecting to jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC

18/09/30 17:10:54 [main]: ERROR transport.TSaslTransport: SASL negotiation failure

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)

Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

... 35 more

Unknown HS2 problem when communicating with Thrift server.

Error: Could not open client transport with JDBC Uri: jdbc:hive2://10.88.88.120:10000/default;principal=hive/kylo1.hypers.cc@KYLO.CC: GSS initiate failed (state=08S01,code=0)

Hive connection failure

Excerpt from kylo-services.log:

2018-10-12 13:56:54 ERROR http-nio-8420-exec-6:ConnectionPool:182 - Unable to create initial connections of pool.

java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://10.88.88.120:10000/default: java.net.ConnectException: Connection refused (Connection refused)

at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:215)

at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:163)

Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)

at org.apache.thrift.transport.TSocket.open(TSocket.java:185)

at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)

at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)

at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:190)

... 147 more

Caused by: java.net.ConnectException: Connection refused (Connection refused)

at java.net.PlainSocketImpl.socketConnect(Native Method)

at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)

at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)

at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)

at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)

at java.net.Socket.connect(Socket.java:589)

at org.apache.thrift.transport.TSocket.open(TSocket.java:180)

... 150 more

Solution

vim /opt/kylo/kylo-services/conf/application.properties

hive.datasource.driverClassName=org.apache.hive.jdbc.HiveDriver

hive.datasource.url=jdbc:hive2://10.88.88.120:10000/default

hive.datasource.username=hive

hive.datasource.password=hive

hive.datasource.validationQuery=show tables 'test'
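
Connection refused means nothing is listening on 10.88.88.120:10000, so first confirm HiveServer2 is running, then test connectivity with beeline (assuming this non-Kerberos URL matches your HiveServer2 setup):

beeline -u "jdbc:hive2://10.88.88.120:10000/default" -n hive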

Problem

Excerpt from kylo-services.log:

2018-10-12 14:07:17 ERROR http-nio-8420-exec-5:ThrowableMapper:43 - toResponse() caught throwable

org.springframework.jdbc.CannotGetJdbcConnectionException: Could not get JDBC Connection; nested exception is java.sql.SQLInvalidAuthorizationSpecException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.sql.SQLInvalidAuthorizationSpecException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)

at org.mariadb.jdbc.internal.util.ExceptionMapper.get(ExceptionMapper.java:135)

at org.mariadb.jdbc.internal.util.ExceptionMapper.getException(ExceptionMapper.java:101)

at org.mariadb.jdbc.internal.util.ExceptionMapper.throwException(ExceptionMapper.java:91)

at org.mariadb.jdbc.Driver.connect(Driver.java:109)

at org.apache.tomcat.jdbc.pool.PooledConnection.connectUsingDriver(PooledConnection.java:307)

at org.apache.tomcat.jdbc.pool.PooledConnection.connect(PooledConnection.java:200)

at org.apache.tomcat.jdbc.pool.ConnectionPool.createConnection(ConnectionPool.java:710)

at org.apache.tomcat.jdbc.pool.ConnectionPool.borrowConnection(ConnectionPool.java:644)

at org.apache.tomcat.jdbc.pool.ConnectionPool.init(ConnectionPool.java:466)

at org.apache.tomcat.jdbc.pool.ConnectionPool.<init>(ConnectionPool.java:143)

at org.apache.tomcat.jdbc.pool.DataSourceProxy.pCreatePool(DataSourceProxy.java:115)

at org.apache.tomcat.jdbc.pool.DataSourceProxy.createPool(DataSourceProxy.java:102)

at org.apache.tomcat.jdbc.pool.DataSourceProxy.getConnection(DataSourceProxy.java:126)

at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)

at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)

... 126 more

Caused by: org.mariadb.jdbc.internal.util.dao.QueryException: Could not connect: Access denied for user 'kylo'@'10.88.88.122' (using password: NO)

at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.authentication(AbstractConnectProtocol.java:557)

at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.handleConnectionPhases(AbstractConnectProtocol.java:499)

at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connect(AbstractConnectProtocol.java:384)

at org.mariadb.jdbc.internal.protocol.AbstractConnectProtocol.connectWithoutProxy(AbstractConnectProtocol.java:825)

at org.mariadb.jdbc.internal.util.Utils.retrieveProxy(Utils.java:469)

at org.mariadb.jdbc.Driver.connect(Driver.java:104)

... 137 more
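
The "(using password: NO)" in the error shows kylo-services is connecting to MySQL without a password. A minimal sketch of the fix, assuming Kylo's standard metadata datasource keys in /opt/kylo/kylo-services/conf/application.properties (verify the exact property names against your installed version):

spring.datasource.username=kylo
# placeholder value; use the kylo user's real MySQL password
spring.datasource.password=changeme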

Visual Query

Visual Query requires correct Hive and metastore datasource settings in /opt/kylo/kylo-services/conf/application.properties:

hive.datasource.driverClassName=org.apache.hive.jdbc.HiveDriver
hive.datasource.url=jdbc:hive2://10.16.4.68:10000/default;principal=hive/kylo-cdh5.cs1cloud.internal@CS1CLOUD.INTERNAL
hive.datasource.username=hive
hive.datasource.password=123456
hive.datasource.validationQuery=show tables 'test'

...

## Also, on Cloudera the metastore URL should end in /metastore instead of /hive
hive.metastore.datasource.driverClassName=org.mariadb.jdbc.Driver
hive.metastore.datasource.url=jdbc:mysql://10.16.4.68:3306/hive
#hive.metastore.datasource.url=jdbc:mysql://10.16.4.68:3306/hive
hive.metastore.datasource.username=metastore
hive.metastore.datasource.password=hive123
hive.metastore.datasource.validationQuery=SELECT 1
hive.metastore.datasource.testOnBorrow=true
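
A quick way to confirm the metastore database credentials above (the mysql client prompts for the password):

mysql -h 10.16.4.68 -u metastore -p -e 'SELECT 1' hive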

Hive's principal can reuse NiFi's principal.

Give the NiFi keytab permission 777 (see the sketch after these notes).

Set Hive user impersonation to false.
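
A sketch of the last two settings. The keytab path below is hypothetical; use your environment's actual path:

chmod 777 /opt/nifi/data/conf/nifi.keytab  # hypothetical path; adjust to your environment

Impersonation is HiveServer2's hive.server2.enable.doAs flag (on CDH it is normally set through Cloudera Manager); the equivalent hive-site.xml entry:

<property>
  <name>hive.server2.enable.doAs</name>
  <value>false</value>
</property>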

2018-11-23 15:30:46,289 ERROR [Timer-Driven Process Thread-2] c.t.nifi.v2.ingest.HdiMergeTable HdiMergeTable[id=0b301887-5de0-3dce-725d-b6e81225713d] Unable to execute merge doMerge for StandardFlowFileRecord[uuid=c822a690-7d3e-4a1d-8a28-d18dbc35e18d,claim=,offset=0,name=701401108440816,size=0] due to java.lang.RuntimeException: Failed to execute query; routing to failure: java.lang.RuntimeException: Failed to execute query

java.lang.RuntimeException: Failed to execute query

at com.thinkbiganalytics.ingest.TableMergeSyncSupport.doExecuteSQL(TableMergeSyncSupport.java:759)

at com.thinkbiganalytics.ingest.HdiTableMergeSyncSupport.doRolling(HdiTableMergeSyncSupport.java:97)

at com.thinkbiganalytics.nifi.v2.ingest.HdiMergeTable.onTrigger(HdiMergeTable.java:267)

at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)

at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1147)

at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:175)

at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)

Caused by: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

Confirm that Hive's hive.metastore connection settings are correct.

In the Hive logs, find the user that has not yet been added, then create it on HDFS with the required permissions.
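
For example, with someuser standing in for the account named in the Hive logs:

hdfs dfs -mkdir -p /user/someuser
hdfs dfs -chown someuser:someuser /user/someuser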

echo "spark.io.compression.codec=lz4" >> /etc/spark/conf/spark-defaults.conf

</PERFLOG method=releaseLocks start=1544077817490 end=1544077817541 duration=51 from=org.apache.hadoop.hive.ql.Driver>

2018-12-06 14:30:17,611 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-67]: Error running hive query:

org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)

at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)

at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)

at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1924)

at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:748)

2018-12-06 14:30:17,627 INFO org.apache.hive.service.cli.operation.OperationManager: [HiveServer2-Handler-Pool: Thread-54]: Closing operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=6dc999b0-deb5-43cc-9b54-a63945f81bc3]

Problem

Compression codec com.hadoop.compression.lzo.LzoCodec not found

2019-01-11 16:28:21,995 ERROR [Timer-Driven Process Thread-7] o.apache.nifi.processors.hadoop.PutHDFS PutHDFS[id=10f86baf-cc95-32df-4f6a-95237b99d4ed] Failed to write to HDFS due to java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.

java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.

at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)

at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)

at org.apache.nifi.processors.hadoop.AbstractHadoopProcessor.getCompressionCodec(AbstractHadoopProcessor.java:415)

Solution:

Processor: Archive Originals

Additional Classpath Resources: /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar

Processor: Upload to HDFS

Additional Classpath Resources: /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar
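
Confirm the jar actually exists at that path first; the location varies by distribution:

ls -l /usr/lib/hadoop-lzo/lib/hadoop-lzo.jar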

Transform Exception: An error occurred while initializing the Spark Shell.

Solution: after restarting Kylo, click Visual Query in the UI and wait patiently... do not run the query until the following appears in kylo-spark-shell.log:

...Successfully registered client.

...

INFO SparkShellApp: Started SparkShellApp in 156.081 seconds (JVM running for 156.983)
