Contents

  • Connecting Hive to HBase

    • The problem
    • The fix
    • Table definitions in Hive
    • One more thing: shutting down Hive
    • References

Connecting Hive to HBase

My versions are:

Hadoop 2.4.1
HBase 0.98.6.1
Hive 0.13.1

About HBase 0.98.6.1:
It seems I still don't have HBase installed entirely correctly. HBase 0.98.6.1 is built against Hadoop 2.2, while I'm running Hadoop 2.4.1.
All sorts of problems come up in use; for example, importing data into HBase with importtsv throws errors. My interim workaround was to replace the hadoop-* 2.2 jars bundled in HBase's lib directory with the corresponding jars from Hadoop 2.4.1. After that, the job ran without errors.
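For concreteness, the jar swap looked roughly like this (a sketch, not from the original post; the $HBASE_HOME/$HADOOP_HOME layout and exact jar names are assumptions for a typical tarball install):

# Swap HBase's bundled Hadoop 2.2 jars for the 2.4.1 ones (sketch; paths assumed).
cd $HBASE_HOME/lib
mkdir -p hadoop-2.2-backup
mv hadoop-*-2.2*.jar hadoop-2.2-backup/        # keep the originals, just in case
# Copy in the matching jars from the local Hadoop 2.4.1 install:
find $HADOOP_HOME/share/hadoop -name 'hadoop-*-2.4.1.jar' \
  -not -name '*test*' -exec cp {} . \;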

The problem

First, create a table in HBase:

$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.0, r1231986, Mon Jan 16 13:16:35 UTC 2012
hbase(main):001:0> create 'bar', 'cf'
0 row(s) in 0.1200 seconds
hbase(main):002:0>

Then map this HBase table into Hive using Hive's HBaseStorageHandler. The DDL statement is as follows:

hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');

This failed with the following error:

14/10/24 19:31:43 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect.  Use hive.hmshandler.retry.* instead
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.io.IOException: Attempt to start meta tracker failed.
    at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:201)
    at org.apache.hadoop.hbase.client.HBaseAdmin.getCatalogTracker(HBaseAdmin.java:230)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:277)
    at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:293)
    at org.apache.hadoop.hive.hbase.HBaseStorageHandler.preCreateTable(HBaseStorageHandler.java:162)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:554)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
    at com.sun.proxy.$Proxy9.createTable(Unknown Source)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:792)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/meta-region-server
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:425)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperNodeTracker.start(ZooKeeperNodeTracker.java:77)
    at org.apache.hadoop.hbase.catalog.CatalogTracker.start(CatalogTracker.java:197)
    ... 33 more

After a long search, I finally found the fix.


The fix

HBaseIntegration relies on the hive-hbase-handler-x.y.z.jar module:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration

The handler requires Hadoop 0.20 or higher, and has only been tested with dependency versions hadoop-0.20.x, hbase-0.92.0 and zookeeper-3.3.4. If you are not using hbase-0.92.0, you will need to rebuild the handler with the HBase jar matching your version, and change the --auxpath above accordingly. Failure to use matching versions will lead to misleading connection failures such as MasterNotRunningException since the HBase RPC protocol changes often.

Using HBaseStorageHandler means putting some extra jars on Hive's path, and the wiki above does this by passing them to the --auxpath option. But that approach is convoluted and easy to get wrong.
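For reference, the --auxpath invocation looks roughly like this (my own sketch, not from the wiki; the jar paths and versions are assumptions matching my install):

# Launch Hive with the handler jars on the aux path (sketch; paths assumed).
hive --auxpath $HIVE_HOME/lib/hive-hbase-handler-0.13.1.jar,$HIVE_HOME/lib/hbase-common-0.98.6.1-hadoop2.jar,$HIVE_HOME/lib/zookeeper-3.4.6.jar \
     -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3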

The HBaseBulkLoad page, however, also needs extra jars, and the way it wires them in is much simpler:
https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad

Add necessary JARs
You will need to add a couple jar files to your path. First, put them in DFS:

hadoop dfs -put /usr/lib/hive/lib/hbase-VERSION.jar /user/hive/hbase-VERSION.jar
hadoop dfs -put /usr/lib/hive/lib/hive-hbase-handler-VERSION.jar /user/hive/hive-hbase-handler-VERSION.jar

Then add them to your hive-site.xml:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/hbase-VERSION.jar,/user/hive/hive-hbase-handler-VERSION.jar</value>
</property>

Setting the jar paths directly in hive-site.xml is far more convenient.
After uploading the files to HDFS, I added the following configuration:

<property>
  <name>hive.aux.jars.path</name>
  <value>/user/hive/lib/hbase-common-0.98.6.1-hadoop2.jar,/user/hive/lib/hive-hbase-handler-0.13.1.jar,/user/hive/lib/zookeeper-3.4.6.jar</value>
  <description>The location of the plugin jars that contain implementations of user defined functions and serdes.</description>
</property>
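The upload itself was roughly as follows (a sketch; the local source directories are assumptions, so point them at wherever your HBase and Hive libs actually live):

# Put the three jars into /user/hive/lib on HDFS (sketch; local paths assumed).
hadoop fs -mkdir -p /user/hive/lib
hadoop fs -put $HBASE_HOME/lib/hbase-common-0.98.6.1-hadoop2.jar /user/hive/lib/
hadoop fs -put $HIVE_HOME/lib/hive-hbase-handler-0.13.1.jar /user/hive/lib/
hadoop fs -put $HBASE_HOME/lib/zookeeper-3.4.6.jar /user/hive/lib/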

With those changes in place, restart Hive:

# nohup hive --service metastore > $HIVE_HOME/log/hive_metastore.log &
# nohup hive --service hiveserver > $HIVE_HOME/log/hiveserver.log &
# ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3

The last step, ./hive -hiveconf hbase.zookeeper.quorum=slave1,slave2,slave3, must not be left out; it is the key to making this work.

As for what that last option does, here is an expert's explanation, verbatim:

You need to tell Hive where to find the zookeepers quorum which would elect the HBase master
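As an aside (my own addition, not from the original post), the quorum can equally be set once in hive-site.xml so it does not have to be passed on every launch:

<property>
  <name>hbase.zookeeper.quorum</name>
  <value>slave1,slave2,slave3</value>
</property>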

Now run the DDL again in the Hive shell:

hive> CREATE EXTERNAL TABLE foo(rowkey STRING, a STRING, b STRING)
    > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    > WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:c1,cf:c2')
    > TBLPROPERTIES ('hbase.table.name' = 'bar');

No error this time; the external table is created successfully!
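A quick sanity check I'd suggest (my own addition, not in the original post): put a couple of cells into the HBase table and read them back through Hive; given the column mapping above, the row should come back as (row1, hello, world).

hbase(main):001:0> put 'bar', 'row1', 'cf:c1', 'hello'
hbase(main):002:0> put 'bar', 'row1', 'cf:c2', 'world'

hive> SELECT * FROM foo;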


Table definitions in Hive

Relevant Hive concepts:

  • Managed table: a table whose definition is primarily managed in Hive's metastore, and for whose data storage Hive is responsible.
  • External table: a table whose definition is managed in some external catalog, and whose data Hive does not own (i.e. it will not be deleted when the table is dropped).
  • Native table: a table Hive knows how to manage and access without a storage handler.
  • Non-native table: a table that requires a storage handler (i.e. a STORED BY clause).

These two distinctions (managed vs. external and native vs. non-native) are orthogonal.
Hence, there are four possibilities for base tables (a DDL sketch for each follows the list):

  • managed native: what you get by default with CREATE TABLE
  • external native: what you get with CREATE EXTERNAL TABLE when no STORED BY clause is specified
  • managed non-native: what you get with CREATE TABLE when a STORED BY clause is specified; Hive stores the definition in its metastore, but does not create any files itself; instead, it calls the storage handler with a request to create a corresponding object structure
  • external non-native: what you get with CREATE EXTERNAL TABLE when a STORED BY clause is specified; Hive registers the definition in its metastore and calls the storage handler to check that it matches the primary definition in the other system
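Here is the promised sketch (my own illustration, not from the wiki; the table names, column mappings, and LOCATION path are made up):

-- managed native: the default
CREATE TABLE t1 (k STRING, v STRING);

-- external native: EXTERNAL with no STORED BY
CREATE EXTERNAL TABLE t2 (k STRING, v STRING)
LOCATION '/user/hive/t2';

-- managed non-native: STORED BY without EXTERNAL;
-- Hive asks the storage handler to create the underlying HBase table
CREATE TABLE t3 (rowkey STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v');

-- external non-native: EXTERNAL plus STORED BY;
-- the HBase table named in TBLPROPERTIES must already exist
CREATE EXTERNAL TABLE t4 (rowkey STRING, v STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:v')
TBLPROPERTIES ('hbase.table.name' = 'bar');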

One more thing: shutting down Hive

Hive does not seem to ship a shutdown script. For now my method is to find Hive's two pids (the metastore and hiveserver processes) and kill them directly... crude, but it works.

# netstat -lnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address               Foreign Address             State       PID/Program name
tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN      21415/java
tcp        0      0 0.0.0.0:50070               0.0.0.0:*                   LISTEN      12601/java
tcp        0      0 0.0.0.0:22                  0.0.0.0:*                   LISTEN      884/sshd
tcp        0      0 127.0.0.1:25                0.0.0.0:*                   LISTEN      960/master
tcp        0      0 0.0.0.0:9083                0.0.0.0:*                   LISTEN      21100/java
tcp        0      0 192.168.129.63:9000         0.0.0.0:*                   LISTEN      12601/java
tcp        0      0 192.168.129.63:9001         0.0.0.0:*                   LISTEN      12783/java
tcp        0      0 :::22                       :::*                        LISTEN      884/sshd
tcp        0      0 ::ffff:192.168.129.63:8088  :::*                        LISTEN      12939/java
tcp        0      0 ::1:25                      :::*                        LISTEN      960/master
tcp        0      0 ::ffff:192.168.129.63:8030  :::*                        LISTEN      12939/java
tcp        0      0 ::ffff:192.168.129.63:8031  :::*                        LISTEN      12939/java
tcp        0      0 ::ffff:192.168.129.63:60000 :::*                        LISTEN      20610/java
tcp        0      0 ::ffff:192.168.129.63:8032  :::*                        LISTEN      12939/java
tcp        0      0 ::ffff:192.168.129.63:8033  :::*                        LISTEN      12939/java
tcp        0      0 :::60010                    :::*                        LISTEN      20610/java
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     8318   1/init              @/com/ubuntu/upstart
unix  2      [ ACC ]     STREAM     LISTENING     10389  850/dbus-daemon     /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     10698  960/master          public/cleanup
unix  2      [ ACC ]     STREAM     LISTENING     10705  960/master          private/tlsmgr
unix  2      [ ACC ]     STREAM     LISTENING     10709  960/master          private/rewrite
unix  2      [ ACC ]     STREAM     LISTENING     10713  960/master          private/bounce
unix  2      [ ACC ]     STREAM     LISTENING     10717  960/master          private/defer
unix  2      [ ACC ]     STREAM     LISTENING     10721  960/master          private/trace
unix  2      [ ACC ]     STREAM     LISTENING     10725  960/master          private/verify
unix  2      [ ACC ]     STREAM     LISTENING     10729  960/master          public/flush
unix  2      [ ACC ]     STREAM     LISTENING     10733  960/master          private/proxymap
unix  2      [ ACC ]     STREAM     LISTENING     10737  960/master          private/proxywrite
unix  2      [ ACC ]     STREAM     LISTENING     10741  960/master          private/smtp
unix  2      [ ACC ]     STREAM     LISTENING     10745  960/master          private/relay
unix  2      [ ACC ]     STREAM     LISTENING     10749  960/master          public/showq
unix  2      [ ACC ]     STREAM     LISTENING     10753  960/master          private/error
unix  2      [ ACC ]     STREAM     LISTENING     10757  960/master          private/retry
unix  2      [ ACC ]     STREAM     LISTENING     10761  960/master          private/discard
unix  2      [ ACC ]     STREAM     LISTENING     10765  960/master          private/local
unix  2      [ ACC ]     STREAM     LISTENING     10769  960/master          private/virtual
unix  2      [ ACC ]     STREAM     LISTENING     10773  960/master          private/lmtp
unix  2      [ ACC ]     STREAM     LISTENING     10777  960/master          private/anvil
unix  2      [ ACC ]     STREAM     LISTENING     10781  960/master          private/scache
# kill -9 21100
# kill -9 21415
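If you'd rather not eyeball the netstat output, something like the following works too (my own sketch, assuming the default service ports: 9083 for the metastore and 10000 for hiveserver):

# Kill whatever is listening on the two Hive service ports (sketch; ports assumed).
netstat -lnp 2>/dev/null \
  | awk '/LISTEN/ && /:(9083|10000) /{split($NF, a, "/"); print a[1]}' \
  | xargs -r kill -9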

References

https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
https://cwiki.apache.org/confluence/display/Hive/StorageHandlers
http://stackoverflow.com/questions/23658600/error-while-creating-an-hive-table-on-top-of-an-hbase-table

Reposted from: https://www.cnblogs.com/DamianZhou/p/4049281.html
