hbase hive java_hive 与 hbase 结合

一、hive与hbase的结合

Hive会经常和Hbase结合使用，把Hbase作为Hive的存储路径，所以Hive整合Hbase尤其重要。使用Hive读取Hbase中的数据，可以使用HQL语句在HBase表上进行查询、插入操作；甚至是进行Join和Union等复杂查询。此功能是从Hive 0.6.0开始引入的。Hive与HBase整合的实现是利用两者本身对外的API接口互相进行通信，相互通信主要是依靠hive-hbase-handler-*.jar工具里面的类实现的。使用Hive操作HBase中的表，只是提供了便捷性，hiveQL引擎使用的是MapReduce，对于性能上，表现不尽人意。

步骤：

1、将hbase相关jar包复制到hive/lib下，操作如下：

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-protocol-1.4.5.jar /opt/hive/lib/

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-server-1.4.5.jar /opt/hive/lib/

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-client-1.4.5.jar /opt/hive/lib/

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5.jar /opt/hive/lib/

[hadoop@bus-stable hive]$ cp /opt/hbase/lib/hbase-common-1.4.5-tests.jar /opt/hive/lib/

[hadoop@bus-stable hive]$

2、在hive-site.xml文件中引用hbase，添加如下内容：

[hadoop@bus-stable hive]$ vim /opt/hive/conf/hive-site.xml

hive.aux.jars.path

file:///opt/hive/lib/hive-hbase-handler-2.3.3.jar,

file:///opt/hive/lib/hbase-protocol-1.4.5.jar,

file:///opt/hive/lib/hbase-server-1.4.5.jar,

file:///opt/hive/lib/hbase-client-1.4.5.jar,

file:///opt/hive/lib/hbase-common-1.4.5.jar,

file:///opt/hive/lib/hbase-common-1.4.5-tests.jar,

file:///opt/hive/lib/zookeeper-3.4.6.jar,

file:///opt/hive/lib/guava-14.0.1.jar

The location of the plugin jars that contain implementations of user defined functions and serdes.

hbase.zookeeper.quorum

open-stable,permission-stable,sp-stable

dfs.permissions.enabled

false

3、启动hive:

[hadoop@bus-stable hive]$ hive -hiveconf hbase.master=oversea-stable:60000

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.3-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in jar:file:/opt/apache-hive-2.3.3-bin/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true

Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

hive> create table htest(key int,value string) stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties('hbase.columns.mapping'=':key,f:value') tblproperties('hbase.table.name'='htest');

Time taken: 9.376 seconds

hive> show databases;

default

inspiry

Time taken: 0.121 seconds, Fetched: 2 row(s)

hive> show tables;

htest

Time taken: 0.047 seconds, Fetched: 1 row(s)

hive> select * from htest;

Time taken: 1.967 seconds

hive>

4、在hbase中验证数据：

[hadoop@oversea-stable opt]$ hbase shell

SLF4J: Class path contains multiple SLF4J bindings.

SLF4J: Found binding in [jar:file:/opt/hbase-1.4.5/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]

SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

HBase Shell

Use "help" to get list of supported commands.

Use "exit" to quit this interactive shell.

Version 1.4.5, rca99a9466415dc4cfc095df33efb45cb82fe5480, Wed Jun 13 15:13:00 EDT 2018

hbase(main):001:0> list

TABLE

htest

1 row(s) in 0.2970 seconds

=> ["htest"]

hbase(main):002:0> scan "htest"

ROW COLUMN+CELL

0 row(s) in 0.1410 seconds

hbase(main):003:0>

二、导入外部数据

(1) 数据文件如下：

[hadoop@bus-stable ~]$ cat score.csv

hive,85

hbase,90

hadoop,92

flume,89

kafka,95

spark,80

storm,70

[hadoop@bus-stable ~]$ hadoop fs -put score.csv /data/score.csv

[hadoop@bus-stable ~]$ hadoop fs -ls /data/

Found 2 items

-rw-r--r-- 3 hadoop supergroup 88822 2018-06-15 10:32 /data/notepad.txt

-rw-r--r-- 3 hadoop supergroup 70 2018-06-26 15:59 /data/score.csv

[hadoop@bus-stable ~]$

(2) 创建外部表

利用hdfs上的现有数据，创建hive外部表

hive> create external table if not exists course.testcourse(cname string,score int) row format delimited fields terminated by ',' stored as textfile location '/data';

Time taken: 0.282 seconds

hive> show databases;

course

default

inspiry

Time taken: 0.013 seconds, Fetched: 3 row(s)

hive> use course;

Time taken: 0.021 seconds

hive> show tables;

testcourse

Time taken: 0.036 seconds, Fetched: 1 row(s)

hive> select * from testcourse ;

hive 85

hbase 90

hadoop 92

flume 89

kafka 95

spark 80

storm 70

Time taken: 2.272 seconds, Fetched: 7 row(s)

hive>

三、利用HQL语句创建hbase 表

使用HQL语句创建一个指向HBase的Hive表，语法如下：

CREATE TABLE tbl_name(key int, value string) //Hive中的表名tbl_name

STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' //指定存储处理器

WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") //声明列族，列名

TBLPROPERTIES ("hbase.table.name" = "tbl_name", "hbase.mapred.output.outputtable" = "iteblog"); //hbase.table.name 声明HBase表名，为可选属性默认与Hive的表名相同， hbase.mapred.output.outputtable 指定插入数据时写入的表，如果以后需要往该表插入数据就需要指定该值

(1) 创建语句如下

hive> create table course.hbase_testcourse(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score")TBLPROPERTIES("hbase.table.name" = "hbase_testcourse","hbase.mapred.output.outputtable" = "hbase_testcourse");

Time taken: 3.745 seconds

hive> show databases;

course

default

inspiry

Time taken: 0.019 seconds, Fetched: 3 row(s)

hive> use course;

Time taken: 0.02 seconds

hive> show tables;

hbase_testcourse

testcourse

Time taken: 0.025 seconds, Fetched: 2 row(s)

hive> select * from hbase_testcourse;

Time taken: 1.883 seconds

hive>

(2) 创建完内部表，可以通过Hive支持的insert overwrite 方式将一个表的数据导入 HBase

hive> insert overwrite table course.hbase_testcourse select cname,score from course.testcourse;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Query ID = hadoop_20180626170540_c7eecb8d-2925-4ad2-be7f-237d9815d1cb

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1529932626564_0002, Tracking URL = http://oversea-stable:8088/proxy/application_1529932626564_0002/

Kill Command = /opt/hadoop/bin/hadoop job -kill job_1529932626564_0002

Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0

2018-06-26 17:06:02,793 Stage-3 map = 0%, reduce = 0%

2018-06-26 17:06:14,126 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.12 sec

MapReduce Total cumulative CPU time: 6 seconds 120 msec

Ended Job = job_1529932626564_0002

MapReduce Jobs Launched:

Stage-Stage-3: Map: 1 Cumulative CPU: 6.12 sec HDFS Read: 4224 HDFS Write: 0 SUCCESS

Total MapReduce CPU Time Spent: 6 seconds 120 msec

Time taken: 41.489 seconds

hive>

hive> select * from hbase_testcourse;

flume 89

hadoop 92

hbase 90

hive 85

kafka 95

spark 80

storm 70

Time taken: 0.201 seconds, Fetched: 7 row(s)

hive>

(3) 验证hbase

hbase(main):011:0> list

TABLE

hbase_testcourse

htest

2 row(s) in 0.0110 seconds

=> ["hbase_testcourse", "htest"]

hbase(main):012:0> scan "hbase_testcourse"

ROW COLUMN+CELL

flume column=cf:score, timestamp=1530003973026, value=89

hadoop column=cf:score, timestamp=1530003973026, value=92

hbase column=cf:score, timestamp=1530003973026, value=90

hive column=cf:score, timestamp=1530003973026, value=85

kafka column=cf:score, timestamp=1530003973026, value=95

spark column=cf:score, timestamp=1530003973026, value=80

storm column=cf:score, timestamp=1530003973026, value=70

7 row(s) in 0.0760 seconds

hbase(main):013:0>

四、使用Hive映射HBase中已经存在的表

(1) 在hbase中创建HBase表，进入HBase Shell客户端执行建表命令

hbase(main):036:0> create 'hbase_test',{ NAME => 'cf'}

0 row(s) in 2.2830 seconds

=> Hbase::Table - hbase_test

(2) 插入数据

hbase(main):037:0> put 'hbase_test','hadoop','cf:score', '95'

0 row(s) in 0.1110 seconds

hbase(main):038:0> put 'hbase_test','storm','cf:score', '96'

0 row(s) in 0.0120 seconds

hbase(main):039:0> put 'hbase_test','spark','cf:score', '97'

0 row(s) in 0.0110 seconds

(3) 查看数据

hbase(main):041:0> scan "hbase_test"

ROW COLUMN+CELL

hadoop column=cf:score, timestamp=1530004351399, value=95

spark column=cf:score, timestamp=1530004365368, value=97

storm column=cf:score, timestamp=1530004359169, value=96

3 row(s) in 0.0220 seconds

hbase(main):042:0>

(4) 进入Hive Shell 客户端，创建外部表course.hbase_test，建表命令如下所示

hive> create external table course.hbase_test(cname string,score int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,cf:score") TBLPROPERTIES("hbase.table.name" = "hbase_test", "hbase.mapred.output.outputtable" = "hbase_test");

Time taken: 0.221 seconds

hive> show tables;

hbase_test

hbase_testcourse

testcourse

Time taken: 0.024 seconds, Fetched: 3 row(s)

备注：创建外部表和创建内部表的命令基本一致，唯一的区别就是：创建内部表使用create table，创建外部表使用create external table。

Hive 查看数据

hive> select * from hbase_test;

hadoop 95

spark 97

storm 96

Time taken: 0.22 seconds, Fetched: 3 row(s)

hive>

该Hive表一个外部表，所以删除该表并不会删除HBase表中的数据，有几点需要注意的是：

a)、建表或映射表的时候如果没有指定:key则第一个列默认就是行键

b)、HBase对应的Hive表中没有时间戳概念，默认返回的就是最新版本的值

c)、由于HBase中没有数据类型信息，所以在存储数据的时候都转化为String类型

五、使用java连接hive操作hbase

pom.xml

xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

4.0.0

cn.itcast.hbase

hbase

0.0.1-SNAPSHOT

org.apache.hadoop

hadoop-client

2.6.4

org.apache.hadoop

hadoop-common

2.6.4

junit

4.12

org.apache.hbase

hbase-client

1.4.0

org.apache.hbase

hbase-server

1.4.0

org.apache.hive

hive-jdbc

1.2.1

org.apache.hive

hive-metastore

1.2.1

org.apache.hive

hive-exec

1.2.1

Hive_Hbase.java

package cn.itcast.bigdata.hbase;

import java.sql.Connection;

import java.sql.DriverManager;

import java.sql.ResultSet;

import java.sql.SQLException;

import java.sql.Statement;

public class Hive_Hbase {

public static void main(String[] args) {

try {

Class.forName("org.apache.hive.jdbc.HiveDriver");

Connection connection = DriverManager.getConnection("jdbc:hive2://hadoop1:10000/shizhan02","hadoop","");

Statement statement = connection.createStatement();

String sql = "SELECT * FROM hive_hbase_table_kv";

ResultSet res = statement.executeQuery(sql);

while (res.next()) {

System.out.println(res.getString(2));

}

} catch (ClassNotFoundException | SQLException e) {

// TODO Auto-generated catch block

e.printStackTrace();

}

hbase hive java_hive 与 hbase 结合相关推荐

Hadoop核心架构HDFS+MapReduce+Hbase+Hive内部机理详解
编者按:HDFS和MapReduce是Hadoop的两大核心,除此之外Hbase.Hive这两个核心工具也随着Hadoop发展变得越来越重要.本文作者张震的博文<Thinking in BigD ...
创建Hbase Hive外部表报错： Unable to determine ZooKeeper ensemble
创建HBase的Hive外部表 1: create external table ttt(rowkey string,info map<string,string>)STORED BY ' ...
Hadoop hive sqoop zookeeper hbase生产环境日志统计应用案例（hadoop篇）
Hadoop集群安装部署与配置(2015-01-15) 1.集群环境说明主机列表说明: 下文中蓝色部分为实际的执行命令:红色部分是重要的配置信息:"##"后为注释 a. 由于h ...
Hadoop的学习笔记（Hive|pig|zookeeper|hbase）
轉載的,此筆記的鏈接地址請點擊此處 hadoop笔记本 <div class="postText"><div id="cnblogs_post_body ...
hive创建映射hbase外表
HBase建表: create 'ods_member',{NAME => 'cf', VERSIONS => 1} Hive建表: CREATE EXTERNAL TABLE `ods_ ...
HBase项目实战：HBase+Flume+Kafka+Hive+SSM实现电信大数据通话信息实时读写定位系统
内容简介一.项目内容深度分析 1. 项目内容概览 2.数据的大致流向分析 3. 涉及的知识难点分析二.项目所用到的框架清单三.项目实战代码 1. 后端开发 1. 构建工程项目模块 2.开发通话记 ...
企业级数据仓库:数据仓库概述;核心技术框架,数仓理论,数据通道Hive技术框架,HBase设计,系统调度,关系模式范式,ER图,维度建模,星型/雪花/星座模式,数据采集同步,业务数据埋点,数据仓库规范
文章目录第一章数据仓库概述 1.1 数据仓库简介 1.1.2 什么是数据仓库? 1.1.3 OLTP 与 OLAP 1.2 数据仓库技术架构 1.3 课程目标第二章核心技术框架 2.1 数据仓 ...
大数据基础（二）hadoop, mave, hbase, hive, sqoop在ubuntu 14.04.04下的安装和sqoop与hdfs,hive,mysql导入导出
hadoop, mave, hbase, hive, sqoop在ubuntu 14.04.04下的安装 2016.05.15 本文测试环境: hadoop2.6.2 ubuntu 14.04.04 ...
Thinking in BigData（八）大数据Hadoop核心架构HDFS+MapReduce+Hbase+Hive内部机理详解
纯干货:Hadoop核心架构HDFS+MapReduce+Hbase+Hive内部机理详解. 通过这一阶段的调研总结,从内部机理的角度详细分析,HDFS.MapReduce.Hbase.H ...

hbase hive java_hive 与 hbase 结合

hbase hive java_hive 与 hbase 结合相关推荐

最新文章

热门文章