HBase: Detailed Installation and Usage Guide

Introduction

HBase is modeled on Google's Bigtable paper and inspired by its ideas. It was originally developed as a Hadoop subproject and is used for structured data storage.

Official website: http://hbase.apache.org

Community Chinese documentation: https://hbase.apachecn.org/#/

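2006: Google publishes the Bigtable paper

2006: HBase development begins

2008: While Beijing was hosting the Olympics, the developers quietly turned HBase into a Hadoop subproject

2010: HBase becomes an Apache top-level project

Today many companies ship customized HBase distributions, for example Alibaba Cloud HBase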

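Summary:

HBase is a distributed, column-oriented storage system built on top of HDFS. Use HBase when you need real-time, random reads and writes on very large datasets.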

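Why HBase

# Massive storage: a single table can hold on the order of ten billion rows and a million columns (by comparison, a MySQL table is in practice usually kept under about 5 million rows and 30 columns).
# Real-time queries: results come back within one second.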

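HBase features

# 1. Large capacity: a single HBase table can hold on the order of ten billion rows and millions of columns.
# 2. Column-oriented: storage is organized by column; new columns can be added dynamically after data already exists, and column data can be operated on independently.
# 3. Multiple versions: every cell can keep several versions at the same time, each labeled with a timestamp.
# 4. Sparse: adding or deleting a row does not have to touch every column, columns can be added dynamically, and there can be many empty cells, which take up no disk space. For massive datasets this matters a great deal.
# 5. Scalable: built on HDFS, so storage capacity scales out horizontally.
# 6. Highly reliable: built on HDFS, so data is protected by HDFS replication.
# 7. High performance: once a table reaches a certain size it is split into regions automatically; together with the rowkey index and caching, this keeps queries over massive data in the millisecond range.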

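HBase vs. RDBMS

HBase	Relational database
Data is stored as `region`s	Data is stored as tables
Uses a `row key`	Supports a primary key (PK)
A row represents one record	A row represents one record
Uses columns and `column family`s	A column describes the meaning of its data
Data is operated on with `HBase shell` commands	Data is operated on with SQL
Data files live on HDFS, a distributed file system that scales out; total capacity depends on the number of servers	Total capacity depends on the configuration of a single server
No multi-row transactions or full ACID (only single-row operations are atomic)	Supports transactions and ACID
No table joins	Supports joins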

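HBase table logical structure

Data concepts

# namespace: the structure HBase uses to group tables (similar to a database); corresponds to a directory in HDFS.
# table: the structure that holds the data; also corresponds to a directory in HDFS (under its namespace directory).
# column family: every column belongs to a column family; columns are always addressed as family:qualifier.
# rowkey: the primary key used to identify and retrieve a row.
# cell: a single value uniquely identified by `row key + column family + column + version`.
# timestamp: each cell can hold several values, each tagged with a timestamp; within a cell the versions are sorted in descending order, so the newest value comes first.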
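To make the concepts above concrete, here is a minimal in-memory sketch of the logical model: a sorted map from rowkey to `family:qualifier` to timestamp to value, with timestamps ordered newest first. The class `LogicalTable` and its methods are hypothetical and only illustrate the nesting; real HBase stores cells as sorted key-values in HFiles, not as Java maps.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of HBase's logical layout (illustration only):
// rowkey -> "family:qualifier" -> timestamp -> value, timestamps sorted newest first.
class LogicalTable {
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> rows =
            new TreeMap<>();

    // Store one version of a cell, addressed by rowkey + family:qualifier + timestamp.
    void put(String rowkey, String column, long timestamp, String value) {
        rows.computeIfAbsent(rowkey, k -> new TreeMap<>())
            .computeIfAbsent(column, k -> new TreeMap<Long, String>(Collections.reverseOrder()))
            .put(timestamp, value);
    }

    // Return the newest version of a cell, or null if nothing was ever written there.
    String getLatest(String rowkey, String column) {
        NavigableMap<String, NavigableMap<Long, String>> row = rows.get(rowkey);
        if (row == null) return null;
        NavigableMap<Long, String> versions = row.get(column);
        return versions == null ? null : versions.firstEntry().getValue();
    }
}
```

Only cells that were actually written exist in the map, so an unwritten column costs nothing, which is the sparsity property listed among the features.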

HBase single-node installation

Download

URL: http://archive.apache.org/dist/hbase/

Preparation

Install and configure Hadoop

[root@hadoop10 installs]# jps
3440 Jps
3329 SecondaryNameNode
3030 NameNode
3134 DataNode

Install and configure ZooKeeper

[root@hadoop10 installs]# jps
3329 SecondaryNameNode
3509 QuorumPeerMain
3030 NameNode
3595 Jps
3134 DataNode
[root@hadoop10 installs]# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/installs/zookeeper3.4.14/bin/../conf/zoo.cfg
Mode: standalone

Synchronize the system time

# Check the Linux system time
[root@hadoop10 installs]# date
# Restart the chronyd service to synchronize the system time
[root@hadoop10 installs]# systemctl restart chronyd
[root@hadoop10 installs]# date
2020年 04月 12日 星期日 22:51:31 CST

Installation

# 1. Install HBase

1. Extract HBase
[root@hadoop30 modules]# tar zxvf hbase-1.2.4-bin.tar.gz -C /opt/installs/
2. Configure the environment variables (in /etc/profile)
# JAVA
export JAVA_HOME=/opt/installs/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP
export HADOOP_HOME=/opt/installs/hadoop2.9.2/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# zookeeper
export PATH=$PATH:/opt/installs/zookeeper3.4.14/bin/
# HBase
export HBASE_HOME=/opt/installs/hbase-1.2.4/
export PATH=$PATH:$HBASE_HOME/bin
3. Reload the profile
source /etc/profile

# 2. Initialize the configuration files

# 1 -------------------hbase-env.sh--------------------
# Set JAVA_HOME
export JAVA_HOME=/opt/installs/jdk1.8
# Comment out the following 2 lines (the PermSize options are only needed on JDK 7; JDK 8 removed PermGen)
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
# Do not use the ZooKeeper bundled with HBase
export HBASE_MANAGES_ZK=false

# 2. -------------------hbase-site.xml-------------------------
<configuration>
  <!-- HBase root directory on HDFS -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop10:9000/hbase</value>
  </property>
  <!-- Use pseudo-distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum; the default port 2181 does not need to be specified -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop10</value>
  </property>
</configuration>

# -------------------regionservers (hostnames or IPs of the RegionServer nodes)-------------------
hadoop10

# 3. Start HBase
Startup order:
1. Start ZooKeeper
2. Start HDFS
3. Start HBase
Shutdown order:
1. Stop HBase
2. Stop HDFS
3. Stop ZooKeeper

# Option 1: start and stop all HBase daemons at once
1. Start HBase
start-hbase.sh
2. Stop HBase
stop-hbase.sh
# Option 2: start each daemon separately
1. Start HMaster
[root@hadoop10 installs]# hbase-daemon.sh start master
# Stop
[root@hadoop10 installs]# hbase-daemon.sh stop master
2. Start HRegionServer
[root@hadoop10 installs]# hbase-daemon.sh start regionserver
# Stop
[root@hadoop10 installs]# hbase-daemon.sh stop regionserver

# 4. Verify
1. Check the Java processes
[root@hadoop10 installs]# jps
4688 NameNode
5618 HMaster
5730 HRegionServer
4819 DataNode
3509 QuorumPeerMain
6150 Jps
4984 SecondaryNameNode
2. Check the HMaster web UI
http://ip:16010
3. Open the shell
hbase shell
hbase(main):001:0>

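HBase shell commands

1. Entering and leaving the shell

# Enter the shell: ./hbase shell
# Leave the shell: quit
# Help: help

2. Namespace operations

A namespace named default exists out of the box.

#1. List namespaces
list_namespace
#2. Create a namespace
create_namespace "namespace_name"
#3. Drop a namespace (it must be empty)
drop_namespace "namespace_name"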

3. Table operations

# 1. List all tables
hbase(main):024:0> list
TABLE
baizhins:t_person # namespace:table
t_user # default:table, the "default:" prefix is omitted
2 row(s) in 0.1140 seconds
# 2. List the tables in one namespace
hbase(main):027:0> list_namespace_tables "baizhins"
TABLE
t_person
1 row(s) in 0.3970 seconds
# 3. Create a table
Syntax: create "namespace:table","family1","family2"
hbase(main):023:0> create "baizhins:t_person","info","edu"
0 row(s) in 9.9000 seconds
# 4. Describe a table
hbase(main):030:0> desc "baizhins:t_person"
Table baizhins:t_person is ENABLED
baizhins:t_person
COLUMN FAMILIES DESCRIPTION
{NAME => 'edu', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE',DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE =>'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
2 row(s) in 1.6400 seconds
# 5. Disable and drop a table (a table must be disabled before it can be dropped)
hbase(main):002:0> disable "namespace:table"
0 row(s) in 4.4790 seconds
hbase(main):002:0> drop "namespace:table"
0 row(s) in 4.4790 seconds

4. Data CRUD

# 1. Insert data (each put writes one column)
put "namespace:table","rowkey","family1:column1","value"
hbase(main):007:0> put 'baizhins:t_person','1001','info:name','zhangsan'
0 row(s) in 1.7250 seconds
hbase(main):008:0> put 'baizhins:t_person','1001','info:age',20
0 row(s) in 0.0210 seconds
hbase(main):009:0> put 'baizhins:t_person','1002','info:name','lisi'
0 row(s) in 0.0190 seconds
hbase(main):010:0> put 'baizhins:t_person','1002','info:age',21
0 row(s) in 0.0620 seconds
# 2. Get a row by rowkey
get "namespace:table","rowkey"
hbase(main):015:0> get 'baizhins:t_person','1001'
COLUMN                     CELL
 info:age                  timestamp=1598752891747, value=20
 info:name                 timestamp=1598752881461, value=zhangsan
2 row(s) in 0.1550 seconds
# 3. Get data by rowkey and column
get "namespace:table","rowkey","family:column"
# 4. scan: query all data in a table
hbase(main):019:0> scan "baizhins:t_person"
hbase(main):024:0> scan 'baizhins:t_person'
ROW                        COLUMN+CELL
 1001                      column=info:age, timestamp=1598753486814, value=20
 1001                      column=info:name, timestamp=1598753478658, value=zhangsan
 1002                      column=info:age, timestamp=1598753520306, value=21
 1002                      column=info:name, timestamp=1598753509800, value=lisi
2 row(s) in 0.0410 seconds
# 5. scan: query the first 2 rows of a table
hbase(main):022:0> scan "baizhins:t_person",{LIMIT=>2}
# 6. Range scan with a start row and a stop row (STARTROW is inclusive, STOPROW is exclusive)
hbase(main):029:0> scan "baizhins:t_person",{STARTROW=>"1001",STOPROW=>"1003"}
hbase(main):032:0> scan 'baizhins:t_person',{STARTROW=>'1001',STOPROW=>'1003'}
ROW                        COLUMN+CELL
 1001                      column=info:age, timestamp=1598753486814, value=20
 1001                      column=info:name, timestamp=1598753478658, value=zhangsan
 1002                      column=info:age, timestamp=1598753520306, value=21
 1002                      column=info:name, timestamp=1598753509800, value=lisi
Note: HBase sorts all data globally by rowkey, in byte-wise (ASCII) lexicographic order.
For example, the 5 rowkeys "012", "0", "123", "234", "3" sort as "0", "012", "123", "234", "3".
Two rowkeys are compared byte by byte: first the first byte, then, if those are equal, the second byte, and so on. If one rowkey runs out of bytes before a difference is found, the shorter rowkey sorts first.
# 7. scan with a start row and a limit
hbase(main):032:0> scan "baizhins:t_person",{STARTROW=>"1002",LIMIT=>2}
hbase(main):033:0> scan 'baizhins:t_person',{STARTROW=>'1002',LIMIT=>2}
ROW                        COLUMN+CELL
 1002                      column=info:age, timestamp=1598753520306, value=21
 1002                      column=info:name, timestamp=1598753509800, value=lisi
 1003                      column=info:name, timestamp=1598753628840, value=wangwu
# 8. Update data (in essence an overwrite: a newer version shadows the old one)
put "namespace:table","rowkey","family:column","value"
# 9. Delete data (one cell)
delete "namespace:table","rowkey","family:column"
# 10. Delete all data of one rowkey
deleteall "namespace:table","rowkey"
# 11. Count the rows in a table
count "namespace:table"
# 12. Remove all data from a table
truncate "namespace:table"
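The ordering rule and the STARTROW/STOPROW semantics can be checked with a small standalone sketch. The comparator compares unsigned bytes and lets a prefix sort first, as the rule above describes. The class `RowkeyOrder` works on Java strings for readability and is illustrative only, not HBase's own comparator. The `range` helper mimics a scan with an inclusive start row and an exclusive stop row.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Illustration of rowkey ordering: compare rowkeys byte by byte as unsigned values;
// if one key is a prefix of the other, the shorter key sorts first.
class RowkeyOrder {
    static final Comparator<String> BYTE_ORDER = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int cmp = (x[i] & 0xff) - (y[i] & 0xff); // unsigned byte comparison
            if (cmp != 0) return cmp;
        }
        return x.length - y.length; // all shared bytes equal: the shorter key comes first
    };

    static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(BYTE_ORDER);
        return copy;
    }

    // Mimics scan {STARTROW=>start, STOPROW=>stop}: start inclusive, stop exclusive.
    static List<String> range(List<String> keys, String start, String stop) {
        TreeSet<String> set = new TreeSet<>(BYTE_ORDER);
        set.addAll(keys);
        return new ArrayList<>(set.subSet(start, true, stop, false));
    }
}
```

This ordering is also why numeric rowkeys are usually zero-padded to a fixed width: without padding, "3" sorts after "234".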

5. Multiple versions

# 1. Create a table
hbase(main):013:0> create "baizhins:user","info"
# 2. Change the number of versions
hbase(main):016:0> alter "baizhins:user",{NAME=>'info',VERSIONS=>2}
# VERSIONS=>2 on a column family means the family keeps 2 versions of each cell. If you put 3 times, only the newest 2 versions are kept.
# 3. Write the same cell twice
hbase(main):014:0> put "baizhins:user","10001","info:name","aaa"
0 row(s) in 0.2620 seconds
hbase(main):015:0> put "baizhins:user","10001","info:name","bb"
0 row(s) in 0.0290 seconds
# 4. Read multiple versions
hbase(main):017:0> get "baizhins:user","10001",{COLUMN=>'info:name',VERSIONS=>3}
COLUMN                      CELL
 info:name                  timestamp=1586795010367, value=bb
 info:name                  timestamp=1586795004085, value=aaa
Notes:
1. get returns up to the number of versions given in VERSIONS (3 were requested here, but only 2 exist).
2. The versions in a cell are sorted by timestamp in descending order.
3. If get or scan does not specify VERSIONS, only the newest version of each cell is returned.
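The version rules above can be modeled with a sorted map per cell. This is a simplified sketch: the class `VersionedCell` is hypothetical and drops surplus versions immediately. In HBase, extra versions are hidden at read time and physically removed later, during major compaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one cell in a column family created with VERSIONS=maxVersions.
// Versions are kept sorted by timestamp, newest first; the oldest ones beyond the limit are dropped.
class VersionedCell {
    private final int maxVersions;
    private final TreeMap<Long, String> versions = new TreeMap<Long, String>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // in reverse order the last entry is the oldest version
        }
    }

    // Like get ... {VERSIONS=>n}: at most n values, newest first.
    List<String> get(int n) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (out.size() == n) break;
            out.add(e.getValue());
        }
        return out;
    }
}
```

With `maxVersions = 2` and three puts, `get(3)` returns only the two newest values, matching the shell behavior described in the notes.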

HBase API

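Environment setup

Dependencies

<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.2.4</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.2.4</version>
</dependency>

Initial configuration

Copy hbase-site.xml from HBase's conf directory into the project's resources directory.

conf.addResource("hbase-site.xml"); // optional: HBaseConfiguration.create() already loads hbase-site.xml from the classpath
On Windows, map the cluster hostnames (e.g. hadoop10) to their IP addresses in the hosts file.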

API overview

API	Meaning	How to create
Configuration	Configuration	HBaseConfiguration.create();
Connection	Connection, used to operate on data	ConnectionFactory.createConnection(conf);
Admin	Client for metadata operations (namespaces and table structure)	conn.getAdmin();
NamespaceDescriptor	Namespace, comparable to a database	NamespaceDescriptor.create("baizhins").build();
TableName	Table name	TableName.valueOf("baizhi:user");
HTableDescriptor	Table	new HTableDescriptor(tablename);
HColumnDescriptor	Column family	new HColumnDescriptor("info");
Put	Insert data	new Put(Bytes.toBytes("10001"));
Delete	Delete condition on a rowkey	new Delete(Bytes.toBytes("10001"));
Get	Single-row query by rowkey	new Get(Bytes.toBytes("10019"));
Scan	Multi-row scanner	new Scan();
Result	Query result (one row)	table.get(get);
ResultScanner	Query results (N rows)	table.getScanner(scan);
Bytes	Type conversion utility: HBase stores all data as bytes, so every value is converted to bytes on write and back on read.

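Connecting with the HBase client

Note: on Windows, map the Linux hostnames to their IP addresses first.

// Imports used by the snippets in this section
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.log4j.BasicConfigurator;

// Obtain a client
// 1. Load the configuration
Configuration conf = HBaseConfiguration.create();
conf.set("hbase.zookeeper.quorum","hadoop10");
BasicConfigurator.configure(); // print log output to the console (log4j 1.x)
// 2. Open a connection (thread-safe and heavyweight: create it once and reuse it)
Connection conn = ConnectionFactory.createConnection(conf);
// 3. Get the Admin client
Admin admin = conn.getAdmin();
// Release resources when finished
admin.close();
conn.close();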

Common APIs

1. Create a namespace

// 1. Build the namespace descriptor
NamespaceDescriptor baizhins = NamespaceDescriptor.create("baizhins").build();
// 2. Create the namespace
admin.createNamespace(baizhins);

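2. Table operations

Table operations go through admin.

Check whether a table exists

// 1. Build the table name
TableName tableName = TableName.valueOf("baizhins:person");
// 2. Check whether the table exists
boolean b = admin.tableExists(tableName);
System.out.println(b ? "exists" : "does not exist");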

Create a table

// 1. Build the table name
TableName person = TableName.valueOf("baizhins:person");
// 2. Build the column family descriptors
HColumnDescriptor info = new HColumnDescriptor("info");
HColumnDescriptor addr = new HColumnDescriptor("addr");
// 3. Bind the column families to the table
HTableDescriptor hTableDescriptor = new HTableDescriptor(person);
hTableDescriptor.addFamily(info);
hTableDescriptor.addFamily(addr);
// 4. Create the table
admin.createTable(hTableDescriptor);

3. Insert

Data operations go through conn (via a Table object).

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. Build the data to insert
Put put = new Put(Bytes.toBytes("1001")); // the rowkey
// Bytes is HBase's utility for converting between byte[] and Java types
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("zhangsan"));
put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes(18)); // stored as a 4-byte int
put.addColumn(Bytes.toBytes("addr"), Bytes.toBytes("zipCode"), Bytes.toBytes("45000"));
// 3. Write the put to the table
table.put(put);
// 4. Release resources
table.close();

4. Update

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. An update is just another put: the new value gets a newer timestamp and shadows the old one
Put put = new Put(Bytes.toBytes("1001"));
put.addColumn(Bytes.toBytes("addr"), Bytes.toBytes("zipCode"), Bytes.toBytes("45001"));
// 3. Write it to the table
table.put(put);
// 4. Close the table
table.close();

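5. Delete

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. Build the delete condition, keyed by rowkey (without further restrictions the whole row is deleted)
Delete delete = new Delete(Bytes.toBytes("1001"));
// Delete only one column family:
//delete.addFamily(Bytes.toBytes("addr"));
// Delete only one column (addColumn removes the latest version, addColumns removes all versions):
//delete.addColumn(Bytes.toBytes("info"), Bytes.toBytes("age"));
// 3. Execute the delete
table.delete(delete);
table.close();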

6. Query

Single-row lookup by rowkey.

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. Use the rowkey as the query condition (the row inserted above)
Get get = new Get(Bytes.toBytes("1001"));
// 3. Execute the query
Result result = table.get(get);
// 4. Read values from the result with result.getValue(family, qualifier)
byte[] namebyte = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
// The other columns work the same way
byte[] agebyte = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
byte[] zipbyte = result.getValue(Bytes.toBytes("addr"), Bytes.toBytes("zipCode"));
// Get the rowkey
byte[] rowbytes = result.getRow();
System.out.println(Bytes.toString(rowbytes));
System.out.println(Bytes.toString(namebyte));
System.out.println(Bytes.toInt(agebyte)); // age was written with Bytes.toBytes(int)
System.out.println(Bytes.toString(zipbyte));
table.close();

Multi-row scan

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. Create a Scan for multi-row queries
Scan scan = new Scan();
// 3. Choose the column families to return
scan.addFamily(Bytes.toBytes("info"));
scan.addFamily(Bytes.toBytes("addr"));
// 4. Set the start row (inclusive) and the page size
scan.setStartRow(Bytes.toBytes("1001"));
// PageFilter is applied separately on each region server, so more than 3 rows can come back
scan.setFilter(new PageFilter(3));
// 5. Execute the scan
ResultScanner result = table.getScanner(scan);
// 6. Process the results
for (Result res : result) {
    byte[] namebyte = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
    byte[] agebyte = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
    byte[] zipCodebyte = res.getValue(Bytes.toBytes("addr"), Bytes.toBytes("zipCode"));
    String name = Bytes.toString(namebyte);
    int age = Bytes.toInt(agebyte);
    String zipcode = Bytes.toString(zipCodebyte);
    System.out.println(name + ":" + age + ":" + zipcode);
}
// 7. Close the scanner and the table
result.close();
table.close();

Range scan

// 1. Get the table to operate on
Table table = conn.getTable(TableName.valueOf("baizhins:person"));
// 2. Create a Scan for multi-row queries
Scan scan = new Scan();
// 3. Choose the column families to return
scan.addFamily(Bytes.toBytes("info"));
scan.addFamily(Bytes.toBytes("addr"));
// 4. Set the start row (inclusive) and the stop row (exclusive)
scan.setStartRow(Bytes.toBytes("1001"));
scan.setStopRow(Bytes.toBytes("1003"));
// 5. Execute the scan
ResultScanner result = table.getScanner(scan);
// 6. Process the results
for (Result res : result) {
    byte[] namebyte = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
    byte[] agebyte = res.getValue(Bytes.toBytes("info"), Bytes.toBytes("age"));
    byte[] zipCodebyte = res.getValue(Bytes.toBytes("addr"), Bytes.toBytes("zipCode"));
    String name = Bytes.toString(namebyte);
    int age = Bytes.toInt(agebyte);
    String zipcode = Bytes.toString(zipCodebyte);
    System.out.println(name + ":" + age + ":" + zipcode);
}
// 7. Close the scanner and the table
result.close();
table.close();

Prefix query

Scan scan = new Scan();
// RegexStringComparator matches anywhere in the rowkey, so anchor the pattern with ^ for a prefix match
// (new PrefixFilter(Bytes.toBytes("a-")) is a simpler alternative)
Filter filter = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("^a-"));
scan.setFilter(filter);
ResultScanner results = table.getScanner(scan);
for (Result result : results) {
    byte[] nameByte = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"));
    byte[] ageByte = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("age"));
    System.out.println(Bytes.toString(nameByte) + "\t" + Bytes.toString(ageByte));
}
results.close();
table.close();

Multi-version query

Get get = new Get(Bytes.toBytes("1001"));
// Optionally restrict the query to one column
get.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"));
// Return up to 5 versions of each cell, newest first
get.setMaxVersions(5);
Result result = table.get(get);
Cell[] cells = result.rawCells();
for (Cell cell : cells) {
    System.out.println(Bytes.toString(CellUtil.cloneValue(cell)));
}

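HBase architecture

How reads and writes work

Read path

Write path

HBase internals

HBase architecture overview

Architecture concepts

HRegionServer

HRegionServer (usually runs on the same node as a DataNode)
1. Stores part of the table data
2. Serves data operations such as put, delete, get and scan
3. Periodically reports its own status to the HMaster
4. Manages the regions (pieces of table data) assigned to it

HMaster

HMaster
1. Manages RegionServer state
2. Manages tables: create, drop, alter
3. Balances load across the HRegionServers by rebalancing how HRegions are distributed

Zookeeper

Zookeeper
1. Solves the HMaster single point of failure (the active HMaster is elected through ZooKeeper)
2. Tracks the status of the HRegionServers managed by the HMaster and notifies the HMaster of changes
3. Stores the location of the hbase:meta table, which holds table metadata such as the rowkey range of every region

HRegion

An HRegion is the physical form of a horizontal slice of a table: a sub-table of a large table covering a rowkey range [startkey, endkey) and holding many rows. Splitting a table into regions keeps each unit small and improves read/write efficiency.

Store

Store
1. The physical form of a vertical split of a table: one Store per column family.
2. A query on a column family only needs to search that family's data, avoiding a full-table scan.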

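HBase internals

Region Split

Why regions split

To spread the load of a region and improve read/write efficiency.
Explanation

A region splits into two, and the two daughter regions can be placed on different RegionServers.
Default split policy
A region splits once its data exceeds 128M, 512M, 1152M, …, and eventually 10G, 10G, …, following

$$\text{splitSize} = \min\left(R^2 \times \text{hbase.hregion.memstore.flush.size},\ \text{hbase.hregion.max.filesize}\right)$$

where $R$ is the number of regions of the table on the RegionServer. The exact split policy differs between HBase versions, so check the one used by your release.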

Relevant parameters
```
hbase.hregion.memstore.flush.size=128M
hbase.hregion.max.filesize=10G
```
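The R² rule above can be tabulated with a few lines of code. The class `SplitThreshold` is a hypothetical illustration of the formula as stated in this section (R ≥ 1), not HBase's split policy class.

```java
// Sketch of the size-to-check rule described above (illustration only):
// splitSize = min(R^2 * flushSize, maxFileSize), for R >= 1 regions of the table on the RegionServer.
class SplitThreshold {
    static final long MB = 1024L * 1024L;

    static long sizeToCheck(int regionCount, long flushSize, long maxFileSize) {
        long r = regionCount;
        return Math.min(r * r * flushSize, maxFileSize);
    }

    public static void main(String[] args) {
        long flush = 128 * MB;      // hbase.hregion.memstore.flush.size
        long max = 10 * 1024 * MB;  // hbase.hregion.max.filesize (10G)
        for (int r = 1; r <= 10; r++) {
            System.out.println("R=" + r + " -> " + sizeToCheck(r, flush, max) / MB + " MB");
        }
    }
}
```

Running `main` prints the growing thresholds until they reach the 10G cap. Once a table has enough regions on a server, every region simply splits at `hbase.hregion.max.filesize`.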
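Problem

The default splitting easily leads to data skew, so hardware resources are not fully used (hotspotting: a large share of client requests lands on a few nodes, which are overloaded while the others sit idle).

Region pre-splitting

Why
- Higher read/write throughput (multiple regions spread across different RegionServers can serve requests in parallel).
- Keep the amount of data in each region roughly equal to prevent data skew (and make good use of compute resources).
Effect of pre-splitting

Each region keeps a StartKey/EndKey pair that bounds the range of rowkeys it holds.

When data is added, each rowkey goes into the region whose range contains it.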
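Finding the matching region is a binary search over the sorted split keys. The sketch below assumes the split points `100000`…`400000` used in the pre-split command in this section. The class name `RegionIndexSketch` is hypothetical, and it relies on `String.compareTo` matching byte order, which holds for plain ASCII keys.

```java
import java.util.Arrays;

// Sketch: which region does a rowkey land in, given the sorted split keys of a table?
// Region 0 covers (-inf, split[0]), region i covers [split[i-1], split[i]), the last region [split[n-1], +inf).
// Assumes ASCII rowkeys, for which String.compareTo agrees with byte order.
class RegionIndexSketch {
    static int regionIndex(String[] splitKeys, String rowkey) {
        int pos = Arrays.binarySearch(splitKeys, rowkey);
        // Exact match: the rowkey is the start key of region pos + 1.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and insertionPoint is the region index.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

Note that an unpadded key such as "999" lands in the last region, because it sorts after "400000" byte-wise. This is another reason to give rowkeys a fixed width.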
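Pre-split manually when creating a table

Command:

create "namespace:table","family",SPLITS=>["100000","200000","300000","400000"]

Result: the 4 split points create 5 regions; view them through the RegionServer web UI (http://ip:16030).

Pre-splitting from Java code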
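As a sketch of the Java route, the helper below builds evenly spaced, zero-padded split keys for numeric rowkeys. In HBase 1.x, a `byte[][]` like this can be passed to `Admin.createTable(HTableDescriptor, byte[][] splitKeys)`. The class `SplitKeys` and its parameters are hypothetical, and the code only computes the keys without calling HBase.

```java
import java.nio.charset.StandardCharsets;

// Sketch: evenly spaced, zero-padded split keys for numeric rowkeys in [0, maxKey).
// The result could be handed to Admin.createTable(desc, splitKeys) in HBase 1.x.
class SplitKeys {
    static byte[][] evenSplits(long maxKey, int regions, int width) {
        byte[][] splits = new byte[regions - 1][];
        long step = maxKey / regions;
        for (int i = 1; i < regions; i++) {
            // Zero-pad to a fixed width so byte order matches numeric order.
            String key = String.format("%0" + width + "d", step * i);
            splits[i - 1] = key.getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }
}
```

For example, `evenSplits(500000, 5, 6)` produces the same four split points as the shell command above.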

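MemStore Flush

Explanation

In short: persistence, so that in-memory data ends up safely on disk.

A flush writes a RegionServer's in-memory data (the MemStore) to disk as StoreFiles.
When a flush happens
1. When the total MemStore size of a RegionServer reaches its share of the Java heap (default 0.4)
Parameter: hbase.regionserver.global.memstore.size
2. When the periodic flush interval elapses (default 1 hour)
Parameter: hbase.regionserver.optionalcacheflushinterval
3. When the MemStore of a single region exceeds 128M
Parameter: hbase.hregion.memstore.flush.size
Manual flush

Command: flush "namespace:table"

File location (browse it in the NameNode web UI at http://ip:50070):

/hbase/data/baizhins/user2/faf64f7f6cfa6282c2a92864faa3909d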
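The three triggers can be summarized as one yes/no check. This is a hypothetical model using the defaults quoted above, not HBase code. A real RegionServer also decides which regions to flush under global memory pressure, which this sketch leaves out.

```java
// Hypothetical model of the three flush triggers listed above (illustration, not HBase code).
// Thresholds are the defaults quoted in the text: 0.4 of the heap, 1 hour, 128 MB.
class FlushPolicySketch {
    static final long MB = 1024L * 1024L;

    static boolean shouldFlush(long regionMemstoreBytes, long totalMemstoreBytes,
                               long heapBytes, long millisSinceLastFlush) {
        // hbase.regionserver.global.memstore.size: all MemStores together vs. 0.4 of the heap
        boolean globalLimit = totalMemstoreBytes >= (long) (heapBytes * 0.4);
        // hbase.regionserver.optionalcacheflushinterval: periodic flush, default 1 hour
        boolean periodic = millisSinceLastFlush >= 60L * 60L * 1000L;
        // hbase.hregion.memstore.flush.size: one region's MemStore reaches 128 MB
        boolean regionLimit = regionMemstoreBytes >= 128 * MB;
        return globalLimit || periodic || regionLimit;
    }
}
```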

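StoreFile Compaction

Purpose

Too many small StoreFiles: a query has to go through every file, which is slow.

StoreFiles are full of expired data, which wastes space and slows down queries.
Explanation

In short: merge StoreFiles to make lookups faster.
Types and triggers
- minor compaction (partial merge)
```
Characteristics: merges a small number of adjacent StoreFiles into one sorted file (fast)
```
Trigger: runs frequently, with little impact on performance.

Manual command: compact "namespace:table"
- major compaction (full merge)
```
Characteristics:
1. Merges all StoreFiles of a Store into one.
2. Removes deleted and overwritten data.
3. Very expensive for the RegionServer. (Important!)
```
Trigger: every 7 days by default; parameter: hbase.hregion.majorcompaction

In practice it is usually triggered manually. Manual command: major_compact "namespace:table"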
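Both kinds of compaction are at heart a k-way merge of sorted files. Below is a simplified model following the description above. Minor compaction only merges. Major compaction also keeps just the newest entry per key and purges keys whose newest entry is a delete marker. The class `CompactionSketch` and its `Cell` type are hypothetical and unrelated to HBase's own `Cell` interface.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of compaction as a k-way merge of sorted StoreFiles (illustration only).
// Each "file" is a list of cells sorted by rowkey ascending, then timestamp descending.
class CompactionSketch {
    static final class Cell {
        final String key; final String value; final long ts; // value == null marks a delete
        Cell(String key, String value, long ts) { this.key = key; this.value = value; this.ts = ts; }
    }

    static final Comparator<Cell> ORDER = (a, b) -> {
        int k = a.key.compareTo(b.key);
        return k != 0 ? k : Long.compare(b.ts, a.ts); // same key: newest first
    };

    static List<Cell> compact(List<List<Cell>> files, boolean major) {
        // One cursor {fileIndex, position} per file; the heap yields the smallest current cell.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (x, y) -> ORDER.compare(files.get(x[0]).get(x[1]), files.get(y[0]).get(y[1])));
        for (int i = 0; i < files.size(); i++) {
            if (!files.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<Cell> out = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            int[] cur = heap.poll();
            Cell c = files.get(cur[0]).get(cur[1]);
            if (cur[1] + 1 < files.get(cur[0]).size()) heap.add(new int[] {cur[0], cur[1] + 1});
            if (major) {
                if (c.key.equals(lastKey)) continue; // older version of a key already handled
                lastKey = c.key;
                if (c.value == null) continue;      // newest entry is a delete: purge the key
            }
            out.add(c);
        }
        return out;
    }

    static List<String> render(List<Cell> cells) {
        List<String> out = new ArrayList<>();
        for (Cell c : cells) out.add(c.key + "=" + c.value);
        return out;
    }
}
```

Because every input file is already sorted, the merge reads each file sequentially once. This is what keeps compaction feasible on large files, even though a major compaction still rewrites all of a Store's data.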

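Rowkey design

# How the rowkey affects HBase
1. It determines how data is distributed across regions and therefore load balancing. A poor rowkey design causes data skew and hotspots. Goal: new data (and requests) arriving in any period of time should spread as evenly as possible across the HRegions.
2. It uniquely identifies one row. Goal: rowkeys must be unique.
3. It serves the queries. Goal: the rowkey design must support the business's query patterns.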
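One common technique for goal 1 is salting: prefix the natural key with a bucket number derived from its hash. Consecutive keys, such as increasing IDs, then land in different regions. The sketch is illustrative only. The class `SaltedRowkey`, the bucket count, and the two-digit `NN_` prefix format are assumptions, and the format supports at most 100 buckets.

```java
// Sketch of rowkey salting (illustrative; bucket count and "NN_" prefix format are assumptions).
// The bucket is derived deterministically from the key, so a point lookup can recompute it.
class SaltedRowkey {
    static String salt(String naturalKey, int buckets) {
        int bucket = Math.floorMod(naturalKey.hashCode(), buckets); // same key -> same bucket
        return String.format("%02d_%s", bucket, naturalKey);
    }
}
```

The trade-off affects goal 3. A get can recompute the prefix, but a range scan over natural keys now needs one scan per bucket, so salting suits write-heavy, point-read workloads best.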

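Why HBase reads are fast: BlockCache

# 1. MemStore (in the region's memory). Characteristics: in memory, holds the newest data, sorted.
# 2. BlockCache (LRU): HBase's read cache. Eviction policy: LRU (least recently used), which keeps the most recently used data.
# 3. StoreFiles on disk (LSM tree; rowkeys are sorted within each file). Disk lookups are slow mainly because of seeks. Compacting into larger StoreFiles (fewer files) speeds up disk lookups:
1. With fewer StoreFiles there are fewer files to go through.
2. Rowkeys are sorted within each file and on disk, so a lookup needs far fewer seeks.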
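The LRU policy mentioned for the BlockCache can be sketched with `LinkedHashMap` in access order. This is a minimal stand-in to show the eviction rule, not HBase's `LruBlockCache`, which additionally sizes entries in bytes and keeps separate priority tiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch: a LinkedHashMap in access order evicts the least recently used entry
// once the capacity (counted in entries here, not bytes) is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the head, i.e. the least recently used block
    }
}
```

Reading `block1` before inserting a third block keeps it cached and evicts `block2` instead, which is the "keep recently used data" behavior described above.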

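HBase fully distributed installation

Notes

Edit conf/regionservers with vi.

Synchronize the system time before installing HBase.

Cluster plan
192.168.199.11: HMaster
192.168.199.12: HRegionServer
192.168.199.13: HRegionServer

# 0. Make sure HDFS HA is already set up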
[root@hadoop11 ~]# jps
1259 JournalNode
1965 NameNode
1758 DFSZKFailoverController
2110 Jps
1215 QuorumPeerMain

# 1. Install HBase

1. Extract HBase
[root@hadoop11 modules]# tar zxvf hbase-1.2.4-bin.tar.gz -C /opt/installs/
2. Configure the environment variables (in /etc/profile)
# JAVA
export JAVA_HOME=/opt/installs/jdk1.8
export PATH=$PATH:$JAVA_HOME/bin
# HADOOP
export HADOOP_HOME=/opt/installs/hadoop2.9.2/
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# zookeeper
export PATH=$PATH:/opt/installs/zookeeper3.4.14/bin/
# HBase
export HBASE_HOME=/opt/installs/hbase-1.2.4
export PATH=$PATH:$HBASE_HOME/bin
3. Reload the profile
source /etc/profile

# 2. Initialize the HBase configuration files

# 1 -------------------hbase-env.sh--------------------
# Set JAVA_HOME
export JAVA_HOME=/opt/installs/jdk1.8
# Comment out the following 2 lines (the PermSize options are only needed on JDK 7; JDK 8 removed PermGen)
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m -XX:ReservedCodeCacheSize=256m"
# Do not use the ZooKeeper bundled with HBase
export HBASE_MANAGES_ZK=false

# 2. -------------------hbase-site.xml-------------------------
<configuration>
  <!-- HBase root directory; ns is the logical nameservice of the HA HDFS cluster -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://ns/hbase</value>
  </property>
  <!-- Use distributed mode -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- ZooKeeper quorum; the default port 2181 does not need to be specified -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hadoop10,hadoop11,hadoop12</value>
  </property>
  <!-- Disable the HDFS hflush stream capability check; otherwise some HBase versions fail at startup -->
  <property>
    <name>hbase.unsafe.stream.capability.enforce</name>
    <value>false</value>
  </property>
</configuration>

# 3. -------------------regionservers--------------------
hadoop12
hadoop13

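# 4. Make Hadoop's configuration files (core-site.xml, hdfs-site.xml) available in HBase's conf directory (here via symlinks), so HBase can resolve the ns nameservice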
[root@hadoop11 installs]# ln -s /opt/installs/hadoop2.9.2/etc/hadoop/core-site.xml /opt/installs/hbase-1.2.4/conf/core-site.xml
[root@hadoop11 installs]# ln -s /opt/installs/hadoop2.9.2/etc/hadoop/hdfs-site.xml /opt/installs/hbase-1.2.4/conf/hdfs-site.xml

# 3. Copy to the other nodes
1. Copy the profile file
[root@hadoop11 installs]# scp /etc/profile root@hadoop12:/etc/
[root@hadoop11 installs]# scp /etc/profile root@hadoop13:/etc/
2. Copy the HBase installation and its configuration files
[root@hadoop11 installs]# scp -r hbase-1.2.4/ root@hadoop12:/opt/installs/
[root@hadoop11 installs]# scp -r hbase-1.2.4/ root@hadoop13:/opt/installs/
3. Reload the profile
[root@hadoop11 ~]# source /etc/profile
[root@hadoop12 ~]# source /etc/profile
[root@hadoop13 ~]# source /etc/profile

# 4. Start HBase
1. Start HBase
start-hbase.sh
2. Stop HBase
stop-hbase.sh

