HBase Scan and ResultScanner Classes

Scan class

Purpose
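Used to perform Scan operations.
All operations are identical to Get with the exception of instantiation. Rather than specifying a single row, an optional startRow and stopRow may be defined. If rows are not specified, the scanner will iterate over all rows.
To get all columns from all rows of a table, create an instance with no constraints: use the Scan() constructor. To constrain the scan to specific column families, call addFamily on your Scan instance for each family to retrieve.
To get specific columns, call addColumn for each column to retrieve.
To only retrieve columns within a specific range of version timestamps, call setTimeRange.
To only retrieve columns with a specific timestamp, call setTimestamp.
To limit the number of versions of each column to be returned, call setMaxVersions.
To limit the maximum number of values returned for each call to next(), call setBatch.
To add a filter, call setFilter.
Expert: to explicitly disable server-side block caching for this scan, call setCacheBlocks(boolean).
Note: usage alters Scan instances. Internally, attributes are updated as the scan runs and, if enabled, metrics accumulate in the Scan instance. Keep this in mind when you clone a Scan instance or reuse one that has already been used. It is safer to create a new Scan instance for each use.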
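A plain-Java simulation makes the startRow/stopRow rules above concrete without needing an HBase cluster. It is a sketch under stated assumptions: startRow is inclusive, stopRow is exclusive, an empty key means "no bound", and row keys sort as unsigned bytes in lexicographic order. RowRangeDemo and its methods are hypothetical names, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java simulation of how a Scan's [startRow, stopRow) range selects rows.
// No HBase classes are used; RowRangeDemo is a hypothetical name.
class RowRangeDemo {
    // Unsigned lexicographic comparison, the ordering HBase uses for row keys.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // A key that is a prefix of a longer key sorts first.
        return a.length - b.length;
    }

    // Empty startRow/stopRow mean "no bound", like EMPTY_START_ROW / EMPTY_END_ROW.
    static List<String> scan(List<String> rows, String startRow, String stopRow) {
        byte[] start = startRow.getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRow.getBytes(StandardCharsets.UTF_8);
        TreeSet<byte[]> sorted = new TreeSet<>(RowRangeDemo::compareUnsigned);
        for (String r : rows) {
            sorted.add(r.getBytes(StandardCharsets.UTF_8));
        }
        List<String> out = new ArrayList<>();
        for (byte[] row : sorted) {
            boolean afterStart = start.length == 0 || compareUnsigned(row, start) >= 0; // inclusive
            boolean beforeStop = stop.length == 0 || compareUnsigned(row, stop) < 0;    // exclusive
            if (afterStart && beforeStop) {
                out.add(new String(row, StandardCharsets.UTF_8));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row3", "row1", "row2", "row10");
        System.out.println(scan(rows, "row1", "row3")); // [row1, row10, row2]
        System.out.println(scan(rows, "", ""));         // [row1, row10, row2, row3]
    }
}
```

Note that row10 lands between row1 and row2. Keys are compared byte by byte, not numerically, which is why fixed-width (zero-padded) numeric row keys are a common choice.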

Structure

Fields

```java
private byte [] startRow = HConstants.EMPTY_START_ROW;
private byte [] stopRow  = HConstants.EMPTY_END_ROW;
private int maxVersions = 1;
private int batch = -1;
```
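With the batch = -1 default in the fields above, a whole row comes back as a single Result. The plain-Java sketch below simulates what setBatch(n) changes. It delivers a row's cells in chunks of at most n, so one wide row can arrive as several Result objects. BatchDemo is a hypothetical name. Treating any non-positive value as "no limit" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation: split one row's cells into chunks of at most `batch`,
// the way setBatch(n) makes a wide row come back as several Result objects.
class BatchDemo {
    // batch <= 0 (the field default is -1) is treated as "no limit": one chunk per row.
    static List<List<String>> split(List<String> rowCells, int batch) {
        List<List<String>> results = new ArrayList<>();
        if (batch <= 0) {
            results.add(new ArrayList<>(rowCells));
            return results;
        }
        for (int i = 0; i < rowCells.size(); i += batch) {
            int end = Math.min(i + batch, rowCells.size());
            results.add(new ArrayList<>(rowCells.subList(i, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> cells = Arrays.asList("f1:a", "f1:b", "f1:c", "f1:d", "f1:e");
        System.out.println(split(cells, -1)); // [[f1:a, f1:b, f1:c, f1:d, f1:e]]
        System.out.println(split(cells, 2));  // [[f1:a, f1:b], [f1:c, f1:d], [f1:e]]
    }
}
```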

Constructors

A constructor can take a start row (startRow) to limit the range of the query, and it can also take a filter.

The full list of constructor overloads can be viewed in IDEA.

Methods

Mostly the usual getters and setters, for example:

```java
  /**
   * Set the start row of the scan.
   * @param startRow row to start scan on (inclusive)
   * Note: In order to make startRow exclusive add a trailing 0 byte
   * @return this
   */
  public Scan setStartRow(byte [] startRow) {
    this.startRow = startRow;
    return this;
  }
```

In IDEA, the getters and setters of these fields are usually shown with a purple "p" icon.
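The Javadoc note on setStartRow says to add a trailing 0 byte to make startRow exclusive. This works because startRow followed by 0x00 is the smallest byte string strictly greater than startRow. No key can sort between the two, so an inclusive scan from the new key behaves like an exclusive scan from the old one. The plain-Java check below demonstrates the ordering. ExclusiveStartDemo is a hypothetical class that uses no HBase code. HBase 2.x clients also provide withStartRow(byte[] startRow, boolean inclusive) for the same purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java check of the "append a 0x00 byte" trick for an exclusive start row.
// ExclusiveStartDemo is a hypothetical name; no HBase classes are used.
class ExclusiveStartDemo {
    // Unsigned lexicographic byte comparison (the row-key order).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // startRow followed by one 0x00 byte; Arrays.copyOf zero-fills the new slot.
    static byte[] exclusiveStart(byte[] startRow) {
        return Arrays.copyOf(startRow, startRow.length + 1);
    }

    public static void main(String[] args) {
        byte[] row1 = "row1".getBytes(StandardCharsets.UTF_8);
        byte[] start = exclusiveStart(row1);
        // row1 itself now sorts before the start key, so an inclusive scan skips it...
        System.out.println(compareUnsigned(row1, start) < 0); // true
        // ...while the next real key, row10, still sorts at or after it.
        byte[] row10 = "row10".getBytes(StandardCharsets.UTF_8);
        System.out.println(compareUnsigned(row10, start) >= 0); // true
    }
}
```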

## Example

Query the rows whose ename column in column family f1 equals zhaoyun10. In SQL terms:

```sql
SELECT * FROM xx WHERE ename = 'zhaoyun10'
```

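```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;

public void testCom() throws IOException {
    // Exact match; can be replaced with a regex comparator
    BinaryComparator comparator = new BinaryComparator("zhaoyun10".getBytes());
    SingleColumnValueFilter enamefilter = new SingleColumnValueFilter(
            "f1".getBytes(), "ename".getBytes(), CompareFilter.CompareOp.EQUAL, comparator);
    // Drop rows that have no f1:ename column at all
    enamefilter.setFilterIfMissing(true);
    // Start scanning at row key "0" and apply the filter
    Scan scan = new Scan("0".getBytes(), enamefilter);
    excuteScan(scan);
}
```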

Running it returns each matching row with all of its columns.
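The comment in the example says the exact-match BinaryComparator could be replaced with a regex comparator. The plain-Java sketch below approximates how the three comparators named in these examples decide a match under the EQUAL operator:

- BinaryComparator requires exact byte equality.
- SubstringComparator does a case-insensitive "contains", per its Javadoc.
- RegexStringComparator succeeds if the pattern is found anywhere in the value. This is our reading of its default Java regex engine, which uses find rather than a full match.

ComparatorDemo and its methods are hypothetical stand-ins, not the HBase classes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Pattern;

// Plain-Java approximations of the EQUAL-match rule of three HBase comparators.
// Hypothetical stand-ins for illustration only, not the HBase classes.
class ComparatorDemo {
    // Like BinaryComparator: the cell value must equal the given bytes exactly.
    static boolean binaryMatch(String expected, String value) {
        return Arrays.equals(expected.getBytes(StandardCharsets.UTF_8),
                value.getBytes(StandardCharsets.UTF_8));
    }

    // Like SubstringComparator: case-insensitive "contains".
    static boolean substringMatch(String substr, String value) {
        return value.toLowerCase(Locale.ROOT).contains(substr.toLowerCase(Locale.ROOT));
    }

    // Like RegexStringComparator: the pattern only has to be found somewhere in the value.
    static boolean regexMatch(String regex, String value) {
        return Pattern.compile(regex).matcher(value).find();
    }

    public static void main(String[] args) {
        for (String v : Arrays.asList("zhaoyun10", "zhaoyun100", "ZHAOYUN10", "zhaoyun1")) {
            System.out.println(v + "\tbinary=" + binaryMatch("zhaoyun10", v)
                    + "\tsubstring=" + substringMatch("10", v)
                    + "\tregex=" + regexMatch("10", v));
        }
    }
}
```

With "10" as the pattern, zhaoyun100 passes the substring and regex versions but not the exact one. An anchored regex such as ^zhaoyun10$ restores exact-match behaviour.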

Compare it with the following version, which additionally calls scan.addColumn:

```java
public void testCom() throws IOException {
    // Regex string comparator
    //RegexStringComparator comparator = new RegexStringComparator("10");
    // Substring comparator
    //SubstringComparator comparator = new SubstringComparator("10");
    // Exact match; can be replaced with a regex comparator
    BinaryComparator comparator = new BinaryComparator("zhaoyun10".getBytes());
    SingleColumnValueFilter enamefilter = new SingleColumnValueFilter(
            "f1".getBytes(), "ename".getBytes(), CompareFilter.CompareOp.EQUAL, comparator);
    enamefilter.setFilterIfMissing(true);
    Scan scan = new Scan("0".getBytes(), enamefilter);
    // Only return the f1:ename column
    scan.addColumn("f1".getBytes(), "ename".getBytes());
    excuteScan(scan);
}
```

Result: only the f1:ename cell is returned, and the row's other columns are no longer fetched.
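Two separate things happen in these examples. SingleColumnValueFilter decides which rows are returned, and addColumn decides which cells of those rows come back. The plain-Java simulation below mirrors that split, including setFilterIfMissing(true), which drops rows that lack the tested column. It uses hypothetical names and models each row as a map from "family:qualifier" to value. One caveat, per the SingleColumnValueFilter Javadoc: the tested column must be among the columns the Scan reads, otherwise the filter treats it as missing. The sketch does not model that case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation: a single-column EQUAL filter picks rows, then a column
// projection (like scan.addColumn) picks the cells returned for each kept row.
class FilterProjectionDemo {
    static Map<String, Map<String, String>> scan(Map<String, Map<String, String>> table,
                                                 String testColumn, String expected,
                                                 boolean filterIfMissing, Set<String> columns) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> row : table.entrySet()) {
            String value = row.getValue().get(testColumn);
            // Missing column: keep the row unless filterIfMissing is set.
            boolean keep = (value == null) ? !filterIfMissing : value.equals(expected);
            if (!keep) {
                continue;
            }
            // An empty column set stands for "no addColumn call": return every cell.
            Map<String, String> cells = new TreeMap<>();
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                if (columns.isEmpty() || columns.contains(cell.getKey())) {
                    cells.put(cell.getKey(), cell.getValue());
                }
            }
            result.put(row.getKey(), cells);
        }
        return result;
    }

    static Map<String, Map<String, String>> sampleTable() {
        Map<String, Map<String, String>> t = new TreeMap<>();
        t.put("1", new TreeMap<>());
        t.get("1").put("f1:ename", "zhaoyun10");
        t.get("1").put("f1:job", "clerk");
        t.put("2", new TreeMap<>());
        t.get("2").put("f1:ename", "zhaoyun2");
        t.put("3", new TreeMap<>());
        t.get("3").put("f1:job", "manager");
        return t;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> t = sampleTable();
        Set<String> all = new HashSet<>();
        Set<String> enameOnly = new HashSet<>(Arrays.asList("f1:ename"));
        System.out.println(scan(t, "f1:ename", "zhaoyun10", true, all));           // {1={f1:ename=zhaoyun10, f1:job=clerk}}
        System.out.println(scan(t, "f1:ename", "zhaoyun10", true, enameOnly));     // {1={f1:ename=zhaoyun10}}
        System.out.println(scan(t, "f1:ename", "zhaoyun10", false, all).keySet()); // [1, 3]
    }
}
```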

ResultScanner interface

ResultScanner extends the Iterable interface, which declares:

```java
  /**
   * Returns an iterator over elements of type {@code T}.
   *
   * @return an Iterator.
   */
  Iterator<T> iterator();
```

Declaration

```java
public interface ResultScanner
    extends java.io.Closeable, Iterable<org.apache.hadoop.hbase.client.Result>
```
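ResultScanner is both Iterable and Closeable. The natural way to consume it is therefore a for-each loop inside try-with-resources, which closes the scanner even if iteration throws. The plain-Java stand-in below shows the pattern with the same two interfaces. FakeScanner is a hypothetical class, not part of HBase, and it yields strings instead of Result objects.

```java
import java.io.Closeable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in with the same shape as ResultScanner: Closeable + Iterable.
class FakeScanner implements Closeable, Iterable<String> {
    private final List<String> rows;
    boolean closed = false;

    FakeScanner(List<String> rows) {
        this.rows = rows;
    }

    @Override
    public Iterator<String> iterator() {
        return rows.iterator();
    }

    @Override
    public void close() {
        closed = true; // a real scanner would release server-side resources here
    }

    // Consume the scanner with for-each inside try-with-resources.
    static int countRows(FakeScanner scanner) {
        int n = 0;
        try (FakeScanner s = scanner) {
            for (String row : s) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        FakeScanner scanner = new FakeScanner(Arrays.asList("row1", "row2"));
        System.out.println(countRows(scanner) + " rows, closed=" + scanner.closed); // 2 rows, closed=true
    }
}
```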

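Because it extends Iterable, a ResultScanner can be iterated, for example with a for-each loop, and each element it yields is a Result object. So what does a Result look like?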

Result class

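A single row result of a Get or Scan query.
This class is not thread-safe.
Convenience methods are available that return various Map structures and values directly.
To get a complete mapping of all cells in the Result, which can include multiple families and multiple versions, use getMap().
To get a mapping of each family to its columns (qualifiers and values), including only the latest version of each, use getNoVersionMap(). To get a mapping of qualifiers to latest values for an individual family, use getFamilyMap(byte[]).
To get the latest value for a specific family and qualifier, use getValue(byte[], byte[]). A Result is backed by an array of Cell objects, each representing an HBase cell defined by the row, family, qualifier, timestamp, and value.
The underlying Cell objects can be accessed through the method listCells(), which creates a List from the internal Cell[]. A better approach exploits the fact that a new Result instance is a primed CellScanner: just call advance() and current() to iterate over the Cells as you would with any CellScanner. Call cellScanner() to reset if you need to iterate over the same Result again (CellScanners are one-shot). If you need to overwrite a Result with another Result instance, as in the old "mapred" RecordReader next() invocations, create an empty Result with the no-argument constructor and then use copyFrom(Result).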
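The map-returning methods above follow the nested shape of Result's familyMap field: family to qualifier to timestamp to value, with timestamps sorted newest first. The plain-Java sketch below builds that shape and shows why the "latest value" is simply the first entry of the innermost map, the analogue of getValue(family, qualifier). It uses String keys instead of byte[], and its class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Plain-Java sketch of the family -> qualifier -> timestamp -> value shape.
// Strings replace byte[] keys for readability; names are hypothetical.
class FamilyMapDemo {
    final NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> map = new TreeMap<>();

    void put(String family, String qualifier, long ts, String value) {
        map.computeIfAbsent(family, f -> new TreeMap<String, NavigableMap<Long, String>>())
           // Newest timestamp first, so the latest version is the first entry.
           .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
           .put(ts, value);
    }

    // Analogue of getValue(family, qualifier): the latest version, or null if absent.
    String latest(String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, String>> qualifiers = map.get(family);
        if (qualifiers == null || !qualifiers.containsKey(qualifier)) {
            return null;
        }
        return qualifiers.get(qualifier).firstEntry().getValue();
    }

    public static void main(String[] args) {
        FamilyMapDemo r = new FamilyMapDemo();
        r.put("f1", "ename", 100L, "zhaoyun9");
        r.put("f1", "ename", 200L, "zhaoyun10");
        System.out.println(r.latest("f1", "ename")); // zhaoyun10
        System.out.println(r.map);                   // {f1={ename={200=zhaoyun10, 100=zhaoyun9}}}
    }
}
```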

Fields

```java
private Cell[] cells;
private Boolean exists; // if the query was just to check existence.
private boolean stale = false;

/**
 * Partial results do not contain the full row's worth of cells. The result had to be returned in
 * parts because the size of the cells in the row exceeded the RPC result size on the server.
 * Partial results must be combined client side with results representing the remainder of the
 * row's cells to form the complete result. Partial results and RPC result size allow us to avoid
 * OOME on the server when servicing requests for large rows. The Scan configuration used to
 * control the result size on the server is {@link Scan#setMaxResultSize(long)} and the default
 * value can be seen here: {@link HConstants#DEFAULT_HBASE_CLIENT_SCANNER_MAX_RESULT_SIZE}
 */
private boolean partial = false;
// We're not using java serialization.  Transient here is just a marker to say
// that this is where we cache row if we're ever asked for it.
private transient byte [] row = null;
// Ditto for familyMap.  It can be composed on fly from passed in kvs.
private transient NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>
    familyMap = null;

private static ThreadLocal<byte[]> localBuffer = new ThreadLocal<byte[]>();
private static final int PAD_WIDTH = 128;
public static final Result EMPTY_RESULT = new Result(true);

private final static int INITIAL_CELLSCANNER_INDEX = -1;

/**
 * Index for where we are when Result is acting as a {@link CellScanner}.
 */
private int cellScannerIndex = INITIAL_CELLSCANNER_INDEX;
private ClientProtos.RegionLoadStats stats;

private final boolean readonly;
```

Result implements the CellScannable interface.

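Its cellScanner() method returns a CellScanner. As its source shows, it simply resets an internal index and returns the Result itself, since Result is also a CellScanner. So what does a CellScanner look like?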

```java
@Override
public CellScanner cellScanner() {
  // Reset
  this.cellScannerIndex = INITIAL_CELLSCANNER_INDEX;
  return this;
}
```

The constant INITIAL_CELLSCANNER_INDEX is -1 (private final static int INITIAL_CELLSCANNER_INDEX = -1), so a reset positions the scanner before the first cell.
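Resetting the index to -1 leaves the Result positioned before its first cell, and each advance() steps forward by one. The simplified plain-Java model below shows why a second pass needs cellScanner() to be called again. It is a hypothetical class that uses strings in place of Cells, and it is not Result's actual source.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of an array-backed, one-shot cell scanner
// whose index can be reset, in the spirit of Result.cellScanner().
class IndexScannerDemo {
    private static final int INITIAL_INDEX = -1;
    private final String[] cells;
    private int index = INITIAL_INDEX;

    IndexScannerDemo(String... cells) {
        this.cells = cells;
    }

    // Like cellScanner(): rewind to "before the first cell" and return this.
    IndexScannerDemo reset() {
        index = INITIAL_INDEX;
        return this;
    }

    // Move to the next cell; returns false once the cells are exhausted.
    boolean advance() {
        if (index < cells.length) {
            index++;
        }
        return index < cells.length;
    }

    // The cell at the current position, or null before the first advance() or past the end.
    String current() {
        return (index >= 0 && index < cells.length) ? cells[index] : null;
    }

    static List<String> drain(IndexScannerDemo s) {
        List<String> out = new ArrayList<>();
        while (s.advance()) {
            out.add(s.current());
        }
        return out;
    }

    public static void main(String[] args) {
        IndexScannerDemo s = new IndexScannerDemo("f1:ename", "f1:job");
        System.out.println(drain(s));         // [f1:ename, f1:job]
        System.out.println(drain(s));         // [] (one-shot: already exhausted)
        System.out.println(drain(s.reset())); // [f1:ename, f1:job]
    }
}
```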

CellScanner interface

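An interface for iterating through a sequence of cells. It is similar to Java's Iterator, but has no hasNext() or remove() method. The hasNext() method is problematic because it may require actually loading the next object, which in turn requires storing the previous object somewhere.
The core data block decoder should be as fast as possible, so the complexity and performance cost of tracking multiple cells at the same time are pushed to the layers above the CellScanner.
The current() method returns a reference to a Cell implementation. This reference may or may not point to a reusable cell implementation, so users of a CellScanner should not, for example, accumulate a List of Cells. All of the references may point to the same object, which would hold the latest state of the underlying Cell. In short, the Cell is mutable.
Typical usage: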

```java
while (scanner.advance()) {
  Cell cell = scanner.current();
  // do something
}
```

Often used for reading Cells written by org.apache.hadoop.hbase.io.CellOutputStream.
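The warning that current() may hand back the same reusable, mutable object is easy to trip over. The plain-Java simulation below reuses one object for every position. MutableCell and ReusingScanner are hypothetical classes, not HBase types. Collecting the references leaves a list whose entries all show the last state. Copying what you need while the cell is current keeps the values distinct. In real code the copy would be made with, for example, CellUtil.cloneValue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of a scanner that reuses one mutable cell object.
class MutableCellDemo {
    static final class MutableCell {
        String value;
    }

    static final class ReusingScanner {
        private final String[] values;
        private final MutableCell shared = new MutableCell(); // the same object every time
        private int index = -1;

        ReusingScanner(String... values) {
            this.values = values;
        }

        boolean advance() {
            if (index + 1 >= values.length) {
                return false;
            }
            index++;
            shared.value = values[index]; // overwrite the shared cell in place
            return true;
        }

        MutableCell current() {
            return shared;
        }
    }

    // Wrong: keeps references, which all point at the one reused object.
    static List<String> collectReferences(ReusingScanner s) {
        List<MutableCell> refs = new ArrayList<>();
        while (s.advance()) {
            refs.add(s.current());
        }
        List<String> out = new ArrayList<>();
        for (MutableCell c : refs) {
            out.add(c.value);
        }
        return out;
    }

    // Right: copy the data out while the cell is current.
    static List<String> collectCopies(ReusingScanner s) {
        List<String> out = new ArrayList<>();
        while (s.advance()) {
            out.add(s.current().value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectReferences(new ReusingScanner("a", "b", "c"))); // [c, c, c]
        System.out.println(collectCopies(new ReusingScanner("a", "b", "c")));     // [a, b, c]
    }
}
```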
Methods

```java
public abstract org.apache.hadoop.hbase.Cell current()
```

current() returns a Cell object. What does a Cell look like?

Cell interface

The unit of storage in HBase, consisting of the following fields:
1) row
2) column family
3) column qualifier
4) timestamp
5) type
6) MVCC version
7) value

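Uniqueness is determined by the combination of row, column family, column qualifier, timestamp, and type.
The natural comparator performs a bitwise comparison on row, column family, and column qualifier. Less intuitively, it then treats the greater timestamp as the lesser value, so that newer cells sort first.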
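The ordering just described can be sketched as a plain-Java Comparator. It compares row, family and qualifier in ascending order, then puts larger timestamps first. SimpleCell and ORDER are hypothetical names. The real CellComparator also compares the type and works on byte arrays with offsets, both of which this sketch leaves out. The sketch assumes ASCII keys, for which String.compareTo agrees with unsigned byte order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java sketch of the natural Cell ordering: row, family, qualifier ascending,
// then timestamp descending (newest first). Type is ignored in this sketch.
class CellOrderDemo {
    static final class SimpleCell {
        final String row, family, qualifier;
        final long timestamp;

        SimpleCell(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp;
        }
    }

    // ASCII keys assumed, so String.compareTo matches unsigned byte order.
    static final Comparator<SimpleCell> ORDER = Comparator
            .comparing((SimpleCell c) -> c.row)
            .thenComparing(c -> c.family)
            .thenComparing(c -> c.qualifier)
            .thenComparing(Comparator.comparingLong((SimpleCell c) -> c.timestamp).reversed());

    public static void main(String[] args) {
        List<SimpleCell> cells = new ArrayList<>(Arrays.asList(
                new SimpleCell("r1", "f1", "ename", 100),
                new SimpleCell("r1", "f1", "ename", 200),
                new SimpleCell("r0", "f1", "job", 50)));
        cells.sort(ORDER);
        System.out.println(cells); // [r0/f1:job/50, r1/f1:ename/200, r1/f1:ename/100]
    }
}
```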
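This interface should not include methods that allocate new byte[]s, such as those used in client or debugging code. Those users should use the methods in the CellUtil class. Currently, to minimize the impact on existing applications moving between 0.94 and 0.96, the costly helper methods are included but marked as deprecated.
Cell implements `Comparable<Cell>`, which is only meaningful when comparing against other keys in the same table. It uses CellComparator, which does not work on the -ROOT- and hbase:meta tables.
In the future, we may consider adding a boolean isOnHeap() method and a getValueBuffer() method that could be used to pass a value directly from an off-heap ByteBuffer to the network without copying it into an on-heap byte[].
Historical note: the original Cell implementation (KeyValue) requires all fields to be encoded as consecutive bytes in the same byte[], whereas this interface allows the fields to reside in separate byte[]s.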

Fields

CellUtil class

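Take cloneRow as an example. It takes a Cell and returns a byte array containing a copy of the cell's row key. Since the result is a byte array, it can be converted to a String and printed.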

```java
public static byte[] cloneRow(@NotNull org.apache.hadoop.hbase.Cell cell)
```
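A Cell exposes each field as a backing array plus an offset and length, for example getRowArray(), getRowOffset() and getRowLength() for the row. CellUtil.cloneRow and its siblings (cloneFamily, cloneQualifier, cloneValue) copy that slice into a fresh byte[]. The plain-Java sketch below imitates the copy with Arrays.copyOfRange on a hypothetical buffer. It then decodes the bytes with an explicit UTF-8 charset. Calling new String(bytes) without a charset uses the platform default, which can garble non-ASCII row keys. CloneRowDemo is a hypothetical name, and no HBase classes are involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of what a clone method conceptually does: copy one field
// out of a shared backing array (array + offset + length) into a new byte[].
class CloneRowDemo {
    static byte[] cloneSlice(byte[] backing, int offset, int length) {
        return Arrays.copyOfRange(backing, offset, offset + length);
    }

    // Decode with an explicit charset instead of the platform default.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Row key: the Chinese name Zhao Yun followed by "10" (non-ASCII on purpose).
        String original = "\u8d75\u4e91" + "10";
        byte[] row = original.getBytes(StandardCharsets.UTF_8);

        // Place the row key inside a larger buffer, starting 2 bytes in.
        byte[] backing = new byte[row.length + 4];
        System.arraycopy(row, 0, backing, 2, row.length);

        byte[] copy = cloneSlice(backing, 2, row.length);
        System.out.println(decode(copy).equals(original)); // true
    }
}
```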

Example code

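```java
import java.io.IOException;

import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellScanner;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// `table` is assumed to be a Table field of the enclosing class
public void excuteScan(Scan scan) throws IOException {
    // ResultScanner is Closeable: try-with-resources releases the scanner when done
    try (ResultScanner resultScanner = table.getScanner(scan)) {
        for (Result result : resultScanner) {
            CellScanner cellScanner = result.cellScanner();
            // advance() moves to the next cell and returns false when none are left
            while (cellScanner.advance()) {
                // Take the current cell
                Cell current = cellScanner.current();
                // Bytes.toString decodes as UTF-8 instead of the platform default charset
                System.out.println("\n" + Bytes.toString(CellUtil.cloneRow(current))
                        + "\t" + Bytes.toString(CellUtil.cloneFamily(current))
                        + "\t" + Bytes.toString(CellUtil.cloneQualifier(current))
                        + "\t" + Bytes.toString(CellUtil.cloneValue(current)));
            }
        }
    }
}
```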