Keywords: hdp, hive, StorageHandler

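Anyone familiar with Hive's StorageHandler knows that it is Hive's extension point for adapting to different storage systems. It can also take on the role of HiveStoragePredicateHandler, which lets it apply pushdown optimizations for that storage. The core interface is: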

```java
import java.io.Serializable;

import org.apache.hadoop.hive.ql.plan.ExprNodeDesc;
import org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc;
import org.apache.hadoop.hive.serde2.Deserializer;
import org.apache.hadoop.mapred.JobConf;

/**
 * HiveStoragePredicateHandler is an optional companion to {@link
 * HiveStorageHandler}; it should only be implemented by handlers which
 * support decomposition of predicates being pushed down into table scans.
 */
public interface HiveStoragePredicateHandler {

  /**
   * Gives the storage handler a chance to decompose a predicate.  The storage
   * handler should analyze the predicate and return the portion of it which
   * cannot be evaluated during table access.  For example, if the original
   * predicate is <code>x = 2 AND upper(y)='YUM'</code>, the storage handler
   * might be able to handle <code>x = 2</code> but leave the "residual"
   * <code>upper(y)='YUM'</code> for Hive to deal with.  The breakdown
   * need not be non-overlapping; for example, given the
   * predicate <code>x LIKE 'a%b'</code>, the storage handler might
   * be able to evaluate the prefix search <code>x LIKE 'a%'</code>, leaving
   * <code>x LIKE '%b'</code> as the residual.
   *
   * @param jobConf contains a job configuration matching the one that
   * will later be passed to getRecordReader and getSplits
   *
   * @param deserializer deserializer which will be used when
   * fetching rows
   *
   * @param predicate predicate to be decomposed
   *
   * @return decomposed form of predicate, or null if no pushdown is
   * possible at all
   */
  public DecomposedPredicate decomposePredicate(
    JobConf jobConf,
    Deserializer deserializer,
    ExprNodeDesc predicate);

  /**
   * Struct class for returning multiple values from decomposePredicate.
   */
  public static class DecomposedPredicate {
    /**
     * Portion of predicate to be evaluated by storage handler.  Hive
     * will pass this into the storage handler's input format.
     */
    public ExprNodeGenericFuncDesc pushedPredicate;

    /**
     * Serialized format for filter
     */
    public Serializable pushedPredicateObject;

    /**
     * Portion of predicate to be post-evaluated by Hive for any rows
     * which are returned by storage handler.
     */
    public ExprNodeGenericFuncDesc residualPredicate;
  }
}
```
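To make the decomposePredicate contract more concrete, here is a minimal sketch of the split described in the Javadoc (`x = 2 AND upper(y)='YUM'`). It does not use Hive at all. Every type in it (`DecomposeSketch`, `Conjunct`, `Decomposed`) is a hypothetical stand-in and is not Hive's `ExprNodeDesc` API. The sketch makes two simplifying assumptions:

* The predicate is a flat AND of conjuncts.
* Only a bare `column = constant` comparison counts as something the storage can evaluate.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

/** Hypothetical, Hive-free model of splitting a predicate into pushed and residual parts. */
public class DecomposeSketch {

    /** One conjunct of an AND-only predicate, e.g. x = 2 or upper(y) = 'YUM'. */
    static final class Conjunct {
        final String column;
        final String function; // null when the column appears bare
        final String value;    // constant on the right-hand side, kept as text

        Conjunct(String column, String function, String value) {
            this.column = column;
            this.function = function;
            this.value = value;
        }

        /** In this sketch only "bare column = constant" can be evaluated by the storage. */
        boolean isPushable() {
            return function == null;
        }

        @Override
        public String toString() {
            String lhs = (function == null) ? column : function + "(" + column + ")";
            return lhs + " = " + value;
        }
    }

    /** Simplified mirror of DecomposedPredicate's three fields. */
    static final class Decomposed {
        final List<Conjunct> pushed = new ArrayList<>();
        final List<Conjunct> residual = new ArrayList<>();
        // Free-form payload for the reader, playing the role of pushedPredicateObject.
        HashMap<String, String> pushedObject;
    }

    /** Returns null when nothing can be pushed, following the documented contract. */
    static Decomposed decompose(List<Conjunct> conjuncts) {
        Decomposed result = new Decomposed();
        for (Conjunct c : conjuncts) {
            if (c.isPushable()) {
                result.pushed.add(c);
            } else {
                result.residual.add(c);
            }
        }
        if (result.pushed.isEmpty()) {
            return null;
        }
        // Hand the reader something more convenient than an expression tree.
        result.pushedObject = new HashMap<>();
        for (Conjunct c : result.pushed) {
            result.pushedObject.put(c.column, c.value);
        }
        return result;
    }

    public static void main(String[] args) {
        Decomposed d = decompose(List.of(
                new Conjunct("x", null, "2"),
                new Conjunct("y", "upper", "'YUM'")));
        System.out.println("pushed:   " + d.pushed);       // pushed:   [x = 2]
        System.out.println("residual: " + d.residual);     // residual: [upper(y) = 'YUM']
        System.out.println("object:   " + d.pushedObject); // object:   {x=2}
    }
}
```

The `pushedObject` map shows why the free-form field is attractive. The reader receives a ready-to-use structure (here, column to value) and does not have to re-walk an expression tree.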

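The core method is decomposePredicate, which returns a DecomposedPredicate object. Its member `Serializable pushedPredicateObject` gives you a great deal of freedom. You can use it to pass any pushdown result, configuration, or even function declarations obtained while parsing the expression tree during pushdown. All of this is handed to the InputFormat side, which then decides how to read the data.

However, on HDP 2.2.6-2800 (corresponding to Hive …) and HDP … (corresponding to Hive 1.2.1000.…), testing showed that only two of the three fields of DecomposedPredicate took effect. pushedPredicate and residualPredicate worked as expected. pushedPredicateObject, however, could never be retrieved: on the InputFormat side it was always null.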
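The following sketch illustrates how a free-form pushedPredicateObject can travel from the planner to the InputFormat. It shows the handoff through a string-valued job configuration and rests on three assumptions:

* **Mechanism.** The handoff is assumed to work roughly as in Apache Hive's HiveInputFormat. There, the object is serialized into the job configuration under the key `hive.io.filter.object`, which is the constant `TableScanDesc.FILTER_OBJECT_CONF_STR`.
* **Stand-ins.** A `Map<String, String>` stands in for `JobConf`. Plain Java serialization plus Base64 stands in for Hive's own serialization utilities.
* **Names.** The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: ship a Serializable payload through a string-only configuration. */
public class FilterObjectHandoff {
    // Assumed key, named after Hive's TableScanDesc.FILTER_OBJECT_CONF_STR.
    static final String FILTER_OBJECT_KEY = "hive.io.filter.object";

    /** Planner side: serialize the payload and store it as a Base64 string. */
    static void putFilterObject(Map<String, String> conf, Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        conf.put(FILTER_OBJECT_KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    /** InputFormat side: decode the payload, or return null if it never arrived. */
    static Object getFilterObject(Map<String, String> conf) {
        String encoded = conf.get(FILTER_OBJECT_KEY);
        if (encoded == null) {
            return null; // nothing was propagated into the configuration
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(encoded)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new HashMap<>(); // stand-in for JobConf
        HashMap<String, String> payload = new HashMap<>();
        payload.put("x", "2");
        putFilterObject(jobConf, payload);
        System.out.println(getFilterObject(jobConf));         // {x=2}
        System.out.println(getFilterObject(new HashMap<>())); // null
    }
}
```

Under this model, a null on the InputFormat side means the value never reached the configuration the reader sees. Checking whether that key is present in the task's configuration is one way to narrow down where the object gets dropped.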

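I stepped through Hive's source in a debugger, and pushedPredicateObject worked as expected. Next, I built Hive locally and uploaded the build to the test server, replacing the original hive-exec jar. After restarting HiveServer2, to my surprise, it worked there too.

I have not yet found the exact matching source, for two reasons. HDP has a very large number of minor code versions, and I am not sure what the number after the dash stands for (a revision?). For now, the closest source I could find is 2.2.6.0. When compiled and packaged by hand, it does not show the problem.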



