Tez is a computation framework, recently open-sourced by Apache, that supports DAG-shaped jobs. It derives directly from the MapReduce framework; its core idea is to decompose the Map and Reduce operations further: Map is split into Input, Processor, Sort, Merge, and Output, while Reduce is split into Input, Shuffle, Sort, Merge, Processor, and Output. These primitive operations can then be combined freely into new operations, and once assembled by some control logic they form one large DAG job. In summary, Tez has the following characteristics:
(1) An Apache incubator-level project (its source code had only just been released when this was written)
(2) Runs on top of YARN
(3) Designed for DAG (directed acyclic graph) applications (like Impala, Dremel, and Drill, it can serve as a replacement execution backend for Hive, Pig, etc.)
The third point deserves a short explanation. Apache already has a top-level project, Oozie, for designing DAG workflows, but Oozie works at a higher level (the job level): it only provides a way to express dependencies among jobs of different types (e.g. MR programs, Hive, Pig) and submits those jobs according to the dependencies. Tez is different: it exposes DAG programming interfaces at a much lower level, and users write their programs directly against these interfaces. Programming at this lower level yields higher efficiency.
For a detailed introduction, see: http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/

Tez offers the following features:
(1) A rich dataflow programming API (dataflow, NOT streaming!);
(2) An extensible "Input-Processor-Output" runtime model;
(3) Simplified deployment (it fully leverages YARN; Tez itself is just a client-side library, so no services need to be deployed in advance);
(4) Better performance than MapReduce;
(5) Optimized resource management (it runs directly on top of the YARN resource management system);
(6) Dynamic generation of the physical dataflow.
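
Because Tez is only a client-side library on top of YARN (points (3) and (5) above), an existing Hive installation can switch execution engines per session without deploying anything new. A minimal sketch of the toggle used throughout the log below (hive.execution.engine is the standard Hive property):

set hive.execution.engine=tez;   -- run queries as Tez DAGs
set hive.execution.engine=mr;    -- fall back to classic MapReduce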

Operation log (following the Hortonworks tutorial):
http://zh.hortonworks.com/hadoop-tutorial/supercharging-interactive-queries-hive-tez/

[root@sandbox ~]# hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> set hive.execution.engine=mr;
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid;
Query ID = root_20141030000808_c3d3772d-3aa0-4673-a28e-515043452d7a
Total jobs = 1
14/10/30 00:08:46 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-08-32_847_2342871984031221118-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/30 00:08:46 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-08-32_847_2342871984031221118-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: /tmp/root/root_20141030000808_c3d3772d-3aa0-4673-a28e-515043452d7a.log
2014-10-30 12:08:48     Starting to launch local task to process map join;      maximum memory = 260177920
2014-10-30 12:08:52     Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-08-32_847_2342871984031221118-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable
2014-10-30 12:08:52     Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-08-32_847_2342871984031221118-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile00--.hashtable (1044 bytes)
2014-10-30 12:08:52     End of local task; Time Taken: 3.833 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0012, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0012/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414630785017_0012
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2014-10-30 00:09:13,234 Stage-3 map = 0%,  reduce = 0%
2014-10-30 00:09:21,143 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.91 sec
MapReduce Total cumulative CPU time: 1 seconds 910 msec
Ended Job = job_1414630785017_0012
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.91 sec   HDFS Read: 240754 HDFS Write: 416262 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 910 msec
OK
6/1/13  0:00:01 66      58      13      20      4       Finland GG1919  17      M4
......
6/17/13 2:33:07 68      72      17      27      12      Finland FN39TG  26      M12
6/18/13 3:33:07 68      69      10      4       3       Brazil  JDNS77  28      M3
6/19/13 4:33:07 65      63      7       23      20      Argentina       ACMAX22 19      M20
6/20/13 5:33:07 66      66      9       21      3       Brazil  JDNS77  28      M3
Time taken: 51.333 seconds, Fetched: 8000 row(s)
hive> set hive.execution.engine=tez;
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid;
Query ID = root_20141030001010_9b9e76a3-5da9-48b6-9bd6-302c6560aa79
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0013)
Map 1: -/-      Map 2: -/-
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 1/1      Map 2: 0/1
Map 1: 1/1      Map 2: 1/1
Status: Finished successfully
OK
6/1/13  0:00:01 66      58      13      20      4       Finland GG1919  17      M4
......
6/17/13 2:33:07 68      72      17      27      12      Finland FN39TG  26      M12
6/18/13 3:33:07 68      69      10      4       3       Brazil  JDNS77  28      M3
6/19/13 4:33:07 65      63      7       23      20      Argentina       ACMAX22 19      M20
6/20/13 5:33:07 66      66      9       21      3       Brazil  JDNS77  28      M3
Time taken: 52.979 seconds, Fetched: 8000 row(s)
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030001616_e16c39f3-8cc6-4809-96f1-c76a55543a72
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0014)
Map 1: -/-      Map 2: -/-      Reducer 3: 0/1
Map 1: 0/1      Map 2: 0/1      Reducer 3: 0/1
Map 1: 0/1      Map 2: 0/1      Reducer 3: 0/1
Map 1: 0/1      Map 2: 0/1      Reducer 3: 0/1
Map 1: 0/1      Map 2: 0/1      Reducer 3: 0/1
Map 1: 1/1      Map 2: 0/1      Reducer 3: 0/1
Map 1: 1/1      Map 2: 1/1      Reducer 3: 0/1
Map 1: 1/1      Map 2: 1/1      Reducer 3: 1/1
Status: Finished successfully
OK
1       M1      14
2       M2      15
3       M3      15
4       M4      15
5       M5      15
6       M6      15
7       M7      14
8       M8      15
9       M9      14
10      M10     15
Time taken: 37.644 seconds, Fetched: 10 row(s)
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030001919_7ef87e96-9848-404b-9f7c-2e847efbda72
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0014)
Map 1: -/-      Map 2: -/-
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 0/1      Map 2: 0/1
Map 1: 1/1      Map 2: 0/1
Map 1: 1/1      Map 2: 1/1
Status: Finished successfully
OK
6/1/13  0:00:01 66      58      13      20      4       Finland GG1919  17      M4
6/2/13  1:00:01 69      68      3       20      17      Egypt   FN39TG  11      M17
6/3/13  2:00:01 70      73      17      20      18      Indonesia       JDNS77  25      M18
6/4/13  3:00:01 67      63      2       23      15      Israel  ACMAX22 19      M15
6/5/13  4:00:01 68      74      16      9       3       Brazil  JDNS77  28      M3
6/6/13  5:00:01 67      56      13      28      4       Finland GG1919  17      M4
6/7/13  6:00:01 70      58      12      24      2       France  FN39TG  27      M2
6/8/13  7:00:01 70      73      20      26      16      Turkey  AC1000  23      M16
6/9/13  8:00:01 66      69      16      9       9       Mexico  GG1919  11      M9
6/10/13 9:00:01 65      57      6       5       12      Finland FN39TG  26      M12
Time taken: 30.373 seconds, Fetched: 10 row(s)
hive> set hive.execution.engine=mr;
hive> select h.*, b.country, b.hvacproduct, b.buildingage, b.buildingmgr
    > from building b join hvac h
    > on b.buildingid = h.buildingid limit 10;
Query ID = root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793
Total jobs = 1
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/30 00:21:17 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10006/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: /tmp/root/root_20141030002121_82b6c25d-3efe-4edb-97f0-6ee104a44793.log
2014-10-30 12:21:20     Starting to launch local task to process map join;      maximum memory = 260177920
2014-10-30 12:21:26     Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable
2014-10-30 12:21:26     Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-21-05_917_4670481632857759726-1/-local-10003/HashTable-Stage-3/MapJoin-mapfile40--.hashtable (1044 bytes)
2014-10-30 12:21:26     End of local task; Time Taken: 6.094 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0015, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0015/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414630785017_0015
Hadoop job information for Stage-3: number of mappers: 0; number of reducers: 0
2014-10-30 00:21:40,956 Stage-3 map = 0%,  reduce = 0%
2014-10-30 00:21:51,576 Stage-3 map = 100%,  reduce = 0%
Ended Job = job_1414630785017_0015
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
6/1/13  0:00:01 66      58      13      20      4       Finland GG1919  17      M4
6/2/13  1:00:01 69      68      3       20      17      Egypt   FN39TG  11      M17
6/3/13  2:00:01 70      73      17      20      18      Indonesia       JDNS77  25      M18
6/4/13  3:00:01 67      63      2       23      15      Israel  ACMAX22 19      M15
6/5/13  4:00:01 68      74      16      9       3       Brazil  JDNS77  28      M3
6/6/13  5:00:01 67      56      13      28      4       Finland GG1919  17      M4
6/7/13  6:00:01 70      58      12      24      2       France  FN39TG  27      M2
6/8/13  7:00:01 70      73      20      26      16      Turkey  AC1000  23      M16
6/9/13  8:00:01 66      69      16      9       9       Mexico  GG1919  11      M9
6/10/13 9:00:01 65      57      6       5       12      Finland FN39TG  26      M12
Time taken: 45.935 seconds, Fetched: 10 row(s)
hive> select a.buildingid, b.buildingmgr, max(a.targettemp-a.actualtemp)
    > from hvac a join building b
    > on a.buildingid = b.buildingid
    > group by a.buildingid, b.buildingmgr limit 10;
Query ID = root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30
Total jobs = 1
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/10/30 00:23:16 WARN conf.Configuration: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10007/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: /tmp/root/root_20141030002323_041fbe30-9942-4a05-b307-425a8865fe30.log
2014-10-30 12:23:17     Starting to launch local task to process map join;      maximum memory = 260177920
2014-10-30 12:23:19     Dump the side-table into file: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable
2014-10-30 12:23:19     Uploaded 1 File to: file:/tmp/root/hive_2014-10-30_00-23-12_351_424724314401221979-1/-local-10004/HashTable-Stage-2/MapJoin-mapfile51--.hashtable (714 bytes)
2014-10-30 12:23:19     End of local task; Time Taken: 1.254 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1414630785017_0016, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0016/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414630785017_0016
Hadoop job information for Stage-2: number of mappers: 0; number of reducers: 0
2014-10-30 00:23:27,599 Stage-2 map = 0%,  reduce = 0%
2014-10-30 00:23:37,062 Stage-2 map = 100%,  reduce = 0%
2014-10-30 00:23:44,578 Stage-2 map = 100%,  reduce = 100%
Ended Job = job_1414630785017_0016
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
1       M1      14
2       M2      15
3       M3      15
4       M4      15
5       M5      15
6       M6      15
7       M7      14
8       M8      15
9       M9      14
10      M10     15
Time taken: 32.685 seconds, Fetched: 10 row(s)

Query Vectorization

Vectorized query execution processes rows in batches instead of one row per operator call, which cuts CPU overhead; in this Hive version it requires the data to be stored in the ORC format.

Step 1:

Create an ORC-backed copy of the hvac table (vectorized execution operates on ORC data).

hive> create table hvac_orc stored as orc as select * from hvac;
Query ID = root_20141030002727_5b56b417-f2d7-405c-bb2b-48a74971fa14
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1414630785017_0017, Tracking URL = http://sandbox.hortonworks.com:8088/proxy/application_1414630785017_0017/
Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_1414630785017_0017
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-10-30 00:27:56,573 Stage-1 map = 0%,  reduce = 0%
2014-10-30 00:28:08,218 Stage-1 map = 100%,  reduce = 0%
Ended Job = job_1414630785017_0017
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://sandbox.hortonworks.com:8020/tmp/hive-root/hive_2014-10-30_00-27-32_394_2850138279224601231-1/-ext-10001
Moving data to: hdfs://sandbox.hortonworks.com:8020/apps/hive/warehouse/hvac_orc
Table default.hvac_orc stats: [numFiles=1, numRows=8000, totalSize=26593, rawDataSize=1768000]
MapReduce Jobs Launched:
Job 0:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 37.248 seconds
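
For reference, the CTAS above is roughly equivalent to the explicit DDL plus INSERT below; this is a hedged sketch, with the column list taken from the `desc hvac` output shown later in this walkthrough:

create table hvac_orc (
  date       string,
  time       string,
  targettemp bigint,
  actualtemp bigint,
  system     bigint,
  systemage  bigint,
  buildingid bigint
) stored as orc;

insert into table hvac_orc select * from hvac;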

Step 2:

Run the following statement to enable Tez.

hive> set hive.execution.engine=tez;

Step 3:

Run a baseline query against the original, text-format hvac table.

hive> select date, count(buildingid) from hvac group by date;
Query ID = root_20141030003131_d3ff96e0-a7d7-42b2-ab4a-bcfa6507a5dd
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13  267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13  267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13  267
6/30/13 266
6/4/13  267
6/5/13  267
6/6/13  267
6/7/13  267
6/8/13  267
6/9/13  267
Time taken: 25.633 seconds, Fetched: 30 row(s)

Step 4:

Run the same query against the ORC-backed hvac_orc table.

hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003131_fcf22ec7-ff7f-4a01-8316-ad07929929d8
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/-      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13  267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13  267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13  267
6/30/13 266
6/4/13  267
6/5/13  267
6/6/13  267
6/7/13  267
6/8/13  267
6/9/13  267
Time taken: 10.683 seconds, Fetched: 30 row(s)

Step 5:

Now let’s confirm that vectorization is enabled (on the sandbox it is on by default; if it were not, it could be turned on with set hive.vectorized.execution.enabled=true):

set hive.vectorized.execution.enabled;
hive> set hive.vectorized.execution.enabled;
hive.vectorized.execution.enabled=true

and then rerun the SQL query from the previous step:

select date, count(buildingid) from hvac_orc group by date;
hive> select date, count(buildingid) from hvac_orc group by date;
Query ID = root_20141030003232_da026977-69c2-4dd8-8648-56c73434b4d0
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0019)
Map 1: -/-      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
OK
6/1/13  267
6/10/13 267
6/11/13 267
6/12/13 267
6/13/13 267
6/14/13 267
6/15/13 267
6/16/13 267
6/17/13 267
6/18/13 267
6/19/13 267
6/2/13  267
6/20/13 267
6/21/13 266
6/22/13 266
6/23/13 266
6/24/13 266
6/25/13 266
6/26/13 266
6/27/13 266
6/28/13 266
6/29/13 266
6/3/13  267
6/30/13 266
6/4/13  267
6/5/13  267
6/6/13  267
6/7/13  267
6/8/13  267
6/9/13  267
Time taken: 6.904 seconds, Fetched: 30 row(s)
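
To double-check that the speedup comes from vectorization rather than caching, one can optionally toggle it off in the same session and rerun the query; a hedged sketch, not part of the original transcript:

set hive.vectorized.execution.enabled=false;
select date, count(buildingid) from hvac_orc group by date;
set hive.vectorized.execution.enabled=true;   -- restore the setting afterwards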

Step 6:

Let’s look at the ‘explain’ plan to confirm that a vectorized query plan is indeed being used (note the "Execution mode: vectorized" line under Map 1):

explain select date, count(buildingid) from hvac_orc group by date;
hive> explain select date, count(buildingid) from hvac_orc group by date;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003333_eef5ffbf-8427-4d9b-b4be-4a8994926a77:8
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac_orc
                  Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: date (type: string), buildingid (type: bigint)
                    outputColumnNames: date, buildingid
                    Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: count(buildingid)
                      keys: date (type: string)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: string)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: string)
                        Statistics: Num rows: 8000 Data size: 1768000 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
            Execution mode: vectorized
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: count(VALUE._col0)
                keys: KEY._col0 (type: string)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 884000 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.936 seconds, Fetched: 57 row(s)

Stats & Cost Based Optimization (CBO)

The cost-based optimization (CBO) engine uses statistics stored with Hive tables to produce better query plans. Compare the Statistics annotations in the two explain plans below: before statistics are collected the row count is an estimate (Num rows: 10022, Column stats: NONE); after ANALYZE TABLE it is exact (Num rows: 8000, Column stats: COMPLETE). The commands used in this walkthrough are recapped just below.
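
Recap of the statements this walkthrough uses to feed the optimizer (all of them appear in the transcript that follows): collect table-level and column-level statistics, then switch on the stats/CBO-related settings.

analyze table hvac compute statistics;
analyze table hvac compute statistics for columns targettemp, actualtemp, buildingid;
set hive.compute.query.using.stats=true;
set hive.stats.fetch.column.stats=true;
set hive.stats.fetch.partition.stats=true;
set hive.cbo.enable=true;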

hive> desc hvac;
OK
date                    string
time                    string
targettemp              bigint
actualtemp              bigint
system                  bigint
systemage               bigint
buildingid              bigint
Time taken: 1.951 seconds, Fetched: 7 row(s)
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030003838_da1cb621-bd61-4533-b131-fe88a004fb9b:9
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 10022 Data size: 240531 Basic stats: COMPLETE Column stats: NONE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 5011 Data size: 120265 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 0.374 seconds, Fetched: 56 row(s)
hive> analyze table hvac compute STATISTICS;
Query ID = root_20141030003939_a34fc478-076b-4563-9f6a-f1c4c8703acf
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (application id: application_1414630785017_0020)
Map 1: -/-
Map 1: -/-
Map 1: -/-
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 0/1
Map 1: 1/1
Status: Finished successfully
Table default.hvac stats: [numFiles=1, numRows=8000, totalSize=240531, rawDataSize=232532]
OK
Time taken: 89.063 seconds
hive> ANALYZE TABLE hvac COMPUTE STATISTICS FOR COLUMNS targettemp,actualtemp,buildingid;
Query ID = root_20141030004343_cff30a2c-aa1c-48ff-af7c-ba92a4718693
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/-      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
OK
Time taken: 26.089 seconds
(Note: the statement below was typed without the leading "set", so Hive cannot parse it; the corrected commands follow.)
hive> hive.compute.query.using.stats = true;
NoViableAltException(26@[])
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:999)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:199)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:408)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:976)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1041)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:912)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:902)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:423)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
FAILED: ParseException line 1:0 cannot recognize input near 'hive' '.' 'compute'
hive> set hive.compute.query.using.stats = true;
hive> set hive.stats.fetch.column.stats = true;
hive> set hive.stats.fetch.partition.stats = true;
hive> set hive.cbo.enable = true;
hive> explain select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-1
    Tez
      Edges:
        Reducer 2 <- Map 1 (SIMPLE_EDGE)
      DagName: root_20141030004545_b1fff813-64cb-4254-a526-9d492c1f0cfd:12
      Vertices:
        Map 1
            Map Operator Tree:
                TableScan
                  alias: hvac
                  Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                  Select Operator
                    expressions: buildingid (type: bigint), targettemp (type: bigint), actualtemp (type: bigint)
                    outputColumnNames: buildingid, targettemp, actualtemp
                    Statistics: Num rows: 8000 Data size: 232532 Basic stats: COMPLETE Column stats: COMPLETE
                    Group By Operator
                      aggregations: max((targettemp - actualtemp))
                      keys: buildingid (type: bigint)
                      mode: hash
                      outputColumnNames: _col0, _col1
                      Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                      Reduce Output Operator
                        key expressions: _col0 (type: bigint)
                        sort order: +
                        Map-reduce partition columns: _col0 (type: bigint)
                        Statistics: Num rows: 8000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                        value expressions: _col1 (type: bigint)
        Reducer 2
            Reduce Operator Tree:
              Group By Operator
                aggregations: max(VALUE._col0)
                keys: KEY._col0 (type: bigint)
                mode: mergepartial
                outputColumnNames: _col0, _col1
                Statistics: Num rows: 4000 Data size: 32000 Basic stats: COMPLETE Column stats: COMPLETE
                Select Operator
                  expressions: _col0 (type: bigint), _col1 (type: bigint)
                  outputColumnNames: _col0, _col1
                  Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                  File Output Operator
                    compressed: false
                    Statistics: Num rows: 4000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
                    table:
                        input format: org.apache.hadoop.mapred.TextInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1

Time taken: 3.874 seconds, Fetched: 56 row(s)
hive> select buildingid, max(targettemp-actualtemp) from hvac group by buildingid;
Query ID = root_20141030004545_c7a5f1a2-4f62-4a72-b36e-7e3d7f9c0673
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1414630785017_0020)
Map 1: -/-      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 0/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 0/1
Map 1: 1/1      Reducer 2: 1/1
Status: Finished successfully
OK
1       14
2       15
3       15
4       15
5       15
6       15
7       14
8       15
9       14
10      15
11      14
12      15
13      15
14      15
15      15
16      15
17      15
18      15
19      15
20      15
Time taken: 56.247 seconds, Fetched: 20 row(s)
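
As an optional follow-up (not part of the original session), the collected statistics can be verified by inspecting the table metadata; `describe formatted` lists values such as numRows and rawDataSize among the table parameters, matching the stats printed by the ANALYZE command above:

describe formatted hvac;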
