Hive and Spark SQL execution plans for over and lateral view
Table definition
create table test_over (
  user_id string,
  login_date string
) COMMENT 'Used for testing functions; safe to delete'
row format delimited
fields terminated by '\t';
Execution plans for over
Spark
spark-sql> explain select
         > user_id
         > ,login_date
         > ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
         > from test_over;
22/03/10 10:55:50 INFO [main] CodeGenerator: Code generated in 9.641436 ms
== Physical Plan ==
Window [lag(login_date#34, 1, 0001-01-01) windowspecdefinition(user_id#33, login_date#34 ASC NULLS FIRST, specifiedwindowframe(RowFrame, -1, -1)) AS prev_date#30], [user_id#33], [login_date#34 ASC NULLS FIRST]
+- *(1) Sort [user_id#33 ASC NULLS FIRST, login_date#34 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#33, 200)
      +- Scan hive default.test_over [user_id#33, login_date#34], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#33, login_date#34]
Time taken: 0.098 seconds, Fetched 1 row(s)
22/03/10 10:55:50 INFO [main] SparkSQLCLIDriver: Time taken: 0.098 seconds, Fetched 1 row(s)
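Read from the bottom up, the plan above scans the table, shuffles rows by user_id (Exchange), sorts each partition by user_id and login_date (Sort), and then evaluates lag over a one-row frame (Window). The following is a minimal Python sketch of those three stages, not Spark's implementation. The sample rows and the function names are hypothetical, and it assumes no NULL values, so the NULLS FIRST ordering is not modeled.

```python
from collections import defaultdict

# Hypothetical sample rows (user_id, login_date); not taken from the original post.
rows = [
    ("u1", "2022-03-02"),
    ("u2", "2022-03-05"),
    ("u1", "2022-03-01"),
    ("u1", "2022-03-04"),
]


def exchange(rows, num_partitions=200):
    """Mimic Exchange hashpartitioning(user_id, 200): equal keys land in the same partition."""
    partitions = defaultdict(list)
    for user_id, login_date in rows:
        partitions[hash(user_id) % num_partitions].append((user_id, login_date))
    return partitions


def lag_window(partition, offset=1, default="0001-01-01"):
    """Mimic Sort followed by Window for lag(login_date, 1, '0001-01-01')."""
    # Sort step: order by (user_id, login_date) ascending.
    ordered = sorted(partition)
    out = []
    for i, (user_id, login_date) in enumerate(ordered):
        j = i - offset
        # The frame (RowFrame, -1, -1) is the single row just before the current
        # one, and it only counts if that row belongs to the same user_id.
        if j >= 0 and ordered[j][0] == user_id:
            prev_date = ordered[j][1]
        else:
            prev_date = default
        out.append((user_id, login_date, prev_date))
    return out


def run_lag_plan(rows):
    """Run the three stages; the final sort only makes the output order stable."""
    result = []
    for partition in exchange(rows).values():
        result.extend(lag_window(partition))
    return sorted(result)


for row in run_lag_plan(rows):
    print(row)
```

Because the shuffle key is user_id alone, two users may share a partition, which is why the sketch checks the user_id of the previous row before using it.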
spark-sql> explain
         > select
         > user_id
         > ,login_date
         > ,first_value(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [first(login_date#39, false) windowspecdefinition(user_id#38, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#35], [user_id#38]
+- *(1) Sort [user_id#38 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#38, 200)
      +- Scan hive default.test_over [user_id#38, login_date#39], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#38, login_date#39]
Time taken: 0.077 seconds, Fetched 1 row(s)
22/03/10 10:57:34 INFO [main] SparkSQLCLIDriver: Time taken: 0.077 seconds, Fetched 1 row(s)
spark-sql> explain select
         > user_id
         > ,login_date
         > ,max(login_date) over(partition by user_id ) prev_date
         > from test_over;
== Physical Plan ==
Window [max(login_date#45) windowspecdefinition(user_id#44, specifiedwindowframe(RowFrame, unboundedpreceding$(), unboundedfollowing$())) AS prev_date#41], [user_id#44]
+- *(1) Sort [user_id#44 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(user_id#44, 200)
      +- Scan hive default.test_over [user_id#44, login_date#45], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#44, login_date#45]
Time taken: 0.081 seconds, Fetched 1 row(s)
22/03/10 10:58:15 INFO [main] SparkSQLCLIDriver: Time taken: 0.081 seconds, Fetched 1 row(s)
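In the first_value and max plans above there is no order by, and the frame runs from unboundedpreceding to unboundedfollowing, so every row of a partition sees the whole partition. For max this means one value per user_id, repeated on each of that user's rows. A rough Python sketch of that behavior follows; the sample rows and the function name are hypothetical, and it assumes no NULL values.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows (user_id, login_date); not taken from the original post.
rows = [
    ("u1", "2022-03-02"),
    ("u2", "2022-03-05"),
    ("u1", "2022-03-01"),
    ("u1", "2022-03-04"),
]


def max_over_partition(rows):
    """Mimic max(login_date) over(partition by user_id) with an unbounded frame."""
    out = []
    # Sorting by user_id only mirrors Sort [user_id ASC] in the plan;
    # Python's sort is stable, so rows of one user keep their input order.
    by_user = sorted(rows, key=itemgetter(0))
    for user_id, group in groupby(by_user, key=itemgetter(0)):
        members = list(group)
        # The frame covers the whole partition, so one aggregate serves every row.
        latest = max(login_date for _, login_date in members)
        out.extend((u, d, latest) for u, d in members)
    return out


for row in max_over_partition(rows):
    print(row)
```

The same reasoning explains why first_value without an order by is best avoided: the frame is the whole partition, but which row comes first inside it is not guaranteed.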
Hive
hive> explain select
    > user_id
    > ,login_date
    > ,lag(login_date,1,'0001-01-01') over(partition by user_id order by login_date) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string), login_date (type: string)
              sort order: ++
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), KEY.reducesinkkey1 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col1 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: lag_window_0
                        arguments: _col1, 1, '0001-01-01'
                        name: lag
                        window function: GenericUDAFLagEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
                        isPivotResult: true
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), lag_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
Time taken: 0.211 seconds, Fetched: 61 row(s)
hive> explain select
    > user_id
    > ,login_date
    > ,max(login_date) over(partition by user_id ) prev_date
    > from test_over;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: test_over
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Reduce Output Operator
              key expressions: user_id (type: string)
              sort order: +
              Map-reduce partition columns: user_id (type: string)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              value expressions: login_date (type: string)
      Execution mode: vectorized
      Reduce Operator Tree:
        Select Operator
          expressions: KEY.reducesinkkey0 (type: string), VALUE._col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          PTF Operator
            Function definitions:
                Input definition
                  input alias: ptf_0
                  output shape: _col0: string, _col1: string
                  type: WINDOWING
                Windowing table definition
                  input alias: ptf_1
                  name: windowingtablefunction
                  order by: _col0 ASC NULLS FIRST
                  partition by: _col0
                  raw input shape:
                  window functions:
                      window function definition
                        alias: max_window_0
                        arguments: _col1
                        name: max
                        window function: GenericUDAFMaxEvaluator
                        window frame: ROWS PRECEDING(MAX)~FOLLOWING(MAX)
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: _col0 (type: string), _col1 (type: string), max_window_0 (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              File Output Operator
                compressed: false
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                table:
                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 3.278 seconds, Fetched: 61 row(s)
Execution plans for lateral view
Spark
spark-sql> explain select
         > user_id
         > ,login_date
         > ,single_num
         > from test_over
         > lateral view explode(split(login_date,'-')) tmp as single_num;
== Physical Plan ==
Generate explode(split(login_date#58, -)), [user_id#57, login_date#58], false, [single_num#59]
+- Scan hive default.test_over [user_id#57, login_date#58], HiveTableRelation `default`.`test_over`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [user_id#57, login_date#58]
Time taken: 0.103 seconds, Fetched 1 row(s)
22/03/10 14:39:38 INFO [main] SparkSQLCLIDriver: Time taken: 0.103 seconds, Fetched 1 row(s)
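Unlike the window plans, the Spark plan above has no Exchange or Sort: Generate works row by row, emitting one output row per element of split(login_date, '-') and carrying user_id and login_date along. The short Python sketch below approximates that behavior. The sample row and the function name are hypothetical, it assumes login_date is never NULL, and it treats '-' as a literal separator rather than a regular expression.

```python
# Hypothetical sample row (user_id, login_date); not taken from the original post.
rows = [("u1", "2022-03-10")]


def lateral_view_explode(rows, sep="-"):
    """Mimic Generate explode(split(login_date, '-')) joined back to the source row."""
    for user_id, login_date in rows:
        # split(...) produces an array; explode turns each element into its own row.
        for single_num in login_date.split(sep):
            yield (user_id, login_date, single_num)


for row in lateral_view_explode(rows):
    print(row)
```

Each input row is handled independently, which is consistent with the plan needing no shuffle.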
Hive
hive> explain select
    > user_id
    > ,login_date
    > ,single_num
    > from test_over
    > lateral view explode(split(login_date,'-')) tmp as single_num;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: test_over
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Lateral View Forward
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: user_id (type: string), login_date (type: string)
              outputColumnNames: user_id, login_date
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Lateral View Join Operator
                outputColumnNames: _col0, _col1, _col5
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Select Operator
                  expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                  outputColumnNames: _col0, _col1, _col2
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  ListSink
            Select Operator
              expressions: split(login_date, '-') (type: array<string>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              UDTF Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                function name: explode
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col5
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: string), _col1 (type: string), _col5 (type: string)
                    outputColumnNames: _col0, _col1, _col2
                    Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink

Time taken: 0.081 seconds, Fetched: 41 row(s)