Hive函数:GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
参考:lxw大数据田地:http://lxw1234.com/archives/2015/04/193.htm
数据准备:
CREATE EXTERNAL TABLE test_data ( month STRING, day STRING, cookieid STRING ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile location '/user/jc_rc_ftp/test_data';select * from test_data l; +----------+-------------+-------------+--+ | l.month | l.day | l.cookieid | +----------+-------------+-------------+--+ | 2015-03 | 2015-03-10 | cookie1 | | 2015-03 | 2015-03-10 | cookie5 | | 2015-03 | 2015-03-12 | cookie7 | | 2015-04 | 2015-04-12 | cookie3 | | 2015-04 | 2015-04-13 | cookie2 | | 2015-04 | 2015-04-13 | cookie4 | | 2015-04 | 2015-04-16 | cookie4 | | 2015-03 | 2015-03-10 | cookie2 | | 2015-03 | 2015-03-10 | cookie3 | | 2015-04 | 2015-04-12 | cookie5 | | 2015-04 | 2015-04-13 | cookie6 | | 2015-04 | 2015-04-15 | cookie3 | | 2015-04 | 2015-04-15 | cookie2 | | 2015-04 | 2015-04-16 | cookie1 | +----------+-------------+-------------+--+ 14 rows selected (0.249 seconds)
GROUPING SETS
在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM test_data GROUP BY month,day GROUPING SETS (month,day) ORDER BY GROUPING__ID;等价于 SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month UNION ALL SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day+----------+-------------+-----+---------------+--+ | month | day | uv | grouping__id | +----------+-------------+-----+---------------+--+ | 2015-04 | NULL | 6 | 1 | | 2015-03 | NULL | 5 | 1 | | NULL | 2015-04-16 | 2 | 2 | | NULL | 2015-04-15 | 2 | 2 | | NULL | 2015-04-13 | 3 | 2 | | NULL | 2015-04-12 | 2 | 2 | | NULL | 2015-03-12 | 1 | 2 | | NULL | 2015-03-10 | 4 | 2 | +----------+-------------+-----+---------------+--+ 8 rows selected (177.299 seconds)SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM test_data GROUP BY month,day GROUPING SETS (month,day,(month,day)) ORDER BY GROUPING__ID;等价于 SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month UNION ALL SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day UNION ALL SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day +----------+-------------+-----+---------------+--+ | month | day | uv | grouping__id | +----------+-------------+-----+---------------+--+ | 2015-04 | NULL | 6 | 1 | | 2015-03 | NULL | 5 | 1 | | NULL | 2015-03-10 | 4 | 2 | | NULL | 2015-04-16 | 2 | 2 | | NULL | 2015-04-15 | 2 | 2 | | NULL | 2015-04-13 | 3 | 2 | | NULL | 2015-04-12 | 2 | 2 | | NULL | 2015-03-12 | 1 | 2 | | 2015-04 | 2015-04-16 | 2 | 3 | | 2015-04 | 2015-04-12 | 2 | 3 | | 2015-04 | 2015-04-13 | 3 | 3 | | 2015-03 | 2015-03-12 | 1 | 3 | | 2015-03 | 2015-03-10 | 4 | 3 | | 2015-04 | 2015-04-15 | 2 | 3 | +----------+-------------+-----+---------------+--+
备注:其中的 GROUPING__ID,表示结果属于哪一个分组集合。
CUBE
根据GROUP BY的维度的所有组合进行聚合。
SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM test_data GROUP BY month,day WITH CUBE ORDER BY GROUPING__ID;等价于 SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM test_data UNION ALL SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month UNION ALL SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day UNION ALL SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day +----------+-------------+-----+---------------+--+ | month | day | uv | grouping__id | +----------+-------------+-----+---------------+--+ | NULL | NULL | 7 | 0 | | 2015-03 | NULL | 5 | 1 | | 2015-04 | NULL | 6 | 1 | | NULL | 2015-04-16 | 2 | 2 | | NULL | 2015-04-15 | 2 | 2 | | NULL | 2015-04-13 | 3 | 2 | | NULL | 2015-04-12 | 2 | 2 | | NULL | 2015-03-12 | 1 | 2 | | NULL | 2015-03-10 | 4 | 2 | | 2015-04 | 2015-04-12 | 2 | 3 | | 2015-04 | 2015-04-16 | 2 | 3 | | 2015-03 | 2015-03-12 | 1 | 3 | | 2015-03 | 2015-03-10 | 4 | 3 | | 2015-04 | 2015-04-15 | 2 | 3 | | 2015-04 | 2015-04-13 | 3 | 3 | +----------+-------------+-----+---------------+--+
ROLLUP
是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。
比如,以month维度进行层级聚合: SELECT month, day, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM test_data GROUP BY month,day WITH ROLLUP ORDER BY GROUPING__ID; 可以实现这样的上钻过程:月天的UV->月的UV->总UV +----------+-------------+-----+---------------+--+ | month | day | uv | grouping__id | +----------+-------------+-----+---------------+--+ | NULL | NULL | 7 | 0 | | 2015-04 | NULL | 6 | 1 | | 2015-03 | NULL | 5 | 1 | | 2015-04 | 2015-04-16 | 2 | 3 | | 2015-04 | 2015-04-15 | 2 | 3 | | 2015-04 | 2015-04-13 | 3 | 3 | | 2015-04 | 2015-04-12 | 2 | 3 | | 2015-03 | 2015-03-12 | 1 | 3 | | 2015-03 | 2015-03-10 | 4 | 3 | +----------+-------------+-----+---------------+--+--把month和day调换顺序,则以day维度进行层级聚合: SELECT day, month, COUNT(DISTINCT cookieid) AS uv, GROUPING__ID FROM test_data GROUP BY day,month WITH ROLLUP ORDER BY GROUPING__ID; +-------------+----------+-----+---------------+--+ | day | month | uv | grouping__id | +-------------+----------+-----+---------------+--+ | NULL | NULL | 7 | 0 | | 2015-04-12 | NULL | 2 | 1 | | 2015-04-15 | NULL | 2 | 1 | | 2015-03-12 | NULL | 1 | 1 | | 2015-04-16 | NULL | 2 | 1 | | 2015-03-10 | NULL | 4 | 1 | | 2015-04-13 | NULL | 3 | 1 | | 2015-04-16 | 2015-04 | 2 | 3 | | 2015-04-15 | 2015-04 | 2 | 3 | | 2015-04-13 | 2015-04 | 3 | 3 | | 2015-03-12 | 2015-03 | 1 | 3 | | 2015-03-10 | 2015-03 | 4 | 3 | | 2015-04-12 | 2015-04 | 2 | 3 | +-------------+----------+-----+---------------+--+
可以实现这样的上钻过程:
天月的UV->天的UV->总UV
(这里,根据天和月进行聚合,和根据天聚合结果一样,因为有父子关系,如果是其他维度组合的话,就会不一样)
转载于:https://www.cnblogs.com/yy3b2007com/p/8583181.html
Hive函数:GROUPING SETS,GROUPING__ID,CUBE,ROLLUP相关推荐
- Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP
GROUPING SETS 该关键字可以实现同一数据集的多重group by操作.事实上GROUPING SETS是多个GROUP BY进行UNION ALL操作的简单表达,它仅仅使用一个stage完 ...
- Hive sql分组函数grouping sets、cube、rollup用法简介
文章目录 1.数据如下: 2.建表如下: 3.grouping sets 4.cube 5.rollup 1.数据如下: user_id,dep_id,group_id,salary 10001,a, ...
- 大数据之hive:hive新功能之GROUPING SETS,Cube, Rollup
目录 一.GROUPING SETS 1.概述 2.实战 二.Cube 1.概述 2.实战 三.Rollup 1.概述 2.实战 四.Grouping_ID函数 一.GROUPING SETS 1.概 ...
- presto和hive中grouping sets的格式不一致问题
背景 遇到的问题,在presto中使用hive中的grouping sets报错 报错信息如下 [1] Query failed (#20220811_003524_00009_hzxre): lin ...
- Hive之grouping sets用法详解
目录 关键字: 简单示例: 实例一: presto中grouping sets函数 关键字: GROUPING SETS: 根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UN ...
- mysql group by cube_group by、grouping sets、with rollup、with cube方法
场景 在编写报表的 sql 脚本的时候,可能会遇到多维度组合的情况,例如下面的情况.常规的做法是编写不同维度组合的 sql ,然后再使用 union all 进行全集(当分组维度数量比较多的时候,un ...
- mysql grouping sets_Spark--Spark多维分析cube/rollup/grouping sets/group by
概念简述 group by:主要用来对查询的结果进行分组,相同组合的分组条件在结果集中只显示一行记录.可以添加聚合函数. grouping sets:对分组集中指定的组表达式的每个子集执行group ...
- 【Hive】grouping sets() 函数
文章目录 1. 语法 2. 例子 1. 语法 grouping sets()函数是一种将多个group by逻辑写在一个sql语句中的便利写法. 等价于将不同维度的GROUP BY结果集进行UNION ...
- hive通过grouping sets多维度组合去重统计避免使用distinct
在hive中,如果遇到多维度组合统计,并且要进行去重统计,例如统计不同维度组合的访问用户数,比如统计运营商.手机品牌.网络类型的用户数,怎样避免不用ditinct(因为distinct效率低),并且g ...
最新文章
- python环境下,执行系统命令方法
- linux动态二进制翻译,仿真:解释和二进制翻译
- 利用ABAP调试器脚本修改数据库表的值
- Github pull request 工作流总结
- Object-C非正式协议与正式协议的区别
- 智能优化算法:蜜獾算法-附代码
- [knowledge][lisp] lisp与AI
- windows局域网共享文件
- 1211: 【入门】数字走向IV
- c语言和java哪个好学_学java前要学C语言吗?java和C语言哪个好学?
- 非对称加密之公钥加密与私钥加密的应用场景
- 基于.net的大型web开源免费erp
- mysql connstring_(最全的数据库连接字符串)connectionstring
- css怎样让字体变细,css怎么把字体变细?
- 相约,一朵春天的微笑
- 使用CentOS7.4搭建bgp网络实验Quagga
- 计算机应用最普遍的汉字字符编码是什么,计算机中,目前最普遍使用的汉字字符编码是__________...
- 1062 最简分数 (C++)
- html格子像素画,html – rotateY()文本模糊/像素化
- 无线应用安全剖析-sanr