Reference: lxw大数据田地 (lxw's big-data blog): http://lxw1234.com/archives/2015/04/193.htm

Data preparation:

CREATE EXTERNAL TABLE test_data (
month STRING,
day STRING,
cookieid STRING
) ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
stored as textfile location '/user/jc_rc_ftp/test_data';

select * from test_data l;
+----------+-------------+-------------+--+
| l.month  |    l.day    | l.cookieid  |
+----------+-------------+-------------+--+
| 2015-03  | 2015-03-10  | cookie1     |
| 2015-03  | 2015-03-10  | cookie5     |
| 2015-03  | 2015-03-12  | cookie7     |
| 2015-04  | 2015-04-12  | cookie3     |
| 2015-04  | 2015-04-13  | cookie2     |
| 2015-04  | 2015-04-13  | cookie4     |
| 2015-04  | 2015-04-16  | cookie4     |
| 2015-03  | 2015-03-10  | cookie2     |
| 2015-03  | 2015-03-10  | cookie3     |
| 2015-04  | 2015-04-12  | cookie5     |
| 2015-04  | 2015-04-13  | cookie6     |
| 2015-04  | 2015-04-15  | cookie3     |
| 2015-04  | 2015-04-15  | cookie2     |
| 2015-04  | 2015-04-16  | cookie1     |
+----------+-------------+-------------+--+
14 rows selected (0.249 seconds)

GROUPING SETS

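GROUPING SETS aggregates over several dimension combinations within a single GROUP BY query. The result is equivalent to running one GROUP BY per dimension combination and combining the result sets with UNION ALL.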
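The UNION ALL equivalence can be checked on the sample data. This is an illustrative sketch, not Hive itself. It loads the 14 rows of test_data into an in-memory database through Python's standard `sqlite3` module and runs the UNION ALL form of GROUPING SETS (month, day). SQLite has no GROUPING SETS syntax, so only the rewritten form is exercised. The helper name `run_union_all` is hypothetical.

```python
import sqlite3

# The 14 sample rows of test_data: (month, day, cookieid).
ROWS = [
    ("2015-03", "2015-03-10", "cookie1"), ("2015-03", "2015-03-10", "cookie5"),
    ("2015-03", "2015-03-12", "cookie7"), ("2015-04", "2015-04-12", "cookie3"),
    ("2015-04", "2015-04-13", "cookie2"), ("2015-04", "2015-04-13", "cookie4"),
    ("2015-04", "2015-04-16", "cookie4"), ("2015-03", "2015-03-10", "cookie2"),
    ("2015-03", "2015-03-10", "cookie3"), ("2015-04", "2015-04-12", "cookie5"),
    ("2015-04", "2015-04-13", "cookie6"), ("2015-04", "2015-04-15", "cookie3"),
    ("2015-04", "2015-04-15", "cookie2"), ("2015-04", "2015-04-16", "cookie1"),
]

# One GROUP BY per grouping set, glued together with UNION ALL.
# The column that is not part of a set is filled with NULL.
UNION_ALL_SQL = """
SELECT month, NULL AS day, COUNT(DISTINCT cookieid) AS uv, 1 AS grouping__id
FROM test_data GROUP BY month
UNION ALL
SELECT NULL, day, COUNT(DISTINCT cookieid), 2
FROM test_data GROUP BY day
"""


def run_union_all():
    """Return the result rows as (month, day, uv, grouping__id) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE test_data (month TEXT, day TEXT, cookieid TEXT)")
        conn.executemany("INSERT INTO test_data VALUES (?, ?, ?)", ROWS)
        return conn.execute(UNION_ALL_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in sorted(run_union_all(), key=lambda r: r[3]):
        print(row)
```

It should print the same eight month/day/uv/grouping__id combinations as the Hive output, possibly in a different row order.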

SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
GROUPING SETS (month,day)
ORDER BY GROUPING__ID;

This is equivalent to:

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day

Result:
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
+----------+-------------+-----+---------------+--+
8 rows selected (177.299 seconds)

Adding (month,day) as a third grouping set:

SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
GROUPING SETS (month,day,(month,day))
ORDER BY GROUPING__ID;

This is equivalent to:

SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| NULL     | 2015-03-10  | 4   | 2             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
+----------+-------------+-----+---------------+--+

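Note: GROUPING__ID indicates which grouping set a result row belongs to. In the outputs shown here it behaves as a bitmask over the GROUP BY columns. The first column (month) is the lowest bit and the second (day) is the next bit, and a bit is 1 when that column is part of the row's grouping set. As a result, month-only rows get 1, day-only rows get 2, and (month, day) rows get 3. Hive 2.3.0 changed how GROUPING__ID is computed to follow the SQL standard, so newer versions may report different values. Check the documentation for your version.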
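The following is a minimal sketch of the bitmask convention as inferred from the outputs in this article: the first GROUP BY column is the least significant bit, and a bit is set when that column is present. It reproduces the values shown here. It is an assumption about the older Hive behaviour, not a description of current Hive. The function name `legacy_grouping_id` is hypothetical.

```python
def legacy_grouping_id(group_by_cols, grouping_set):
    """GROUPING__ID as it appears in the outputs above: bit i is 1 when the
    i-th GROUP BY column (counting from 0, least significant bit first)
    belongs to the grouping set."""
    gid = 0
    for i, col in enumerate(group_by_cols):
        if col in grouping_set:
            gid |= 1 << i
    return gid


if __name__ == "__main__":
    cols = ["month", "day"]
    for gs in [(), ("month",), ("day",), ("month", "day")]:
        print(gs, legacy_grouping_id(cols, gs))
```

The same rule explains the ROLLUP example with `GROUP BY day,month`. There, day is the first column, so day-only rows get 1.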

CUBE

CUBE aggregates over every combination of the GROUP BY dimensions.
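"Every combination" means every subset of the GROUP BY columns, so n columns give 2**n grouping sets. This is a plain-Python emulation sketch, not Hive's implementation. It enumerates those subsets and computes COUNT(DISTINCT cookieid) for each one on the sample rows, with `None` standing in for NULL. The names `cube_sets` and `grouping_sets_uv` are hypothetical.

```python
from itertools import combinations

COLS = ("month", "day")

# test_data rows as comma-delimited text (month,day,cookieid), mirroring the
# table's FIELDS TERMINATED BY ',' storage.
DATA = """\
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
"""

ROWS = [dict(zip(("month", "day", "cookieid"), line.split(",")))
        for line in DATA.splitlines()]


def cube_sets(cols):
    """Every subset of the GROUP BY columns (2**n grouping sets), largest first."""
    return [s for size in range(len(cols), -1, -1) for s in combinations(cols, size)]


def grouping_sets_uv(rows, cols, sets):
    """Emulate GROUP BY ... GROUPING SETS with COUNT(DISTINCT cookieid).

    Columns outside a grouping set become None (NULL). Returns tuples of
    (value for each column in cols..., uv)."""
    result = []
    for gs in sets:
        groups = {}
        for row in rows:
            key = tuple(row[c] if c in gs else None for c in cols)
            groups.setdefault(key, set()).add(row["cookieid"])
        result.extend(key + (len(ids),) for key, ids in groups.items())
    return result


if __name__ == "__main__":
    for row in grouping_sets_uv(ROWS, COLS, cube_sets(COLS)):
        print(row)
```

With two columns, this produces the four grouping sets (month, day), (month), (day) and (). The grand-total row counts all 7 distinct cookies.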

SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
WITH CUBE
ORDER BY GROUPING__ID;

This is equivalent to:

SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM test_data
UNION ALL
SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM test_data GROUP BY month
UNION ALL
SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM test_data GROUP BY day
UNION ALL
SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM test_data GROUP BY month,day
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | NULL        | 6   | 1             |
| NULL     | 2015-04-16  | 2   | 2             |
| NULL     | 2015-04-15  | 2   | 2             |
| NULL     | 2015-04-13  | 3   | 2             |
| NULL     | 2015-04-12  | 2   | 2             |
| NULL     | 2015-03-12  | 1   | 2             |
| NULL     | 2015-03-10  | 4   | 2             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
+----------+-------------+-----+---------------+--+

ROLLUP

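ROLLUP produces a subset of CUBE's grouping sets. It treats the leftmost dimension as the top of a hierarchy and aggregates level by level starting from that dimension.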
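Hierarchical aggregation from the leftmost dimension amounts to taking the prefixes of the GROUP BY column list. That gives n + 1 grouping sets instead of CUBE's 2**n, and each prefix is also one of CUBE's subsets. This is a small sketch of that enumeration. The name `rollup_sets` is hypothetical.

```python
def rollup_sets(cols):
    """ROLLUP grouping sets: the prefixes of the GROUP BY column list,
    longest first, e.g. (a, b, c), (a, b), (a,), ()."""
    cols = tuple(cols)
    return [cols[:i] for i in range(len(cols), -1, -1)]


if __name__ == "__main__":
    print(rollup_sets(["month", "day"]))
    print(rollup_sets(["day", "month"]))
```

With month first, the sets are (month, day), (month) and (). These correspond to the GROUPING__ID values 3, 1 and 0 in the ROLLUP outputs.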

For example, hierarchical aggregation with month as the leading dimension:
SELECT
month,
day,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY month,day
WITH ROLLUP
ORDER BY GROUPING__ID;
This produces the following drill-up path: UV per (month, day) -> UV per month -> total UV
+----------+-------------+-----+---------------+--+
|  month   |     day     | uv  | grouping__id  |
+----------+-------------+-----+---------------+--+
| NULL     | NULL        | 7   | 0             |
| 2015-04  | NULL        | 6   | 1             |
| 2015-03  | NULL        | 5   | 1             |
| 2015-04  | 2015-04-16  | 2   | 3             |
| 2015-04  | 2015-04-15  | 2   | 3             |
| 2015-04  | 2015-04-13  | 3   | 3             |
| 2015-04  | 2015-04-12  | 2   | 3             |
| 2015-03  | 2015-03-12  | 1   | 3             |
| 2015-03  | 2015-03-10  | 4   | 3             |
+----------+-------------+-----+---------------+--+

Swapping month and day makes day the leading dimension, so the hierarchical aggregation starts from day:
SELECT
day,
month,
COUNT(DISTINCT cookieid) AS uv,
GROUPING__ID
FROM test_data
GROUP BY day,month
WITH ROLLUP
ORDER BY GROUPING__ID;
+-------------+----------+-----+---------------+--+
|     day     |  month   | uv  | grouping__id  |
+-------------+----------+-----+---------------+--+
| NULL        | NULL     | 7   | 0             |
| 2015-04-12  | NULL     | 2   | 1             |
| 2015-04-15  | NULL     | 2   | 1             |
| 2015-03-12  | NULL     | 1   | 1             |
| 2015-04-16  | NULL     | 2   | 1             |
| 2015-03-10  | NULL     | 4   | 1             |
| 2015-04-13  | NULL     | 3   | 1             |
| 2015-04-16  | 2015-04  | 2   | 3             |
| 2015-04-15  | 2015-04  | 2   | 3             |
| 2015-04-13  | 2015-04  | 3   | 3             |
| 2015-03-12  | 2015-03  | 1   | 3             |
| 2015-03-10  | 2015-03  | 4   | 3             |
| 2015-04-12  | 2015-04  | 2   | 3             |
+-------------+----------+-----+---------------+--+

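This produces the following drill-up path:
UV per (day, month) -> UV per day -> total UV
(Here, aggregating by day and month gives the same result as aggregating by day alone, because the two dimensions have a parent-child relationship: every day falls within exactly one month. For dimensions without such a relationship, the results would differ.)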
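The parent-child argument can be checked directly on the sample rows. This plain-Python sketch computes UV per day and UV per (day, month). It then confirms that adding month to the key neither splits any group nor changes any count. The helper `uv_by` is hypothetical.

```python
from collections import defaultdict

# test_data rows: month,day,cookieid
DATA = """\
2015-03,2015-03-10,cookie1
2015-03,2015-03-10,cookie5
2015-03,2015-03-12,cookie7
2015-04,2015-04-12,cookie3
2015-04,2015-04-13,cookie2
2015-04,2015-04-13,cookie4
2015-04,2015-04-16,cookie4
2015-03,2015-03-10,cookie2
2015-03,2015-03-10,cookie3
2015-04,2015-04-12,cookie5
2015-04,2015-04-13,cookie6
2015-04,2015-04-15,cookie3
2015-04,2015-04-15,cookie2
2015-04,2015-04-16,cookie1
"""

ROWS = [tuple(line.split(",")) for line in DATA.splitlines()]


def uv_by(rows, key_fn):
    """COUNT(DISTINCT cookieid) grouped by key_fn(month, day)."""
    groups = defaultdict(set)
    for month, day, cookieid in rows:
        groups[key_fn(month, day)].add(cookieid)
    return {key: len(ids) for key, ids in groups.items()}


by_day = uv_by(ROWS, lambda month, day: day)
by_day_month = uv_by(ROWS, lambda month, day: (day, month))

# Each day maps to exactly one month, so the (day, month) grouping has the
# same number of groups, and dropping month from its keys gives back by_day.
collapsed = {day: uv for (day, _month), uv in by_day_month.items()}

if __name__ == "__main__":
    print(len(by_day_month) == len(by_day) and collapsed == by_day)
```

If some day spanned two months, the (day, month) grouping would contain more groups than the day grouping, and the check would fail.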

Reposted from: https://www.cnblogs.com/yy3b2007com/p/8583181.html
