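This tutorial briefly introduces the analytics in Drill 1.2, namely the ANSI SQL-compliant analytic and window functions. Drill supports the following SQL window functions: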

  • PARTITION BY and OVER clauses

  • A variety of aggregate window functions: Sum, Max, Min, Count, Avg

  • Analytic functions such as First_Value, Last_Value, Lead, Lag, NTile, Row_Number, and Rank

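Window functions are highly versatile. They reduce the number of joins, subqueries, and explicit cursors you need to write, and they solve a variety of use cases with minimal coding effort.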
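As a rough illustration of how a window function can stand in for a join against an aggregated subquery, the sketch below computes a per-city average both ways and checks that the results match. It runs on Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in engine rather than on Drill. The table name `business` is an assumption used in place of `business.json`, and the five rows are a small sample copied from the results shown later in this tutorial, so the averages differ from the full-dataset values.

```python
import sqlite3

# (name, city, review_count): a small sample copied from this tutorial's results.
ROWS = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("Roberto's Authentic Mexican Food", "Anthem", 117),
    ("Q to U BBQ", "Anthem", 74),
]

# Window-function form: one pass, no join.
WINDOW_SQL = """
SELECT name, city, review_count,
       AVG(review_count) OVER (PARTITION BY city) AS city_reviews_avg
FROM business
ORDER BY city, name
"""

# Equivalent form without window functions: join against an aggregated subquery.
JOIN_SQL = """
SELECT b.name, b.city, b.review_count, a.city_reviews_avg
FROM business AS b
JOIN (SELECT city, AVG(review_count) AS city_reviews_avg
      FROM business
      GROUP BY city) AS a
  ON a.city = b.city
ORDER BY b.city, b.name
"""

conn = sqlite3.connect(":memory:")
# "business" is a hypothetical stand-in for the tutorial's business.json file.
conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
conn.executemany("INSERT INTO business VALUES (?, ?, ?)", ROWS)

window_rows = conn.execute(WINDOW_SQL).fetchall()
join_rows = conn.execute(JOIN_SQL).fetchall()

for row in window_rows:
    print(row)
print("same result as join + subquery:", window_rows == join_rows)
```

Both statements return the same rows. The window form simply expresses the per-city aggregate next to the detail columns without a second scan written out by hand.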

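This tutorial builds on the previous tutorials, "Analyzing the Yelp Academic Dataset" and "Analyzing Highly Dynamic Datasets", and uses the same Yelp dataset.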

Getting Started

  1. To get started, download the Yelp (business reviews) dataset.

  2. Install and start Drill.

  3. List the available schemas in Drill.

    SHOW schemas;

    +---------------------+
    |     SCHEMA_NAME     |
    +---------------------+
    | INFORMATION_SCHEMA  |
    | cp.default          |
    | dfs.default         |
    | dfs.root            |
    | dfs.tmp             |
    | dfs.yelp            |
    | sys                 |
    +---------------------+
    7 rows selected (1.755 seconds)

  4. Switch to the workspace in which the Yelp data is loaded.

    USE dfs.yelp;

    +-------+---------------------------------------+
    | ok | summary |
    +-------+---------------------------------------+
    | true | Default schema changed to [dfs.yelp] |
    +-------+---------------------------------------+

    1 row selected (0.129 seconds)

  5. Start exploring one of the data sets in the Yelp dataset, the business information.

    SELECT * FROM business.json LIMIT 1;

    +------------------------+-----------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+--------------------------------+---------+--------------+-------------------+-------------+-------+-------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+
    | business_id | full_address | hours | open | categories | city | review_count | name | longitude | state | stars | latitude | attributes | type | neighborhoods |
    +------------------------+--------------+------+-------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+--------------------------------+---------+--------------+-------------------+-------------+-------+-------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+
    | vcNAWiLM4dR7D2nwwJ7nCA | 4840 E Indian School Rd Ste 101 Phoenix, AZ 85018 | {"Tuesday":{"close":"17:00","open":"08:00"},"Friday":{"close":"17:00","open":"08:00"},"Monday":{"close":"17:00","open":"08:00"},"Wednesday":{"close":"17:00","open":"08:00"},"Thursday":{"close":"17:00","open":"08:00"},"Sunday":{},"Saturday":{}} | true | ["Doctors","Health & Medical"] | Phoenix | 7 | Eric Goldberg, MD | -111.983758 | AZ | 3.5 | 33.499313 | {"By Appointment Only":true,"Good Ambience":{},"Parking":{},"Music":{},"Hair Types Specialized In":{},"Payment Types":{},"Dietary Restrictions":{}} | business | [] |
    +-------------+--------------+-------+------+------------+------+--------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+--------------------------------+---------+--------------+-------------------+-------------+-------+-------+-----------+-----------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------------+
    1 row selected (0.514 seconds)

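Using Window Functions for Simple Queries

  1. Get the top Yelp businesses in each city by number of reviews, along with each business's row number within its city.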

    SELECT name, city, review_count, row_number()
    OVER (PARTITION BY city ORDER BY review_count DESC) AS rownum
    FROM business.json LIMIT 15;

    +----------------------------------------+------------+---------------+---------+
    | name | city | review_count | rownum |
    +----------------------------------------+------------+---------------+---------+
    | Cupz N' Crepes | Ahwatukee | 124 | 1 |
    | My Wine Cellar | Ahwatukee | 98 | 2 |
    | Kathy's Alterations | Ahwatukee | 12 | 3 |
    | McDonald's | Ahwatukee | 7 | 4 |
    | U-Haul | Ahwatukee | 5 | 5 |
    | Hi-Health | Ahwatukee | 4 | 6 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 7 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 8 |
    | Roberto's Authentic Mexican Food | Anthem | 117 | 1 |
    | Q to U BBQ | Anthem | 74 | 2 |
    | Outlets At Anthem | Anthem | 64 | 3 |
    | Dara Thai | Anthem | 56 | 4 |
    | Cafe Provence | Anthem | 53 | 5 |
    | Shanghai Club | Anthem | 50 | 6 |
    | Two Brothers Kitchen | Anthem | 43 | 7 |
    +----------------------------------------+------------+---------------+---------+
    15 rows selected (0.67 seconds)

  2. Compare the number of reviews for each business with the average number of reviews across all businesses in the same city.

    SELECT name, city, review_count,
    Avg(review_count) OVER (PARTITION BY City) AS city_reviews_avg
    FROM business.json LIMIT 15;

    +----------------------------------------+------------+---------------+---------------------+
    | name | city | review_count | city_reviews_avg |
    +----------------------------------------+------------+---------------+---------------------+
    | Hi-Health | Ahwatukee | 4 | 32.25 |
    | My Wine Cellar | Ahwatukee | 98 | 32.25 |
    | U-Haul | Ahwatukee | 5 | 32.25 |
    | Cupz N' Crepes | Ahwatukee | 124 | 32.25 |
    | McDonald's | Ahwatukee | 7 | 32.25 |
    | Kathy's Alterations | Ahwatukee | 12 | 32.25 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 32.25 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 32.25 |
    | Anthem Community Center | Anthem | 4 | 14.492063492063492 |
    | Scrapbooks To Remember | Anthem | 4 | 14.492063492063492 |
    | Hungry Howie's Pizza | Anthem | 7 | 14.492063492063492 |
    | Pinata Nueva | Anthem | 3 | 14.492063492063492 |
    | Starbucks Coffee Company | Anthem | 13 | 14.492063492063492 |
    | Pizza Hut | Anthem | 6 | 14.492063492063492 |
    | Rays Pizza | Anthem | 19 | 14.492063492063492 |
    +----------------------------------------+------------+---------------+---------------------+
    15 rows selected (0.395 seconds)

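  3. Check how the number of reviews for each business contributes to the total number of reviews for all businesses in the city.

    SELECT name, city, review_count,
    Sum(review_count) OVER (PARTITION BY City) AS city_reviews_sum
    FROM business.json LIMIT 15;

    +----------------------------------------+------------+---------------+-------------------+
    | name | city | review_count | city_reviews_sum |
    +----------------------------------------+------------+---------------+-------------------+
    | Hi-Health | Ahwatukee | 4 | 258 |
    | My Wine Cellar | Ahwatukee | 98 | 258 |
    | U-Haul | Ahwatukee | 5 | 258 |
    | Cupz N' Crepes | Ahwatukee | 124 | 258 |
    | McDonald's | Ahwatukee | 7 | 258 |
    | Kathy's Alterations | Ahwatukee | 12 | 258 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 258 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 258 |
    | Anthem Community Center | Anthem | 4 | 913 |
    | Scrapbooks To Remember | Anthem | 4 | 913 |
    | Hungry Howie's Pizza | Anthem | 7 | 913 |
    | Pinata Nueva | Anthem | 3 | 913 |
    | Starbucks Coffee Company | Anthem | 13 | 913 |
    | Pizza Hut | Anthem | 6 | 913 |
    | Rays Pizza | Anthem | 19 | 913 |
    +----------------------------------------+------------+---------------+-------------------+
    15 rows selected (0.543 seconds)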
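The three queries above apply LIMIT without an outer ORDER BY, so they return whichever 15 rows come back first rather than the top businesses of every city. One common way to get a true "top N per city" is to compute the row number in a subquery and filter on it. The sketch below shows that pattern, together with the per-city total, on Python's built-in sqlite3 module (SQLite 3.25 or later) rather than on Drill. The table name `business`, the alias `ranked`, and the cutoff of 2 are assumptions for illustration. The rows are a sample copied from the results above, so the totals are smaller than the 258 and 913 reported for the full dataset.

```python
import sqlite3

# (name, city, review_count): a sample copied from the results above.
ROWS = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("McDonald's", "Ahwatukee", 7),
    ("Roberto's Authentic Mexican Food", "Anthem", 117),
    ("Q to U BBQ", "Anthem", 74),
    ("Outlets At Anthem", "Anthem", 64),
]

# Window functions cannot be referenced in WHERE of the same query level,
# so the row number is computed in a subquery and filtered outside it.
TOP_N_SQL = """
SELECT name, city, review_count, rownum, city_reviews_sum
FROM (
    SELECT name, city, review_count,
           ROW_NUMBER() OVER (PARTITION BY city
                              ORDER BY review_count DESC) AS rownum,
           SUM(review_count) OVER (PARTITION BY city) AS city_reviews_sum
    FROM business
) AS ranked
WHERE rownum <= 2
ORDER BY city, rownum
"""


def top_two_per_city():
    """Return the two most-reviewed businesses of each city in the sample."""
    conn = sqlite3.connect(":memory:")
    # "business" is a hypothetical stand-in for the tutorial's business.json file.
    conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
    conn.executemany("INSERT INTO business VALUES (?, ?, ?)", ROWS)
    return conn.execute(TOP_N_SQL).fetchall()


top_rows = top_two_per_city()
for row in top_rows:
    print(row)
```

The outer ORDER BY makes the output deterministic. Each returned row also carries the city total from the sample, which can be divided into `review_count` to express a business's share of its city's reviews.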

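Using Window Functions for Complex Queries

  1. List the top 10 cities and their highest-ranked businesses in terms of number of reviews. Use Drill window functions such as rank and dense_rank in these queries.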

    WITH X
    AS
    (SELECT name, city, review_count,
    RANK()
    OVER (PARTITION BY city
    ORDER BY review_count DESC) AS review_rank
    FROM business.json)
    SELECT X.name, X.city, X.review_count
    FROM X
    WHERE X.review_rank = 1 ORDER BY review_count DESC LIMIT 10;

    +-------------------------------------------+-------------+---------------+
    | name | city | review_count |
    +-------------------------------------------+-------------+---------------+
    | Mon Ami Gabi | Las Vegas | 4084 |
    | Studio B | Henderson | 1336 |
    | Phoenix Sky Harbor International Airport | Phoenix | 1325 |
    | Four Peaks Brewing Co | Tempe | 1110 |
    | The Mission | Scottsdale | 783 |
    | Joe's Farm Grill | Gilbert | 770 |
    | The Old Fashioned | Madison | 619 |
    | Cornish Pasty Company | Mesa | 578 |
    | SanTan Brewing Company | Chandler | 469 |
    | Yard House | Glendale | 321 |
    +-------------------------------------------+-------------+---------------+
    10 rows selected (0.49 seconds)

  2. Compare the number of reviews for each business with the top and bottom review counts in the city.

    SELECT name, city, review_count,
    FIRST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
    LAST_VALUE(review_count)
    OVER(PARTITION BY city ORDER BY review_count DESC) AS bottom_review_count
    FROM business.json limit 15;

    +----------------------------------------+------------+---------------+-------------------+----------------------+
    | name | city | review_count | top_review_count | bottom_review_count |
    +----------------------------------------+------------+---------------+-------------------+----------------------+
    | My Wine Cellar | Ahwatukee | 98 | 124 | 12 |
    | McDonald's | Ahwatukee | 7 | 124 | 12 |
    | U-Haul | Ahwatukee | 5 | 124 | 12 |
    | Hi-Health | Ahwatukee | 4 | 124 | 12 |
    | Healthy and Clean Living Environments | Ahwatukee | 4 | 124 | 12 |
    | Active Kids Pediatrics | Ahwatukee | 4 | 124 | 12 |
    | Cupz N' Crepes | Ahwatukee | 124 | 124 | 12 |
    | Kathy's Alterations | Ahwatukee | 12 | 124 | 12 |
    | Q to U BBQ | Anthem | 74 | 117 | 117 |
    | Dara Thai | Anthem | 56 | 117 | 117 |
    | Cafe Provence | Anthem | 53 | 117 | 117 |
    | Shanghai Club | Anthem | 50 | 117 | 117 |
    | Two Brothers Kitchen | Anthem | 43 | 117 | 117 |
    | The Tennessee Grill | Anthem | 32 | 117 | 117 |
    | Dollyrockers Boutique and Salon | Anthem | 30 | 117 | 117 |
    +----------------------------------------+------------+---------------+-------------------+----------------------+
    15 rows selected (0.516 seconds)

  3. Compare the number of reviews for each business with the review counts of the preceding and following businesses.

    SELECT city, review_count, name,
    LAG(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS preceding_count,
    LEAD(review_count, 1) OVER(PARTITION BY city ORDER BY review_count DESC)
    AS following_count
    FROM business.json limit 15;

    +------------+---------------+----------------------------------------+------------------+------------------+
    | city | review_count | name | preceding_count | following_count |
    +------------+---------------+----------------------------------------+------------------+------------------+
    | Ahwatukee | 124 | Cupz N' Crepes | null | 98 |
    | Ahwatukee | 98 | My Wine Cellar | 124 | 12 |
    | Ahwatukee | 12 | Kathy's Alterations | 98 | 7 |
    | Ahwatukee | 7 | McDonald's | 12 | 5 |
    | Ahwatukee | 5 | U-Haul | 7 | 4 |
    | Ahwatukee | 4 | Hi-Health | 5 | 4 |
    | Ahwatukee | 4 | Healthy and Clean Living Environments | 4 | 4 |
    | Ahwatukee | 4 | Active Kids Pediatrics | 4 | null |
    | Anthem | 117 | Roberto's Authentic Mexican Food | null | 74 |
    | Anthem | 74 | Q to U BBQ | 117 | 64 |
    | Anthem | 64 | Outlets At Anthem | 74 | 56 |
    | Anthem | 56 | Dara Thai | 64 | 53 |
    | Anthem | 53 | Cafe Provence | 56 | 50 |
    | Anthem | 50 | Shanghai Club | 53 | 43 |
    | Anthem | 43 | Two Brothers Kitchen | 50 | 32 |
    +------------+---------------+----------------------------------------+------------------+------------------+
    15 rows selected (0.518 seconds)
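In the FIRST_VALUE/LAST_VALUE results above, bottom_review_count is not the lowest count in the city: the Ahwatukee rows show 12 although three businesses there have only 4 reviews. In standard SQL, a window that has an ORDER BY but no explicit frame ends at the current row, so LAST_VALUE does not look at the rest of the partition. How Drill 1.2 arrives at the exact values shown above is not examined here. The sketch below illustrates the standard behavior and two ways to get the partition-wide bottom value, using Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in engine. The table name `business` and the column aliases are assumptions, and whether a given Drill version accepts the same frame clause should be verified against its documentation.

```python
import sqlite3

# (name, city, review_count): rows copied from the Ahwatukee and Anthem results above.
ROWS = [
    ("Cupz N' Crepes", "Ahwatukee", 124),
    ("My Wine Cellar", "Ahwatukee", 98),
    ("Kathy's Alterations", "Ahwatukee", 12),
    ("McDonald's", "Ahwatukee", 7),
    ("U-Haul", "Ahwatukee", 5),
    ("Hi-Health", "Ahwatukee", 4),
    ("Healthy and Clean Living Environments", "Ahwatukee", 4),
    ("Active Kids Pediatrics", "Ahwatukee", 4),
    ("Roberto's Authentic Mexican Food", "Anthem", 117),
    ("Q to U BBQ", "Anthem", 74),
    ("Outlets At Anthem", "Anthem", 64),
]

FRAME_SQL = """
SELECT name, city, review_count,
       -- Default frame: stops at the current row (and its ties).
       LAST_VALUE(review_count) OVER (
           PARTITION BY city ORDER BY review_count DESC
       ) AS last_default_frame,
       -- Explicit frame: spans the whole partition.
       LAST_VALUE(review_count) OVER (
           PARTITION BY city ORDER BY review_count DESC
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS last_full_frame,
       -- Frame-free alternative for the lowest count in the city.
       MIN(review_count) OVER (PARTITION BY city) AS city_min
FROM business
ORDER BY city, review_count DESC, name
"""


def compare_frames():
    """Run the frame comparison on the in-memory sample table."""
    conn = sqlite3.connect(":memory:")
    # "business" is a hypothetical stand-in for the tutorial's business.json file.
    conn.execute("CREATE TABLE business (name TEXT, city TEXT, review_count INTEGER)")
    conn.executemany("INSERT INTO business VALUES (?, ?, ?)", ROWS)
    return conn.execute(FRAME_SQL).fetchall()


frame_rows = compare_frames()
for row in frame_rows:
    print(row)
```

With the default frame, `last_default_frame` simply repeats each row's own `review_count`. With the explicit frame, `last_full_frame` matches `city_min` for every row in the sample. Using `MIN(...) OVER (PARTITION BY city)` avoids the frame question entirely when only the lowest count is needed.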
