Data experts:

Jmx's Blog | Keep it Simple and Stupid!

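猴子 (Houzi) - Zhihu; WeChat official account 猴子数据分析; author of the bestselling book 《数据分析思维》 (Data Analysis Thinking); Science Popularization China (科普中国) expert. https://www.zhihu.com/people/houziliaorenwu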

I. Window functions:

1. Learn SQL window functions the easy way - Zhihu. What are window functions for? In day-to-day work you often need to rank within each group, for example: ranking problems (rank each department by performance) and top-N problems (find the top N employees in each department to reward). Requirements like these call for SQL's advanced feature, window… https://zhuanlan.zhihu.com/p/92654574

2.

Understand SQL window functions in 4 minutes - 51CTO.COM. You may already be comfortable with simple SQL queries built from SELECT, FROM, WHERE and GROUP BY, but to take your SQL skills further you need to know window functions (Window Function), also called analytic functions (Analytics Function). https://database.51cto.com/art/202101/639239.htm

3.

Hive common functions, part 2 (window functions, analytic functions, enhanced GROUP BY) - 吃果冻不吐果冻皮, CSDN blog https://blog.csdn.net/scgaliguodong123_/article/details/60135385

4.

Hive: window functions - 不花的花和尚, CSDN blog https://blog.csdn.net/weixin_38750084/article/details/82779910

5.

MySQL in practice (2): window functions - 陆, CSDN blog https://blog.csdn.net/weixin_39010770/article/details/87862407

5.1.

Summary of common Hive functions - Zhihu https://zhuanlan.zhihu.com/p/102502175

6.

MySQL window functions - Zhihu https://zhuanlan.zhihu.com/p/138282683

7.

MySQL - SQL window functions - william_n, CSDN blog https://blog.csdn.net/william_n/article/details/103236086

7.1.

MySQL FIRST_VALUE function | Beginner tutorial https://www.begtut.com/mysql/mysql-first_value-function.html

7.2.

MySQL 8.0 window functions: ranking and top-N problems - 姜东凯, CSDN blog https://blog.csdn.net/weixin_42510924/article/details/113301371

7.3. (Interview questions, important)

Classic Hive SQL interview questions - loay-_-, CSDN blog https://blog.csdn.net/weixin_41836276/article/details/106289034

7.4.

Summary of six types of Hive SQL interview questions - lightupworld, CSDN blog https://blog.csdn.net/lightupworld/article/details/108583548

7.5. Data warehouse construction process

Data warehouse construction process - lightupworld, CSDN blog https://blog.csdn.net/lightupworld/article/details/108513990

7.6.

Hive window and analytic functions (detailed case walkthrough) - lightupworld, CSDN blog https://blog.csdn.net/lightupworld/article/details/108520149

7.7.

Window functions in MySQL - 别看窗外的世界, cnblogs https://www.cnblogs.com/kate7/p/13291744.html

7.8. (Similar to the problem a colleague worked on)

https://www.iteye.com/blog/53873039oycg-2020836

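8.

Summary

1. Window function syntax

<window function> over (partition by <column(s) to partition by> order by <column(s) to sort by>)

The <window function> slot can hold two kinds of functions:

1) Dedicated window functions, such as rank, dense_rank and row_number

2) Aggregate functions, such as sum, avg, count, max and min

2. What window functions do:

1) They partition (partition by) and sort (order by) at the same time.

2) They do not reduce the number of rows in the original table, so they are often used to rank within each group.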
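Point 2) is easier to see with a concrete example. The sketch below runs the same aggregate twice, once with GROUP BY and once as a window function. It uses Python's built-in sqlite3 module as a stand-in for MySQL 8. This assumes the bundled SQLite is 3.25 or newer, since that is the first version with window functions. The emp table and its rows are hypothetical.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, sales INTEGER)")
# Hypothetical sample rows, not taken from the original post.
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("a", "d1", 10), ("b", "d1", 30), ("c", "d2", 20)])

# GROUP BY collapses each department into a single row.
grouped = con.execute(
    "SELECT dept, SUM(sales) FROM emp GROUP BY dept ORDER BY dept").fetchall()

# The window version keeps every input row and attaches the department total to it.
windowed = con.execute(
    "SELECT name, dept, sales, SUM(sales) OVER (PARTITION BY dept) AS dept_total "
    "FROM emp ORDER BY name").fetchall()

print(grouped)   # [('d1', 40), ('d2', 20)]
print(windowed)  # [('a', 'd1', 10, 40), ('b', 'd1', 30, 40), ('c', 'd2', 20, 20)]
```

GROUP BY returns two rows, while the window query still returns all three rows, each carrying its department total.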

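3. Caveats

In principle, window functions can only be written in the SELECT clause (MySQL also allows them in ORDER BY). They cannot appear in WHERE, GROUP BY or HAVING, so to filter on a window result, compute it in a subquery and filter in the outer query.

4. When to use window functions

1) Business requirements of the form "rank within each group", for example:

Ranking problems: rank the employees of each department by performance
Top-N problems: find the top N employees in each department to reward them

5.

Before digging into the OVER clause, keep one thing in mind: in SQL's logical processing order, window functions are evaluated near the end, after FROM, WHERE, GROUP BY and HAVING, and just before ORDER BY.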
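The evaluation order has two practical consequences, shown in the small sketch below. First, a WHERE condition removes rows before the ranking is computed. Second, filtering on a rank requires an outer query, because the window result does not exist yet when WHERE runs. As before, this is a sketch with Python's sqlite3 standing in for MySQL 8, and the emp rows are hypothetical.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (name TEXT, dept TEXT, sales INTEGER)")
# Hypothetical rows for illustration only.
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [("a", "d1", 10), ("b", "d1", 30), ("c", "d1", 20), ("d", "d2", 50)])

# WHERE runs first: 'b' (30) and 'd' (50) are gone before RANK() is computed,
# so 'c' becomes rank 1 in d1.
after_where = con.execute("""
    SELECT name, RANK() OVER (PARTITION BY dept ORDER BY sales DESC) AS rk
    FROM emp
    WHERE sales < 30
    ORDER BY name""").fetchall()

# The window result is not visible to WHERE in the same SELECT,
# so "top 1 per department" filters in an outer query.
top1 = con.execute("""
    SELECT name, dept FROM (
        SELECT name, dept,
               RANK() OVER (PARTITION BY dept ORDER BY sales DESC) AS rk
        FROM emp
    ) AS t
    WHERE rk = 1
    ORDER BY dept""").fetchall()

print(after_where)  # [('a', 2), ('c', 1)]
print(top1)         # [('b', 'd1'), ('d', 'd2')]
```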

9. Examples: applying window functions

Initialize the table:
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for bj_table
-- ----------------------------
DROP TABLE IF EXISTS `bj_table`;
CREATE TABLE `bj_table` (
  `xuehaoid` int NOT NULL,                                                               -- student id
  `banji` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,     -- class
  `chengji` int NULL DEFAULT NULL,   -- score; a numeric type makes ORDER BY chengji sort numerically (as text, '100' would sort below '95')
  PRIMARY KEY (`xuehaoid`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of bj_table
-- ----------------------------
INSERT INTO `bj_table` VALUES (1, '1', '86');
INSERT INTO `bj_table` VALUES (2, '1', '95');
INSERT INTO `bj_table` VALUES (3, '2', '89');
INSERT INTO `bj_table` VALUES (4, '1', '83');
INSERT INTO `bj_table` VALUES (5, '2', '86');
INSERT INTO `bj_table` VALUES (6, '3', '92');
INSERT INTO `bj_table` VALUES (7, '3', '86');
INSERT INTO `bj_table` VALUES (8, '1', '86');
SET FOREIGN_KEY_CHECKS = 1;

-- 1. Partition by class, then order by score descending:
select *,
  RANK() over (PARTITION by banji ORDER BY chengji desc) as rank_test,
  dense_RANK() over (PARTITION by banji ORDER BY chengji desc) as dense_RANK_test,
  ROW_NUMBER() over (PARTITION by banji ORDER BY chengji desc) as ROW_NUMBER_test
from bj_table;

-- 2. Partition by class and return the top two of each class, showing both the student and the score.
-- With DENSE_RANK, ties share a rank, so a class can return more than two rows.
select * from (
  select *,
    RANK() over (PARTITION by banji ORDER BY chengji desc) as rank_test,
    dense_RANK() over (PARTITION by banji ORDER BY chengji desc) as dense_RANK_test,
    ROW_NUMBER() over (PARTITION by banji ORDER BY chengji desc) as ROW_NUMBER_test
  from bj_table) cs_table
# Method 1: WHERE with OR
where cs_table.dense_RANK_test = 1 or cs_table.dense_RANK_test = 2;
# Method 2: WHERE with IN
# where cs_table.dense_RANK_test in (1, 2);
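The three ranking functions differ only when there are ties. The sketch below loads the bj_table rows into SQLite through Python's sqlite3, which stands in for MySQL 8 here. It ranks class 1, where two students share a score of 86. Two details are assumptions of this sketch: chengji is declared INTEGER, and ROW_NUMBER gets xuehaoid as an extra tie-breaker so its output is deterministic.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE bj_table (xuehaoid INTEGER PRIMARY KEY, banji TEXT, chengji INTEGER)")
con.executemany("INSERT INTO bj_table VALUES (?, ?, ?)", [
    (1, "1", 86), (2, "1", 95), (3, "2", 89), (4, "1", 83),
    (5, "2", 86), (6, "3", 92), (7, "3", 86), (8, "1", 86),
])

class1 = con.execute("""
    SELECT xuehaoid, chengji,
           RANK()       OVER (PARTITION BY banji ORDER BY chengji DESC) AS rank_test,
           DENSE_RANK() OVER (PARTITION BY banji ORDER BY chengji DESC) AS dense_rank_test,
           ROW_NUMBER() OVER (PARTITION BY banji ORDER BY chengji DESC, xuehaoid) AS row_number_test
    FROM bj_table
    WHERE banji = '1'
    ORDER BY row_number_test""").fetchall()
for row in class1:
    print(row)
# (2, 95, 1, 1, 1)
# (1, 86, 2, 2, 2)
# (8, 86, 2, 2, 3)
# (4, 83, 4, 3, 4)

def top2_count(func: str) -> int:
    """Count the rows kept by a 'top two per class' filter using the given ranking function."""
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT {func}() OVER (PARTITION BY banji ORDER BY chengji DESC) AS r
            FROM bj_table) AS t
        WHERE r <= 2"""
    return con.execute(sql).fetchone()[0]

# DENSE_RANK keeps both 86s in class 1 (7 rows); ROW_NUMBER keeps exactly two per class (6 rows).
print(top2_count("DENSE_RANK"), top2_count("ROW_NUMBER"))  # 7 6
```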

9.1

CREATE TABLE overtime (
  employee_name VARCHAR(50) NOT NULL,
  department VARCHAR(50) NOT NULL,
  hours INT NOT NULL,
  PRIMARY KEY (employee_name, department)
);
INSERT INTO overtime(employee_name, department, hours)
VALUES('Diane Murphy','Accounting',37),
('Mary Patterson','Accounting',74),
('Jeff Firrelli','Accounting',40),
('William Patterson','Finance',58),
('Gerard Bondur','Finance',47),
('Anthony Bow','Finance',66),
('Leslie Jennings','IT',90),
('Leslie Thompson','IT',88),
('Julie Firrelli','Sales',81),
('Steve Patterson','Sales',29),
('Foon Yue Tseng','Sales',65),
('George Vanauf','Marketing',89),
('Loui Bondur','Marketing',49),
('Gerard Hernandez','Marketing',66),
('Pamela Castillo','SCM',96),
('Larry Bott','SCM',100),
('Barry Jones','SCM',65);

# FIRST_VALUE example. The two queries below return the same rows, because FIRST_VALUE() does not
# literally keep only the first row: it adds a column holding the window's first value to every row.
# The same effect can be achieved with RANK(), DENSE_RANK() or ROW_NUMBER() (e.g. by keeping rank 1).
select employee_name, hours from (
  SELECT employee_name, hours, FIRST_VALUE(employee_name) OVER (ORDER BY hours) least_over_time
  FROM overtime) cs_t1;
-- ====================
select employee_name, hours from (
  select employee_name, hours, DENSE_RANK() over (ORDER BY hours) as t_num
  from overtime) cs_t2;
-- ====================
-- The following query finds the employee with the least overtime in each department, sorted by hours ascending.
select * from (
  SELECT employee_name, department, hours,
    FIRST_VALUE(employee_name) OVER (PARTITION BY department ORDER BY hours) least_over_time
  FROM overtime) stsb
where stsb.employee_name = stsb.least_over_time
ORDER BY stsb.hours;
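Here is a quick way to confirm both points about FIRST_VALUE(): the inner query keeps all 17 rows, and filtering on least_over_time leaves one employee per department. The sketch uses Python's sqlite3 as a stand-in for MySQL 8 and loads the overtime rows from above.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE overtime (employee_name TEXT, department TEXT, hours INTEGER)")
con.executemany("INSERT INTO overtime VALUES (?, ?, ?)", [
    ("Diane Murphy", "Accounting", 37), ("Mary Patterson", "Accounting", 74),
    ("Jeff Firrelli", "Accounting", 40), ("William Patterson", "Finance", 58),
    ("Gerard Bondur", "Finance", 47), ("Anthony Bow", "Finance", 66),
    ("Leslie Jennings", "IT", 90), ("Leslie Thompson", "IT", 88),
    ("Julie Firrelli", "Sales", 81), ("Steve Patterson", "Sales", 29),
    ("Foon Yue Tseng", "Sales", 65), ("George Vanauf", "Marketing", 89),
    ("Loui Bondur", "Marketing", 49), ("Gerard Hernandez", "Marketing", 66),
    ("Pamela Castillo", "SCM", 96), ("Larry Bott", "SCM", 100),
    ("Barry Jones", "SCM", 65),
])

inner = """
    SELECT employee_name, department, hours,
           FIRST_VALUE(employee_name) OVER (
               PARTITION BY department ORDER BY hours) AS least_over_time
    FROM overtime"""

# FIRST_VALUE does not drop rows: every one of the 17 rows gets a least_over_time value.
all_rows = con.execute(inner).fetchall()
print(len(all_rows))  # 17

# Keeping only the rows that equal their department's first value gives one row per department.
least = con.execute(f"""
    SELECT employee_name, department, hours FROM ({inner}) AS t
    WHERE employee_name = least_over_time
    ORDER BY hours""").fetchall()
for row in least:
    print(row)
```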

# -------------------------------------------------------------------------- Below are a few examples I tested on my own

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for test_table
-- ----------------------------
DROP TABLE IF EXISTS `test_table`;
CREATE TABLE `test_table` (
  `id` bigint(6) NULL DEFAULT NULL,
  `province` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,   -- region
  `city` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,
  `uname` varchar(255) CHARACTER SET utf8 COLLATE utf8_general_ci NULL DEFAULT NULL,      -- person's name
  `money` bigint(255) NULL DEFAULT NULL,
  INDEX `sy_name`(`id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of test_table
-- ----------------------------
INSERT INTO `test_table` VALUES (1, '南方地区', '深圳', '张三', 100);
INSERT INTO `test_table` VALUES (2, '南方地区', '广州', '董州', 39);
INSERT INTO `test_table` VALUES (3, '南方地区', '东莞', '黄丽', 56);
INSERT INTO `test_table` VALUES (4, '中原地区', '上海', '郭广昌', 109);
INSERT INTO `test_table` VALUES (5, '中原地区', '杭州', '马云', 980);
INSERT INTO `test_table` VALUES (6, '中原地区', '郑州', '许家印', 101);
INSERT INTO `test_table` VALUES (7, '北方地区', '北京', '王健林', 505);
INSERT INTO `test_table` VALUES (8, '北方地区', '哈尔滨', '付强', 21);
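INSERT INTO `test_table` VALUES (9, '北方地区', '铁岭', '赵本山', 86);
SET FOREIGN_KEY_CHECKS = 1;
-- (南方地区 = southern region, 中原地区 = central region, 北方地区 = northern region)

-- The richest person in each region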
select ta.*
from
(select t.*,
DENSE_RANK() over (PARTITION  by t.province ORDER BY t.money DESC) as d_r_n
from test_table t)ta
where ta.d_r_n <= 1;

select * from test_table;

-- Method 2: join each row against the per-region maximum
select tab.*
from(
select ta.*
from
test_table ta)tab
INNER JOIN
(select t.province as tprovince,MAX(t.money) as tmoney
from test_table t
GROUP BY t.province) tac
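on tab.province = tac.tprovince and tab.money = tac.tmoney;

-- ---------------------
-- Examples:
-- 1. Partition by region, then rank by money, using DENSE_RANK().
-- 2. Within each region, accumulate money as a running total with SUM(money). There is a pitfall here:
--    the OVER clause must include ORDER BY. Without it, every row gets the total of the whole region instead of a running total.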
select tt.*,
DENSE_RANK() over (PARTITION by tt.province ORDER BY tt.money) as num,
SUM(tt.money) over (PARTITION by tt.province ORDER BY tt.money) as total_money
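from test_table tt;

-- ---------------------------------------------------- LAST_VALUE and FIRST_VALUE
-- LAST_VALUE returns the last value of the window frame. With ORDER BY, the default frame ends at the current
-- row (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), so LAST_VALUE simply returns the current row's value
-- (or that of its last tie). Without ORDER BY the frame is the whole partition, but "last" then follows an unspecified row order.
select tt.id, tt.province, tt.city, tt.uname, tt.money,
  LAST_VALUE(tt.money) over (PARTITION by tt.province)
from test_table tt;
-- FIRST_VALUE works with an ascending or a descending ORDER BY; it always takes the first value of the frame.
-- To get the true last value, give LAST_VALUE an ORDER BY plus an explicit full frame
-- (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING); that equals FIRST_VALUE with the ORDER BY reversed.
select tt.id, tt.province, tt.city, tt.uname, tt.money,
  FIRST_VALUE(tt.money) over (PARTITION by tt.province order by tt.money)
from test_table tt;

Example source:
https://www.cnblogs.com/zmoumou/p/10222127.html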
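The frame rules behind the running total and LAST_VALUE can be checked with a small sketch. It copies test_table into an in-memory SQLite database through Python's sqlite3. SQLite is a stand-in here, on the assumption that it uses the same SQL-standard default frames as MySQL 8. For the 南方地区 rows, the query prints six columns:

- a running SUM with ORDER BY;
- a SUM without ORDER BY;
- LAST_VALUE with the default frame;
- LAST_VALUE with a full frame;
- FIRST_VALUE with the order reversed.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_table (id INTEGER, province TEXT, city TEXT, uname TEXT, money INTEGER)")
con.executemany("INSERT INTO test_table VALUES (?, ?, ?, ?, ?)", [
    (1, "南方地区", "深圳", "张三", 100), (2, "南方地区", "广州", "董州", 39),
    (3, "南方地区", "东莞", "黄丽", 56), (4, "中原地区", "上海", "郭广昌", 109),
    (5, "中原地区", "杭州", "马云", 980), (6, "中原地区", "郑州", "许家印", 101),
    (7, "北方地区", "北京", "王健林", 505), (8, "北方地区", "哈尔滨", "付强", 21),
    (9, "北方地区", "铁岭", "赵本山", 86),
])

rows = con.execute("""
    SELECT uname, money,
           SUM(money) OVER (PARTITION BY province ORDER BY money) AS running_total,
           SUM(money) OVER (PARTITION BY province)                AS province_total,
           LAST_VALUE(money) OVER (PARTITION BY province ORDER BY money) AS last_default_frame,
           LAST_VALUE(money) OVER (PARTITION BY province ORDER BY money
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_full_frame,
           FIRST_VALUE(money) OVER (PARTITION BY province ORDER BY money DESC) AS first_desc
    FROM test_table
    WHERE province = '南方地区'
    ORDER BY money""").fetchall()
for row in rows:
    print(row)
# ('董州', 39, 39, 195, 39, 100, 100)
# ('黄丽', 56, 95, 195, 56, 100, 100)
# ('张三', 100, 195, 195, 100, 100, 100)
```

With the default frame, LAST_VALUE just echoes each row's money. With the full frame it matches FIRST_VALUE ordered descending.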
select * from test_table tsd where
tsd.province in(select tt.province from test_table tt
GROUP BY tt.province desc)
and
tsd.money in (select MAX(tt.money)
from test_table tt
GROUP BY tt.province desc)
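ORDER BY money desc;

# Several ways to find the person with the most money in each region; region, city, name and amount must all be returned.
# MySQL 8.0 and later reject the query above (GROUP BY ... DESC is no longer supported); drop DESC from the subqueries, as below.
select * from test_table tsd where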
tsd.province in(select tt.province from test_table tt
GROUP BY tt.province )
and
tsd.money in (select MAX(tt.money)
from test_table tt
GROUP BY tt.province )
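ORDER BY money desc;
# Caveat: the two IN conditions are checked independently, so a row whose money equals some other
# region's maximum would also match. Comparing the (province, money) pair avoids that:
# select * from test_table tsd
# where (tsd.province, tsd.money) in (select tt.province, MAX(tt.money) from test_table tt GROUP BY tt.province);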
# Row-constructor comparison (similar to comparing tuples in Python)
select * from test_table tsd where
(tsd.province, tsd.money) = ('南方地区', 100);
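The caveat about two independent IN conditions is easy to reproduce. The sketch below, with Python's sqlite3 standing in for MySQL 8 (row-value IN needs SQLite 3.15 or newer), adds one hypothetical row to test_table. That row sits in 北方地区, and its money equals 南方地区's maximum. The independent IN conditions wrongly return it; the pair comparison does not.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; row values in IN need SQLite >= 3.15.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_table (id INTEGER, province TEXT, money INTEGER)")
con.executemany("INSERT INTO test_table VALUES (?, ?, ?)", [
    (1, "南方地区", 100), (2, "南方地区", 39), (3, "南方地区", 56),
    (4, "中原地区", 109), (5, "中原地区", 980), (6, "中原地区", 101),
    (7, "北方地区", 505), (8, "北方地区", 21), (9, "北方地区", 86),
    # Hypothetical extra row: its money equals 南方地区's maximum,
    # but it is not the maximum of its own region.
    (10, "北方地区", 100),
])

# Two independent IN conditions, as in the queries above.
independent = [r[0] for r in con.execute("""
    SELECT id FROM test_table
    WHERE province IN (SELECT province FROM test_table GROUP BY province)
      AND money IN (SELECT MAX(money) FROM test_table GROUP BY province)
    ORDER BY id""")]

# Row-value comparison: province and money must come from the same group.
paired = [r[0] for r in con.execute("""
    SELECT id FROM test_table
    WHERE (province, money) IN (SELECT province, MAX(money) FROM test_table GROUP BY province)
    ORDER BY id""")]

print(independent)  # [1, 5, 7, 10]
print(paired)       # [1, 5, 7]
```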
Query the MySQL installation path and version with SQL:
-- installation path
SELECT @@basedir AS basePath FROM DUAL;
-- version
SELECT VERSION() FROM DUAL;

-- Jump straight to a given row by its row number
-- Method 1
select ta.*
from
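(select (ROW_NUMBER() over (order by t.id)) as rn, t.*   -- order by makes the numbering deterministic; over () numbers rows in an unspecified order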
from test_table t) ta
where ta.rn = 9;

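-- Method 2 (the offset is zero-based: 8 means the 9th row; 1 is the number of rows to return)
select t.*
from test_table t
order by t.id   -- without ORDER BY, which row counts as the 9th is not guaranteed
limit 8, 1;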
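Both methods pick the same row once the order is pinned down. The sketch below uses Python's sqlite3 as a stand-in for MySQL 8; SQLite also accepts MySQL's `LIMIT offset, count` form. Only the id and uname columns of test_table are loaded, to keep it short.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test_table (id INTEGER, uname TEXT)")
con.executemany("INSERT INTO test_table VALUES (?, ?)", [
    (1, "张三"), (2, "董州"), (3, "黄丽"), (4, "郭广昌"), (5, "马云"),
    (6, "许家印"), (7, "王健林"), (8, "付强"), (9, "赵本山"),
])

# Method 1: number the rows in id order, then keep row number 9.
by_row_number = con.execute("""
    SELECT id, uname FROM (
        SELECT ROW_NUMBER() OVER (ORDER BY id) AS rn, id, uname FROM test_table
    ) AS ta
    WHERE rn = 9""").fetchone()

# Method 2: skip 8 rows (zero-based offset) and return 1.
by_limit = con.execute(
    "SELECT id, uname FROM test_table ORDER BY id LIMIT 8, 1").fetchone()

print(by_row_number, by_limit)  # (9, '赵本山') (9, '赵本山')
```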

Interview question examples:

1. Question 1

-- Create the table
CREATE TABLE test1 (
  userId varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL,
  visitDate varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL,
  visitCount int NOT NULL
);
-- Insert data
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u01', '2017/1/21', 5);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u02', '2017/1/23', 6);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u03', '2017/1/22', 8);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u04', '2017/1/20', 3);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u01', '2017/1/23', 6);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u01', '2017/2/21', 8);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u02', '2017/1/23', 6);
INSERT INTO test1(userId, visitDate, visitCount) VALUES ('u01', '2017/2/22', 4);
-- Query the table
select * from test1;
-- ---- Requirement:
-- Use SQL to compute each user's cumulative number of visits, as shown below:
-- user id  month    subtotal  cumulative
--     u01 2017-01 11  11
--     u01 2017-02 12  23
--     u02 2017-01 12  12
--     u03 2017-01 8   8
--     u04 2017-01 3   3

-- Method 1
select abc.abauid, abc.abavisitmonth, abc.nums,
sum(abc.nums) over (PARTITION by abc.abauid ORDER BY abc.abavisitmonth) as c_total   -- accumulate in month order, not by subtotal
from (
select ab.auid as abauid, ab.avisitmonth as abavisitmonth, SUM(ab.anum_total) as nums from
(select a.uid as auid, a.visitmonth as avisitmonth, a.num_total as anum_total
from (
-- DATE_FORMAT gives '2017-01' directly; the original STR_TO_DATE(visitDate, "%Y/%m") yields a zero day ('2017-01-00'), which strict SQL modes may reject
SELECT userid as uid, DATE_FORMAT(visitDate, '%Y-%m') AS visitmonth, visitcount as num_total
FROM test1) a) ab
GROUP BY ab.auid, ab.avisitmonth   -- group by the real columns; GROUP BY CONCAT(...) fails under ONLY_FULL_GROUP_BY
) abc
ORDER BY abc.abauid, abc.abavisitmonth;

-- Method 2
-- for testing
-- select * from test1;
-- for testing
-- select date_format(t.visitDate, '%Y-%m') as t_time from test1 t;
select tab.*, SUM(tab.tatvisitCount) over (PARTITION by tab.tatuserId ORDER BY tab.tat_time) as total_num
FROM (
  SELECT ta.tuserId AS tatuserId, ta.t_time AS tat_time, SUM(ta.tvisitCount) AS tatvisitCount
  FROM (SELECT t.userId AS tuserId, date_format(t.visitDate, '%Y-%m') AS t_time, t.visitCount AS tvisitCount FROM test1 t) ta
  GROUP BY ta.tuserId, ta.t_time
) tab
ORDER BY tab.tatuserId, tab.tat_time;

# -------------------------- Below is my own practice
-- Extracting year, month and day (the parts can be combined freely)
select date_format(t.visitDate, '%Y') as t_time from test1 t;
select date_format(t.visitDate, '%m') as t_time from test1 t;
select date_format(t.visitDate, '%d') as t_time from test1 t;
-- Any combination
select date_format(t.visitDate, '%Y-%m') as t_time from test1 t;
select date_format(t.visitDate, '%m-%d') as t_time from test1 t;
-- Using YEAR, MONTH and DAY
select YEAR(date_format(t.visitDate, '%Y-%m-%d')) as t_time from test1 t;
select MONTH((date_format(t.visitDate, '%Y-%m-%d'))) as t_time from test1 t;
select DAY((date_format(t.visitDate, '%Y-%m-%d'))) as t_time from test1 t;
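The core of Question 1 is a two-step pattern: first aggregate per user and month, then run SUM as a window ordered by month. The sketch below shows that pattern with Python's sqlite3 standing in for MySQL 8. One assumption differs from the original data: the visit dates are rewritten as ISO 'YYYY-MM-DD'. MySQL's date_format() accepts '2017/1/21', but SQLite's strftime() only parses ISO-8601 strings.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test1 (userId TEXT, visitDate TEXT, visitCount INTEGER)")
# Dates rewritten as ISO 'YYYY-MM-DD' (assumption), because SQLite's strftime() needs ISO-8601.
con.executemany("INSERT INTO test1 VALUES (?, ?, ?)", [
    ("u01", "2017-01-21", 5), ("u02", "2017-01-23", 6), ("u03", "2017-01-22", 8),
    ("u04", "2017-01-20", 3), ("u01", "2017-01-23", 6), ("u01", "2017-02-21", 8),
    ("u02", "2017-01-23", 6), ("u01", "2017-02-22", 4),
])

result = con.execute("""
    SELECT userId, visit_month, subtotal,
           SUM(subtotal) OVER (PARTITION BY userId ORDER BY visit_month) AS cumulative
    FROM (
        -- step 1: one row per user and month
        SELECT userId, strftime('%Y-%m', visitDate) AS visit_month, SUM(visitCount) AS subtotal
        FROM test1
        GROUP BY userId, strftime('%Y-%m', visitDate)
    ) AS t
    ORDER BY userId, visit_month""").fetchall()
for row in result:
    print(row)
# ('u01', '2017-01', 11, 11)
# ('u01', '2017-02', 12, 23)
# ('u02', '2017-01', 12, 12)
# ('u03', '2017-01', 8, 8)
# ('u04', '2017-01', 3, 3)
```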

Question 2:

-- Create the table
CREATE TABLE test2 (
  user_id varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL,
  shop varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NOT NULL
);
-- Insert data
INSERT INTO test2 VALUES ('u1', 'a'), ('u2', 'b'), ('u1', 'b'), ('u1', 'a'), ('u3', 'c'), ('u4', 'b'), ('u1', 'a'), ('u2', 'c'), ('u5', 'b'), ('u4', 'b'), ('u6', 'c'), ('u2', 'c'), ('u1', 'b'), ('u2', 'a'), ('u2', 'a'), ('u3', 'a'), ('u5', 'a'), ('u5', 'a'), ('u5', 'a');
-- Query the table
select * from test2;

Compute:
(1) the UV (number of unique visitors) of each shop
-- Method 1
select  t.shop, count(DISTINCT t.user_id)
from test2 t
group by t.shop;
-- Method 2
select ta.tshop, count(ta.tuserid)
from
(select DISTINCT CONCAT(t.user_id,t.shop), t.shop as tshop, t.user_id as tuserid from test2 t) ta
GROUP BY ta.tshop;
-- Method 3
SELECT t.shop, count(*)
FROM (SELECT user_id, shop FROM test2 GROUP BY user_id, shop) t
GROUP BY t.shop;

Compute:
(2) for each shop, the top 3 visitors by number of visits. Output the shop name, visitor id and visit count.
-- 1. My own answer:
select
tab.tatshop, tab.tatuser_id, tab.tasf
from
(select ta.tshop as tatshop, ta.tuser_id as tatuser_id, ta.sf as tasf,
ROW_NUMBER() over (PARTITION by ta.tshop ORDER BY ta.sf DESC) as R_NUM
from (SELECT t.shop as tshop, t.user_id as tuser_id, count(user_id) as sf FROM test2 t GROUP BY tshop, tuser_id order by tshop, sf DESC) ta) tab
where tab.R_NUM <= 3;
-- The textbook answer ("---" followed directly by text is not a comment in MySQL; "-- " needs the space)
SELECT t2.shop, t2.user_id, t2.cnt
FROM (SELECT t1.*, row_number() over (partition BY t1.shop ORDER BY t1.cnt DESC) as ranks
      FROM (SELECT user_id, shop, count(*) AS cnt FROM test2 GROUP BY user_id, shop) t1) t2
WHERE ranks <= 3;          
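Both parts of Question 2 can be checked on the same 19 visits. The sketch below uses Python's sqlite3 as a stand-in for MySQL 8. One addition is an assumption of this sketch rather than part of the original answers: user_id is used as a tie-breaker. The original answers order by count only, so when two visitors tie (e.g. u2 and u5 in shop b), which one makes the top 3 can vary.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test2 (user_id TEXT, shop TEXT)")
visits = [("u1", "a"), ("u2", "b"), ("u1", "b"), ("u1", "a"), ("u3", "c"), ("u4", "b"),
          ("u1", "a"), ("u2", "c"), ("u5", "b"), ("u4", "b"), ("u6", "c"), ("u2", "c"),
          ("u1", "b"), ("u2", "a"), ("u2", "a"), ("u3", "a"), ("u5", "a"), ("u5", "a"),
          ("u5", "a")]
con.executemany("INSERT INTO test2 VALUES (?, ?)", visits)

# (1) UV per shop.
uv = con.execute(
    "SELECT shop, COUNT(DISTINCT user_id) FROM test2 GROUP BY shop ORDER BY shop").fetchall()

# (2) Top 3 visitors per shop; user_id breaks ties so the output is deterministic.
top3 = con.execute("""
    SELECT shop, user_id, cnt FROM (
        SELECT shop, user_id, cnt,
               ROW_NUMBER() OVER (PARTITION BY shop ORDER BY cnt DESC, user_id) AS rn
        FROM (SELECT shop, user_id, COUNT(*) AS cnt FROM test2 GROUP BY shop, user_id) AS t1
    ) AS t2
    WHERE rn <= 3
    ORDER BY shop, rn""").fetchall()

print(uv)  # [('a', 4), ('b', 4), ('c', 3)]
for row in top3:
    print(row)
```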

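Question 3:

-- Question 3
-- Requirements
--
--     Given a table STG.ORDER with the columns Date, Order_id, User_id, amount.
--     Sample row: 2017-01-01,10029028,1000003251,33.57
--     Write SQL to compute:
--     (1) for each month of 2017, the number of orders, the number of users and the total transaction amount;
--     (2) the number of new customers in November 2017 (customers whose first order was placed in November).
DROP TABLE IF EXISTS test3;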
-- Create the table
CREATE TABLE test3 (dt varchar(50), order_id varchar(50), user_id varchar(50), amount FLOAT(10, 2));
-- Insert data
INSERT INTO test3 VALUES ('2017-01-01','10029028','1000003251',33.57);
INSERT INTO test3 VALUES ('2017-01-01','10029029','1000003251',33.57);
INSERT INTO test3 VALUES ('2017-01-01','100290288','1000003252',33.57);
INSERT INTO test3 VALUES ('2017-02-02','10029088','1000003251',33.57);
INSERT INTO test3 VALUES ('2017-02-02','100290281','1000003251',33.57);
INSERT INTO test3 VALUES ('2017-02-02','100290282','1000003253',33.57);
INSERT INTO test3 VALUES ('2017-11-02','10290282','100003253',234);
INSERT INTO test3 VALUES ('2018-11-02','10290284','100003243',234);
-- Query the table
select * from test3;
-- (1) For each month of 2017: number of orders, number of users, total transaction amount.
-- My own answer:
-- Method 1
select date_format(t.dt, '%Y-%m') as t_time, count(t.order_id) as order_num,count(DISTINCT t.user_id) as user_num
,sum(t.amount) as amount_total
from
test3 t
GROUP BY t_time
HAVING left(t_time, 4) = '2017';
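-- A LIKE pattern also works as the filter:
-- HAVING t_time LIKE '2017%';
-- Answer found online:
-- Method 2
SELECT t1.mon, count(t1.order_id) AS order_cnt, count(DISTINCT t1.user_id) AS user_cnt, sum(amount) AS total_amount
FROM (SELECT order_id, user_id, amount, date_format(dt, '%Y-%m') mon FROM test3 WHERE date_format(dt, '%Y') = '2017') t1
GROUP BY t1.mon;
-- (2) Number of new customers in November 2017 (first order placed in November).
-- The inner query keeps one row per new customer; the outer COUNT turns that into a single number
-- (GROUP BY user_id alone would return one order count per user instead).
SELECT count(*) AS new_user_cnt
FROM (SELECT user_id
      FROM test3
      GROUP BY user_id
      HAVING date_format(min(dt), '%Y-%m') = '2017-11') t;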
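Both parts of Question 3 can be reproduced on the sample rows. The sketch below uses Python's sqlite3 as a stand-in for MySQL 8, so strftime() takes the place of date_format(). Amount is stored as REAL, so the monthly total is rounded to 2 decimals. The test in (2) is the same as above: a user counts as new when their earliest order falls in 2017-11.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; strftime() replaces date_format().
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test3 (dt TEXT, order_id TEXT, user_id TEXT, amount REAL)")
con.executemany("INSERT INTO test3 VALUES (?, ?, ?, ?)", [
    ("2017-01-01", "10029028", "1000003251", 33.57),
    ("2017-01-01", "10029029", "1000003251", 33.57),
    ("2017-01-01", "100290288", "1000003252", 33.57),
    ("2017-02-02", "10029088", "1000003251", 33.57),
    ("2017-02-02", "100290281", "1000003251", 33.57),
    ("2017-02-02", "100290282", "1000003253", 33.57),
    ("2017-11-02", "10290282", "100003253", 234),
    ("2018-11-02", "10290284", "100003243", 234),
])

# (1) Orders, distinct users and total amount per month of 2017.
monthly = con.execute("""
    SELECT strftime('%Y-%m', dt) AS mon, COUNT(order_id), COUNT(DISTINCT user_id),
           ROUND(SUM(amount), 2)
    FROM test3
    WHERE strftime('%Y', dt) = '2017'
    GROUP BY strftime('%Y-%m', dt)
    ORDER BY mon""").fetchall()

# (2) New customers in November 2017: users whose earliest order is in 2017-11.
new_customers = con.execute("""
    SELECT COUNT(*) FROM (
        SELECT user_id FROM test3
        GROUP BY user_id
        HAVING strftime('%Y-%m', MIN(dt)) = '2017-11'
    ) AS t""").fetchone()[0]

for row in monthly:
    print(row)
print(new_customers)  # 1
```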

Question 4:

-- Question 4
-- Requirements
-- 1. There is a user file with 50 million rows (user_id, name, age) and a movie-viewing log with 200 million records (user_id, url). Rank the age groups by the number of movies watched.
-- Create the tables
CREATE TABLE test4user(user_id varchar(50), name varchar(50), age int);
select * from test4user;
CREATE TABLE test4log(user_id varchar(50), url varchar(50));
select * from test4log;
-- Insert data
INSERT INTO test4user VALUES('001','u1',10);
INSERT INTO test4user VALUES('002','u2',15);
INSERT INTO test4user VALUES('003','u3',15);
INSERT INTO test4user VALUES('004','u4',20);
INSERT INTO test4user VALUES('005','u5',25);
INSERT INTO test4user VALUES('006','u6',35);
INSERT INTO test4user VALUES('007','u7',40);
INSERT INTO test4user VALUES('008','u8',45);
INSERT INTO test4user VALUES('009','u9',50);
INSERT INTO test4user VALUES('0010','u10',65);
INSERT INTO test4log VALUES('001','url1');
INSERT INTO test4log VALUES('002','url1');
INSERT INTO test4log VALUES('003','url2');
INSERT INTO test4log VALUES('004','url3');
INSERT INTO test4log VALUES('005','url3');
INSERT INTO test4log VALUES('006','url1');
INSERT INTO test4log VALUES('007','url5');
INSERT INTO test4log VALUES('008','url7');
INSERT INTO test4log VALUES('009','url5');
INSERT INTO test4log VALUES('0010','url1');
-- Query
select * from test4user;
select * from test4log;
-- My own answer
select tab.taage_phase, count(tab.taage_phase) as view_num
from
(
select t2.user_id as t2user_id,t2.url as t2url,
ta.tage as tatage,ta.age_phase as  taage_phase
from
(SELECT t.user_id as tuser_id, t.age as tage,
  CASE WHEN age <= 10 AND age > 0 THEN '0-10'
       WHEN age <= 20 AND age > 10 THEN '10-20'
       WHEN age > 20 AND age <= 30 THEN '20-30'
       WHEN age > 30 AND age <= 40 THEN '30-40'
       WHEN age > 40 AND age <= 50 THEN '40-50'
       WHEN age > 50 AND age <= 60 THEN '50-60'
       WHEN age > 60 AND age <= 70 THEN '60-70'
       ELSE '70+' END as age_phase
FROM test4user t) ta
RIGHT JOIN test4log t2 on ta.tuser_id = t2.user_id) tab
GROUP BY tab.taage_phase
ORDER BY view_num DESC;   -- the question asks for the age groups to be sorted
-- Answer found online
SELECT
t2.age_phase,
sum(t1.cnt) as view_cnt
FROM (SELECT user_id, count(*) cnt
FROM test4log
GROUP BY user_id) t1
JOIN
(SELECT user_id,
  CASE WHEN age <= 10 AND age > 0 THEN '0-10'
       WHEN age <= 20 AND age > 10 THEN '10-20'
       WHEN age > 20 AND age <= 30 THEN '20-30'
       WHEN age > 30 AND age <= 40 THEN '30-40'
       WHEN age > 40 AND age <= 50 THEN '40-50'
       WHEN age > 50 AND age <= 60 THEN '50-60'
       WHEN age > 60 AND age <= 70 THEN '60-70'
       ELSE '70+' END as age_phase
FROM test4user) t2 ON t1.user_id = t2.user_id
GROUP BY t2.age_phase
ORDER BY view_cnt DESC;
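The online answer follows an order that matters at the stated scale (200 million log rows). It aggregates the log per user first, so the join only sees one row per user, and then buckets and sums. The sketch below runs that plan on the sample rows, with Python's sqlite3 standing in for MySQL 8. One addition belongs to this sketch, not the original answers: ties are broken by bucket name.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8 here.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test4user (user_id TEXT, name TEXT, age INTEGER)")
con.execute("CREATE TABLE test4log (user_id TEXT, url TEXT)")
ages = [("001", 10), ("002", 15), ("003", 15), ("004", 20), ("005", 25),
        ("006", 35), ("007", 40), ("008", 45), ("009", 50), ("0010", 65)]
con.executemany("INSERT INTO test4user VALUES (?, ?, ?)",
                [(uid, "u" + str(i + 1), age) for i, (uid, age) in enumerate(ages)])
con.executemany("INSERT INTO test4log VALUES (?, ?)", [
    ("001", "url1"), ("002", "url1"), ("003", "url2"), ("004", "url3"), ("005", "url3"),
    ("006", "url1"), ("007", "url5"), ("008", "url7"), ("009", "url5"), ("0010", "url1"),
])

ranking = con.execute("""
    WITH views AS (
        -- aggregate the large log first, so the join only sees one row per user
        SELECT user_id, COUNT(*) AS cnt FROM test4log GROUP BY user_id
    ), buckets AS (
        SELECT user_id,
               CASE WHEN age > 0  AND age <= 10 THEN '0-10'
                    WHEN age > 10 AND age <= 20 THEN '10-20'
                    WHEN age > 20 AND age <= 30 THEN '20-30'
                    WHEN age > 30 AND age <= 40 THEN '30-40'
                    WHEN age > 40 AND age <= 50 THEN '40-50'
                    WHEN age > 50 AND age <= 60 THEN '50-60'
                    WHEN age > 60 AND age <= 70 THEN '60-70'
                    ELSE '70+' END AS age_phase
        FROM test4user
    )
    SELECT b.age_phase, SUM(v.cnt) AS view_cnt
    FROM views AS v JOIN buckets AS b ON v.user_id = b.user_id
    GROUP BY b.age_phase
    ORDER BY view_cnt DESC, b.age_phase""").fetchall()
print(ranking)
```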

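Question 5:

Requirements:
Given the log below, write code that computes the total number and the average age of all users, and of active users.
(An active user is a user with visit records on two consecutive days.)
date       user    age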
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-12,test_2,19
2019-02-13,test_1,23
2019-02-15,test_2,19
2019-02-16,test_2,19

-- Create the table
CREATE TABLE test5(
dt varchar(50),
user_id varchar(50),
age int);
-- Insert data
INSERT INTO test5 VALUES ('2019-02-11','test_1',23);
INSERT INTO test5 VALUES ('2019-02-11','test_2',19);
INSERT INTO test5 VALUES ('2019-02-11','test_3',39);
INSERT INTO test5 VALUES ('2019-02-11','test_1',23);
INSERT INTO test5 VALUES ('2019-02-11','test_3',39);
INSERT INTO test5 VALUES ('2019-02-11','test_1',23);
INSERT INTO test5 VALUES ('2019-02-12','test_2',19);
INSERT INTO test5 VALUES ('2019-02-13','test_1',23);
INSERT INTO test5 VALUES ('2019-02-15','test_2',19);
INSERT INTO test5 VALUES ('2019-02-16','test_2',19);
-- Query the table
select * from test5;
-- My own answer:
select
count(tosuccess.tyatyuser_id),
sum(tosuccess.tyatyage)/count(tosuccess.tyatyuser_id),
count(tosuccess.ttaabtat1user_id),
sum(tosuccess.ttaabtat1age)/count(tosuccess.ttaabtat1user_id)
from
(select
tya.tyuser_id as tyatyuser_id, ttaab.tat1user_id as ttaabtat1user_id,
tya.tyage as tyatyage,ttaab.tat1age as ttaabtat1age
from
( select ty.dt as tydt, ty.user_id as tyuser_id, ty.age as tyage from test5 ty ORDER BY ty.user_id) tya
LEFT JOIN
(select DISTINCT ta.t1user_id as tat1user_id, ta.t1age as tat1age
from (select t1.age as t1age, t1.dt as t1dt, t1.user_id as t1user_id from test5 t1 ORDER BY t1.user_id) ta
INNER JOIN
(SELECT DATE_SUB(t2.dt, INTERVAL -1 DAY) AS t2dt, t2.user_id as t2user_id   -- subtracting -1 day adds one day
FROM test5 t2
ORDER BY t2.user_id) tb
on ta.t1user_id = tb.t2user_id and ta.t1dt = tb.t2dt) ttaab
on
tya.tyuser_id  = ttaab.tat1user_id
GROUP BY tya.tyuser_id, tya.tyage) tosuccess;

-- Answer found online: number each user's days in date order; dt minus that row number is constant within a run of consecutive days
SELECT sum(total_user_cnt) total_user_cnt, sum(total_user_avg_age) total_user_avg_age, sum(two_days_cnt) two_days_cnt, sum(avg_age) avg_age
FROM (
  SELECT 0 total_user_cnt, 0 total_user_avg_age, count(*) AS two_days_cnt, cast(sum(age) / count(*) AS decimal(5,2)) AS avg_age
  FROM (
    SELECT user_id, max(age) age
    FROM (
      SELECT user_id, max(age) age
      FROM (
        SELECT user_id, age, DATE_SUB(dt, INTERVAL rank_num DAY) AS flagcc
        FROM (
          SELECT dt, user_id, max(age) age, row_number() over (PARTITION BY user_id ORDER BY dt) as rank_num
          FROM test5
          GROUP BY dt, user_id) t1) t2
      GROUP BY user_id, flagcc
      HAVING count(*) >= 2) t3
    GROUP BY user_id) t4
  UNION ALL
  SELECT count(*) total_user_cnt, cast(sum(age) / count(*) AS decimal(5,2)) total_user_avg_age, 0 two_days_cnt, 0 avg_age
  FROM (SELECT user_id, max(age) age FROM test5 GROUP BY user_id) t5) t6;

-- Written by a colleague (finds users active on 2 consecutive days) with user variables;
-- note that setting user variables inside expressions is deprecated in MySQL 8.0
select distinct user_id from (
  SELECT *,
    case when @name = user_id then
           (CASE WHEN DATE_SUB(str_to_date(dt,'%Y-%m-%d'), INTERVAL 1 DAY) = str_to_date(@old,'%Y-%m-%d') and @old := dt
                   THEN @size := @size + 1
                 WHEN @old := dt THEN @size := 1
            END)
         when @name := user_id and @old := dt then @size := 1
    end AS tt
  FROM test5, (SELECT @old := null, @size := 1, @name := null) r
  ORDER BY user_id, dt) t_
where tt = 2;

-- Another method of my own (users active on 2 consecutive days)
select DISTINCT ta.t1user_id
from (select t1.dt as t1dt, t1.user_id as t1user_id from test5 t1) ta
INNER JOIN
(SELECT DATE_SUB(t2.dt, INTERVAL -1 DAY) AS t2dt, t2.user_id as t2user_id
FROM test5 t2) tb
on ta.t1user_id = tb.t2user_id and ta.t1dt = tb.t2dt;
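The online answer relies on the "gaps and islands" trick. After deduplicating to one row per user per day, subtract each row's ROW_NUMBER (in days) from its date. Inside a run of consecutive days the result stays constant, so grouping on it and keeping groups of size 2 or more finds the active users. Below is a sketch of the same idea with Python's sqlite3 standing in for MySQL 8, where SQLite's date(dt, '-N days') replaces DATE_SUB.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; date(dt, '-N days') replaces DATE_SUB.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test5 (dt TEXT, user_id TEXT, age INTEGER)")
con.executemany("INSERT INTO test5 VALUES (?, ?, ?)", [
    ("2019-02-11", "test_1", 23), ("2019-02-11", "test_2", 19), ("2019-02-11", "test_3", 39),
    ("2019-02-11", "test_1", 23), ("2019-02-11", "test_3", 39), ("2019-02-11", "test_1", 23),
    ("2019-02-12", "test_2", 19), ("2019-02-13", "test_1", 23), ("2019-02-15", "test_2", 19),
    ("2019-02-16", "test_2", 19),
])

stats = con.execute("""
    WITH daily AS (            -- one row per user per day
        SELECT DISTINCT dt, user_id, age FROM test5
    ), numbered AS (
        SELECT dt, user_id, age,
               ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY dt) AS rn
        FROM daily
    ), islands AS (            -- constant within a run of consecutive days
        SELECT user_id, age, date(dt, '-' || rn || ' days') AS grp FROM numbered
    ), active_runs AS (        -- runs of at least 2 consecutive days
        SELECT user_id, MAX(age) AS age FROM islands
        GROUP BY user_id, grp HAVING COUNT(*) >= 2
    ), active AS (
        SELECT user_id, MAX(age) AS age FROM active_runs GROUP BY user_id
    ), all_users AS (
        SELECT user_id, MAX(age) AS age FROM test5 GROUP BY user_id
    )
    SELECT (SELECT COUNT(*) FROM all_users), (SELECT AVG(age) FROM all_users),
           (SELECT COUNT(*) FROM active),    (SELECT AVG(age) FROM active)""").fetchone()
print(stats)  # (3, 27.0, 1, 19.0)
```

On this data only test_2 is active (11-12 and 15-16 February); test_1's visits on the 11th and 13th are not consecutive.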

Question 6

Requirements: write SQL that finds, for all users, the amount of their first purchase in October of this year.
Table ordertable has the columns:
(buyer: userid, amount: money,
purchase time: paymenttime (format: 2017-10-01), order id: orderid)

Solution
Data preparation
CREATE TABLE test6 (userid varchar(50), money FLOAT(10,2), paymenttime varchar(50), orderid varchar(50));
select * from test6;
INSERT INTO test6 VALUES('007',100,'2017-09-01','133');
INSERT INTO test6 VALUES('007',200,'2017-10-02','134');
INSERT INTO test6 VALUES('010',500,'2017-10-01','135');
INSERT INTO test6 VALUES('011',100,'2017-08-01','136');
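INSERT INTO test6 VALUES('011',100,'2018-10-11','137');
select * from test6;
-- My own query. It returns users whose first-ever purchase falls in 2017-10. If the question means each
-- user's first purchase within October, filter on October first and rank afterwards.
select ta.*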
from
(select t.userid as tuserid, t.money as tmoney, t.orderid as torderid,
t.paymenttime as tpaymenttime,
date_format(t.paymenttime, '%Y-%m') as t_y_m,
DENSE_RANK() over (PARTITION by t.userid ORDER BY t.paymenttime) as d_r_num
from test6 t)ta
where ta.t_y_m = '2017-10' and ta.d_r_num = 1;
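The two readings of Question 6 give different answers on this data. The sketch below runs both with Python's sqlite3 standing in for MySQL 8. It uses ROW_NUMBER where the query above uses DENSE_RANK; the two coincide here because no user has two purchases on the same day. Reading B is an interpretation of the question, not the author's answer.

```python
import sqlite3

# SQLite (Python's built-in sqlite3) stands in for MySQL 8; window functions need SQLite >= 3.25.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE test6 (userid TEXT, money REAL, paymenttime TEXT, orderid TEXT)")
con.executemany("INSERT INTO test6 VALUES (?, ?, ?, ?)", [
    ("007", 100, "2017-09-01", "133"), ("007", 200, "2017-10-02", "134"),
    ("010", 500, "2017-10-01", "135"), ("011", 100, "2017-08-01", "136"),
    ("011", 100, "2018-10-11", "137"),
])

# Reading A (the query above): rank all purchases, keep rank 1 only if it falls in 2017-10.
first_ever_in_oct = con.execute("""
    SELECT userid, money FROM (
        SELECT userid, money, paymenttime,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY paymenttime) AS rn
        FROM test6) AS t
    WHERE rn = 1 AND substr(paymenttime, 1, 7) = '2017-10'
    ORDER BY userid""").fetchall()

# Reading B: keep October 2017 purchases first, then take each user's earliest one.
first_in_oct = con.execute("""
    SELECT userid, money FROM (
        SELECT userid, money,
               ROW_NUMBER() OVER (PARTITION BY userid ORDER BY paymenttime) AS rn
        FROM test6
        WHERE substr(paymenttime, 1, 7) = '2017-10') AS t
    WHERE rn = 1
    ORDER BY userid""").fetchall()

print(first_ever_in_oct)  # [('010', 500.0)]
print(first_in_oct)       # [('007', 200.0), ('010', 500.0)]
```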

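Question 10

Requirements: given the account table below, write SQL that returns the top ten accounts by money (gold) within each zone (top 10 per group).
dist_id string  'zone id',
account string  'account',
gold    int     'gold coins'

Solution
Data preparation: create the table and insert data
CREATE TABLE test10(dist_id varchar(50) COMMENT 'zone id', account varchar(50) COMMENT 'account', gold int COMMENT 'gold coins'
);
select * from test10;
INSERT INTO test10 VALUES ('1','77',18);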
INSERT INTO test10 VALUES ('1','88',106);
INSERT INTO test10 VALUES ('1','99',10);
INSERT INTO test10 VALUES ('1','12',13);
INSERT INTO test10 VALUES ('1','13',14);
INSERT INTO test10 VALUES ('1','14',25);
INSERT INTO test10 VALUES ('1','15',36);
INSERT INTO test10 VALUES ('1','16',12);
INSERT INTO test10 VALUES ('1','17',158);
INSERT INTO test10 VALUES ('2','18',12);
INSERT INTO test10 VALUES ('2','19',44);
INSERT INTO test10 VALUES ('2','10',66);
INSERT INTO test10 VALUES ('2','45',80);
INSERT INTO test10 VALUES ('2','78',98);
select * from test10;
-- DENSE_RANK keeps ties, so a zone can return more than 10 accounts; use ROW_NUMBER for exactly 10
select ta.*
from
(select t.dist_id, t.account, t.gold,
DENSE_RANK() over (PARTITION by t.dist_id order by t.gold DESC) as d_r_n
from test10 t) ta
where ta.d_r_n < 11;

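Question 9

Requirements: there is a recharge log table credit_log with the following columns:
`dist_id` int  'zone id',
`account` string  'account',
`money` int   'recharge amount',
`create_time` string  'order time'
Write a SQL statement that finds, in the recharge log for 2019-01-02, the account with the largest recharge amount in each zone. Required output:
zone id, account, amount, recharge time
-- Create the table and insert data
CREATE TABLE test9(dist_id varchar(50) COMMENT 'zone id', account varchar(50) COMMENT 'account', money FLOAT(10,2) COMMENT 'recharge amount', create_time varchar(50) COMMENT 'order time');
select * from test9;
-- DELETE from test9;
INSERT INTO test9 VALUES ('1','11',100006,'2019-01-02 13:00:01');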
INSERT INTO test9 VALUES ('1','11',100066,'2019-01-02 13:13:13');
INSERT INTO test9 VALUES ('1','11',100666,'2019-01-02 15:55:55');
INSERT INTO test9 VALUES ('1','22',110000,'2019-01-02 13:00:02');
INSERT INTO test9 VALUES ('1','22',118888,'2019-01-02 18:58:58');
INSERT INTO test9 VALUES ('1','33',102000,'2019-01-02 13:00:03');
INSERT INTO test9 VALUES ('1','44',100300,'2019-01-02 13:00:04');
INSERT INTO test9 VALUES ('1','55',100040,'2019-01-02 13:00:05');
INSERT INTO test9 VALUES ('1','66',100005,'2019-01-02 13:00:06');
INSERT INTO test9 VALUES ('1','77',180000,'2019-01-03 13:00:07');
INSERT INTO test9 VALUES ('1','88',106000,'2019-01-02 13:00:08');
INSERT INTO test9 VALUES ('1','99',100400,'2019-01-02 13:00:09');
INSERT INTO test9 VALUES ('1','12',100030,'2019-01-02 13:00:10');
INSERT INTO test9 VALUES ('1','13',100003,'2019-01-02 13:00:20');
INSERT INTO test9 VALUES ('1','14',100020,'2019-01-02 13:00:30');
INSERT INTO test9 VALUES ('1','15',100500,'2019-01-02 13:00:40');
INSERT INTO test9 VALUES ('1','16',106000,'2019-01-02 13:00:50');
INSERT INTO test9 VALUES ('1','17',100800,'2019-01-02 13:00:59');
INSERT INTO test9 VALUES ('2','18',100800,'2019-01-02 13:00:11');
INSERT INTO test9 VALUES ('2','19',100030,'2019-01-02 13:00:12');
INSERT INTO test9 VALUES ('2','10',100000,'2019-01-02 13:00:13');
INSERT INTO test9 VALUES ('2','45',100010,'2019-01-02 13:00:14');
INSERT INTO test9 VALUES ('2','78',100070,'2019-01-02 13:00:15');
INSERT INTO test9 VALUES ('3','78',100080,'2019-01-02 16:00:56');
select * from test9;
-- My own method
select tab.*
from
(select ta.tadist_id,ta.taaccount,ta.sum_money,
DENSE_RANK() over (PARTITION by ta.tadist_id ORDER BY ta.sum_money DESC) as d_r_n
from
(select t.dist_id as tadist_id,t.account as taaccount,
SUM(t.money) as sum_money
from test9 t
where LEFT(t.create_time,10) = '2019-01-02'
GROUP BY t.dist_id,t.account)ta)tab
where d_r_n = 1;
-- Note (other ways to write the date condition):
-- WHERE, method 2
-- where t.create_time like '2019-01-02%'
-- WHERE, method 3
-- where DATE_FORMAT(t.create_time,'%Y-%m-%d') = '2019-01-02'

-- Method found online (the CTE is named temp in lowercase everywhere, since alias case can matter on Linux)
WITH temp AS (
  SELECT dist_id, account, sum(money) sum_money
  FROM test9
  WHERE date_format(create_time, '%Y-%m-%d') = '2019-01-02'
  GROUP BY dist_id, account)
SELECT t1.dist_id, t1.account, t1.sum_money
FROM (SELECT temp.dist_id, temp.account, temp.sum_money, rank() over (partition BY temp.dist_id ORDER BY temp.sum_money DESC) ranks FROM temp) t1
WHERE ranks = 1;             

Question 8

Requirements: an online server access log has the following format (answer in SQL):
time                       endpoint                    ip address
2016-11-09 14:22:05        /api/user/login             110.23.5.33
2016-11-09 14:23:10        /api/user/detail            57.3.2.16
2016-11-09 15:59:40        /api/user/login             200.6.5.166
… …
Find the top 10 IP addresses that accessed the /api/user/login endpoint on November 9 between 14:00 and 15:00.
Create the table and insert data
CREATE TABLE test8(`date` varchar(50), interface varchar(50), ip varchar(50));
select * from test8;
INSERT INTO test8 VALUES ('2016-11-09 11:22:05','/api/user/login','110.23.5.23');
INSERT INTO test8 VALUES ('2016-11-09 11:23:10','/api/user/detail','57.3.2.16');
INSERT INTO test8 VALUES ('2016-11-09 23:59:40','/api/user/login','200.6.5.166');
INSERT INTO test8 VALUES('2016-11-09 11:14:23','/api/user/login','136.79.47.70');
INSERT INTO test8 VALUES('2016-11-09 11:15:23','/api/user/detail','94.144.143.141');
INSERT INTO test8 VALUES('2016-11-09 11:16:23','/api/user/login','197.161.8.206');
INSERT INTO test8 VALUES('2016-11-09 12:14:23','/api/user/detail','240.227.107.145');
INSERT INTO test8 VALUES('2016-11-09 13:14:23','/api/user/login','79.130.122.205');
INSERT INTO test8 VALUES('2016-11-09 14:14:23','/api/user/detail','65.228.251.189');
INSERT INTO test8 VALUES('2016-11-09 14:15:23','/api/user/detail','245.23.122.44');
INSERT INTO test8 VALUES('2016-11-09 14:17:23','/api/user/detail','22.74.142.137');
INSERT INTO test8 VALUES('2016-11-09 14:19:23','/api/user/detail','54.93.212.87');
INSERT INTO test8 VALUES('2016-11-09 14:20:23','/api/user/detail','218.15.167.248');
INSERT INTO test8 VALUES('2016-11-09 14:24:23','/api/user/detail','20.117.19.75');
INSERT INTO test8 VALUES('2016-11-09 15:14:23','/api/user/login','183.162.66.97');
INSERT INTO test8 VALUES('2016-11-09 16:14:23','/api/user/login','108.181.245.147');
INSERT INTO test8 VALUES('2016-11-09 14:17:23','/api/user/login','22.74.142.137');
INSERT INTO test8 VALUES('2016-11-09 14:19:23','/api/user/login','22.74.142.137');
-- select * from test8;
-- My own method
select ta.ip, COUNT(ta.ip)
from
(select t.*
from test8 t
where LEFT(t.date,13) = '2016-11-09 14' and t.interface = '/api/user/login'
ORDER BY t.date) ta
GROUP BY ta.ip
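ORDER BY COUNT(ta.ip) DESC   -- without this, LIMIT 10 returns 10 arbitrary IPs rather than the top 10
LIMIT 10;
-- Method found online
SELECT ip, count(*) AS cnt
FROM test8
WHERE date_format(date, '%Y-%m-%d %H') >= '2016-11-09 14'
  AND date_format(date, '%Y-%m-%d %H') < '2016-11-09 15'
  AND interface = '/api/user/login'
GROUP BY ip
ORDER BY cnt desc
LIMIT 10;
-- Note: how a date string is displayed after formatting
select date_format(date,'%Y-%m-%d %H:%i:%s')
from test8;
-- Displayed as: 2016-11-09 11:22:05
select DATE_FORMAT(date,'%Y-%m-%e %H:%i:%s')
from test8;
-- Displayed as: 2016-11-9 11:22:05 (%e is the day of the month without a leading zero)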

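Question 7

Requirements: a library management database has the following three data models:
Book (table name: BOOK)
No.  Field name   Description         Type
1    BOOK_ID      Book number         Text
2    SORT         Category number     Text
3    BOOK_NAME    Title               Text
4    WRITER       Author              Text
5    OUTPUT       Publisher           Text
6    PRICE        Unit price          Numeric (2 decimal places)
Reader (table name: READER)
No.  Field name   Description         Type
1    READER_ID    Library card no.    Text
2    COMPANY      Organization        Text
3    NAME         Name                Text
4    SEX          Sex                 Text
5    GRADE        Professional title  Text
6    ADDR         Address             Text
Borrow log (table name: BORROW LOG)
No.  Field name   Description         Type
1    READER_ID    Library card no.    Text
2    BOOK_ID      Book number         Text
3    BORROW_DATE  Borrow date         Date
(1) Create the table structures of the three base tables of the library database: books, readers and borrow records. Write the CREATE TABLE statements.
(2) Find the names (NAME) and organizations (COMPANY) of readers whose surname is Li (李).
(3) Find the titles (BOOK_NAME) and prices (PRICE) of all books published by “高等教育出版社” (Higher Education Press), sorted by price in descending order.
(4) Find the category (SORT), publisher (OUTPUT) and price (PRICE) of books priced between 10 and 20 yuan, sorted in ascending order by publisher (OUTPUT) and price (PRICE).
(5) Find the names (NAME) and organizations (COMPANY) of all readers who have borrowed books.
(6) Find the highest, lowest and average price of the books published by “科学出版社” (Science Press).
(7) Find the names and organizations of readers who currently have at least 2 books on loan (2 or more).
(8) For data safety, the data in “borrow records” must be backed up regularly. With a single SQL statement, create under the backup user bak a table BORROW_LOG_BAK whose structure is identical to “borrow records”, and copy all existing data from “borrow records” into BORROW_LOG_BAK.
(9) The data in the original Oracle database now has to be migrated to a Hive warehouse. Write the Hive CREATE TABLE statement for “books” (Hive implementation; hints: the column delimiter is |; the data is loaded from outside; the partitions are named month_part and day_part).
(10) Hive has a table A. In monthly partition 201505 of table A, set the user_dinner field of user_id 20000 to bonc8920 and leave user_dinner unchanged for all other users. List the steps of the update. (Hive implementation; hint: Hive has no UPDATE syntax, so update the data some other way.)
Create the tables and insert data
(1)
-- Create the book table
CREATE TABLE book(book_id varchar(50), `SORT` varchar(50), book_name varchar(50), writer varchar(50), OUTPUT varchar(50), price FLOAT(10,2));
select * from book;
INSERT INTO book VALUES ('001','TP391','信息处理','author1','机械工业出版社','20');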
INSERT INTO book VALUES ('002','TP392','数据库','author12','科学出版社','15');
INSERT INTO book VALUES ('003','TP393','计算机网络','author3','机械工业出版社','29');
INSERT INTO book VALUES ('004','TP399','微机原理','author4','科学出版社','39');
INSERT INTO book VALUES ('005','C931','管理信息系统','author5','机械工业出版社','40');
INSERT INTO book VALUES ('006','C932','运筹学','author6','科学出版社','55');
-- Create the reader table
CREATE TABLE reader (reader_id VARCHAR(200), company VARCHAR(200), name VARCHAR(200), sex VARCHAR(200), grade VARCHAR(200), addr VARCHAR(200));
select * from reader;
INSERT INTO reader VALUES ('0001','阿里巴巴','jack','男','vp','addr1');
INSERT INTO reader VALUES ('0002','百度','robin','男','vp','addr2');
INSERT INTO reader VALUES ('0003','腾讯','tony','男','vp','addr3');
INSERT INTO reader VALUES ('0004','京东','jasper','男','cfo','addr4');
INSERT INTO reader VALUES ('0005','网易','zhangsan','女','ceo','addr5');
INSERT INTO reader VALUES ('0006','搜狐','lisi','女','ceo','addr6');
-- Create the borrow log table
CREATE TABLE borrow_log(reader_id VARCHAR(200), book_id VARCHAR(200), borrow_date VARCHAR(200));
select * from borrow_log;
INSERT INTO borrow_log VALUES ('0001','002','2019-10-14');
INSERT INTO borrow_log VALUES ('0002','001','2019-10-13');
INSERT INTO borrow_log VALUES ('0003','005','2019-09-14');
INSERT INTO borrow_log VALUES ('0004','006','2019-08-15');
INSERT INTO borrow_log VALUES ('0005','003','2019-10-10');
INSERT INTO borrow_log VALUES ('0006','004','2019-17-13');  -- note: '2019-17-13' is not a valid date (there is no month 17); borrow_date is VARCHAR, so it is stored as-is
(2) SELECT name, company FROM reader WHERE name LIKE 'j%';
(3) SELECT book_name, price FROM book WHERE OUTPUT = '高等教育出版社' ORDER BY price DESC;
(4) SELECT sort, output, price FROM book WHERE price >= 10 AND price <= 20 ORDER BY output, price;
(5) SELECT b.name, b.company FROM borrow_log a JOIN reader b ON a.reader_id = b.reader_id;
(6) SELECT max(price), min(price), avg(price) FROM book WHERE OUTPUT = '科学出版社';
(7) SELECT b.name, b.company
FROM (SELECT reader_id FROM borrow_log GROUP BY reader_id HAVING count(*) >= 2) a
JOIN reader b ON a.reader_id = b.reader_id;
(8) CREATE TABLE borrow_log_bak AS SELECT * FROM borrow_log;
select * from borrow_log_bak;
(9)
-- Hive's FLOAT type takes no (precision, scale) arguments, so price is declared as DECIMAL(10,2)
CREATE TABLE book_hive (book_id VARCHAR(200), SORT VARCHAR(200), book_name VARCHAR(200), writer VARCHAR(200), OUTPUT VARCHAR(200), price DECIMAL(10,2))
PARTITIONED BY (month_part VARCHAR(200), day_part VARCHAR(200))
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
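(10) Method 1: configure Hive to support transactions (ACID). The table must be bucketed, stored as ORC and declared with TBLPROPERTIES ('transactional'='true'); UPDATE can then be used on it. Method 2: step 1, select the rows that need updating, with the field replaced by the new value; step 2, select the rows that do not need updating; step 3, insert the rows from both steps into a new table.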
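Method 2 can be tried out locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for Hive (SQLite has no partitions, so month_part is just an ordinary column). The layout of table A (user_id, user_dinner, month_part) is a hypothetical assumption, because the question does not give its columns. The new table is built from the changed rows UNION ALL the unchanged rows and then swapped in for the old one, so no UPDATE statement is used.

```python
import sqlite3

# Hypothetical layout of table A: the question only names user_id, user_dinner
# and the monthly partition, so those are the only columns modelled here.
SAMPLE_ROWS = [
    (20000, "old_value", "201505"),  # the row to change
    (20001, "keep_me", "201505"),    # same partition, different user
    (20000, "old_value", "201504"),  # same user, different partition
]


def rebuild_without_update(conn: sqlite3.Connection) -> None:
    """Method 2: rebuild table a from changed rows + unchanged rows, no UPDATE."""
    conn.executescript("""
        CREATE TABLE a_new AS
        -- Step 1: the rows to change, with the new value substituted
        SELECT user_id, 'bonc8920' AS user_dinner, month_part
        FROM a
        WHERE month_part = '201505' AND user_id = 20000
        UNION ALL
        -- Step 2: every other row, copied unchanged
        -- (assumes user_id and month_part are never NULL)
        SELECT user_id, user_dinner, month_part
        FROM a
        WHERE NOT (month_part = '201505' AND user_id = 20000);
        -- Step 3: replace the old table with the rebuilt one
        DROP TABLE a;
        ALTER TABLE a_new RENAME TO a;
    """)


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (user_id INTEGER, user_dinner TEXT, month_part TEXT)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", SAMPLE_ROWS)
    rebuild_without_update(conn)
    rows = conn.execute(
        "SELECT user_id, user_dinner, month_part FROM a ORDER BY month_part, user_id"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

In Hive itself the combined rows would typically be written back with INSERT OVERWRITE on the 201505 partition rather than by renaming tables.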

Further topics

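1. A special note on MySQL: where column aliases can be used

Illustrated with an example (the table is created earlier in this post):
Example:
select tt.province as ttprovince
from  test_table tt
-- where ttprovince = '北方地区';   -- not allowed: WHERE cannot use the alias
-- group by ttprovince;             -- allowed: GROUP BY can use the alias
-- HAVING ttprovince = '北方地区';  -- allowed: HAVING can use the alias
-- ORDER BY ttprovince DESC;        -- allowed: ORDER BY can use the alias

MySQL relaxes the rules here: a column alias can be referenced in GROUP BY, HAVING and ORDER BY, but not in WHERE, because WHERE is evaluated before the SELECT list assigns the aliases. Reference: https://blog.csdn.net/qq_26442553/article/details/80867076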

-- Hive window functions: LAG and LEAD

create table windows_ss (polno varchar(300), eff_date varchar(300), userno varchar(300));
select * from windows_ss;
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:00:02","user01");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:00:00","user02");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:03:04","user11");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:50:05","user03");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 10:00:00","user51");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:10:00","user09");
INSERT INTO windows_ss VALUES ("P066666666666","2016-04-02 09:50:01","user32");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:00:02","user41");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:00:00","user55");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:03:04","user23");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:50:05","user80");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 10:00:00","user08");
INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:10:00","user22");
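INSERT INTO windows_ss VALUES ("P088888888888","2016-04-02 09:50:01","user31");
select * from windows_ss;
-- 1. LAG
-- LAG(col, n, DEFAULT) returns the value of col from the row n rows above the current row within the window.
-- Argument 1 is the column; argument 2 is how many rows to look back (optional, default 1);
-- argument 3 is the default value, returned when there is no row n positions above
-- (the offset goes past the start of the partition); if it is omitted, NULL is returned.
SELECT polno, eff_date, userno,
       ROW_NUMBER() OVER (PARTITION BY polno ORDER BY eff_date) AS rn,
       LAG(eff_date, 1, '1970-01-01 00:00:00') OVER (PARTITION BY polno ORDER BY eff_date) AS last_1_time,
       LAG(eff_date, 2) OVER (PARTITION BY polno ORDER BY eff_date) AS last_2_time
FROM windows_ss;
-- 2. LEAD, the opposite of LAG
-- LEAD(col, n, DEFAULT) returns the value of col from the row n rows below the current row within the window.
-- Argument 1 is the column; argument 2 is how many rows to look ahead (optional, default 1);
-- argument 3 is the default value, returned when there is no row n positions below
-- (the offset goes past the end of the partition); if it is omitted, NULL is returned.
SELECT polno, eff_date, userno,
       ROW_NUMBER() OVER (PARTITION BY polno ORDER BY eff_date) AS rn,
       LEAD(eff_date, 1, '1970-01-01 00:00:00') OVER (PARTITION BY polno ORDER BY eff_date) AS next_1_time,
       LEAD(eff_date, 2) OVER (PARTITION BY polno ORDER BY eff_date) AS next_2_time
FROM windows_ss;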
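A quick way to check the LAG/LEAD behaviour described above is to run the same kind of query on a few of the windows_ss rows. This sketch uses Python's sqlite3 module, assuming SQLite 3.25 or later (the first version with window functions), as a stand-in for Hive/MySQL; LAG and LEAD take the same (col, n, default) arguments there. The first row of each policy gets the LAG default, the last row gets the LEAD default, and LAG(eff_date, 2) without a default yields NULL (None in Python).

```python
import sqlite3

# A subset of the windows_ss rows above (two policies, inserted out of order).
ROWS = [
    ("P066666666666", "2016-04-02 09:00:02", "user01"),
    ("P066666666666", "2016-04-02 09:00:00", "user02"),
    ("P066666666666", "2016-04-02 09:03:04", "user11"),
    ("P088888888888", "2016-04-02 09:00:02", "user41"),
    ("P088888888888", "2016-04-02 09:00:00", "user55"),
]

QUERY = """
SELECT polno, eff_date, userno,
       ROW_NUMBER() OVER (PARTITION BY polno ORDER BY eff_date) AS rn,
       -- previous row's eff_date, or the default on the first row of a policy
       LAG(eff_date, 1, '1970-01-01 00:00:00')
           OVER (PARTITION BY polno ORDER BY eff_date) AS last_1_time,
       -- two rows back, no default: NULL on the first two rows
       LAG(eff_date, 2) OVER (PARTITION BY polno ORDER BY eff_date) AS last_2_time,
       -- next row's eff_date, or the default on the last row of a policy
       LEAD(eff_date, 1, '1970-01-01 00:00:00')
           OVER (PARTITION BY polno ORDER BY eff_date) AS next_1_time
FROM windows_ss
ORDER BY polno, eff_date
"""


def run() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE windows_ss (polno TEXT, eff_date TEXT, userno TEXT)")
    conn.executemany("INSERT INTO windows_ss VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```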

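Using the window clause:

So far, PARTITION BY has been used to split the data into groups. For finer-grained control over which rows each calculation covers, we need the window (frame) clause. First, two points to understand:
- With PARTITION BY but no ORDER BY, the aggregate is computed over the whole partition.
- With ORDER BY but no window clause, the frame defaults to running from the start of the partition to the current row (strictly RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie with the current row on the ORDER BY key are included too).

When one SELECT contains several window functions, they do not affect each other; each follows its own rules. The window clause uses these keywords:
- PRECEDING: rows before the current row
- FOLLOWING: rows after the current row
- CURRENT ROW: the current row
- UNBOUNDED: no limit; UNBOUNDED PRECEDING means from the first row of the partition, UNBOUNDED FOLLOWING means up to the last row of the partition

Below we partition by name, order by purchase date and accumulate cost, combining this with the window clause.
-- Create the table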
CREATE TABLE t_window (uname VARCHAR(200), orderdate VARCHAR(200), cost INT);
select * from t_window;
INSERT INTO t_window VALUES ("jack","2015-01-01",10);
INSERT INTO t_window VALUES ("tony","2015-01-02",15);
INSERT INTO t_window VALUES ("jack","2015-02-03",23);
INSERT INTO t_window VALUES ("tony","2015-01-04",29);
INSERT INTO t_window VALUES ("jack","2015-01-05",46);
INSERT INTO t_window VALUES ("jack","2015-04-06",42);
INSERT INTO t_window VALUES ("tony","2015-01-07",50);
INSERT INTO t_window VALUES ("jack","2015-01-08",55);
INSERT INTO t_window VALUES ("mart","2015-04-08",62);
INSERT INTO t_window VALUES ("mart","2015-04-09",68);
INSERT INTO t_window VALUES ("neil","2015-05-10",12);
INSERT INTO t_window VALUES ("mart","2015-04-11",75);
INSERT INTO t_window VALUES ("neil","2015-06-12",80);
INSERT INTO t_window VALUES ("mart","2015-04-13",94);
select * from t_window;
-- The example query
select uname, orderdate, cost,
-- sum over all rows
sum(cost) over() as sample1,
-- partition by name, sum within each partition
sum(cost) over(partition by uname) as sample2,
-- partition by name, running total within each partition
sum(cost) over(partition by uname order by orderdate) as sample3,
-- from the first row to the current row; same as sample3 here because no uname has two rows with the same orderdate
sum(cost) over(partition by uname order by orderdate rows between UNBOUNDED PRECEDING and current row) as sample4,
-- the row before plus the current row
sum(cost) over(partition by uname order by orderdate rows between 1 PRECEDING and current row) as sample5,
-- the row before, the current row and the row after
sum(cost) over(partition by uname order by orderdate rows between 1 PRECEDING and 1 FOLLOWING) as sample6,
-- the current row and every row after it
sum(cost) over(partition by uname order by orderdate rows between current row and UNBOUNDED FOLLOWING) as sample7
from t_window;
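To see what each frame actually covers, the sketch below runs the same sums on jack's and neil's rows from t_window. It uses Python's sqlite3 module, with SQLite (3.25 or later assumed) standing in for Hive/MySQL; the ROWS BETWEEN frame syntax used here is the standard one.

```python
import sqlite3

# jack's and neil's rows from t_window above, inserted in the original order.
ROWS = [
    ("jack", "2015-01-01", 10), ("jack", "2015-02-03", 23),
    ("jack", "2015-01-05", 46), ("jack", "2015-04-06", 42),
    ("jack", "2015-01-08", 55), ("neil", "2015-05-10", 12),
    ("neil", "2015-06-12", 80),
]

QUERY = """
SELECT uname, orderdate, cost,
  SUM(cost) OVER () AS sample1,                                       -- all rows
  SUM(cost) OVER (PARTITION BY uname) AS sample2,                     -- whole partition
  SUM(cost) OVER (PARTITION BY uname ORDER BY orderdate) AS sample3,  -- running total
  SUM(cost) OVER (PARTITION BY uname ORDER BY orderdate
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sample4,
  SUM(cost) OVER (PARTITION BY uname ORDER BY orderdate
    ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS sample5,
  SUM(cost) OVER (PARTITION BY uname ORDER BY orderdate
    ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS sample6,
  SUM(cost) OVER (PARTITION BY uname ORDER BY orderdate
    ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS sample7
FROM t_window
ORDER BY uname, orderdate
"""


def run() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t_window (uname TEXT, orderdate TEXT, cost INTEGER)")
    conn.executemany("INSERT INTO t_window VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```

For jack, ordered by orderdate, the costs are 10, 46, 55, 23, 42; sample5 gives 10, 56, 101, 78, 65, and sample7 gives 176, 166, 120, 65, 42.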

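Summary of MySQL functions:

-- Functions that return the current time
select NOW() a1;      -- the time at which the statement started executing
select SYSDATE() a2;  -- the time at which SYSDATE() itself is executed
select CURDATE() a3;  -- the current date only, without the time
-- Converting between strings and dates
SELECT STR_TO_DATE("2021-09-27", "%Y-%m-%d") as a1;  -- string -> DATE
SELECT DATE_FORMAT("2021-09-27", "%Y-%m-%d") as a2;  -- date -> formatted string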

-- Functions for time intervals
select DATEDIFF("2021-09-26","2021-09-25");  -- 1: days from the second date to the first
SELECT TIMEDIFF("13:10:11", "13:10:10");     -- '00:00:01'

Computing time intervals in MySQL

1. MySQL has two functions for comparing two points in time.
TIMESTAMPDIFF(unit, start, end) takes the time of day into account and counts whole units, so with DAY a gap of 25 hours gives 1 and a gap of 23 hours gives 0:
select TIMESTAMPDIFF(DAY,'2016-11-16 10:13:42',NOW());
DATEDIFF(end, start) compares only the date parts (here just 2016-11-16, ignoring hours and minutes) and counts calendar days:
select DATEDIFF(NOW(),'2016-11-16 17:10:52');
2. The MySQL window function LEAD. Table definition and data:
/*
 Navicat Premium Data Transfer
 Source Server         : jack
 Source Server Type    : MySQL
 Source Server Version : 80021
 Source Host           : localhost:3306
 Source Schema         : mysql8
 Target Server Type    : MySQL
 Target Server Version : 80021
 File Encoding         : 65001
 Date: 10/11/2021 14:02:08
*/
SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;
-- ----------------------------
-- Table structure for user_order
-- ----------------------------
DROP TABLE IF EXISTS `user_order`;
CREATE TABLE `user_order` (
  `user_id` int(0) NOT NULL,
  `user_name` varchar(555) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NULL DEFAULT NULL,
  `product_name` varchar(555) CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_as_ci NULL DEFAULT NULL,
  `buy_time` datetime(0) NULL DEFAULT NULL,
  `login_time` datetime(0) NULL DEFAULT NULL,
  PRIMARY KEY (`user_id`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_0900_as_ci ROW_FORMAT = Dynamic;
-- ----------------------------
-- Records of user_order
-- ----------------------------
INSERT INTO `user_order` VALUES (1, 'jack', '华为手机', '2021-11-09 20:07:25', '2021-11-21 20:00:36');
INSERT INTO `user_order` VALUES (2, 'jack', '华为手机', '2021-10-09 21:17:56', '2021-11-23 20:13:15');
INSERT INTO `user_order` VALUES (3, 'lucy', '小米手机', '2021-08-09 10:07:24', '2021-11-09 20:27:06');
INSERT INTO `user_order` VALUES (4, 'lucy', '小米手机', '2021-09-09 09:07:33', '2021-05-09 03:07:44');
INSERT INTO `user_order` VALUES (5, 'mark', 'oppo手机', '2020-08-08 08:07:28', '2019-06-09 02:19:16');
INSERT INTO `user_order` VALUES (6, 'mark', 'oppo手机', '2017-07-07 07:07:18', '2016-05-09 02:10:16');
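INSERT INTO `user_order` VALUES (7, 'jack', '华为手机', '2010-11-09 16:17:15', '2018-10-09 10:17:55');
SET FOREIGN_KEY_CHECKS = 1;
-- End of table creation
-- select * from user_order;
-- 1. The person with the longest interval between two phone purchases.
-- 2. The same person, and how many days apart the two purchases were.
-- 3. The two shortest intervals between two phone purchases.
-- 4. Users who logged in on three consecutive days.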
select * from(
select *,TIMESTAMPDIFF(day,buy_time,b_time) as t_day
from
(select *,
lead(tb.buy_time,1) over(PARTITION by tb.user_name ORDER BY tb.buy_time ) as 'b_time'
from
(select *,
ROW_NUMBER() over(PARTITION by t.user_name ORDER BY t.buy_time ) as 'r_n'
from user_order t) tb)tbc)tbcd
where t_day  =  (select MAX(t_day) as max_time
from(
select *,TIMESTAMPDIFF(day,buy_time,b_time) as 't_day'
from
(select *,
lead(tb.buy_time,1) over(PARTITION by tb.user_name ORDER BY tb.buy_time ) as 'b_time'
from
(select *,
ROW_NUMBER() over(PARTITION by t.user_name ORDER BY t.buy_time ) as 'r_n'
from user_order t) tb)tbc)tbcd)
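The same logic can be checked outside MySQL. The sketch below uses Python's sqlite3 module; because SQLite has no TIMESTAMPDIFF, the whole-day gap is computed as CAST(julianday(b_time) - julianday(buy_time) AS INTEGER), which drops a partial day the same way TIMESTAMPDIFF(DAY, ...) does. WITH clauses replace the nested subquery that the query above repeats. This is a sketch under those assumptions, not the original MySQL query.

```python
import sqlite3

# user_name and buy_time from the seven user_order rows above.
ORDERS = [
    ("jack", "2021-11-09 20:07:25"), ("jack", "2021-10-09 21:17:56"),
    ("lucy", "2021-08-09 10:07:24"), ("lucy", "2021-09-09 09:07:33"),
    ("mark", "2020-08-08 08:07:28"), ("mark", "2017-07-07 07:07:18"),
    ("jack", "2010-11-09 16:17:15"),
]

QUERY = """
WITH nxt AS (
  -- pair each purchase with the same user's next purchase
  SELECT user_name, buy_time,
         LEAD(buy_time, 1) OVER (PARTITION BY user_name ORDER BY buy_time) AS b_time
  FROM user_order
), gaps AS (
  -- whole days between the two purchases (partial days truncated)
  SELECT user_name, buy_time, b_time,
         CAST(julianday(b_time) - julianday(buy_time) AS INTEGER) AS t_day
  FROM nxt
  WHERE b_time IS NOT NULL
)
SELECT user_name, buy_time, b_time, t_day
FROM gaps
WHERE t_day = (SELECT MAX(t_day) FROM gaps)
"""


def longest_gap() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_order (user_name TEXT, buy_time TEXT)")
    conn.executemany("INSERT INTO user_order VALUES (?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(longest_gap())
```

With the data above this returns jack's purchases on 2010-11-09 and 2021-10-09, which are 3987 whole days apart.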

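3. The two shortest intervals between two phone purchases.

-- 3. The two shortest purchase intervals: take each user's shortest interval, then the two smallest of those.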
select * from
(select  *,ROW_NUMBER() over(PARTITION by user_name ORDER BY t_day) as rr_nn
from(
select *,TIMESTAMPDIFF(day,buy_time,b_time) as 't_day'
from
(select *,
lead(tb.buy_time,1) over(PARTITION by tb.user_name ORDER BY tb.buy_time ) as 'b_time'
from
(select *,
ROW_NUMBER() over(PARTITION by t.user_name ORDER BY t.buy_time ) as 'r_n'
from user_order t) tb)tbc)tbcd
where t_day is NOT null ) tbcde
where rr_nn = 1
ORDER BY t_day
LIMIT 2;
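Here is a sketch of the same "each user's shortest interval, then the top two" logic. It uses Python's sqlite3 module as a stand-in for MySQL, with the julianday-based day count again replacing TIMESTAMPDIFF. jack and lucy both have a shortest gap of 30 days. With ORDER BY t_day alone, as in the query above, the order of tied rows is up to the engine, so this sketch also sorts by user_name to make the result deterministic.

```python
import sqlite3

# user_name and buy_time from the seven user_order rows above.
ORDERS = [
    ("jack", "2021-11-09 20:07:25"), ("jack", "2021-10-09 21:17:56"),
    ("lucy", "2021-08-09 10:07:24"), ("lucy", "2021-09-09 09:07:33"),
    ("mark", "2020-08-08 08:07:28"), ("mark", "2017-07-07 07:07:18"),
    ("jack", "2010-11-09 16:17:15"),
]

QUERY = """
WITH nxt AS (
  SELECT user_name, buy_time,
         LEAD(buy_time, 1) OVER (PARTITION BY user_name ORDER BY buy_time) AS b_time
  FROM user_order
), gaps AS (
  SELECT user_name, CAST(julianday(b_time) - julianday(buy_time) AS INTEGER) AS t_day
  FROM nxt
  WHERE b_time IS NOT NULL
), ranked AS (
  -- number each user's intervals from shortest to longest
  SELECT user_name, t_day,
         ROW_NUMBER() OVER (PARTITION BY user_name ORDER BY t_day) AS rr_nn
  FROM gaps
)
SELECT user_name, t_day
FROM ranked
WHERE rr_nn = 1            -- each user's shortest interval
ORDER BY t_day, user_name  -- user_name breaks ties deterministically
LIMIT 2
"""


def two_shortest() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_order (user_name TEXT, buy_time TEXT)")
    conn.executemany("INSERT INTO user_order VALUES (?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(two_shortest())
```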
-- 4. Users who logged in on three consecutive days
select * from(
select *,
lead(t.login_time,1)  over(partition by t.user_name ORDER BY t.login_time) as lead_t1,
lead(t.login_time,2)  over(partition by t.user_name ORDER BY t.login_time) as lead_t2
from user_order t) tb
where lead_t2 is not null
and
datediff(login_time,lead_t1) = -1
and
datediff(login_time,lead_t2) = -2;
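The login times in user_order contain no three-day run, so this sketch uses hypothetical login data to exercise the LEAD approach. It also shows one assumption the query above relies on: each row must be a different day. If a user logs in twice on the same day, LEAD can land on that second login and the run is missed, and reducing the logins to distinct dates first avoids this. SQLite, via Python's sqlite3 module, stands in for MySQL. Since SQLite has no DATEDIFF, the day gap is computed as julianday(date(b)) - julianday(date(a)).

```python
import sqlite3

# Hypothetical login data: amy logs in on 3 consecutive days, bob skips a day,
# dave also logs in on 3 consecutive days but twice on 2021-11-02.
LOGINS = [
    ("amy", "2021-11-01 08:00:00"), ("amy", "2021-11-02 09:00:00"),
    ("amy", "2021-11-03 10:00:00"),
    ("bob", "2021-11-01 08:00:00"), ("bob", "2021-11-03 08:00:00"),
    ("bob", "2021-11-04 08:00:00"),
    ("dave", "2021-11-01 08:00:00"), ("dave", "2021-11-02 08:00:00"),
    ("dave", "2021-11-02 20:00:00"), ("dave", "2021-11-03 08:00:00"),
]


def consecutive_users(dedupe_days: bool) -> list:
    """Users whose login on some day is followed by logins on the next two days."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (user_name TEXT, login_time TEXT)")
    conn.executemany("INSERT INTO logins VALUES (?, ?)", LOGINS)
    # Optionally collapse several logins on the same day into one row first.
    source = ("SELECT DISTINCT user_name, date(login_time) AS login_time FROM logins"
              if dedupe_days else "SELECT user_name, login_time FROM logins")
    query = f"""
        SELECT DISTINCT user_name
        FROM (
          SELECT user_name, login_time,
                 LEAD(login_time, 1) OVER (PARTITION BY user_name ORDER BY login_time) AS lead_t1,
                 LEAD(login_time, 2) OVER (PARTITION BY user_name ORDER BY login_time) AS lead_t2
          FROM ({source}) AS src
        ) AS tb
        WHERE lead_t2 IS NOT NULL
          -- next two logins fall exactly 1 and 2 calendar days later
          AND julianday(date(lead_t1)) - julianday(date(login_time)) = 1
          AND julianday(date(lead_t2)) - julianday(date(login_time)) = 2
        ORDER BY user_name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


if __name__ == "__main__":
    print("raw rows:      ", consecutive_users(False))
    print("distinct days: ", consecutive_users(True))
```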

SELECT * FROM `user_order`;
-- NTILE(n) splits the ordered rows of each partition into n buckets of (nearly) equal size and returns each row's bucket number.
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
NTILE(2) over (PARTITION by t.user_name order by t.buy_time) as n_e
from user_order t;--  NTH_VALUE 函数的作用是,窗口函数数组后,将指定那个字段放到那个位置,记录字段赋值记数。
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
NTH_VALUE(t.login_time,2) over (PARTITION by t.user_name order by t.buy_time) as n_v
from user_order t;
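The NTILE and NTH_VALUE queries above can be sketched on the same user_order data with Python's sqlite3 module, with SQLite (3.25 or later assumed) standing in for MySQL. jack's three purchases are split into buckets 1, 1, 2, and each two-purchase user gets buckets 1 and 2. NTH_VALUE(login_time, 2) is NULL on each user's first row, because the default frame still contains only that one row.

```python
import sqlite3

# user_name, buy_time, login_time from the user_order rows above.
ORDERS = [
    ("jack", "2021-11-09 20:07:25", "2021-11-21 20:00:36"),
    ("jack", "2021-10-09 21:17:56", "2021-11-23 20:13:15"),
    ("lucy", "2021-08-09 10:07:24", "2021-11-09 20:27:06"),
    ("lucy", "2021-09-09 09:07:33", "2021-05-09 03:07:44"),
    ("mark", "2020-08-08 08:07:28", "2019-06-09 02:19:16"),
    ("mark", "2017-07-07 07:07:18", "2016-05-09 02:10:16"),
    ("jack", "2010-11-09 16:17:15", "2018-10-09 10:17:55"),
]

QUERY = """
SELECT user_name, buy_time,
       -- bucket number when each user's rows are split into 2 buckets
       NTILE(2) OVER (PARTITION BY user_name ORDER BY buy_time) AS n_e,
       -- login_time of the 2nd row in the frame (NULL while the frame has 1 row)
       NTH_VALUE(login_time, 2) OVER (PARTITION BY user_name ORDER BY buy_time) AS n_v
FROM user_order
ORDER BY user_name, buy_time
"""


def run() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_order (user_name TEXT, buy_time TEXT, login_time TEXT)")
    conn.executemany("INSERT INTO user_order VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```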
-- Distribution functions
-- 1. PERCENT_RANK() = (rank - 1) / (number of rows in the partition - 1)
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
PERCENT_RANK() over (PARTITION by t.user_name order by t.buy_time) as p_r
from user_order t;
-- 2. CUME_DIST() = (number of rows whose value is <= the current row's value) / (number of rows in the partition)
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
CUME_DIST() over (PARTITION by t.user_name order by t.buy_time) as c_d
from user_order t;
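The user_order partitions have no ties, which hides how the two distribution functions differ. This sketch uses a small hypothetical scores table that contains a tie, runs both functions in SQLite through Python's sqlite3 module (a stand-in for MySQL), and recomputes them from the formulas above as a cross-check. Tied rows share a RANK, so they share a PERCENT_RANK, and CUME_DIST counts every row up to and including the last tie.

```python
import sqlite3

# Hypothetical data with a tie at 20.
SCORES = [("s1", 10), ("s2", 20), ("s3", 20), ("s4", 30)]


def run() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany("INSERT INTO scores VALUES (?, ?)", SCORES)
    rows = conn.execute("""
        SELECT name, score,
               RANK()         OVER (ORDER BY score) AS rk,
               PERCENT_RANK() OVER (ORDER BY score) AS p_r,
               CUME_DIST()    OVER (ORDER BY score) AS c_d
        FROM scores
        ORDER BY score, name
    """).fetchall()
    conn.close()
    return rows


def by_formula(rows: list) -> list:
    """Recompute (percent_rank, cume_dist) from their definitions."""
    n = len(rows)
    out = []
    for _name, score, rk, _p_r, _c_d in rows:
        p_r = (rk - 1) / (n - 1)                           # (rank - 1) / (rows - 1)
        c_d = sum(1 for r in rows if r[1] <= score) / n    # rows <= current / rows
        out.append((p_r, c_d))
    return out


if __name__ == "__main__":
    for row, expected in zip(run(), by_formula(run())):
        print(row, expected)
```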
-- First and last value functions
-- FIRST_VALUE(expr): the value of expr in the first row of the window frame
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
FIRST_VALUE(t.login_time) over (PARTITION by t.user_name order by t.buy_time) as f_v
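from user_order t;
-- LAST_VALUE(expr): when the window is ordered by buy_time or login_time, LAST_VALUE seems broken and just returns the current row's value.
-- This is not a bug: with ORDER BY and no frame clause, the frame ends at the current row, so the current row is the last row of the frame.
-- Declaring the frame explicitly as the whole partition returns the real last value.
-- (Ordering by user_name instead also widens the frame, because every row in the partition ties, but then which row counts as last is undefined.)
select *,
ROW_NUMBER() over (PARTITION by t.user_name order by t.buy_time) as r_n,
-- LAST_VALUE(expr) over the whole partition
LAST_VALUE(t.login_time) over (PARTITION by t.user_name ORDER BY t.buy_time
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as L_v
from user_order t;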
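The LAST_VALUE behaviour comes from the default window frame, and a side-by-side run makes this visible. This sketch uses Python's sqlite3 module, with SQLite standing in for MySQL (both use the same default frame when ORDER BY is present). l_default is always the current row's own login_time, while l_full, which uses an explicit whole-partition frame, is the login_time of each user's latest purchase.

```python
import sqlite3

# user_name, buy_time, login_time from the user_order rows above.
ORDERS = [
    ("jack", "2021-11-09 20:07:25", "2021-11-21 20:00:36"),
    ("jack", "2021-10-09 21:17:56", "2021-11-23 20:13:15"),
    ("lucy", "2021-08-09 10:07:24", "2021-11-09 20:27:06"),
    ("lucy", "2021-09-09 09:07:33", "2021-05-09 03:07:44"),
    ("mark", "2020-08-08 08:07:28", "2019-06-09 02:19:16"),
    ("mark", "2017-07-07 07:07:18", "2016-05-09 02:10:16"),
    ("jack", "2010-11-09 16:17:15", "2018-10-09 10:17:55"),
]

QUERY = """
SELECT user_name, buy_time, login_time,
       FIRST_VALUE(login_time) OVER (PARTITION BY user_name ORDER BY buy_time) AS f_v,
       -- default frame (partition start .. current row): with no ties in
       -- buy_time this is just the current row's login_time
       LAST_VALUE(login_time) OVER (PARTITION BY user_name ORDER BY buy_time) AS l_default,
       -- explicit frame covering the whole partition
       LAST_VALUE(login_time) OVER (
           PARTITION BY user_name ORDER BY buy_time
           ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS l_full
FROM user_order
ORDER BY user_name, buy_time
"""


def run() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_order (user_name TEXT, buy_time TEXT, login_time TEXT)")
    conn.executemany("INSERT INTO user_order VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run():
        print(row)
```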

15. Running sum of a column: at which row is the cumulative total closest to 100?

-- Running total of book_id in book_id order (the VARCHAR ids '001'...'006' are cast to numbers);
-- sort the rows by their distance from 100 and keep the closest one.
select *
from (
select t.*,
sum(cast(t.book_id as unsigned)) over (ORDER BY t.book_id) as s_o
from book t) ta
order by abs(s_o - 100), book_id
limit 1;
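The idea of picking the row whose running total is closest to a target can be tried with Python's sqlite3 module, with SQLite standing in for MySQL. Summing book_id only reaches 21 for the sample data, so, as an assumption made to get totals around 100, this sketch applies the same pattern to the price column of the book table. The running totals in book_id order are 20, 35, 64, 103, 143, 198, so the closest to 100 is book 004 at 103.

```python
import sqlite3

# book_id and price from the book table above (prices stored as numbers).
BOOKS = [("001", 20), ("002", 15), ("003", 29), ("004", 39), ("005", 40), ("006", 55)]

QUERY = """
SELECT book_id, s_o
FROM (
  SELECT book_id, SUM(price) OVER (ORDER BY book_id) AS s_o  -- running total
  FROM book
) AS ta
ORDER BY ABS(s_o - ?), book_id  -- closest first; book_id breaks ties
LIMIT 1
"""


def closest_running_total(target: float) -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (book_id TEXT, price REAL)")
    conn.executemany("INSERT INTO book VALUES (?, ?)", BOOKS)
    row = conn.execute(QUERY, (target,)).fetchone()
    conn.close()
    return row


if __name__ == "__main__":
    print(closest_running_total(100))
```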
