Preparation

MySQL table: test_max_date(id int, name varchar(255), num int, date date)
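
For reproducibility, a minimal MySQL setup matching that schema might look like the sketch below (the num values are placeholders; none of the queries here read them):

-- Hypothetical MySQL setup mirroring the Hive sample data; num is filler.
create table test_max_date (
    id   int primary key,
    name varchar(255),
    num  int,
    date date
);

insert into test_max_date (id, name, num, date) values
    (1, '1', 0, '2020-12-25'),
    (2, '1', 0, '2020-12-28'),
    (3, '2', 0, '2020-12-25'),
    (4, '2', 0, '2020-12-20');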

Hive table: create table test_date_max(id int, name string, rq date);

insert into table test_date_max values
(1,"1","2020-12-25"),
(2,"1","2020-12-28"),
(3,"2","2020-12-25"),
(4,"2","2020-12-20")
;

Requirement

Query each person's latest status.

Logic

Each person has multiple rows; the later the date, the newer the status.

Computation process

MySQL:

SELECT id,name,date,max(date) from test_max_date group by name ORDER BY id
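
MySQL only accepts this query because ONLY_FULL_GROUP_BY is off (the default before 5.7); the id and date values it shows for each group are picked arbitrarily and are not guaranteed to come from the max(date) row. A deterministic MySQL version uses the same join pattern as Approach 1 below:

-- Join each name's max date back onto the table to get the full latest row.
select a.*
from test_max_date a
join (select name, max(date) as date from test_max_date group by name) b
  on a.name = b.name and a.date = b.date
order by a.id;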

Hive:

select name,max(rq) from  test_date_max group by name;

A note on the error message: the Hive GROUP BY restriction was discussed in an earlier post.

The Hive table here has an id, a name, and a date (rq). id is the primary key and never repeats; name can repeat. Grouping by name and taking max(rq) effectively deduplicates name, returning the largest date within each group of identical name values.

It is like a company split into departments: the departments are fixed, so to find the oldest person in each department, you group the company-wide roster by department and take the max of age.

In Hive, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function.
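
For example, running the MySQL-style statement in Hive fails at semantic analysis; the error looks roughly like this (exact wording varies by Hive version):

select id,name,rq,max(rq) from test_date_max group by name;
-- FAILED: SemanticException [Error 10025]: ... Expression not in GROUP BY key 'id'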

If you need the complete rows, there are two approaches below (with the SQL, result data, and query time attached).

Approach 1:
select 
    a.* 
from 
    test_date_max a
    join 
    (select name,max(rq) as rq from  test_date_max group by name) b
    on a.rq = b.rq and a.name = b.name

a.id    a.name    a.rq
2    1    2020-12-28
3    2    2020-12-25
Time taken: 118.387 seconds, Fetched: 2 row(s)

Approach 2:
select
    *
from(
    select
        *,
        row_number()over(partition by name order by rq desc) rank
    from
        test_date_max
    )tmp
where rank=1

tmp.id    tmp.name    tmp.rq    tmp.rank
2    1    2020-12-28    1
3    2    2020-12-25    1
Time taken: 68.587 seconds, Fetched: 2 row(s)
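
One behavioral difference is worth noting: on a tie, the join in Approach 1 returns every row matching the max date, while row_number() in Approach 2 keeps exactly one row per name (swap in rank() if you want ties preserved). A hypothetical extra row demonstrates it:

-- With this row added, Approach 1 would return both id 2 and id 5 for name "1",
-- while Approach 2 would still return a single, arbitrarily chosen row.
insert into table test_date_max values (5,"1","2020-12-28");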

Log analysis — Approach 1 launches two MapReduce jobs plus a local map-join task; Approach 2 launches a single job
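
The stage breakdown can also be confirmed without running anything by prefixing either query with explain, which prints the plan Hive compiles:

-- Prints the stage DAG (e.g. a group-by stage feeding a join stage) instead of executing.
explain
select a.*
from test_date_max a
join (select name, max(rq) as rq from test_date_max group by name) b
  on a.rq = b.rq and a.name = b.name;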

Approach 1:

hive (test)> select
           >     a.*
           > from
           >     test_date_max a
           >     join
           >     (select name,max(rq) as rq from  test_date_max group by name) b
           >     on a.rq = b.rq and a.name = b.name
           > ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = admin_20210204130801_0f13ad17-7887-4a32-984d-088b5453617e
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1611888254670_2374, Tracking URL = http://hdp6.tydic.xian:8088/proxy/application_1611888254670_2374/
Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job  -kill job_1611888254670_2374
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2021-02-04 13:08:27,084 Stage-2 map = 0%,  reduce = 0%
2021-02-04 13:08:44,179 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 3.78 sec
2021-02-04 13:08:59,776 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 7.68 sec
MapReduce Total cumulative CPU time: 7 seconds 680 msec
Ended Job = job_1611888254670_2374
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.5.0-292/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2021-02-04 13:09:08 Starting to launch local task to process map join;  maximum memory = 954728448
2021-02-04 13:09:09 Dump the side-table for tag: 0 with group count: 4 into file: file:/tmp/admin/a935af81-8bbe-4c2c-b2f5-d3bdaa816d9e/hive_2021-02-04_13-08-01_805_2036786040923555355-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile20--.hashtable
2021-02-04 13:09:09 Uploaded 1 File to: file:/tmp/admin/a935af81-8bbe-4c2c-b2f5-d3bdaa816d9e/hive_2021-02-04_13-08-01_805_2036786040923555355-1/-local-10005/HashTable-Stage-3/MapJoin-mapfile20--.hashtable (356 bytes)
2021-02-04 13:09:09 End of local task; Time Taken: 1.24 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 2 out of 2
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1611888254670_2377, Tracking URL = http://hdp6.tydic.xian:8088/proxy/application_1611888254670_2377/
Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job  -kill job_1611888254670_2377
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2021-02-04 13:09:33,954 Stage-3 map = 0%,  reduce = 0%
2021-02-04 13:09:59,112 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.37 sec
MapReduce Total cumulative CPU time: 3 seconds 370 msec
Ended Job = job_1611888254670_2377
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 7.68 sec   HDFS Read: 7794 HDFS Write: 140 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 3.37 sec   HDFS Read: 5282 HDFS Write: 141 SUCCESS
Total MapReduce CPU Time Spent: 11 seconds 50 msec
OK
a.id    a.name  a.rq
2   1   2020-12-28
3   2   2020-12-25
Time taken: 118.387 seconds, Fetched: 2 row(s)

Approach 2:

hive (test)> select
           >     *
           > from(
           >     select
           >         *,
           >         row_number()over(partition by name order by rq desc) rank
           >     from
           >         test_date_max
           >     )tmp
           > where rank=1
           > ;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = admin_20210204130834_f1469766-42c9-48cb-9194-2cb506a5ff6a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1611888254670_2376, Tracking URL = http://hdp6.tydic.xian:8088/proxy/application_1611888254670_2376/
Kill Command = /usr/hdp/2.6.5.0-292/hadoop/bin/hadoop job  -kill job_1611888254670_2376
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-02-04 13:09:07,610 Stage-1 map = 0%,  reduce = 0%
2021-02-04 13:09:25,459 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.71 sec
2021-02-04 13:09:42,161 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.97 sec
MapReduce Total cumulative CPU time: 7 seconds 970 msec
Ended Job = job_1611888254670_2376
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.97 sec   HDFS Read: 10327 HDFS Write: 145 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 970 msec
OK
tmp.id  tmp.name    tmp.rq  tmp.rank
2   1   2020-12-28  1
3   2   2020-12-25  1
Time taken: 68.587 seconds, Fetched: 2 row(s)

Job analysis

2021-02-04 16:38:57,881 Stage-1 map = 0%,  reduce = 0%
2021-02-04 16:39:13,646 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
2021-02-04 16:39:20,976 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.78 sec

Hive's default engine is MapReduce: the SQL is compiled into a MapReduce job, which runs in three phases: map, shuffle, and reduce. The map phase reads the input files; the shuffle phase merge-sorts the map output, spilling intermediate data to local disk; and the reduce phase reads the spilled shuffle files, performs the second-stage computation, and writes the result to disk. The log above reflects this progression: reading the input itself involves little computation, the shuffle's merge sort adds simple CPU work (visible in the CPU time reported when the map reaches 100%), and the reduce phase's reading, merging, and final aggregation accounts for the remaining CPU time.
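
Incidentally, the WARNING at the top of each log is a reminder that Hive-on-MR is deprecated in Hive 2. If Tez or Spark is deployed on the cluster (an assumption about the environment, though this HDP 2.6.5 stack typically ships Tez), switching engines is a one-line session setting:

-- Session-level engine switch; requires the chosen engine to be installed.
set hive.execution.engine=tez;    -- or: set hive.execution.engine=spark;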
