1. Test statements

Statement 1:

select count(distinct order_id), count(1) from d_common_wlt_info

Statement 2:

select count(order_id), count(one_num)
from
(
  select order_id, count(1) one_num
  from d_common_wlt_info
  group by order_id
) t
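To check how the two statements relate, here is a minimal sketch that runs both on a tiny in-memory table. It assumes SQLite (via Python's sqlite3) as a stand-in for Hive, and the seven sample rows are made up for illustration. The extra sum(one_num) column is not in the original statement; it is added to show which expression reproduces count(1) from statement 1.

```python
import sqlite3

# Hypothetical toy data standing in for d_common_wlt_info.
# SQLite is used only because it runs anywhere; it is not Hive.
conn = sqlite3.connect(":memory:")
conn.execute("create table d_common_wlt_info (order_id text)")
rows = [("a",), ("a",), ("b",), ("c",), ("c",), ("c",), (None,)]
conn.executemany("insert into d_common_wlt_info values (?)", rows)

# Statement 1: count(distinct ...) plus the total row count.
stmt1 = conn.execute(
    "select count(distinct order_id), count(1) from d_common_wlt_info"
).fetchone()

# Statement 2: group first, then count the groups.
# sum(one_num) is an added column (not in the original statement).
stmt2 = conn.execute(
    """
    select count(order_id), count(one_num), sum(one_num)
    from (
        select order_id, count(1) one_num
        from d_common_wlt_info
        group by order_id
    ) t
    """
).fetchone()

print(stmt1)  # (3, 7)
print(stmt2)  # (3, 4, 7)
```

In this sketch the first column matches in both statements. count(one_num) counts groups, including the group for NULL order_id, so it is not the total row count; sum(one_num) is what corresponds to count(1) in statement 1.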

2. Execution logs and table information

Table size information

Partition Parameters:

COLUMN_STATS_ACCURATE true

numFiles 9

numRows 28176219

rawDataSize 8785300782

totalSize 1820024671

transient_lastDdlTime 1551631895

(1) Log for statement 1

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

2019-03-04 13:10:24,719 Stage-1 map = 0%, reduce = 0%

2019-03-04 13:10:39,289 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 23.87 sec

2019-03-04 13:10:58,984 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 76.04 sec

2019-03-04 13:11:39,220 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 191.24 sec

2019-03-04 13:11:50,532 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 212.97 sec

2019-03-04 13:11:56,726 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 228.88 sec

2019-03-04 13:12:06,036 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 247.36 sec

2019-03-04 13:12:09,118 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 250.66 sec

2019-03-04 13:12:12,205 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 253.8 sec

2019-03-04 13:12:34,851 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 274.25 sec

2019-03-04 13:12:37,942 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 277.76 sec

2019-03-04 13:12:41,023 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 280.93 sec

2019-03-04 13:12:44,103 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 284.14 sec

2019-03-04 13:12:47,185 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 287.42 sec

2019-03-04 13:12:50,267 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 290.65 sec

2019-03-04 13:12:53,353 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 293.78 sec

2019-03-04 13:12:56,432 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 296.95 sec

2019-03-04 13:12:59,547 Stage-1 map = 100%, reduce = 86%, Cumulative CPU 300.13 sec

2019-03-04 13:13:02,639 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 303.33 sec

2019-03-04 13:13:05,728 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 306.51 sec

2019-03-04 13:13:08,837 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 309.67 sec

2019-03-04 13:13:11,917 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 312.88 sec

2019-03-04 13:13:14,996 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 315.99 sec

2019-03-04 13:13:17,052 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 318.29 sec

MapReduce Total cumulative CPU time: 5 minutes 18 seconds 290 msec

Ended Job = job_1546585330012_16354054

MapReduce Jobs Launched:

Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 318.29 sec HDFS Read: 805206007 HDFS Write: 68 SUCCESS

Total MapReduce CPU Time Spent: 5 minutes 18 seconds 290 msec

OK

21205531	27866943

Time taken: 218.373 seconds, Fetched: 1 row(s)

(2) Log for statement 2

Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1

2019-03-04 13:12:08,331 Stage-1 map = 0%, reduce = 0%

2019-03-04 13:12:19,715 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 6.48 sec

2019-03-04 13:12:42,875 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 69.34 sec

2019-03-04 13:12:43,906 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 72.54 sec

2019-03-04 13:13:16,847 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 162.12 sec

2019-03-04 13:13:19,930 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 168.67 sec

2019-03-04 13:13:23,060 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 174.94 sec

2019-03-04 13:13:32,311 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 192.25 sec

2019-03-04 13:13:35,397 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 195.41 sec

2019-03-04 13:13:54,977 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 204.97 sec

2019-03-04 13:13:58,058 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 210.04 sec

2019-03-04 13:14:01,135 Stage-1 map = 100%, reduce = 69%, Cumulative CPU 213.98 sec

2019-03-04 13:14:04,236 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 217.42 sec

2019-03-04 13:14:07,345 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 220.64 sec

2019-03-04 13:14:10,432 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 223.88 sec

2019-03-04 13:14:13,518 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 227.09 sec

2019-03-04 13:14:16,602 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 230.22 sec

2019-03-04 13:14:19,678 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 233.48 sec

2019-03-04 13:14:22,759 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 236.69 sec

2019-03-04 13:14:25,841 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 239.9 sec

2019-03-04 13:14:28,920 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 242.94 sec

2019-03-04 13:14:32,011 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 246.05 sec

2019-03-04 13:14:35,103 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 249.11 sec

2019-03-04 13:14:38,181 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 252.12 sec

2019-03-04 13:14:41,261 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 255.26 sec

2019-03-04 13:14:43,312 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 256.88 sec

MapReduce Total cumulative CPU time: 4 minutes 16 seconds 880 msec

Ended Job = job_1546585330012_16354401

Launching Job 2 out of 2

Number of reduce tasks determined at compile time: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1546585330012_16354794, Tracking URL = http://tjtx-81-187.58os.org:9088/proxy/application_1546585330012_16354794/

Kill Command = /usr/lib/software/hadoop/bin/hadoop job -kill job_1546585330012_16354794

Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1

2019-03-04 13:15:10,988 Stage-2 map = 0%, reduce = 0%

2019-03-04 13:15:21,338 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.15 sec

2019-03-04 13:15:30,634 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.56 sec

MapReduce Total cumulative CPU time: 4 seconds 560 msec

Ended Job = job_1546585330012_16354794

MapReduce Jobs Launched:

Stage-Stage-1: Map: 3 Reduce: 1 Cumulative CPU: 256.88 sec HDFS Read: 805205212 HDFS Write: 123 SUCCESS

Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 4.56 sec HDFS Read: 5330 HDFS Write: 68 SUCCESS

Total MapReduce CPU Time Spent: 4 minutes 21 seconds 440 msec

OK

21205531	21205532

Time taken: 225.271 seconds, Fetched: 1 row(s)

3. Summary

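On the test cluster, at a scale of roughly 20 million rows, the two statements showed little difference in efficiency: 218.373 seconds for statement 1 versus 225.271 seconds for statement 2.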
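The comparison can be made concrete with a small calculation. This is only a sketch; every figure in it is copied from the two job logs above, and the variable names are illustrative.

```python
# Figures copied from the two job logs above.
stmt1 = {"wall_sec": 218.373, "cpu_sec": [318.29]}        # one MapReduce stage
stmt2 = {"wall_sec": 225.271, "cpu_sec": [256.88, 4.56]}  # two MapReduce stages

# Total CPU time is the sum over all stages of a statement.
cpu1 = sum(stmt1["cpu_sec"])
cpu2 = sum(stmt2["cpu_sec"])

# Relative difference of statement 2 against statement 1, in percent.
wall_diff_pct = (stmt2["wall_sec"] - stmt1["wall_sec"]) / stmt1["wall_sec"] * 100
cpu_diff_pct = (cpu2 - cpu1) / cpu1 * 100

print(f"CPU time:   {cpu1:.2f}s vs {cpu2:.2f}s ({cpu_diff_pct:+.1f}%)")
print(f"Wall clock: {stmt1['wall_sec']:.3f}s vs {stmt2['wall_sec']:.3f}s ({wall_diff_pct:+.1f}%)")
```

With the logged figures, statement 2 takes roughly 3% more wall-clock time but roughly 18% less CPU time. Part of its wall-clock time is presumably spent launching the second job, since the log shows a gap between the end of Stage-1 and the start of Stage-2.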

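Why the caution? Because Hive implements distinct by deduplicating with a HashSet under the hood, so the task doing the deduplication has to hold every distinct value in memory.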
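The difference between the two strategies can be sketched in plain Python. This is an illustration of the idea only, not Hive's actual implementation; the function names, the sample data, and the `num_reducers` parameter are all hypothetical.

```python
def count_distinct_single_reducer(order_ids):
    """Funnel every key into one set, as a single deduplicating task would."""
    seen = set()
    for oid in order_ids:
        if oid is not None:  # count(distinct ...) ignores NULLs
            seen.add(oid)
    return len(seen)


def count_distinct_group_by(order_ids, num_reducers=4):
    """Hash-partition keys across reducers, dedupe per partition, then add up."""
    partitions = [set() for _ in range(num_reducers)]
    for oid in order_ids:
        if oid is not None:
            partitions[hash(oid) % num_reducers].add(oid)
    # Each key lands in exactly one partition, so the partial counts simply sum.
    return sum(len(p) for p in partitions)


order_ids = ["a", "a", "b", "c", "c", "c", None]
print(count_distinct_single_reducer(order_ids))  # 3
print(count_distinct_group_by(order_ids))        # 3
```

Both functions return the same count, but in the second no single set has to hold all the keys. In the logs above, both statements ran Stage-1 with a single reducer, which may be part of why the timings are so close on this data set.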

Since the two approaches perform about the same here, it is still better to avoid count(distinct) whenever you can.
