Learning ElasticSearch Aggregation Syntax (bucket, metric, histogram, date histogram)
Contents
- The two core concepts: bucket and metric
- Inserting test data
- Which TV color sells best
- Average TV price per color
- Nested buckets: drilling down by color + brand
- Max, min, and total TV price per color
- histogram: TV sales count and revenue by price range
- date histogram: TV sales per month
- Drill-down analysis: revenue per brand per quarter
1. The two core concepts: bucket and metric

- bucket: a group of data

city name
北京 小李
北京 小王
上海 小张
上海 小丽
上海 小陈

Dividing buckets by city yields two buckets: a 北京 bucket and a 上海 bucket.

- 北京 bucket: contains 2 people, 小李 and 小王
- 上海 bucket: contains 3 people, 小张, 小丽, and 小陈

When you bucket by a field, all documents sharing the same value for that field land in the same bucket. If you know some MySQL SQL: aggregation starts with grouping, then analyzes the data within each group. That grouping step is our bucket.

- metric: a statistic computed over one group of data

Once we have a set of buckets, we can run aggregation analysis on each one: count the documents in a bucket, or compute the average, maximum, or minimum of a field within it. A metric is an aggregation operation executed per bucket, such as avg, max, or min.

select count(*) from access_log group by user_id

- bucket: group by user_id --> documents with the same user_id fall into the same bucket
- metric: count(*), counting the documents in each user_id bucket
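The bucket/metric split above can be sketched in plain Python (a conceptual illustration over the city/name table, not es code; all variable names here are my own):

```python
# bucket = group rows by a field value; metric = a statistic per group.
from collections import defaultdict

people = [
    {"city": "北京", "name": "小李"},
    {"city": "北京", "name": "小王"},
    {"city": "上海", "name": "小张"},
    {"city": "上海", "name": "小丽"},
    {"city": "上海", "name": "小陈"},
]

# bucket step: rows with the same city value land in the same bucket
buckets = defaultdict(list)
for p in people:
    buckets[p["city"]].append(p["name"])

# metric step: count(*) per bucket, like the group-by-user_id SQL above
counts = {city: len(names) for city, names in buckets.items()}
print(counts)  # {'北京': 2, '上海': 3}
```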
2. Inserting test data
PUT /tvs
{"mappings": {"sales": {"properties": {"price": {"type": "long"},"color": {"type": "keyword"},"brand": {"type": "keyword"},"sold_date": {"type": "date"}}}}
}
POST /tvs/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2016-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2016-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2016-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2017-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2017-02-12" }
3. Which TV color sells best
GET /tvs/sales/_search
{"size" : 0,"aggs" : { "popular_colors" : { "terms" : { "field" : "color"}}}
}
- size: 0 means return only the aggregation results, not the original documents the aggregation ran over
- aggs: fixed syntax introducing a grouped aggregation over the data
- popular_colors: every aggregation needs a name; it is arbitrary, pick whatever you like
- terms: group by the values of a field
- field: the field whose values define the groups
- Equivalent SQL: select color, count(*) from tbl group by color;
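To see what the terms aggregation computes, here is the same count done by hand in Python over the eight documents inserted in section 2 (a sketch, not es code):

```python
from collections import Counter

# the color field of the eight sample documents, in insertion order
colors = ["红色", "红色", "绿色", "蓝色", "绿色", "红色", "红色", "蓝色"]

# terms-by-color is just: select color, count(*) from tbl group by color
popular_colors = Counter(colors)
print(popular_colors.most_common())
# [('红色', 4), ('绿色', 2), ('蓝色', 2)]
```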
4. Average TV price per color
GET /tvs/sales/_search
{"size" : 0,"aggs": {"colors": {"terms": {"field": "color"},"aggs": { "avg_price": { "avg": {"field": "price" }}}}}
}
Bucketing by color gives the document count of each color bucket, but that alone is just a bucket operation: doc_count is merely the built-in metric that an es bucket operation computes by default.

Beyond the bucket (grouping) step, we also run a metric aggregation on each bucket. Inside the aggs that performs the bucket operation (terms), at the same JSON level as terms, add a second aggs; inside it, again give the aggregation a name and run a metric operation, avg, which averages the specified field (price) over the documents of each bucket.
"aggs": { "avg_price": { "avg": {"field": "price" }}
}
A metric is an operation executed on every bucket produced by the grouping step. Our first metric is avg, the average of the specified field.
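Computing the same bucket + avg metric by hand over the sample documents shows the numbers es would return (a plain-Python sketch; names are mine):

```python
from collections import defaultdict

# (color, price) of the eight sample documents from section 2
docs = [
    ("红色", 1000), ("红色", 2000), ("绿色", 3000), ("蓝色", 1500),
    ("绿色", 1200), ("红色", 2000), ("红色", 8000), ("蓝色", 2500),
]

# bucket step: group prices by color
prices = defaultdict(list)
for color, price in docs:
    prices[color].append(price)

# metric step: avg per bucket
avg_price = {c: sum(p) / len(p) for c, p in prices.items()}
print(avg_price)  # {'红色': 3250.0, '绿色': 2100.0, '蓝色': 2000.0}
```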
5. Nested buckets: drilling down by color + brand

Drill down from color to brand: first the average price per color, then the average price per brand within each color. We can drill down through multiple levels.

For example, there are 4 red TVs, and of those 4, 3 are 长虹 and 1 is 三星:

- What is the average price of the 3 长虹 TVs among the red ones?
- What is the average price of the 1 三星 TV among the red ones?

Drilling down means: having already grouped once (say by color), we group again within each group (say by brand within each color), and finally run the aggregation on each finest-grained group. That is drill-down analysis. In es, it means nesting buckets: grouping multiple times. We can drill down along multiple dimensions (color, then color + brand), and at each drill-down level we can attach its own metric aggregation.
GET /tvs/sales/_search
{"size": 0,"aggs": {"group_by_color": {"terms": {"field": "color"},"aggs": {"color_avg_price": {"avg": {"field": "price"}},"group_by_brand": {"terms": {"field": "brand"},"aggs": {"brand_avg_price": {"avg": {"field": "price"}}}}}}}
}
{"took": 8,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_color": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "红色","doc_count": 4,"color_avg_price": {"value": 3250},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "长虹","doc_count": 3,"brand_avg_price": {"value": 1666.6666666666667}},{"key": "三星","doc_count": 1,"brand_avg_price": {"value": 8000}}]}},{"key": "绿色","doc_count": 2,"color_avg_price": {"value": 2100},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 1,"brand_avg_price": {"value": 1200}},{"key": "小米","doc_count": 1,"brand_avg_price": {"value": 3000}}]}},{"key": "蓝色","doc_count": 2,"color_avg_price": {"value": 2000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 1,"brand_avg_price": {"value": 1500}},{"key": "小米","doc_count": 1,"brand_avg_price": {"value": 2500}}]}}]}}
}
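The nested drill-down above can be mimicked in plain Python (a conceptual sketch over the sample documents; the variable names are mine, not es API):

```python
from collections import defaultdict

# (color, brand, price) of the eight sample documents from section 2
docs = [
    ("红色", "长虹", 1000), ("红色", "长虹", 2000), ("绿色", "小米", 3000),
    ("蓝色", "TCL", 1500), ("绿色", "TCL", 1200), ("红色", "长虹", 2000),
    ("红色", "三星", 8000), ("蓝色", "小米", 2500),
]

# outer bucket: color; inner bucket: brand
nested = defaultdict(lambda: defaultdict(list))
for color, brand, price in docs:
    nested[color][brand].append(price)

# metric on the finest-grained bucket: avg price per (color, brand)
brand_avg = {
    color: {brand: sum(p) / len(p) for brand, p in brands.items()}
    for color, brands in nested.items()
}
print(brand_avg["红色"])  # {'长虹': 1666.6666666666667, '三星': 8000.0}
```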
6. Max, min, and total TV price per color
- count: a terms bucket automatically carries doc_count, which is effectively count
- avg: the avg aggregation, averaging a field within each bucket
- max: the maximum value of the specified field within a bucket
- min: the minimum value of the specified field within a bucket
- sum: the sum of the specified field within a bucket

Roughly 90% of everyday data-analysis metrics boil down to count, avg, max, min, and sum.
GET /tvs/sales/_search
{"size" : 0,"aggs": {"group_by_color": {"terms": {"field": "color"},"aggs": {"avg_price": { "avg": { "field": "price" } },"min_price" : { "min": { "field": "price"} }, "max_price" : { "max": { "field": "price"} },"sum_price" : { "sum": { "field": "price" } } }}}
}
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_color": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "红色","doc_count": 4,"max_price": {"value": 8000},"min_price": {"value": 1000},"avg_price": {"value": 3250},"sum_price": {"value": 13000}},{"key": "绿色","doc_count": 2,"max_price": {"value": 3000},"min_price": {"value": 1200},"avg_price": {"value": 2100},"sum_price": {"value": 4200}},{"key": "蓝色","doc_count": 2,"max_price": {"value": 2500},"min_price": {"value": 1500},"avg_price": {"value": 2000},"sum_price": {"value": 4000}}]}}
}
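The four metrics from the query can be recomputed by hand to check against the response above (a plain-Python sketch; names are mine):

```python
from collections import defaultdict

# (color, price) of the eight sample documents from section 2
docs = [("红色", 1000), ("红色", 2000), ("绿色", 3000), ("蓝色", 1500),
        ("绿色", 1200), ("红色", 2000), ("红色", 8000), ("蓝色", 2500)]

buckets = defaultdict(list)
for color, price in docs:
    buckets[color].append(price)

# avg, min, max, sum computed per color bucket, as in the query
stats = {c: {"avg": sum(p) / len(p), "min": min(p), "max": max(p), "sum": sum(p)}
         for c, p in buckets.items()}
print(stats["红色"])  # {'avg': 3250.0, 'min': 1000, 'max': 8000, 'sum': 13000}
```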
7. histogram: TV sales count and revenue by price range
- histogram: like terms, a bucket operation; it takes a field and groups documents into buckets by value ranges of that field
"histogram":{ "field": "price","interval": 2000
}
interval: 2000 splits the price axis into ranges: 0-2000, 2000-4000, 4000-6000, 6000-8000, 8000-10000. Each document's price determines its bucket: a price of 2500, for instance, falls in 2000-4000, so the document goes into the bucket for that range. (Compare terms, which puts documents with identical field values into one bucket.)

Once the buckets exist, we can, as before, run avg, count, sum, max, min, and other metric aggregations on each of them.
GET /tvs/sales/_search
{"size" : 0,"aggs":{"group_by_price":{"histogram":{ "field": "price","interval": 2000},"aggs":{"sum_price": {"sum": { "field" : "price"}}}}}
}
{"took": 13,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_price": {"buckets": [{"key": 0,"doc_count": 3,"sum_price": {"value": 3700}},{"key": 2000,"doc_count": 4,"sum_price": {"value": 9500}},{"key": 4000,"doc_count": 0,"sum_price": {"value": 0}},{"key": 6000,"doc_count": 0,"sum_price": {"value": 0}},{"key": 8000,"doc_count": 1,"sum_price": {"value": 8000}}]}}
}
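The histogram bucketing rule can be sketched in plain Python: the bucket key is the price rounded down to a multiple of the interval. Note that, unlike es, this sketch does not emit the empty 4000 and 6000 buckets (names are mine):

```python
from collections import defaultdict

# prices of the eight sample documents from section 2
prices = [1000, 2000, 3000, 1500, 1200, 2000, 8000, 2500]

interval = 2000
hist = defaultdict(lambda: {"doc_count": 0, "sum_price": 0})
for price in prices:
    key = (price // interval) * interval  # e.g. 2500 -> 2000 (the 2000-4000 bucket)
    hist[key]["doc_count"] += 1
    hist[key]["sum_price"] += price

print(dict(sorted(hist.items())))
# {0: {'doc_count': 3, 'sum_price': 3700},
#  2000: {'doc_count': 4, 'sum_price': 9500},
#  8000: {'doc_count': 1, 'sum_price': 8000}}
```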
8. date histogram: TV sales per month
date histogram buckets documents by a date-typed field and a date interval.

With a date interval of 1 month:

- 2017-01-01~2017-01-31 is one bucket
- 2017-02-01~2017-02-28 is one bucket

es then scans each document's date field and puts the document into the bucket whose range contains the date; 2017-01-05, for example, lands in the 2017-01-01~2017-01-31 bucket.

- min_doc_count: 0 means return an interval (e.g. 2017-01-01~2017-01-31) even if it contains no documents; by default, empty intervals are filtered out
- extended_bounds (min, max): constrain the generated buckets to this start date and end date
GET /tvs/sales/_search
{"size" : 0,"aggs": {"group_by_sold_date": {"date_histogram": {"field": "sold_date","interval": "month", "format": "yyyy-MM-dd","min_doc_count" : 0, "extended_bounds" : { "min" : "2016-01-01","max" : "2017-12-31"}}}}
}
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_sold_date": {"buckets": [{"key_as_string": "2016-01-01","key": 1451606400000,"doc_count": 0},{"key_as_string": "2016-02-01","key": 1454284800000,"doc_count": 0},{"key_as_string": "2016-03-01","key": 1456790400000,"doc_count": 0},{"key_as_string": "2016-04-01","key": 1459468800000,"doc_count": 0},{"key_as_string": "2016-05-01","key": 1462060800000,"doc_count": 1},{"key_as_string": "2016-06-01","key": 1464739200000,"doc_count": 0},{"key_as_string": "2016-07-01","key": 1467331200000,"doc_count": 1},{"key_as_string": "2016-08-01","key": 1470009600000,"doc_count": 1},{"key_as_string": "2016-09-01","key": 1472688000000,"doc_count": 0},{"key_as_string": "2016-10-01","key": 1475280000000,"doc_count": 1},{"key_as_string": "2016-11-01","key": 1477958400000,"doc_count": 2},{"key_as_string": "2016-12-01","key": 1480550400000,"doc_count": 0},{"key_as_string": "2017-01-01","key": 1483228800000,"doc_count": 1},{"key_as_string": "2017-02-01","key": 1485907200000,"doc_count": 1}]}}
}
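The monthly bucketing can be checked by hand: truncating each sold_date to its month gives the same per-month counts as the non-empty buckets above (a plain-Python sketch; unlike the es query with min_doc_count 0, empty months are simply absent here):

```python
from collections import Counter

# sold_date of the eight sample documents from section 2
sold_dates = ["2016-10-28", "2016-11-05", "2016-05-18", "2016-07-02",
              "2016-08-19", "2016-11-05", "2017-01-01", "2017-02-12"]

# month key = the date truncated to yyyy-MM, like interval "month"
by_month = Counter(d[:7] for d in sold_dates)
print(sorted(by_month.items()))
# [('2016-05', 1), ('2016-07', 1), ('2016-08', 1), ('2016-10', 1),
#  ('2016-11', 2), ('2017-01', 1), ('2017-02', 1)]
```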
9. Drill-down analysis: revenue per brand per quarter
GET /tvs/sales/_search
{"size": 0,"aggs": {"group_by_sold_date": {"date_histogram": {"field": "sold_date","interval": "quarter","format": "yyyy-MM-dd","min_doc_count": 0,"extended_bounds": {"min": "2016-01-01","max": "2017-12-31"}},"aggs": {"group_by_brand": {"terms": {"field": "brand"},"aggs": {"sum_price": {"sum": {"field": "price"}}}},"total_sum_price": {"sum": {"field": "price"}}}}}
}
{"took": 10,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_sold_date": {"buckets": [{"key_as_string": "2016-01-01","key": 1451606400000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2016-04-01","key": 1459468800000,"doc_count": 1,"total_sum_price": {"value": 3000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "小米","doc_count": 1,"sum_price": {"value": 3000}}]}},{"key_as_string": "2016-07-01","key": 1467331200000,"doc_count": 2,"total_sum_price": {"value": 2700},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 2,"sum_price": {"value": 2700}}]}},{"key_as_string": "2016-10-01","key": 1475280000000,"doc_count": 3,"total_sum_price": {"value": 5000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "长虹","doc_count": 3,"sum_price": {"value": 5000}}]}},{"key_as_string": "2017-01-01","key": 1483228800000,"doc_count": 2,"total_sum_price": {"value": 10500},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "三星","doc_count": 1,"sum_price": {"value": 8000}},{"key": "小米","doc_count": 1,"sum_price": {"value": 2500}}]}},{"key_as_string": "2017-04-01","key": 1491004800000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2017-07-01","key": 1498867200000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2017-10-01","key": 1506816000000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 
0,"buckets": []}}]}}
}
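The quarterly drill-down combines everything so far: a date histogram bucket, a nested terms bucket, and sum metrics at both levels. A plain-Python sketch over the sample documents (the quarter_key helper and all names are mine, not es API):

```python
from collections import defaultdict

# (sold_date, brand, price) of the eight sample documents from section 2
docs = [
    ("2016-10-28", "长虹", 1000), ("2016-11-05", "长虹", 2000),
    ("2016-05-18", "小米", 3000), ("2016-07-02", "TCL", 1500),
    ("2016-08-19", "TCL", 1200), ("2016-11-05", "长虹", 2000),
    ("2017-01-01", "三星", 8000), ("2017-02-12", "小米", 2500),
]

def quarter_key(date):
    """First day of the quarter, like the es quarter-interval bucket key."""
    year, month = int(date[:4]), int(date[5:7])
    return f"{year}-{(month - 1) // 3 * 3 + 1:02d}-01"

per_brand = defaultdict(lambda: defaultdict(int))  # quarter -> brand -> revenue
total = defaultdict(int)                           # quarter -> total revenue
for date, brand, price in docs:
    q = quarter_key(date)
    per_brand[q][brand] += price
    total[q] += price

print(total["2017-01-01"], dict(per_brand["2017-01-01"]))
# 10500 {'三星': 8000, '小米': 2500}
```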