Learning ElasticSearch Aggregation Syntax


Contents

  1. The two core concepts: bucket and metric
  2. Inserting the test data
  3. Which TV color sells the most
  4. Average TV price for each color
  5. Nested buckets: multi-level drill-down by color + brand
  6. Max, min, and sum of TV prices for each color
  7. histogram: TV sales count and revenue by price range
  8. date histogram: TV sales per month
  9. Drill-down analysis: revenue per brand per quarter

1. The two core concepts: bucket and metric

  1. bucket: a group of data

city name
北京 小李
北京 小王
上海 小张
上海 小丽
上海 小陈

  2. Bucketing on city produces two buckets: a 北京 bucket and a 上海 bucket

    1. 北京 bucket: contains 2 people, 小李 and 小王
    2. 上海 bucket: contains 3 people, 小张, 小丽, and 小陈
  3. When you bucket by some field, all the documents that share the same value for that field end up in the same bucket

  4. If you know some MySQL: aggregation starts with grouping, and the aggregate analysis then runs over each group's data; the group is our bucket

  5. metric: a statistic computed over one bucket of data

  6. Once we have a set of buckets, we can run aggregate analysis on each bucket's data, for example counting all the data in a bucket, or computing the average, maximum, or minimum over all the data in a bucket

  7. A metric is some aggregate-analysis operation executed against a bucket, such as the average, the maximum, or the minimum

select count(*) from access_log group by user_id
  1. bucket: group by user_id --> rows with the same user_id are placed into the same bucket
  2. metric: count(*), which counts all the data in each user_id bucket
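A rough ES equivalent of that SQL, as a sketch only (access_log and user_id are the hypothetical index and field from the SQL example above; terms is the bucket aggregation introduced in section 3):

GET /access_log/_search
{
  "size": 0,
  "aggs": {
    "group_by_user_id": {
      "terms": { "field": "user_id" }
    }
  }
}

Here the terms bucket plays the role of group by user_id, and the doc_count that ES returns for every bucket plays the role of count(*).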

2. Inserting the test data

PUT /tvs
{
  "mappings": {
    "sales": {
      "properties": {
        "price": { "type": "long" },
        "color": { "type": "keyword" },
        "brand": { "type": "keyword" },
        "sold_date": { "type": "date" }
      }
    }
  }
}
POST /tvs/sales/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2016-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2016-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2016-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2016-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2017-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2017-02-12" }

3. Which TV color sells the most

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "popular_colors": {
      "terms": { "field": "color" }
    }
  }
}
  1. size: 0 means return only the aggregation results, not the raw documents the aggregation ran over
  2. aggs: fixed syntax for running a grouped aggregation over a set of data
  3. popular_colors: every aggs must be given a name; the name is arbitrary, pick whatever you like
  4. terms: group by the values of a field
  5. field: the field whose values are used for the grouping
  6. Equivalent to: select color, count(*) from tbl group by color;
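For the eight documents inserted above, the interesting part of the response looks roughly like this (abridged sketch; the counts agree with the full responses shown in later sections):

"aggregations": {
  "popular_colors": {
    "buckets": [
      { "key": "红色", "doc_count": 4 },
      { "key": "绿色", "doc_count": 2 },
      { "key": "蓝色", "doc_count": 2 }
    ]
  }
}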

4. Average TV price for each color

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "colors": {
      "terms": { "field": "color" },
      "aggs": {
        "avg_price": {
          "avg": { "field": "price" }
        }
      }
    }
  }
}
  1. Bucketing by color gives us the document count for each color bucket; on its own that is purely a bucket operation. doc_count is just a built-in metric that ES computes for every bucket operation by default

  2. Besides the bucket operation (the grouping), we also want to run a metric aggregation on each bucket

  3. Inside the aggs entry that performs the bucket operation (terms), at the same JSON level as the terms clause, add a second aggs; inside this second aggs, again pick a name and run a metric operation, avg, which averages the specified field (the price field) over the data in each of the earlier buckets

"aggs": { "avg_price": { "avg": {"field": "price" }}
}
  1. A metric is an operation that, after the bucketing step, is executed once for every bucket

  2. Our first metric: avg, the average of a specified field
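For our data, each color bucket now carries an avg_price metric next to the built-in doc_count, roughly (abridged sketch; the values agree with the full response shown in the next section):

"buckets": [
  { "key": "红色", "doc_count": 4, "avg_price": { "value": 3250 } },
  { "key": "绿色", "doc_count": 2, "avg_price": { "value": 2100 } },
  { "key": "蓝色", "doc_count": 2, "avg_price": { "value": 2000 } }
]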


5. Nested buckets: multi-level drill-down by color + brand

  1. Drill down from color to brand: the average price for each color, and then the average price for each brand within each color

  2. We can drill down through multiple levels

  3. For example, there are currently 4 red TVs, and of those 4, 3 belong to 长虹 and 1 belongs to 三星

    1. What is the average price of the 3 长虹 TVs among the red ones?
    2. What is the average price of the 1 三星 TV among the red ones?
  4. Drilling down means: we already have one grouping, say by color, and we then group the data inside each of those groups again, e.g. one color can be split further into several brand groups; finally we run the aggregate analysis on each finest-grained group. That is drill-down analysis

  5. In ES, drill-down analysis means nesting buckets in multiple levels, i.e. grouping repeatedly

  6. We drill down along multiple dimensions (color + brand), and at each drill-down level (color, then color + brand) we can run a separate metric aggregation
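In rough SQL terms this is approximately the following, except that ES additionally returns the intermediate per-color average, which a plain GROUP BY does not:

select color, brand, avg(price) from tvs group by color, brand;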

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": { "field": "color" },
      "aggs": {
        "color_avg_price": {
          "avg": { "field": "price" }
        },
        "group_by_brand": {
          "terms": { "field": "brand" },
          "aggs": {
            "brand_avg_price": {
              "avg": { "field": "price" }
            }
          }
        }
      }
    }
  }
}
{"took": 8,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_color": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "红色","doc_count": 4,"color_avg_price": {"value": 3250},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "长虹","doc_count": 3,"brand_avg_price": {"value": 1666.6666666666667}},{"key": "三星","doc_count": 1,"brand_avg_price": {"value": 8000}}]}},{"key": "绿色","doc_count": 2,"color_avg_price": {"value": 2100},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 1,"brand_avg_price": {"value": 1200}},{"key": "小米","doc_count": 1,"brand_avg_price": {"value": 3000}}]}},{"key": "蓝色","doc_count": 2,"color_avg_price": {"value": 2000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 1,"brand_avg_price": {"value": 1500}},{"key": "小米","doc_count": 1,"brand_avg_price": {"value": 2500}}]}}]}}
}

6. Max, min, and sum of TV prices for each color

  1. count: the terms bucket automatically comes with a doc_count, which effectively is the count

  2. avg: the avg aggregation computes the average value

  3. max: the maximum value of the specified field within a bucket

  4. min: the minimum value of the specified field within a bucket

  5. sum: the sum of the specified field's values within a bucket

  6. Generally speaking, some 90% of everyday data-analysis metrics come down to count, avg, max, min, and sum (see the SQL analogy below)
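In rough SQL terms, the query below corresponds approximately to:

select color, count(*), avg(price), min(price), max(price), sum(price) from tvs group by color;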

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_color": {
      "terms": { "field": "color" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } },
        "min_price": { "min": { "field": "price" } },
        "max_price": { "max": { "field": "price" } },
        "sum_price": { "sum": { "field": "price" } }
      }
    }
  }
}
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_color": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "红色","doc_count": 4,"max_price": {"value": 8000},"min_price": {"value": 1000},"avg_price": {"value": 3250},"sum_price": {"value": 13000}},{"key": "绿色","doc_count": 2,"max_price": {"value": 3000},"min_price": {"value":}, 1200"avg_price": {"value": 2100},"sum_price": {"value": 4200}},{"key": "蓝色","doc_count": 2,"max_price": {"value": 2500},"min_price": {"value": 1500},"avg_price": {"value": 2000},"sum_price": {"value": 4000}}]}}
}

7. histogram: TV sales count and revenue by price range

  1. histogram: similar to terms, it is also a bucketing operation; it takes a field and groups documents into buckets by ranges of that field's values

"histogram": {
  "field": "price",
  "interval": 2000
}
  1. interval: 2000 divides the values into the ranges 0-2000, 2000-4000, 4000-6000, 6000-8000, 8000-10000; each document's price, say 2500, is checked against these ranges and the document is placed into the matching bucket, here 2000-4000 (see the note on bucket keys after this list)

  2. Compare this with the terms way of bucketing, which puts documents with identical field values into the same bucket

  3. Once the buckets exist, we can, as before, run any metric (avg, count, sum, max, min, and so on) on each bucket for the aggregate analysis
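One note on how the ranges are keyed: each histogram bucket is labeled by its lower bound, conceptually key = floor(price / interval) * interval, so a price of 2500 with interval 2000 lands in the bucket with key 2000 — exactly the keys 0, 2000, 4000, ... visible in the response below.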

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_price": {
      "histogram": {
        "field": "price",
        "interval": 2000
      },
      "aggs": {
        "sum_price": {
          "sum": { "field": "price" }
        }
      }
    }
  }
}
{"took": 13,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_price": {"buckets": [{"key": 0,"doc_count": 3,"sum_price": {"value": 3700}},{"key": 2000,"doc_count": 4,"sum_price": {"value": 9500}},{"key": 4000,"doc_count": 0,"sum_price": {"value": 0}},{"key": 6000,"doc_count: {"value":": 0,"sum_price" 0}},{"key": 8000,"doc_count": 1,"sum_price": {"value": 8000}}]}}
}

8. date histogram: TV sales per month

  1. date histogram: buckets documents by a date-typed field of our choosing, at a given date interval

  2. With a date interval of one month (interval = "month"):

    1. 2017-01-01~2017-01-31 is one bucket
    2. 2017-02-01~2017-02-28 is one bucket
  3. ES then scans each document's date field, determines which bucket the date falls into, and places the document into that bucket

  4. e.g. 2017-01-05 goes into the 2017-01-01~2017-01-31 bucket

  5. min_doc_count: set to 0, it means that even if a date interval such as 2017-01-01~2017-01-31 contains no data at all, the interval is still returned; by default, empty intervals are filtered out

  6. extended_bounds (min, max): constrains the generated buckets to lie between this start date and end date

GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_sold_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "month",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      }
    }
  }
}
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_sold_date": {"buckets": [{"key_as_string": "2016-01-01","key": 1451606400000,"doc_count": 0},{"key_as_string": "2016-02-01","key": 1454284800000,"doc_count": 0},{"key_as_string": "2016-03-01","key": 1456790400000,"doc_count": 0},{"key_as_string": "2016-04-01","key": 1459468800000,"doc_count": 0},{"key_as_string": "2016-05-01","key": 1462060800000,"doc_count": 1},{"key_as_string": "2016-06-01","key": 1464739200000,"doc_count": 0},{"key_as_string": "2016-07-01","key": 1467331200000,"doc_count": 1},{"key_as_strin"key_as_string": "2016-09-01","key": 1472688000000,"doc_count": 0},g": "2016-08-01","key": 1470009600000,"doc_count": 1},{{"key_as_string": "2016-10-01","key": 1475280000000,"doc_count": 1},{"key_as_string": "2016-11-01","key": 1477958400000,"doc_count": 2},{"key_as_string": "2016-12-01","key": 1480550400000,"doc_count": 0},{"key_as_string": "2017-01-01","key": 1483228800000,"doc_count": 1},{"key_as_string": "2017-02-01","key": 1485907200000,"doc_count": 1}]}}
}

9. Drill-down analysis: revenue per brand per quarter

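This query combines everything covered so far: a date_histogram with interval "quarter" forms the outer buckets; inside each quarter, a terms bucket on brand with a sum metric gives the revenue per brand, while a sibling sum metric (total_sum_price) gives the quarter's overall revenue. In very rough SQL terms (approximate only; MySQL's YEAR() and QUARTER() stand in for the quarter buckets):

select year(sold_date), quarter(sold_date), brand, sum(price) from tvs group by year(sold_date), quarter(sold_date), brand;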
GET /tvs/sales/_search
{
  "size": 0,
  "aggs": {
    "group_by_sold_date": {
      "date_histogram": {
        "field": "sold_date",
        "interval": "quarter",
        "format": "yyyy-MM-dd",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2016-01-01",
          "max": "2017-12-31"
        }
      },
      "aggs": {
        "group_by_brand": {
          "terms": { "field": "brand" },
          "aggs": {
            "sum_price": {
              "sum": { "field": "price" }
            }
          }
        },
        "total_sum_price": {
          "sum": { "field": "price" }
        }
      }
    }
  }
}
{"took": 10,"timed_out": false,"_shards": {"total": 5,"successful": 5,"failed": 0},"hits": {"total": 8,"max_score": 0,"hits": []},"aggregations": {"group_by_sold_date": {"buckets": [{"key_as_string": "2016-01-01","key": 1451606400000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2016-04-01","key": 1459468800000,"doc_count": 1,"total_sum_price": {"value": 3000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "小米","doc_count": 1,"sum_price": {"value": 3000}}]}},{"key_as_string": "2016-07-01","key": 1467331200000,"doc_count": 2,"total_sum_price": {"value": 2700},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "TCL","doc_count": 2,"sum_price": {"value": 2700}}]}},{"key_as_string": "2016-10-01","key": 1475280000000,"doc_count": 3,"total_sum_price": {"value": 5000},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "长虹","doc_count": 3,"sum_price": {"value": 5000}}]}},{"key_as_string": "2017-01-01","key": 1483228800000,"doc_count": 2,"total_sum_price": {"value": 10500},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": [{"key": "三星","doc_count": 1,"sum_price": {"value": 8000}},{"key": "小米","doc_count": 1,"sum_price": {"value": 2500}}]}},{"key_as_string": "2017-04-01","key": 1491004800000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2017-07-01","key": 1498867200000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}},{"key_as_string": "2017-10-01","key": 1506816000000,"doc_count": 0,"total_sum_price": {"value": 0},"group_by_brand": {"doc_count_error_upper_bound": 0,"sum_other_doc_count": 0,"buckets": []}}]}}
}
