01.elasticsearch metric aggregation 查询

文章目录

1. 数据准备
2. metric aggregation分类
3.使用样例
- 1 . Avg Aggregation : 求query出来的结果的average 值
- 2 . Weighted Avg Aggregation: 带权重的average值，可以选取另一个字段的值作为权重值
- 3 . Max Aggregation: 求query出来的结果的max值
- 4 . Min Aggregation: 求query出来的结果的min值
- 5 . Sum Aggregation: 求query出来的结果的sum值
- 6 . Value Count Aggregation: query出来的结果的某个field的的值的个数，注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count
- 7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重
- 8 . Percentiles Aggregation: 百分比求值，对于大小的数值，求百分位，比如响应时间的99分位，95分位等对应的具体响应时间是多少
- 9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值
- 10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式
- 11. top_hit

elasticsearch的aggregate查询现在越来越丰富了，目前总共有4类。

metric aggregation: 主要是min,max,avg,sum,percetile 等单个统计指标的查询,同时，可以用作bucket agg的子聚合查询，但是本身不能包含子查询
bucket aggregation: 主要是类似group by的查询操作，而且可以含有子查询
matrix aggregation: 使用多个字段的值进行计算从而产生一个多维矩阵
pipline aggregation: 主要是能够在其他的aggregation进行一些附加的处理来增强数据

本篇就主要学习metric aggregation

1. 数据准备

演唱会的票信息
GET seats1028/_search

{
"play" : "Auntie Jo",   # 演唱会名称
"date" : "2018-11-6",  # 时间
"theatre" : "Skyline",  # 地点
"sold" : false,      # 这个票是否已经卖出
"actors" : [         # 演员"Jo Hangum","Jon Hittle","Rob Kettleman","Laura Conrad","Simon Hower","Nora Blue"],
"number" : 8,   #可以使用的人数（团体座位）
"datetime" : 1541497200000,
"price" : 8321,    # 票价
"tip" : 17.5,      # 优惠
"time" : "5:40PM"
}

总共有3w条这样的数据

2. metric aggregation分类

1 . Avg Aggregation : 求query出来的结果的average 值
2 . Weighted Avg Aggregation: 带权重的average值，可以选取另一个字段的值作为权重值
3 . Max Aggregation: 求query出来的结果的max值
4 . Min Aggregation: 求query出来的结果的min值
5 . Sum Aggregation: 求query出来的结果的sum值
6 . Value Count Aggregation: query出来的结果的某个field的的值的个数，注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count
7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重
8 . Percentiles Aggregation: 百分比求值，对于大小的数值，求百分位，比如响应时间的99分位，95分位等对应的具体响应时间是多少
9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值
10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式
11. Extended Stats Aggregation: 在stats的基础上增加了平方和、方差、标准差、平均值加/减两个标准差的区间
12. Top Hits Aggregation: 一般是嵌套查询，用在term查询当中，返回每个bucket的topN
13. Geo Bounds Aggregation: 地理位置的边界聚合
14. Geo Centroid Aggregation:
15. Scripted Metric Aggregation: 使用script的聚合
16. Median Absolute Deviation Aggregation: 平方差聚合

metric agg可以用作bucket agg的子聚合查询，但是metric agg 本身不能包含子查询

3.使用样例

1 . Avg Aggregation : 求query出来的结果的average 值

GET seats1028/_search
{"size": 0,"aggs": {"avg_price": {"avg": {"field": "price"}}}
}返回{"took" : 8,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"avg_price" : {"value" : 4995.498812220319}}
}

2 . Weighted Avg Aggregation: 带权重的average值，可以选取另一个字段的值作为权重值

GET seats1028/_search
{"size": 0,"aggs": {"avg_price": {"weighted_avg": {"value": {"field": "price"},"weight": {"field": "number"}}}}
}返回
{"took" : 14,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"avg_price" : {"value" : 4981.713482182667}}
}

3 . Max Aggregation: 求query出来的结果的max值

GET seats1028/_search
{"size": 0,"aggs": {"max_price": {"max": {"field": "price"}}}
}
返回{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"max_price" : {"value" : 9999.0}}
}

4 . Min Aggregation: 求query出来的结果的min值

GET seats1028/_search
{"size": 0,"aggs": {"min_price": {"min": {"field": "price"}}}
}返回{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"min_price" : {"value" : 0.0}}
}

5 . Sum Aggregation: 求query出来的结果的sum值

GET seats1028/_search
{"size": 0,"aggs": {"sum_price": {"sum": {"field": "price"}}}
}返回{"took" : 8,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"sum_price" : {"value" : 1.80847048E8}}
}

6 . Value Count Aggregation: query出来的结果的某个field的的值的个数，注意可能这个field的值是一个数组则这个一个doc可能就贡献了多个value count

GET seats1028/_search
{"size": 0,"aggs": {"count_price": {"value_count": {"field": "price"}}}
}返回
{"took" : 13,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"count_price" : {"value" : 36202}}
}

7 . Cardinality Aggregation: 某个filed的value去重后的count, 可以理解为value_count做了去重

GET seats1028/_search
{"size": 0,"aggs": {"unique_count_price": {"cardinality": {"field": "price"}}}
}返回
{"took" : 13,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"unique_count_price" : {"value" : 9591}}
}

8 . Percentiles Aggregation: 百分比求值，对于大小的数值，求百分位，比如响应时间的99分位，95分位等对应的具体响应时间是多少

GET seats1028/_search
{"size": 0,"aggs": {"percentile_price": {"percentiles": {"field": "price"}}}
}返回
{"took" : 56,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"percentile_price" : {"values" : {"1.0" : 100.22,"5.0" : 500.1716280451575,"25.0" : 2478.398741375389,"50.0" : 4990.070945393942,"75.0" : 7509.41570777247,"95.0" : 9487.07620155039,"99.0" : 9894.284278074867}}}
}

9 . Percentile Ranks Aggregation: 某个具体的响应时间在总体中所处的分位值

GET seats1028/_search
{"size": 0,"aggs": {"rank_price": {"percentile_ranks": {"field": "price","values": [1000,2500]}}}
}返回
{"took" : 15,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"rank_price" : {"values" : {"1000.0" : 9.948376390210644,"2500.0" : 25.172259605722964}}}
}

10. Stats Aggregation: 上面value_count,min,max,sum,avg的快捷方式

GET seats1028/_search
{"size": 0,"aggs": {"stats_price": {"stats": {"field": "price"}}}
}返回
{"took" : 9,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [ ]},"aggregations" : {"stats_price" : {"count" : 36202,"min" : 0.0,"max" : 9999.0,"avg" : 4995.498812220319,"sum" : 1.80847048E8}}
}

11. top_hit

这个一般用于嵌套的子查询，比如下面的查询按照row划分bucket，然后找出每个bucket中的price最高的两个


GET seats1105/_search
{"size": 1,"query": {"match_all": {}},"aggs": {"row_term": {"terms": {"field": "row","size": 10},"aggs": {"top_price": {"top_hits": {"size": 2,"sort": [{"price": {"order": "desc"}}]}}}}}
}

  "aggregations" : {"row_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : 2,"doc_count" : 5796,"top_price" : {"hits" : {"total" : {"value" : 5796,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "seats1105","_type" : "_doc","_id" : "1957","_score" : null,"_source" : {"price" : 9998},"sort" : [9998]},{"_index" : "seats1105","_type" : "_doc","_id" : "7105","_score" : null,"_source" : {"price" : 9997},"sort" : [9997]}]}}},{"key" : 3,"doc_count" : 5796,"top_price" : {"hits" : {"total" : {"value" : 5796,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "seats1105","_type" : "_doc","_id" : "3993","_score" : null,"_source" : {"price" : 9999},"sort" : [9999]},{"_index" : "seats1105","_type" : "_doc","_id" : "6656","_score" : null,"_source" : {"price" : 9999},"sort" : [9999]}]}}},{"key" : 1,"doc_count" : 5791,"top_price" : {"hits" : {"total" : {"value" : 5791,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "seats1105","_type" : "_doc","_id" : "4321","_score" : null,"_source" : {"price" : 9999},"sort" : [9999]},{"_index" : "seats1105","_type" : "_doc","_id" : "8380","_score" : null,"_source" : {"price" : 9999},"sort" : [9999]}]}}}]}