Elasticsearch Series - Integrating Spring Boot with ES: Paginated Search with from+size, search_after, and scroll

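Contents

01. Data preparation
02. How do you query all documents in Elasticsearch?
03. How do you set the number of search results in Elasticsearch?
04. What pagination methods does Elasticsearch offer?
05. How do you paginate with from+size in Elasticsearch?
06. How do you paginate with search_after in Elasticsearch?
07. How do you paginate with scroll in Elasticsearch?
08. What is deep paging in Elasticsearch?
09. What is the maximum limit for paginated queries in Elasticsearch?
10. How do you lift the pagination limit in Elasticsearch?
11. What is the maximum total hit count Elasticsearch reports?
12. How do you lift the limit on the total hit count in Elasticsearch?
13. How can paginated queries in Elasticsearch be optimized?
14. Spring Boot with ES: from+size pagination
15. Spring Boot with ES: search_after pagination
16. Spring Boot with ES: scroll pagination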

01. Data preparation

The following 12 documents are indexed into the my_index index:

PUT /my_index/_doc/1
{"title": "文雅酒店", "content": "青岛", "price": 556}

PUT /my_index/_doc/2
{"title": "金都嘉怡假日酒店", "content": "北京", "price": 337}

PUT /my_index/_doc/3
{"title": "金都欣欣酒店", "content": "天津", "price": 200}

PUT /my_index/_doc/4
{"title": "金都酒店", "content": "上海", "price": 300}

PUT /my_index/_doc/5
{"title": "自如酒店", "content": "南京", "price": 400}

PUT /my_index/_doc/6
{"title": "如家酒店", "content": "杭州", "price": 500}

PUT /my_index/_doc/7
{"title": "非常酒店", "content": "合肥", "price": 600}

PUT /my_index/_doc/8
{"title": "金都酒店", "content": "淮北", "price": 700}

PUT /my_index/_doc/9
{"title": "金都酒店", "content": "淮南", "price": 900}

PUT /my_index/_doc/10
{"title": "丽舍酒店", "content": "阜阳", "price": 1000}

PUT /my_index/_doc/11
{"title": "文轩酒店", "content": "蚌埠", "price": 1020}

PUT /my_index/_doc/12
{"title": "大理酒店", "content": "长沙", "price": 1100}

02. How do you query all documents in Elasticsearch?

Query all documents in the index:

GET /my_index/_search

The response shows that the index holds 12 documents (hits.total.value = 12), yet the hits array contains only 10 of them. How do we see the rest?

{"took" : 688,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "2","_score" : 1.0,"_source" : {"title" : "金都嘉怡假日酒店","content" : "北京","price" : 337}},{"_index" : "my_index","_type" : "_doc","_id" : "3","_score" : 1.0,"_source" : {"title" : "金都欣欣酒店","content" : "天津","price" : 200}},{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 1.0,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 1.0,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 1.0,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 1.0,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 1.0,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000}}]}
}

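03. How do you set the number of search results in Elasticsearch?

Elasticsearch accepts the from and size parameters:

from: the number of initial results to skip; defaults to 0
size: the number of results to return; defaults to 10

Because from and size default to 0 and 10, a request that specifies neither returns the first 10 hits. That is why the index holds 12 documents (hits.total.value = 12) while the hits array contains only 10.

To return more results, set the size parameter: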

GET /my_index/_search
{"size": 15
}

The index holds only 12 documents, so size=15 returns all of them:

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 1.0,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "2","_score" : 1.0,"_source" : {"title" : "金都嘉怡假日酒店","content" : "北京","price" : 337}},{"_index" : "my_index","_type" : "_doc","_id" : "3","_score" : 1.0,"_source" : {"title" : "金都欣欣酒店","content" : "天津","price" : 200}},{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 1.0,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 1.0,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 1.0,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 1.0,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 1.0,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 1.0,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 1.0,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 1.0,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}}]}
}

04. What pagination methods does Elasticsearch offer?

Paginate with the from and size parameters.
Paginate with a scroll query.
Paginate with search_after.

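05. How do you paginate with from+size in Elasticsearch?

In Elasticsearch, the from and size parameters drive paginated search: from sets how many hits to skip and size sets how many hits to return. The request below uses from=0 to start at the first hit and size=3 to return 3 hits:

GET /my_index/_search
{
  "query": {"match": {"title": "酒店"}},
  "from": 0,
  "size": 3
}

The response reports 12 matching documents in total and returns 3 of them, starting from the first hit: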

{"took" : 19,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.075949445,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.075949445,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.075949445,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.075949445,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}}]}
}

In the request above, from specifies which hit to start from and size specifies how many hits to return. For example, from=0 with size=10 returns hits 1 through 10, and from=10 with size=10 returns hits 11 through 20.
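User interfaces usually work with a 1-based page number rather than a raw offset. The following is a minimal sketch of the conversion, assuming a hypothetical helper class PageMath that is not part of any Elasticsearch client:

```java
final class PageMath {

    /** Converts a 1-based page number into the zero-based `from` offset. */
    static int fromOffset(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        // multiplyExact throws instead of silently overflowing on huge page numbers
        return Math.multiplyExact(page - 1, pageSize);
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // page 1 starts at offset 0
        System.out.println(fromOffset(2, 10)); // page 2 starts at offset 10
    }
}
```

The returned value is what goes into the from parameter, with pageSize passed as size.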

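06. How do you paginate with search_after in Elasticsearch?

The search_after parameter is meant for paging through large result sets. Rather than skipping hits by offset, each request passes the sort values of the last hit on the previous page, and Elasticsearch returns the hits that sort after it. A plain search_after request keeps no search context on the server between pages, so you can work through a large result set step by step without loading it all into memory at once.

Because search_after reads from just after a given hit, it cannot jump to an arbitrary page; pages can only be read one after another. The results must also be sorted on a field (or combination of fields) whose values are unique per document, otherwise hits that tie on the sort value at a page boundary may be skipped. In the sample data every price is distinct, so sorting on price alone is enough here.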

POST /my_index/_search
{"size": 3,"query": {"match": {"title": "酒店"}},"sort": [{"price": "asc"}],"track_total_hits": true
}

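The request above searches my_index for documents whose title matches 酒店, returns 3 hits per request, and sorts them by price in ascending order. Each hit in the response carries a sort array, which the next request uses as its cursor. Setting track_total_hits to true asks Elasticsearch to compute the exact total hit count.

The total hit count hits.total.value is 12, and 3 hits are returned: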

{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "3","_score" : null,"_source" : {"title" : "金都欣欣酒店","content" : "天津","price" : 200},"sort" : [200]},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : null,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300},"sort" : [300]},{"_index" : "my_index","_type" : "_doc","_id" : "2","_score" : null,"_source" : {"title" : "金都嘉怡假日酒店","content" : "北京","price" : 337},"sort" : [337]}]}
}

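Next, pass the sort value of the last hit (337) as search_after to fetch the following page. The query, sort, and size stay the same:

POST /my_index/_search
{"size": 3,"query": {"match": {"title": "酒店"}},"sort": [{"price": "asc"}],"search_after": [337]
}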

{"took" : 4,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : null,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400},"sort" : [400]},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : null,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500},"sort" : [500]},{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : null,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556},"sort" : [556]}]}
}
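The cursor logic behind search_after can be illustrated without a cluster. The sketch below is an in-memory simulation only, assuming a hypothetical Doc class that holds the ids and prices of the 12 sample documents; it is not how Elasticsearch implements the feature internally.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SearchAfterSketch {

    /** Hypothetical stand-in for a hit: just the document id and its sort key. */
    static final class Doc {
        final String id;
        final int price;

        Doc(String id, int price) {
            this.id = id;
            this.price = price;
        }
    }

    /** Ids 1..12 with the prices from the data preparation section. */
    static List<Doc> sampleIndex() {
        int[] prices = {556, 337, 200, 300, 400, 500, 600, 700, 900, 1000, 1020, 1100};
        List<Doc> docs = new ArrayList<>();
        for (int i = 0; i < prices.length; i++) {
            docs.add(new Doc(String.valueOf(i + 1), prices[i]));
        }
        return docs;
    }

    /**
     * Returns up to `size` docs in ascending price order whose price is strictly
     * greater than `afterPrice`. Pass null to read the first page.
     */
    static List<Doc> page(List<Doc> index, Integer afterPrice, int size) {
        List<Doc> sorted = new ArrayList<>(index);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.price));
        List<Doc> hits = new ArrayList<>();
        for (Doc d : sorted) {
            if (afterPrice != null && d.price <= afterPrice) {
                continue; // at or before the cursor, so already returned
            }
            hits.add(d);
            if (hits.size() == size) {
                break;
            }
        }
        return hits;
    }

    /** Extracts the ids of a page, in order. */
    static List<String> ids(List<Doc> hits) {
        List<String> ids = new ArrayList<>();
        for (Doc d : hits) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> index = sampleIndex();
        Integer cursor = null; // no cursor yet: first page
        while (true) {
            List<Doc> hits = page(index, cursor, 3);
            if (hits.isEmpty()) {
                break; // nothing sorts after the cursor: done
            }
            System.out.println(ids(hits));
            cursor = hits.get(hits.size() - 1).price; // sort value of the last hit
        }
    }
}
```

With the sample prices, the first two pages come out as ids 3, 4, 2 and then 5, 6, 1, the same order as in the two responses above. The strict comparison against the cursor is also why a non-unique sort key is risky: any document that ties with the cursor value is treated as already returned.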

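07. How do you paginate with scroll in Elasticsearch?

The Scroll API is designed for retrieving large numbers of documents from Elasticsearch. It keeps a search context alive across requests and returns a fixed number of results per request, so you can work through a large result set batch by batch without loading it all into memory at once.

The first request creates a snapshot of the index state in memory, together with a cursor (the scroll_id) that records where the last batch ended. Each subsequent request continues from the position recorded by the cursor. This approach performs well and is typically used for bulk export or reindexing. The scroll_id expires, however: if it times out between two requests, the second request fails with an error saying the search context cannot be found.

To start a scroll, set the scroll parameter on the search request to the time the search context should be kept alive. The timeout is refreshed by every scroll request, so it only needs to be long enough to process the current batch, not the whole result set. The timeout matters because keeping the search context open consumes resources, which should be released as soon as they are no longer needed. With the timeout in place, Elasticsearch frees those resources automatically once the context has been idle for that long.

① Run the initial query to obtain a scroll_id. The scroll parameter sets how long the scroll stays valid, here 1 minute, and size asks for 7 documents per batch.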

POST /my_index/_search?scroll=1m
{"size": 7,"query": {"match": {"title": "酒店"}}
}

The response to this request includes a _scroll_id for use in subsequent requests, and looks like this:

{"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==","took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.06382885,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.06382885,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 0.06382885,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 0.06382885,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}}]}
}

② Use the scroll_id to fetch the next batch:

POST /_search/scroll
{"scroll": "1m","scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="
}

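The response contains the next batch and a _scroll_id. The ID may or may not differ from the previous one, so always pass the most recently returned value: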

{"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==","took" : 4,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 0.06382885,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000,"uploadTime" : 1678073241}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 0.06382885,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 0.06382885,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}},{"_index" : "my_index","_type" : "_doc","_id" : "3","_score" : 0.05390298,"_source" : {"title" : "金都欣欣酒店","content" : "天津","price" : 200}},{"_index" : "my_index","_type" : "_doc","_id" : "2","_score" : 0.046648744,"_source" : {"title" : "金都嘉怡假日酒店","content" : "北京","price" : 337}}]}
}

③ Repeat step ② until every document has been retrieved. An empty hits array signals the end:

POST /_search/scroll
{"scroll": "1m","scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="
}

{"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==","took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.06382885,"hits" : [ ]}
}

④ Once all documents have been retrieved, clear the scroll context with the clear_scroll API:

DELETE /_search/scroll
{"scroll_id": ["DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ==","DXF1ZXJ5QW5kRmV0Y2gBAAAAAAACQVUWZFFwRElpblJROU9lZV9LeXI5MUpPQQ=="]
}

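Note that an open scroll holds resources on the Elasticsearch nodes, so watch its performance impact. Scroll is also unsuitable for real-time queries, because it only sees the documents that existed when the scroll started.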
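The snapshot behavior can be pictured with a toy model. The sketch below is only an analogy, assuming a hypothetical ScrollSketch class that copies a list when the scroll starts; a real scroll keeps a search context on the shards rather than a copy of the documents.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollSketch {

    private final List<String> snapshot; // frozen view taken when the scroll starts
    private int position = 0;            // where the previous batch ended

    ScrollSketch(List<String> liveIndex) {
        this.snapshot = new ArrayList<>(liveIndex);
    }

    /** Returns the next batch of at most `size` ids; empty once the snapshot is exhausted. */
    List<String> nextBatch(int size) {
        int end = Math.min(position + size, snapshot.size());
        List<String> batch = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return batch;
    }

    public static void main(String[] args) {
        List<String> live = new ArrayList<>(List.of("1", "2", "3", "4", "5"));
        ScrollSketch scroll = new ScrollSketch(live);
        live.add("6"); // indexed after the scroll started, so the scroll never returns it

        List<String> batch;
        while (!(batch = scroll.nextBatch(2)).isEmpty()) {
            System.out.println(batch);
        }
    }
}
```

The document added after the scroll was opened never shows up in any batch, which is the reason scroll is a poor fit for real-time queries.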

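08. What is deep paging in Elasticsearch?

Deep paging is the situation where a search has to skip a large number of hits before reaching the ones you want. It typically arises with very large result sets, for example a query that matches millions of documents when you need a page far from the start. The mechanism resembles LIMIT with a large offset in MySQL: to return the 10,001st hit, the first 10,000 have to be fetched and discarded first.

Deep paging can cause performance problems in Elasticsearch. For a from+size request, every shard has to collect and sort its top from+size hits, and the coordinating node has to merge the results from all shards before throwing away the first from hits. The deeper the page, the more CPU and memory each request costs.

To avoid this, paginate with the Scroll API or search_after instead. Both continue from the previous position rather than recomputing and discarding all the skipped hits.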
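A little arithmetic shows why depth is expensive. The sketch below assumes the usual distributed search behavior, in which each shard returns its top from+size hits to the coordinating node; the class name DeepPagingCost and the shard count of 5 are illustrative assumptions.

```java
final class DeepPagingCost {

    /** Hits each shard has to collect and sort for one from+size request. */
    static long hitsPerShard(long from, long size) {
        return Math.addExact(from, size);
    }

    /** Upper bound on the entries the coordinating node merges across all shards. */
    static long hitsMergedByCoordinator(int shards, long from, long size) {
        return Math.multiplyExact((long) shards, hitsPerShard(from, size));
    }

    public static void main(String[] args) {
        // Page of 10 hits starting at offset 10,000 on an index with 5 shards
        System.out.println(hitsPerShard(10_000, 10));               // 10010
        System.out.println(hitsMergedByCoordinator(5, 10_000, 10)); // 50050
    }
}
```

In this example up to 50,050 entries are merged to deliver 10 hits, and the other 50,040 are discarded.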

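09. What is the maximum limit for paginated queries in Elasticsearch?

Deep paging occurs when the requested page is far into the result set or the amount of data requested is large. By default, Elasticsearch caps from + size at 10,000; a request that exceeds this limit is rejected with an error.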

GET /my_index/_search
{"query": {"match": {"title": "酒店"}}, "from": 0,"size": 10001
}

The request fails with: Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.

In other words, from+size pagination can reach at most the first 10,000 hits by default.
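An application can check the window before sending a request instead of waiting for the error. This is a minimal sketch, assuming a hypothetical helper class ResultWindow and the default limit of 10,000 quoted in the error message:

```java
final class ResultWindow {

    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** True when from + size stays within the result window. */
    static boolean fits(int from, int size, int maxResultWindow) {
        return (long) from + size <= maxResultWindow;
    }

    /** Last 1-based page that from+size pagination can reach for a given page size. */
    static int lastReachablePage(int pageSize, int maxResultWindow) {
        // page * pageSize must not exceed the window, so integer division gives the answer
        return maxResultWindow / pageSize;
    }

    public static void main(String[] args) {
        System.out.println(fits(0, 10_001, DEFAULT_MAX_RESULT_WINDOW));      // false
        System.out.println(fits(9_990, 10, DEFAULT_MAX_RESULT_WINDOW));      // true
        System.out.println(lastReachablePage(3, DEFAULT_MAX_RESULT_WINDOW)); // 3333
    }
}
```

A request for a page beyond lastReachablePage is a signal to switch to search_after or scroll.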

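10. How do you lift the pagination limit in Elasticsearch?

The index-level setting index.max_result_window caps the value of from + size for searches against that index and defaults to 10,000. Raising it lets from+size reach deeper into the result set. Keep in mind that a larger window lets each request consume more memory and time, so raising it can hurt Elasticsearch performance.

Option 1: update the setting on an existing index, for example from the Kibana console: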

PUT /my_index/_settings
{"index.max_result_window":200000
}

Option 2: set it when creating the index (replace 10000, the default, with the value you need):

PUT /my_index
{"settings": {"index": {"max_result_window": 10000}}
}

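11. What is the maximum total hit count Elasticsearch reports?

The total number of matching documents is available in hits.total.value, but by default it is only counted accurately up to 10,000. For small result sets this is not a problem. Once more than 10,000 documents match, hits.total.value stops growing and stays at 10,000, and hits.total.relation changes from eq to gte to show that the value is only a lower bound.

For example, when the index holds 30,000 documents, a query for all documents reports hits.total.value as 10000:

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 10000,"relation" : "gte"},"max_score" : null,"hits" : [// ...]}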
}

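12. How do you lift the limit on the total hit count in Elasticsearch?

The track_total_hits parameter controls how the total hit count is computed. By default the count is accurate only up to 10,000 hits; to get an exact count beyond that, set track_total_hits to true.

track_total_hits accepts three kinds of value:

true: compute the exact total hit count.
false: do not compute the total hit count.
an integer n: count accurately up to n hits; if more than n documents match, hits.total.value is n and hits.total.relation is gte.

① Compute the exact total hit count: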

GET /my_index/_search
{"query": {"match": {"title": "酒店"}},"track_total_hits": true
}

The total hit count hits.total.value is 12, and hits.hits contains 10 documents (from=0, size=10):

{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.06382885,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.06382885,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 0.06382885,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 0.06382885,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 0.06382885,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000,"uploadTime" : 1678073241}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 0.06382885,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 0.06382885,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}}]}
}

② Do not compute the total hit count:

GET /my_index/_search
{"query": {"match": {"title": "酒店"}},"track_total_hits": false
}

The response omits hits.total, and hits.hits contains 10 documents (from=0, size=10):

{"took" : 8,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.06382885,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.06382885,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 0.06382885,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 0.06382885,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 0.06382885,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000,"uploadTime" : 1678073241}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 0.06382885,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 0.06382885,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}}]}
}

③ Count accurately only up to 5 hits:

GET /my_index/_search
{"query": {"match": {"title": "酒店"}},"track_total_hits": 5
}

hits.total.value is 5 with relation gte, meaning at least 5 documents match; hits.hits still contains 10 documents (from=0, size=10):

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 5,"relation" : "gte"},"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.06382885,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.06382885,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 0.06382885,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 0.06382885,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 0.06382885,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000,"uploadTime" : 1678073241}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 0.06382885,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 0.06382885,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}}]}
}

④ Count accurately up to 15 hits:

GET /my_index/_search
{"query": {"match": {"title": "酒店"}},"track_total_hits": 15
}

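Only 12 documents match, which is below the threshold of 15, so hits.total.value is the exact count 12 with relation eq; hits.hits contains 10 documents (from=0, size=10):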

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 12,"relation" : "eq"},"max_score" : 0.06382885,"hits" : [{"_index" : "my_index","_type" : "_doc","_id" : "1","_score" : 0.06382885,"_source" : {"title" : "文雅酒店","content" : "青岛","price" : 556}},{"_index" : "my_index","_type" : "_doc","_id" : "4","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "上海","price" : 300}},{"_index" : "my_index","_type" : "_doc","_id" : "5","_score" : 0.06382885,"_source" : {"title" : "自如酒店","content" : "南京","price" : 400}},{"_index" : "my_index","_type" : "_doc","_id" : "6","_score" : 0.06382885,"_source" : {"title" : "如家酒店","content" : "杭州","price" : 500}},{"_index" : "my_index","_type" : "_doc","_id" : "7","_score" : 0.06382885,"_source" : {"title" : "非常酒店","content" : "合肥","price" : 600}},{"_index" : "my_index","_type" : "_doc","_id" : "9","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮南","price" : 900}},{"_index" : "my_index","_type" : "_doc","_id" : "8","_score" : 0.06382885,"_source" : {"title" : "金都酒店","content" : "淮北","price" : 700}},{"_index" : "my_index","_type" : "_doc","_id" : "10","_score" : 0.06382885,"_source" : {"title" : "丽舍酒店","content" : "阜阳","price" : 1000,"uploadTime" : 1678073241}},{"_index" : "my_index","_type" : "_doc","_id" : "11","_score" : 0.06382885,"_source" : {"title" : "文轩酒店","content" : "蚌埠","price" : 1020}},{"_index" : "my_index","_type" : "_doc","_id" : "12","_score" : 0.06382885,"_source" : {"title" : "大理酒店","content" : "长沙","price" : 1100}}]}
}
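The four cases above follow one rule, which can be written down as a small model. This is a sketch of the observable behavior only, assuming a hypothetical TotalHitsSketch class; it says nothing about how Elasticsearch counts internally.

```java
final class TotalHitsSketch {

    /**
     * Models the hits.total object for a numeric track_total_hits threshold:
     * exact ("eq") up to the threshold, a lower bound ("gte") beyond it.
     */
    static String total(long actualMatches, long threshold) {
        if (actualMatches > threshold) {
            return "value=" + threshold + ", relation=gte";
        }
        return "value=" + actualMatches + ", relation=eq";
    }

    public static void main(String[] args) {
        System.out.println(total(12, 5));          // value=5, relation=gte
        System.out.println(total(12, 15));         // value=12, relation=eq
        System.out.println(total(30_000, 10_000)); // value=10000, relation=gte
    }
}
```

Setting track_total_hits to true behaves like an unlimited threshold, so the relation is always eq.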

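13. How can paginated queries in Elasticsearch be optimized?

Return only the fields you need.
Return only the documents you need, keeping the page size small.
Use scroll or search_after instead of deep from+size pagination.
Tune the index layout, such as the number of shards and replicas, to improve query performance.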

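14. Spring Boot with ES: from+size pagination

The request to implement:

GET /my_index/_search
{
  "query": {"match": {"title": "酒店"}},
  "from": 0,
  "size": 3
}

import java.io.IOException;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void searchUser() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Match query on the title field
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);

        // Pagination: convert the 1-based page number into a zero-based offset
        int page = 1;     // first page
        int pageSize = 3; // 3 documents per page
        searchSourceBuilder.from((page - 1) * pageSize);
        searchSourceBuilder.size(pageSize);

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Search results
        SearchHits searchHits = searchResponse.getHits();
        SearchHit[] hits = searchHits.getHits();
        for (SearchHit hit : hits) {
            // hits.hits._source: the original JSON of the matching document
            String sourceAsString = hit.getSourceAsString();
            log.info("hit: {}", sourceAsString);
        }
        System.out.println(searchResponse);
    }
}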

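15. Spring Boot with ES: search_after pagination

The requests to implement, first page and then the following page:

POST /my_index/_search
{"size": 3,"query": {"match": {"title": "酒店"}},"sort": [{"price": "asc"}],"track_total_hits": true
}

POST /my_index/_search
{"size": 3,"query": {"match": {"title": "酒店"}},"sort": [{"price": "asc"}],"search_after": [337]
}

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void searchUser() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Match query on the title field
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);

        // Compute the exact total hit count: track_total_hits
        searchSourceBuilder.trackTotalHits(true);
        // Return 3 documents per request
        searchSourceBuilder.size(3);
        // Sort field; price is unique in the sample data, otherwise add a tiebreaker field
        searchSourceBuilder.sort(SortBuilders.fieldSort("price").order(SortOrder.ASC));

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        List<Map<String, Object>> result = new ArrayList<>();
        while (searchResponse.getHits().getHits() != null && searchResponse.getHits().getHits().length > 0) {
            SearchHit[] hits = searchResponse.getHits().getHits();
            for (SearchHit hit : hits) {
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                result.add(sourceAsMap);
            }

            // Take the sort values of the last hit; the next request continues after them
            Object[] lastSortValues = hits[hits.length - 1].getSortValues();
            searchSourceBuilder.searchAfter(lastSortValues);
            searchRequest.source(searchSourceBuilder);

            // Fetch the next page
            searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        }
        System.out.println(result);
    }
}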

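16. Spring Boot with ES: scroll pagination

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.search.ClearScrollRequest;
import org.elasticsearch.action.search.ClearScrollResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchScrollRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
// The TimeValue package depends on the client version; newer 7.x clients use org.elasticsearch.core.TimeValue
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.MatchQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class ElasticSearchImpl {

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    public void search() throws IOException {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Match query on the title field
        MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("title", "酒店");
        searchSourceBuilder.query(matchQueryBuilder);

        // Compute the exact total hit count: track_total_hits
        searchSourceBuilder.trackTotalHits(true);
        // Return 7 documents per batch
        searchSourceBuilder.size(7);
        // Sort field
        searchSourceBuilder.sort(SortBuilders.fieldSort("price").order(SortOrder.ASC));

        SearchRequest searchRequest = new SearchRequest(new String[]{"my_index"}, searchSourceBuilder);
        // Keep the scroll context alive for 1 minute
        searchRequest.scroll(TimeValue.timeValueMinutes(1L));
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Get the scrollId and collect the first batch
        String scrollId = searchResponse.getScrollId();
        SearchHit[] searchHits = searchResponse.getHits().getHits();
        List<Map<String, Object>> result = new ArrayList<>();
        for (SearchHit hit : searchHits) {
            result.add(hit.getSourceAsMap());
        }

        while (true) {
            // Fetch the next batch with the most recent scrollId
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            // Refresh the scroll timeout
            scrollRequest.scroll(TimeValue.timeValueMinutes(1L));
            SearchResponse scrollResp = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
            // The id can change between requests, so always keep the latest one
            scrollId = scrollResp.getScrollId();

            SearchHit[] hits = scrollResp.getHits().getHits();
            if (hits != null && hits.length > 0) {
                for (SearchHit hit : hits) {
                    result.add(hit.getSourceAsMap());
                }
            } else {
                break;
            }
        }
        System.out.println(result);

        // Scrolling is finished: clear the scroll context to release its resources
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);
        boolean succeeded = clearScrollResponse.isSucceeded();
        System.out.println(succeeded);
        // The injected RestHighLevelClient is shared and managed by Spring, so it is not closed here
    }
}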