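Paging with a standard query works well if you are searching documents that rarely change. Over live data, however, paging can return unpredictable results: documents may be repeated or skipped between pages. To get around this, Elasticsearch provides an extra search parameter, scroll. The results returned by a scroll reflect the state of the index at the time of the initial search request, like a snapshot. If you are not yet familiar with paginating search results, please see my earlier article "Elasticsearch: Using the scroll API for better pagination over large amounts of data".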
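The difference between paging over live data and paging over a snapshot can be sketched with plain Java lists. This is a toy model, not Elasticsearch code: `LivePagingDemo` and its methods are hypothetical, and inserting at index 0 stands in for a newly indexed document that sorts ahead of the ones already returned. Reading page 2 from the live list returns document 2 a second time, while reading from a copy taken at the first request (roughly what a scroll does) returns documents 1 to 4 in order.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (plain lists, not Elasticsearch) of paging over live data vs. a snapshot. */
class LivePagingDemo {

    /** One page with from/size semantics, evaluated against the list's current contents. */
    static List<String> fromSize(List<String> docs, int from, int size) {
        int end = Math.min(from + size, docs.size());
        if (from >= end) {
            return new ArrayList<>();
        }
        return new ArrayList<>(docs.subList(from, end));
    }

    /**
     * Reads page 1 (2 docs), lets a new doc land at the front of the sort order,
     * then reads page 2. With useSnapshot, both pages come from a copy taken at
     * the first request, as a scroll reads from its initial view of the index.
     */
    static List<String> firstTwoPages(boolean useSnapshot) {
        List<String> live = new ArrayList<>(List.of("1", "2", "3", "4", "5"));
        List<String> view = useSnapshot ? List.copyOf(live) : live;
        List<String> seen = new ArrayList<>(fromSize(view, 0, 2));
        live.add(0, "new"); // a document indexed between the two page requests
        seen.addAll(fromSize(view, 2, 2));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(firstTwoPages(false)); // [1, 2, 2, 3]: doc 2 is returned twice
        System.out.println(firstTwoPages(true));  // [1, 2, 3, 4]
    }
}
```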

Preparing the data

To keep things simple, today's exercise uses the following data:

POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊,出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发,下一站云南!","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日,好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

Above, we index 6 documents into Elasticsearch. Throughout this exercise I use a page size of 2 documents. We can run the following search:

GET twitter/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "city": "北京" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 0, "lte": 100 } } }
      ]
    }
  },
  "size": 2
}

The search returns the first two hits:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 0.48232412,
    "hits": [
      {
        "_index": "twitter",
        "_id": "1",
        "_score": 0.48232412,
        "_source": {
          "user": "双榆树-张三",
          "message": "今儿天气不错啊,出去转转去",
          "uid": 2,
          "age": 20,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市海淀区"
        }
      },
      {
        "_index": "twitter",
        "_id": "2",
        "_score": 0.48232412,
        "_source": {
          "user": "东城区-老刘",
          "message": "出发,下一站云南!",
          "uid": 3,
          "age": 30,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市东城区台基厂三条3号"
        }
      }
    ]
  }
}

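The results show that 5 documents match the query (the sixth document, from 上海, does not match city 北京). At 2 documents per page, that makes 3 pages. So how do we page through the results? We can use the scroll parameter: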

GET twitter/_search?scroll=2m
{
  "query": {
    "bool": {
      "must": [
        { "match": { "city": "北京" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 0, "lte": 100 } } }
      ]
    }
  },
  "size": 2
}

Here, 2m tells Elasticsearch to keep the scroll's search context alive for 2 minutes. The response is:

{
  "_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFeHBZReU4zSnhXVlR5eW5WQW5Yb09RSHNR",
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 0.48232412,
    "hits": [
      {
        "_index": "twitter",
        "_id": "1",
        "_score": 0.48232412,
        "_source": {
          "user": "双榆树-张三",
          "message": "今儿天气不错啊,出去转转去",
          "uid": 2,
          "age": 20,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市海淀区"
        }
      },
      {
        "_index": "twitter",
        "_id": "2",
        "_score": 0.48232412,
        "_source": {
          "user": "东城区-老刘",
          "message": "出发,下一站云南!",
          "uid": 3,
          "age": 30,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市东城区台基厂三条3号"
        }
      }
    ]
  }
}

As expected, it returns the two hits of the first page, but it also returns a _scroll_id. We can use this _scroll_id to fetch the second page:

GET _search/scroll
{
  "scroll": "2m",
  "scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFeHBZReU4zSnhXVlR5eW5WQW5Yb09RSHNR"
}

This returns the second page:

{
  "_scroll_id": "FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFeHBZReU4zSnhXVlR5eW5WQW5Yb09RSHNR",
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 5,
      "relation": "eq"
    },
    "max_score": 0.48232412,
    "hits": [
      {
        "_index": "twitter",
        "_id": "3",
        "_score": 0.48232412,
        "_source": {
          "user": "东城区-李四",
          "message": "happy birthday!",
          "uid": 4,
          "age": 30,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市东城区"
        }
      },
      {
        "_index": "twitter",
        "_id": "4",
        "_score": 0.48232412,
        "_source": {
          "user": "朝阳区-老贾",
          "message": "123,gogogo",
          "uid": 5,
          "age": 35,
          "city": "北京",
          "province": "北京",
          "country": "中国",
          "address": "中国北京市朝阳区建国门"
        }
      }
    ]
  }
}

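We keep passing the returned _scroll_id to fetch the following pages until the hits array comes back empty. The _scroll_id may or may not change between requests (here it stays the same), so always use the one from the most recent response.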
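Note that the 2m keep-alive is not a budget for the whole traversal: per the Elasticsearch scroll documentation, each scroll request that passes the scroll parameter sets a new expiry time, so it only needs to cover the processing of one batch. The toy class below models that rule with a hypothetical millisecond clock; `ScrollKeepAlive` is illustrative only and is not part of Elasticsearch or its Java client.

```java
/**
 * Toy model (not Elasticsearch code) of the scroll keep-alive rule: every
 * request that carries scroll=<time> sets a fresh expiry, so "2m" limits the
 * gap between two requests, not the total time of the traversal.
 */
class ScrollKeepAlive {
    private final long keepAliveMillis;
    private long expiresAtMillis;

    /** Opens the context at nowMillis, like the initial GET twitter/_search?scroll=2m. */
    ScrollKeepAlive(long keepAliveMillis, long nowMillis) {
        this.keepAliveMillis = keepAliveMillis;
        this.expiresAtMillis = nowMillis + keepAliveMillis;
    }

    /** A scroll request at nowMillis: false if the context has expired, otherwise renew it. */
    boolean scroll(long nowMillis) {
        if (nowMillis > expiresAtMillis) {
            return false; // a real cluster would reject the stale scroll ID
        }
        expiresAtMillis = nowMillis + keepAliveMillis;
        return true;
    }

    public static void main(String[] args) {
        ScrollKeepAlive ctx = new ScrollKeepAlive(120_000, 0); // scroll=2m at t = 0
        System.out.println(ctx.scroll(90_000));  // true, renewed until t = 210 s
        System.out.println(ctx.scroll(180_000)); // true, renewed until t = 300 s
        System.out.println(ctx.scroll(270_000)); // true: 4.5 minutes in total, no gap over 2
        System.out.println(ctx.scroll(450_000)); // false: a 3-minute pause let it expire
    }
}
```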

Paginating with the Java client APIs

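Next, let's write a Java application that pages through the search results. To make the code easier to follow, I have uploaded the complete project to GitHub: https://github.com/liu-xiao-guo/elasticsearchjava-scroll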

First, we create a class called Twitter:

Twitter.java

public class Twitter {
    private String user;
    private long uid;
    private String province;
    private String message;
    private String country;
    private String city;
    private long age;
    private String address;

    public Twitter() {}

    public Twitter(String user, long uid, String province, String message,
                   String country, String city, long age, String address) {
        this.user = user;
        this.uid = uid;
        this.province = province;
        this.message = message;
        this.country = country;
        this.city = city;
        this.age = age;
        this.address = address;
    }

    public String getUser() { return user; }
    public long getUid() { return uid; }
    public String getProvince() { return province; }
    public String getMessage() { return message; }
    public String getCountry() { return country; }
    public String getCity() { return city; }
    public long getAge() { return age; }
    public String getAddress() { return address; }

    public void setUser(String user) { this.user = user; }
    public void setUid(long uid) { this.uid = uid; }
    public void setProvince(String province) { this.province = province; }
    public void setMessage(String message) { this.message = message; }
    public void setCountry(String country) { this.country = country; }
    public void setCity(String city) { this.city = city; }
    public void setAge(long age) { this.age = age; }
    public void setAddress(String address) { this.address = address; }
}

This class mirrors the fields of the twitter documents above (all except location).

Next we connect to the Elasticsearch cluster; see my earlier article "Elasticsearch: Creating HTTPS connections using a truststore in the Java client". Once connected, we can page through the search results with the following code:

ElasticsearchJava.java

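// Imports used by this snippet (elasticsearch-java 8.x):
import co.elastic.clients.elasticsearch._types.Time;
import co.elastic.clients.elasticsearch.core.SearchRequest;
import co.elastic.clients.elasticsearch.core.SearchResponse;
import co.elastic.clients.elasticsearch.core.search.Hit;
import co.elastic.clients.json.JsonData;

    // Inside a method with a connected ElasticsearchClient named client:
    final String INDEX_NAME = "twitter";

    // Same query as the console example: city matches "北京", age in [0, 100],
    // 2 hits per page, search context kept alive for 2 minutes.
    SearchRequest searchRequest = new SearchRequest.Builder()
            .index(INDEX_NAME)
            .query(q -> q.bool(b -> b
                    .must(must -> must.match(m -> m.field("city").query("北京")))
                    .filter(f -> f.range(r -> r.field("age")
                            .gte(JsonData.of(0))
                            .lte(JsonData.of(100))))))
            .size(2)
            .scroll(Time.of(t -> t.time("2m")))
            .build();

    SearchResponse<Twitter> response = client.search(searchRequest, Twitter.class);

    do {
        System.out.println("size: " + response.hits().hits().size());
        for (Hit<Twitter> hit : response.hits().hits()) {
            System.out.println("hit: " + hit.index() + ": " + hit.id());
        }

        // The lambda below needs an effectively final reference.
        final SearchResponse<Twitter> old_response = response;
        System.out.println("scrollId: " + old_response.scrollId());

        // Fetch the next page and renew the 2-minute keep-alive. In some client versions
        // client.scroll(...) returns a ScrollResponse that is not a SearchResponse; if this
        // does not compile, declare response as ResponseBody<Twitter> instead.
        response = client.scroll(s -> s.scrollId(old_response.scrollId())
                .scroll(Time.of(t -> t.time("2m"))), Twitter.class);
        System.out.println("=================================");
    } while (response.hits().hits().size() != 0); // 0 hits mark the end of the scroll and the while loop.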

Running the code above prints the following:

size: 2
hit: twitter: 1
hit: twitter: 2
scrollId: FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFAnxZReU4zSnhXVlR5eW5WQW5Yb09RSHNR
=================================
size: 2
hit: twitter: 3
hit: twitter: 4
scrollId: FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFAnxZReU4zSnhXVlR5eW5WQW5Yb09RSHNR
=================================
size: 1
hit: twitter: 5
scrollId: FGluY2x1ZGVfY29udGV4dF91dWlkDXF1ZXJ5QW5kRmV0Y2gBFi1rOUlBMFdGU2tLSS0yTlMyUkdRdUEAAAAAAAFAnxZReU4zSnhXVlR5eW5WQW5Yb09RSHNR
=================================

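The output shows that the 5 matching documents come back in three pages. The loop also sends one extra scroll request after the third page; that request returns 0 hits, which ends the do-while loop.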
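The loop above never releases its scroll context; the context only disappears once the 2-minute keep-alive runs out. Open scroll contexts hold resources on the cluster, and the Elasticsearch documentation recommends clearing them explicitly as soon as they are no longer needed (the REST form is `DELETE _search/scroll` with the scroll_id). Below is a minimal, client-independent sketch of a drain loop that clears the context in a `finally` block. `ScrollDrain`, `Page`, `nextFn` and `clearFn` are hypothetical names, not elasticsearch-java types. For the 5 hits at 2 per page in this example, it makes ceil(5 / 2) = 3 non-empty requests plus 1 empty one.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/**
 * Hypothetical helper (not part of elasticsearch-java): walks a scroll to the end
 * and always clears the scroll context, even if a request or the sink throws.
 */
class ScrollDrain {

    /** One batch of hits plus the scroll ID returned with it (hypothetical type). */
    static final class Page<T> {
        final String scrollId;
        final List<T> hits;

        Page(String scrollId, List<T> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    /**
     * Hands every hit to sink and asks nextFn for the next batch (using the most
     * recent scroll ID) until an empty batch arrives, then passes the last scroll
     * ID to clearFn. Returns the number of requests, counting the initial search.
     */
    static <T> int drain(Page<T> first,
                         Function<String, Page<T>> nextFn,
                         Consumer<String> clearFn,
                         Consumer<T> sink) {
        Page<T> page = first;
        String scrollId = first.scrollId;
        int requests = 1; // the initial search has already been made
        try {
            while (!page.hits.isEmpty()) {
                page.hits.forEach(sink);
                page = nextFn.apply(scrollId);
                requests++;
                if (page.scrollId != null) {
                    scrollId = page.scrollId; // always keep the most recent ID
                }
            }
        } finally {
            if (scrollId != null) {
                clearFn.accept(scrollId); // release the context on success or failure
            }
        }
        return requests;
    }
}
```

With the Java client, `nextFn` could wrap the same `client.scroll(...)` call used above (rethrowing its `IOException` as an `UncheckedIOException`, since `Function` cannot throw checked exceptions) and copy `scrollId()` and `hits().hits()` into a `Page`, while `clearFn` could call `client.clearScroll(c -> c.scrollId(id))`; check these calls against the client version you use.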

Elasticsearch: Using scroll in the Java client to iterate over search results - Elastic Stack 8.x
