Spring Boot Search: Integrating Elasticsearch 7.6.2

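Preface:

Applications often need search features, or large-scale log search and analysis. By integrating Spring Data Elasticsearch, Spring Boot offers very convenient support for search.

Elasticsearch is a distributed search service that exposes a RESTful API and is built on Lucene. It splits data across multiple shards, keeps replica copies of them for data safety, and automatically rebalances shards across the nodes as the cluster changes. Large sites such as GitHub also use Elasticsearch as their search service.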

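Elasticsearch - Reference documentation

Elasticsearch Java REST Client - Reference documentation

Spring Data Elasticsearch - Reference documentation

Best read alongside the project repository in a Web IDE.

Special thanks: 遇见狂神说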

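1. Overview

1.1 Comparison with relational databases

Elasticsearch is document-oriented and uses JSON as the serialization format for documents.

An Elasticsearch cluster can contain multiple indexes (databases). Each index can contain multiple types (tables), each type contains multiple documents (rows), and each document contains multiple fields (columns).

The comparison with a relational database is as follows:

| Relational DB | Elasticsearch |
| --- | --- |
| Database | Index (indices) |
| Table | Type (deprecated in 7.x, removed in 8.0) |
| Row | Document |
| Column | Field |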

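1.2 Physical design

Behind the scenes, Elasticsearch splits each index into multiple shards. Each shard can be moved between servers in the cluster.

A running Elasticsearch instance is called a node. A cluster consists of one or more nodes that share the same cluster.name setting, and together they carry the data and the load.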

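1.3 Logical design

An index type contains multiple documents, for example document 1 and document 2. When a document is indexed, it can be located in this order:

index > type > document id

This combination addresses one specific document. (Note that the id does not have to be an integer; it is actually a string.)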

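Documents

In Elasticsearch, a document is the smallest unit of indexed and searchable data.

Documents have several important properties:

• Self-contained: a document holds both its fields and their values, i.e. key:value pairs.
• Hierarchical: a document can contain sub-documents (nested objects).
• Flexible structure: documents do not depend on a predefined schema.

Fields can be added or omitted freely, but the type of each field matters a great deal.

Types

A type is a logical container of documents, much as a table is a container of rows in a relational database.

The definition of the fields in a type is called a mapping.

Indexes

An index is a container of mapping types. An index in Elasticsearch is a very large collection of documents.

An index stores the fields of its mapping types and other settings, and these are then stored across the shards.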

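1.4 How it works

A cluster has at least one node, and a node is one Elasticsearch process. A node can hold multiple indexes. Before 7.0, a new index was created with 5 primary shards by default. Since 7.0 the default is 1 primary shard (compare number_of_shards in the GET test3 output in section 3.3). By default, each primary shard has one replica shard.

The figure above shows a cluster of 3 nodes. A primary shard and its replica are never placed on the same node, so the data is not lost if one node goes down.

In fact, a shard is a Lucene index: a directory of files containing an inverted index. The inverted index structure lets Elasticsearch find the documents that contain a given keyword without scanning every document.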
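The following toy sketch in Java makes the placement rule concrete. It uses a naive round-robin scheme (primary i on node i % n, its replica on the next node), which is an assumption and not how Elasticsearch's allocator actually decides placement. It only checks the constraint described above: a primary and its replica never share a node. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShardPlacementDemo {

    // Toy placement: primary i goes to node (i % nodes), its replica to the next node.
    // This is NOT Elasticsearch's allocator; it only illustrates the
    // "primary and replica never share a node" constraint.
    static Map<String, Integer> place(int primaries, int nodes) {
        if (nodes < 2) {
            // With a single node the replica has nowhere to go (it would stay unassigned)
            throw new IllegalArgumentException("a replica needs at least two nodes");
        }
        Map<String, Integer> placement = new LinkedHashMap<>();
        for (int i = 0; i < primaries; i++) {
            placement.put("P" + i, i % nodes);       // primary shard i
            placement.put("R" + i, (i + 1) % nodes); // its replica, on a different node
        }
        return placement;
    }

    public static void main(String[] args) {
        // 5 primary shards with one replica each, spread over 3 nodes
        Map<String, Integer> placement = place(5, 3);
        placement.forEach((shard, node) -> System.out.println(shard + " -> node" + node));
    }
}
```

Because the replica always goes to (i + 1) % nodes, it can only coincide with its primary when there is a single node. That case is rejected explicitly.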

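1.5 Inverted index

Elasticsearch uses a structure called an inverted index, built on Lucene's inverted index.

This structure is well suited to fast full-text search. An index consists of a list of all the unique terms that appear in the documents. For each term, it keeps a list of the documents that contain it.

For example, suppose there are two documents with the following contents:

# Content of document 1
Study every day, good good up to forever
# Content of document 2
To forever, study every day, good good up

To build an inverted index, each document is first split into separate words (also called terms or tokens). Next, a sorted list of all the unique terms is created, and the documents each term appears in are listed.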

| term | doc_1 | doc_2 |
| --- | --- | --- |
| Study | ✓ | ✗ |
| To | ✗ | ✓ |
| every | ✓ | ✓ |
| forever | ✓ | ✓ |
| day | ✓ | ✓ |
| study | ✗ | ✓ |
| good | ✓ | ✓ |
| to | ✓ | ✗ |
| up | ✓ | ✓ |

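To search for to forever, you only need to look up the documents that contain each term.

| term | doc_1 | doc_2 |
| --- | --- | --- |
| to | ✓ | ✗ |
| forever | ✓ | ✓ |
| total | 2 | 1 |

Both documents match, but the first matches more closely: it contains both terms, while the second contains only forever.

Without further conditions, both documents containing the keywords are returned.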
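Here is a minimal Java sketch of the inverted index described above. It uses a toy tokenizer that splits on non-letters and keeps case, so that Study and study stay distinct as in the table. This is an assumption: Elasticsearch's standard analyzer would lowercase both into a single term. `matchCounts` reproduces the "total" row for the query to forever. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexDemo {

    // term -> sorted ids of the documents containing it (the postings list)
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Toy tokenizer: split on anything that is not a letter and keep case (Study != study)
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^A-Za-z]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    void addDocument(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    SortedSet<Integer> docsFor(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    // For each document, count how many distinct query terms it contains
    Map<Integer, Integer> matchCounts(String query) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (int docId : docsFor(term)) {
                counts.merge(docId, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        index.addDocument(1, "Study every day, good good up to forever");
        index.addDocument(2, "To forever, study every day, good good up");
        System.out.println(index.docsFor("To"));             // [2]
        System.out.println(index.matchCounts("to forever")); // {1=2, 2=1}
    }
}
```

The search only touches the postings lists of the query terms and never scans the document texts. That is the property the inverted index exists to provide.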

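2. Deployment & testing

2.1 Deploy Elasticsearch

Pull the image (specify the tag: the elasticsearch repository on Docker Hub does not provide a latest tag)
```
docker pull elasticsearch:7.6.2
```

Create the container

Port 9200 is the HTTP port. Port 9300 is the TCP transport port, used for node-to-node communication.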

docker run -e "discovery.type=single-node" -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -d -p 9200:9200 -p 9300:9300 --name es elasticsearch:7.6.2

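Startup error:

ERROR: [1] bootstrap checks failed
[1]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Fix:

Check the current vm.max_map_count on the host:

cat /proc/sys/vm/max_map_count
65530

Set vm.max_map_count on the host that runs Docker:

sysctl -w vm.max_map_count=262144

A value set with sysctl -w is lost on reboot. To keep it, add vm.max_map_count=262144 to /etc/sysctl.conf.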

Test

Visit http://Server-IP:9200; the following page should appear.

2.2 Deploy the visualization tool Elasticsearch-head

Pull the image
```
docker pull mobz/elasticsearch-head:5
```

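Create the container

docker run -d -p 9100:9100 --name head mobz/elasticsearch-head:5

Fix the cross-origin (CORS) request problem

Enter the Elasticsearch container and edit the configuration file elasticsearch.yml (/usr/share/elasticsearch/config/elasticsearch.yml in the official image).
Append the following settings at the end: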
```
http.cors.enabled: true
http.cors.allow-origin: "*"
```
Restart the service (e.g. docker restart es)

When you view or operate on index data, the following error may still occur:

{"error":"Content-Type header [application/x-www-form-urlencoded] is not supported","status":406}

Fix:

• Enter the head container (e.g. docker exec -it head /bin/bash)

• Install vim

Switch to the 163 Debian mirror:

mv /etc/apt/sources.list /etc/apt/sources.list.bak
echo "deb http://mirrors.163.com/debian/ jessie main non-free contrib" >> /etc/apt/sources.list
echo "deb http://mirrors.163.com/debian/ jessie-proposed-updates main non-free contrib" >> /etc/apt/sources.list
echo "deb-src http://mirrors.163.com/debian/ jessie main non-free contrib" >> /etc/apt/sources.list
echo "deb-src http://mirrors.163.com/debian/ jessie-proposed-updates main non-free contrib" >> /etc/apt/sources.list

Update the package index

apt-get update

Install vim

apt-get install vim

• Go to the _site directory and edit vendor.js:

 ① Line 6886: change contentType: "application/x-www-form-urlencoded" to contentType: "application/json;charset=UTF-8"
 ② Line 7573: change var inspectData = s.contentType === "application/x-www-form-urlencoded" && to var inspectData = s.contentType === "application/json;charset=UTF-8" &&

Test

Visit http://Server-IP:9100; the following page should appear.

2.3 Deploy the visualization tool Kibana

Pull the image
```
docker pull kibana:7.6.2
```

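Create the container

docker run -d -e ELASTICSEARCH_URL=http://39.105.80.221:9200 -p 5601:5601 --name kibana kibana:7.6.2

Change the Elasticsearch address & switch the UI to Chinese

Enter the container

Change the address: edit kibana.yml and set elasticsearch.hosts to the Elasticsearch service address. Kibana 7.x reads ELASTICSEARCH_HOSTS rather than the older ELASTICSEARCH_URL variable used above, which is why this edit is needed. Starting the container with -e ELASTICSEARCH_HOSTS=http://39.105.80.221:9200 instead should make it unnecessary.

Chinese UI: append i18n.locale: "zh-CN" to the end of kibana.yml
Test

Visit http://Server-IP:5601; the following page should appear.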

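2.4 Install the IK analyzer

What is the IK analyzer?

Tokenization (analysis) splits a piece of Chinese or English text into individual keywords. At search time, the user's input is tokenized, the data in the database or index is tokenized too, and the two are then matched. The default analyzer treats every Chinese character as a separate word, which does not meet real needs, so the Chinese analyzer IK is installed to solve this problem.

IK provides two tokenization algorithms, ik_smart and ik_max_word. ik_smart produces the coarsest split (the fewest tokens), while ik_max_word produces the finest-grained split.

Enter the elasticsearch container (e.g. docker exec -it es /bin/bash)
Install wget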
```
yum -y install wget
```
Create an ik directory under the plugins directory (/usr/share/elasticsearch/plugins in the official image)
```
mkdir ik
```

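Go to the ik directory and use wget to download the version that matches Elasticsearch

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip

Unzip the archive

unzip elasticsearch-analysis-ik-7.6.2.zip

Delete the archive

rm -rf elasticsearch-analysis-ik-7.6.2.zip

Verify

Restart the elasticsearch container, enter it again, and run the following in the bin directory: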
```
elasticsearch-plugin list
```
If ik is listed, the installation succeeded.

Test

Enter the following commands in the Kibana Dev Tools console:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中国共产党"
}

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中国共产党"
}

Sending each request returns a different response (ik_smart first, then ik_max_word):

{"tokens" : [{"token" : "中国共产党","start_offset" : 0,"end_offset" : 5,"type" : "CN_WORD","position" : 0}]
}

{"tokens" : [{"token" : "中国共产党","start_offset" : 0,"end_offset" : 5,"type" : "CN_WORD","position" : 0},{"token" : "中国","start_offset" : 0,"end_offset" : 2,"type" : "CN_WORD","position" : 1},{"token" : "国共","start_offset" : 1,"end_offset" : 3,"type" : "CN_WORD","position" : 2},{"token" : "共产党","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 3},{"token" : "共产","start_offset" : 2,"end_offset" : 4,"type" : "CN_WORD","position" : 4},{"token" : "党","start_offset" : 4,"end_offset" : 5,"type" : "CN_CHAR","position" : 5}]
}
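The difference between the two modes can be sketched with a toy dictionary segmenter. This is a simplified stand-in and not IK's actual algorithm. `smart` uses forward maximum matching (longest dictionary word first, no overlaps), and `maxWord` emits every dictionary word that starts at every position. The six-word dictionary is hypothetical, chosen so that the output mirrors the two responses above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SegmentDemo {

    // Hypothetical toy dictionary; IK's real main dictionary is far larger
    static final Set<String> DICT = Set.of("中国共产党", "中国", "国共", "共产党", "共产", "党");
    static final int MAX_WORD_LEN = 5;

    // Coarse split (ik_smart-like): forward maximum matching, no overlaps.
    // Characters not covered by any dictionary word fall back to single-character tokens.
    static List<String> smart(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int len = Math.min(MAX_WORD_LEN, text.length() - i);
            while (len > 1 && !DICT.contains(text.substring(i, i + len))) {
                len--;
            }
            tokens.add(text.substring(i, i + len));
            i += len;
        }
        return tokens;
    }

    // Fine split (ik_max_word-like): every dictionary word at every start position, overlaps allowed
    static List<String> maxWord(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            for (int len = Math.min(MAX_WORD_LEN, text.length() - i); len >= 1; len--) {
                String candidate = text.substring(i, i + len);
                if (DICT.contains(candidate)) tokens.add(candidate);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(smart("中国共产党"));   // [中国共产党]
        System.out.println(maxWord("中国共产党")); // [中国共产党, 中国, 国共, 共产党, 共产, 党]
    }
}
```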

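2.5 Add a custom dictionary

Enter the elasticsearch container

Fix garbled Chinese characters in vim

Edit ~/.vimrc and append the following settings:

set fileencodings=utf-8,ucs-bom,gb18030,gbk,gb2312,cp936
set termencoding=utf-8
set encoding=utf-8

Save and exit

Go to the IK plugin installation directory
Go to its config directory
Create a dic file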
```
touch caixukun.dic
```
Edit the dic file and add the custom entries, one per line:
```
蔡徐坤
鸡你太美
```

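Edit IKAnalyzer.cfg.xml in the same config directory and register the dictionary:

<entry key="ext_dict">caixukun.dic</entry>

Restart the elasticsearch container

Test

Enter the following command in the Kibana Dev Tools console:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "蔡徐坤鸡你太美"
}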

Response with the default dictionary:

{"tokens" : [{"token" : "蔡","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "徐","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "坤","start_offset" : 2,"end_offset" : 3,"type" : "CN_CHAR","position" : 2},{"token" : "鸡","start_offset" : 3,"end_offset" : 4,"type" : "CN_CHAR","position" : 3},{"token" : "你","start_offset" : 4,"end_offset" : 5,"type" : "CN_CHAR","position" : 4},{"token" : "太美","start_offset" : 5,"end_offset" : 7,"type" : "CN_WORD","position" : 5}]
}

Response after adding the custom dictionary:

{"tokens" : [{"token" : "蔡徐坤","start_offset" : 0,"end_offset" : 3,"type" : "CN_WORD","position" : 0},{"token" : "鸡你太美","start_offset" : 3,"end_offset" : 7,"type" : "CN_WORD","position" : 1}]
}

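3. REST style

REST is a software architectural style, not a standard. It only provides a set of design principles and constraints and is mainly used for software in which clients and servers interact.

Software designed in this style can be simpler and more layered, and it makes mechanisms such as caching easier to implement.

Basic REST commands:

| method | URL | description |
| --- | --- | --- |
| PUT | localhost:9200/{index}/{type}/{id} | Create a document (with the given id) |
| POST | localhost:9200/{index}/{type} | Create a document (auto-generated id) |
| POST | localhost:9200/{index}/{type}/{id}/_update | Update a document |
| DELETE | localhost:9200/{index}/{type}/{id} | Delete a document |
| GET | localhost:9200/{index}/{type}/{id} | Get a document by id |
| POST | localhost:9200/{index}/{type}/_search | Search all documents |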
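The deprecation warnings in the following sections point to the typeless 7.x endpoints (/{index}/_doc/{id}, /{index}/_update/{id}). As a sketch, the table's operations can be written against those endpoints with the JDK 11 `java.net.http` API. The requests are only built here, not sent. The base URL `http://localhost:9200` and the class and method names are assumptions.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

public class EsRequests {

    // Assumed address of a local single-node instance
    static final String BASE = "http://localhost:9200";

    private static HttpRequest.Builder json(String path) {
        return HttpRequest.newBuilder(URI.create(BASE + path))
                .header("Content-Type", "application/json");
    }

    // PUT /{index}/_doc/{id}: create or replace a document with the given id
    static HttpRequest indexWithId(String index, String id, String body) {
        return json("/" + index + "/_doc/" + id).PUT(BodyPublishers.ofString(body)).build();
    }

    // POST /{index}/_doc: create a document with an auto-generated id
    static HttpRequest indexAutoId(String index, String body) {
        return json("/" + index + "/_doc").POST(BodyPublishers.ofString(body)).build();
    }

    // POST /{index}/_update/{id}: partial update, body like {"doc": {...}}
    static HttpRequest update(String index, String id, String body) {
        return json("/" + index + "/_update/" + id).POST(BodyPublishers.ofString(body)).build();
    }

    // DELETE /{index}/_doc/{id}
    static HttpRequest delete(String index, String id) {
        return json("/" + index + "/_doc/" + id).DELETE().build();
    }

    // GET /{index}/_doc/{id}
    static HttpRequest get(String index, String id) {
        return json("/" + index + "/_doc/" + id).GET().build();
    }

    // POST /{index}/_search with a query DSL body
    static HttpRequest search(String index, String body) {
        return json("/" + index + "/_search").POST(BodyPublishers.ofString(body)).build();
    }

    public static void main(String[] args) {
        HttpRequest req = indexWithId("test1", "1", "{\"name\":\"蔡徐坤\",\"age\":10}");
        System.out.println(req.method() + " " + req.uri());
        // To actually send it: HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())
    }
}
```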

3.1 Basic test

Enter the following command in the Kibana Dev Tools console:
```
PUT /test1/type1/1
{"name": "蔡徐坤","age": 10
}
```
• Explanation:

PUT: create command
test1: index
type1: type
1: document id
"name": "蔡徐坤": field
"age": 10: field

Send the request

The response is:

#! Deprecation: [types removal] Specifying types in document index requests is deprecated, use the typeless endpoints instead (/{index}/_doc/{id}, /{index}/_doc, or /{index}/_create/{id}).
{"_index" : "test1","_type" : "type1","_id" : "1","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 0,"_primary_term" : 1
}

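Open head to view the newly created index.

3.2 Create an index with explicit field types

Enter the following command in the Kibana Dev Tools console:

PUT /test2
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "age": { "type": "long" },
      "birthday": { "type": "date" }
    }
  }
}

Send the request

The response is: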

{"acknowledged" : true,"shards_acknowledged" : true,"index" : "test2"
}

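Open head to view the newly created index.

3.3 View the default mapping

If field types are not specified, Elasticsearch configures them automatically (dynamic mapping).

Enter the following command in the Kibana Dev Tools console:

PUT /test3/_doc/1
{
  "name": "蔡徐坤",
  "age": 10,
  "birthday": "1998-08-02"
}

Send the request

The response is: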

{"_index" : "test3","_type" : "_doc","_id" : "1","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 0,"_primary_term" : 1
}

Then enter the following command in the console:
```
GET test3
```

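Send the request

The response shows that age was mapped as long, birthday as date, and name as text with a keyword sub-field: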

{"test3" : {"aliases" : { },"mappings" : {"properties" : {"age" : {"type" : "long"},"birthday" : {"type" : "date"},"name" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}}}},"settings" : {"index" : {"creation_date" : "1596476421598","number_of_shards" : "1","number_of_replicas" : "1","uuid" : "Rh3Z67EpSPSOUbz1lmgB7g","version" : {"created" : "7060299"},"provided_name" : "test3"}}}
}
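Dynamic mapping chose long, date, and text + keyword above. The following is a rough Java approximation of a subset of those default rules. It assumes ISO yyyy-MM-dd is the only date format (Elasticsearch's default date detection accepts more), it reports anything other than a scalar as object, and it ignores arrays and `numeric_detection`. `inferType` is a hypothetical helper, not an Elasticsearch API.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class DynamicMappingDemo {

    // Simplified subset of the default dynamic mapping rules (assumption):
    // JSON boolean -> boolean, integer -> long, floating point -> float,
    // string that parses as a date -> date, any other string -> text (plus a keyword sub-field)
    static String inferType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "long";
        if (value instanceof Float || value instanceof Double) return "float";
        if (value instanceof String) {
            return looksLikeDate((String) value) ? "date" : "text";
        }
        return "object";
    }

    // Only ISO yyyy-MM-dd is recognized here
    static boolean looksLikeDate(String s) {
        try {
            LocalDate.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(inferType(10));           // long
        System.out.println(inferType("1998-08-02")); // date
        System.out.println(inferType("蔡徐坤"));      // text
        System.out.println(inferType("22"));         // text: numeric strings are not detected by default
    }
}
```

The last line matters for section 4. There, age is sent as the string "22", so it ends up as a text field rather than a number.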

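3.4 Update

Updates are done with the POST command.

Enter the following command in the Kibana Dev Tools console:

POST /test3/_doc/1/_update
{
  "doc": {
    "name": "坤坤"
  }
}

Send the request

The response is shown below. The deprecation warning recommends the typeless endpoint POST /test3/_update/1 instead.

#! Deprecation: [types removal] Specifying types in document update requests is deprecated, use the endpoint /{index}/_update/{id} instead.
{"_index" : "test3","_type" : "_doc","_id" : "1","_version" : 2,"result" : "updated","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 1,"_primary_term" : 1
}

The version number has changed: _version counts how many times the document has been written, so it is now 2.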

3.5 Delete

Deletion is done with the DELETE command.

Enter the following command in the Kibana Dev Tools console:
```
DELETE test1
```
Send the request

The response is:
```
{"acknowledged" : true
}
```

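3.6 Extended commands

The GET _cat commands return a lot of information about the current Elasticsearch cluster.

Check cluster health

GET _cat/health

A single-node cluster like the one deployed above usually reports yellow, because its replica shards cannot be allocated to a second node.

View detailed index information

GET _cat/indices?v

4. Basic document operations

4.1 Add data (PUT)

Enter the following command in the Kibana Dev Tools console: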

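PUT /stars/user/1
{
  "name": "蔡徐坤",
  "age": "22",
  "desc": "鸡你太美",
  "tags": ["唱", "跳", "rap", "篮球"]
}

Note that age is sent as a JSON string here. Dynamic mapping therefore makes it a text field with a keyword sub-field rather than a number.

Send the request

The response is: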

{"_index" : "stars","_type" : "user","_id" : "1","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 0,"_primary_term" : 1
}

Add user 2

PUT /stars/user/2
{
  "name": "吴亦凡",
  "age": "29",
  "desc": "大碗宽面",
  "tags": ["加拿大", "电鳗", "说唱", "嘻哈"]
}

Add user 3

PUT /stars/user/3
{
  "name": "梁非凡",
  "age": "40",
  "desc": "也啦你",
  "tags": ["桌面清理大师", "警察", "啵嘴"]
}

Open head to view the newly created index.

4.2 Query data (GET)

Simple query

GET stars/user/1

{"_index" : "stars","_type" : "user","_id" : "1","_version" : 1,"_seq_no" : 0,"_primary_term" : 1,"found" : true,"_source" : {"name" : "蔡徐坤","age" : "22","desc" : "鸡你太美","tags" : ["唱","跳","rap","篮球"]}
}

Complex query
Keyword match on a field

GET stars/user/_search?q=name:吴亦凡

In the response, _score is the relevance score. 梁非凡 is also returned because the standard analyzer splits Chinese text into single characters, so 凡 matches:

{"took" : 64,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 2.313365,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 2.313365,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]}}]}
}

4.3 Update data (POST)

Enter the following command in the Kibana Dev Tools console:

POST /stars/user/1/_update
{
  "doc": {
    "name": "坤坤"
  }
}

Send the request

The response is (_version is now 2, i.e. the document has been written twice):

{"_index" : "stars","_type" : "user","_id" : "1","_version" : 2,"result" : "updated","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 3,"_primary_term" : 1
}

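4.4 Delete data (DELETE)

Documents are deleted with DELETE, following the REST table in section 3: DELETE /{index}/{type}/{id}.

5. Advanced queries

5.1 Basic query

Request (match the keyword 凡 in name):

GET stars/user/_search
{
  "query": {
    "match": {
      "name": "凡"
    }
  }
}

Response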

{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.4471386,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 0.4471386,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]}}]}
}

5.2 Return only selected fields

Request (_source limits the returned fields to name and desc):

GET stars/user/_search
{
  "query": {
    "match": { "name": "凡" }
  },
  "_source": ["name", "desc"]
}

Response

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.4471386,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 0.4471386,"_source" : {"name" : "吴亦凡","desc" : "大碗宽面"}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","desc" : "吔*啦你"}}]}
}

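5.3 Sort results

Request (sort by age.keyword in descending order; because age was indexed from strings, this is string order, not numeric order):

GET stars/user/_search
{
  "query": {
    "match": { "name": "凡" }
  },
  "sort": [
    { "age.keyword": { "order": "desc" } }
  ]
}

Response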

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : "stars","_type" : "user","_id" : "3","_score" : null,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]},"sort" : ["40"]},{"_index" : "stars","_type" : "user","_id" : "2","_score" : null,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]},"sort" : ["29"]}]}
}

5.4 Paginate results

Request (from is the starting offset, size is the number of hits to return):

GET stars/user/_search
{
  "query": {
    "match": { "name": "凡" }
  },
  "_source": ["name", "desc"],
  "from": 0,
  "size": 1
}

Response

{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.4471386,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 0.4471386,"_source" : {"name" : "吴亦凡","desc" : "大碗宽面"}}]}
}
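Conceptually, the hits are sorted first, and then from/size slices the sorted list. Elasticsearch caps from + size at `index.max_result_window`, which is 10000 by default. The following Java sketch shows that order of operations. It assumes the values are keyword strings, as with age.keyword, so "9" sorts above "10" in descending order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortPageDemo {

    // Sort descending as strings (like "order": "desc" on a keyword field), then apply from/size
    static List<String> sortDescThenPage(List<String> values, int from, int size) {
        List<String> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.reverseOrder());
        int start = Math.min(from, sorted.size());
        int end = Math.min(sorted.size(), start + size);
        return new ArrayList<>(sorted.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> ages = List.of("29", "40");
        System.out.println(sortDescThenPage(ages, 0, 1)); // [40]
        System.out.println(sortDescThenPage(ages, 1, 1)); // [29]
        // String order, not numeric order: "9" comes before "10" when sorting descending
        System.out.println(sortDescThenPage(List.of("10", "9"), 0, 2)); // [9, 10]
    }
}
```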

5.5 Multi-condition queries

must: equivalent to AND in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "吴亦凡" } },
        { "match": { "age": "29" } }
      ]
    }
  }
}

Response

{"took" : 5,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 3.2941942,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 3.2941942,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}}]}
}

should: equivalent to OR in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name": "吴亦凡" } },
        { "match": { "age": "29" } }
      ]
    }
  }
}

Response

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 3.2941942,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 3.2941942,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]}}]}
}

must_not: equivalent to NOT in a relational database

Request

GET stars/user/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "age": "29" } }
      ]
    }
  }
}

Response

{"took" : 2,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.0,"hits" : [{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.0,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]}},{"_index" : "stars","_type" : "user","_id" : "1","_score" : 0.0,"_source" : {"name" : "坤坤","age" : "22","desc" : "鸡你太美","tags" : ["唱","跳","rap","篮球"]}}]}
}
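The three clauses can be imitated in memory over the three stars documents. This sketch uses a crude match that succeeds if any query character occurs in the field. That approximates how the standard analyzer splits Chinese into single characters, and it explains why 梁非凡 matches 吴亦凡 under should. Scoring is ignored. Per the bool query rules, should must match at least once only when there is no must clause. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BoolQueryDemo {

    // In-memory stand-in for the three stars documents (user 1 already renamed to 坤坤)
    static final class Doc {
        final String id, name, age;
        Doc(String id, String name, String age) { this.id = id; this.name = name; this.age = age; }
    }

    static final List<Doc> DOCS = List.of(
            new Doc("1", "坤坤", "22"),
            new Doc("2", "吴亦凡", "29"),
            new Doc("3", "梁非凡", "40"));

    // Crude match: true if any query character occurs in the field (approximates per-character CJK tokens)
    static boolean matches(String field, String query) {
        for (char c : query.toCharArray()) {
            if (field.indexOf(c) >= 0) return true;
        }
        return false;
    }

    // must: every clause matches (AND); must_not: no clause matches (NOT);
    // should: at least one clause matches (OR) when there is no must clause
    static List<String> bool(List<Predicate<Doc>> must, List<Predicate<Doc>> should, List<Predicate<Doc>> mustNot) {
        List<String> ids = new ArrayList<>();
        for (Doc d : DOCS) {
            boolean ok = must.stream().allMatch(p -> p.test(d))
                    && mustNot.stream().noneMatch(p -> p.test(d));
            if (must.isEmpty() && !should.isEmpty()) {
                ok = ok && should.stream().anyMatch(p -> p.test(d));
            }
            if (ok) ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        Predicate<Doc> nameQuery = d -> matches(d.name, "吴亦凡");
        Predicate<Doc> age29 = d -> d.age.equals("29");
        System.out.println(bool(List.of(nameQuery, age29), List.of(), List.of())); // must:     [2]
        System.out.println(bool(List.of(), List.of(nameQuery, age29), List.of())); // should:   [2, 3]
        System.out.println(bool(List.of(), List.of(), List.of(age29)));            // must_not: [1, 3]
    }
}
```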

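5.6 Query with filter conditions

Request (the range filter keeps ages from 10 to 30 inclusive; filter clauses restrict the hits but do not affect _score):

GET stars/user/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "name": "凡" } }
      ],
      "filter": [
        { "range": { "age": { "gte": 10, "lte": 30 } } }
      ]
    }
  }
}

Response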

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 0.4471386,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 0.4471386,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}}]}
}
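Because age was sent as JSON strings, it was dynamically mapped as text with a keyword sub-field. Under that mapping, the range bounds are compared as strings, not numbers. The Java sketch below shows the difference. It does not matter for 22, 29 and 40, but a hypothetical value such as "100" would fall inside 10 to 30 as a string. Mapping age as long, as test2 does in section 3.2, avoids this. The class and method names are hypothetical.

```java
public class RangeDemo {

    // Assumption: "age" is a text/keyword field, so bounds are compared lexicographically
    static boolean inStringRange(String value, String gte, String lte) {
        return value.compareTo(gte) >= 0 && value.compareTo(lte) <= 0;
    }

    // What a numeric ("long") mapping would do instead
    static boolean inNumericRange(String value, long gte, long lte) {
        long n = Long.parseLong(value);
        return n >= gte && n <= lte;
    }

    public static void main(String[] args) {
        for (String age : new String[] {"22", "29", "40", "100"}) {
            System.out.println(age + ": string=" + inStringRange(age, "10", "30")
                    + ", numeric=" + inNumericRange(age, 10, 30));
        }
    }
}
```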

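5.7 Match multiple terms

Request (separate multiple terms with spaces; a document matches if it contains any of them):

GET stars/user/_search
{
  "query": {
    "match": { "tags": "唱 跳" }
  }
}

Response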

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 1.7137355,"hits" : [{"_index" : "stars","_type" : "user","_id" : "1","_score" : 1.7137355,"_source" : {"name" : "坤坤","age" : "22","desc" : "鸡你太美","tags" : ["唱","跳","rap","篮球"]}},{"_index" : "stars","_type" : "user","_id" : "2","_score" : 0.4471386,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]}}]}
}

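5.8 Exact queries

About tokenization:

term: looks up the given term directly in the inverted index, without analyzing the query
match: analyzes the query text first, then searches with the resulting terms

Two string field types:

text: analyzed (tokenized) by the analyzer
keyword: not analyzed; the whole value is indexed as a single term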
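The following sketch shows the difference between term and match on the name field. It assumes the standard analyzer indexes Chinese text as one token per character. A term query for 吴亦凡 finds nothing in the text field, while a match query and a term query on the keyword sub-field both succeed. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermVsMatchDemo {

    // Toy analyzer for Chinese text: one token per non-whitespace character (assumption)
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (char c : text.toCharArray()) {
            if (!Character.isWhitespace(c)) tokens.add(String.valueOf(c));
        }
        return tokens;
    }

    // text field: the analyzed tokens are stored as terms
    static Set<String> indexText(String value) {
        return new HashSet<>(analyze(value));
    }

    // keyword field: the whole value is stored as a single term
    static Set<String> indexKeyword(String value) {
        return Set.of(value);
    }

    // term query: look the query string up as-is, without analysis
    static boolean term(Set<String> indexedTerms, String query) {
        return indexedTerms.contains(query);
    }

    // match query: analyze the query, then match if any token is present (default OR)
    static boolean match(Set<String> indexedTerms, String query) {
        for (String token : analyze(query)) {
            if (indexedTerms.contains(token)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(term(indexText("吴亦凡"), "吴亦凡"));    // false
        System.out.println(match(indexText("吴亦凡"), "吴亦凡"));   // true
        System.out.println(term(indexKeyword("吴亦凡"), "吴亦凡")); // true
    }
}
```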

5.9 Highlighting

Request

GET stars/user/_search
{
  "query": {
    "match": { "name": "吴亦凡" }
  },
  "highlight": {
    "fields": {
      "name": {}
    }
  }
}

Response (by default, matched terms are wrapped in <em> tags)

{"took" : 96,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 2.313365,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 2.313365,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]},"highlight" : {"name" : ["<em>吴</em><em>亦</em><em>凡</em>"]}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]},"highlight" : {"name" : ["梁非<em>凡</em>"]}}]}
}

5.10 Custom highlight tags

Request (pre_tags and post_tags replace the default <em> tags)

GET stars/user/_search
{
  "query": {
    "match": { "name": "吴亦凡" }
  },
  "highlight": {
    "pre_tags": "<p class='key' style='color:red'>",
    "post_tags": "</p>",
    "fields": {
      "name": {}
    }
  }
}

Response

{"took" : 4,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 2.313365,"hits" : [{"_index" : "stars","_type" : "user","_id" : "2","_score" : 2.313365,"_source" : {"name" : "吴亦凡","age" : "29","desc" : "大碗宽面","tags" : ["加拿大","电鳗","说唱","嘻哈"]},"highlight" : {"name" : ["<p class='key' style='color:red'>吴</p><p class='key' style='color:red'>亦</p><p class='key' style='color:red'>凡</p>"]}},{"_index" : "stars","_type" : "user","_id" : "3","_score" : 0.4471386,"_source" : {"name" : "梁非凡","age" : "40","desc" : "吔*啦你","tags" : ["桌面清理大师","警察","啵嘴"]},"highlight" : {"name" : ["梁非<p class='key' style='color:red'>凡</p>"]}}]}
}
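Here is a toy Java version of what the highlighter returns for these two requests. It assumes per-character matching as in the earlier sections, whereas real highlighting works on the offsets of analyzed tokens. `highlight` is a hypothetical helper that takes the pre/post tags as parameters, so it covers both the default `<em>` tags and the custom ones.

```java
public class HighlightDemo {

    // Wrap each character of value that also occurs in query with preTag/postTag.
    // Toy version: per-character matching instead of analyzed token offsets.
    static String highlight(String value, String query, String preTag, String postTag) {
        StringBuilder sb = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (query.indexOf(c) >= 0) {
                sb.append(preTag).append(c).append(postTag);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Default tags: 梁非<em>凡</em>
        System.out.println(highlight("梁非凡", "吴亦凡", "<em>", "</em>"));
        // Custom tags from section 5.10: 梁非<p class='key' style='color:red'>凡</p>
        System.out.println(highlight("梁非凡", "吴亦凡", "<p class='key' style='color:red'>", "</p>"));
    }
}
```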

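6. Spring Boot integration with Elasticsearch

6.1 Environment setup

Import the dependency

The client version must match the Elasticsearch server version (7.6.2). If the project inherits spring-boot-starter-parent, you can override the managed version with the elasticsearch.version property.

<properties>
    <!-- Make the Elasticsearch client match the 7.6.2 server -->
    <elasticsearch.version>7.6.2</elasticsearch.version>
</properties>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>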

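Write the configuration class

import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.elasticsearch.client.ClientConfiguration;
import org.springframework.data.elasticsearch.client.RestClients;
import org.springframework.data.elasticsearch.config.AbstractElasticsearchConfiguration;

@Configuration
public class RestClientConfig extends AbstractElasticsearchConfiguration {

    @Override
    @Bean
    public RestHighLevelClient elasticsearchClient() {
        // HTTP address (port 9200) of the Elasticsearch server
        final ClientConfiguration clientConfiguration = ClientConfiguration.builder()
                .connectedTo("39.105.80.221:9200")
                .build();
        return RestClients.create(clientConfiguration).rest();
    }
}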

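6.2 Index operations

Create an index

import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test index creation
     */
    @Test
    void test01() throws IOException {
        // Build the request
        CreateIndexRequest request = new CreateIndexRequest("test_index");
        // Execute it with the client
        CreateIndexResponse response = elasticsearchClient.indices().create(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }
}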

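Check whether an index exists

import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test whether an index exists
     */
    @Test
    void test02() throws IOException {
        // Build the request
        GetIndexRequest request = new GetIndexRequest("test_index");
        // Execute it with the client; true if the index exists
        boolean response = elasticsearchClient.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println(response);
    }
}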

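Delete an index

import java.io.IOException;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;

@SpringBootTest
class ElasticApplicationTests {

    @Autowired
    RestHighLevelClient elasticsearchClient;

    /**
     * Test index deletion
     */
    @Test
    void test03() throws IOException {
        // Build the request
        DeleteIndexRequest request = new DeleteIndexRequest("test_index");
        // Execute it with the client
        AcknowledgedResponse response = elasticsearchClient.indices().delete(request, RequestOptions.DEFAULT);
        System.out.println(response.isAcknowledged());
    }
}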

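6.3 Document operations

Add a document

The following methods belong to the same ElasticApplicationTests class. They also use an ObjectMapper named objectMapper and a User class (name, age), which the original does not show. The comments in this first method state these assumptions.

// Assumes the test class also declares: ObjectMapper objectMapper = new ObjectMapper(); (Jackson)
// and that User is a simple POJO with a (String name, int age) constructor.
// Imports: org.elasticsearch.action.index.IndexRequest, org.elasticsearch.action.index.IndexResponse,
//          org.elasticsearch.common.unit.TimeValue, org.elasticsearch.common.xcontent.XContentType
@Test
void test04() throws IOException {
    // Create the object to index
    User user = new User("testUser", 18);
    // Build the request
    IndexRequest request = new IndexRequest("test_index");
    // Set the document id
    request.id("1");
    // Set the request timeout
    request.timeout(TimeValue.timeValueSeconds(1));
    // Serialize the object to JSON and put it into the request
    request.source(objectMapper.writeValueAsString(user), XContentType.JSON);
    // Send the request with the client
    IndexResponse response = elasticsearchClient.index(request, RequestOptions.DEFAULT);
    System.out.println(response.toString());
    System.out.println(response.status());
}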

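Check whether a document exists

// Imports: org.elasticsearch.action.get.GetRequest
@Test
void test05() throws IOException {
    // Build the request
    GetRequest request = new GetRequest("test_index", "1");
    // Send the request; true if the document exists
    boolean response = elasticsearchClient.exists(request, RequestOptions.DEFAULT);
    System.out.println(response);
}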

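Get a document

// Imports: org.elasticsearch.action.get.GetRequest, org.elasticsearch.action.get.GetResponse
@Test
void test06() throws IOException {
    // Build the request
    GetRequest request = new GetRequest("test_index", "1");
    // Send the request
    GetResponse response = elasticsearchClient.get(request, RequestOptions.DEFAULT);
    // Print the document source as a JSON string
    System.out.println(response.getSourceAsString());
}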

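Update a document

// Imports: org.elasticsearch.action.update.UpdateRequest, org.elasticsearch.action.update.UpdateResponse
@Test
void test07() throws IOException {
    // Create the object holding the new values
    User user = new User("testUser", 28);
    // Build the request and set the partial document
    UpdateRequest request = new UpdateRequest("test_index", "1");
    request.doc(objectMapper.writeValueAsString(user), XContentType.JSON);
    // Send the request
    UpdateResponse response = elasticsearchClient.update(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}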

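Delete a document

// Imports: org.elasticsearch.action.delete.DeleteRequest, org.elasticsearch.action.delete.DeleteResponse
@Test
void test08() throws IOException {
    // Build the request
    DeleteRequest request = new DeleteRequest("test_index", "1");
    // Set the request timeout
    request.timeout(TimeValue.timeValueSeconds(1));
    // Send the request
    DeleteResponse response = elasticsearchClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(response.status());
}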

Bulk insert documents

// Imports: java.util.ArrayList, org.elasticsearch.action.bulk.BulkRequest,
//          org.elasticsearch.action.bulk.BulkResponse, org.elasticsearch.action.index.IndexRequest
@Test
void test09() throws IOException {
    // Build the bulk request
    BulkRequest request = new BulkRequest();
    // Set the timeout
    request.timeout(TimeValue.timeValueSeconds(10));
    // Prepare the batch data
    ArrayList<User> users = new ArrayList<>();
    users.add(new User("testUser02", 20));
    users.add(new User("testUser03", 21));
    users.add(new User("testUser04", 22));
    users.add(new User("testUser05", 23));
    users.add(new User("testUser06", 24));
    // Add one index request per user; the ids are "0" to "4"
    for (int i = 0; i < users.size(); i++) {
        request.add(new IndexRequest("test_index")
                .id(String.valueOf(i))
                .source(objectMapper.writeValueAsString(users.get(i)), XContentType.JSON));
    }
    // Send the request; hasFailures() is false when every item succeeded
    BulkResponse responses = elasticsearchClient.bulk(request, RequestOptions.DEFAULT);
    System.out.println(responses.hasFailures());
}

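Search documents

// Imports: org.elasticsearch.action.search.SearchRequest, org.elasticsearch.action.search.SearchResponse,
//          org.elasticsearch.index.query.QueryBuilders, org.elasticsearch.search.SearchHit,
//          org.elasticsearch.search.builder.SearchSourceBuilder
@Test
void test10() throws IOException {
    // Build the search request
    SearchRequest request = new SearchRequest("test_index");
    // Set the search conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    // Exact term query on the keyword sub-field of name
    sourceBuilder.query(QueryBuilders.termQuery("name.keyword", "testUser02"));
    // Set the timeout
    sourceBuilder.timeout(TimeValue.timeValueSeconds(60));
    request.source(sourceBuilder);
    // Send the request
    SearchResponse response = elasticsearchClient.search(request, RequestOptions.DEFAULT);
    System.out.println(objectMapper.writeValueAsString(response.getHits()));
    for (SearchHit hit : response.getHits().getHits()) {
        System.out.println("----------");
        System.out.println(hit.getSourceAsMap());
    }
}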

7. Practical application: JD.com search

7.1 Environment setup

Import the dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.13.1</version>
</dependency>

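Configuration file

server:
  port: 8080
spring:
  thymeleaf:
    cache: false # disable the Thymeleaf template cache

Controller

import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.GetMapping;

@Controller
public class IndexController {

    // Render the index template for both / and /index
    @GetMapping({"/", "/index"})
    public String index() {
        return "index";
    }
}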

7.2 Process the crawled data
