Learn ES with Lele! (3) Elasticsearch Bulk Operations and Advanced Queries
Previous article: Learn ES with Lele! (2) Elasticsearch Basics.
Next article: Learn ES with Lele! (4) Using the Elasticsearch Client in Java.
Bulk Operations
Many create, delete, update and read operations can be batched, which saves network round trips.
A bulk request body is built from pairs of two JSON lines: an action line and a request body line.
The format is as follows:
{ action: { metadata }}
{ request body }
{ action: { metadata }}
{ request body }
The abstract format alone can be hard to read, so the bulk-insert request bodies later in this article serve as concrete examples of the action and request body lines.
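To make the pairing concrete, here is a minimal Python sketch (the index, type, and field values are made up for illustration) of how a _bulk payload is assembled as newline-delimited JSON, including the trailing newline that ES requires:

```python
import json

def build_bulk_payload(actions):
    """Serialize (action, body) pairs into an NDJSON _bulk payload."""
    lines = []
    for action, body in actions:
        lines.append(json.dumps(action, ensure_ascii=False))
        if body is not None:          # delete actions carry no body line
            lines.append(json.dumps(body, ensure_ascii=False))
    # ES requires the payload to end with a newline
    return "\n".join(lines) + "\n"

payload = build_bulk_payload([
    ({"create": {"_index": "userinfo", "_type": "us", "_id": "5"}},
     {"name": "三好炎男", "age": 17}),
    ({"delete": {"_index": "userinfo", "_type": "us", "_id": "6"}}, None),
])
print(payload.endswith("\n"))  # True
```

The payload here is three lines (one create action, its body, one delete action) plus the mandatory final newline.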
Preparing the data
Bulk retrieval
Append '_mget' to the URL.
POST: localhost:9200/{index}/{type}/_mget
For example:
{
  "ids": ["1", "2", "3"]
}
Explanation:
Retrieve the documents with ids '1', '2' and '3' from this type.
Result:
{"docs": [{"_index": "userinfo","_type": "us","_id": "1","_version": 3,"_seq_no": 4,"_primary_term": 1,"found": true,"_source": {"query": {"match": {"phonenumber": 150}},"address": "南京","name": "张三","phonenumber": "13000000001","softid": "9999","age": 23}},{"_index": "userinfo","_type": "us","_id": "2","_version": 1,"_seq_no": 1,"_primary_term": 1,"found": true,"_source": {"softid": 9998,"name": "李四","phonenumber": 15000000000,"age": 18,"address": "南京"}},{"_index": "userinfo","_type": "us","_id": "3","found": false}]
}
Note that the document with id 3 does not exist, so in the third entry of the result, found is false.
In other words, if some of the documents you request in bulk do not exist, ES still responds and flags the missing ones in the JSON.
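A client can separate the found and missing documents from such a response with a couple of comprehensions. A small sketch (the response dict below just mirrors the shape of the response above; no real cluster is involved):

```python
# Simulated _mget response, shaped like the one shown above
response = {"docs": [
    {"_id": "1", "found": True, "_source": {"name": "张三"}},
    {"_id": "2", "found": True, "_source": {"name": "李四"}},
    {"_id": "3", "found": False},
]}

# Keep the sources of hits; collect the ids of misses separately
found = [d["_source"] for d in response["docs"] if d["found"]]
missing = [d["_id"] for d in response["docs"] if not d["found"]]
print(missing)  # ['3']
```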
Bulk insert: _bulk
A bulk insert action can be one of three types: update, create, or index.
For the differences between the three, see: https://blog.csdn.net/xiaoyu_BD/article/details/81914567
POST: localhost:9200/{index}/{type}/_bulk
Option 1: inserting with the create action
JSON request body:
{"create":{"_index":"userinfo","_type":"us","_id":"5"}} # create a document with id 5 in index 'userinfo', type 'us' (address 滨松, name 三好炎男)
{"address":"滨松","name":"三好炎男","phonenumber":"08026073443","softid":"9996","age":17}
{"create":{"_index":"userinfo","_type":"us","_id":"6"}}
{"address":"洛阳","name":"乐乐","phonenumber":"13000000002","softid":"9995","age":24}
{"create":{"_index":"userinfo","_type":"us","_id":"7"}}
{"address":"北京","name":"孔德明","phonenumber":"13000000003","softid":"9994","age":20}
Note: the request body must end with a newline, leaving an empty last line; otherwise ES cannot parse it.
Result:
{"took": 99,"errors": false,"items": [{"create": {"_index": "userinfo","_type": "us","_id": "5","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 5,"_primary_term": 2,"status": 201}},{"create": {"_index": "userinfo","_type": "us","_id": "6","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 6,"_primary_term": 2,"status": 201}},{"create": {"_index": "userinfo","_type": "us","_id": "7","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 7,"_primary_term": 2,"status": 201}}]
}
Option 2: inserting with the index action
JSON request body:
# 127.0.0.1:9200/mytest/_doc/_bulk
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"洛阳","age":24,"name":"乐乐","regtime":"2021-04-20","sex":true,"userid":1}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"南京","age":18,"name":"张三","regtime":"2021-01-30","sex":true,"userid":2}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"北京","age":21,"name":"李桂花","regtime":"2020-12-02","sex":false,"userid":3}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"郑州","age":19,"name":"王翠英","regtime":"2021-03-21","sex":false,"userid":4}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"北京","age":22,"name":"李斌","regtime":"2020-08-12","userid":5}
Note: again, the request body must end with a newline, or ES cannot parse it.
Bulk delete
To delete in bulk, simply change the action from 'create' to 'delete'.
POST: localhost:9200/{index}/{type}/_bulk
Example JSON request body:
{"delete":{"_index":"userinfo","_type":"us","_id":"5"}}
{"delete":{"_index":"userinfo","_type":"us","_id":"6"}}
{"delete":{"_index":"userinfo","_type":"us","_id":"7"}}
Note: bulk deletes likewise require the trailing newline on the last line.
Result:
{"took": 44,"errors": false,"items": [{"delete": {"_index": "userinfo","_type": "us","_id": "5","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 8,"_primary_term": 2,"status": 200}},{"delete": {"_index": "userinfo","_type": "us","_id": "6","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 9,"_primary_term": 2,"status": 200}},{"delete": {"_index": "userinfo","_type": "us","_id": "7","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 10,"_primary_term": 2,"status": 200}}]
}
Paginated queries
In Elasticsearch, pagination is expressed with the size and from parameters, the counterpart of SQL's LIMIT/OFFSET.
GET: localhost:9200/{index}/{type}/_search?size={page size}&from={offset}
size is the number of documents per page.
from is the number of documents to skip before collecting results, starting from 0.
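For 1-indexed pages, from and size relate as from = (page - 1) * size. A quick sketch of that translation (the helper name is made up):

```python
def page_params(page, size):
    """Translate a 1-indexed page number into ES from/size parameters."""
    return {"from": (page - 1) * size, "size": size}

print(page_params(1, 3))   # {'from': 0, 'size': 3}
print(page_params(4, 10))  # {'from': 30, 'size': 10}
```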
For example:
127.0.0.1:9200/userinfo/us/_search?size=3&from=0
Result:
{"took": 10,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "userinfo","_type": "us","_id": "JMHW53gBu0CFfFx2dazM","_score": 1.0,"_source": {"softid": 9997,"name": "王五","phonenumber": 15000000001,"age": 20,"address": "北京"}},{"_index": "userinfo","_type": "us","_id": "2","_score": 1.0,"_source": {"softid": 9998,"name": "李四","phonenumber": 15000000000,"age": 18,"address": "南京"}},{"_index": "userinfo","_type": "us","_id": "1","_score": 1.0,"_source": {"query": {"match": {"phonenumber": 150}},"address": "南京","name": "张三","phonenumber": "13000000001","softid": "9999","age": 23}}]}
}
Caveats of deep pagination
Be careful with deep pages or requests for very large result sets. Results are sorted before being returned, and a search request usually spans multiple shards: each shard produces its own sorted results, which then have to be collected and re-sorted centrally to guarantee the overall order is correct.
To understand why deep pagination is a problem, imagine searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which sorts all 50 of them to pick the overall top 10.
Now suppose we request the page containing results 1001 to 1010. Exactly as with the first page, each shard must now produce its top 1010 results; the requesting node then sorts all 5050 of them and discards 5040!
In a distributed system, the cost of sorting results therefore grows multiplicatively as you page deeper. This is why web search engines refuse to return more than about 1000 results for any query.
In other words, when paginating, keep the page size (size) and the paging depth modest.
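The cost described above can be computed directly: with from/size paging, every shard must return its top from + size candidates, and the coordinating node sorts shards * (from + size) of them, keeping only size. A sketch:

```python
def deep_paging_cost(shards, from_, size):
    """Estimate the sort workload of a from/size search across shards."""
    per_shard = from_ + size              # each shard returns its top (from + size)
    total_sorted = shards * per_shard     # the coordinating node sorts all of them
    discarded = total_sorted - size       # only `size` docs are actually kept
    return per_shard, total_sorted, discarded

# First page (results 1-10) on 5 shards
print(deep_paging_cost(5, 0, 10))     # (10, 50, 40)
# Results 1001-1010 on 5 shards
print(deep_paging_cost(5, 1000, 10))  # (1010, 5050, 5040)
```

The second call reproduces the 5050-sorted / 5040-discarded figures from the text.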
Structured queries
Preparing the data
Common request format:
POST: localhost:9200/{index}/{type}/_search
term query (exact match)
term is mainly used for exact matching of values such as numbers, dates, booleans, or not_analyzed strings (text that has not been analyzed).
POST: localhost:9200/{index}/{type}/_search
127.0.0.1:9200/mytest/_doc/_search
Example:
{
  "query": {
    "term": { "name": "乐乐" }
  }
}
Result:
{"took": 10,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.6931471,"hits": [{"_index": "mytest","_type": "_doc","_id": "DkSf7ngBqYrwxW-GsG6_","_score": 0.6931471,"_source": {"address": "洛阳","age": 24,"name": "乐乐","regtime": "2021-04-20","sex": true,"userid": 1}}]}
}
terms query (exact match on multiple values)
terms is a stronger version of term. term matches a single specified value on a field, while terms lets you specify multiple values for the same field.
If some of the specified values match no document, the response simply contains only the results that did match.
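That "match any of the listed values, silently ignoring values that hit nothing" behavior can be mimicked over plain dicts. A sketch with made-up documents mirroring the sample data:

```python
docs = [
    {"name": "张三", "age": 18},
    {"name": "李桂花", "age": 21},
    {"name": "王翠英", "age": 19},
]

def terms_query(docs, field, values):
    """Return docs whose field equals any of the given values."""
    wanted = set(values)
    return [d for d in docs if d.get(field) in wanted]

hits = terms_query(docs, "age", [18, 19, 22])  # nobody is 22
print([d["name"] for d in hits])  # ['张三', '王翠英']
```

The nonexistent value 22 simply contributes nothing to the result, just as in the terms example below.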
Example:
{
  "query": {
    "terms": {
      "age": [18, 19, 22]   # no user has age 22
    }
  }
}
Result:
{"took": 13,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "D0Sf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "南京","age": 18,"name": "张三","regtime": "2021-01-30","sex": true,"userid": 2}},{"_index": "mytest","_type": "_doc","_id": "EUSf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "郑州","age": 19,"name": "王翠英","regtime": "2021-03-21","sex": false,"userid": 4}}]}
}
range query
Used for range conditions; for numeric fields it is typically combined with 'lt' (less than) and 'gt' (greater than).
The range operators are:
gt — greater than
gte — greater than or equal to
lt — less than
lte — less than or equal to
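The four operators translate to ordinary comparisons. A sketch of a predicate builder over plain dicts (the function name and sample docs are illustrative):

```python
import operator

# Map ES range operators to Python comparison functions
OPS = {"gt": operator.gt, "gte": operator.ge,
       "lt": operator.lt, "lte": operator.le}

def range_filter(docs, field, conditions):
    """Keep docs whose field satisfies every gt/gte/lt/lte condition."""
    return [
        d for d in docs
        if all(OPS[op](d[field], bound) for op, bound in conditions.items())
    ]

docs = [{"age": 18}, {"age": 21}, {"age": 24}]
print(range_filter(docs, "age", {"gt": 20, "lt": 24}))  # [{'age': 21}]
```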
Example:
{
  "query": {
    "range": {        # a range query
      "age": {        # on the age field
        "gt": 20,     # greater than 20
        "lt": 24      # less than 24
      }
    }
  }
}
Result:
{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}}]}
}
exists query (IS NOT NULL)
Selects all documents in a type whose given field is not null.
Equivalent to IS NOT NULL in SQL.
JSON request body:
# post: 127.0.0.1:9200/mytest/_doc/_search
{
  "query": {
    "exists": {
      "field": "sex"   # find documents in the '_doc' type whose 'sex' field is not null
    }
  }
}
This exists query corresponds to the following SQL:
select * from _doc where sex is not null;
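The same IS NOT NULL filter can be emulated over dicts, treating a missing key or a None value as null. Note that a value of False still counts as present, which matches why 李桂花 and 王翠英 (sex: false) appear in the result below:

```python
docs = [
    {"name": "乐乐", "sex": True},
    {"name": "李斌"},                 # no sex field at all
    {"name": "王翠英", "sex": False},  # False is a real value, not null
]

def exists_filter(docs, field):
    """Keep docs where the field is present and not null."""
    return [d for d in docs if d.get(field) is not None]

print([d["name"] for d in exists_filter(docs, "sex")])  # ['乐乐', '王翠英']
```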
In the data inserted earlier, the document for '李斌' has no sex value, so it does not appear in the response below.
{"took": 17,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "DkSf7ngBqYrwxW-GsG6_","_score": 1.0,"_source": {"address": "洛阳","age": 24,"name": "乐乐","regtime": "2021-04-20","sex": true,"userid": 1}},{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}},{"_index": "mytest","_type": "_doc","_id": "D0Sf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "南京","age": 18,"name": "张三","regtime": "2021-01-30","sex": true,"userid": 2}},{"_index": "mytest","_type": "_doc","_id": "EUSf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "郑州","age": 19,"name": "王翠英","regtime": "2021-03-21","sex": false,"userid": 4}}]}
}
match (standard query)
match is Elasticsearch's standard query; almost every scenario, from exact matching to fuzzy matching to full-text (string/text) search, can use it.
The one subtlety: when you match against a full-text field, the query string itself is run through the analyzer before the actual search.
For example:
{
  "query": {
    "match": { "address": "张三" }
  }
}
If match is given an exact value, such as a number, date, boolean, or not_analyzed string, it searches for exactly the value you provide.
For example:
{
  "query": {
    "match": { "sex": true }
  }
}
{
  "query": {
    "match": { "age": 18 }
  }
}
Filtered queries
Using a filter
POST: localhost:9200/{index}/{type}/_search
Filtered queries are more efficient than other queries.
match, term, range and others can all be used inside a filter.
# post:127.0.0.1:9200/mytest/_doc/_search
{
  "query": {
    "bool": {          # combine conditions
      "filter": {      # use a filter
        "match": {     # the query used inside the filter
          "address": "北京"
        }
      }
    }
  }
}
Result:
{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 0.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 0.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}},{"_index": "mytest","_type": "_doc","_id": "olw28ngBtFmPtmNtGLrS","_score": 0.0,"_source": {"address": "北京","age": 22,"name": "李斌","regtime": "2020-08-12","userid": 5}}]}
}
Why are filters better than scoring queries?
- A filter asks whether each document's field contains a particular value.
- A query asks how well each document's field matches a particular value.
- A query computes a relevance score _score for every document and sorts the matches by relevance. This scoring is well suited to full-text search, where there is no single exact answer.
- A filter's result is a simple list of matching documents; the match is a fast set operation that can be cached in memory at roughly one bit per document, and these cached filter results combine very efficiently with later requests.
- A query must not only find the matching documents but also score each one, so queries are generally slower than filters, and query results are not cached.
- Recommendation: for exact-match conditions, prefer filters, because filter results can be cached.
Analysis (tokenization)
What is tokenization?
Tokenization takes a sentence and splits it into individual words according to vocabulary and other rules.
It is also called text analysis; in Elasticsearch it is referred to as Analysis.
For example:
我昨天吃完晚饭就去直接睡觉了 ----> 我/昨天/吃/完/晚饭/就/去/直接/睡觉/了
Using an analyzer
In the ES API, 'analyze' means to run analysis (tokenization).
There is also the notion of an 'analyzer': since languages differ, you choose an analyzer that fits the language you are processing.
A common English analyzer is 'standard', which is also ES's default.
We will hold off on Chinese analyzers for now and first demonstrate with the English 'standard' analyzer.
Analyzing text directly
POST: localhost:9200/_analyze
# 127.0.0.1:9200/_analyze
{
  "analyzer": "standard",   # use the 'standard' analyzer
  "text": "you are hero"    # the text to analyze
}
Result:
{"tokens": [{"token": "you","start_offset": 0,"end_offset": 3,"type": "<ALPHANUM>","position": 0},{"token": "are","start_offset": 4,"end_offset": 7,"type": "<ALPHANUM>","position": 1},{"token": "hero","start_offset": 8,"end_offset": 12,"type": "<ALPHANUM>","position": 2}]
}
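For simple English text, the standard analyzer's output can be approximated by lowercasing and splitting on non-alphanumeric characters while tracking character offsets. A rough sketch (the real analyzer does far more, e.g. Unicode segmentation):

```python
import re

def simple_standard_analyze(text):
    """Approximate the standard analyzer: lowercase terms with offsets."""
    tokens = []
    for pos, m in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": m.group().lower(),
            "start_offset": m.start(),
            "end_offset": m.end(),
            "position": pos,
        })
    return tokens

for t in simple_standard_analyze("you are hero"):
    print(t)
```

On "you are hero" this reproduces the token/offset pairs shown in the response above.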
Analyzing with a specific index
POST: localhost:9200/{index}/_analyze
# 127.0.0.1:9200/mytest/_analyze
{
  "analyzer": "standard",
  "text": "hello world"
}
Result:
{"tokens": [{"token": "hello","start_offset": 0,"end_offset": 5,"type": "<ALPHANUM>","position": 0},{"token": "world","start_offset": 6,"end_offset": 11,"type": "<ALPHANUM>","position": 1}]
}
Using Chinese analyzers
The difficulty with Chinese is that, unlike English, words are not separated by spaces.
Common Chinese analyzers include IK, jieba, and THULAC; I personally recommend the IK analyzer.
About the IK analyzer
To install the IK analyzer on Linux, see: https://zhuanlan.zhihu.com/p/98845218
To install it on Windows, see:
https://blog.csdn.net/fgx_123456/article/details/108800699
I am on Windows. Note that the IK analyzer version you download must match your Elasticsearch version.
Configuring and using the IK analyzer
The IK analyzer offers multiple analysis modes, i.e. different granularities of segmentation for Chinese text, such as ik_max_word and ik_smart.
- ik_max_word: splits the text at the finest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」, exhausting every possible combination.
- ik_smart: splits the text at the coarsest granularity. For example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」.
For more on configuring IK, this article is a good reference: https://www.cnblogs.com/haixiang/p/11810799.html
Now let's try Chinese analysis. The analyzer changes, but the request method and URL rules stay the same.
Example 1:
# 127.0.0.1:9200/mytest/_analyze
{
  "analyzer": "ik_max_word",   # ik_max_word is one of IK's modes
  "text": "你确定你说的正确吗"
}
Result:
{"tokens": [{"token": "你","start_offset": 0,"end_offset": 1,"type": "CN_CHAR","position": 0},{"token": "确定","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 1},{"token": "你","start_offset": 3,"end_offset": 4,"type": "CN_CHAR","position": 2},{"token": "说","start_offset": 4,"end_offset": 5,"type": "CN_CHAR","position": 3},{"token": "的","start_offset": 5,"end_offset": 6,"type": "CN_CHAR","position": 4},{"token": "正确","start_offset": 6,"end_offset": 8,"type": "CN_WORD","position": 5},{"token": "吗","start_offset": 8,"end_offset": 9,"type": "CN_CHAR","position": 6}]
}
Example 2:
{
  "analyzer": "ik_smart",   # this time use the ik_smart mode
  "text": "我希望世界大同,人类幸福安康"
}
Result:
{"tokens": [{"token": "我","start_offset": 0,"end_offset": 1,"type": "CN_CHAR","position": 0},{"token": "希望","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 1},{"token": "世界大同","start_offset": 3,"end_offset": 7,"type": "CN_WORD","position": 2},{"token": "人类","start_offset": 8,"end_offset": 10,"type": "CN_WORD","position": 3},{"token": "幸福","start_offset": 10,"end_offset": 12,"type": "CN_WORD","position": 4},{"token": "安康","start_offset": 12,"end_offset": 14,"type": "CN_WORD","position": 5}]
}
Full-text search (analyzed fields)
Concept
The standard definitions relevant to full-text search are:
- Relevance: the ability to rate how well a query matches each result and to rank results by that rating; the rating can come from TF/IDF, geographic proximity, fuzzy similarity, or other algorithms.
- Analysis: the process of converting a block of text into distinct, normalized tokens, in order to build and to query the inverted index.
In plain terms: when creating an index, you assign an analyzer to a field so the field can be searched by its tokens; such fields usually hold sentences or multiple words.
Preparing the data
Create a structured index whose 'suitableCrowd' (suitable crowd) field is assigned the Chinese IK analyzer, with the finest-granularity mode ik_max_word:
# put: 127.0.0.1:9200/medical — create the 'medical' index
{
  "settings": {
    "index": {
      "number_of_shards": "5",     # 5 shards
      "number_of_replicas": "0"    # 0 replicas
    }
  },
  "mappings": {
    "properties": {
      "id":          { "type": "integer" },
      "medicalName": { "type": "keyword" },
      "suitableCrowd": {             # suitable crowd; may hold one or more terms,
        "type": "text",              # so it is bound to the IK Chinese analyzer
        "analyzer": "ik_max_word"
      },
      "medicalType": { "type": "keyword" }
    }
  }
}
Inserting the data
# post:127.0.0.1:9200/medical/_bulk
{"index":{"_index":"medical","_type":"_doc"}}
{"id":1,"medicalName":"新冠疫苗","suitableCrowd":"老人,青少年,儿童","medicalType":"针剂"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":2,"medicalName":"褪黑素组合片","suitableCrowd":"老人,青少年","medicalType":"内服"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":3,"medicalName":"妇炎洁","suitableCrowd":"女性","medicalType":"外用"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":4,"medicalName":"跌打镇痛膏","suitableCrowd":"老人,青少年,儿童","medicalType":"外用"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":5,"medicalName":"清热解毒胶囊","suitableCrowd":"青少年,儿童","medicalType":"内服"}
Single-word search
# post:127.0.0.1:9200/medical/_search
{
  "query": {
    "match": { "suitableCrowd": "儿童" }
  }
}
Result:
{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 0.41360325,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.41360325,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.41360325,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.2876821,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}
Explanation:
Search for medical products whose suitableCrowd includes children ('儿童').
How the query executes:
- Check the field type. suitableCrowd is a text field (with the IK analyzer assigned), which means the query string itself should also be analyzed.
- Analyze the query string. The string '儿童' is passed through the IK analyzer and comes out as the single term 儿童. Because there is only one term, the match query executes a single underlying term query.
- Find matching documents. The term query looks up 儿童 in the inverted index for suitableCrowd and retrieves the set of documents containing it; in this example, the documents with id 1, 4 and 5.
- Score each document. The term query computes a relevance score _score for each matching document by combining the term frequency (how often 儿童 appears in that document's suitableCrowd field), the inverse document frequency (how often 儿童 appears in the suitableCrowd field across all documents), and the field length (shorter fields rank higher).
Multi-word search
Multi-word search means searching a field for several terms at once.
OR form
# post:127.0.0.1:9200/medical/_search
{
  "query": {
    "match": {                       # match is the standard query
      "suitableCrowd": "儿童 青少年"   # find documents containing '儿童' OR '青少年'
    }
  }
}
Result:
{"took": 17,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 1.2408097,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 1.2408097,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 1.2408097,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 0.5753642,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}}]}
}
As you can see, '褪黑素组合片' contains '青少年' and '清热解毒胶囊' contains '儿童', so both are included in the response.
AND form (operator)
If we want documents whose suitableCrowd contains both '青少年' and '儿童', we need the AND form.
In Elasticsearch, the operator parameter specifies the logical relationship between the query terms.
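The difference between or and and can be seen with a set-based sketch over the analyzed terms (the documents here are hand-tokenized for illustration, mirroring the sample data):

```python
# Each document's suitableCrowd field, pre-tokenized into a set of terms
docs = {
    "新冠疫苗":   {"老人", "青少年", "儿童"},
    "褪黑素组合片": {"老人", "青少年"},
    "清热解毒胶囊": {"青少年", "儿童"},
}

def match_query(docs, terms, op="or"):
    """Return doc names whose token set satisfies the terms under or/and."""
    terms = set(terms)
    if op == "and":
        return [name for name, toks in docs.items() if terms <= toks]   # all terms present
    return [name for name, toks in docs.items() if terms & toks]        # any term present

print(sorted(match_query(docs, ["青少年", "儿童"], "or")))   # all three match
print(sorted(match_query(docs, ["青少年", "儿童"], "and")))  # 褪黑素组合片 drops out
```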
# post:127.0.0.1:9200/medical/_search
{
  "query": {
    "match": {
      "suitableCrowd": {            # query the suitableCrowd field
        "query": "青少年 儿童",       # the terms to match
        "operator": "and"           # the relationship between terms: and, or, etc.
      }
    }
  }
}
Result:
{"took": 14,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 1.2408097,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 1.2408097,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 1.2408097,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}
Combined search
Combined search applies several condition clauses to an analyzed field.
The must and must_not clauses used below are not the only possible combinations; we simply combine these two to demonstrate the idea.
# post:127.0.0.1:9200/medical/_search
{
  "query": {
    "bool": {            # bool combines conditions
      "must": {          # must: the clause has to match
        "match": { "suitableCrowd": "儿童" }
      },
      "must_not": {      # must_not: the clause must not match
        "match": { "suitableCrowd": "老人" }
      }
    }
  }
}
Result:
{"took": 33,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.2876821,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.2876821,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}
Extension:
Combined searches can also use should clauses to raise the relevance of preferred matches.
详情参考:https://blog.csdn.net/chinatopno1/article/details/116061767
Boosting (weights)
A weight is expressed as a number; in ES it is declared with the 'boost' parameter.
A boost is an attribute attached to an individual query clause.
For example:
"match": {
  "hobby": {
    "query": "音乐",
    "boost": 10
  }
}
The larger the boost value, the higher the resulting score (_score).
So when would you use it?
Suppose a user searches our engine for a medical service, say bone-setting massage (正骨按摩).
Three massage shops, '百度按摩', '阿里按摩' and '腾讯按摩', pay us for ranking. 腾讯按摩 pays the most, 阿里按摩 is second, and 百度按摩 pays the least, so their boosts are 100, 70 and 40 respectively.
Now, when the user searches for '正骨按摩', ES finds plenty of related results, including these three advertisers. 腾讯按摩's boost of 100 gives it the highest _score, so it ranks first.
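The ranking effect can be sketched by multiplying a base relevance score by each advertiser's boost and sorting. This is a deliberate simplification of how boost enters the real scoring formula, and the base scores and boosts below are invented, not actual BM25 values:

```python
# Hypothetical base relevance scores and purchased boosts
results = [
    {"name": "百度按摩", "base_score": 0.9, "boost": 40},
    {"name": "阿里按摩", "base_score": 0.9, "boost": 70},
    {"name": "腾讯按摩", "base_score": 0.9, "boost": 100},
]

# Fold the boost into each result's score, then rank by score descending
for r in results:
    r["_score"] = r["base_score"] * r["boost"]

ranked = sorted(results, key=lambda r: r["_score"], reverse=True)
print([r["name"] for r in ranked])  # ['腾讯按摩', '阿里按摩', '百度按摩']
```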
Example:
# post 127.0.0.1:9200/medical/_search
{
  "query": {
    "bool": {
      "must": {                 # the suitableCrowd tokens must include '青少年'
        "match": { "suitableCrowd": "青少年" }
      },
      "should": [               # should is optional; if medicalType is '内服', apply a boost
        {
          "match": {
            "medicalType": {
              "query": "内服",
              "boost": 10       # set the boost to 10
            }
          }
        }
      ]
    }
  }
}
Result:
{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 3.4521847,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 3.4521847,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 3.4521847,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.8272065,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.8272065,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}}]}
}
As you can see, the documents whose medicalType is '内服' received a boost of 10, so they rank at the top, each with a _score of 3.4521847.
Without the boost they would still rank first on the strength of the match, but their scores would be much lower.
For example:
# 127.0.0.1:9200/medical/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "suitableCrowd": "青少年" }
      },
      "should": [
        {
          "match": {
            "medicalType": { "query": "内服" }
          }
        }
      ]
    }
  }
}
Result:
{"took": 5,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 0.8630463,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.8272065,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.8272065,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}}]}
}