Previous article: Learn ES with Lele! (2) ElasticSearch basics.
Next article: Learn ES with Lele! (4) Using the ElasticSearch client in Java.

Batch operations

Some create/read/update/delete operations can be batched, which saves network round trips.
A batch request body is composed of pairs of two JSON lines: an action line and a request-body line.
The format looks like this:

{ action: { metadata }}
{ request body    }
{ action: { metadata }}
{ request body    }

The raw format above may be hard to read on its own, so the bulk-add examples later in this article should make the action / request-body structure easier to understand.

Preparing the data (the original screenshot of the sample 'userinfo' documents is omitted; their content can be seen in the _source fields of the query results below)

Bulk query

Append '_mget' to the URL.

POST: localhost:9200/<index>/<type>/_mget
For example:

{
    "ids": ["1", "2", "3"]
}

Explanation:
Bulk-fetch the documents with ids '1', '2' and '3' from this type.
Result:

{"docs": [{"_index": "userinfo","_type": "us","_id": "1","_version": 3,"_seq_no": 4,"_primary_term": 1,"found": true,"_source": {"query": {"match": {"phonenumber": 150}},"address": "南京","name": "张三","phonenumber": "13000000001","softid": "9999","age": 23}},{"_index": "userinfo","_type": "us","_id": "2","_version": 1,"_seq_no": 1,"_primary_term": 1,"found": true,"_source": {"softid": 9998,"name": "李四","phonenumber": 15000000000,"age": 18,"address": "南京"}},{"_index": "userinfo","_type": "us","_id": "3","found": false}]
}

Note that the document with id 3 does not exist, so in the third entry of the result, 'found' is false.
In other words, if one of the documents you bulk-query does not exist, ES still responds normally, and the JSON tells you which ids were not found.
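Because missing ids come back with "found": false rather than as an error, client code usually splits the response into found documents and missing ids. A minimal Python sketch, using a dict that mimics the shape of the _mget result shown above (only the fields we need):

```python
# Split an _mget response into found documents and missing ids.
def split_mget_response(response):
    found, missing = [], []
    for doc in response["docs"]:
        if doc.get("found"):
            found.append(doc["_source"])
        else:
            missing.append(doc["_id"])
    return found, missing

response = {
    "docs": [
        {"_id": "1", "found": True, "_source": {"name": "张三"}},
        {"_id": "2", "found": True, "_source": {"name": "李四"}},
        {"_id": "3", "found": False},
    ]
}
found, missing = split_mget_response(response)
print(len(found), missing)  # 2 ['3']
```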

Bulk insert: _bulk

The bulk-insert actions come in three kinds: update, create and index.
The differences between the three are explained here: https://blog.csdn.net/xiaoyu_BD/article/details/81914567

POST: localhost:9200/<index>/<type>/_bulk

Option 1: inserting with the create action

JSON request body:

{"create":{"_index":"userinfo","_type":"us","_id":"5"}} # create a document (address '滨松', name '三好炎男') in index 'userinfo', type 'us', with id 5
{"address":"滨松","name":"三好炎男","phonenumber":"08026073443","softid":"9996","age":17}
{"create":{"_index":"userinfo","_type":"us","_id":"6"}}
{"address":"洛阳","name":"乐乐","phonenumber":"13000000002","softid":"9995","age":24}
{"create":{"_index":"userinfo","_type":"us","_id":"7"}}
{"address":"北京","name":"孔德明","phonenumber":"13000000003","softid":"9994","age":20}

Note: the body must end with a newline (i.e. a final empty line), otherwise ES cannot parse it.

Result:

{"took": 99,"errors": false,"items": [{"create": {"_index": "userinfo","_type": "us","_id": "5","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 5,"_primary_term": 2,"status": 201}},{"create": {"_index": "userinfo","_type": "us","_id": "6","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 6,"_primary_term": 2,"status": 201}},{"create": {"_index": "userinfo","_type": "us","_id": "7","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 7,"_primary_term": 2,"status": 201}}]
}
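The trailing-newline requirement above is easy to get wrong when assembling the NDJSON body by hand; building it programmatically makes it explicit. A minimal Python sketch (index/type/ids are the ones from the create example):

```python
import json

# Build an NDJSON _bulk body from (action, document) pairs.
def build_bulk_body(actions):
    lines = []
    for action, doc in actions:
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The trailing newline is mandatory: ES rejects a bulk body
    # whose last line is not terminated.
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ({"create": {"_index": "userinfo", "_type": "us", "_id": "5"}},
     {"address": "滨松", "name": "三好炎男", "age": 17}),
    ({"create": {"_index": "userinfo", "_type": "us", "_id": "6"}},
     {"address": "洛阳", "name": "乐乐", "age": 24}),
])
print(body.endswith("\n"), body.count("\n"))  # True 4
```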

Option 2: inserting with the index action

JSON request body:

# 127.0.0.1:9200/mytest/_doc/_bulk
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"洛阳","age":24,"name":"乐乐","regtime":"2021-04-20","sex":true,"userid":1}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"南京","age":18,"name":"张三","regtime":"2021-01-30","sex":true,"userid":2}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"北京","age":21,"name":"李桂花","regtime":"2020-12-02","sex":false,"userid":3}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"郑州","age":19,"name":"王翠英","regtime":"2021-03-21","sex":false,"userid":4}
{"index":{"_index":"mytest","_type":"_doc"}}
{"address":"北京","age":22,"name":"李斌","regtime":"2020-08-12","userid":5}

Note: the body must end with a newline (a final empty line), otherwise ES cannot parse it.

Bulk delete

Here, simply replacing 'create' with 'delete' gives you bulk deletion.
POST: localhost:9200/<index>/<type>/_bulk

Example JSON request body:

{"delete":{"_index":"userinfo","_type":"us","_id":"5"}}
{"delete":{"_index":"userinfo","_type":"us","_id":"6"}}
{"delete":{"_index":"userinfo","_type":"us","_id":"7"}}

Note: as with bulk insert, the body must end with a newline.

Result:

{"took": 44,"errors": false,"items": [{"delete": {"_index": "userinfo","_type": "us","_id": "5","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 8,"_primary_term": 2,"status": 200}},{"delete": {"_index": "userinfo","_type": "us","_id": "6","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 9,"_primary_term": 2,"status": 200}},{"delete": {"_index": "userinfo","_type": "us","_id": "7","_version": 2,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 10,"_primary_term": 2,"status": 200}}]
}

Paginated queries

In ElasticSearch, pagination is expressed with the two parameters size and from, corresponding to LIMIT in SQL.
GET: localhost:9200/<index>/<type>/_search?size=<page size>&from=<offset>

size is how many documents each page shows.
from is how many documents to skip before collecting results; it starts at 0.

For example:
127.0.0.1:9200/userinfo/us/_search?size=3&from=0
Result:

{"took": 10,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "userinfo","_type": "us","_id": "JMHW53gBu0CFfFx2dazM","_score": 1.0,"_source": {"softid": 9997,"name": "王五","phonenumber": 15000000001,"age": 20,"address": "北京"}},{"_index": "userinfo","_type": "us","_id": "2","_score": 1.0,"_source": {"softid": 9998,"name": "李四","phonenumber": 15000000000,"age": 18,"address": "南京"}},{"_index": "userinfo","_type": "us","_id": "1","_score": 1.0,"_source": {"query": {"match": {"phonenumber": 150}},"address": "南京","name": "张三","phonenumber": "13000000001","softid": "9999","age": 23}}]}
}
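In terms of a plain sorted result list, the size/from pair behaves exactly like SQL's LIMIT/OFFSET, i.e. a slice. A tiny Python sketch:

```python
# size/from pagination over a sorted result list,
# equivalent to SQL's LIMIT size OFFSET from.
def paginate(results, size, frm):
    return results[frm:frm + size]

docs = list(range(1, 11))         # ten documents, ids 1..10
print(paginate(docs, 3, 0))       # [1, 2, 3] -> page 1
print(paginate(docs, 3, 3))       # [4, 5, 6] -> page 2
```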

A caveat about deep pagination.

Be careful with pages that are too deep or requests that ask for too many results at once. Results are sorted before they are returned, and a search request usually involves multiple shards: each shard produces its own sorted results, which then have to be collected and merged centrally to guarantee a correct overall order.
To understand why deep pagination is a problem, imagine searching an index with 5 primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 and returns them to the requesting node, which then sorts all 50 results to pick the overall top 10.
Now suppose we request page 101, i.e. the documents ranked 1001 through 1010. Just as with the first page, every shard must now produce its top 1010 results, and the requesting node sorts all 5050 of them only to discard 5040!
So in a distributed system, the cost of sorting results grows multiplicatively the deeper you page.
This is also why web search engines will not return more than about 1000 results for any query.
In short, when paginating, the page size (size) should not be too large, and you should not page too deep.
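The cost described above can be stated as a formula: each shard returns its top (from + size) hits, so the requesting node has to merge shards × (from + size) documents and throw almost all of them away. A quick Python check of the numbers in the text:

```python
# Cost of fetching one page in a sharded search: every shard must
# return its top (from + size) hits, and the requesting node merges
# shards * (from + size) of them, keeping only `size`.
def deep_page_cost(shards, size, frm):
    fetched = shards * (frm + size)
    discarded = fetched - size
    return fetched, discarded

print(deep_page_cost(5, 10, 0))     # (50, 40)     first page
print(deep_page_cost(5, 10, 1000))  # (5050, 5040) documents 1001-1010
```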

Structured queries

Preparing the data: the queries below run against the 'mytest' index populated in the bulk-insert example above.

Fixed request format:
POST: localhost:9200/<index>/<type>/_search

term query (exact match)

term is mainly for matching exact values: numbers, dates, booleans, or not_analyzed strings (text that has not been analyzed).
POST: localhost:9200/<index>/<type>/_search
127.0.0.1:9200/mytest/_doc/_search
Example:

{
    "query": {
        "term": {
            "name": "乐乐"
        }
    }
}

Result:

{"took": 10,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.6931471,"hits": [{"_index": "mytest","_type": "_doc","_id": "DkSf7ngBqYrwxW-GsG6_","_score": 0.6931471,"_source": {"address": "洛阳","age": 24,"name": "乐乐","regtime": "2021-04-20","sex": true,"userid": 1}}]}
}

terms query (exact match against several values)

terms is an extended version of term. term matches a field against a single value, while terms matches a field against several values at once.
If some of the given values match no document, the response simply contains the documents that did match.
Example:

{
    "query": {
        "terms": {
            "age": [18, 19, 22]   # no user with age 22 exists in this data
        }
    }
}

Result:

{"took": 13,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "D0Sf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "南京","age": 18,"name": "张三","regtime": "2021-01-30","sex": true,"userid": 2}},{"_index": "mytest","_type": "_doc","_id": "EUSf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "郑州","age": 19,"name": "王翠英","regtime": "2021-03-21","sex": false,"userid": 4}}]}
}

range query (range matching)

Used for range conditions; for numeric fields it is typically combined with 'lt' (less than) and 'gt' (greater than).
The range operators are:
gt  greater than
gte greater than or equal
lt  less than
lte less than or equal

Example:

{
    "query": {
        "range": {            # declare a range query
            "age": {          # the field to query
                "gt": 20,     # greater than
                "lt": 24      # less than
            }
        }
    }
}

Result:

{"took": 3,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}}]}
}

exists query (IS NOT NULL)

Used to select all documents in a type whose given field is not null.
It is the equivalent of SQL's IS NOT NULL.
JSON request body:

# POST: 127.0.0.1:9200/mytest/_doc/_search
{
    "query": {
        "exists": {
            "field": "sex"    # match documents in '_doc' whose 'sex' field has a value
        }
    }
}

This exists query corresponds to the following SQL:
select * from _doc where sex is not null;

In the sample data, the document for '李斌' has no sex value, so it is absent from the response below.

{"took": 17,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 1.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "DkSf7ngBqYrwxW-GsG6_","_score": 1.0,"_source": {"address": "洛阳","age": 24,"name": "乐乐","regtime": "2021-04-20","sex": true,"userid": 1}},{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}},{"_index": "mytest","_type": "_doc","_id": "D0Sf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "南京","age": 18,"name": "张三","regtime": "2021-01-30","sex": true,"userid": 2}},{"_index": "mytest","_type": "_doc","_id": "EUSf7ngBqYrwxW-GsG7A","_score": 1.0,"_source": {"address": "郑州","age": 19,"name": "王翠英","regtime": "2021-03-21","sex": false,"userid": 4}}]}
}

match (standard query)

match is ElasticSearch's standard query; exact matching, fuzzy matching and full-text (string/text) search almost all go through it.
The difference is that when you use match on a full-text field, the query string is first run through the analyzer before the actual search is executed.
For example:

{
    "query": {
        "match": {
            "address": "张三"
        }
    }
}

If the value you give match is an exact value, i.e. a number, date, boolean or not_analyzed string, it will search for exactly the value you gave.
For example:

{
    "query": {
        "match": {
            "sex": true
        }
    }
}
{
    "query": {
        "match": {
            "age": 18
        }
    }
}

Filtered queries

Using a filtered query

POST: localhost:9200/<index>/<type>/_search

Filtered queries are more efficient than other queries.
match, term, range and so on can all be used inside a filter.

# POST: 127.0.0.1:9200/mytest/_doc/_search
{
    "query": {
        "bool": {               # combine conditions
            "filter": {         # declare a filter context
                "match": {      # the query to run as a filter
                    "address": "北京"
                }
            }
        }
    }
}

Result:

{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 2,"relation": "eq"},"max_score": 0.0,"hits": [{"_index": "mytest","_type": "_doc","_id": "EESf7ngBqYrwxW-GsG7A","_score": 0.0,"_source": {"address": "北京","age": 21,"name": "李桂花","regtime": "2020-12-02","sex": false,"userid": 3}},{"_index": "mytest","_type": "_doc","_id": "olw28ngBtFmPtmNtGLrS","_score": 0.0,"_source": {"address": "北京","age": 22,"name": "李斌","regtime": "2020-08-12","userid": 5}}]}
}

Why are filters better than queries?

  • A filter clause asks of each document: does this field contain this exact value, yes or no?
  • A query clause asks of each document: how well does this field match this value?
    • A query computes each document's relevance to the query, produces a relevance score _score, and sorts the matching documents by that score. This scoring is exactly what you want for a full-text search that has no single exact answer.
  • A filter's result is a simple list of documents; the yes/no matching is fast, and the result can be kept in memory very compactly, roughly one bit per document. These cached filter results can then be combined with subsequent requests very efficiently.
  • A query has to find the matching documents and also compute a relevance score for each one, so queries are generally slower than filters, and query results are not cacheable.
  • Recommendation:
    For exact-match searches, prefer filter clauses, because filter results can be cached.
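The cacheability argument above can be illustrated with a toy cache: a filter's result is just the set of matching doc ids (a bitset in real ES), keyed by the filter itself, so a repeated request reuses the stored set instead of rescanning. A hedged Python sketch, not ES internals:

```python
# Toy filter cache: key = (field, value), value = set of matching ids.
filter_cache = {}

def cached_filter(docs, field, value):
    key = (field, value)
    if key not in filter_cache:                  # first request: compute
        filter_cache[key] = frozenset(
            doc_id for doc_id, doc in docs.items()
            if doc.get(field) == value
        )
    return filter_cache[key]                     # later requests: cache hit

docs = {1: {"address": "北京"}, 2: {"address": "南京"}, 3: {"address": "北京"}}
print(sorted(cached_filter(docs, "address", "北京")))  # [1, 3]
cached_filter(docs, "address", "北京")                 # second call hits the cache
print(len(filter_cache))                               # 1
```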

Analysis (tokenization)

What is tokenization?

Tokenization takes a sentence and, following lexical rules, splits the whole sentence into individual words.
It is also called text analysis; in ElasticSearch it is known as Analysis.
For example:
我昨天吃完晚饭就去直接睡觉了 ----> 我/昨天/吃/完/晚饭/就/去/直接/睡觉/了

Using an analyzer

In ES syntax, the word 'analyze' means running this analysis.
There is also the concept of an 'analyzer': because the world's languages differ, you pick the analyzer that fits your needs.
A common analyzer for English is 'standard', which is also the default analyzer in ES.
We will hold off on Chinese analyzers for now; the examples here use the English 'standard' analyzer.

Analyzing plain text

POST: localhost:9200/_analyze

# 127.0.0.1:9200/_analyze
{
    "analyzer": "standard",   # use the 'standard' analyzer
    "text": "you are hero"    # the text to analyze
}

Result:

{"tokens": [{"token": "you","start_offset": 0,"end_offset": 3,"type": "<ALPHANUM>","position": 0},{"token": "are","start_offset": 4,"end_offset": 7,"type": "<ALPHANUM>","position": 1},{"token": "hero","start_offset": 8,"end_offset": 12,"type": "<ALPHANUM>","position": 2}]
}
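For simple English text, the behavior shown above can be approximated in a few lines: split on non-alphanumeric characters and lowercase each token. This is only a rough sketch; the real standard analyzer does considerably more (Unicode segmentation, token types, offsets):

```python
import re

# Rough approximation of the 'standard' analyzer on simple English text.
def standard_like_tokenize(text):
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t]

print(standard_like_tokenize("you are hero"))  # ['you', 'are', 'hero']
```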

Analyzing against a specific index

POST: localhost:9200/<index>/_analyze

# 127.0.0.1:9200/mytest/_analyze
{
    "analyzer": "standard",
    "text": "hello world"
}

Result:

{"tokens": [{"token": "hello","start_offset": 0,"end_offset": 5,"type": "<ALPHANUM>","position": 0},{"token": "world","start_offset": 6,"end_offset": 11,"type": "<ALPHANUM>","position": 1}]
}

Chinese analysis

The difficulty with Chinese is that, unlike English, you cannot rely on spaces to separate the words in a sentence.
Common Chinese analyzers include IK, jieba and THULAC; I personally recommend the IK analyzer.

About the IK analyzer

To install the IK analyzer on Linux, see this article: https://zhuanlan.zhihu.com/p/98845218

To install it on Windows, see this one:
https://blog.csdn.net/fgx_123456/article/details/108800699
I am on Windows. Note: the IK analyzer version you download must match your ElasticSearch version.

Configuring and using the IK analyzer.

The IK analyzer ships with several analysis modes, i.e. different granularities of segmentation for Chinese text, such as ik_max_word and ik_smart.

  • ik_max_word: segments the text at the finest granularity; for example, 「中华人民共和国国歌」 is split into 「中华人民共和国、中华人民、中华、华人、人民共和国、人民、人、民、共和国、共和、和、国国、国歌」, exhausting every possible combination
  • ik_smart: segments the text at the coarsest granularity; for example, 「中华人民共和国国歌」 is split into 「中华人民共和国、国歌」

For more on configuring the IK analyzer, I recommend this article: https://www.cnblogs.com/haixiang/p/11810799.html

Now let's try Chinese analysis. Although the analyzer changes, the request method and URL pattern stay the same.
Example one:

# 127.0.0.1:9200/mytest/_analyze
{
    "analyzer": "ik_max_word",   # ik_max_word is one of the IK analysis modes
    "text": "你确定你说的正确吗"
}

Result:

{"tokens": [{"token": "你","start_offset": 0,"end_offset": 1,"type": "CN_CHAR","position": 0},{"token": "确定","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 1},{"token": "你","start_offset": 3,"end_offset": 4,"type": "CN_CHAR","position": 2},{"token": "说","start_offset": 4,"end_offset": 5,"type": "CN_CHAR","position": 3},{"token": "的","start_offset": 5,"end_offset": 6,"type": "CN_CHAR","position": 4},{"token": "正确","start_offset": 6,"end_offset": 8,"type": "CN_WORD","position": 5},{"token": "吗","start_offset": 8,"end_offset": 9,"type": "CN_CHAR","position": 6}]
}

Example two:

{
    "analyzer": "ik_smart",   # this time, analyze with the ik_smart mode
    "text": "我希望世界大同,人类幸福安康"
}

Result:

{"tokens": [{"token": "我","start_offset": 0,"end_offset": 1,"type": "CN_CHAR","position": 0},{"token": "希望","start_offset": 1,"end_offset": 3,"type": "CN_WORD","position": 1},{"token": "世界大同","start_offset": 3,"end_offset": 7,"type": "CN_WORD","position": 2},{"token": "人类","start_offset": 8,"end_offset": 10,"type": "CN_WORD","position": 3},{"token": "幸福","start_offset": 10,"end_offset": 12,"type": "CN_WORD","position": 4},{"token": "安康","start_offset": 12,"end_offset": 14,"type": "CN_WORD","position": 5}]
}

Full-text search (analyzed fields)

Concepts

The textbook definitions for full-text search are:

  • Relevance: the ability to rate how well each result matches the query, and to rank results by that rating; the rating can come from TF/IDF, geographic proximity, fuzzy similarity, or some other algorithm.
  • Analysis: the process of converting a block of text into distinct, normalized tokens, in order to build an inverted index and to query it.

Put plainly: when you create an index, you assign an analyzer to some field so that the field can be searched by its tokens; such a field typically holds a sentence or several words.

Preparing the data

Create a structured index whose 'suitableCrowd' (suitable groups) field is assigned the Chinese IK analyzer, using the finest-granularity mode ik_max_word.

# PUT: 127.0.0.1:9200/medical    create the 'medical' index
{
    "settings": {
        "index": {
            "number_of_shards": "5",      # 5 shards
            "number_of_replicas": "0"     # 0 replicas
        }
    },
    "mappings": {
        "properties": {
            "id": {                       # id field
                "type": "integer"
            },
            "medicalName": {              # product name field
                "type": "keyword"
            },
            "suitableCrowd": {            # suitable-groups field; it can hold one or
                "type": "text",           # several groups, so it is bound to the
                "analyzer": "ik_max_word" # IK Chinese analyzer
            },
            "medicalType": {              # product type
                "type": "keyword"
            }
        }
    }
}
}

Inserting the data

# post:127.0.0.1:9200/medical/_bulk
{"index":{"_index":"medical","_type":"_doc"}}
{"id":1,"medicalName":"新冠疫苗","suitableCrowd":"老人,青少年,儿童","medicalType":"针剂"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":2,"medicalName":"褪黑素组合片","suitableCrowd":"老人,青少年","medicalType":"内服"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":3,"medicalName":"妇炎洁","suitableCrowd":"女性","medicalType":"外用"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":4,"medicalName":"跌打镇痛膏","suitableCrowd":"老人,青少年,儿童","medicalType":"外用"}
{"index":{"_index":"medical","_type":"_doc"}}
{"id":5,"medicalName":"清热解毒胶囊","suitableCrowd":"青少年,儿童","medicalType":"内服"}

Single-word search

# POST: 127.0.0.1:9200/medical/_search
{
    "query": {
        "match": {
            "suitableCrowd": "儿童"
        }
    }
}

Result:

{"took": 16,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 0.41360325,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.41360325,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.41360325,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.2876821,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}

Explanation:
Search for the medical products whose suitable groups (suitableCrowd) include children.

How it works:

  1. Check the field type.
    suitableCrowd is a text field (with the IK analyzer assigned), which means the query string itself should also be analyzed.
  2. Analyze the query string.
    The query string "儿童" is passed through the IK analyzer, which outputs the single token 儿童. Since there is only one token, the match query executes as a single underlying term query.
  3. Find matching documents.
    A term query looks up 儿童 in the inverted index of "suitableCrowd" and fetches the set of documents containing that token; in this example, the documents with id 1, 4 and 5.
  4. Score each document. The term query computes a relevance score _score for each document by combining the term frequency (how often 儿童 appears in that document's suitableCrowd field), the inverse document frequency (how often 儿童 appears in the suitableCrowd field across all documents), and the field length (shorter fields rank higher).
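Step 4 can be sketched as a toy scoring function combining the three signals just listed: term frequency, inverse document frequency and field length. This is an illustrative TF-IDF-style formula, not ES's actual similarity (which is more involved, e.g. BM25 in recent versions):

```python
import math

# Toy TF-IDF-style score: tf * idf, normalized by field length.
def score(tf, doc_count, docs_with_term, field_len):
    idf = math.log(1 + doc_count / docs_with_term)
    return tf * idf / math.sqrt(field_len)

# '儿童' appears once in each of two fields; the shorter field scores higher.
short_field = score(tf=1, doc_count=5, docs_with_term=3, field_len=2)
long_field = score(tf=1, doc_count=5, docs_with_term=3, field_len=3)
print(short_field > long_field)  # True
```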

Multi-word search

A multi-word search gives a field several tokens to search for.

OR form

# POST: 127.0.0.1:9200/medical/_search
{
    "query": {
        "match": {                          # match is the standard query; term would also work
            "suitableCrowd": "儿童 青少年"  # match documents whose field contains '儿童' or '青少年'
        }
    }
}

Result:

{"took": 17,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 1.2408097,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 1.2408097,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 1.2408097,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 0.5753642,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}}]}
}

As you can see, '褪黑素组合片' contains '青少年' and '清热解毒胶囊' contains '儿童', so both are included in the results.

AND form (operator)

If we want the documents whose suitableCrowd field contains both '青少年' and '儿童', we need the AND form.
In ElasticSearch, the logical relation between the tokens is specified with operator.

# POST: 127.0.0.1:9200/medical/_search
{
    "query": {
        "match": {
            "suitableCrowd": {          # the field to query
                "query": "青少年 儿童", # the tokens to match
                "operator": "and"       # the logical relation: and, or, etc.
            }
        }
    }
}

Result:

{"took": 14,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": 1.2408097,"hits": [{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 1.2408097,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 1.2408097,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}},{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}
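The OR/AND difference can be simulated as plain set operations over each document's token set. This sketch assumes the IK analyzer splits each comma-separated suitableCrowd value into one token per group name; with that assumption, it reproduces the hit counts of the two responses above (4 hits for OR, 3 for AND):

```python
# Token sets of the five sample documents' suitableCrowd field.
docs = {
    1: {"老人", "青少年", "儿童"},   # 新冠疫苗
    2: {"老人", "青少年"},           # 褪黑素组合片
    3: {"女性"},                     # 妇炎洁
    4: {"老人", "青少年", "儿童"},   # 跌打镇痛膏
    5: {"青少年", "儿童"},           # 清热解毒胶囊
}

def match(tokens, operator):
    if operator == "and":
        return sorted(d for d, t in docs.items() if tokens <= t)   # all tokens present
    return sorted(d for d, t in docs.items() if tokens & t)        # any token present

print(match({"青少年", "儿童"}, "or"))   # [1, 2, 4, 5] -> 4 hits
print(match({"青少年", "儿童"}, "and"))  # [1, 4, 5]    -> 3 hits
```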

Combined search

A combined search applies several condition clauses to an analyzed field.
The must and must_not clauses used below are not the only combinators; we simply combine these two to demonstrate the idea.

# POST: 127.0.0.1:9200/medical/_search
{
    "query": {
        "bool": {              # bool combines several conditions
            "must": {          # must: the document must match this
                "match": {
                    "suitableCrowd": "儿童"
                }
            },
            "must_not": {      # must_not: the document must not match this
                "match": {
                    "suitableCrowd": "老人"
                }
            }
        }
    }
}

Result:

{"took": 33,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 1,"relation": "eq"},"max_score": 0.2876821,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.2876821,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}}]}
}

Extension:
A combined search can also use should to raise relevance.
For details, see: https://blog.csdn.net/chinatopno1/article/details/116061767

Weights

A weight is a numeric concept; in ES it is declared under the name 'boost'.
A weight is an attribute attached beneath a query clause.
For example:

"match": {"hobby": {"query": "音乐", "boost": 10}}

The larger the weight value, the higher the score (_score).
So where is this useful?
Suppose a user searches our engine for a medical service, say bone-setting massage.
Three massage providers pay us for ranked placement: "百度按摩", "阿里按摩" and "腾讯按摩". 腾讯按摩 pays the most, 阿里 next and 百度 the least, so their weights are 100, 70 and 40 respectively.
Now when the user searches for bone-setting massage, ES finds a pile of related results, among them the services of these three sponsors. 腾讯按摩 has weight 100, so its score (_score) is the highest, and it ranks first.
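The sponsored-ranking scenario above boils down to: with equal base relevance, ordering follows the boost. A tiny Python sketch (the equal base score is an assumption of this toy model):

```python
# Boost-driven ranking: each shop has the same base relevance,
# so the final order is determined by the boost alone.
shops = {"百度按摩": 40, "阿里按摩": 70, "腾讯按摩": 100}
base_score = 1.0

ranked = sorted(shops, key=lambda s: base_score * shops[s], reverse=True)
print(ranked)  # ['腾讯按摩', '阿里按摩', '百度按摩']
```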

Example:

# POST: 127.0.0.1:9200/medical/_search
{
    "query": {
        "bool": {
            "must": {              # require '青少年' among the tokens of suitableCrowd
                "match": {
                    "suitableCrowd": "青少年"
                }
            },
            "should": [{           # should: optional; if medicalType is '内服', apply the boost
                "match": {
                    "medicalType": {
                        "query": "内服",
                        "boost": 10   # set the weight to 10
                    }
                }
            }]
        }
    }
}

Result:

{"took": 9,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 3.4521847,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 3.4521847,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 3.4521847,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.8272065,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.8272065,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}}]}
}

As you can see, the documents whose medicalType is 内服 were given a weight of 10, so they rank at the top, both with a score (_score) of 3.4521847.

Without the boost they would still rank first on match alone, but their scores would not be as high.
For example:

# 127.0.0.1:9200/medical/_search
{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "suitableCrowd": "青少年"
                }
            },
            "should": [{
                "match": {
                    "medicalType": {
                        "query": "内服"
                    }
                }
            }]
        }
    }
}

Result:

{"took": 5,"timed_out": false,"_shards": {"total": 5,"successful": 5,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4,"relation": "eq"},"max_score": 0.8630463,"hits": [{"_index": "medical","_type": "_doc","_id": "4G56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 5,"medicalName": "清热解毒胶囊","suitableCrowd": "青少年,儿童","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3W56_XgB2_2kwSyJeq2T","_score": 0.8630463,"_source": {"id": 2,"medicalName": "褪黑素组合片","suitableCrowd": "老人,青少年","medicalType": "内服"}},{"_index": "medical","_type": "_doc","_id": "3G56_XgB2_2kwSyJeq2Q","_score": 0.8272065,"_source": {"id": 1,"medicalName": "新冠疫苗","suitableCrowd": "老人,青少年,儿童","medicalType": "针剂"}},{"_index": "medical","_type": "_doc","_id": "3256_XgB2_2kwSyJeq2T","_score": 0.8272065,"_source": {"id": 4,"medicalName": "跌打镇痛膏","suitableCrowd": "老人,青少年,儿童","medicalType": "外用"}}]}
}
