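Preface

These study notes are based mainly on reading the official Elasticsearch 7.17 documentation and on hands-on practice. The official documentation is at https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html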

Contents

I. How ES stores data

II. Usage

2.1 Adding documents to ES

2.2 Searching

2.3 get specific fields

2.4 Range queries

2.5 extract fields from unstructured content

2.6 Combine queries

2.7 Aggregate data

2.8 Anatomy of a request

2.9 field data types

2.10 Structured, unstructured, and semi-structured data

2.11 The difference between term and match

III. Query DSL

3.1 dis_max (disjunction max)

3.2 boosting query

3.3 constant_score

3.4 function_score query (user-defined scoring)

3.5 intervals query

3.6 match query

3.7 combined_fields (multiple fields)

3.8 multi_match

3.9 query_string

3.10 joining query

3.11 percolate query

3.12 rank_feature

3.13 pinned query

3.14 fuzzy query

3.15 exists

3.16 wildcard query

Closing remarks


I. How ES stores data

1. Elasticsearch stores complex data structures that have been serialized as JSON documents.

2. When a document is stored, it is indexed.

3. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.

4. An index can be thought of as an optimized collection of documents and each document is a collection of fields, which are the key-value pairs that contain your data.

5. Vocabulary used throughout these notes: a term is a single item (token) in the index; a match is a hit for a query (plural: matches); to extract is to pull a value out of some content; a shard is a partition of an index.

6. The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in your employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance, that is, how good a match they are for your search terms.
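To make the inverted index from item 3 concrete, here is a minimal sketch in plain Python. It only illustrates the data structure, not how Elasticsearch or Lucene implement it. The sample documents and the function name build_inverted_index are made up for this example, and "analysis" is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, used only for illustration.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
inverted = build_inverted_index(docs)
print(inverted["quick"])  # ids of the documents that contain "quick"
```

Looking a word up is then a single dictionary access, which is why searching by term is fast even when there are many documents.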

II. Usage

2.1 Adding documents to ES

Add a single document

Send this request to the ES server

POST logs-my_app-default/_doc
{"@timestamp": "2099-05-06T16:21:15.000Z","event": {"original": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"}
}

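The response is shown below, where

_index is the index that contains the stored document (here, the backing index of the data stream)

_id is the document's unique id within the index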

{"_index": ".ds-logs-my_app-default-2099-05-06-000001","_type": "_doc","_id": "gl5MJXMBMk1dGnErnBW8","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1
}

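Adding multiple documents in a single request

Append _bulk to the path on the first line. The body is newline-delimited JSON: each action and each document sits on its own line, and every line, including the last, must end with a newline.

Example: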

PUT logs-my_app-default/_bulk
{ "create": { } }
{ "@timestamp": "2099-05-07T16:24:32.000Z", "event": { "original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0" } }
{ "create": { } }
{ "@timestamp": "2099-05-08T16:25:42.000Z", "event": { "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } }
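When the bulk body is assembled in code rather than typed into the console, the newline rules are easy to get wrong. The sketch below builds such a body with the standard json module only; it does not send anything to a cluster. The helper name build_bulk_body and the shortened log lines are assumptions made for this example.

```python
import json


def build_bulk_body(documents):
    """Return an NDJSON body: one {"create": {}} action line before each document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, is terminated by a newline.
    return "\n".join(lines) + "\n"


# Shortened, hypothetical log documents.
documents = [
    {"@timestamp": "2099-05-07T16:24:32.000Z", "event": {"original": "first log line"}},
    {"@timestamp": "2099-05-08T16:25:42.000Z", "event": {"original": "second log line"}},
]
body = build_bulk_body(documents)
print(body)
```

json.dumps never emits raw newlines inside a document, so each document stays on a single line as the bulk format requires.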

2.2 Searching

This request matches all log entries in logs-my_app-default and sorts them by @timestamp in descending order

GET logs-my_app-default/_search
{"query": {"match_all": { }},"sort": [{"@timestamp": "desc"}]
}

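The response is shown below, where

by default the hits section contains at most the first 10 documents that match the search. The _source of each hit is the original JSON object that was submitted at index time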

{"took": 2,"timed_out": false,"_shards": {"total": 1,"successful": 1,"skipped": 0,"failed": 0},"hits": {"total": {"value": 3,"relation": "eq"},"max_score": null,"hits": [{"_index": ".ds-logs-my_app-default-2099-05-06-000001","_type": "_doc","_id": "PdjWongB9KPnaVm2IyaL","_score": null,"_source": {"@timestamp": "2099-05-08T16:25:42.000Z","event": {"original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"}},"sort": [4081940742000]},...]}
}

2.3 get specific fields

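Parsing the entire _source is unwieldy for large documents. To exclude it from the response, set the _source parameter to false. Instead, use the fields parameter to retrieve the fields you want

Example: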

GET logs-my_app-default/_search
{"query": {"match_all": { }},"fields": ["@timestamp"],"_source": false,"sort": [{"@timestamp": "desc"}]
}

The response contains each hit's fields values as a flat array. Compared with the previous search, note that each hit now carries fields in place of _source

{..."hits": {..."hits": [{"_index": ".ds-logs-my_app-default-2099-05-06-000001","_type": "_doc","_id": "PdjWongB9KPnaVm2IyaL","_score": null,"fields": {"@timestamp": ["2099-05-08T16:25:42.000Z"]},"sort": [4081940742000]},...]}
}

2.4 Range queries

Use the range keyword inside query

gte: greater than or equal to

lt: less than

GET logs-my_app-default/_search
{"query": {"range": {"@timestamp": {"gte": "2099-05-05","lt": "2099-05-08"}}},"fields": ["@timestamp"],"_source": false,"sort": [{"@timestamp": "desc"}]
}

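You can use date math to define relative time ranges. The following query searches for data from the past day. Because the sample timestamps are in 2099, it will not match any log entries in logs-my_app-default, unlike the previous request, whose fixed dates cover the sample data. Note the date math expressions in gte and lt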

GET logs-my_app-default/_search
{"query": {"range": {"@timestamp": {"gte": "now-1d/d","lt": "now/d"}}},"fields": ["@timestamp"],"_source": false,"sort": [{"@timestamp": "desc"}]
}
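As a rough illustration of what the two date math expressions in this request resolve to, the sketch below computes them in Python for a fixed "now". It assumes UTC and covers only these two expressions, not date math in general. The function name yesterday_range and the chosen instant are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def yesterday_range(now):
    """Resolve gte "now-1d/d" and lt "now/d" for a given UTC instant."""
    # "/d" rounds down to the start of the day.
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    gte = start_of_today - timedelta(days=1)  # now-1d/d: start of yesterday
    lt = start_of_today                       # now/d: start of today
    return gte, lt


# Hypothetical "now", chosen only so the result is easy to read.
now = datetime(2099, 5, 8, 16, 25, 42, tzinfo=timezone.utc)
gte, lt = yesterday_range(now)
print(gte.isoformat(), lt.isoformat())
```

The resulting half-open range covers the whole of the previous calendar day, whatever time of day the query runs.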

2.5 extract fields from unstructured content

This kind of search relies on mappings, so a few words on mapping come first.

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document. A mapping definition also includes metadata fields, like the _source field, which customize how a document's associated metadata is handled.

Use dynamic mapping and explicit mapping to define your data. Each method provides different benefits based on where you are in your data journey. For example, explicitly map fields where you don't want to use the defaults, or to gain greater control over which fields are created. You can then allow Elasticsearch to add other fields dynamically.

Experiment with mapping options

Define runtime fields in a search request to experiment with different mapping options, and also fix mistakes in your index mapping values by overriding values in the mapping during the search request.

Example search:

The runtime_mappings section defines a runtime field named source.ip. Its script uses a grok pattern to extract the client IP from event.original, and the fields parameter then returns source.ip in the response.

GET logs-my_app-default/_search
{"runtime_mappings": {"source.ip": {"type": "ip","script": """String sourceip=grok('%{IPORHOST:sourceip} .*').extract(doc[ "event.original" ].value)?.sourceip;if (sourceip != null) emit(sourceip);"""}},"query": {"range": {"@timestamp": {"gte": "2099-05-05","lt": "2099-05-08"}}},"fields": ["@timestamp","source.ip"],"_source": false,"sort": [{"@timestamp": "desc"}]
}

The response is as follows

{"took" : 4,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "J1zs312345B1SeF7g53S","_score" : null,"fields" : {"@timestamp" : ["2099-05-07T16:24:32.000Z"],"source.ip" : ["192.1.2.122"]},"sort" : [4081854272000]},{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Ilzr3IMBd9B43217032T","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"source.ip" : ["192.1.2.122"]},"sort" : [4081767675000]},{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Jlz567Bd9B1SeF89XPX","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"source.ip" : ["192.1.2.122"]},"sort" : [4081767675000]}]}
}

2.6 Combine queries

Use the bool query

The following search combines two range queries: one on @timestamp and one on the source.ip runtime field

Example

GET logs-my_app-default/_search
{"runtime_mappings": {"source.ip": {"type": "ip","script": """String sourceip=grok('%{IPORHOST:sourceip} .*').extract(doc[ "event.original" ].value)?.sourceip;if (sourceip != null) emit(sourceip);"""}},"query": {"bool": {"filter": [{"range": {"@timestamp": {"gte": "2099-05-05","lt": "2099-05-08"}}},{"range": {"source.ip": {"gte": "192.0.2.0","lte": "192.0.2.240"}}}]}},"fields": ["@timestamp","source.ip"],"_source": false,"sort": [{"@timestamp": "desc"}]
}

Response

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 3,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Ilzr3IMBd9B1SeF703MT","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"source.ip" : ["192.0.2.42"]},"sort" : [4081767675000]},{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Jlzs3IMBd9B1SeF7IXPX","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"source.ip" : ["192.0.2.42"]},"sort" : [4081767675000]}]}
}

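In a compound bool query such as the one below, must and filter clauses have to match, must_not clauses must not match, and should clauses are optional but raise the score when they match.

minimum_should_match sets the number or percentage of should clauses a document must match to be returned (relevant when there are several should clauses)

boost sets the weight of this query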

POST _search
{"query": {"bool" : {"must" : {"term" : { "user.id" : "kimchy" }},"filter": {"term" : { "tags" : "production" }},"must_not" : {"range" : {"age" : { "gte" : 10, "lte" : 20 }}},"should" : [{ "term" : { "tags" : "env1" } },{ "term" : { "tags" : "deployed" } }],"minimum_should_match" : 1,"boost" : 1.0}}
}
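The matching logic of the bool request above can be sketched in plain Python over in-memory documents. This is only an illustration of which documents pass. It ignores scoring entirely and takes minimum_should_match as a plain count. The function bool_matches, the flat user_id key, and the sample documents are assumptions made for this example.

```python
def bool_matches(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=0):
    """Return True if doc satisfies a simplified bool query built from predicates."""
    if not all(pred(doc) for pred in must):
        return False
    if not all(pred(doc) for pred in filters):
        return False
    if any(pred(doc) for pred in must_not):
        return False
    matched_should = sum(1 for pred in should if pred(doc))
    return matched_should >= minimum_should_match


def example_query(doc):
    """Mirror the clauses of the request above on a flat dictionary."""
    return bool_matches(
        doc,
        must=[lambda d: d.get("user_id") == "kimchy"],
        filters=[lambda d: "production" in d.get("tags", [])],
        must_not=[lambda d: "age" in d and 10 <= d["age"] <= 20],
        should=[
            lambda d: "env1" in d.get("tags", []),
            lambda d: "deployed" in d.get("tags", []),
        ],
        minimum_should_match=1,
    )


# Hypothetical documents.
ok = {"user_id": "kimchy", "tags": ["production", "env1"], "age": 30}
excluded_by_age = {"user_id": "kimchy", "tags": ["production", "env1"], "age": 15}
no_should_hit = {"user_id": "kimchy", "tags": ["production"], "age": 30}
print(example_query(ok), example_query(excluded_by_age), example_query(no_should_hit))
```

In Elasticsearch, must and should clauses also contribute to the score while filter and must_not do not. That distinction is left out here.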

2.7 Aggregate data

Use aggregations to summarize data as metrics, statistics, or other analytics.

The following search uses an aggregation to calculate the average_response_size using the http.response.body.bytes runtime field. The aggregation only runs on documents that match the query.

Request example:

runtime_mappings extracts http.response.body.bytes from each log line, and fields returns it in every hit. The aggs keyword declares the aggregation, which is named average_response_size and uses avg to average http.response.body.bytes over the matching documents.

GET logs-my_app-default/_search
{"runtime_mappings": {"http.response.body.bytes": {"type": "long","script": """String bytes=grok('%{COMMONAPACHELOG}').extract(doc[ "event.original" ].value)?.bytes;if (bytes != null) emit(Integer.parseInt(bytes));"""}},"aggs": {"average_response_size":{"avg": {"field": "http.response.body.bytes"}}},"query": {"bool": {"filter": [{"range": {"@timestamp": {"gte": "2099-05-05","lt": "2099-05-08"}}}]}},"fields": ["@timestamp","http.response.body.bytes"],"_source": false,"sort": [{"@timestamp": "desc"}]
}

Response, where

the aggregations object contains the computed aggregation results

#! Elasticsearch built-in security features are not enabled. Without authentication, your cluster could be accessible to anyone. See https://www.elastic.co/guide/en/elasticsearch/reference/7.17/security-minimal-setup.html to enable security.
{"took" : 112,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 3,"relation" : "eq"},"max_score" : null,"hits" : [{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "J1zs3gewsd9B1SeF7gnPS","_score" : null,"fields" : {"@timestamp" : ["2099-05-07T16:24:32.000Z"],"http.response.body.bytes" : [0]},"sort" : [4081854272000]},{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Ilzr3IMBd9B1S321033T","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"http.response.body.bytes" : [24736]},"sort" : [4081767675000]},{"_index" : ".ds-logs-my_app-default-2022.10.15-000001","_type" : "_doc","_id" : "Jlzs3IMBd9B1S12345PX","_score" : null,"fields" : {"@timestamp" : ["2099-05-06T16:21:15.000Z"],"http.response.body.bytes" : [24736]},"sort" : [4081767675000]}]},"aggregations" : {"average_response_size" : {"value" : 16490.666666666668}}
}
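The number in aggregations can be checked by hand. The sketch below does two things in plain Python: it pulls the byte count out of a log line with a simplified regular expression (a stand-in for the COMMONAPACHELOG grok pattern, not an equivalent of it), and it averages the three http.response.body.bytes values shown in the hits above. The names LOG_TAIL and response_bytes are made up for this example.

```python
import re

# Simplified pattern for the end of a log line: closing quote, status code, byte count.
LOG_TAIL = re.compile(r'" (\d{3}) (\d+)$')


def response_bytes(log_line):
    """Return the response size in bytes, or None if the line does not fit the pattern."""
    found = LOG_TAIL.search(log_line)
    return int(found.group(2)) if found else None


line = '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
print(response_bytes(line))

# The three http.response.body.bytes values from the hits in the response above.
values = [0, 24736, 24736]
average_response_size = sum(values) / len(values)
print(average_response_size)
```

The average is taken only over the documents that passed the range filter, which matches the statement that the aggregation runs only on documents matching the query.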

2.8 Anatomy of a request

[Figure: annotated request]

Response

[Figure: annotated response]

2.9 field data types

Every field has its own field data type, for example text, keyword, boolean, date, range (long_range/double_range/date_range), ip, and so on

The keyword type is often used for sorting, aggregations, and term-level queries such as term. Avoid using keyword fields for full-text search; use the text field type instead

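2.10 Structured, unstructured, and semi-structured data

Structured data is data that can be represented and stored in a relational database, in a two-dimensional (tabular) form. Typically the data is organized in rows, each row describes one entity, and every row has the same attributes

Unstructured data is data with no fixed structure. Documents, images, video, and audio all count as unstructured data. Such data is usually stored as a whole, typically in a binary format

Semi-structured data is a form of structured data that does not conform to the data model of relational databases or other data tables, but that contains tags or markers to separate semantic elements and to impose hierarchies of records and fields. It is therefore also called self-describing structure. In semi-structured data, entities of the same class may have different attributes even though they are grouped together, and the order of the attributes does not matter. Common semi-structured formats are XML and JSON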

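2.11 The difference between term and match

term is an exact lookup: the query text is not analyzed, and ES looks for the whole query value as a single term in the inverted index.

match is a full-text query: the query text is analyzed (tokenized) first. With the default or operator, a document is returned as soon as any one token matches, and every hit carries a relevance _score. What these notes call tokenization applies to what the official docs call analyzed text fields.

Example: first add three documents

POST /test/_doc
{
"name": "张三",
"age": 25
}
POST /test/_doc
{
"name": "张无忌",
"age": 50
}
POST /test/_doc
{
"name": "李四",
"age": 30
}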

A term query for '张' returns both 张三 and 张无忌

# Request
GET test/_search
{"query": {"term": {"name": "张"}
}
}
# Response
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 0.7549127,"hits" : [{"_index" : "test","_type" : "_doc","_id" : "13q_34MBwb0XtVQgyvdY","_score" : 0.7549127,"_source" : {"name" : "张三","age" : 25}},{"_index" : "test","_type" : "_doc","_id" : "2HrA34MBwb0XtVQgbfd6","_score" : 0.6407243,"_source" : {"name" : "张无忌","age" : 50}}]}
}

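A term query for '张三' returns nothing, not even the 张三 document stored earlier.

With dynamic mapping, ES indexes a string field as text, and the default analyzer tokenizes the stored value before writing it to the inverted index. The stored 张三 was therefore split into the tokens '张' and '三', and the unanalyzed term 张三 matches neither of them.

# Request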
GET test/_search
{"query": {"term": {"name": "张三"}
}
}
# Response
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 0,"relation" : "eq"},"max_score" : null,"hits" : [ ]}
}

match demo: even though the query is 张三, 张无忌 is returned as well, and 张三 has a higher _score than 张无忌

# Request
GET test/_search
{"query": {"match": {"name": "张三"}
}
}
# Response
{"took" : 0,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 2,"relation" : "eq"},"max_score" : 2.0661702,"hits" : [{"_index" : "test","_type" : "_doc","_id" : "13q_34MBwb0XtVQgyvdY","_score" : 2.0661702,"_source" : {"name" : "张三","age" : 25}},{"_index" : "test","_type" : "_doc","_id" : "2HrA34MBwb0XtVQgbfd6","_score" : 0.6407243,"_source" : {"name" : "张无忌","age" : 50}}]}
}
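The three results above can be reproduced with a toy model in plain Python. The sketch assumes an analyzer that emits one token per character, which mimics how the default standard analyzer splits this CJK text. It ignores scoring and everything else a real analyzer does. The function names analyze, term_query, and match_query are made up for this example.

```python
def analyze(text):
    """Toy analyzer: one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


# The three documents indexed above, keyed by an arbitrary id.
names = {1: "张三", 2: "张无忌", 3: "李四"}

# Inverted index over the analyzed name field.
index = {}
for doc_id, name in names.items():
    for token in analyze(name):
        index.setdefault(token, set()).add(doc_id)


def term_query(value):
    """term: look the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))


def match_query(text):
    """match: analyze the query text, then OR the resulting tokens together."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)


print(term_query("张"))    # documents whose tokens include 张
print(term_query("张三"))  # no token equals the two-character string
print(match_query("张三"))  # documents containing 张 or 三
```

The middle call shows the core of the difference: the index only ever contains single-character tokens, so a two-character term can never be found in it.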

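III. Query DSL

3.1 dis_max (disjunction max)

A disjunction max query returns every document that matches any of its wrapped queries, but uses only the score of the best-matching clause as the document's score. To let the other matching clauses contribute as well, add tie_breaker to the request: the final score is then the highest clause score plus tie_breaker times the scores of the other matching clauses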
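The scoring rule described above fits in a few lines of Python. This is a sketch of the arithmetic only, with made-up clause scores. A clause that does not match is represented as None, and the function name dis_max_score is an assumption of this example.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching clause scores."""
    matching = [score for score in clause_scores if score is not None]
    if not matching:
        return None  # the document matches no clause and is not returned
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores of two clauses for one document.
print(dis_max_score([1.0, 0.5]))                   # only the best clause counts
print(dis_max_score([1.0, 0.5], tie_breaker=0.5))  # the other clause adds half its score
```

With tie_breaker at 0 the query behaves as a pure maximum; at 1 it behaves like a sum of all matching clauses.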

3.2 boosting query

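It takes positive and negative parameters

positive holds the query that documents must match

Documents that also match the negative query are still returned, but their score is multiplied by negative_boost, which lowers their ranking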

GET /_search
{"query": {"boosting": {"positive": {"term": {"text": "apple"}},"negative": {"term": {"text": "pie tart fruit crumble tree"}},"negative_boost": 0.5}}
}
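The effect of negative_boost on a single document can be sketched as follows. The scores are made up, the function name boosting_score is an assumption of this example, and a document that does not match the positive query is represented by a score of None.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of one document under a boosting query (None means not returned)."""
    if positive_score is None:
        return None  # only documents matching the positive query are returned
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# Hypothetical positive score of 2.0 with negative_boost 0.5, as in the request above.
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))
```

Unlike must_not in a bool query, the negative query never removes a document from the results; it only pushes it down the ranking.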

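3.3 constant_score

Wraps a filter query and gives every matching document the same relevance score, equal to the boost parameter.

Required parameter under constant_score: filter

Optional parameter: boost, default 1.0. With boost set to 1.2, every document matched by the filter gets a score of 1.2.

3.4 function_score query (user-defined scoring)

3.5 intervals query

Returns documents based on the order and proximity of matching terms: the query text, the terms to match, and the gap parameters together define the interval rules, and ES returns the documents that satisfy them.

In the request below, the three words my favorite food must appear directly next to each other because max_gaps is 0, and they must be followed by hot water or cold porridge because the outer ordered parameter is true.

This search matches my favorite food is cold porridge, but not when it's cold my favorite food is porridge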

POST _search
{"query": {"intervals" : {"my_text" : {"all_of" : {"ordered" : true,"intervals" : [{"match" : {"query" : "my favorite food","max_gaps" : 0,"ordered" : true}},{"any_of" : {"intervals" : [{ "match" : { "query" : "hot water" } },{ "match" : { "query" : "cold porridge" } }]}}]}}}}
}

3.6 match query

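match_bool_prefix analyzes its input and builds an equivalent bool query: every term except the last is used in a term query, and the last term is used in a prefix query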

GET /_search
{"query": {"match_bool_prefix" : {"message" : "quick brown f"}}
}
# Equivalent to the following request
GET /_search
{"query": {"bool" : {"should": [{ "term": { "message": "quick" }},{ "term": { "message": "brown" }},{ "prefix": { "message": "f"}}]}}
}
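The rewrite shown above can be expressed as a small Python function that builds the equivalent bool query as a dictionary. It is a sketch under a strong assumption: the input is already normalized and is split on whitespace only, whereas Elasticsearch runs the field's analyzer. The function name match_bool_prefix_to_bool is made up for this example.

```python
def match_bool_prefix_to_bool(field, text):
    """Build the bool query that a match_bool_prefix on `field` is equivalent to."""
    terms = text.split()
    if not terms:
        return {"bool": {"should": []}}
    # Every term except the last becomes a term query.
    should = [{"term": {field: term}} for term in terms[:-1]]
    # The last term becomes a prefix query.
    should.append({"prefix": {field: terms[-1]}})
    return {"bool": {"should": should}}


query = match_bool_prefix_to_bool("message", "quick brown f")
print(query)
```

Because the clauses sit in should, the terms may appear in any order and position, which is what distinguishes this query from match_phrase_prefix.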

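match_phrase: phrase query

Searches ES for the words of the match_phrase value, in the same order

In the request below, a document is returned only if it contains this is a test in exactly this order

The query text is analyzed here as well; internally the order is enforced through token positions (each following token must sit at position +1, +2, and so on), whereas a term query does no analysis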

GET /_search
{"query": {"match_phrase": {"message": "this is a test"}}
}

match_phrase_prefix

Similar to match_phrase, but the last term of the phrase is treated as a prefix and expanded against the terms in the inverted index. Key parameter: max_expansions, which limits how many terms the last term may expand to; default 50, minimum 1

3.7 combined_fields (multiple fields)

The combined_fields query supports searching multiple text fields as if their contents had been indexed into one combined field. The query takes a term-centric view of the input string: first it analyzes the query string into individual terms, then looks for each term in any of the fields. This query is particularly useful when a match could span multiple text fields, for example the title, abstract, and body of an article

Example:

Search the title, abstract, and body fields for database and systems

operator can also be or

GET /_search
{"query": {"combined_fields" : {"query":      "database systems","fields":     [ "title", "abstract", "body"],"operator":   "and"}}
}

3.8 multi_match

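multi_match lets the searched fields use different analyzers in the mapping, whereas combined_fields requires all fields to share the same search analyzer

Example

type defaults to best_fields; it can also be most_fields (comparable to a bool should of match queries), phrase, phrase_prefix, or cross_fields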

GET /_search
{"query": {"multi_match" : {"query":      "brown fox","type":       "best_fields","fields":     [ "subject", "message" ],"tie_breaker": 0.3}}
}
# Equivalent to
GET /_search
{"query": {"dis_max": {"queries": [{ "match": { "subject": "brown fox" }},{ "match": { "message": "brown fox" }}],"tie_breaker": 0.3}}
}

3.9 query_string

3.9.1 Querying a single field

GET /_search
{"query": {"query_string": {"query": "(new york city) OR (big apple)","default_field": "content"}}
}

3.9.2 Querying multiple fields

GET /_search
{"query": {"query_string": {"fields": [ "content", "name" ],"query": "this AND that"}}
}

3.9.3 simple_query_string

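The syntax of the simple_query_string query is more limited than that of query_string, but it does not return errors for invalid syntax. Instead, it ignores any invalid parts of the query string.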

3.10 joining query

These include nested, has_child, and has_parent

nested: nested objects

First declare the field's type as nested in the mapping, then query it with a nested query

PUT /my-index-000001
{"mappings": {"properties": {"obj1": {"type": "nested"}}}
}
GET /my-index-000001/_search
{"query": {"nested": {"path": "obj1","query": {"bool": {"must": [{ "match": { "obj1.name": "blue" } },{ "range": { "obj1.count": { "gt": 5 } } }]}},"score_mode": "avg"}}
}

3.11 percolate query

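An ordinary ES query uses conditions to find the documents that satisfy them. The percolator works the other way round: you first register queries, then check which of them a given document matches. The percolator is a good fit for data classification, data routing, event monitoring, and alerting

First define a percolator field in the mapping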

PUT /my-index-00001
{"mappings": {"properties": {"message": {"type": "text"},"query": {"type": "percolator"}}}
}

Then use the percolate query at search time

3.12 rank_feature

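Scoring documents dynamically according to context is very common. For example, to favor certain documents within a category, the classic approach is to boost documents based on some value, such as page rank, click count, or category. Elasticsearch provides two ways to raise scores based on values: the rank_feature field, and its extension rank_features, which holds a vector of values. The rank_feature query boosts a document's relevance score based on the numeric value of a rank_feature or rank_features field. It is usually placed in the should clause of a bool query, so that its relevance score is added to the other scores of the bool query.
(This section draws on the article "Elasticsearch:Rank feature query - 排名功能查询" from the Elastic 中国社区官方博客 blog on CSDN.)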

3.13 pinned query

Promotes selected documents to rank higher than those matching a given query. This feature is typically used to guide searchers to curated documents that are promoted over and above any "organic" matches for a search. The promoted or "pinned" documents are identified using the document IDs stored in the _id field.

For example, in the request below the documents listed in ids are returned at the top of the results

GET /_search
{"query": {"pinned": {"ids": [ "1", "4", "100" ],"organic": {"match": {"description": "iphone"}}}}
}

3.14 fuzzy query

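Matches the fuzzy query value approximately, within a small edit distance, so that terms similar to the (possibly misspelled) input are found in ES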

GET /_search
{"query": {"fuzzy": {"user.id": {"value": "ki"}}}
}
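The notion of "similar" behind the fuzzy query is an edit distance: the number of single-character insertions, deletions, substitutions, or swaps of adjacent characters needed to turn one term into another. The sketch below computes that distance in plain Python (the optimal string alignment variant) and adds the length thresholds of the default fuzziness setting AUTO as given in the official documentation. It illustrates the idea only and is not how Lucene evaluates fuzzy queries. The function names are made up for this example.

```python
def edit_distance(a, b):
    """Edit distance counting insert, delete, substitute, and adjacent swap as one edit each."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swap of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def auto_fuzziness(term):
    """Allowed edits under fuzziness AUTO: 0 for length 0-2, 1 for 3-5, 2 above 5."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


print(edit_distance("box", "fox"), edit_distance("act", "cat"))
print(auto_fuzziness("ki"), auto_fuzziness("kimchy"))
```

One consequence worth noticing: under AUTO, a value as short as the two-character "ki" in the request above allows no edits at all, so it only matches the exact term unless fuzziness is set explicitly.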

3.15 exists

exists filters documents, returning those that have an indexed value for a given field. The value may be an empty string '' but not null
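The rule about empty strings and null can be sketched as a predicate over a JSON-like dictionary. This is a simplification that looks only at the source value: it does not model mapping settings such as index: false, ignore_above, or null_value, all of which also affect whether an indexed value exists. The function name field_exists is made up for this example.

```python
def field_exists(doc, field):
    """Return True if `field` would count as present, judging by the source value alone."""
    if field not in doc:
        return False
    value = doc[field]
    if value is None:
        return False
    if isinstance(value, list):
        # An empty array, or one holding only nulls, has no value to index.
        return any(item is not None for item in value)
    return True  # includes the empty string ""


# Hypothetical documents.
print(field_exists({"user": ""}, "user"))    # empty string counts as a value
print(field_exists({"user": None}, "user"))  # null does not
```

To find documents that are missing a field, the exists query is typically wrapped in the must_not clause of a bool query.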

3.16 wildcard query

Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters. You can combine wildcard operators with other characters to create a wildcard pattern.
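How a wildcard pattern is matched against a single term can be sketched by translating it into a regular expression. The sketch supports the two operators of the wildcard query, * (zero or more characters) and ? (exactly one character), and treats every other character literally. Escaping with a backslash is not handled. The function names and the sample terms are made up for this example.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")   # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(pattern, term):
    """Return True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("ki*y", "kimchy"), wildcard_match("ki*y", "kiy"))
print(wildcard_match("ki?y", "kiy"), wildcard_match("ki?y", "kity"))
```

The pattern has to cover the whole term, which is why fullmatch is used here.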

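Closing remarks

I started working with ES for my job and spent two days reading the English documentation; this article is my record of that. I will keep studying, and readers who are also learning ES are welcome to discuss it with me in the comments.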
