Table of Contents

  • I. Basics
    • 1. Indexes, types, and documents
    • 2. Inverted index
    • 3. Access URLs
  • II. Common operations
    • 1. Querying node and cluster information
      • (1) List all ES nodes
      • (2) Check cluster health
      • (3) Show the master node
      • (4) List all indexes
      • (5) Show node statistics
      • (6) Show shard information
    • 2. Index information queries
      • (1) Query index details
      • (2) Query the total document count of an index
      • (3) Query the document count matching a condition
    • 3. Indexing (saving) documents
      • (1) Saving a document with PUT
      • (2) Saving a document with POST
    • 4. Updating documents
      • (1) Updating with PUT (full update)
      • (2) Updating with POST (full update, without _update)
      • (3) Updating with POST (partial update of selected fields, with _update)
      • (4) Updating a numeric field
      • (5) Updating selected fields by condition
      • (6) Optimistic locking during updates
    • 5. Querying a document by id
    • 6. Deleting documents
    • 7. Deleting an index
    • 8. Bulk operations
      • (1) A simple bulk insert
      • (2) Bulk index / create / update / delete operations
      • (3) Bulk-loading test data
    • 9. Querying multiple indexes
  • III. Query DSL (DSL = domain-specific language)
    • 1. What is Query DSL
      • (1) URI + request body (the body is the Query DSL)
      • (2) URI + request parameters
    • 2. Basic syntax
      • (1) The typical structure of a query
      • (2) Operating on document fields
      • (3) Using query
        • ① Query everything
        • ② Exact queries on non-string values
        • ③ Exact queries on string values
        • ④ Full-text search on string values (i.e. tokenized matching)
        • ⑤ Phrase matching on string values (the string is matched as a whole, without tokenization)
        • ⑥ Multi-field matching
        • ⑦ Prefix / wildcard / regexp queries
        • ⑧ Compound queries
        • ⑨ Scroll queries
      • (4) Using sort
      • (5) Using from and size
      • (6) Using _source
      • (7) Using highlighting
      • (8) Using boost
      • (9) Using exists
      • (10) Using range and format
  • IV. Aggregations
    • 1. Basic aggregation usage
    • 2. Simple sub-aggregations
    • 3. Nested sub-aggregations
    • 4. Additional aggregation notes
  • V. Mapping
    • 1. Viewing the mapping of an index
    • 2. Creating field mappings
    • 3. Nested: preventing flattening
    • 4. Adding a field mapping
    • 5. Updating a field mapping
    • 6. Data migration
      • (1) Why ES7 made type optional and ES8 removed it entirely
      • (2) How to migrate
  • VI. Analyzers
    • 1. Downloading the ik and pinyin analyzers
    • 2. The default analyzer
    • 3. The ik_smart analyzer
    • 4. The ik_max_word analyzer
    • 5. The pinyin analyzer
    • 6. Setting up a custom remote dictionary for the ik analyzer
  • VII. Examples

I. Basics

1. Indexes, types, and documents

ES is analogous to MySQL: an ES index, type, and document correspond to a MySQL database, table, and row of data, respectively. The difference is that an ES document is written as JSON: each row is one JSON string, where the JSON properties play the role of MySQL column names and the JSON values play the role of column values

2. Inverted index

After we store data in ES, ES analyzes it and splits it into terms. For example, 红海行动 is split into 红海 and 行动, 探索红海行动 into 探索, 红海, and 行动, and 红海特别行动 into 红海, 特别, and 行动, and so on. ES then keeps a table with each term on the left and the records containing that term on the right. When we search for something, the display order of the matching records is determined by a relevance score, as in the figure below:


Based on the figure, if we search for "红海特工行动", it is split into the three terms 红海, 特工, and 行动. Of these, only 特工 matches nothing. Looking at the matched terms, records 1, 2, 3, and 5 each match two terms, while record 4 matches only one. The relevance score of each record works out as follows:

Record      Relevance score
Record 1    2/2
Record 2    2/3
Record 3    2/3
Record 4    1/2
Record 5    2/4

The relevance score (document score) above takes the number of matched terms appearing in a record as the numerator and the total number of terms the record was split into as the denominator; the higher the score, the earlier the record appears in search results

To keep this introduction simple, the tokenization above is simplified; in reality it is not that simple. 红海特别行动, for instance, is not necessarily split only into 红海, 特别, and 行动: it might also produce 特别行动, and so on. The meaning of tokenization is the same as described above, just more fine-grained in practice, since the number of terms is hard to pin down
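The scoring rule above can be sketched in a few lines. This is an illustrative reconstruction, not ES output: the term lists for the five records are hypothetical stand-ins matching the table.

```python
# Illustrative sketch of the relevance rule described above: a record's score is
# (number of query terms it contains) / (total number of terms it was split into).
# The term lists are hypothetical reconstructions of the records in the table.
records = {
    1: ["红海", "行动"],
    2: ["探索", "红海", "行动"],
    3: ["红海", "特别", "行动"],
    4: ["红海", "纪录"],
    5: ["特别", "红海", "行动", "纪实"],
}
query_terms = ["红海", "特工", "行动"]

def score(doc_terms, terms):
    # count distinct document terms that also appear in the query
    matched = sum(1 for t in set(doc_terms) if t in terms)
    return matched / len(doc_terms)

# higher score means shown earlier; record 1 (2/2) ranks first
ranked = sorted(records, key=lambda i: score(records[i], query_terms), reverse=True)
```

Real ES scoring (BM25) is far more sophisticated, but the intuition of "matched terms over document length" carries over.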

3. Access URLs

elasticsearch: http://192.168.56.10:9200

kibana: http://192.168.56.10:5601

II. Common operations

1. Querying node and cluster information

(1) List all ES nodes

http://192.168.56.10:9200/_cat/nodes

Result:

127.0.0.1 62 93 5 0.35 0.29 0.38 dilm * 8e383449ab38

Explanation:

The * marks the current master node, and 8e383449ab38 is the node name; you can also see this name by visiting http://192.168.56.10:9200

(2) Check cluster health

http://192.168.56.10:9200/_cat/health

Result:

1609038775 03:12:55 elasticsearch green 1 1 3 3 0 0 0 0 - 100.0%

Explanation:

green means the cluster is healthy; the numbers that follow are shard statistics for the cluster

(3) Show the master node

http://192.168.56.10:9200/_cat/master

Result:

manSbe-MSkyNN5WMByvsgQ 127.0.0.1 127.0.0.1 8e383449ab38

Explanation:

8e383449ab38 is the master node's name, which you can also see by visiting http://192.168.56.10:9200

(4) List all indexes

http://192.168.56.10:9200/_cat/indices

Result:

green open .kibana_task_manager_1   uaBoceJ2RcOePTGlYFqfjA 1 0 2 0 12.5kb 12.5kb
green open .apm-agent-configuration FH7Ig-YZQBmihJE8HLLkWw 1 0 0 0   283b   283b
green open .kibana_1                oqIZOOuCRhKYadlo_1821Q 1 0 6 0   29kb   29kb

Explanation:

These system indexes can store configuration data and the like; the command is the equivalent of running show databases in MySQL

(5) Show node statistics

http://192.168.56.10:9200/_nodes/stats

Result:

{"_nodes": {"total": 1,"successful": 1,"failed": 0},"cluster_name": "elasticsearch","nodes": {"7gI4uiYxRF2bJILMiG69tg": {"timestamp": 1652883914938,"name": "LAPTOP-OTED0HAJ","transport_address": "127.0.0.1:9300","host": "127.0.0.1","ip": "127.0.0.1:9300","roles": ["data","ingest","master","ml","remote_cluster_client","transform"]……

(6) Show shard information

http://192.168.56.10:9200/_cluster/state

Result:

{"cluster_name": "elasticsearch","cluster_uuid": "bzk5iojSRgecxmy3kwaIRQ","version": 393,"state_uuid": "Qn5de73TS7OvvfbhXHrIHQ","master_node": "7gI4uiYxRF2bJILMiG69tg","blocks": {},"nodes": {"7gI4uiYxRF2bJILMiG69tg": {"name": "LAPTOP-OTED0HAJ","ephemeral_id": "ajpLYZfWTj-4nJGCs12JGA","transport_address": "127.0.0.1:9300","attributes": {"ml.machine_memory": "51278336000","xpack.installed": "true","transform.node": "true","ml.max_open_jobs": "20"}}},"metadata": {"cluster_uuid": "bzk5iojSRgecxmy3kwaIRQ","cluster_uuid_committed": true,"cluster_coordination": {"term": 21,"last_committed_config": ["7gI4uiYxRF2bJILMiG69tg"],"last_accepted_config": ["7gI4uiYxRF2bJILMiG69tg"],"voting_config_exclusions": []},"templates": {……

2. Index information queries

(1) Query index details

Request method:

GET

Request path:

http://192.168.56.10:9200/<index-name>

(2) Query the total document count of an index

Request method:

GET

Request path:

http://192.168.56.10:9200/<index-name>/_count

(3) Query the document count matching a condition

Request method:

GET

Request path:

http://192.168.56.10:9200/<index-name>/_count

Request body:

// Any ordinary Query DSL query works here; the format is the same as the query statements used with _search, so it is not repeated

Result:

{"count": 299117,"_shards": {"total": 9,"successful": 9,"skipped": 0,"failed": 0}}

3. Indexing (saving) documents

(1) Saving a document with PUT

Request method:

PUT

Request path:

http://192.168.56.10:9200/customer/external/1

Request body:

{
  "name": "John Doe"
}

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 0,"_primary_term": 1}

Explanation:

Here customer is the index name, external is the type name, and 1 is the unique document id. The data being saved is JSON, and every field prefixed with _ in the response is metadata. Note that a PUT request must carry an id (the part after /customer/external/), otherwise the request cannot be sent. If no document with that id exists under the index and type yet, the first request creates one, and the result field in the response is created;
executing the same request again performs an update, which is covered in the update section

(2) Saving a document with POST

Request method:

POST

Request path:

// Option 1: without an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external
// Option 2: with an id
http://192.168.56.10:9200/customer/external/2

Request body:

{
  "name": "John Doe"
}

Result (without an id):

{"_index": "customer","_type": "external","_id": "eKJLpHYBFLB86FefIxDx","_version": 1,"result": "created","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 11,"_primary_term": 1}

Explanation:

1. Without a unique id, one is generated automatically and the result field is created, i.e. a save; repeating the same request creates a new document every time

2. With a unique id, if no document with that id exists under the index and type, the first request creates one and the result field is created;
executing the same request again performs an update, which is covered in the update section

4. Updating documents

(1) Updating with PUT (full update)

Request method:

PUT

Request path:

http://192.168.56.10:9200/customer/external/1

Request body:

{
  "name": "John Doe"
}

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 2,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 2,"_primary_term": 1}

Explanation:

This request was already executed once in "3. Indexing (saving) documents -> (1) Saving a document with PUT" above, so the second identical request is an update; the result field in the response is updated

(2) Updating with POST (full update, without _update)

Request method:

POST

Request path:

// with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1

Request body:

{
  "name": "John Doe"
}

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 3,"result": "updated","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 3,"_primary_term": 1}

Explanation:

This request was already executed once in "3. Indexing (saving) documents -> (2) Saving a document with POST" above, so the second identical request is an update; the result field in the response is updated

(3) Updating with POST (partial update of selected fields, with _update)

Request method:

POST

Request path:

// Option 1:
// with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1/_update
// Option 2 (ES 7.x):
http://192.168.56.10:9200/customer/_update/1

Request body:

{
  "doc": {
    "name": "John Doe"
  }
}

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 4,"result": "noop","_shards": {"total": 0,"successful": 0,"failed": 0},"_seq_no": 4,"_primary_term": 1}

Explanation:

The request http://192.168.56.10:9200/customer/external/1 has already been executed at least once, so sending it again for the same index -> type -> unique id (i.e. document) performs an update, and the result field is updated or noop. This update differs from the two styles above: those update unconditionally, without checking whether the value actually changed, and always bump the _version, _seq_no, and _shards values. With _update, ES first checks whether the value really changed. If it did, _version, _seq_no, and _shards change as expected and result is updated; if the new value equals the old one, those values stay unchanged and result is noop. The point of _update is exactly this check: it decides whether to perform the update at all. Two things to note:

1) Use a POST request whose path carries both the unique id (the part after /customer/external/) and the _update suffix; this exact form is required
2) The data must be wrapped in "doc": {}, not placed directly in {}

(4) Updating a numeric field

Request method:

POST

Request path:

// Option 1: with an id (the part after /customer/external/ is the id)
http://192.168.56.10:9200/customer/external/1/_update
// Option 2 (ES 7.x):
http://192.168.56.10:9200/customer/_update/1

Request body:

{
  "script": "ctx._source.clickCount+=1"
}

Explanation:

ctx._source is fixed syntax, clickCount is the name of a numeric property, and 1 is the amount to add; any other number works too, including negative numbers

(5) Updating selected fields by condition

Request method:

POST

Request path:

// customer: index name; external: type name
http://192.168.56.10:9200/customer/external/_update_by_query

Request body:

1. Updating a single field

{
  "query": {"term": {"indexId": "ecb50a4579324f9fa35e6c8fa4d8807b"}},
  "script": {
    "source": "ctx._source.kgCategoryName = params.categoryName",
    "params": {"categoryName": "科技强国"}
  }
}

2. Updating multiple fields

{
  "query": {"term": {"indexId": "ecb50a4579324f9fa35e6c8fa4d8807b"}},
  "script": {
    "source": "ctx._source.clickCount = params.clickCount;ctx._source.collectCount = params.collectCount",
    "params": {"clickCount": 1,"collectCount": 2}
  }
}

Explanation:

When updating multiple fields, separate the script statements with semicolons
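A small sketch of composing such a multi-field script body in client code. The helper name build_update_script is mine, not an ES API; sending the resulting body over HTTP is still up to the caller.

```python
# Hypothetical helper (the name is mine, not an Elasticsearch API): compose an
# _update_by_query script body for several fields, joining the Painless
# statements with semicolons and passing the values through params.
def build_update_script(field_values):
    source = ";".join(f"ctx._source.{f} = params.{f}" for f in field_values)
    return {"script": {"source": source, "params": dict(field_values)}}

body = build_update_script({"clickCount": 1, "collectCount": 2})
```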

(6) Optimistic locking during updates

Every update response carries the fields _seq_no (a concurrency-control field that increments on every update, used for optimistic locking) and _primary_term (which changes whenever the primary shard is reassigned, for example after a restart), and they can be added to an update request. Suppose two requests for an existing document are sent at the same time; using the PUT update style as an example (the other two update styles work as well), with id 1 and the body {"name": "John Doe"}, the request URL looks like this:

http://192.168.56.10:9200/customer/external/1?if_seq_no=13&if_primary_term=1

if_seq_no is compared against the document's _seq_no and if_primary_term against its _primary_term. The update runs only when both pairs match; if either pair does not match, the update fails, with a result like this:

{"error": {"root_cause": [{"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [13], primary term [1]. current document has seqNo [14] and primary term [1]","index_uuid": "ANbuGAD0TYC9_oNMvIskmA","shard": "0","index": "customer"}],"type": "version_conflict_engine_exception","reason": "[1]: version conflict, required seqNo [13], primary term [1]. current document has seqNo [14] and primary term [1]","index_uuid": "ANbuGAD0TYC9_oNMvIskmA","shard": "0","index": "customer"},"status": 409}

5. Querying a document by id

Request method:

GET

Request path:

http://192.168.56.10:9200/customer/external/1

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 4,"_seq_no": 4,"_primary_term": 1,"found": true,"_source": {"name": "John Doe"}}

6. Deleting documents

Option 1 (delete by document id):

Request method:

DELETE

Request path:

http://192.168.56.10:9200/customer/external/1

Result:

{"_index": "customer","_type": "external","_id": "1","_version": 15,"result": "deleted","_shards": {"total": 2,"successful": 1,"failed": 0},"_seq_no": 20,"_primary_term": 1}

Explanation:

The result field in the response is deleted, meaning the deletion succeeded. We must specify index -> type -> unique id (i.e. document); in /customer/external/1, customer is the index, external is the type, and 1 is the unique document id

Option 2 (delete by condition):

POST twitter/_delete_by_query
{
  "query": { "match": {"message": "some message"}}
}

Explanation: deletes by condition; see https://www.elastic.co/guide/en/elasticsearch/reference/6.0/docs-delete-by-query.html

7. Deleting an index

Request method:

DELETE

Request path:

http://192.168.56.10:9200/customer

Result:

{"acknowledged": true}

8. Bulk operations

Note: all of the operations above were run in Postman, but bulk operations cannot be, so the following are all run in Kibana

(1) A simple bulk insert

Request:

POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

Result:

{"took" : 159,"errors" : false,"items" : [{"index" : {"_index" : "customer","_type" : "external","_id" : "1","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 0,"_primary_term" : 1,"status" : 201}},{"index" : {"_index" : "customer","_type" : "external","_id" : "2","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 1,"_primary_term" : 1,"status" : 201}}]
}

Explanation:

The request method is POST; in the path, customer is the index, external is the type, and _bulk marks this as a bulk operation. The distinctive part is the request body, whose syntax is:

{action: {metadata-name: metadata-value, ……}}
{request body}

The action can be index (save), create (save), update, or delete, among others. The metadata fields are the _-prefixed fields seen earlier (their meanings are covered in "5. Querying a document by id"). Here the metadata specifies _id, the unique document id; the request-body line holds the same data we would use in a save or update operation
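The body format above can be sketched as a small builder. This is an illustrative helper (build_bulk_body is my own name, not an ES API) that assembles the newline-delimited body the _bulk endpoint expects: one action/metadata line, then (for index/create/update) one source line, with a trailing newline.

```python
import json

# Illustrative sketch: assemble an NDJSON _bulk request body. One JSON object
# per line: an action/metadata line, then a source line for actions that
# carry data; the whole body must end with a newline.
def build_bulk_body(operations):
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:  # delete actions carry no source line
            lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n"

body = build_bulk_body([
    ("index", {"_id": "1"}, {"name": "John Doe"}),
    ("index", {"_id": "2"}, {"name": "John Doe"}),
])
```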

(2) Bulk index / create / update / delete operations

Request:

POST /_bulk
{"index":{"_index":"website","_type":"blog"}}
{"title":"My second blog post"}
{"index":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"My second blog post"}
{"create":{"_index":"website","_type":"blog","_id":"123"}}
{"title":"My first blog post"}
{"update":{"_index":"website","_type":"blog","_id":"123"}}
{"doc":{"title":"My updated blog post"}}
{"delete":{"_index":"website","_type":"blog","_id":"123"}}

Result:

{"took" : 18,"errors" : true,"items" : [{"index" : {"_index" : "website","_type" : "blog","_id" : "faLYpHYBFLB86FefzxCh","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 4,"_primary_term" : 1,"status" : 201}},{"index" : {"_index" : "website","_type" : "blog","_id" : "123","_version" : 1,"result" : "created","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 5,"_primary_term" : 1,"status" : 201}},{"create" : {"_index" : "website","_type" : "blog","_id" : "123","status" : 409,"error" : {"type" : "version_conflict_engine_exception","reason" : "[123]: version conflict, document already exists (current version [1])","index_uuid" : "Je9tgYdORkCHICn2yGJYWA","shard" : "0","index" : "website"}}},{"update" : {"_index" : "website","_type" : "blog","_id" : "123","_version" : 2,"result" : "updated","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 6,"_primary_term" : 1,"status" : 200}},{"delete" : {"_index" : "website","_type" : "blog","_id" : "123","_version" : 3,"result" : "deleted","_shards" : {"total" : 2,"successful" : 1,"failed" : 0},"_seq_no" : 7,"_primary_term" : 1,"status" : 200}}]
}

(3) Bulk-loading test data

Where to get the request body: click here

Request:

POST bank/account/_bulk
<paste the request body here>

9. Querying multiple indexes

Separate the index names with commas, like this:

Request:

GET cis_bookstore_book,knowledge/_search

III. Query DSL (DSL = domain-specific language)

1. What is Query DSL

(1) URI + request body (the body is the Query DSL)

Request:

GET bank/_search
{
  "query": {"match_all": {}},
  "sort": [{"account_number": {"order": "asc"}}]
}

Explanation:

The _search in bank/_search is fixed and marks a search operation. "match_all": {} is the query condition; when querying everything, no condition needs to be written inside the {}. The sort section sorts by the account_number field in asc (ascending) order

DSL definition: Elasticsearch provides a JSON-style DSL (domain-specific language) for executing queries, known as the Query DSL; the request body above is an example. The query language is very comprehensive and can feel a bit complex at first; the real way to learn it is to start from basic examples

(2) URI + request parameters

Besides the URI + request body style, there is another retrieval style that puts the parameters after the URI. Here is an example achieving the same effect as the Query DSL above:

GET /bank/_search?q=*&sort=account_number:asc

q=* means match everything, and the sort value sorts by the account_number field in asc order. The result is identical to the query above, so it is not shown again; Query DSL is the style used most often
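For a client assembling such a URI search, the parameters can be URL-encoded from a dict; a minimal sketch using the host used throughout this article:

```python
from urllib.parse import urlencode

# Build the URI-search request above from a parameter dict; only the q and
# sort parameters from the example are used. urlencode percent-encodes the
# special characters (* and :) for us.
params = urlencode({"q": "*", "sort": "account_number:asc"})
url = f"http://192.168.56.10:9200/bank/_search?{params}"
```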

2. Basic syntax

(1) The typical structure of a query

{
  QUERY_NAME: {
    ARGUMENT: VALUE,
    ARGUMENT: VALUE, ……
  }
}

In this structure, QUERY_NAME is something like query, ARGUMENT is a property ES understands, such as match_all or match, and the value can itself contain further content, such as ES fields, as we will see later. For example:

{
  "query": {"match_all": {}}
}

No ARGUMENT: VALUE pair is used here, but one could be

(2) Operating on document fields

The structure is:

{
  QUERY_NAME: {
    FIELD_NAME: [
      {ARGUMENT: VALUE, ARGUMENT: VALUE, ……}, ……
    ]
  }
}

Here QUERY_NAME is something like sort, FIELD_NAME is a property of the document, and ARGUMENT is a property ES understands, such as the order used in the example below

For example:

{
  ……
  "sort": [{"account_number": {"order": "asc"}}]
}

(3) Using query

① Query everything
GET bank/_search
{
  "query": {"match_all": {}}
}

Explanation:

match_all queries all documents, but only 10 are returned, which is the default

② Exact queries on non-string values
GET bank/_search
{
  "query": {"term": {"account_number": 20}}
}

Explanation:

account_number is the field name. This exact match behaves like a WHERE clause in MySQL: term looks the field value up as a single whole without going through the inverted index, since no tokenization is needed. For exact queries on non-string values, term is recommended; match can do the same job, but match is mainly for full-text search, so term is the better choice here

③ Exact queries on string values
GET bank/_search
{
  "query": {"match": {"address.keyword": "282 Kings Place"}}
}

Explanation:

Any field name can be suffixed with .keyword. With .keyword, the value is looked up as one whole and does not go through the inverted index. term could achieve this too, without the .keyword suffix, but since term is conventionally used for exact queries on non-string values, .keyword is the recommended way to do exact matching on strings. This is different from match_phrase: match_phrase also avoids tokenizing the phrase and treats it as a whole, but it still goes through the inverted index, so a document whose address only partially equals the phrase can still match. Every string field (type text) has a keyword sub-field by default

④ Full-text search on string values (i.e. tokenized matching)
GET bank/_search
{
  "query": {"match": {"address": "mill lane"}}
}

Result:

{"took" : 1,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 19,"relation" : "eq"},"max_score" : 9.507477,"hits" : [{"_index" : "bank","_type" : "account","_id" : "136","_score" : 9.507477,"_source" : {"account_number" : 136,"balance" : 45801,"firstname" : "Winnie","lastname" : "Holland","age" : 38,"gender" : "M","address" : "198 Mill Lane","employer" : "Neteria","email" : "winnieholland@neteria.com","city" : "Urie","state" : "IL"}},……
}

Explanation:

A total.value of 19 means 19 records matched, and max_score of 9.507477 is the highest relevance score among them; higher-scoring records come first, and indeed the first hit's _score is 9.507477. At query time the string mill lane is tokenized into mill and lane, so any document whose address contains mill or lane is returned, case-insensitively. When we store data, ES tokenizes it as well and maintains an inverted index; the relevance score comes from dividing the number of mill/lane terms an address contains by the total number of terms that address was split into. Because both the query string and the stored strings go through tokenization and the inverted index, a query term that does not exist in the inverted index scores 0 and finds nothing. For example, changing the query to mil finds no data: although mil is a substring of addresses containing mill, matching is done against the tokens in the inverted index rather than the raw string, and no token mil exists there

⑤ Phrase matching on string values (the string is matched as a whole, without tokenization)
GET bank/_search
{
  "query": {"match_phrase": {"address": "mill lane"}}
}

Result:

{"took" : 8,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 1,"relation" : "eq"},"max_score" : 9.507477,"hits" : [{"_index" : "bank","_type" : "account","_id" : "136","_score" : 9.507477,"_source" : {"account_number" : 136,"balance" : 45801,"firstname" : "Winnie","lastname" : "Holland","age" : 38,"gender" : "M","address" : "198 Mill Lane","employer" : "Neteria","email" : "winnieholland@neteria.com","city" : "Urie","state" : "IL"}}]}
}

Explanation:

match_phrase does not tokenize mill lane; it looks the string up in the inverted index as a single whole. Only documents whose stored address produced the phrase mill lane during analysis are retrieved, and the match is case-insensitive

⑥ Multi-field matching
GET bank/_search
{
  "query": {"multi_match": {"query": "mill movico","fields": ["address","city"]}}
}

Explanation:

The query string mill movico is tokenized first; a document matches if either its address or its city contains mill or movico. This again relies on the inverted index, and matching is case-insensitive

⑦ Prefix / wildcard / regexp queries

(1) Prefix query

GET bank/_search
{
  "query": {"prefix": {"address": "Hines"}}
}

Explanation: matches values that start with Hines; on a keyword field, the prefix is matched against the field's whole content

(2) Wildcard query

GET /my_index/address/_search
{
  "query": {"wildcard": {"postcode": "W?F*HW"}}
}

Explanation: ? matches any single character and * matches 0 or more characters; on a keyword field, the pattern is matched against the field's whole content

(3) Regexp query

GET /my_index/address/_search
{
  "query": {"regexp": {"postcode": "W[0-9].+"}}
}

Explanation: regular expressions work here the way they do everywhere else; on a keyword field, the pattern is matched against the field's whole content

Summary: for all three query types the field should be not_analyzed, i.e. of type keyword, which supports this kind of matching; if the field is analyzed, the pattern is matched against each individual token instead

⑧ Compound queries
GET bank/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"gender": "M"}},
        {"match": {"address": "mill"}}
      ],
      "must_not": [{"match": {"age": "28"}}],
      "should": [{"match": {"lastname": "Hines"}}],
      "filter": {"range": {"age": {"gte": 10,"lte": 40}}}
    }
  }
}

Explanation:

bool takes four kinds of clauses: must, must_not, should, and filter. They mean, respectively: all listed conditions must be satisfied; none of the listed conditions may be satisfied; the listed conditions should ideally be satisfied; and data filtering

must and should both contribute to the relevance score; filter does not, and must_not is treated as a filter with the same behavior, so it does not contribute either

The query above looks for documents whose gender contains M, whose address contains mill, whose age is not 28, and whose lastname ideally contains Hines; a document without Hines still matches, just with a lower _score, since should clauses act like an optional OR that boosts the score. Finally, the filter keeps only documents whose age is between 10 and 40 inclusive
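When the clauses come from user input, a bool query like the one above is often assembled programmatically. A minimal sketch; build_bool_query is an illustrative name of my own, not an ES API:

```python
# Assemble the bool query above from its four optional clause lists; empty
# clauses are simply omitted from the resulting body.
def build_bool_query(must=None, must_not=None, should=None, filter_=None):
    bool_clause = {}
    if must:
        bool_clause["must"] = must
    if must_not:
        bool_clause["must_not"] = must_not
    if should:
        bool_clause["should"] = should
    if filter_:
        bool_clause["filter"] = filter_
    return {"query": {"bool": bool_clause}}

q = build_bool_query(
    must=[{"match": {"gender": "M"}}, {"match": {"address": "mill"}}],
    must_not=[{"match": {"age": "28"}}],
    should=[{"match": {"lastname": "Hines"}}],
    filter_={"range": {"age": {"gte": 10, "lte": 40}}},
)
```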

⑨ Scroll queries

Initial query:

// The scroll parameter sets the scroll id's time-to-live, here 2 minutes; expired contexts are cleaned up by ES automatically
GET /kms.wiki/_search?scroll=2m
{
  "query": {"match_all": {}},
  "size": 20,
  "_source": "title"
}

Result:

{"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==","took": 20,"timed_out": false,"_shards": {"total": 9,"successful": 9,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4075,"relation": "eq"},"max_score": 1,"hits": [{"_index": "kms.wiki","_type": "_doc","_id": "46b3f4af5ead47a69622e2d13186cf01","_score": 1,"_source": {"title": "前南斯拉夫国防学院"}},……]}
}

Subsequent queries:

// scroll is again 2 minutes
// scroll_id is the _scroll_id returned by the previous query; thanks to the scroll id, the index name and query conditions are no longer needed. When hits comes back empty, the scroll has reached the end: stop querying
GET /_search/scroll?scroll=2m&scroll_id=DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==

Result:

{"_scroll_id": "DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAKFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAACxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAAAwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAANFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADhZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABIWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAARFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAADxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABAWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==","took": 12,"timed_out": false,"terminated_early": true,"_shards": {"total": 9,"successful": 9,"skipped": 0,"failed": 0},"hits": {"total": {"value": 4075,"relation": "eq"},"max_score": 1,"hits": [{"_index": "kms.wiki","_type": "_doc","_id": "328a36aa33b444dfa1c1b379ee0a8d47","_score": 1,"_source": {"title": "卡-52武装直升机"}},……]}
}

Clearing scroll ids:

// Scroll contexts consume a lot of memory, so after the queries finish, release the memory promptly by deleting the scroll ids; the array below holds _scroll_id values, and several can accumulate over many queries
DELETE /_search/scroll
{"scroll_id": ["DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAeFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAHRZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAfFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIBZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACEWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAiFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACQWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ==","DnF1ZXJ5VGhlbkZldGNoCQAAAAAAAAAeFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAHRZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAABwWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAfFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIBZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACEWZldNV0hQTVlRdWkwNDBGb3NUTkduUQAAAAAAAAAiFmZXTVdIUE1ZUXVpMDQwRm9zVE5HblEAAAAAAAAAIxZmV01XSFBNWVF1aTA0MEZvc1ROR25RAAAAAAAAACQWZldNV0hQTVlRdWkwNDBGb3NUTkduUQ=="]
}

References:

  1. ElasticSearch deep-pagination solutions
  2. Elasticsearch scroll usage

Note:

  1. The query must not use the from property, or this error occurs: Validation Failed: 1: using [from] is not allowed in a scroll context

(4) Using sort

Request:

GET bank/_search
{
  "query": {"match_all": {}},
  "sort": [{"account_number": {"order": "asc"}}]
}

Explanation:

Sorts by the account_number field in ascending order, just like ORDER BY <field> <direction> in MySQL. The form above is the canonical one, but it can be shortened, turning

"account_number": {
  "order": "asc"
}

into:

"account_number": "asc"

Sorting on multiple fields:

GET bank/_search
{
  "query": {"match_all": {}},
  "sort": [
    {"account_number": {"order": "asc"}},
    {"age": {"order": "desc"}}
  ]
}

(5) Using from and size

Request:

GET bank/_search
{
  "query": {"match_all": {}},
  "from": 0,
  "size": 5
}

Explanation:

The query matches all documents; from means start from offset 0 (offsets do start at 0 by default), and size means return 5 documents, the same as MySQL's LIMIT start, size
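Since from/size behave like LIMIT, page numbers map to them with simple arithmetic; a tiny helper (the name page_params is mine) for 1-based pages:

```python
# Map a 1-based page number to the from/size pair used in the request above:
# page 1 starts at offset 0, page 2 at offset page_size, and so on.
def page_params(page, page_size):
    return {"from": (page - 1) * page_size, "size": page_size}
```

Note that deep pagination this way is expensive in ES; the scroll queries shown earlier are the usual alternative for walking a large result set.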

(6) Using _source

Return a single field:

{
  "query": {"match_all": {}},
  "_source": "title"
}

Return several specified fields:

{
  "query": {"match_all": {}},
  "_source": ["title", "price"]
}

Exclude specific fields:

{
  "query": {"match_all": {}},
  "_source": {"excludes": ["firstname", "lastname"]}
}

Explanation:

_source plays the same role as SELECT <columns> in MySQL

(7) Using highlighting

By default, matches are wrapped in em tags:

{
  "query": {"match": {"title": "小米"}},
  "highlight": {"fields": {"title": {}}}
}

pre_tags and post_tags specify the wrapping tags explicitly:

{
  "query": {"match": {"title": "小米"}},
  "highlight": {
    "pre_tags": "<b color='red'>",
    "post_tags": "</b>",
    "fields": {"title": {}}
  }
}

(8) Using boost

// 1. In prefix
{
  "query": {"prefix": {"website": {"value": "/坦克/","boost": 100}}}
}

// 2. In match_phrase
{
  "query": {"match_phrase": {"title": {"query": "中国","boost": 5}}}
}

// 3. In term and terms
{"query": {"bool": {"should": [{"term": {"title.keyword": {"value": "中国","boost": 2999}}},{"prefix": {"website": {"value": "/中国/","boost": 2888}}},{"terms": {"title.keyword": ["中华人民共和国"],"boost": 1000}},{"multi_match": {"query": "中国","fields": ["title^10", "content"],"minimum_should_match": "100%"}}],"minimum_should_match": 1}},"highlight": {"pre_tags": "<font>","post_tags": "</font>","fields": {"title": {},"title.keyword": {},"content": {}}},"_source": ["indexId", "title", "content", "date"],"from": 0,"size": 10
}

(9) Using exists

Request:

GET bank/_search
{
  "query": {"bool": {"must_not": {"exists": {"field": "kgCategoryName"}}}}
}

Explanation:

A field counts as missing when its value is null or [], but not in the following cases: (1) an empty string such as "" or "-"; (2) an array containing null together with another value, such as [null, "foo"]; (3) a custom null_value. For details see: exists-query

(10) Using range and format

Request:

GET /kms.wiki/_count
{
  "query": {
    "range": {
      "date": {"gte": "2020-01-01","lte": "2021-01-01","format": "yyyy-MM-dd"}
    }
  }
}

Explanation:

Besides range, this also shows format: it declares the date format of the gte and lte values so that ES knows how to interpret them

IV. Aggregations

1. Basic aggregation usage

GET bank/_search
{
  "query": {"match": {"address": "mill"}},
  "aggs": {
    "balanceCount": {"terms": {"field": "balance","size": 10}},
    "blanceSum": {"sum": {"field": "balance"}},
    "balanceAvg": {"avg": {"field": "balance"}},
    "balanceMin": {"min": {"field": "balance"}},
    "balanceMax": {"max": {"field": "balance"}}
  },
  "size": 0
}

Result:

{"took" : 34,"timed_out" : false,"_shards" : {"total" : 1,"successful" : 1,"skipped" : 0,"failed" : 0},"hits" : {"total" : {"value" : 4,"relation" : "eq"},"max_score" : null,"hits" : [ ]},"aggregations" : {"balanceMax" : {"value" : 45801.0},"blanceSum" : {"value" : 100832.0},"balanceMin" : {"value" : 9812.0},"balanceCount" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : 9812,"doc_count" : 1},{"key" : 19648,"doc_count" : 1},{"key" : 25571,"doc_count" : 1},{"key" : 45801,"doc_count" : 1}]},"balanceAvg" : {"value" : 25208.0}}
}

Explanation:

First the query finds documents whose address contains mill; the aggregations then run over those documents:
aggs: marks the aggregation section; balanceCount, blanceSum, balanceAvg, balanceMin, and balanceMax are the aggregation names
terms: groups by a field; field names the grouping field and size caps the output at 10 buckets (freely adjustable)
sum: computes a total over the field named by field
avg: computes an average over the field named by field
min: computes a minimum over the field named by field
max: computes a maximum over the field named by field

The final size: 0 means return only the aggregation results, not the matching documents themselves
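Re-deriving the metric sub-aggregations locally makes clear what each one computes. The four balance values below come from the terms buckets in the result above, and the derived values reproduce the blanceSum/balanceAvg/balanceMin/balanceMax figures in the response:

```python
# The four balance values from the terms buckets in the result above.
balances = [9812, 19648, 25571, 45801]

# Each metric aggregation reduces the bucketed values the obvious way.
metrics = {
    "blanceSum": sum(balances),
    "balanceAvg": sum(balances) / len(balances),
    "balanceMin": min(balances),
    "balanceMax": max(balances),
}
```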

2. Simple sub-aggregations

Problem:

Group by age, and compute the average balance of the employees in each age bucket

Request:

GET bank/_search
{
  "query": {"match_all": {}},
  "aggs": {
    "ageCount": {
      "terms": {"field": "age","size": 100},
      "aggs": {
        "balanceAvg": {"avg": {"field": "balance"}}
      }
    }
  },
  "size": 0
}

Explanation:

The outer aggs buckets the documents by age, showing at most 100 buckets; each bucket is a collection of documents. To compute further statistics over each bucket we use a sub-aggregation, placed inside the parent aggregation: the aggs nested inside ageCount defines a sub-aggregation named balanceAvg that computes the average balance within each bucket

3. Nested sub-aggregations

Problem:

Group by age; for each age bucket, find the average balance for gender M and F separately, as well as the bucket's overall average balance

Request:

GET bank/_search
{
  "query": {"match_all": {}},
  "aggs": {
    "ageCount": {
      "terms": {"field": "age","size": 100},
      "aggs": {
        "genderCount": {
          "terms": {"field": "gender.keyword","size": 100},
          "aggs": {
            "balanceAvg": {"avg": {"field": "balance"}}
          }
        },
        "balanceAvg": {"avg": {"field": "balance"}}
      }
    }
  },
  "size": 0
}

Result:

………………"aggregations" : {"ageCount" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 463,"buckets" : [{"key" : 31,"doc_count" : 61,"genderCount" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "M","doc_count" : 35,"balanceAvg" : {"value" : 29565.628571428573}},{"key" : "F","doc_count" : 26,"balanceAvg" : {"value" : 26626.576923076922}}]},"balanceAvg" : {"value" : 28312.918032786885}},………………

Explanation:

For space, only one bucket is shown. It says there are 61 employees aged 31 with an overall average balance of 28312.918032786885; 35 of them have gender M with an average balance of 29565.628571428573, and 26 have gender F with an average balance of 26626.576923076922

4. Additional aggregation notes

1. Using min_doc_count

GET bank/_search
{
  "query": {"match_all": {}},
  "aggs": {
    "age_agg": {"terms": {"field": "age","min_doc_count": 2}}
  },
  "size": "0"
}

Note: min_doc_count filters buckets after aggregation; the example above keeps only the age buckets that contain at least 2 documents

V. Mapping

1. Viewing the mapping of an index

Request:

GET bank/_mapping

Result:

{"bank" : {"mappings" : {"properties" : {"account_number" : {"type" : "long"},"address" : {"type" : "text","fields" : {"keyword" : {"type" : "keyword","ignore_above" : 256}}},"age" : {"type" : "long"},"balance" : {"type" : "long"},………………

Explanation:

bank is the index name and _mapping is fixed. If no field mappings are specified, ES guesses each field's type and sets it automatically; in general, non-string values are mapped to long and strings to text

2. Creating field mappings

Request:

// my_index is the index name
PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {"type": "integer"},
      "email": {"type": "keyword"},
      "name": {"type": "text"}
    }
  }
}

Result:

{"acknowledged" : true,"shards_acknowledged" : true,"index" : "my_index"}

Explanation:

Before inserting data into an index, we can define the type of each field up front. There are many field types; see https://blog.csdn.net/hello_world123456789/article/details/95341515 for details. Mapping email as keyword means the field is matched exactly: the query value is compared directly against the field value instead of going through the inverted index

A string field's type can be text or keyword: text values are analyzed (tokenized), keyword values are not. Besides type, a field can set the index property, which defaults to true; setting it to false means the field cannot be used in query conditions, e.g. "email":{"type":"keyword", "index": false}

Addendum:

A more complex index definition can include aliases and settings alongside mappings, as follows:

PUT /kms.wiki
{"aliases": {"kms.wiki.alias": {}},"mappings": {"properties": {"basicInfo": {"type": "text"},"browsedCount": {"type": "long"},"catalog": {"type": "text"},"clickCount": {"type": "long"},"collectCount": {"type": "long"},"content": {"type": "text","analyzer": "ik_smart"},"contentUrl": {"type": "text"},"date": {"type": "date"},"fileId": {"type": "keyword"},"handleLink": {"type": "long"},"htmlContent": {"type": "text"},"imageUrl": {"type": "keyword"},"indexId": {"type": "keyword"},"kGraphView": {"type": "nested","properties": {"currentInstance": {"type": "nested","properties": {"instanceId": {"type": "keyword"},"instanceName": {"type": "keyword"},"objectName": {"type": "keyword"}}},"instances": {"type": "nested","properties": {"instanceId": {"type": "keyword"},"instanceName": {"type": "keyword"},"objectName": {"type": "keyword"}}},"relations": {"type": "nested","properties": {"relationId": {"type": "keyword"},"relationName": {"type": "keyword"},"sourceId": {"type": "keyword"},"targetId": {"type": "keyword"}}}}},"kgCategoryName": {"type": "keyword"},"kgId": {"type": "keyword"},"labels": {"type": "nested","properties": {"label": {"type": "keyword"},"type": {"type": "keyword"}}},"normalTitle": {"type": "keyword"},"shareCount": {"type": "long"},"source": {"type": "keyword"},"summary": {"type": "text"},"summaryUrl": {"type": "text"},"title": {"type": "text","fields": {"keyword": {"type": "keyword"},"pinyin": {"type": "text","analyzer": "pinyin_analyzer"},"synonyms": {"type": "text","analyzer": "ik_synon_max_word"}},"analyzer": "ik_max_word"},"titleMeaning": {"type": "keyword"},"url": {"type": "keyword"},"website": {"type": "keyword"}}},"settings": {"index": {"number_of_shards": "9","analysis": {"filter": {"pinyin_filter": {"keep_none_chinese_in_first_letter": "true","lowercase": "true","keep_original": "false","keep_first_letter": "true","trim_whitespace": "true","type": "pinyin","keep_none_chinese": "true","limit_first_letter_length": "16","keep_full_pinyin": 
"true"},"word_sync": {"type": "synonym","synonyms_path": "analysis-ik/synonym.txt"}},"analyzer": {"ik_synon_max_word": {"filter": ["word_sync"],"type": "custom","tokenizer": "ik_max_word"},"ik_synon_smart": {"filter": ["word_sync"],"type": "custom","tokenizer": "ik_smart"},"pinyin_analyzer": {"filter": ["pinyin_filter"],"tokenizer": "whitespace"}}},"number_of_replicas": "2"}}
}

3. Nested: preventing flattening

Code:

PUT product
{"mappings": {"properties": {……………………"catalogName": {"type": "keyword","index": false,"doc_values": false},"attrs": {"type": "nested","properties": {"attrId": {"type": "long"},"attrName": {"type": "keyword","index": false,"doc_values": false},"attrValue": {"type": "keyword"}}}}}
}

Explanation:

attrs contains properties of its own, so it is a complex (object) property. Without "type": "nested", ES flattens complex properties: all attrId values end up in a single array, so a query for one value matches every document whose flattened array contains it anywhere, which is not what we want. Adding "type": "nested" disables this flattening. Querying a nested property requires a special query form; see the nested query documentation ("Example query"). The examples there use nested directly under query, but nested can equally be used under filter, must, and so on

4. Adding a field mapping

Request:

PUT /my_index/_mapping
{
  "properties": {
    "employee-id": {"type": "keyword","index": "false"}
  }
}

Result:

{"acknowledged" : true}

Explanation:

Since the age, email, and name fields were already mapped earlier, the index-creation form can no longer be used to add mappings; they must be added with this request instead. The new employee-id field has type keyword, meaning exact matching without the inverted index, and index set to false, meaning the field cannot be used in query conditions (the default is true, i.e. the field participates in queries)

5. Updating a field mapping

Once a field mapping is created it cannot be updated. The only option is to create a new index with the desired field mappings and then migrate the data over, which achieves the same effect as updating the mapping

6. Data migration

(1) Why ES7 made type optional and ES8 removed it entirely

Starting with ES7, the type parameter in the URL is optional: indexing a document no longer requires a document type, and data can be stored directly under the index. The reason is this: in a relational database such as MySQL, two tables are independent, and same-named columns in them do not affect each other. ES is different. Elasticsearch is a search engine built on Lucene, and fields with the same name under different types of the same index are handled identically inside Lucene. If same-named fields in different types of one index have different mappings (field types and so on), conflicts arise, which ultimately degrades Lucene's processing efficiency. Since we cannot force every same-named field across all types of an index to use a single type, type was removed, and documents are now stored directly under the index. In short, removing type improves ES's data-processing efficiency

(2) How to do it

Task:

Migrate all documents of the account type under the bank index into the new newbank index.

Create the mapping for the new index:

PUT /newbank
{"mappings": {"properties": {"account_number": {"type": "long"},"address": {"type": "text"},"age": {"type": "integer"},"balance": {"type": "long"},"city": {"type": "keyword"},"email": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"employer": {"type": "keyword"},"firstname": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"gender": {"type": "keyword"},"lastname": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"state": {"type": "keyword"}}}
}

Migrating data from es6 to es7:

POST _reindex
{"source": {"index": "bank","type": "account"},"dest": {"index": "newbank"}
}

In source, the bank after index is the old index name, and the account after type is the name of the type under the old index. This is the es6 form, since es6 still has real types; in es7 you simply write the index names without any type. Because we are running es7.4.2, we choose not to use type.

Migrating data from es7 to es7:

POST _reindex
{"source": {"index": "old-index-name"},"dest": {"index": "new-index-name"}
}

Summary:

Whether migrating from es6 to es7 or from es7 to es7, create the target mapping first, and make sure its field names match those of the old index (check them with GET bank/_mapping, where bank is the index name). Then run the migration as shown above; the es6 to es7 form additionally needs the type, whose name can be checked with GET bank/_search. After the migration, GET newbank/_search shows the index and type; you will see "_type" : "_doc". There is still a type in the response, but it is not a real type, only a placeholder.

六、Using analyzers

1、Downloading the ik and pinyin analyzers

pinyin analyzer download: https://github.com/medcl/elasticsearch-analysis-pinyin/

ik analyzer download: https://github.com/medcl/elasticsearch-analysis-ik

How to download:

Click tags, as shown below:

Find the version matching your es version and click the Downloads button, as shown below:

Click zip to download, as shown below:

How to install:

Windows: unzip the archive into the plugins directory under the elasticsearch install directory, as shown below:

k8s container: unzip the archive into the /usr/share/elasticsearch/plugins directory, as shown below:

2、The default analyzer

POST _analyze
{"analyzer": "standard","text": "我是中国人"
}

Explanation:

The standard analyzer only segments English properly; it cannot recognize Chinese words and splits Chinese into individual characters.
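For example, the _analyze request above should return one token per character, roughly as follows (a sketch of the expected shape, not captured output):

```
{
  "tokens": [
    { "token": "我", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0 },
    { "token": "是", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1 },
    { "token": "中", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2 },
    { "token": "国", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3 },
    { "token": "人", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4 }
  ]
}
```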

3、The ik_smart analyzer

POST _analyze
{"analyzer": "ik_smart","text": "我是中国人"
}

Result:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中国人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2}]
}

4、The ik_max_word analyzer

POST _analyze
{"analyzer": "ik_max_word","text": "我是中国人"
}

Result:

{"tokens" : [{"token" : "我","start_offset" : 0,"end_offset" : 1,"type" : "CN_CHAR","position" : 0},{"token" : "是","start_offset" : 1,"end_offset" : 2,"type" : "CN_CHAR","position" : 1},{"token" : "中国人","start_offset" : 2,"end_offset" : 5,"type" : "CN_WORD","position" : 2},{"token" : "中国","start_offset" : 2,"end_offset" : 4,"type" : "CN_WORD","position" : 3},{"token" : "国人","start_offset" : 3,"end_offset" : 5,"type" : "CN_WORD","position" : 4}]
}

Explanation:

From now on, set up an ik analyzer before creating an index, since we need to specify it in the mapping; we can no longer use the default analyzers, whose support for Chinese is far too poor.
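A common pattern when specifying the analyzer in a mapping (a sketch; the index and field names here are hypothetical) is to index with ik_max_word for finer-grained terms and higher recall, while analyzing queries with the coarser ik_smart:

```
PUT articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
```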

5、The pinyin analyzer

POST _analyze
{"analyzer": "pinyin","text": "我是中国人"
}

Result:

{"tokens": [{"token": "wo","start_offset": 0,"end_offset": 0,"type": "word","position": 0},{"token": "wszgr","start_offset": 0,"end_offset": 0,"type": "word","position": 0},{"token": "shi","start_offset": 0,"end_offset": 0,"type": "word","position": 1},{"token": "zhong","start_offset": 0,"end_offset": 0,"type": "word","position": 2},{"token": "guo","start_offset": 0,"end_offset": 0,"type": "word","position": 3},{"token": "ren","start_offset": 0,"end_offset": 0,"type": "word","position": 4}]
}

6、Configuring a custom remote dictionary for the ik analyzer

See the separate document virtualbox和vagrant.docx.
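Since that document is not included here, a rough sketch of the usual approach: the ik plugin reads plugins/ik/config/IKAnalyzer.cfg.xml, whose remote_ext_dict entry can point at a URL (the URL below is hypothetical) serving a plain-text file with one custom word per line; es polls the URL periodically to pick up new words.

```
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- hypothetical URL of a plain-text file, one custom word per line -->
    <entry key="remote_ext_dict">http://my-nginx/es/custom_words.txt</entry>
</properties>
```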

七、Examples

GET _search
{"query": {"match_all": {}}
}

POST /customer/external/_bulk
{"index":{"_id":"1"}}
{"name":"John Doe"}
{"index":{"_id":"2"}}
{"name":"John Doe"}

GET /bank/_search?q=*&sort=account_number:asc

GET bank/_search
{"query": {"match_all": {}},"_source": "balance"
}

GET bank/_search
{"query": {"match": {"address": "mil"}}
}

GET bank/_search
{"query": {"match": {"address": "mill lane"}}
}

GET bank/_search
{"query": {"multi_match": {"query": "mill movico","fields": ["address","city"]}}
}

GET bank/_search
{"query": {"bool": {"must": [{"match": {"gender": "M"}},{"match": {"address": "mill"}}],"must_not": [{"match": {"age": "28"}}],"should": [{"match": {"lastname": "Hines"}}],"filter": {"range": {"age": {"gte": 10,"lte": 40}}}}}
}

GET bank/_search
{"query": {"match": {"balance": 16}}
}

GET bank/_search
{"query": {"term": {"address.keyword": "282 Kings Place"}}
}

# Simple aggregation usage
GET bank/_search
{"query": {"match": {"address": "mill"}},"aggs": {"balanceCount": {"terms": {"field": "balance","size": 10}},"balanceSum": {"sum": {"field": "balance"}},"balanceAvg":{"avg": {"field": "balance"}},"balanceMin":{"min": {"field": "balance"}},"balanceMax":{"max": {"field": "balance"}}}
}

# Sub-aggregation (group by age, then compute the average balance within each age bucket)
GET bank/_search
{"query": {"match_all": {}},"aggs": {"ageCount": {"terms": {"field": "age","size": 20},"aggs": {"balanceAvg": {"avg": {"field": "balance"}}}},"balanceAvg": {"avg": {"field": "balance"}}},"size": 0
}

# Group by age, then find the average balance for gender M and F within each age bucket, plus the bucket's overall average balance
GET bank/_search
{"query": {"match_all": {}},"aggs": {"ageCount": {"terms": {"field": "age","size": 10},"aggs": {"genderCount": {"terms": {"field": "gender.keyword","size": 100},"aggs": {"balanceAvg": {"avg": {"field": "balance"}}}},"balanceAvg": {"avg": {"field": "balance"}}}}},"size": 0
}

GET bank/_mapping

PUT /my_index
{"mappings": {"properties": {"age":{"type": "integer"},"email":{"type":"keyword"},"name":{"type": "text"}}}
}

PUT /my_index/_mapping
{"properties": {"employee-id": {"type": "keyword","index": "false"}}
}

GET bank/_mapping

GET bank/_search

PUT /newbank
{"mappings": {"properties": {"account_number": {"type": "long"},"address": {"type": "text"},"age": {"type": "integer"},"balance": {"type": "long"},"city": {"type": "keyword"},"email": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"employer": {"type": "keyword"},"firstname": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"gender": {"type": "keyword"},"lastname": {"type": "text","fields": {"keyword": {"type": "keyword","ignore_above": 256}}},"state": {"type": "keyword"}}}
}

POST _reindex
{"source": {"index": "bank","type": "account"},"dest": {"index": "newbank"}
}

GET newbank/_mapping

GET newbank/_search

POST _analyze
{"analyzer": "standard","text": "我是中国人"
}

POST _analyze
{"analyzer": "ik_smart","text": "乔碧萝殿下"
}

POST _analyze
{"analyzer": "ik_max_word","text": "尚硅谷电商项目"
}

GET product/_search

POST _reindex
{"source": {"index": "product"},"dest": {"index": "gulimall_product"}
}

GET gulimall_product/_search

PUT gulimall_product
{"mappings": {"properties": {"skuId": {"type": "long"},"spuId": {"type": "keyword"},"skuTitle": {"type": "text","analyzer": "ik_smart"},"skuPrice": {"type": "double"},"skuImg": {"type": "keyword"},"saleCount": {"type": "long"},"hasStock": {"type": "boolean"},"hotScore": {"type": "long"},"brandId": {"type": "long"},"catalogId": {"type": "long"},"brandName": {"type": "keyword"},"brandImg": {"type": "keyword"},"catalogName": {"type": "keyword"},"attrs": {"type": "nested","properties": {"attrId": {"type": "long"},"attrName": {"type": "keyword"},"attrValue": {"type": "keyword"}}}}}
}

GET gulimall_product/_search
{"query": {"bool": {"must": [{"match": {"skuTitle": "华为"}}],"filter": [{"term": {"catalogId": 225}},{"terms": {"brandId": [1,2]}},{"nested": {"path": "attrs","query": {"bool": {"must": [{"term": {"attrs.attrId": {"value": 1}}},{"term": {"attrs.attrValue": {"value": "ELS-AN10"}}}]}}}},{"term": {"hasStock": true}},{"range": {"skuPrice": {"gte": 0,"lte": 10000}}}]}},"sort": [{"skuPrice": {"order": "desc"}}],"from": 0,"size": 1,"highlight": {"pre_tags": "<b color='red'>","post_tags": "</b>","fields": {"skuTitle": {}}},"aggs": {"brand_agg": {"terms": {"field": "brandId","size": 32},"aggs": {"brand_name_agg": {"terms": {"field": "brandName","size": 32}},"brand_img_agg": {"terms": {"field": "brandImg","size": 32}}}},"catalog_agg": {"terms": {"field": "catalogId","size": 14},"aggs": {"catalog_name_agg": {"terms": {"field": "catalogName","size": 14}}}},"attr_agg": {"nested": {"path": "attrs"},"aggs": {"attr_id_agg": {"terms": {"field": "attrs.attrId","size": 10},"aggs": {"attr_name_agg": {"terms": {"field": "attrs.attrName","size": 10}},"attr_value_agg": {"terms": {"field": "attrs.attrValue","size": 10}}}}}}}
}

GET /gulimall_product/_mapping

GET /gulimall_product/_search

GET gulimall_product/_search
{"query": {"match": {"brandId": "1"}},"aggs": {"brand_agg": {"terms": {"field": "brandId","size": 32},"aggs": {"brand_name_agg": {"terms": {"field": "brandName","size": 32}},"brand_img_agg": {"terms": {"field": "brandImg","size": 32}}}},"catalog_agg": {"terms": {"field": "catalogId","size": 14},"aggs": {"catalog_name_agg": {"terms": {"field": "catalogName","size": 14}}}},"attr_agg": {"nested": {"path": "attrs"},"aggs": {"attr_id_agg": {"terms": {"field": "attrs.attrId","size": 10},"aggs": {"attr_name_agg": {"terms": {"field": "attrs.attrName","size": 10}},"attr_value_agg": {"terms": {"field": "attrs.attrValue","size": 10}}}}}}},"size": 0
}

GET /gulimall_product/_search
{"from": 0,"size": 2,"query": {"bool": {"must": [{"match": {"skuTitle": {"query": "华为","operator": "OR","prefix_length": 0,"max_expansions": 50,"fuzzy_transpositions": true,"lenient": false,"zero_terms_query": "NONE","auto_generate_synonyms_phrase_query": true,"boost": 1.0}}}],"filter": [{"term": {"catalogId": {"value": 225,"boost": 1.0}}}, {"terms": {"brandId": [1, 2],"boost": 1.0}}, {"nested": {"query": {"bool": {"must": [{"term": {"attrs.attrId": {"value": "1","boost": 1.0}}}, {"terms": {"attrs.attrValue": ["ELS-AN10", "123"],"boost": 1.0}}],"adjust_pure_negative": true,"boost": 1.0}},"path": "attrs","ignore_unmapped": false,"score_mode": "none","boost": 1.0}}, {"nested": {"query": {"bool": {"must": [{"term": {"attrs.attrId": {"value": "4","boost": 1.0}}}, {"terms": {"attrs.attrValue": ["华为 HUAWEI P40 Pro+"],"boost": 1.0}}],"adjust_pure_negative": true,"boost": 1.0}},"path": "attrs","ignore_unmapped": false,"score_mode": "none","boost": 1.0}}, {"term": {"hasStock": {"value": true,"boost": 1.0}}}, {"range": {"skuPrice": {"from": null,"to": "8000","include_lower": true,"include_upper": true,"boost": 1.0}}}],"adjust_pure_negative": true,"boost": 1.0}},"sort": [{"skuPrice": {"order": "desc"}}],"aggregations": {"brand_agg": {"terms": {"field": "brandId","size": 32,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]},"aggregations": {"brand_name_agg": {"terms": {"field": "brandName","size": 32,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]}},"brand_img_agg": {"terms": {"field": "brandImg","size": 32,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]}}}},"catalog_agg": {"terms": {"field": "catalogId","size": 14,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": 
"asc"}]},"aggregations": {"catalog_name_agg": {"terms": {"field": "catalogName","size": 14,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]}}}},"attr_agg": {"nested": {"path": "attrs"},"aggregations": {"attr_id_agg": {"terms": {"field": "attrs.attrId","size": 10,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]},"aggregations": {"attr_name_agg": {"terms": {"field": "attrs.attrName","size": 10,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]}},"attr_value_agg": {"terms": {"field": "attrs.attrValue","size": 10,"min_doc_count": 1,"shard_min_doc_count": 0,"show_term_doc_count_error": false,"order": [{"_count": "desc"}, {"_key": "asc"}]}}}}}}},"highlight": {"pre_tags": ["<b color='red'>"],"post_tags": ["</b>"],"fields": {"skuTitle": {}}}
}
