Table of Contents

  • 1. node setting
  • 2. parent/child documents
    • 1. nested
    • 2. join type setup
  • 3. queries
    • 1. Simple highlighting
    • 2. Fuzzy query
    • 3. multi_match query
    • 4. Scoring query
    • 5. Alias with a filter
    • 6. bool query
    • 7. Multiple indices behind one alias, only one writable
    • 8. scroll query
  • 4. Synonym search
    • 1. Exercise 1: we-do equals wedo
    • 2. Exercise 2: dingding search
    • 3. Exercise 3: dog & cat search
  • 5. multi_fields
    • 1. Exercise 1: multi-fields with different analyzers
    • 2. Exercise 2: multi-fields with different analyzers
  • 6. dynamic_template
  • 7. date_range_and_count.txt
  • 8. aggregation search
    • 1. Exercise 1: monthly max/min over earthquake data
    • 2. Exercise 2: filtering aggregation results
    • 3. Exercise 3: nesting bucket aggs inside bucket aggs
    • 4. Exercise 4: manufacturer ranking
  • 9. snapshot_and_restore
  • 10. search_template
  • 11. update_by_query
    • 1. Exercise 1: joining field contents
    • 2. Exercise 2: query filter + script to set a field value
  • 12. reindex pipeline_use
    • 1. Exercise 1: split a field, trim whitespace, count length
    • 2. Exercise 2: field concatenation and string length
  • 13. allocation filter
    • 1. Exercise 1: hot-warm architecture
    • 2. Exercise 2: rack awareness
    • 3. Exercise 3: removing a node from the cluster / from an index
  • 14. cross cluster search
  • 15. Screenshot questions

A personal collection of Elasticsearch practice exercises.

1. node setting

Covers setting the cluster name, the node roles, and the node attrs.
Rewatching the course video today, I realized there were still quite a few details I had missed.

## basic cluster/node settings
cluster.name: log-dev
node.name: node-2
## the four settings that control node roles
node.master: true
node.data: true
node.ingest: true
node.ml: false
## remote cluster connections are usually disabled on master nodes; non-master nodes can leave this unset
cluster.remote.connect: false
# snapshot repository path for a single node
path.repo: ["/home/deploy/search/log-manager/elasticsearch-7.2.0/repository01"]
# can simply be set to _site_ to bind the machine's internal address; probably no change needed in the exam
network.host: 19.76.3.145
# the default is 9200; apparently not set in the exam
http.port: 12200
# apparently not set in the exam either
transport.port: 12300
# seed_hosts may also be bare IPs, in which case transport.port is appended by default
discovery.seed_hosts: ["19.76.0.98:12300", "19.76.3.145:12300","19.76.0.129:12300"]
# the node names of the master-eligible nodes
cluster.initial_master_nodes: ["node-1", "node-2","node-3"]
bootstrap.system_call_filter: false
## data/log paths; usually not a focus in the exam
path.data: /home/deploy/search/log-manager/elasticsearch-7.2.0/data
path.logs: /home/deploy/search/log-manager/elasticsearch-7.2.0/logs
# when reindexing from another cluster, that cluster's address must be whitelisted here
reindex.remote.whitelist: "19.76.0.27:14200,19.76.0.98:14200, 19.76.3.145:14200,  19.76.0.129:14200"
# node attr settings
node.attr.size: small
node.attr.rack: rack01
node.attr.disk: big
node.attr.machine: m01

2. parent/child documents

1. nested

POST phone/_doc/1
{
  "brand": "samsung",
  "model": "AS1",
  "features": [
    {"type": "os", "value": "android"},
    {"type": "memory", "value": "100"},
    {"type": "capacity", "value": "128"}
  ]
}
POST phone/_doc/2
{
  "brand": "apple",
  "model": "AS2",
  "features": [
    {"type": "os", "value": "apple"},
    {"type": "memory", "value": "32"},
    {"type": "capacity", "value": "100"}
  ]
}

With a query like the one below, only one document should match (doc 1 is the only one whose memory feature is actually 100):

GET phone/_search
{"query": {"bool": {"must": [{"match": {"features.type": "memory"}},{"match": {"features.value": "100"}}]}}
}
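As indexed above, though, both documents match: with the default object mapping the features array is flattened, so "type": "memory" from one element pairs with "value": "100" from another. Getting the single correct hit needs a nested mapping plus a nested query — a minimal sketch, assuming a new index name phone_nested (not in the original):

PUT phone_nested
{
  "mappings": {
    "properties": {
      "brand": {"type": "text"},
      "model": {"type": "text"},
      "features": {"type": "nested"}
    }
  }
}
POST _reindex
{
  "source": {"index": "phone"},
  "dest": {"index": "phone_nested"}
}
GET phone_nested/_search
{
  "query": {
    "nested": {
      "path": "features",
      "query": {
        "bool": {
          "must": [
            {"match": {"features.type": "memory"}},
            {"match": {"features.value": "100"}}
          ]
        }
      }
    }
  }
}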

2. join type setup

{"title":"elastic","content":"ELK is a great tool"
}{"comments":"good blogs"}

Of the two docs above, one is an article and the other is a comment on that article; store both in the same index.


PUT join_test03
{
  "mappings": {
    "properties": {
      "title": {"type": "text"},
      "content": {"type": "text"},
      "comments": {"type": "text"},
      "relation": {
        "type": "join",
        "relations": {          # note: this "relations" key is fixed, don't forget it
          "article": "comment"
        }
      }
    }
  }
}

PUT join_test03/_doc/1
{
  "title": "elastic",
  "content": "ELK is a great tool",
  "relation": {                 # this field uses a nested structure
    "name": "article"
  }
}
PUT join_test03/_doc/2?routing=1
{
  "comments": "good blogs",
  "relation": {
    "name": "comment",
    "parent": 1                 # the parent id goes right here
  }
}

Test it:

GET join_test03/_search
{
  "query": {
    "has_child": {
      "type": "comment",
      "query": {"match": {"comments": "good"}}
    }
  }
}
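The opposite direction also comes up: fetching comments by a condition on their parent article. A hedged counterpart using has_parent (not in the original):

GET join_test03/_search
{
  "query": {
    "has_parent": {
      "parent_type": "article",
      "query": {"match": {"title": "elastic"}}
    }
  }
}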

3. queries

1. Simple highlighting

PUT query_highlight/_doc/1
{
  "title": "ther beautifull door is yours",
  "body": "i want to be a better man to left the door "
}
PUT query_highlight/_doc/2
{
  "title": "do you like dog?",
  "body": "the dog is a good friend more than a pet "
}

Find the docs whose title field contains door, and highlight the match:

GET query_highlight/_search
{
  "query": {"match": {"title": "door"}},
  "highlight": {
    "fields": {
      "title": {"pre_tags": ["<em>"], "post_tags": ["</em>"]},
      "body": {}                # this entry produces no highlight
    }
  }
}

It turns out highlighting only fires on fields the query actually matched: require_field_match defaults to true, so the body entry does nothing here.
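If fragments from body are wanted anyway, the highlighter can be told to consider fields the query did not match — a small variant, assuming require_field_match is switched off:

GET query_highlight/_search
{
  "query": {"match": {"title": "door"}},
  "highlight": {
    "require_field_match": false,
    "fields": {
      "title": {"pre_tags": ["<em>"], "post_tags": ["</em>"]},
      "body": {}
    }
  }
}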

2. Fuzzy query

To tolerate typos via edit distance — so that terms close to door also match — the query below allows an edit distance of 2:

GET query_highlight/_search
{
  "query": {
    "match": {
      "title": {
        "query": "door",
        "fuzziness": "2"
      }
    }
  },
  "highlight": {
    "fields": {
      "title": {"pre_tags": ["<em>"], "post_tags": ["</em>"]},
      "body": {}
    }
  }
}

3. multi_match query


PUT multi_match/_doc/1
{
  "title": "dog is friend",
  "body": "we all should love and protect dogs,they are friend",
  "detail": "do you really believe it is good thing"
}
PUT multi_match/_doc/2?refresh
{
  "title": " cat  is friend",
  "body": "cat is my friend",
  "detail": "do you really believe dog and cat is good thing"
}

Search for dog across title, body, and detail, with boosts of 1, 2, and 3 respectively:

GET multi_match/_search
{
  "query": {
    "multi_match": {
      "query": "dog",
      "type": "most_fields",
      "fields": ["title", "body^2", "detail^3"]
    }
  }
}

4. Scoring query

Index movie-1 stores movie data; title is the movie title and tags are the movie's tags.

The title must contain "my" or "me".

If tags contains "romatic movies", the document's score is raised; if it does not, the document is still scored, just without the boost.

POST movie-1/_search
{
  "query": {
    "bool": {
      "must": [
        {"terms": {"title": ["my", "me"]}}
      ],
      "should": {
        "match": {
          "tags": {
            "query": "romatic movies",
            "boost": 2
          }
        }
      }
    }
  }
}

5. Alias with a filter

Create an index alias named alias2 for task23 so that queries through it only return movies with a score greater than 3 by default.
Worth remembering: an alias can be given a filter at the moment it is created.


POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "task23",
        "alias": "alias2",
        "filter": {
          "range": {"score": {"gt": 3}}
        }
      }
    }
  ]
}

6. bool query

Write a query requiring the keyword "new york" to appear in at least two of the four fields ("overview"/"title"/"tags"/"tagline") of the task25 index.

POST task25/_search
{
  "query": {
    "bool": {
      "should": [
        {"match": {"overview": "new york"}},
        {"match": {"title": "new york"}},
        {"match": {"tags": "new york"}},
        {"match": {"tagline": "new york"}}
      ],
      "minimum_should_match": 2
    }
  }
}

There really doesn't seem to be a better way to write this than listing the should clauses explicitly.

7. Multiple indices behind one alias, only one writable

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "hamlet-1",
        "alias": "hamlet",
        "is_write_index": true
      }
    },
    {
      "add": {
        "index": "hamlet-2",
        "alias": "hamlet"
      }
    }
  ]
}
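A quick check of the write side (the doc body here is hypothetical): with is_write_index set, an index request through the alias lands in hamlet-1, while a search on hamlet fans out to both indices.

# goes to hamlet-1, the alias's write index
PUT hamlet/_doc/1
{"text_entry": "to be, or not to be"}

# searches hamlet-1 and hamlet-2
GET hamlet/_search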

8. scroll query

The earth_quack index holds 221 documents; iterate over them in batches of 100.

GET earth_quack/_search?scroll=1m&size=100
{
  "query": {
    "range": {"Gap": {"gte": 10}}
  }
}

# later batches need nothing else; just keep fetching with the scroll_id
GET _search/scroll
{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}
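Scroll contexts hold resources on the cluster until they expire; once the iteration finishes it is good practice to release them explicitly (same scroll_id as above):

DELETE _search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAAkWSmxIWXRPbmFRSmloeWNTTUVXM0xtQQ=="
}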

4. Synonym search

When building a custom analyzer, adding a lowercase filter is a good habit.

1. Exercise 1: we-do equals wedo

Requirement:


PUT synonym_test01/_doc/1
{
  "title": "we-do work",
  "des": "we-do like do work"
}
PUT synonym_test01/_doc/2
{
  "title": "we-do work for a long time , you do not need do it ",
  "des": "we-do like do work"
}
PUT synonym_test01/_doc/3
{
  "title": "wedo work  it ",
  "des": "wedo like do work we do "
}
GET synonym_test01/_search
{
  "query": {
    "match": {"title": "wedo"}
  }
}

Only the doc with id 3 comes back. The requirement is that the x-x and xx forms give identical results, i.e. searching we-do and wedo must match the same docs.


PUT synonym_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "type": "custom",
          "tokenizer": "standard",
          "char_filter": "map_filter"
        }
      },
      "char_filter": {
        "map_filter": {
          "type": "mapping",
          "mappings": ["- =>"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "synonym"}
    }
  }
}
POST _reindex
{
  "source": {"index": "synonym_test01"},
  "dest": {"index": "synonym_test"}
}
GET synonym_test/_search
{
  "query": {
    "match": {"title": "wedo"}
  }
}
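To confirm the char_filter really collapses we-do into wedo before tokenization, the _analyze API is handy (expected tokens: wedo, work):

GET synonym_test/_analyze
{
  "analyzer": "synonym",
  "text": "we-do work"
}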

2. Exercise 2: dingding search

The data:


PUT dingding_test/_bulk
{"index":{"_id":1}}
{"title":"oa is very good"}
{"index":{"_id":2}}
{"title":"oA is very good"}
{"index":{"_id":3}}
{"title":"OA is very good"}
{"index":{"_id":4}}
{"title":"dingding is very good"}
{"index":{"_id":5}}
{"title":"dingding is ali software"}
{"index":{"_id":6}}
{"title":"0A is very good"}

Require that queries for oa, oA, OA, dingding, and 0A all return the same results, each hitting every document.

DELETE dingding_test
PUT dingding_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "synonym"]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": ["oa,0a,dingding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "my_analyzer"}
    }
  }
}
GET dingding_test/_search
{
  "query": {
    "match": {"title": "oA"}
  }
}

3. Exercise 3: dog & cat search

A document contains text like dog & cat. Index it so that a match_phrase query for either dog & cat or dog and cat matches.

Data:


PUT dog_and_cat/_bulk
{"index":{ "_id":0}}
{"title" : "dog and cat  are my familly"}
{"index":{ "_id":1}}
{"title" : "do you love dog & cat"}
{"index":{ "_id":2}}
{"title" : "you will finally find  dog  cat"}

This exercise requires a match_phrase query. The hidden knowledge point is that the & symbol is dropped by the standard tokenizer — dropped by the tokenizer itself, not by a token filter — so the text must either be processed before tokenization (with a char_filter) or run through a different tokenizer.


PUT dog_and_cat02
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["map_filter"],
          "tokenizer": "standard"
        }
      },
      "char_filter": {
        "map_filter": {
          "type": "mapping",
          "mappings": ["& => and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "my_analyzer"}
    }
  }
}

reindex and search


POST _reindex
{
  "source": {"index": "dog_and_cat"},
  "dest": {"index": "dog_and_cat02"}
}
GET dog_and_cat02/_search
{
  "query": {
    "match_phrase": {"title": "dog & cat"}
  }
}

Solution 2 uses a synonym filter, but the tokenizer has to be switched to whitespace so that the & symbol survives tokenization and the synonym filter can see it.
See https://elasticsearch.cn/article/6133
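A minimal sketch of that second approach, assuming a new index name dog_and_cat03 and filter/analyzer names of my choosing:

PUT dog_and_cat03
{
  "settings": {
    "analysis": {
      "analyzer": {
        "amp_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["lowercase", "amp_synonym"]
        }
      },
      "filter": {
        "amp_synonym": {
          "type": "synonym",
          "synonyms": ["&, and"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {"type": "text", "analyzer": "amp_analyzer"}
    }
  }
}

The whitespace tokenizer keeps & as its own token, so the synonym filter can equate it with and, and both phrase queries land on the same positions.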

5. multi_fields

1. Exercise 1: multi-fields with different analyzers

PUT multi_fields/_doc/1
{
  "title": "manager",
  "des": "this is the man who is more powerfull "
}
PUT multi_fields/_doc/2
{
  "title": "employee",
  "des": "this is the man who really do the job "
}

Create a new index where title has a multi-field named space_f that uses the whitespace analyzer, then reindex the current index's data into the new one.

PUT multi_fields02
{
  "mappings": {
    "properties": {
      "des": {
        "type": "text",
        "fields": {
          "space_f": {"type": "text", "analyzer": "whitespace"}
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "space_f": {"type": "text", "analyzer": "whitespace"}
        }
      }
    }
  }
}
POST _reindex
{
  "source": {"index": "multi_fields"},
  "dest": {"index": "multi_fields02"}
}
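A multi-field is addressed with dot notation; a quick usage sketch against the new index:

GET multi_fields02/_search
{
  "query": {
    "match": {"title.space_f": "manager"}
  }
}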

2. Exercise 2: multi-fields with different analyzers

Given an index multi_fields03 that already contains data, design the mappings for a new index multi_fields04 and move the data over. One field (the name escapes me, call it xxx) should use standard as its analyzer after the move and gain two new sub-fields: xxx.english analyzed with english, and xxx.stop analyzed with stop. Every other field keeps the same type as in the original index.

Sample data:


PUT multi_fields03/_doc/1
{"content": "i want to be better", "name": "chencc", "age": 180}
PUT multi_fields03/_doc/2
{"content": "she want a happy life", "name": "zhaolu", "age": 18}
PUT multi_fields03/_doc/3
{"content": "best wish for you team", "name": "wangj", "age": 28}

Create the mapping:


PUT multi_fields04
{
  "mappings": {
    "properties": {
      "age": {"type": "long"},
      "content": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "english": {"type": "text", "analyzer": "english"},
          "stop": {"type": "text", "analyzer": "stop"}
        }
      },
      "name": {
        "type": "text",
        "fields": {
          "keyword": {"type": "keyword", "ignore_above": 256}
        }
      }
    }
  }
}
POST _reindex
{
  "source": {"index": "multi_fields03"},
  "dest": {"index": "multi_fields04"}
}

6. dynamic_template

Variant 1: every field whose name starts with key_ becomes keyword type.
Variant 2: only string fields whose names start with key_ become keyword type.

PUT _template/dynamic_template
{
  "index_patterns": ["dynamic*"],
  "settings": {"number_of_shards": 3},
  "mappings": {
    "dynamic_templates": [
      {
        "key_word": {
          "match": "key_*",
          "mapping": {"type": "keyword"}
        }
      }
    ]
  }
}
PUT dynamic01/_doc/1
{
  "title": "this doc is for dynamic use",
  "key_want": "go back",
  "key_like": 123
}
GET dynamic01

Both key_-prefixed fields come out as keyword type.

Adding a restriction on the detected JSON type limits the rule: only fields whose detected type is string and whose names start with key_ become keyword.


PUT _template/dynamic_template02
{
  "index_patterns": ["02dyn*"],
  "settings": {"number_of_shards": 3},
  "mappings": {
    "dynamic_templates": [
      {
        "key_word": {
          "match_mapping_type": "string",
          "match": "key_*",
          "mapping": {"type": "keyword"}
        }
      }
    ]
  }
}
PUT 02dynamic/_doc/1
{
  "title": "this doc is for dynamic use",
  "key_want": "go back",
  "key_like": 123
}
GET 02dynamic

...
"key_like" : {"type" : "long"},
"key_want" : {"type" : "keyword"},
...

7. date_range_and_count.txt

Count the women whose country is China and whose birth falls in January through March 2016.


PUT people_agg
{
  "mappings": {
    "properties": {
      "birth": {"type": "date", "format": "yyyy/MM/dd HH:mm:ss.SS"},
      "country": {"type": "keyword"},
      "sex": {"type": "keyword"},
      "des": {"type": "text"}
    }
  }
}
PUT people_agg/_doc/1
{"birth": "2016/01/04 21:18:48.64", "country": "China", "sex": "woman", "des": "beauty woman"}
PUT people_agg/_doc/2
{"birth": "2016/02/04 21:18:48.64", "country": "China", "sex": "woman", "des": "beauty woman"}
PUT people_agg/_doc/3
{"birth": "2016/01/04 21:18:48.64", "country": "China", "sex": "man", "des": "beauty man"}
PUT people_agg/_doc/4
{"birth": "2016/01/04 21:18:48.64", "country": "Japan", "sex": "woman", "des": "beauty woman"}
PUT people_agg/_doc/5
{"birth": "2016/03/04 21:18:48.64", "country": "China", "sex": "woman", "des": "beauty woman"}

GET people_agg/_count
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "birth": {
              "gte": "01/2016",
              "lt": "04/2016",
              "format": "MM/yyyy||yyyy"
            }
          }
        },
        {"term": {"sex": {"value": "woman"}}},
        {"term": {"country": {"value": "China"}}}
      ]
    }
  }
}

Note the date range query with a custom format here; an exact timestamp can't realistically be matched with equality, so a range is the way to go (lt "04/2016" keeps the upper bound exclusive, covering exactly January through March).
Also, no aggregation is needed — the _count API alone does the job.

8. aggregation search

1. Exercise 1: monthly max/min over earthquake data

For the earthquake data, find each month's maximum depth and maximum distance, and also pick out the month with the greatest maximum depth.

{"DateTime" : "2016/01/04 21:18:48.64","Latitude" : "37.3257","Longitude" : "-122.1043","Depth" : "-0.32","Magnitude" : "1.55","MagType" : "Md","NbStations" : "12","Gap" : "77","Distance" : "1","RMS" : "0.06","Source" : "NC","EventID" : "72573650"}

GET earth_quack/_search?size=0
{
  "aggs": {
    "month": {
      "date_histogram": {
        "field": "DateTime",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_dep": {"max": {"field": "Depth"}},
        "max_dis": {"max": {"field": "Distance"}}
      }
    },
    "max_bucket": {
      "max_bucket": {"buckets_path": "month>max_dep"}
    }
  }
}

max_bucket here is a sibling pipeline aggregation, so it sits at the outermost level; a parent pipeline aggregation would be nested inside instead.

2. Exercise 2: filtering aggregation results

Building on exercise 1, keep only the monthly buckets whose maximum depth is greater than 0.

GET earth_quack/_search?size=0
{
  "aggs": {
    "month": {
      "date_histogram": {
        "field": "DateTime",
        "calendar_interval": "month"
      },
      "aggs": {
        "max_dep": {"max": {"field": "Depth"}},
        "max_dis": {"max": {"field": "Distance"}},
        "dep_filter": {
          "bucket_selector": {
            "buckets_path": {"m_dep": "max_dep"},
            "script": "params.m_dep>0"
          }
        }
      }
    }
  }
}

3. Exercise 3: nesting bucket aggs inside bucket aggs


PUT log_agg
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "param": {"type": "keyword"},
      "status": {"type": "long"},
      "uri": {"type": "keyword"}
    }
  }
}

Data:


PUT log_agg/_doc/1
{"uri": "/query", "status": 200, "param": "query dog"}
PUT log_agg/_doc/2
{"uri": "/query", "status": 200, "param": "query cat"}
PUT log_agg/_doc/3
{"uri": "/query", "status": 400, "param": "query bad"}
PUT log_agg/_doc/4
{"uri": "/login", "status": 200, "param": "uid:123"}
PUT log_agg/_doc/5
{"uri": "/login", "status": 400, "param": "uid:123  uid bad"}
PUT log_agg/_doc/6
{"uri": "/login", "status": 400, "param": "uid:123,pass bad"}
PUT log_agg/_doc/7
{"uri": "/login", "status": 302, "param": "uid:123,no user"}
PUT log_agg/_doc/8
{"uri": "/register", "status": 302, "param": "phone:12345"}
PUT log_agg/_doc/9
{"uri": "/register", "status": 302, "param": "query cat"}
PUT log_agg/_doc/10
{"uri": "/register", "status": 400, "param": "server error"}

Requirement:
From the log data, find the top three uris for each status.

It didn't occur to me at first that this was doable — bucket aggregations can nest inside other bucket aggregations; I had assumed a bucket could only wrap metric aggregations.

GET log_agg/_search?size=0
{
  "aggs": {
    "status_term": {
      "terms": {
        "field": "status",
        "size": 10
      },
      "aggs": {
        "uri_ter": {
          "terms": {
            "field": "uri",
            "size": 3,
            "order": {"_count": "desc"}
          }
        }
      }
    }
  }
}

Response:

"aggregations" : {"status_term" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : 400,"doc_count" : 4,"uri_ter" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "/login","doc_count" : 2},{"key" : "/query","doc_count" : 1},{"key" : "/register","doc_count" : 1}]}},{"key" : 200,"doc_count" : 3,"uri_ter" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "/query","doc_count" : 2},{"key" : "/login","doc_count" : 1}]}},{"key" : 302,"doc_count" : 3,"uri_ter" : {"doc_count_error_upper_bound" : 0,"sum_other_doc_count" : 0,"buckets" : [{"key" : "/register","doc_count" : 2},{"key" : "/login","doc_count" : 1}]}}]}}

4. Exercise 4: manufacturer ranking

Against the food-additive index task15: among documents whose ingredient field contains tt, return the top 10 manufacturers.

POST task15/_search
{
  "query": {
    "match": {"ingredient": "tt"}
  },
  "aggs": {
    "top_10": {
      "terms": {
        "field": "manufacturer",
        "size": 10
      }
    }
  }
}

9. snapshot_and_restore

Create a snapshot repository for the cluster, then take a snapshot containing only the work02_test09 index.
elasticsearch.yml configuration:

path.repo: ["/home/deploy/search/log-manager/single_node/repository_global"]

PUT _snapshot/exam_bak
{
  "type": "fs",
  "settings": {"location": "exam_back01"}
}
POST _snapshot/exam_bak/_verify
PUT _snapshot/exam_bak/snapshot_1
{
  "indices": "work02_test09"
}

Verify:

GET _snapshot/exam_bak/snapshot_1
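The restore half, as a minimal sketch: the live index has to be closed (or deleted) first, since a restore cannot overwrite an open index.

POST work02_test09/_close
POST _snapshot/exam_bak/snapshot_1/_restore
{
  "indices": "work02_test09"
}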

10. search_template

Data:

PUT search_template/_doc/1
{"title": "i love pet ,and i want to have a dog", "age": 8}
PUT search_template/_doc/2
{"title": "i love pet ,and i want to have a cat", "age": 18}
PUT search_template/_doc/3
{"title": "i love pet", "age": 88}

Define a search template where the text matched against title is params1, the sort field is params2, the sort order is params3, and the result size is size. The call should eventually look like:

GET _search/template
{
  "id": "search_template",
  "params": {
    "params1": "xxxx",
    "params2": "xxxx",
    "params3": "asc",
    "size": 10
  }
}
First sketch the query roughly:

GET search_template/_search
{
  "query": {"match": {"title": "TEXT"}},
  "sort": [{"FIELD": {"order": "desc"}}],
  "size": 10
}

Then fill it into a stored template:

PUT _scripts/template_query
{
  "script": {
    "lang": "mustache",
    "source": {
      "query": {"match": {"title": "{{params1}}"}},
      "sort": [{"{{params2}}": {"order": "{{params3}}"}}],
      "size": "{{size}}"
    }
  }
}

Verify:

GET _render/template/template_query
{
  "params": {"params1": "pet", "params2": "age", "params3": "asc", "size": 10}
}

Search with it:

GET search_template/_search/template
{
  "id": "template_query",
  "params": {"params1": "pet", "params2": "age", "params3": "desc", "size": 10}
}

11. update_by_query

1. Exercise 1: joining field contents

Add a new field, new_field, whose content is title and content concatenated in order; the title field is a plain text type.


PUT script_new_field/_doc/1
{
  "title": "save food",
  "content": "we all should save food ,it is important"
}
PUT script_new_field/_doc/2
{
  "title": "protect water",
  "content": "water is precious for all"
}

Solution 1:

POST script_new_field/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.new_field=ctx._source.title+' '+ctx._source.content"
  }
}
GET script_new_field/_search

Solution 2:


PUT _ingest/pipeline/join_field
{
  "description": "join two fields",
  "processors": [
    {
      "set": {
        "field": "new_ffff",
        "value": "{{title}} {{content}}"
      }
    }
  ]
}
POST script_new_field/_update_by_query?pipeline=join_field

2. Exercise 2: query filter + script to set a field value


PUT city_update/_doc/1
{"city": "shanghai", "name": "liurui"}
PUT city_update/_doc/2
{"city": "wuhan", "name": "liuao"}

Update every doc in the index whose city is shanghai, setting city to beijin:

POST city_update/_update_by_query
{
  "query": {
    "match": {"city": "shanghai"}
  },
  "script": {
    "lang": "painless",
    "source": "ctx._source.city='beijin'"
  }
}

12. reindex pipeline_use

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    // pipeline definition here
  },
  "docs": [
    {"_source": {/** first document **/}},
    {"_source": {/** second document **/}}
    // ...
  ]
}

I had assumed _simulate could only take an inline pipeline definition and couldn't render an already-stored pipeline — but a stored pipeline works too:

POST _ingest/pipeline/my-pipeline-id/_simulate
{
  "docs": [
    {"_source": {/** first document **/}},
    {"_source": {/** second document **/}}
    // ...
  ]
}

1. Exercise 1: split a field, trim whitespace, count length

Reindex an index's data into another index task2, where one field in the source holds strings like " xx1 ", " xx2 ", " xx3 ". Requirements:
Split the field into an array on commas in task2.
Trim the surrounding whitespace from each split string.
Add a new field num holding the array's length.

Prepare the data:


PUT pipe_origin/_doc/1
{"title": "teacher ,student , mom ", "name": "diaom"}
PUT pipe_origin/_doc/2
{"title": "father ,engneer,son", "name": "chenq"}

Processing:

POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "split int arr and count num",
    "processors": [
      {
        "split": {
          "field": "title",
          "target_field": "temp",
          "separator": ","
        }
      },
      {
        "foreach": {
          "field": "temp",
          "processor": {
            "trim": {"field": "_ingest._value"}
          }
        }
      },
      {
        "script": {
          "lang": "painless",
          "source": "ctx.len=ctx.temp.length;"
        }
      }
    ]
  },
  "docs": [
    {"_source": {"title": "teacher ,student , mom "}}
  ]
}
POST _reindex
{
  "source": {"index": "pipe_origin"},
  "dest": {"index": "pipe_dest", "pipeline": "array_deal"}
}
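The _reindex above references a stored pipeline named array_deal, which the _simulate call never creates; it has to be registered before the reindex — a sketch mirroring the simulated processors:

PUT _ingest/pipeline/array_deal
{
  "description": "split int arr and count num",
  "processors": [
    {"split": {"field": "title", "target_field": "temp", "separator": ","}},
    {"foreach": {"field": "temp", "processor": {"trim": {"field": "_ingest._value"}}}},
    {"script": {"lang": "painless", "source": "ctx.len=ctx.temp.length;"}}
  ]
}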

2. Exercise 2: field concatenation and string length

Prepare the data:


PUT pipe_origin/_doc/1
{"title": "teacher ,student , mom ", "name": "diaom"}
PUT pipe_origin/_doc/2
{"title": "father ,engneer,son", "name": "chenq"}

Reindex into a new dest index, adding a new field join whose value is the two fields concatenated, plus a field len counting the characters in join.

Answer:


POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "join fields and count chars",
    "processors": [
      {
        "set": {
          "field": "join",
          "value": "{{name}} {{title}}"
        }
      },
      {
        "script": {
          "lang": "painless",
          "source": "ctx.len=ctx.join.length()"
        }
      }
    ]
  },
  "docs": [
    {"_source": {"title": "better man", "name": "jack"}}
  ]
}
PUT _ingest/pipeline/you_know
{
  "description": "join fields and count chars",
  "processors": [
    {
      "set": {
        "field": "join",
        "value": "{{name}} {{title}}"
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "ctx.len=ctx.join.length()"
      }
    }
  ]
}
POST _reindex
{
  "source": {"index": "pipe_origin"},
  "dest": {"index": "join_res", "pipeline": "you_know"}
}
GET join_res/_search

13. allocation filter

1. Exercise 1: hot-warm architecture

Deploy a three-node ES cluster with a node attribute named warm_hot.
node01 is hot; node02 and node03 are warm.
Create two indices, task701 and task702, each with 2 shards; one index keeps all its shards on hot nodes and the other keeps all of them on warm nodes.

# node settings
node01: node.attr.warm_hot: hot
node02: node.attr.warm_hot: warm
node03: node.attr.warm_hot: warm

Index settings:

PUT task701
{
  "settings": {
    "index.routing.allocation.include.warm_hot": "warm",
    "number_of_replicas": 0,
    "number_of_shards": 2
  }
}
PUT task702
{
  "settings": {
    "index.routing.allocation.include.warm_hot": "hot",
    "number_of_replicas": 0,
    "number_of_shards": 2
  }
}
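To confirm where the shards landed, the same _cat endpoint the later exercises use shows each shard's node:

GET _cat/shards/task701?v
GET _cat/shards/task702?v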

2. Exercise 2: rack awareness

Three nodes carry a rack attribute:
node01 and node02 are rack01, node03 is rack02.

Create an index task703 with 2 shards and 1 replica,
and make task703's shards mirror each other across rack01 and rack02.

node01: node.attr.rack: rack01
node02: node.attr.rack: rack01
node03: node.attr.rack: rack02

PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "rack",
    "cluster.routing.allocation.awareness.force.rack.values": "rack01,rack02"
  }
}
PUT task703
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 2
  }
}
GET _cat/shards/task703

3. Exercise 3: removing a node from the cluster / from an index

index books1和books2都是3个主分片,1个副本分片。
要求books1只能分配在node-1上。
要求books2所有分片分配在node-2,node-3上。

Don't jump straight to a custom attr here: the node name is already exposed as the built-in _name allocation attribute (note the leading underscore in index.routing.allocation.include._name), which is very handy.


PUT books1
{
  "settings": {
    "index.routing.allocation.include._name": "node-1",
    "number_of_shards": 3,
    "number_of_replicas": 0
  }
}
GET _cat/shards/books1
PUT books2
{
  "settings": {
    "index.routing.allocation.include._name": "node-2,node-3",
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
GET _cat/shards/books2

books1 is created with 0 replicas: with every copy confined to node-1, and a replica never allowed on the same node as its primary, a replica would simply stay unassigned.

To retire node-3 from the cluster, first let its data migrate away automatically:

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._name": "node-3"
  }
}

14. cross cluster search

Write the following data into cluster1:

PUT hamlet/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}

Write the following data into cluster2:

PUT hamlet02/_bulk
{"index":{"_id":0}}
{"line_number":"1","speaker":"BERNARDO","text_entry":"Whos there?"}
{"index":{"_id":1}}
{"line_number":"2","speaker":"FRANCISCO","text_entry":"Nay answer me: stand, and unfold yourself."}

From cluster1, search both clusters at once for docs whose speaker is FRANCISCO:

PUT _cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_one": {
          "seeds": ["10.76.3.145:16300"],
          "transport.ping_schedule": "30s"
        }
      }
    }
  }
}
GET hamlet,cluster_one:hamlet02/_search
{
  "query": {
    "match": {"speaker": "FRANCISCO"}
  }
}

15. Screenshot questions

https://github.com/mingyitianxia/elastic-certified-engineer/blob/master/review-practice/0011_zhenti.md
