Elasticsearch Tutorial (28): text vs keyword, term vs match, and using the ik Chinese analyzer

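I. Preface
II. Related earlier posts
III. Creating some test data
- 1. Create an index
- 2. Insert test data
IV. A quiz
- Question 1: what does title term "宝贝" return?
- Question 2: what does title term "宝宝" return?
- Question 3: what does title term "宝" return?
- Question 4: what does title term "ABC" return?
- Question 5: what does title term "abc" return?
- Question 6: what does title match "宝贝" return?
- Question 7: what does title match "宝宝" return?
- Question 8: what does title match "ABC" return?
- Question 9: what does content match "宝贝" return?
- Question 10: what does content match "宝" return?
- Question 11: what does author term "宝" return?
- Question 12: what does author term "彭钧" return?
- Question 13: what does author match "彭钧" return?
V. Analyzing the title field
VI. Analyzing the content field
VII. Answers
- Question 1: what does title term "宝贝" return?
- Question 2: what does title term "宝宝" return?
- Question 3: what does title term "宝" return?
- Question 4: what does title term "ABC" return?
- Question 5: what does title term "abc" return?
- Question 6: what does title match "宝贝" return?
- Question 7: what does title match "宝宝" return?
- Question 8: what does title match "ABC" return?
- Question 9: what does content match "宝贝" return?
- Question 10: what does content match "宝" return?
- Question 11: what does author term "宝" return?
- Question 12: what does author term "彭钧" return?
- Question 13: what does author match "彭钧" return?
VIII. Adding custom terms
- 1. Add a custom dictionary file
- 2. Configure the custom dictionary file
- 3. Test
IX. token_count
X. Conclusion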

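I. Preface

Lately I noticed an old-timer (meaning myself) who has been using ES at work for quite a few years, all the way from ES 5.6 to ES 7.9.

Yet this comrade is still fuzzy on the difference between text and keyword, and between term and match. As long as some tool could pull the data out of ES, he simply treated it like a relational database such as MySQL.

ES versions move fast, but the upgrades are mostly advanced features and internals. Basics like text vs keyword and term vs match have barely changed.

If you have used ES for years and still can't tell text from keyword or term from match, how can you claim to know ES? Huh? Let old Arthur give you a slash, then another, ultimate lands plus the execute. Best of luck.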

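II. Related earlier posts

My earlier posts already cover match, term, text, and keyword in some depth, mainly in the three below. Dig in if you're interested.
Elasticsearch Notes (9): Query DSL query tutorial
Elasticsearch Notes (10): Mapping field types keyword, text, date, numeric
Elasticsearch Notes (11): a detailed summary of ES term, terms, and prefix search and aggregation queries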

III. Creating some test data

1. Create an index

The index created below has only 3 fields:

Field	Description	Type	Analyzer
title	Title	text	ES built-in standard analyzer, suited to English
author	Author	keyword	none
content	Content	text	ik Chinese analyzer, here ik_smart
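A quick way to picture the difference between the two field types is to look at the terms each one would put into the index. The Python sketch below is purely illustrative: `keyword_terms` and `standard_like_terms` are hypothetical helpers, and the second one only roughly imitates the standard analyzer (one term per Chinese character, lowercased runs of ASCII letters), not Lucene's real tokenizer.

```python
import re


def keyword_terms(value):
    # keyword: each value (each element of an array) is indexed as one
    # exact, case-sensitive term; nothing is split or lowercased
    return list(value) if isinstance(value, list) else [value]


def standard_like_terms(value):
    # Rough stand-in for the standard analyzer (an approximation):
    # one term per CJK ideograph, runs of ASCII letters lowercased
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z]+", value)]


print(keyword_terms("宝贝巴士"))          # ['宝贝巴士']
print(keyword_terms(["彭钧", "李润"]))    # ['彭钧', '李润']
print(standard_like_terms("宝贝巴士"))    # ['宝', '贝', '巴', '士']
```

A keyword field keeps 宝贝巴士 as a single exact term, and an array value such as the author of document 2 simply contributes one term per element.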

There are plenty of good posts online about installing and using ik, so I'll only give the official links here:
Official site
Download page

PUT pigg_blog
{"mappings": {"properties": {
  "title": {"type": "text", "analyzer": "standard"},
  "author": {"type": "keyword"},
  "content": {"type": "text", "analyzer": "ik_smart"}
}}}

2. Insert test data

PUT pigg_blog/_doc/1
{"title": "宝贝ABC", "author": "宝贝巴士", "content": "宝贝,宝贝ABC"}

PUT pigg_blog/_doc/2
{"title": "小跳蛙宝宝", "author": ["彭钧", "李润"], "content": "小跳蛙宝宝"}

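IV. A quiz

Take the quiz first. Jot down your answers on paper; the correct answers are given later.
For each question, just write down the IDs of the documents returned.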

Question 1: what does title term "宝贝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝贝"}}}
}

Question 2: what does title term "宝宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝宝"}}}
}

Question 3: what does title term "宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝"}}}
}

Question 4: what does title term "ABC" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "ABC"}}}
}

Question 5: what does title term "abc" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "abc"}}}
}

Question 6: what does title match "宝贝" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "宝贝"}}
}

Question 7: what does title match "宝宝" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "宝宝"}}
}

Question 8: what does title match "ABC" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "ABC"}}
}

Question 9: what does content match "宝贝" return?

GET pigg_blog/_search
{
  "query": {"match": {"content": "宝贝"}}
}

Question 10: what does content match "宝" return?

GET pigg_blog/_search
{
  "query": {"match": {"content": "宝"}}
}

Question 11: what does author term "宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"author": {"value": "宝"}}}
}

Question 12: what does author term "彭钧" return?

GET pigg_blog/_search
{
  "query": {"term": {"author": {"value": "彭钧"}}}
}

Question 13: what does author match "彭钧" return?

GET pigg_blog/_search
{
  "query": {"match": {"author": "彭钧"}}
}

V. Analyzing the title field

Before checking the answers, look at how the analyzer turns each text into terms, since those terms are what actually gets indexed and searched.

POST pigg_blog/_analyze
{"analyzer": "standard", "text": ["宝贝ABC"]}

Result:

{
  "tokens" : [
    {"token" : "宝", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0},
    {"token" : "贝", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1},
    {"token" : "abc", "start_offset" : 2, "end_offset" : 5, "type" : "<ALPHANUM>", "position" : 2}
  ]
}

POST pigg_blog/_analyze
{"analyzer": "standard", "text": ["小跳蛙宝宝"]}

Result:

{
  "tokens" : [
    {"token" : "小", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0},
    {"token" : "跳", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1},
    {"token" : "蛙", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2},
    {"token" : "宝", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3},
    {"token" : "宝", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4}
  ]
}
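The two outputs above can be reproduced with a toy model. The sketch below assumes the text contains only Chinese ideographs and ASCII letters; `toy_standard_analyze` is a hypothetical helper, and real Lucene applies the full Unicode UAX #29 word-break rules, so treat it as an illustration of the behaviour rather than the implementation.

```python
import re

# Toy approximation (assumption): only CJK ideographs and ASCII letters are handled.
TOKEN_RE = re.compile(r"(?P<ideo>[\u4e00-\u9fff])|(?P<alpha>[A-Za-z]+)")


def toy_standard_analyze(text):
    tokens = []
    for position, m in enumerate(TOKEN_RE.finditer(text)):
        tokens.append({
            "token": m.group().lower(),  # the standard analyzer lowercases terms
            "start_offset": m.start(),
            "end_offset": m.end(),
            # each ideograph becomes its own token; letter runs stay together
            "type": "<IDEOGRAPHIC>" if m.lastgroup == "ideo" else "<ALPHANUM>",
            "position": position,
        })
    return tokens


for text in ["宝贝ABC", "小跳蛙宝宝"]:
    for token in toy_standard_analyze(text):
        print(token)
```

This is why no single term "宝贝" or "宝宝" exists in the title field, and why "ABC" is only stored as "abc".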

VI. Analyzing the content field

POST pigg_blog/_analyze
{"analyzer": "ik_smart", "text": ["宝贝,宝贝ABC"]}

Result:

{
  "tokens" : [
    {"token" : "宝贝", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
    {"token" : "宝贝", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 1},
    {"token" : "abc", "start_offset" : 5, "end_offset" : 8, "type" : "ENGLISH", "position" : 2}
  ]
}

POST pigg_blog/_analyze
{"analyzer": "ik_smart", "text": ["小跳蛙宝宝"]}

Result:

{
  "tokens" : [
    {"token" : "小跳", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 0},
    {"token" : "蛙", "start_offset" : 2, "end_offset" : 3, "type" : "CN_CHAR", "position" : 1},
    {"token" : "宝宝", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 2}
  ]
}
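ik_smart relies on a dictionary instead of splitting every character. To illustrate the idea, the sketch below swaps in classic forward maximum matching (a textbook technique, not IK's actual algorithm) with a hypothetical three-word dictionary `DICT`; on these two sample strings it happens to yield the same terms as ik_smart.

```python
import re

# Hypothetical mini dictionary; IK's real main.dic has far more entries.
DICT = {"宝贝", "小跳", "宝宝"}


def fmm_segment(text, dictionary, max_len=3):
    """Forward maximum matching: inside each run of CJK characters, take the
    longest dictionary word at the current position, else one character.
    A run of ASCII letters becomes one lowercased term; anything else,
    such as the comma, only separates runs."""
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z]+", text):
        if run.isascii():
            tokens.append(run.lower())
            continue
        i = 0
        while i < len(run):
            for n in range(min(max_len, len(run) - i), 0, -1):
                if n == 1 or run[i:i + n] in dictionary:
                    tokens.append(run[i:i + n])
                    i += n
                    break
    return tokens


print(fmm_segment("宝贝,宝贝ABC", DICT))  # ['宝贝', '宝贝', 'abc']
print(fmm_segment("小跳蛙宝宝", DICT))    # ['小跳', '蛙', '宝宝']
```

Note that "宝" on its own never appears as a term in the content field, because the dictionary words 宝贝 and 宝宝 are always matched first.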

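VII. Answers

Now that you have seen the analysis above, how many of your answers are you still sure of? Keep two rules in mind: term looks up the value exactly as given, with no analysis, while match first runs the query text through the field's analyzer and matches any of the resulting terms. A keyword field is indexed as one whole, unmodified value.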

Question 1: what does title term "宝贝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝贝"}}}
}

Answer: no documents

Question 2: what does title term "宝宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝宝"}}}
}

Answer: no documents

Question 3: what does title term "宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "宝"}}}
}

Answer: both documents 1 and 2

Question 4: what does title term "ABC" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "ABC"}}}
}

Answer: no documents

Question 5: what does title term "abc" return?

GET pigg_blog/_search
{
  "query": {"term": {"title": {"value": "abc"}}}
}

Answer: document 1

Question 6: what does title match "宝贝" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "宝贝"}}
}

Answer: both documents 1 and 2

Question 7: what does title match "宝宝" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "宝宝"}}
}

Answer: both documents 1 and 2

Question 8: what does title match "ABC" return?

GET pigg_blog/_search
{
  "query": {"match": {"title": "ABC"}}
}

Answer: document 1

Question 9: what does content match "宝贝" return?

GET pigg_blog/_search
{
  "query": {"match": {"content": "宝贝"}}
}

Answer: document 1

Question 10: what does content match "宝" return?

GET pigg_blog/_search
{
  "query": {"match": {"content": "宝"}}
}

Answer: no documents

Question 11: what does author term "宝" return?

GET pigg_blog/_search
{
  "query": {"term": {"author": {"value": "宝"}}}
}

Answer: no documents

Question 12: what does author term "彭钧" return?

GET pigg_blog/_search
{
  "query": {"term": {"author": {"value": "彭钧"}}}
}

Answer: document 2

Question 13: what does author match "彭钧" return?

GET pigg_blog/_search
{
  "query": {"match": {"author": "彭钧"}}
}

Answer: document 2
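All thirteen answers follow from two rules: term looks up the value as given, while match first analyzes the query text with the field's analyzer and, by default, matches documents containing any resulting term. The Python model below is a sketch under stated assumptions, not how ES works internally: it uses toy analyzers (the ik stand-in is forward maximum matching with a hypothetical three-word dictionary), builds an in-memory inverted index from the two test documents, and replays the quiz.

```python
import re


def standard_like(text):
    # Toy stand-in: one term per CJK ideograph, ASCII letter runs lowercased
    return [t.lower() for t in re.findall(r"[\u4e00-\u9fff]|[A-Za-z]+", text)]


IK_DICT = {"宝贝", "小跳", "宝宝"}  # hypothetical mini dictionary


def ik_smart_like(text, max_len=3):
    # Toy stand-in: forward maximum matching inside CJK runs
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z]+", text):
        if run.isascii():
            tokens.append(run.lower())
            continue
        i = 0
        while i < len(run):
            for n in range(min(max_len, len(run) - i), 0, -1):
                if n == 1 or run[i:i + n] in IK_DICT:
                    tokens.append(run[i:i + n])
                    i += n
                    break
    return tokens


def keyword(text):
    return [text]  # keyword: the whole value is a single, untouched term


MAPPING = {"title": standard_like, "author": keyword, "content": ik_smart_like}
DOCS = {
    "1": {"title": "宝贝ABC", "author": "宝贝巴士", "content": "宝贝,宝贝ABC"},
    "2": {"title": "小跳蛙宝宝", "author": ["彭钧", "李润"], "content": "小跳蛙宝宝"},
}

# Inverted index: (field, term) -> ids of the documents containing that term
INDEX = {}
for doc_id, doc in DOCS.items():
    for field, value in doc.items():
        for v in (value if isinstance(value, list) else [value]):
            for term in MAPPING[field](v):
                INDEX.setdefault((field, term), set()).add(doc_id)


def term_query(field, value):
    # term: the value is NOT analyzed; it must equal an indexed term exactly
    return sorted(INDEX.get((field, value), set()))


def match_query(field, text):
    # match: analyze the text with the field's analyzer, then OR the terms
    hits = set()
    for term in MAPPING[field](text):
        hits |= INDEX.get((field, term), set())
    return sorted(hits)


QUIZ = [
    (term_query, "title", "宝贝"), (term_query, "title", "宝宝"),
    (term_query, "title", "宝"), (term_query, "title", "ABC"),
    (term_query, "title", "abc"), (match_query, "title", "宝贝"),
    (match_query, "title", "宝宝"), (match_query, "title", "ABC"),
    (match_query, "content", "宝贝"), (match_query, "content", "宝"),
    (term_query, "author", "宝"), (term_query, "author", "彭钧"),
    (match_query, "author", "彭钧"),
]
for number, (query, field, value) in enumerate(QUIZ, start=1):
    print(number, query.__name__, field, value, query(field, value))
```

Running it prints the same document IDs as the answers above, which is a handy way to check your own reasoning on new cases.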

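VIII. Adding custom terms

Above, the ik analyzer split "小跳蛙" into the two terms "小跳" and "蛙", which is clearly not what I want. This is when we need to add our own custom terms.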

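1. Add a custom dictionary file

In ES's plugins\ik\config directory, add a UTF-8 text file named ext.dic and type in your own words, one per line (here 小跳蛙). End the last line with a newline as well.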

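2. Configure the custom dictionary file

In IKAnalyzer.cfg.xml in the same config directory, point the ext_dict entry at the new file, i.e. <entry key="ext_dict">ext.dic</entry>. Then restart the ES service and you're done.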

3. Test

POST pigg_blog/_analyze
{"analyzer": "ik_smart", "text": ["小跳蛙宝宝"]}

Result:

{
  "tokens" : [
    {"token" : "小跳蛙", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "宝宝", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 1}
  ]
}
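Why does one line in ext.dic change the result? A minimal sketch, assuming a forward-maximum-matching segmenter as a stand-in for ik_smart (not IK's real code) and a hypothetical base dictionary `main_dic`: once 小跳蛙 is loaded from a one-word-per-line file and merged in, the longest match at offset 0 becomes 小跳蛙, so 小跳 and 蛙 are no longer emitted.

```python
import os
import tempfile


def load_dic(path):
    # Dictionary file format assumed here: one word per line, UTF-8, blanks skipped
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def fmm(text, dictionary, max_len=3):
    # Forward maximum matching over pure CJK text, returning (token, start, end)
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in dictionary:
                tokens.append((text[i:i + n], i, i + n))
                i += n
                break
    return tokens


main_dic = {"宝贝", "小跳", "宝宝"}  # hypothetical stand-in for IK's built-in words

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ext.dic")
    with open(path, "w", encoding="utf-8") as f:
        f.write("小跳蛙\n")  # the custom word, last line ending with a newline
    merged = main_dic | load_dic(path)

before = fmm("小跳蛙宝宝", main_dic)
after = fmm("小跳蛙宝宝", merged)
print(before)  # [('小跳', 0, 2), ('蛙', 2, 3), ('宝宝', 3, 5)]
print(after)   # [('小跳蛙', 0, 3), ('宝宝', 3, 5)]
```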

Notice that the output now contains only "小跳蛙" and no "跳蛙". If you also need "跳蛙", set the analyzer to "ik_max_word", which splits the text as finely as possible and emits the maximum number of terms.

POST pigg_blog/_analyze
{"analyzer": "ik_max_word", "text": ["小跳蛙宝宝"]}

Result:

{
  "tokens" : [
    {"token" : "小跳蛙", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "小跳", "start_offset" : 0, "end_offset" : 2, "type" : "CN_WORD", "position" : 1},
    {"token" : "跳蛙", "start_offset" : 1, "end_offset" : 3, "type" : "CN_WORD", "position" : 2},
    {"token" : "宝宝", "start_offset" : 3, "end_offset" : 5, "type" : "CN_WORD", "position" : 3}
  ]
}
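Unlike ik_smart, ik_max_word emits every dictionary word it can find, even overlapping ones. The sketch below models that with an exhaustive dictionary lookup; `max_word_like` and its dictionary (which assumes 跳蛙 is a known word, as the output above suggests) are hypothetical, and IK's real segmenter is more involved.

```python
# Hypothetical dictionary for illustration only
DICT = {"小跳蛙", "小跳", "跳蛙", "宝宝"}


def max_word_like(text, dictionary, max_len=3):
    """Exhaustive dictionary lookup: emit every word of length >= 2 found at
    any position, plus single characters not covered by any word, ordered by
    start offset and then longest first."""
    found = []
    for i in range(len(text)):
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in dictionary:
                found.append((text[i:i + n], i, i + n))
    covered = {k for _, start, end in found for k in range(start, end)}
    found += [(ch, i, i + 1) for i, ch in enumerate(text) if i not in covered]
    found.sort(key=lambda t: (t[1], t[1] - t[2]))  # by start, then longer first
    return [
        {"token": w, "start_offset": s, "end_offset": e, "position": p}
        for p, (w, s, e) in enumerate(found)
    ]


for token in max_word_like("小跳蛙宝宝", DICT):
    print(token)
```

The trade-off: more overlapping terms give better recall at search time, at the cost of a larger index.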

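IX. token_count

token_count is a field type that stores the number of terms a value yields after analysis, which comes in handy in some special cases.
Delete the original index (DELETE pigg_blog), recreate it with the mapping below, in which content gets a token_count sub-field named content.length, and then re-insert the original data.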

PUT pigg_blog
{"mappings": {"properties": {
  "title": {"type": "text", "analyzer": "standard"},
  "author": {"type": "keyword"},
  "content": {"type": "text", "analyzer": "ik_smart", "fields": {"length": {"type": "token_count", "analyzer": "ik_smart"}}}
}}}

GET pigg_blog/_search
{
  "query": {"term": {"content.length": {"value": 3}}}
}

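The response is below. Only document 1 matches: with the custom dictionary from section VIII in place, ik_smart turns its content into 3 terms (宝贝, 宝贝, abc), while document 2's content yields only 2 (小跳蛙, 宝宝).

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {"total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0},
  "hits" : {
    "total" : {"value" : 1, "relation" : "eq"},
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "pigg_blog",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {"title" : "宝贝ABC", "author" : "宝贝巴士", "content" : "宝贝,宝贝ABC"}
      }
    ]
  }
}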
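Conceptually, a token_count value is just the length of the analyzer's token stream, with repeated terms counted each time. A minimal sketch of that, assuming a forward-maximum-matching stand-in for ik_smart (not IK's algorithm) and a hypothetical dictionary that already contains the custom word 小跳蛙:

```python
import re

# Hypothetical mini dictionary that already includes the custom word from ext.dic
IK_DICT = {"宝贝", "小跳", "宝宝", "小跳蛙"}


def ik_smart_like(text, max_len=3):
    # Forward maximum matching inside CJK runs; ASCII letter runs become one
    # lowercased term; other characters (e.g. the comma) only separate runs
    tokens = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z]+", text):
        if run.isascii():
            tokens.append(run.lower())
            continue
        i = 0
        while i < len(run):
            for n in range(min(max_len, len(run) - i), 0, -1):
                if n == 1 or run[i:i + n] in IK_DICT:
                    tokens.append(run[i:i + n])
                    i += n
                    break
    return tokens


def token_count(text):
    # The stored value is just the number of tokens; duplicates count each time
    return len(ik_smart_like(text))


CONTENT = {"1": "宝贝,宝贝ABC", "2": "小跳蛙宝宝"}  # content field of each test document
lengths = {doc_id: token_count(text) for doc_id, text in CONTENT.items()}
matches = [doc_id for doc_id, n in lengths.items() if n == 3]  # like term content.length = 3
print(lengths)  # {'1': 3, '2': 2}
print(matches)  # ['1']
```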

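X. Conclusion

Working as an IT grunt, there's so much to learn: ES, Flink, Spring Cloud, Vue. Every time I find a good tutorial, it's like, bookmark, close, all in one smooth motion. Better to take it slowly and accumulate a little at a time.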
