Elasticsearch custom analyzers: the pinyin analyzer
Pinyin analyzer
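The previous article covered the IK Chinese analyzer in detail. This one covers pinyin analysis. Pinyin analysis converts Chinese text into pinyin (romanized Mandarin). It also offers configuration options that let users choose how pinyin can be searched, for example by the first letters of each syllable or by full pinyin.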
Download
https://github.com/medcl/elasticsearch-analysis-pinyin
Installation
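1. On the GitHub page, open the Releases tab and page through it to find the release that matches your ES version. This example uses 6.2.2. Download the zip package.
2. Upload the zip to the plugins folder under the ES installation directory, unzip it, and rename the extracted folder to pinyin (this step is important).
For the detailed procedure, refer to the IK Chinese analyzer article.
3. Restart ES. If it starts successfully, the plugin has been installed.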
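Steps 1 and 2 can also be scripted. The following is a minimal Python sketch, not part of the plugin's own tooling. The function name install_pinyin_plugin is hypothetical. The sketch assumes the release zip unpacks either into a single top-level folder or directly into the plugin files, so check the layout of the zip you download. It extracts into a temporary directory and then copies the result to plugins/pinyin, so only the renamed folder ends up under plugins.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_pinyin_plugin(zip_path, es_home, name="pinyin"):
    """Unpack a downloaded pinyin release into <es_home>/plugins/<name>.

    Hypothetical helper. Assumes the archive holds either one top-level
    folder (treated as the plugin) or the plugin files directly.
    """
    target = Path(es_home) / "plugins" / name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        entries = list(Path(tmp).iterdir())
        # A single top-level folder is the plugin; otherwise use everything extracted.
        if len(entries) == 1 and entries[0].is_dir():
            source = entries[0]
        else:
            source = Path(tmp)
        target.parent.mkdir(parents=True, exist_ok=True)
        # Copying under the new name performs the "rename to pinyin" step.
        shutil.copytree(source, target)
    return target
```

For example, call install_pinyin_plugin("<downloaded zip>", "<ES installation directory>") with your own paths, then restart ES as in step 3.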
About the plugin
The plugin ships with a built-in analyzer named pinyin, a built-in tokenizer named pinyin, and a built-in token filter named pinyin.
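Each of the three components can be tried on its own through the _analyze API. The Python sketch below only builds the JSON request bodies. The helper name analyze_body is hypothetical. Because a token filter needs a tokenizer in front of it, the filter variant pairs pinyin with the standard keyword tokenizer; that pairing is an illustrative choice, not something the plugin requires.

```python
import json


def analyze_body(component, text):
    """Build a GET /_analyze request body that exercises one pinyin component."""
    if component == "analyzer":
        return {"analyzer": "pinyin", "text": text}
    if component == "tokenizer":
        return {"tokenizer": "pinyin", "text": text}
    if component == "filter":
        # The keyword tokenizer emits the whole input as one token,
        # which the pinyin token filter then converts.
        return {"tokenizer": "keyword", "filter": ["pinyin"], "text": text}
    raise ValueError(f"unknown component: {component}")
```

For example, json.dumps(analyze_body("filter", "人民大会堂"), ensure_ascii=False) produces a body to send with GET /_analyze.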
Running the built-in pinyin analyzer on a sample sentence. This request targets the ik-pinyin index created in the next section; GET /_analyze without an index name also works for a built-in analyzer.
GET /ik-pinyin/_analyze
{
  "text": ["中华人民共和国人民大会堂"],
  "analyzer": "pinyin"
}

# Response
{
  "tokens": [
    {"token": "zhong", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0},
    {"token": "zhrmghgrmdht", "start_offset": 0, "end_offset": 0, "type": "word", "position": 0},  # first letters of the pinyin of every character
    {"token": "hua", "start_offset": 0, "end_offset": 0, "type": "word", "position": 1},
    {"token": "ren", "start_offset": 0, "end_offset": 0, "type": "word", "position": 2},
    {"token": "min", "start_offset": 0, "end_offset": 0, "type": "word", "position": 3},
    {"token": "gong", "start_offset": 0, "end_offset": 0, "type": "word", "position": 4},
    {"token": "he", "start_offset": 0, "end_offset": 0, "type": "word", "position": 5},
    {"token": "guo", "start_offset": 0, "end_offset": 0, "type": "word", "position": 6},
    {"token": "ren", "start_offset": 0, "end_offset": 0, "type": "word", "position": 7},
    {"token": "min", "start_offset": 0, "end_offset": 0, "type": "word", "position": 8},
    {"token": "da", "start_offset": 0, "end_offset": 0, "type": "word", "position": 9},
    {"token": "hui", "start_offset": 0, "end_offset": 0, "type": "word", "position": 10},
    {"token": "tang", "start_offset": 0, "end_offset": 0, "type": "word", "position": 11}
  ]
}
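The first-letter token zhrmghgrmdht is the first letter of each character's full pinyin joined together, emitted at position 0 alongside the first syllable. The small Python sketch below reproduces that rule. It assumes the syllables are already known (here they are copied from the output above), because converting Chinese characters to pinyin needs the plugin's dictionary. It models only the tokens visible in this output, not the plugin's full set of options. The function names are hypothetical.

```python
from typing import List, Tuple


def first_letters(syllables: List[str]) -> str:
    """Join the lowercased first letter of every pinyin syllable."""
    return "".join(s[0].lower() for s in syllables if s)


def pinyin_tokens(syllables: List[str]) -> List[Tuple[str, int]]:
    """Mimic the (token, position) pairs in the response above: one
    full-pinyin token per character, plus the joined first letters at position 0."""
    tokens = []
    for position, syllable in enumerate(syllables):
        tokens.append((syllable, position))
        if position == 0:
            tokens.append((first_letters(syllables), 0))
    return tokens


syllables = ["zhong", "hua", "ren", "min", "gong", "he",
             "guo", "ren", "min", "da", "hui", "tang"]
print(first_letters(syllables))  # zhrmghgrmdht
```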
Configuring the index
1. Create the index
PUT /ik-pinyin/

# Response
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "ik-pinyin"
}
2. Set the mapping for the field that should be searchable by pinyin. This example uses the content field.
PUT /ik-pinyin/DOC/_mapping
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "pinyin"
    }
  }
}

# Response
{
  "acknowledged": true
}
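Steps 1 and 2 can also be combined into a single index-creation request. The Python sketch below builds that body. It assumes Elasticsearch 6.x, where mappings are still keyed by a type name (DOC here); 7.x drops the type level and puts properties directly under mappings. The helper name create_index_body is hypothetical.

```python
import json


def create_index_body(field="content", analyzer="pinyin", doc_type="DOC"):
    """Body for PUT /ik-pinyin that creates the index and its mapping at once (6.x syntax)."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    field: {"type": "text", "analyzer": analyzer}
                }
            }
        }
    }


print("PUT /ik-pinyin")
print(json.dumps(create_index_body(), indent=2))
```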
3. Test
# Index three documents
POST /ik-pinyin/DOC/
{"content": "人民共和国人民大会堂"}

POST /ik-pinyin/DOC/
{"content": "中华人民共和国人民大会堂"}

POST /ik-pinyin/DOC/
{"content": "人民大会堂"}

# View the data
POST /ik-pinyin/DOC/_search

# Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "d1--23IBeaMPz9g6CzKs", "_score": 1, "_source": {"content": "中华人民共和国人民大会堂"}},
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "dl-923IBeaMPz9g6LzIu", "_score": 1, "_source": {"content": "人民共和国人民大会堂"}},
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "eF--23IBeaMPz9g6ODK2", "_score": 1, "_source": {"content": "人民大会堂"}}
    ]
  }
}

# Searching with Chinese returns nothing: the pinyin analyzer stored only pinyin tokens, and a term query looks up its input without analyzing it
POST /ik-pinyin/DOC/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"content": "人民"}}
      ]
    }
  }
}

# Response
{
  "took": 8,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []  # no document matched
  }
}

# Search with the pinyin syllable guo
POST /ik-pinyin/DOC/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"content": "guo"}}
      ]
    }
  }
}

# Response
{
  "took": 4,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 0.2987943,
    "hits": [
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "dl-923IBeaMPz9g6LzIu", "_score": 0.2987943, "_source": {"content": "人民共和国人民大会堂"}},
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "d1--23IBeaMPz9g6CzKs", "_score": 0.29702917, "_source": {"content": "中华人民共和国人民大会堂"}}
    ]
  }
}

# Search with the first-letter token zhrmghgrmdht
POST /ik-pinyin/DOC/_search
{
  "query": {
    "bool": {
      "must": [
        {"term": {"content": "zhrmghgrmdht"}}
      ]
    }
  }
}

# Response
{
  "took": 9,
  "timed_out": false,
  "_shards": {"total": 3, "successful": 3, "skipped": 0, "failed": 0},
  "hits": {
    "total": 1,
    "max_score": 0.29702917,
    "hits": [
      {"_index": "ik-pinyin", "_type": "DOC", "_id": "d1--23IBeaMPz9g6CzKs", "_score": 0.29702917, "_source": {"content": "中华人民共和国人民大会堂"}}
    ]
  }
}
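Why does 人民 find nothing while guo and zhrmghgrmdht do? The index only contains the pinyin tokens produced when the documents were written, and a term query looks up its input verbatim. The Python sketch below models this with a plain dictionary used as an inverted index. The token list for the first document comes from the _analyze output above; the lists for the other two are an assumption, obtained by applying the same rule (full pinyin per character plus the joined first letters). The match-style lookup at the end uses a tiny hypothetical character table. It only illustrates that a match query analyzes its text before looking it up; it is not the plugin's conversion logic.

```python
from collections import defaultdict

# Tokens stored for each document (the last two lists are assumed, see above).
DOCS = {
    "中华人民共和国人民大会堂": ["zhong", "hua", "ren", "min", "gong", "he", "guo",
                                "ren", "min", "da", "hui", "tang", "zhrmghgrmdht"],
    "人民共和国人民大会堂": ["ren", "min", "gong", "he", "guo", "ren", "min",
                             "da", "hui", "tang", "rmghgrmdht"],
    "人民大会堂": ["ren", "min", "da", "hui", "tang", "rmdht"],
}

# Hypothetical character-to-pinyin table, just large enough for the example query.
CHAR_PINYIN = {"人": "ren", "民": "min"}


def build_index(docs):
    """Map every token to the set of documents that contain it."""
    index = defaultdict(set)
    for text, tokens in docs.items():
        for token in tokens:
            index[token].add(text)
    return index


def term_query(index, term):
    """Like a term query: the input is looked up as-is, without analysis."""
    return set(index.get(term, set()))


def match_query(index, text):
    """Like a match query with the OR operator: analyze first, then union the hits."""
    syllables = [CHAR_PINYIN[ch] for ch in text]
    tokens = syllables + ["".join(s[0] for s in syllables)]
    hits = set()
    for token in tokens:
        hits |= term_query(index, token)
    return hits


index = build_index(DOCS)
print(len(term_query(index, "人民")))          # 0
print(len(term_query(index, "guo")))           # 2
print(len(term_query(index, "zhrmghgrmdht")))  # 1
print(len(match_query(index, "人民")))         # 3
```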
Custom configuration
The GitHub project page documents every configuration option in detail, with explanations and usage examples.
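As an illustration of those options, the Python sketch below builds index settings that define a custom tokenizer of type pinyin and an analyzer that uses it. The names my_pinyin and pinyin_analyzer are hypothetical. The option names (keep_first_letter, keep_full_pinyin, keep_original, limit_first_letter_length, lowercase, remove_duplicated_term) follow the plugin's README, but confirm them against the README for the release that matches your ES version before relying on them.

```python
import json


def pinyin_index_settings(keep_original=True, limit_first_letter_length=16):
    """Settings for PUT /<index> defining a custom pinyin tokenizer and an analyzer built on it."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name; reference it from a field mapping.
                    "pinyin_analyzer": {"tokenizer": "my_pinyin"}
                },
                "tokenizer": {
                    "my_pinyin": {
                        "type": "pinyin",
                        "keep_first_letter": True,       # joined first letters, e.g. zhrmghgrmdht
                        "keep_full_pinyin": True,        # one token per character, e.g. zhong, hua
                        "keep_original": keep_original,  # also keep the unconverted input as a token
                        "limit_first_letter_length": limit_first_letter_length,
                        "lowercase": True,
                        "remove_duplicated_term": True,
                    }
                },
            }
        }
    }


print(json.dumps(pinyin_index_settings(), indent=2))
```

To use it, create the index with these settings and set "analyzer": "pinyin_analyzer" on the content field instead of the built-in pinyin analyzer.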