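Elasticsearch Pinyin Analyzer
Installation
Download: https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v7.2.0/elasticsearch-analysis-pinyin-7.2.0.zip (the plugin version must match the Elasticsearch version, 7.2.0 here)
Create a folder under plugins/, upload the archive, and unzip it (on every node):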
[root@master01 elasticsearch]# ls
bin config data jdk lib LICENSE.txt logs modules NOTICE.txt plugins README.textile
[root@master01 elasticsearch]# cd plugins/
[root@master01 plugins]# ls
analysis-ik
[root@master01 plugins]# mkdir pinyin
[root@master01 plugins]# cd pinyin/
[root@master01 pinyin]# ls
elasticsearch-analysis-pinyin-7.2.0.zip
[root@master01 pinyin]# unzip elasticsearch-analysis-pinyin-7.2.0.zip
Archive:  elasticsearch-analysis-pinyin-7.2.0.zip
  inflating: plugin-descriptor.properties
  inflating: elasticsearch-analysis-pinyin-7.2.0.jar
  inflating: nlp-lang-1.7.jar
Change the folder's owner to the user that runs Elasticsearch (elastic here):
[root@master01 plugins]# chown -R elastic:elastic ./pinyin/
[root@master01 plugins]# ll
total 0
drwxr-xr-x. 3 elastic elastic 243 Jul 29 15:53 analysis-ik
drwxr-xr-x. 2 elastic elastic 113 Aug 8 17:46 pinyin
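Before restarting, a quick check of each node's plugin folder can catch a missing file or a version mismatch. The sketch below is a minimal illustration, not an official tool. It assumes the unpacked folder contains plugin-descriptor.properties (as in the unzip listing above) with an `elasticsearch.version` key, and it only handles simple key=value lines. The function names and the relative path plugins/pinyin are hypothetical.

```python
from pathlib import Path


def read_descriptor(plugin_dir):
    """Parse plugin-descriptor.properties into a dict (simple key=value lines only)."""
    props = {}
    path = Path(plugin_dir) / "plugin-descriptor.properties"
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines and comments; escapes and line continuations are not handled.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_plugin_dir(plugin_dir, es_version):
    """Return a list of problems found in an unpacked plugin folder (empty list = looks OK)."""
    plugin_dir = Path(plugin_dir)
    if not (plugin_dir / "plugin-descriptor.properties").is_file():
        return ["plugin-descriptor.properties is missing"]
    problems = []
    if not list(plugin_dir.glob("*.jar")):
        problems.append("no .jar files found")
    built_for = read_descriptor(plugin_dir).get("elasticsearch.version")
    if built_for != es_version:
        problems.append(f"plugin built for {built_for}, cluster runs {es_version}")
    return problems


if __name__ == "__main__":
    # Hypothetical relative path, assuming the script runs from the Elasticsearch home directory.
    print(check_plugin_dir("plugins/pinyin", "7.2.0"))
```

Run it on every node; an empty list means the folder has a descriptor, at least one jar, and a matching version.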
Restart the cluster (every node) so that the plugin is loaded.
Usage
Test the pinyin analyzer:
GET /_analyze
{
  "text": "刘德华",
  "analyzer": "pinyin"
}

Response:
{
  "tokens" : [
    {"token" : "liu", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 0},
    {"token" : "ldh", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 0},
    {"token" : "de", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 1},
    {"token" : "hua", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 2}
  ]
}
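To see at a glance which tokens share a position (here the first-letter token "ldh" sits at position 0 next to "liu"), the _analyze response can be grouped by position. This is a minimal sketch; `tokens_by_position` is a hypothetical helper, and the response is an abbreviated copy of the one above containing only the fields it reads.

```python
import json
from collections import defaultdict


def tokens_by_position(analyze_response):
    """Group the token strings of an _analyze response by their position."""
    groups = defaultdict(list)
    for tok in analyze_response["tokens"]:
        groups[tok["position"]].append(tok["token"])
    return dict(sorted(groups.items()))


# Abbreviated copy of the response above (only "token" and "position" are kept).
response = json.loads("""
{"tokens": [
  {"token": "liu", "position": 0},
  {"token": "ldh", "position": 0},
  {"token": "de",  "position": 1},
  {"token": "hua", "position": 2}
]}
""")
print(tokens_by_position(response))  # {0: ['liu', 'ldh'], 1: ['de'], 2: ['hua']}
```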
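Notes on the options:
- keep_first_letter: 刘德华 -> ldh
- keep_separate_first_letter: 刘德华 -> l,d,h
- limit_first_letter_length: maximum length of the first_letter result; default: 16
- keep_full_pinyin: 刘德华 -> [liu,de,hua]
- keep_joined_full_pinyin: 刘德华 -> [liudehua]
- keep_none_chinese: keep non-Chinese letters and digits in the result; default: true
- keep_none_chinese_together: default: true, e.g. DJ音乐家 -> DJ,yin,yue,jia; when set to false: DJ音乐家 -> D,J,yin,yue,jia. Note: keep_none_chinese must be enabled for this option to apply.
- keep_none_chinese_in_first_letter: 刘德华AT2016 -> ldhat2016
- keep_none_chinese_in_joined_full_pinyin: 刘德华2016 -> liudehua2016
- lowercase: lowercase non-Chinese letters; default: true
- remove_duplicated_term: de的 -> de
- keep_original: also keep the original input as a token (enabled in the index settings below)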
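How the main switches combine can be illustrated with a toy model. This is a simplified sketch, not the plugin's algorithm: it assumes a hand-written syllable table covering only 刘德华 (the real plugin looks readings up in its bundled library, presumably the nlp-lang jar from the unzip listing), it ignores token order, positions, and non-Chinese input, and `toy_pinyin_tokens` is a hypothetical name. The default arguments were picked to reproduce the set of terms in the `"analyzer": "pinyin"` output above, not copied from the plugin source.

```python
# Toy syllable table for the example name only (an assumption for illustration).
TOY_PINYIN = {"刘": "liu", "德": "de", "华": "hua"}


def toy_pinyin_tokens(text,
                      keep_first_letter=True,
                      keep_separate_first_letter=False,
                      keep_full_pinyin=True,
                      keep_joined_full_pinyin=False,
                      limit_first_letter_length=16,
                      keep_original=False):
    """Return, as a list, the terms each enabled option contributes (order/positions not modeled)."""
    syllables = [TOY_PINYIN[ch] for ch in text]  # KeyError for characters outside the toy table
    tokens = []
    if keep_full_pinyin:
        tokens.extend(syllables)                             # liu, de, hua
    if keep_joined_full_pinyin:
        tokens.append("".join(syllables))                    # liudehua
    if keep_separate_first_letter:
        tokens.extend(s[0] for s in syllables)               # l, d, h
    if keep_first_letter:
        initials = "".join(s[0] for s in syllables)
        tokens.append(initials[:limit_first_letter_length])  # ldh, truncated to the limit
    if keep_original:
        tokens.append(text)                                  # 刘德华
    return tokens


print(toy_pinyin_tokens("刘德华"))  # ['liu', 'de', 'hua', 'ldh']
```

Flipping one flag at a time (for example `keep_joined_full_pinyin=True`) shows which extra term that option adds.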
Create an index with a custom pinyin analyzer:
PUT /express_info_v1/
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "pinyin_analyzer": {
          "tokenizer": "my_pinyin"
        }
      },
      "tokenizer": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
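The Kibana console requests in this post are plain HTTP calls underneath. As a sketch of how the index creation above might be sent from a script, the helper below builds a standard-library `urllib.request.Request` without sending it. The node address `http://localhost:9200` and the name `es_request` are hypothetical; adjust them for your cluster (and add authentication if security is enabled).

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical address of one cluster node


def es_request(method, path, body=None):
    """Build (but do not send) an HTTP request equivalent to a Kibana console call."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body, ensure_ascii=False).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(ES_URL + path, data=data, headers=headers, method=method)


# Same settings as the PUT /express_info_v1/ request above.
index_body = {
    "settings": {
        "number_of_shards": 3,
        "number_of_replicas": 1,
        "analysis": {
            "analyzer": {"pinyin_analyzer": {"tokenizer": "my_pinyin"}},
            "tokenizer": {
                "my_pinyin": {
                    "type": "pinyin",
                    "keep_separate_first_letter": False,
                    "keep_full_pinyin": True,
                    "keep_original": True,
                    "limit_first_letter_length": 16,
                    "lowercase": True,
                    "remove_duplicated_term": True,
                }
            },
        },
    }
}

req = es_request("PUT", "/express_info_v1", index_body)
# To actually send it: urllib.request.urlopen(req) returns the HTTP response.
```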
Add an alias:
POST _aliases
{
  "actions": [
    {"add": {"index": "express_info_v1", "alias": "express_info"}}
  ]
}
Create the mapping:
PUT /express_info_v1/_mappings
{
  "properties": {
    "name": {"type": "text", "analyzer": "pinyin_analyzer"},
    "address": {"type": "text", "analyzer": "pinyin_analyzer"},
    "send_time": {"type": "date", "format": "yyyy-MM-dd"},
    "num": {"type": "text", "analyzer": "pinyin_analyzer"}
  }
}
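A mapping that names an analyzer the index does not define is rejected, so a typo such as `pinyin_analyser` only shows up when the request fails. A small pre-flight check could compare the two request bodies. This is a hypothetical helper: it inspects only top-level fields, and it treats as "known" the custom analyzers in the settings plus a caller-supplied tuple of built-in names (plugin-provided analyzers such as `pinyin` or `ik_smart` would have to be added to that tuple).

```python
def undefined_analyzers(settings_body, mapping_body,
                        builtin=("standard", "simple", "whitespace", "keyword")):
    """Return (field, analyzer) pairs whose analyzer is neither custom-defined nor in `builtin`."""
    analysis = settings_body.get("settings", {}).get("analysis", {})
    known = set(analysis.get("analyzer", {})) | set(builtin)
    missing = []
    # Only top-level fields are checked; nested "properties"/"fields" are not traversed.
    for field, spec in mapping_body.get("properties", {}).items():
        for key in ("analyzer", "search_analyzer"):
            name = spec.get(key)
            if name is not None and name not in known:
                missing.append((field, name))
    return missing


# Relevant parts of the settings and mapping above.
settings_v1 = {"settings": {"analysis": {"analyzer": {"pinyin_analyzer": {"tokenizer": "my_pinyin"}}}}}
mapping_v1 = {"properties": {
    "name": {"type": "text", "analyzer": "pinyin_analyzer"},
    "address": {"type": "text", "analyzer": "pinyin_analyzer"},
    "send_time": {"type": "date", "format": "yyyy-MM-dd"},
    "num": {"type": "text", "analyzer": "pinyin_analyzer"},
}}
print(undefined_analyzers(settings_v1, mapping_v1))  # []
```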
Index some documents:
PUT /express_info_v1/_doc/1
{
  "name": "薛蒋柳",
  "address": "康庄街道B-11-8",
  "send_time": "2019-08-07",
  "num": "sf9971618841"
}

PUT /express_info_v1/_doc/2
{
  "name": "袁喻",
  "address": "江西省抚州市黎川县",
  "send_time": "2019-08-08",
  "num": "ve458634059"
}
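With more than a couple of documents, one `POST /_bulk` request is usually more practical than one PUT per document. The bulk body is newline-delimited JSON: an action line followed by the document source, and the whole body must end with a newline. The sketch below builds that body for the two documents above; `bulk_body` is a hypothetical helper, and the request would be sent with `Content-Type: application/x-ndjson`.

```python
import json


def bulk_body(index, docs):
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = {
    "1": {"name": "薛蒋柳", "address": "康庄街道B-11-8", "send_time": "2019-08-07", "num": "sf9971618841"},
    "2": {"name": "袁喻", "address": "江西省抚州市黎川县", "send_time": "2019-08-08", "num": "ve458634059"},
}
print(bulk_body("express_info_v1", docs), end="")
```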
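Query the data through the alias. The four queries search the name field with the first letters (yy), a homophone (源于, same pinyin as 袁喻 when tones are ignored), a mix of characters and a letter (薛蒋l), and the joined full pinyin (xuejiangliu):
GET /express_info/_search
{
  "query": {"match": {"name": "yy"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "源于"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "薛蒋l"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "xuejiangliu"}}
}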
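The four searches above can also be sent in one `_msearch` request, whose NDJSON body alternates a header line (naming the index) with a search body; the responses come back in the same order. This is a minimal sketch with a hypothetical helper name, reproducing the same match queries on the `name` field.

```python
import json


def msearch_body(index, texts):
    """Build an NDJSON body for /_msearch: a header line, then a match query on "name", per text."""
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": index}))
        lines.append(json.dumps({"query": {"match": {"name": text}}}, ensure_ascii=False))
    return "\n".join(lines) + "\n"


body = msearch_body("express_info", ["yy", "源于", "薛蒋l", "xuejiangliu"])
print(body, end="")
```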
Rebuild the index
Chinese word segmentation (IK) + pinyin analyzer. The IK tokenizer (the analysis-ik plugin already present in plugins/) splits the text into words, and the pinyin filter then converts each word:
PUT /express_info_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
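The two custom analyzers above differ only in their IK tokenizer. If the settings are generated from a script, deriving both analyzers from one template keeps their filter chains from drifting apart. This is a sketch under that assumption; `ik_pinyin_settings` is a hypothetical helper that reproduces the body above.

```python
import json


def ik_pinyin_settings(shards=3, replicas=1):
    """Build the express_info_v2 settings: one pinyin filter shared by two IK-based analyzers."""
    pinyin_filter = {
        "type": "pinyin",
        "keep_separate_first_letter": False,
        "keep_full_pinyin": True,
        "keep_original": True,
        "limit_first_letter_length": 16,
        "lowercase": True,
        "remove_duplicated_term": True,
    }
    # Both analyzers share the same filter chain; only the IK tokenizer differs.
    analyzers = {
        f"{tokenizer}_pinyin": {
            "type": "custom",
            "tokenizer": tokenizer,
            "filter": ["my_pinyin", "word_delimiter"],
        }
        for tokenizer in ("ik_smart", "ik_max_word")
    }
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
            "analysis": {"analyzer": analyzers, "filter": {"my_pinyin": pinyin_filter}},
        }
    }


print(json.dumps(ik_pinyin_settings(), ensure_ascii=False, indent=2))
```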
Create the mapping:
PUT /express_info_v2/_mappings
{
  "properties": {
    "name": {"type": "text", "analyzer": "ik_smart_pinyin"},
    "address": {"type": "text", "analyzer": "ik_smart_pinyin"},
    "send_time": {"type": "date", "format": "yyyy-MM-dd"},
    "num": {"type": "text", "analyzer": "ik_max_word_pinyin"}
  }
}
Reindex the data from the old index:
POST _reindex
{
  "source": {"index": "express_info_v1"},
  "dest": {"index": "express_info_v2"}
}
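Point the alias at the new index instead of the original one (both actions in a single _aliases request are applied atomically):
POST /_aliases
{
  "actions": [
    {"remove": {"index": "express_info_v1", "alias": "express_info"}},
    {"add": {"index": "express_info_v2", "alias": "express_info"}}
  ]
}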
Delete the original index:
DELETE express_info_v1
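The reindex, alias switch, and cleanup steps above always run in the same order, so they can be written down as an ordered list of requests. The sketch below assumes the new index has already been created with its settings and mapping; `migration_plan` is a hypothetical helper that only returns (method, path, body) tuples and does not contact a cluster.

```python
def migration_plan(alias, old_index, new_index):
    """Return the (method, path, body) calls for reindex, alias swap, and cleanup, in order."""
    return [
        ("POST", "/_reindex", {"source": {"index": old_index}, "dest": {"index": new_index}}),
        # remove + add in one _aliases call, so the alias switches in a single step.
        ("POST", "/_aliases", {"actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]}),
        ("DELETE", f"/{old_index}", None),
    ]


for method, path, body in migration_plan("express_info", "express_info_v1", "express_info_v2"):
    print(method, path, body if body is not None else "")
```

Keeping the DELETE last means the old index is only removed after the alias no longer points at it.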
Test:
GET /express_info_v2/_analyze
{
  "text": "江西省抚州市黎川县",
  "analyzer": "ik_max_word_pinyin"
}

Response:
{
  "tokens" : [
    {"token" : "jiang", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "江西省", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "jxs", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "xi", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 1},
    {"token" : "sheng", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 2},
    {"token" : "fu", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 3},
    {"token" : "zhou", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 4},
    {"token" : "shi", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "抚州市", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "fzs", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "li", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 6},
    {"token" : "chuan", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 7},
    {"token" : "xian", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8},
    {"token" : "黎川县", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8},
    {"token" : "lcx", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8}
  ]
}
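Grouping this response by character offsets makes the effect of the IK + pinyin chain visible: each IK word (江西省, 抚州市, 黎川县) yields its full-pinyin syllables, its first letters, and the original word. A minimal sketch follows; `tokens_by_span` is a hypothetical helper and the response is an abbreviated copy of the one above.

```python
from collections import defaultdict


def tokens_by_span(analyze_response):
    """Group token strings by (start_offset, end_offset), i.e. by the source word they came from."""
    spans = defaultdict(list)
    for tok in analyze_response["tokens"]:
        spans[(tok["start_offset"], tok["end_offset"])].append(tok["token"])
    return dict(sorted(spans.items()))


# Abbreviated copy of the response above (only the fields used here).
response = {"tokens": [
    {"token": t, "start_offset": s, "end_offset": e}
    for t, s, e in [
        ("jiang", 0, 3), ("江西省", 0, 3), ("jxs", 0, 3), ("xi", 0, 3), ("sheng", 0, 3),
        ("fu", 3, 6), ("zhou", 3, 6), ("shi", 3, 6), ("抚州市", 3, 6), ("fzs", 3, 6),
        ("li", 6, 9), ("chuan", 6, 9), ("xian", 6, 9), ("黎川县", 6, 9), ("lcx", 6, 9),
    ]
]}
for span, words in tokens_by_span(response).items():
    print(span, words)
# (0, 3) ['jiang', '江西省', 'jxs', 'xi', 'sheng']
# (3, 6) ['fu', 'zhou', 'shi', '抚州市', 'fzs']
# (6, 9) ['li', 'chuan', 'xian', '黎川县', 'lcx']
```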