Installation steps

Download: https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v7.2.0/elasticsearch-analysis-pinyin-7.2.0.zip
Create the plugin directory, then upload the archive and unzip it there (on every node):

[root@master01 elasticsearch]# ls
bin  config  data  jdk  lib  LICENSE.txt  logs  modules  NOTICE.txt  plugins  README.textile
[root@master01 elasticsearch]# cd plugins/
[root@master01 plugins]# ls
analysis-ik
[root@master01 plugins]# mkdir pinyin
[root@master01 plugins]# cd pinyin/
[root@master01 pinyin]# ls
elasticsearch-analysis-pinyin-7.2.0.zip
[root@master01 pinyin]# unzip elasticsearch-analysis-pinyin-7.2.0.zip
Archive:  elasticsearch-analysis-pinyin-7.2.0.zip
  inflating: plugin-descriptor.properties
  inflating: elasticsearch-analysis-pinyin-7.2.0.jar
  inflating: nlp-lang-1.7.jar

Change the file ownership

[root@master01 plugins]# chown -R elastic:elastic ./pinyin/
[root@master01 plugins]# ll
total 0
drwxr-xr-x. 3 elastic elastic 243 Jul 29 15:53 analysis-ik
drwxr-xr-x. 2 elastic elastic 113 Aug  8 17:46 pinyin

Restart the cluster
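
Because the archive has to be unpacked on every node, it can be worth checking after the restart that each node reports the plugin. The sketch below assumes rows shaped like the JSON output of `GET /_cat/plugins?format=json` and assumes the plugin registers under the component name `analysis-pinyin` (check `plugin-descriptor.properties` to confirm). The helper name `nodes_missing_plugin` and the node name `node02` are hypothetical.

```python
def nodes_missing_plugin(cat_plugins_rows, node_names, component="analysis-pinyin"):
    """Return the nodes that do not report the given plugin component."""
    loaded = {row["name"] for row in cat_plugins_rows if row.get("component") == component}
    return sorted(set(node_names) - loaded)


# Sample rows shaped like GET /_cat/plugins?format=json (values are illustrative).
rows = [
    {"name": "master01", "component": "analysis-ik", "version": "7.2.0"},
    {"name": "master01", "component": "analysis-pinyin", "version": "7.2.0"},
    {"name": "node02", "component": "analysis-ik", "version": "7.2.0"},
]

# node02 lists only analysis-ik, so it is reported as missing the pinyin plugin.
print(nodes_missing_plugin(rows, ["master01", "node02"]))
```

An empty list would mean every listed node reports the plugin.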

Usage

Test the pinyin analyzer:

GET /_analyze
{
  "text": "刘德华",
  "analyzer": "pinyin"
}

Response:

{
  "tokens" : [
    {"token" : "liu", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 0},
    {"token" : "ldh", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 0},
    {"token" : "de", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 1},
    {"token" : "hua", "start_offset" : 0, "end_offset" : 0, "type" : "word", "position" : 2}
  ]
}

Notes on the options:

  • keep_first_letter: 刘德华 -> ldh
  • keep_separate_first_letter: 刘德华 -> l,d,h
  • limit_first_letter_length: maximum length of the first-letter result, default: 16
  • keep_full_pinyin: 刘德华 -> [liu,de,hua]
  • keep_joined_full_pinyin: 刘德华 -> [liudehua]
  • keep_none_chinese: keep non-Chinese letters and digits in the result, default: true
  • keep_none_chinese_together: default: true, e.g. DJ音乐家 -> DJ,yin,yue,jia; when set to false, DJ音乐家 -> D,J,yin,yue,jia. Note: keep_none_chinese must be enabled first
  • keep_none_chinese_in_first_letter: 刘德华AT2016 -> ldhat2016
  • keep_none_chinese_in_joined_full_pinyin: 刘德华2016 -> liudehua2016
  • lowercase: lowercase non-Chinese letters, default: true
  • remove_duplicated_term: de的 -> de
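
To make the effect of the main options concrete, here is a toy sketch that mimics what they do to 刘德华. It is not the plugin's implementation: the `READINGS` table is a hypothetical three-character dictionary, the function name `pinyin_terms` is invented for illustration, and the order of the returned terms does not reproduce the plugin's token positions.

```python
# Hypothetical per-character readings; the real plugin ships its own dictionary.
READINGS = {"刘": "liu", "德": "de", "华": "hua"}


def pinyin_terms(text, keep_first_letter=True, keep_separate_first_letter=False,
                 keep_full_pinyin=True, keep_joined_full_pinyin=False,
                 limit_first_letter_length=16):
    """Toy model of the option flags; characters without a reading are skipped."""
    syllables = [READINGS[ch] for ch in text if ch in READINGS]
    if not syllables:
        return []
    terms = []
    if keep_full_pinyin:
        terms.extend(syllables)                      # liu, de, hua
    if keep_joined_full_pinyin:
        terms.append("".join(syllables))             # liudehua
    if keep_first_letter:
        initials = "".join(s[0] for s in syllables)  # ldh
        terms.append(initials[:limit_first_letter_length])
    if keep_separate_first_letter:
        terms.extend(s[0] for s in syllables)        # l, d, h
    return terms


print(pinyin_terms("刘德华"))
print(pinyin_terms("刘德华", keep_full_pinyin=False, keep_joined_full_pinyin=True))
```

With the default flags the sketch yields the same set of terms as the `_analyze` response above (liu, de, hua, ldh), which matches the full-pinyin and first-letter options both being on in that test.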

Create an index that uses a pinyin tokenizer:

PUT /express_info_v1/
{
  "settings" : {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis" : {
      "analyzer" : {
        "pinyin_analyzer" : {
          "tokenizer" : "my_pinyin"
        }
      },
      "tokenizer" : {
        "my_pinyin" : {
          "type" : "pinyin",
          "keep_separate_first_letter" : false,
          "keep_full_pinyin" : true,
          "keep_original" : true,
          "limit_first_letter_length" : 16,
          "lowercase" : true,
          "remove_duplicated_term" : true
        }
      }
    }
  }
}

Add an alias:

POST _aliases
{
  "actions": [
    {"add": {"index": "express_info_v1", "alias": "express_info"}}
  ]
}

Create the mapping

PUT /express_info_v1/_mappings
{
  "properties": {
    "name": {"type": "text", "analyzer": "pinyin_analyzer"},
    "address": {"type": "text", "analyzer": "pinyin_analyzer"},
    "send_time": {"type": "date", "format": "yyyy-MM-dd"},
    "num": {"type": "text", "analyzer": "pinyin_analyzer"}
  }
}

Index some documents:

PUT /express_info_v1/_doc/1
{
  "name": "薛蒋柳",
  "address": "康庄街道B-11-8",
  "send_time": "2019-08-07",
  "num": "sf9971618841"
}

PUT /express_info_v1/_doc/2
{
  "name": "袁喻",
  "address": "江西省抚州市黎川县",
  "send_time": "2019-08-08",
  "num": "ve458634059"
}

Query the data (through the alias):

GET /express_info/_search
{
  "query": {"match": {"name": "yy"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "源于"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "薛蒋l"}}
}

GET /express_info/_search
{
  "query": {"match": {"name": "xuejiangliu"}}
}

Rebuilding the index

IK Chinese word segmentation plus the pinyin token filter

PUT /express_info_v2
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "ik_smart_pinyin": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["my_pinyin", "word_delimiter"]
        },
        "ik_max_word_pinyin": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["my_pinyin", "word_delimiter"]
        }
      },
      "filter": {
        "my_pinyin": {
          "type" : "pinyin",
          "keep_separate_first_letter" : false,
          "keep_full_pinyin" : true,
          "keep_original" : true,
          "limit_first_letter_length" : 16,
          "lowercase" : true,
          "remove_duplicated_term" : true
        }
      }
    }
  }
}

Create the mapping

PUT /express_info_v2/_mappings
{
  "properties": {
    "name": {"type": "text", "analyzer": "ik_smart_pinyin"},
    "address": {"type": "text", "analyzer": "ik_smart_pinyin"},
    "send_time": {"type": "date", "format": "yyyy-MM-dd"},
    "num": {"type": "text", "analyzer": "ik_max_word_pinyin"}
  }
}

Reindex the data into the new index:

POST _reindex
{
  "source": {"index": "express_info_v1"},
  "dest": {"index": "express_info_v2"}
}
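
Before the old index is dropped, it may be worth checking the `_reindex` response rather than assuming the copy succeeded. The sketch below inspects a response body that has already been parsed into a dict; the field names (`timed_out`, `failures`, `total`, `created`, `updated`) are the ones the reindex API reports, while the helper name `reindex_looks_complete` and the sample values are made up for illustration.

```python
def reindex_looks_complete(response, expected_docs):
    """Check a parsed _reindex response against the expected document count."""
    if response.get("timed_out") or response.get("failures"):
        return False
    copied = response.get("created", 0) + response.get("updated", 0)
    return response.get("total") == expected_docs and copied == expected_docs


# Illustrative response for the two documents indexed into express_info_v1.
sample = {"took": 120, "timed_out": False, "total": 2,
          "created": 2, "updated": 0, "failures": []}

print(reindex_looks_complete(sample, expected_docs=2))
```

The expected count could come from `GET /express_info_v1/_count`; that lookup is left out here to keep the sketch free of network calls.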

Point the alias at the new index in place of the original:

POST /_aliases
{
  "actions": [
    {"remove": {"index": "express_info_v1", "alias": "express_info"}},
    {"add": {"index": "express_info_v2", "alias": "express_info"}}
  ]
}
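
Putting the remove and the add in one `_aliases` request lets Elasticsearch apply both together, so clients querying `express_info` should not see a moment where the alias points at nothing. If the swap is scripted, the body can be generated instead of hand-edited; the helper name `swap_alias_body` below is hypothetical and only builds the request body, it does not send it.

```python
import json


def swap_alias_body(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


body = swap_alias_body("express_info", "express_info_v1", "express_info_v2")
print(json.dumps(body, indent=2))
```

The printed JSON has the same actions as the request above and can be sent as the body of `POST /_aliases` with whatever HTTP client is in use.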

Delete the original index:

DELETE express_info_v1

Test:

GET /express_info_v2/_analyze
{
  "text": "江西省抚州市黎川县",
  "analyzer": "ik_max_word_pinyin"
}

Response:

{
  "tokens" : [
    {"token" : "jiang", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "江西省", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "jxs", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 0},
    {"token" : "xi", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 1},
    {"token" : "sheng", "start_offset" : 0, "end_offset" : 3, "type" : "CN_WORD", "position" : 2},
    {"token" : "fu", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 3},
    {"token" : "zhou", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 4},
    {"token" : "shi", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "抚州市", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "fzs", "start_offset" : 3, "end_offset" : 6, "type" : "CN_WORD", "position" : 5},
    {"token" : "li", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 6},
    {"token" : "chuan", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 7},
    {"token" : "xian", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8},
    {"token" : "黎川县", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8},
    {"token" : "lcx", "start_offset" : 6, "end_offset" : 9, "type" : "CN_WORD", "position" : 8}
  ]
}
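
The response is easier to read once the tokens are grouped by character offsets: each word produced by the IK tokenizer (offsets 0-3, 3-6, 6-9) is expanded by the pinyin filter into its syllables, its first-letter abbreviation and the original word. A small sketch of that grouping, using the first few tokens copied from the response above; the helper name `group_by_offsets` is made up for illustration.

```python
from collections import defaultdict


def group_by_offsets(tokens):
    """Group token strings by their (start_offset, end_offset) span."""
    grouped = defaultdict(list)
    for tok in tokens:
        grouped[(tok["start_offset"], tok["end_offset"])].append(tok["token"])
    return dict(grouped)


# First six tokens from the _analyze response.
tokens = [
    {"token": "jiang", "start_offset": 0, "end_offset": 3},
    {"token": "江西省", "start_offset": 0, "end_offset": 3},
    {"token": "jxs", "start_offset": 0, "end_offset": 3},
    {"token": "xi", "start_offset": 0, "end_offset": 3},
    {"token": "sheng", "start_offset": 0, "end_offset": 3},
    {"token": "fu", "start_offset": 3, "end_offset": 6},
]

for span, terms in group_by_offsets(tokens).items():
    print(span, terms)
```

The span (0, 3) collects jiang, 江西省, jxs, xi and sheng, i.e. everything derived from the single IK word 江西省.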
