Gulimall: Elasticsearch Custom Dictionary (Part 18)
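For details on the IK analyzer, see this blog post: https://www.cnblogs.com/dalianpai/p/12694298.html
Covers course lessons 122 (Full-Text Search - ElasticSearch - Tokenization - Tokenizers & Installing the IK Analyzer) through 124 (Full-Text Search - ElasticSearch - Tokenization - Custom Extension Dictionary).
Download: https://github.com/medcl/elasticsearch-analysis-ik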
[root@localhost plugins]# ll
total 4400
-rw-r--r-- 1 root root 4504487 Jun 15 13:13 elasticsearch-analysis-ik-7.4.2.zip
[root@localhost plugins]# unzip
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

  -p  extract files to pipe, no messages     -l  list files (short format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment only
  -v  list verbosely/show version info       -T  timestamp archive to latest
  -x  exclude files that follow (in xlist)   -d  extract files into exdir
modifiers:
  -n  never overwrite existing files         -q  quiet mode (-qq => quieter)
  -o  overwrite files WITHOUT prompting      -a  auto-convert any text files
  -j  junk paths (do not make directories)   -aa treat ALL files as text
  -U  use escapes for all non-ASCII Unicode  -UU ignore any Unicode fields
  -C  match filenames case-insensitively     -L  make (some) names lowercase
  -X  restore UID/GID info                   -V  retain VMS version numbers
  -K  keep setuid/setgid/tacky permissions   -M  pipe through "more" pager
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives

See "unzip -hh" or unzip.txt for more help.  Examples:
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
[root@localhost plugins]# unzip elasticsearch-analysis-ik-7.4.2.zip
Archive:  elasticsearch-analysis-ik-7.4.2.zip
  inflating: elasticsearch-analysis-ik-7.4.2.jar
  inflating: httpclient-4.5.2.jar
  inflating: httpcore-4.4.4.jar
  inflating: commons-logging-1.2.jar
  inflating: commons-codec-1.9.jar
  inflating: plugin-descriptor.properties
  inflating: plugin-security.policy
   creating: config/
  inflating: config/surname.dic
  inflating: config/quantifier.dic
  inflating: config/extra_stopword.dic
  inflating: config/suffix.dic
  inflating: config/extra_single_word_full.dic
  inflating: config/extra_single_word.dic
  inflating: config/preposition.dic
  inflating: config/IKAnalyzer.cfg.xml
  inflating: config/main.dic
  inflating: config/stopword.dic
  inflating: config/extra_main.dic
  inflating: config/extra_single_word_low_freq.dic
[root@localhost plugins]# ll
total 5828
-rw-r--r-- 1 root root  263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r-- 1 root root   61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x 2 root root     299 Oct  7  2019 config
-rw-r--r-- 1 root root   54643 Nov  4  2019 elasticsearch-analysis-ik-7.4.2.jar
-rw-r--r-- 1 root root 4504487 Jun 15 13:13 elasticsearch-analysis-ik-7.4.2.zip
-rw-r--r-- 1 root root  736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r-- 1 root root  326724 May  6  2018 httpcore-4.4.4.jar
-rw-r--r-- 1 root root    1805 Nov  4  2019 plugin-descriptor.properties
-rw-r--r-- 1 root root     125 Nov  4  2019 plugin-security.policy
[root@localhost plugins]# mkdir ik
[root@localhost plugins]# ll
total 1428
-rw-r--r-- 1 root root 263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r-- 1 root root  61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x 2 root root    299 Oct  7  2019 config
-rw-r--r-- 1 root root  54643 Nov  4  2019 elasticsearch-analysis-ik-7.4.2.jar
-rw-r--r-- 1 root root 736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r-- 1 root root 326724 May  6  2018 httpcore-4.4.4.jar
drwxr-xr-x 2 root root      6 Jun 15 13:17 ik
-rw-r--r-- 1 root root   1805 Nov  4  2019 plugin-descriptor.properties
-rw-r--r-- 1 root root    125 Nov  4  2019 plugin-security.policy
[root@localhost plugins]# mv * ik/
mv: cannot move ‘ik’ to a subdirectory of itself, ‘ik/ik’
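The final mv reports an error only because the glob * also matches the ik directory itself; the other entries are still moved into ik/. A minimal sketch of the same step that skips ik, so the command finishes without the error. The function name move_into_ik is hypothetical and not part of the original session.

```shell
#!/usr/bin/env bash
# Move every visible entry of a plugins directory into its ik/ subdirectory,
# skipping ik itself so mv never tries to move the directory into itself.
move_into_ik() {
  local plugins_dir="$1"
  local entry
  mkdir -p "$plugins_dir/ik"
  for entry in "$plugins_dir"/*; do
    [ -e "$entry" ] || continue                    # glob matched nothing
    [ "$(basename "$entry")" = "ik" ] && continue  # do not move ik into ik
    mv "$entry" "$plugins_dir/ik/"
  done
}
```

It would be called as move_into_ik /mydata/elasticsearch/plugins, assuming that is the host directory mounted as the Elasticsearch plugins directory.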
Restart the container, then run the following queries:
POST _analyze
{
  "tokenizer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

POST _analyze
{
  "tokenizer": "ik_smart",
  "text": "尚硅谷电商"
}
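The requests above use the Kibana console syntax. A rough shell equivalent with curl is sketched here; the function names analyze_body and analyze and the ES_URL default of http://localhost:9200 are assumptions for illustration, so adjust the address to wherever port 9200 of the container is reachable.

```shell
#!/usr/bin/env bash
# Build the JSON body for an _analyze request.
# Assumption: tokenizer and text contain no characters that need JSON
# escaping (double quotes, backslashes, control characters).
analyze_body() {
  local tokenizer="$1"
  local text="$2"
  printf '{"tokenizer":"%s","text":"%s"}' "$tokenizer" "$text"
}

# Send the body to the _analyze endpoint with curl.
# ES_URL is an assumed address, not taken from the original post.
analyze() {
  local es_url="${ES_URL:-http://localhost:9200}"
  curl -s -X POST "$es_url/_analyze" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}

# Example (needs a running Elasticsearch with the IK plugin installed):
#   analyze ik_smart "尚硅谷电商"
```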
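However, many words are still not recognized, so a custom dictionary is needed.
First, increase the memory available to Elasticsearch by recreating the container with a larger JVM heap: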
[root@cicd ~]# docker ps -a
CONTAINER ID  IMAGE                COMMAND                 CREATED     STATUS        PORTS                                           NAMES
7ab7bf7aa2e5  kibana:7.4.2         "/usr/local/bin/dumb…"  7 days ago  Up 5 hours    0.0.0.0:5601->5601/tcp                          kibana
174c44e86f31  elasticsearch:7.4.2  "/usr/local/bin/dock…"  7 days ago  Up 2 minutes  0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp  elasticsearch
[root@cicd ~]# docker stop 174c44e86f31
174c44e86f31
[root@cicd ~]# docker start 174c44e86f31
174c44e86f31
[root@cicd ~]# free -m
       total  used  free  shared  buff/cache  available
Mem:    7821  3944  2361       9        1515       3605
Swap:   1639     0  1639
[root@cicd ~]# docker stop 174c44e86f31
174c44e86f31
[root@cicd ~]# docker rm 174c44e86f31
174c44e86f31
[root@cicd ~]# docker run --name elasticsearch -p 9200:9200 -p 9300:9300 --privileged=true \
> -e "discovery.type=single-node" \
> -e ES_JAVA_OPTS="-Xms512m -Xmx1024m" \
> -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml \
> -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data \
> -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins \
> -d elasticsearch:7.4.2
aa707f92c246a4878adcb5e6f6e7c98ab55ecbe201fa026d3329178a41fc7791
[root@cicd ~]# docker ps -a
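In ES_JAVA_OPTS, -Xms sets the initial JVM heap size and -Xmx sets the maximum. A small sketch that builds the value from the two sizes and rejects malformed input; the helper name es_java_opts is hypothetical, and requiring a unit suffix is a choice made for this sketch rather than a JVM rule.

```shell
#!/usr/bin/env bash
# Build an ES_JAVA_OPTS value from an initial (-Xms) and a maximum (-Xmx) heap size.
# In this sketch each size must be digits followed by a unit: k, m or g.
es_java_opts() {
  local initial="$1"
  local maximum="$2"
  local size
  for size in "$initial" "$maximum"; do
    if ! printf '%s\n' "$size" | grep -Eq '^[0-9]+[kKmMgG]$'; then
      echo "invalid heap size: $size" >&2
      return 1
    fi
  done
  printf '%s %s' "-Xms$initial" "-Xmx$maximum"
}

# Example use inside the docker run command:
#   -e ES_JAVA_OPTS="$(es_java_opts 512m 1024m)"
```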
Next, install nginx to serve the dictionary file, and edit the IK analyzer's XML configuration (IKAnalyzer.cfg.xml):
[root@cicd ~]# cd /mydata/
[root@cicd mydata]# mkdir nginx
[root@cicd mydata]# docker pull nginx:1.10
1.10: Pulling from library/nginx
6d827a3ef358: Pull complete
1e3e18a64ea9: Pull complete
556c62bb43ac: Pull complete
Digest: sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575
Status: Downloaded newer image for nginx:1.10
docker.io/library/nginx:1.10
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root 47 Jun  8 11:35 elasticsearch
drwxr-xr-x  2 root root  6 Jun 15 13:47 nginx
[root@cicd mydata]# docker run -p 80:80 --name nginx -d nginx:1.10
7217ab7d7ad153960b2d1acebffd3fc02527655e2b0888e8c5d1eb0cebb84a05
[root@cicd mydata]# docker ps -a
CONTAINER ID  IMAGE                COMMAND                 CREATED        STATUS        PORTS                                           NAMES
7217ab7d7ad1  nginx:1.10           "nginx -g 'daemon of…"  5 seconds ago  Up 4 seconds  0.0.0.0:80->80/tcp, 443/tcp                     nginx
aa707f92c246  elasticsearch:7.4.2  "/usr/local/bin/dock…"  3 minutes ago  Up 3 minutes  0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp  elasticsearch
7ab7bf7aa2e5  kibana:7.4.2         "/usr/local/bin/dumb…"  7 days ago     Up 5 hours    0.0.0.0:5601->5601/tcp                          kibana
[root@cicd mydata]# docker container cp nginx:/etc/nginx .
[root@cicd mydata]# cd nginx/
[root@cicd nginx]# ll
total 32
drwxr-xr-x 2 root root   26 Mar 27  2017 conf.d
-rw-r--r-- 1 root root 1007 Jan 31  2017 fastcgi_params
-rw-r--r-- 1 root root 2837 Jan 31  2017 koi-utf
-rw-r--r-- 1 root root 2223 Jan 31  2017 koi-win
-rw-r--r-- 1 root root 3957 Jan 31  2017 mime.types
lrwxrwxrwx 1 root root   22 Jan 31  2017 modules -> /usr/lib/nginx/modules
-rw-r--r-- 1 root root  643 Jan 31  2017 nginx.conf
-rw-r--r-- 1 root root  636 Jan 31  2017 scgi_params
-rw-r--r-- 1 root root  664 Jan 31  2017 uwsgi_params
-rw-r--r-- 1 root root 3610 Jan 31  2017 win-utf
[root@cicd nginx]# docker ps -a
CONTAINER ID  IMAGE                COMMAND                 CREATED             STATUS             PORTS                                           NAMES
7217ab7d7ad1  nginx:1.10           "nginx -g 'daemon of…"  About a minute ago  Up About a minute  0.0.0.0:80->80/tcp, 443/tcp                     nginx
aa707f92c246  elasticsearch:7.4.2  "/usr/local/bin/dock…"  4 minutes ago       Up 4 minutes       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp  elasticsearch
7ab7bf7aa2e5  kibana:7.4.2         "/usr/local/bin/dumb…"  7 days ago          Up 5 hours         0.0.0.0:5601->5601/tcp                          kibana
[root@cicd nginx]# docker stop 7217ab7d7ad1
7217ab7d7ad1
[root@cicd nginx]# docker rm 7217ab7d7ad1
7217ab7d7ad1
[root@cicd nginx]# cd ..
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root  47 Jun  8 11:35 elasticsearch
drwxr-xr-x  3 root root 177 Mar 27  2017 nginx
[root@cicd mydata]# mv nginx conf
[root@cicd mydata]# ll
total 0
drwxr-xr-x  3 root root 177 Mar 27  2017 conf
drwxrwxrwx. 5 root root  47 Jun  8 11:35 elasticsearch
[root@cicd mydata]# mkdir nginx
[root@cicd mydata]# mv conf/ nginx/
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root 47 Jun  8 11:35 elasticsearch
drwxr-xr-x  3 root root 18 Jun 15 13:52 nginx
[root@cicd mydata]# cd nginx/
[root@cicd nginx]# ll
total 0
drwxr-xr-x 3 root root 177 Mar 27  2017 conf
[root@cicd nginx]# docker run -p 80:80 --name nginx \
> -v /mydata/nginx/html:/usr/share/nginx/html \
> -v /mydata/nginx/logs:/var/log/nginx \
> -v /mydata/nginx/conf/:/etc/nginx \
> -d nginx:1.10
7b3ae8abac8219ac43b99e058fed83d93f3e16db015744369477e99ec134cc16
[root@cicd nginx]# docker ps -l
CONTAINER ID  IMAGE       COMMAND                 CREATED         STATUS         PORTS                        NAMES
7b3ae8abac82  nginx:1.10  "nginx -g 'daemon of…"  20 seconds ago  Up 18 seconds  0.0.0.0:80->80/tcp, 443/tcp  nginx
[root@cicd nginx]# cd html/
[root@cicd html]# ll
total 0
[root@cicd html]# vim index.html
[root@cicd html]# mkdir es
[root@cicd html]# cd es
[root@cicd es]# ll
total 0
[root@cicd es]# vim femci.txt
[root@cicd es]# mv femci.txt fenci.txt
[root@cicd es]# cd /mydata/elasticsearch/plugins/
[root@cicd plugins]# cd ik/config/
[root@cicd config]# ll
total 8260
-rw-r--r-- 1 root root 5225922 Oct  7  2019 extra_main.dic
-rw-r--r-- 1 root root   63188 Oct  7  2019 extra_single_word.dic
-rw-r--r-- 1 root root   63188 Oct  7  2019 extra_single_word_full.dic
-rw-r--r-- 1 root root   10855 Oct  7  2019 extra_single_word_low_freq.dic
-rw-r--r-- 1 root root     156 Oct  7  2019 extra_stopword.dic
-rw-r--r-- 1 root root     625 Oct  7  2019 IKAnalyzer.cfg.xml
-rw-r--r-- 1 root root 3058510 Oct  7  2019 main.dic
-rw-r--r-- 1 root root     123 Oct  7  2019 preposition.dic
-rw-r--r-- 1 root root    1824 Oct  7  2019 quantifier.dic
-rw-r--r-- 1 root root     164 Oct  7  2019 stopword.dic
-rw-r--r-- 1 root root     192 Oct  7  2019 suffix.dic
-rw-r--r-- 1 root root     752 Oct  7  2019 surname.dic
[root@cicd config]# vim IKAnalyzer.cfg.xml
[root@cicd config]# docker ps -a
CONTAINER ID  IMAGE                COMMAND                 CREATED         STATUS         PORTS                                           NAMES
7b3ae8abac82  nginx:1.10           "nginx -g 'daemon of…"  4 minutes ago   Up 4 minutes   0.0.0.0:80->80/tcp, 443/tcp                     nginx
aa707f92c246  elasticsearch:7.4.2  "/usr/local/bin/dock…"  12 minutes ago  Up 12 minutes  0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp  elasticsearch
7ab7bf7aa2e5  kibana:7.4.2         "/usr/local/bin/dumb…"  7 days ago      Up 5 hours     0.0.0.0:5601->5601/tcp                          kibana
[root@cicd config]# docker restart elasticsearch
elasticsearch
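The session opens IKAnalyzer.cfg.xml in vim but does not show what was changed. The IK plugin's README describes a remote_ext_dict entry that points the analyzer at a word list served over HTTP, which is presumably what was set here to the fenci.txt file under nginx. The sketch below writes a minimal version of such a file. The function name write_ik_config is hypothetical, the dictionary URL is an assumption because the post does not give the host address, and the file shipped with the plugin contains additional commented-out entries that this sketch omits.

```shell
#!/usr/bin/env bash
# Write a minimal IKAnalyzer.cfg.xml whose remote_ext_dict entry points at a URL.
# $1: URL of the word list served by nginx (assumed; the post does not state the host)
# $2: path of the config file to write (overwritten if it exists)
write_ik_config() {
  local dict_url="$1"
  local config_file="$2"
  cat > "$config_file" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- local extension dictionary (left empty) -->
  <entry key="ext_dict"></entry>
  <!-- local extension stopword dictionary (left empty) -->
  <entry key="ext_stopwords"></entry>
  <!-- remote extension dictionary, fetched over HTTP -->
  <entry key="remote_ext_dict">${dict_url}</entry>
</properties>
EOF
}

# Example, with NGINX_HOST standing in for the address of the nginx container's host:
#   write_ik_config "http://NGINX_HOST/es/fenci.txt" \
#     /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml
#   docker restart elasticsearch
```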
Then run the tokenizer again.
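The custom words live in the fenci.txt file created under the nginx html/es directory; per the IK README, a remote dictionary is plain UTF-8 text with one word per line. A minimal sketch for adding a word without creating duplicates follows. The function name add_word is hypothetical, and the sketch assumes the file, if it already exists, ends with a newline.

```shell
#!/usr/bin/env bash
# Append a word to a dictionary file (one word per line) unless it is already listed.
# Assumption: an existing file ends with a newline, so the new word starts its own line.
add_word() {
  local dict_file="$1"
  local word="$2"
  touch "$dict_file"
  # -F: fixed string, -x: match the whole line, -q: no output
  if ! grep -Fxq -- "$word" "$dict_file"; then
    printf '%s\n' "$word" >> "$dict_file"
  fi
}

# Example:
#   add_word /mydata/nginx/html/es/fenci.txt "尚硅谷"
```

Once the plugin has loaded the remote dictionary, repeating the ik_smart request on 尚硅谷电商 should keep 尚硅谷 as a single token, assuming that word is in the list.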