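For details on the IK analyzer, see this blog post: https://www.cnblogs.com/dalianpai/p/12694298.html

Covers lessons 122 (Full-Text Search - ElasticSearch - Analysis - Tokenization & Installing the IK Analyzer) through 124 (Full-Text Search - ElasticSearch - Analysis - Custom Extension Dictionary).

Download: https://github.com/medcl/elasticsearch-analysis-ik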

[root@localhost plugins]# ll
total 4400
-rw-r--r-- 1 root root 4504487 Jun 15 13:13 elasticsearch-analysis-ik-7.4.2.zip
[root@localhost plugins]# unzip
UnZip 6.00 of 20 April 2009, by Info-ZIP.  Maintained by C. Spieler.  Send
bug reports using http://www.info-zip.org/zip-bug.html; see README for details.

Usage: unzip [-Z] [-opts[modifiers]] file[.zip] [list] [-x xlist] [-d exdir]
  Default action is to extract files in list, except those in xlist, to exdir;
  file[.zip] may be a wildcard.  -Z => ZipInfo mode ("unzip -Z" for usage).

  -p  extract files to pipe, no messages     -l  list files (short format)
  -f  freshen existing files, create none    -t  test compressed archive data
  -u  update files, create if necessary      -z  display archive comment only
  -v  list verbosely/show version info       -T  timestamp archive to latest
  -x  exclude files that follow (in xlist)   -d  extract files into exdir
modifiers:
  -n  never overwrite existing files         -q  quiet mode (-qq => quieter)
  -o  overwrite files WITHOUT prompting      -a  auto-convert any text files
  -j  junk paths (do not make directories)   -aa treat ALL files as text
  -U  use escapes for all non-ASCII Unicode  -UU ignore any Unicode fields
  -C  match filenames case-insensitively     -L  make (some) names lowercase
  -X  restore UID/GID info                   -V  retain VMS version numbers
  -K  keep setuid/setgid/tacky permissions   -M  pipe through "more" pager
  -O CHARSET  specify a character encoding for DOS, Windows and OS/2 archives
  -I CHARSET  specify a character encoding for UNIX and other archives

See "unzip -hh" or unzip.txt for more help.  Examples:
  unzip data1 -x joe   => extract all files except joe from zipfile data1.zip
  unzip -p foo | more  => send contents of foo.zip via pipe into program more
  unzip -fo foo ReadMe => quietly replace existing ReadMe if archive file newer
[root@localhost plugins]# unzip elasticsearch-analysis-ik-7.4.2.zip
Archive:  elasticsearch-analysis-ik-7.4.2.zip
  inflating: elasticsearch-analysis-ik-7.4.2.jar
  inflating: httpclient-4.5.2.jar
  inflating: httpcore-4.4.4.jar
  inflating: commons-logging-1.2.jar
  inflating: commons-codec-1.9.jar
  inflating: plugin-descriptor.properties
  inflating: plugin-security.policy
   creating: config/
  inflating: config/surname.dic
  inflating: config/quantifier.dic
  inflating: config/extra_stopword.dic
  inflating: config/suffix.dic
  inflating: config/extra_single_word_full.dic
  inflating: config/extra_single_word.dic
  inflating: config/preposition.dic
  inflating: config/IKAnalyzer.cfg.xml
  inflating: config/main.dic
  inflating: config/stopword.dic
  inflating: config/extra_main.dic
  inflating: config/extra_single_word_low_freq.dic
[root@localhost plugins]# ll
total 5828
-rw-r--r-- 1 root root  263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r-- 1 root root   61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x 2 root root     299 Oct  7  2019 config
-rw-r--r-- 1 root root   54643 Nov  4  2019 elasticsearch-analysis-ik-7.4.2.jar
-rw-r--r-- 1 root root 4504487 Jun 15 13:13 elasticsearch-analysis-ik-7.4.2.zip
-rw-r--r-- 1 root root  736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r-- 1 root root  326724 May  6  2018 httpcore-4.4.4.jar
-rw-r--r-- 1 root root    1805 Nov  4  2019 plugin-descriptor.properties
-rw-r--r-- 1 root root     125 Nov  4  2019 plugin-security.policy
[root@localhost plugins]# mkdir ik
[root@localhost plugins]# ll
total 1428
-rw-r--r-- 1 root root 263965 May  6  2018 commons-codec-1.9.jar
-rw-r--r-- 1 root root  61829 May  6  2018 commons-logging-1.2.jar
drwxr-xr-x 2 root root    299 Oct  7  2019 config
-rw-r--r-- 1 root root  54643 Nov  4  2019 elasticsearch-analysis-ik-7.4.2.jar
-rw-r--r-- 1 root root 736658 May  6  2018 httpclient-4.5.2.jar
-rw-r--r-- 1 root root 326724 May  6  2018 httpcore-4.4.4.jar
drwxr-xr-x 2 root root      6 Jun 15 13:17 ik
-rw-r--r-- 1 root root   1805 Nov  4  2019 plugin-descriptor.properties
-rw-r--r-- 1 root root    125 Nov  4  2019 plugin-security.policy
[root@localhost plugins]# mv * ik/
mv: cannot move ‘ik’ to a subdirectory of itself, ‘ik/ik’
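
The `mv` error above is harmless: every other entry is moved and only `ik` itself is skipped. A minimal sketch of one way to avoid the error, assuming GNU `find` and GNU `mv` (for the `-t` option); `move_into_ik` is a hypothetical helper name, not part of the plugin or of Elasticsearch:

```shell
#!/usr/bin/env bash
# Move every direct child of a plugins directory into its ik/ subdirectory,
# skipping ik itself so mv never tries to move ik into ik/ik.
move_into_ik() {
  local plugins_dir="$1"
  mkdir -p "$plugins_dir/ik"
  # -mindepth/-maxdepth 1: direct children only; ! -name ik: skip the target.
  find "$plugins_dir" -mindepth 1 -maxdepth 1 ! -name ik \
    -exec mv -t "$plugins_dir/ik" {} +
}

# Usage (the path is an assumption based on the volume mapping used later):
# move_into_ik /mydata/elasticsearch/plugins
#
# Alternative: extract straight into the target directory with unzip's -d option:
# unzip elasticsearch-analysis-ik-7.4.2.zip -d ik
```

Extracting with `-d ik` in the first place is probably the simpler choice, since nothing has to be moved afterwards.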

Restart the container, then run the following queries:

POST _analyze
{
  "tokenizer": "standard",
  "text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
}

POST _analyze
{
  "tokenizer": "ik_smart",
  "text": "尚硅谷电商"
}
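
The two requests above are written for the Kibana console. As a rough sketch, the same calls can be made from a shell with `curl`. The port 9200 is the one published by `docker run` in this post, while `localhost` is a placeholder for the Docker host; `analyze_body` and `analyze` are hypothetical helper names:

```shell
#!/usr/bin/env bash
# Assumed endpoint; override by exporting ES_URL before sourcing this file.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the JSON body for _analyze from a tokenizer name and a text.
# Naive on purpose: no JSON escaping is done, so the text must not
# contain double quotes or backslashes.
analyze_body() {
  printf '{"tokenizer": "%s", "text": "%s"}' "$1" "$2"
}

# POST the body to the _analyze endpoint and print the raw JSON response.
analyze() {
  curl -s -X POST "$ES_URL/_analyze" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$1" "$2")"
}

# Usage (needs a running Elasticsearch with the IK plugin installed):
# analyze standard "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
# analyze ik_smart "尚硅谷电商"
```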

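However, many words are not recognized, so a custom dictionary is needed.

First, give Elasticsearch more memory by removing the container and recreating it with larger JVM heap settings: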

[root@cicd ~]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
7ab7bf7aa2e5        kibana:7.4.2          "/usr/local/bin/dumb…"   7 days ago          Up 5 hours          0.0.0.0:5601->5601/tcp                           kibana
174c44e86f31        elasticsearch:7.4.2   "/usr/local/bin/dock…"   7 days ago          Up 2 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch
[root@cicd ~]# docker stop 174c44e86f31
174c44e86f31
[root@cicd ~]# docker start 174c44e86f31
174c44e86f31
[root@cicd ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7821        3944        2361           9        1515        3605
Swap:          1639           0        1639
[root@cicd ~]# docker stop 174c44e86f31
174c44e86f31
[root@cicd ~]# docker rm 174c44e86f31
174c44e86f31
[root@cicd ~]# docker run --name elasticsearch -p 9200:9200 -p 9300:9300 --privileged=true \
> -e "discovery.type=single-node"  \
> -e ES_JAVA_OPTS="-Xms512m -Xmx1024m"  \
> -v /mydata/elasticsearch/config/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml   \
> -v /mydata/elasticsearch/data:/usr/share/elasticsearch/data   \
> -v /mydata/elasticsearch/plugins:/usr/share/elasticsearch/plugins   \
> -d elasticsearch:7.4.2
aa707f92c246a4878adcb5e6f6e7c98ab55ecbe201fa026d3329178a41fc7791
[root@cicd ~]# docker ps -a

Next, install nginx and edit the XML configuration (IKAnalyzer.cfg.xml) of the IK plugin in ES:

[root@cicd ~]# cd /mydata/
[root@cicd mydata]# mkdir nginx
[root@cicd mydata]# docker pull nginx:1.10
1.10: Pulling from library/nginx
6d827a3ef358: Pull complete
1e3e18a64ea9: Pull complete
556c62bb43ac: Pull complete
Digest: sha256:6202beb06ea61f44179e02ca965e8e13b961d12640101fca213efbfd145d7575
Status: Downloaded newer image for nginx:1.10
docker.io/library/nginx:1.10
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root 47 Jun  8 11:35 elasticsearch
drwxr-xr-x  2 root root  6 Jun 15 13:47 nginx
[root@cicd mydata]# docker run -p 80:80 --name nginx -d nginx:1.10
7217ab7d7ad153960b2d1acebffd3fc02527655e2b0888e8c5d1eb0cebb84a05
[root@cicd mydata]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
7217ab7d7ad1        nginx:1.10            "nginx -g 'daemon of…"   5 seconds ago       Up 4 seconds        0.0.0.0:80->80/tcp, 443/tcp                      nginx
aa707f92c246        elasticsearch:7.4.2   "/usr/local/bin/dock…"   3 minutes ago       Up 3 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch
7ab7bf7aa2e5        kibana:7.4.2          "/usr/local/bin/dumb…"   7 days ago          Up 5 hours          0.0.0.0:5601->5601/tcp                           kibana
[root@cicd mydata]# docker container cp nginx:/etc/nginx .
[root@cicd mydata]# cd nginx/
[root@cicd nginx]# ll
total 32
drwxr-xr-x 2 root root   26 Mar 27  2017 conf.d
-rw-r--r-- 1 root root 1007 Jan 31  2017 fastcgi_params
-rw-r--r-- 1 root root 2837 Jan 31  2017 koi-utf
-rw-r--r-- 1 root root 2223 Jan 31  2017 koi-win
-rw-r--r-- 1 root root 3957 Jan 31  2017 mime.types
lrwxrwxrwx 1 root root   22 Jan 31  2017 modules -> /usr/lib/nginx/modules
-rw-r--r-- 1 root root  643 Jan 31  2017 nginx.conf
-rw-r--r-- 1 root root  636 Jan 31  2017 scgi_params
-rw-r--r-- 1 root root  664 Jan 31  2017 uwsgi_params
-rw-r--r-- 1 root root 3610 Jan 31  2017 win-utf
[root@cicd nginx]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED              STATUS              PORTS                                            NAMES
7217ab7d7ad1        nginx:1.10            "nginx -g 'daemon of…"   About a minute ago   Up About a minute   0.0.0.0:80->80/tcp, 443/tcp                      nginx
aa707f92c246        elasticsearch:7.4.2   "/usr/local/bin/dock…"   4 minutes ago        Up 4 minutes        0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch
7ab7bf7aa2e5        kibana:7.4.2          "/usr/local/bin/dumb…"   7 days ago           Up 5 hours          0.0.0.0:5601->5601/tcp                           kibana
[root@cicd nginx]# docker stop 7217ab7d7ad1
7217ab7d7ad1
[root@cicd nginx]# docker rm 7217ab7d7ad1
7217ab7d7ad1
[root@cicd nginx]# cd ..
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root  47 Jun  8 11:35 elasticsearch
drwxr-xr-x  3 root root 177 Mar 27  2017 nginx
[root@cicd mydata]# mv nginx conf
[root@cicd mydata]# ll
total 0
drwxr-xr-x  3 root root 177 Mar 27  2017 conf
drwxrwxrwx. 5 root root  47 Jun  8 11:35 elasticsearch
[root@cicd mydata]# mkdir nginx
[root@cicd mydata]# mv conf/ nginx/
[root@cicd mydata]# ll
total 0
drwxrwxrwx. 5 root root 47 Jun  8 11:35 elasticsearch
drwxr-xr-x  3 root root 18 Jun 15 13:52 nginx
[root@cicd mydata]# cd nginx/
[root@cicd nginx]# ll
total 0
drwxr-xr-x 3 root root 177 Mar 27  2017 conf
[root@cicd nginx]# docker run -p 80:80 --name nginx \
> -v /mydata/nginx/html:/usr/share/nginx/html  \
> -v /mydata/nginx/logs:/var/log/nginx \
> -v /mydata/nginx/conf/:/etc/nginx \
> -d nginx:1.10
7b3ae8abac8219ac43b99e058fed83d93f3e16db015744369477e99ec134cc16
[root@cicd nginx]# docker ps -l
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                         NAMES
7b3ae8abac82        nginx:1.10          "nginx -g 'daemon of…"   20 seconds ago      Up 18 seconds       0.0.0.0:80->80/tcp, 443/tcp   nginx
[root@cicd nginx]# cd html/
[root@cicd html]# ll
total 0
[root@cicd html]# vim index.html
[root@cicd html]# mkdir es
[root@cicd html]# cd es
[root@cicd es]# ll
total 0
[root@cicd es]# vim femci.txt
[root@cicd es]# mv femci.txt fenci.txt
[root@cicd es]# cd /mydata/elasticsearch/plugins/
[root@cicd plugins]# cd ik/config/
[root@cicd config]# ll
total 8260
-rw-r--r-- 1 root root 5225922 Oct  7  2019 extra_main.dic
-rw-r--r-- 1 root root   63188 Oct  7  2019 extra_single_word.dic
-rw-r--r-- 1 root root   63188 Oct  7  2019 extra_single_word_full.dic
-rw-r--r-- 1 root root   10855 Oct  7  2019 extra_single_word_low_freq.dic
-rw-r--r-- 1 root root     156 Oct  7  2019 extra_stopword.dic
-rw-r--r-- 1 root root     625 Oct  7  2019 IKAnalyzer.cfg.xml
-rw-r--r-- 1 root root 3058510 Oct  7  2019 main.dic
-rw-r--r-- 1 root root     123 Oct  7  2019 preposition.dic
-rw-r--r-- 1 root root    1824 Oct  7  2019 quantifier.dic
-rw-r--r-- 1 root root     164 Oct  7  2019 stopword.dic
-rw-r--r-- 1 root root     192 Oct  7  2019 suffix.dic
-rw-r--r-- 1 root root     752 Oct  7  2019 surname.dic
[root@cicd config]# vim IKAnalyzer.cfg.xml
[root@cicd config]# docker ps -a
CONTAINER ID        IMAGE                 COMMAND                  CREATED             STATUS              PORTS                                            NAMES
7b3ae8abac82        nginx:1.10            "nginx -g 'daemon of…"   4 minutes ago       Up 4 minutes        0.0.0.0:80->80/tcp, 443/tcp                      nginx
aa707f92c246        elasticsearch:7.4.2   "/usr/local/bin/dock…"   12 minutes ago      Up 12 minutes       0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   elasticsearch
7ab7bf7aa2e5        kibana:7.4.2          "/usr/local/bin/dumb…"   7 days ago          Up 5 hours          0.0.0.0:5601->5601/tcp                           kibana
[root@cicd config]# docker restart elasticsearch
elasticsearch
[root@cicd config]#  
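
The transcript opens fenci.txt and IKAnalyzer.cfg.xml in vim without showing what was typed. The sketch below is one plausible version of those edits, not the author's exact files: it assumes fenci.txt holds one word per line in UTF-8 and that the `remote_ext_dict` entry of IKAnalyzer.cfg.xml points at the URL where nginx serves that file. `write_ik_config` is a hypothetical helper name, and `NGINX_HOST` is a placeholder for an address the Elasticsearch container can reach (inside a container, `localhost` refers to the container itself, so the host's IP is usually needed).

```shell
#!/usr/bin/env bash
# Write an IKAnalyzer.cfg.xml that points IK at a remote extension dictionary.
#   $1: path of the config file to write (overwritten if it exists)
#   $2: URL of the word list (assumed: one word per line, UTF-8)
write_ik_config() {
  local cfg_path="$1" dict_url="$2"
  cat > "$cfg_path" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- local extension dictionaries (left empty in this sketch) -->
    <entry key="ext_dict"></entry>
    <!-- local extension stopword dictionaries (left empty in this sketch) -->
    <entry key="ext_stopwords"></entry>
    <!-- remote extension dictionary, served by nginx -->
    <entry key="remote_ext_dict">${dict_url}</entry>
</properties>
EOF
}

# Usage. The word written to fenci.txt is only an example entry.
# printf '%s\n' '尚硅谷' > /mydata/nginx/html/es/fenci.txt
# write_ik_config /mydata/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml \
#   "http://NGINX_HOST/es/fenci.txt"
# curl -s "http://NGINX_HOST/es/fenci.txt"   # should print the word list
# docker restart elasticsearch
```

Fetching the dictionary URL with `curl` before restarting Elasticsearch is a cheap way to confirm that nginx is actually serving the file.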

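Then run the analysis again; words listed in the custom dictionary should now be recognized as single tokens.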
