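Contents

  • Elasticsearch Synonyms (dynamic-synonym): Hot-Updating a Remote Synonym Dictionary
    • Part 0: Version Notes
    • Part 1: Reading Synonyms from a Local File (No Plugin Required)
      • 1. Add the synonym file
      • 2. Create an index and configure the synonym filter
      • 3. Test the result
    • Part 2: Loading a Remote Dictionary with the Synonym Plugin
      • 1. What the plugin's documentation says
      • 2. Example: serving the dictionary over HTTP from a database-backed service
      • 3. Create an index that uses the remote endpoint
    • Part 3: Modifying the Plugin Source to Load the Dictionary from MySQL/Oracle
      • 1. Download the synonym plugin
      • 2. Modify the plugin source (Oracle shown; for MySQL, change the configuration accordingly)
        • 1) Add the JDBC configuration file
        • 2) Create the DBRemoteSynonymFile class
        • 3) Add the database Maven dependency
        • 4) Package the project with Maven
      • 3. Create the index
      • 4. Caveats
      • 5. References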

Elasticsearch Synonyms (dynamic-synonym): Hot-Updating a Remote Synonym Dictionary

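Part 0: Version Notes

  • Elasticsearch 7.2.0

  • Synonym plugin: elasticsearch-analysis-dynamic-synonym

  • Plugin home page:

    https://github.com/bells/elasticsearch-analysis-dynamic-synonym
    
  • This article covers only how to update the synonym dictionary; installing the plugin into Elasticsearch is not covered.

  • The JDK installed on the server does not match the version Elasticsearch needs, so the Elasticsearch instance used here is configured with its own JDK.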

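Part 1: Reading Synonyms from a Local File (No Plugin Required)

Elasticsearch has built-in support for synonym queries backed by a synonym file stored on the node, so the synonym plugin is not needed for this approach.

The official documentation is here: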

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/analysis-synonym-tokenfilter.html

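Basic usage:

1. Add the synonym file

Create a file named synonyms.txt in the config directory of the Elasticsearch installation (elasticsearch-7.2.0/config).

Add one group of synonyms per line, separated by ASCII commas, for example: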

美元,美金,美币
苹果,iphone
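
To make the file format concrete, here is a minimal Java sketch of reading each comma-separated line as a group of equivalent terms. SynonymLines and parseGroups are hypothetical names used only for illustration; this is not how Lucene parses the file internally, and the sketch uses ASCII sample terms and ignores comments and any other syntax the real format supports.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Elasticsearch or Lucene). */
final class SynonymLines {

    /** Maps every term on a line to the full group of terms on that line. */
    static Map<String, List<String>> parseGroups(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            List<String> terms = new ArrayList<>();
            for (String raw : line.split(",")) {
                String term = raw.trim();
                if (!term.isEmpty()) {
                    terms.add(term);
                }
            }
            for (String term : terms) {
                groups.put(term, terms); // every member points at the whole group
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("usd,dollar,buck");
        lines.add("apple,iphone");
        System.out.println(parseGroups(lines).get("iphone")); // prints [apple, iphone]
    }
}
```

Looking up any member of a group returns the whole group, which is roughly the behaviour you see in the _analyze output when a term and its synonyms are emitted at the same position.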

2. Create an index and configure the synonym filter

PUT syno_v1
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "my_syno_filter": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "my_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

3. Test the result

Request:

GET syno_v1/_analyze
{
  "analyzer": "ik_max_syno",
  "text": "苹果"
}

Response:

{
  "tokens": [
    {
      "token": "苹果",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "iphone",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}

Part 2: Loading a Remote Dictionary with the Synonym Plugin

1. What the plugin's documentation says

Example:

{
  "index": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": ["remote_synonym"]
        }
      },
      "filter": {
        "remote_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "http://host:port/synonym.txt",
          "interval": 30
        },
        "local_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "synonym.txt"
        }
      }
    }
  }
}

Configuration:

  • synonyms_path: a file path relative to the Elasticsearch config directory, or a URL. Required.

  • interval: refresh interval for the synonym file, in seconds. Optional; default 60.

  • ignore_case: ignore case in the synonyms file. Optional; default false.

  • expand: expand synonym groups. Optional; default true.

  • lenient: be lenient about exceptions thrown while importing a synonym. Optional; default false.

  • format: synonym file format. Optional; default ''. Set it to 'wordnet' for WordNet-structured files.
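
The interval option can be pictured as a fixed-rate polling loop: every interval seconds the filter asks whether the source has changed and reloads only if it has. The sketch below models that idea with the JDK scheduler. ReloadMonitor and its two callbacks are hypothetical names, not the plugin's own classes, and the real plugin may differ in detail.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical model of interval-based reloading; not the plugin's own class. */
final class ReloadMonitor implements Runnable {
    private final BooleanSupplier needsReload; // e.g. "has the source changed?"
    private final Runnable reload;             // e.g. "rebuild the synonym map"

    ReloadMonitor(BooleanSupplier needsReload, Runnable reload) {
        this.needsReload = needsReload;
        this.reload = reload;
    }

    /** One polling tick: reload only when the source reports a change. */
    @Override
    public void run() {
        if (needsReload.getAsBoolean()) {
            reload.run();
        }
    }

    /** Runs the monitor every intervalSeconds, the first time after one full interval. */
    static ScheduledFuture<?> schedule(ScheduledExecutorService pool,
                                       ReloadMonitor monitor, long intervalSeconds) {
        return pool.scheduleAtFixedRate(
                monitor, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }
}
```

A shorter interval picks up changes sooner at the cost of more frequent checks against the file, URL, or database behind synonyms_path.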

2. Example: serving the dictionary over HTTP from a database-backed service

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.text.SimpleDateFormat;
import java.util.Date;

import javax.servlet.http.HttpServletResponse;

import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/synonym")
@Slf4j
public class SynonymController {

    // Validators returned with every response; a change tells the poller to reload.
    private String lastModified = new Date().toString();
    private String etag = String.valueOf(System.currentTimeMillis());

    @RequestMapping(value = "/word", method = {RequestMethod.GET, RequestMethod.HEAD},
            produces = "text/html;charset=UTF-8")
    public String getSynonymWord(HttpServletResponse response) {
        response.setHeader("Last-Modified", lastModified);
        response.setHeader("ETag", etag);
        StringBuilder words = new StringBuilder();
        try {
            Class.forName("oracle.jdbc.driver.OracleDriver");
            // try-with-resources closes the ResultSet, Statement and Connection
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:oracle:thin:@192.168.114.13:1521:xe", "test", "test");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "select word from SYNONYM_WORD where status=0")) {
                while (rs.next()) {
                    String theWord = rs.getString("word");
                    log.info("synonym line from Oracle: {}", theWord);
                    words.append(theWord).append("\n");
                }
            }
            return words.toString();
        } catch (Exception e) {
            log.error("failed to load synonyms from the database", e);
        }
        return null;
    }

    @RequestMapping(value = "/update", method = RequestMethod.GET)
    public void updateModified() {
        // HH is the 24-hour clock; hh would repeat the same string every 12 hours
        lastModified = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        etag = String.valueOf(System.currentTimeMillis());
    }
}

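Note:

  • updateModified only refreshes lastModified and etag, which the synonym plugin uses to decide whether the remote dictionary needs to be reloaded. Extend it yourself so that it is triggered by the code that actually changes the synonym table.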
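
The plugin's own comparison logic is not shown in this article. As a rough model of the idea, the sketch below treats the two headers as opaque validators and reports a change whenever either one differs from the previous poll. HeaderChangeDetector is a hypothetical name and this is an assumption about the general technique, not the plugin's implementation.

```java
import java.util.Objects;

/** Hypothetical poller-side check: reload when either validator header changes. */
final class HeaderChangeDetector {
    private String lastModified; // last seen Last-Modified value, null before the first poll
    private String etag;         // last seen ETag value, null before the first poll

    /** Returns true, and remembers the new values, if either header differs from the last poll. */
    boolean hasChanged(String newLastModified, String newEtag) {
        boolean changed = !Objects.equals(lastModified, newLastModified)
                || !Objects.equals(etag, newEtag);
        if (changed) {
            lastModified = newLastModified;
            etag = newEtag;
        }
        return changed;
    }
}
```

Under this model, calling the /update endpoint after editing the synonym table is what makes the next poll see different header values.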

3. Create an index that uses the remote endpoint

PUT syno_v2
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "remote_syno_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://192.168.xx.xx:8080/synonym/word"
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "remote_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

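Part 3: Modifying the Plugin Source to Load the Dictionary from MySQL/Oracle

1. Download the synonym plugin

https://github.com/bells/elasticsearch-analysis-dynamic-synonym

2. Modify the plugin source (Oracle shown; for MySQL, change the configuration accordingly)

1) Add the JDBC configuration file

Create a config directory in the project root and add the configuration file config\jdbc-reload.properties: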

jdbc.url=jdbc:oracle:thin:@192.168.xx.xx:1521:xe
jdbc.user=test
jdbc.password=test
jdbc.reload.synonym.sql=SELECT word FROM TEST.SYNONYM_WORD WHERE STATUS = 0
jdbc.lastModified.synonym.sql=SELECT MAX(UPDATE_TIME) AS last_modify_dt FROM TEST.SYNONYM_WORD
jdbc.driver=oracle.jdbc.driver.OracleDriver
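
Because a missing key only shows up later as a failed JDBC call, it can help to check the file up front. The following is a minimal sketch of such a check with java.util.Properties. JdbcReloadConfig, parse, and missingKeys are hypothetical names introduced here; the key list simply mirrors the six keys shown above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical validation helper for jdbc-reload.properties. */
final class JdbcReloadConfig {
    static final String[] REQUIRED_KEYS = {
        "jdbc.url", "jdbc.user", "jdbc.password", "jdbc.driver",
        "jdbc.reload.synonym.sql", "jdbc.lastModified.synonym.sql"
    };

    /** Parses properties text; in the plugin the text would come from the file. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required keys that are absent or blank. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

Logging the result of missingKeys once at startup makes a misnamed key much easier to spot than a later connection error.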

2) Create the DBRemoteSynonymFile class

Create the DBRemoteSynonymFile class in the analysis package, at:

src\main\java\com\bellszhu\elasticsearch\plugin\synonym\analysis\DBRemoteSynonymFile.java

Contents:

package com.bellszhu.elasticsearch.plugin.synonym.analysis;

import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.env.Environment;

import java.io.*;
import java.nio.file.Path;
import java.sql.*;
import java.util.ArrayList;
import java.util.Properties;

/**
 * com.bellszhu.elasticsearch.plugin.synonym.analysis
 * author: Yic.z
 * date: 2020-08-04
 */
public class DBRemoteSynonymFile implements SynonymFile {

    // Name of the configuration file
    private final static String DB_PROPERTIES = "jdbc-reload.properties";

    private static Logger logger = LogManager.getLogger("dynamic-synonym");

    private String format;
    private boolean expand;
    private boolean lenient;
    private Analyzer analyzer;
    private Environment env;

    // Database configuration
    private String location;
    private long lastModified;
    private Connection connection = null;
    private Statement statement = null;
    private Properties props;
    private Path conf_dir;

    DBRemoteSynonymFile(Environment env, Analyzer analyzer, boolean expand,
                        boolean lenient, String format, String location) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.lenient = lenient;
        this.format = format;
        this.env = env;
        this.location = location;
        this.props = new Properties();

        // Resolve the config directory that sits next to the plugin jar
        Path filePath = PathUtils.get(
                new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
                        .getLocation().getPath()).getParent(), "config").toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);

        // try-with-resources closes the stream once the properties are loaded
        try (InputStream input = new FileInputStream(conf_dir.toFile())) {
            props.load(input);
        } catch (FileNotFoundException e) {
            logger.error("jdbc-reload.properties not found", e);
        } catch (IOException e) {
            logger.error("failed to load jdbc-reload.properties", e);
        }

        // Record the current modification time so later checks only fire on new changes
        isNeedReloadSynonymMap();
    }

    /**
     * Loads the synonym dictionary into a SynonymMap.
     *
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(
                    rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            // The throwable goes last so that {} is filled with the location
            logger.error("reload synonym {} error!", location, e);
            throw new IllegalArgumentException(
                    "could not reload synonyms from the database to build synonyms", e);
        }
    }

    /**
     * Decides whether the dictionary needs to be reloaded.
     *
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long lastModify = getLastModify();
            // null means the query failed or returned no timestamp
            if (lastModify != null && lastModified < lastModify) {
                lastModified = lastModify;
                return true;
            }
        } catch (Exception e) {
            logger.error("failed to check whether the synonyms need reloading", e);
        }
        return false;
    }

    // Opens the connection on first use and reuses it for later queries
    private void ensureConnection() throws ClassNotFoundException, SQLException {
        if (connection == null || statement == null) {
            Class.forName(props.getProperty("jdbc.driver"));
            connection = DriverManager.getConnection(
                    props.getProperty("jdbc.url"),
                    props.getProperty("jdbc.user"),
                    props.getProperty("jdbc.password"));
            statement = connection.createStatement();
        }
    }

    /**
     * Returns the time the synonym table was last modified, used to decide
     * whether the synonyms need to be reloaded.
     *
     * @return last modification time in epoch milliseconds, or null if unknown
     */
    public Long getLastModify() {
        Long last_modify_long = null;
        try {
            ensureConnection();
            try (ResultSet resultSet = statement.executeQuery(
                    props.getProperty("jdbc.lastModified.synonym.sql"))) {
                while (resultSet.next()) {
                    Timestamp last_modify_dt = resultSet.getTimestamp("last_modify_dt");
                    // MAX(UPDATE_TIME) is NULL when the table is empty
                    if (last_modify_dt != null) {
                        last_modify_long = last_modify_dt.getTime();
                    }
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("failed to get the last modification time of the synonym table", e);
        }
        return last_modify_long;
    }

    /**
     * Queries the synonyms stored in the database.
     *
     * @return one synonym line per element
     */
    public ArrayList<String> getDBData() {
        ArrayList<String> arrayList = new ArrayList<>();
        try {
            ensureConnection();
            try (ResultSet resultSet = statement.executeQuery(
                    props.getProperty("jdbc.reload.synonym.sql"))) {
                while (resultSet.next()) {
                    arrayList.add(resultSet.getString("word"));
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("failed to query synonyms from the database", e);
        }
        return arrayList;
    }

    /**
     * Exposes the synonyms as a Reader, one synonym line per row.
     *
     * @return Reader
     */
    @Override
    public Reader getReader() {
        StringBuilder sb = new StringBuilder();
        try {
            for (String line : getDBData()) {
                logger.info("load the synonym from db," + line);
                sb.append(line).append(System.getProperty("line.separator"));
            }
        } catch (Exception e) {
            logger.error("reload synonym from db failed", e);
        }
        return new StringReader(sb.toString());
    }
}
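
The reload decision in isNeedReloadSynonymMap comes down to comparing the newest UPDATE_TIME with the last value seen. The sketch below isolates that comparison without any database access and treats a missing timestamp (for example MAX over an empty table) as "no change". ReloadTracker and shouldReload are hypothetical names for illustration only.

```java
/** Hypothetical, database-free model of the timestamp comparison. */
final class ReloadTracker {
    private long lastModified; // newest modification time seen so far, in epoch millis

    /** latest may be null when the table is empty or the query failed. */
    boolean shouldReload(Long latest) {
        if (latest == null || latest <= lastModified) {
            return false;
        }
        lastModified = latest; // remember it so the same change is reported only once
        return true;
    }
}
```

One consequence of this design is that a change is only noticed if it moves the maximum UPDATE_TIME forward, so rows should have UPDATE_TIME refreshed whenever they are edited or disabled.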

Then modify the getSynonymFile method in the DynamicSynonymTokenFilterFactory class, adding a branch that selects the direct database connection:

SynonymFile getSynonymFile(Analyzer analyzer) {
    try {
        SynonymFile synonymFile;
        if (location.equals("fromDB")) {
            // New branch: read synonyms straight from the database
            synonymFile = new DBRemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
        } else if (location.startsWith("http://") || location.startsWith("https://")) {
            synonymFile = new RemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
        } else {
            synonymFile = new LocalSynonymFile(environment, analyzer, expand, lenient, format, location);
        }
        if (scheduledFuture == null) {
            scheduledFuture = pool.scheduleAtFixedRate(new Monitor(synonymFile),
                    interval, interval, TimeUnit.SECONDS);
        }
        return synonymFile;
    } catch (Exception e) {
        throw new IllegalArgumentException("failed to get synonyms : " + location, e);
    }
}
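
The branch order in getSynonymFile matters: the literal value fromDB is checked first, then the URL prefixes, and everything else is treated as a local path. A small sketch that isolates this dispatch makes it easy to test on its own. SynonymSource is a hypothetical enum introduced here and is not part of the plugin.

```java
/** Hypothetical enum naming the three branches of getSynonymFile. */
enum SynonymSource {
    DATABASE, REMOTE_HTTP, LOCAL_FILE;

    /** Classifies a synonyms_path value the same way the modified method does. */
    static SynonymSource of(String location) {
        if ("fromDB".equals(location)) {
            return DATABASE;
        }
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return REMOTE_HTTP;
        }
        return LOCAL_FILE;
    }
}
```

Note that the comparison is case-sensitive, so a synonyms_path of fromdb would fall through to the local-file branch.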

3) Add the database Maven dependency

  • Add the database driver dependency to pom.xml:

    <!-- Added Oracle dependency -->
    <dependency>
        <groupId>com.oracle.ojdbc</groupId>
        <artifactId>ojdbc8</artifactId>
        <version>19.3.0.0</version>
    </dependency>
    
  • Set the plugin version to match your Elasticsearch version:

    <version>7.2.0</version>
    
  • In src\main\assemblies\plugin.xml, add configuration so that the database driver is packaged together with the plugin.

    Inside <dependencySets>, add:

    <dependencySet>
        <outputDirectory/>
        <useProjectArtifact>true</useProjectArtifact>
        <useTransitiveFiltering>true</useTransitiveFiltering>
        <includes>
            <include>com.oracle.ojdbc:ojdbc8</include>
        </includes>
    </dependencySet>
    

    In an empty spot of the same file, add the following so that the configuration file is packaged as well:

    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
    

4) Package the project with Maven

3. Create the index

Example using the modified synonym plugin:

PUT syno_v2
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "remote_syno_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "fromDB",
            "interval": 120
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "remote_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}

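4. Caveats

If an error like the following appears after updating the plugin:

java.security.AccessControlException: access denied (java.net.SocketPermission 172.16.xxx.xxx:3306 connect,resolve)

it is a Java security policy error affecting the plugin jar (I have not looked into it in depth). The fix is as follows:

1. In the config directory of the plugin source, create the file socketPolicy.policy, using the host and port of your own database:

grant {
    permission java.net.SocketPermission "business.mysql.youboy.com:3306", "connect,resolve";
};

2. On the server, add the following line to the jvm.options file in the Elasticsearch config directory so that it points at the file above (adjust the path to your own installation):

-Djava.security.policy=/data/elasticsearch-6.5.3/plugins/ik/config/socketPolicy.policy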

5. References

https://blog.csdn.net/weixin_43315211/article/details/100144968

P.S. The link above is an article on hot-updating the IK analyzer's dictionary.
