Elasticsearch Synonyms (dynamic-synonym Plugin): Remote Hot Updates
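Contents
- Elasticsearch Synonyms (dynamic-synonym): Remote Hot Updates
- 0. Versions
- I. Reading synonyms from a local file (no plugin required)
- 1. Add the synonyms file
- 2. Create the index and configure the synonym filter
- 3. Test the result
- II. Loading a remote synonym dictionary with the plugin
- 1. What the plugin's official documentation says
- 2. Example: serving the word list over HTTP from a database-backed service
- 3. Create an index that uses the remote endpoint
- III. Modifying the plugin source to load the dictionary from MySQL/Oracle
- 1. Download the plugin source
- 2. Modify the plugin source (Oracle is used as the example; for MySQL, change the configuration accordingly)
- 1) Add the JDBC configuration file
- 2) Create the DBRemoteSynonymFile class
- 3) Add the database driver as a Maven dependency
- 4) Package the project with Maven
- 3. Create the index
- 4. Notes
- 5. References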
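Elasticsearch Synonyms (dynamic-synonym): Remote Hot Updates
0. Versions
Elasticsearch 7.2.0
Synonym plugin: elasticsearch-analysis-dynamic-synonym
Home page of the elasticsearch-analysis-dynamic-synonym plugin:
https://github.com/bells/elasticsearch-analysis-dynamic-synonym
The rest of this article covers ways to update the synonym dictionary. Installing the plugin into Elasticsearch is not covered.
Because the JDK installed on the server differs from the version ES uses, the ES instance used here was configured with its own JDK.
I. Reading synonyms from a local file (no plugin required)
Elasticsearch has built-in support for synonym queries backed by a synonyms file stored inside the ES installation.
The official documentation is here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.2/analysis-synonym-tokenfilter.html
Basic usage:
1. Add the synonyms file
In the config directory under the Elasticsearch installation directory (elasticsearch-7.2.0/config), create a text file named synonyms.txt.
Add the synonyms to it, one group per line, separated by ASCII (half-width) commas, for example: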
美元,美金,美币
苹果,iphone
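To make the file format concrete, here is a minimal Java sketch that reads comma-separated lines as groups of equivalent terms. The class and method names (SynonymLines, parseLine, parse) are hypothetical, the sample terms are illustrative, and this is not how Elasticsearch parses the file internally. It only covers simple equivalence lines like the two above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: illustrates the file format only, not Elasticsearch's own parser.
final class SynonymLines {

    // Splits one line such as "usd,dollar" into its trimmed, non-empty terms.
    static List<String> parseLine(String line) {
        List<String> terms = new ArrayList<>();
        for (String term : line.split(",")) {
            String trimmed = term.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed);
            }
        }
        return terms;
    }

    // Parses a whole file body, skipping blank lines and lines starting with '#'.
    static List<List<String>> parse(String content) {
        List<List<String>> groups = new ArrayList<>();
        for (String line : content.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            groups.add(parseLine(trimmed));
        }
        return groups;
    }

    public static void main(String[] args) {
        // Two groups: every term in a group is treated as equivalent to the others.
        System.out.println(parse("usd,dollar\napple,iphone"));
    }
}
```

Each inner list corresponds to one line of synonyms.txt.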
2. Create the index and configure the synonym filter
PUT syno_v1
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "my_syno_filter": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "my_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}
3. Test the result
DSL:
GET syno_v1/_analyze
{
  "analyzer": "ik_max_syno",
  "text": "苹果"
}
Result:
{
  "tokens": [
    {
      "token": "苹果",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "iphone",
      "start_offset": 0,
      "end_offset": 2,
      "type": "SYNONYM",
      "position": 0
    }
  ]
}
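The same check can be run from application code. The sketch below builds the _analyze request with the JDK's java.net.http client (Java 11+). The AnalyzeRequest class, the base URL http://localhost:9200 and the use of POST instead of GET are assumptions made for illustration. The index and analyzer names are the ones created above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical client-side helper; host and port are assumptions.
final class AnalyzeRequest {

    // Escapes backslashes and double quotes; enough for plain words, not for control characters.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds the JSON body for the _analyze API.
    static String body(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    // Builds (but does not send) a POST request to <baseUrl>/<index>/_analyze.
    static HttpRequest build(String baseUrl, String index, String analyzer, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(analyzer, text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = build("http://localhost:9200", "syno_v1", "ik_max_syno", "iphone");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The response lists the original token together with its synonyms.
        System.out.println(response.body());
    }
}
```

Running main requires a reachable Elasticsearch node. The build method alone has no network dependency.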
II. Loading a remote synonym dictionary with the plugin
1. What the plugin's official documentation says
Example:
{
  "index": {
    "analysis": {
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": ["remote_synonym"]
        }
      },
      "filter": {
        "remote_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "http://host:port/synonym.txt",
          "interval": 30
        },
        "local_synonym": {
          "type": "dynamic_synonym",
          "synonyms_path": "synonym.txt"
        }
      }
    }
  }
}
Configuration:
- synonyms_path: a file path relative to the Elasticsearch config directory, or a URL. Mandatory.
- interval: refresh interval for the synonym file, in seconds. Optional, default: 60.
- ignore_case: ignore case in the synonyms file. Optional, default: false.
- expand: whether to expand synonyms. Optional, default: true.
- lenient: be lenient about exceptions thrown while importing a synonym. Optional, default: false.
- format: synonym file format. Optional, default: ''. For the WordNet structure this can be set to 'wordnet'.
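The interval option implies a periodic check: at a fixed rate, ask whether the source has changed and rebuild the synonyms only if it has. The following Java sketch shows that pattern with a ScheduledExecutorService. ReloadPoller and its parameters are hypothetical names for illustration and are not the plugin's own classes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of interval-based reloading; not the plugin's actual classes.
final class ReloadPoller implements AutoCloseable {
    private final ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger reloads = new AtomicInteger();

    // Every `interval` units, asks needsReload and runs reload only when it answers true.
    ReloadPoller(BooleanSupplier needsReload, Runnable reload, long interval, TimeUnit unit) {
        pool.scheduleAtFixedRate(() -> {
            try {
                if (needsReload.getAsBoolean()) {
                    reload.run();
                    reloads.incrementAndGet();
                }
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so report it and carry on.
                System.err.println("reload check failed: " + e);
            }
        }, interval, interval, unit);
    }

    // Number of reloads performed so far.
    int reloadCount() {
        return reloads.get();
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

With the documented default, the equivalent call would pass an interval of 60 and TimeUnit.SECONDS.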
2. Example: serving the word list over HTTP from a database-backed service
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.servlet.http.HttpServletResponse;
import lombok.extern.slf4j.Slf4j;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/synonym")
@Slf4j
public class SynonymController {

    // The two response headers below tell the plugin whether the word list has changed.
    // volatile: they are written by /update and read by /word on different request threads.
    private volatile String lastModified = new Date().toString();
    private volatile String etag = String.valueOf(System.currentTimeMillis());

    @RequestMapping(value = "/word", method = {RequestMethod.GET, RequestMethod.HEAD},
            produces = "text/plain;charset=UTF-8")
    public String getSynonymWord(HttpServletResponse response) {
        response.setHeader("Last-Modified", lastModified);
        response.setHeader("ETag", etag);
        StringBuilder words = new StringBuilder();
        // try-with-resources closes the ResultSet, Statement and Connection in reverse order.
        // JDBC 4 drivers such as ojdbc8 register themselves, so Class.forName is not needed here.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:oracle:thin:@192.168.114.13:1521:xe", "test", "test");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select word from SYNONYM_WORD where status=0")) {
            while (rs.next()) {
                // One row becomes one line of the synonym file.
                words.append(rs.getString("word")).append("\n");
            }
            return words.toString();
        } catch (Exception e) {
            log.error("failed to load synonyms from the database", e);
            response.setStatus(HttpServletResponse.SC_INTERNAL_SERVER_ERROR);
            return "";
        }
    }

    @RequestMapping(value = "/update", method = RequestMethod.GET)
    public void updateModified() {
        // HH is the 24-hour clock; the original hh (12-hour) repeats values twice a day.
        lastModified = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        etag = String.valueOf(System.currentTimeMillis());
    }
}
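Note:
- The updateModified method only refreshes lastModified and etag, which the synonym plugin uses to decide whether the remote dictionary needs to be reloaded. Extend it yourself to hook into the code that actually updates the database.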
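The idea behind the two headers can be sketched as follows: the consumer remembers the last Last-Modified and ETag values it saw and treats any difference as a change. RemoteChangeDetector is a hypothetical class written for this illustration. That the plugin compares the values in this way is an assumption, not something quoted from its source.

```java
import java.util.Objects;

// Hypothetical sketch of validator-based change detection; not the plugin's source.
final class RemoteChangeDetector {
    private String lastModified;
    private String etag;

    // Returns true when either value differs from the last one seen, and remembers the new values.
    synchronized boolean hasChanged(String newLastModified, String newEtag) {
        boolean changed = !Objects.equals(lastModified, newLastModified)
                || !Objects.equals(etag, newEtag);
        if (changed) {
            lastModified = newLastModified;
            etag = newEtag;
        }
        return changed;
    }
}
```

Under this scheme, calling the /update endpoint changes both values, so the next check reports a change and triggers a reload.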
3. Create an index that uses the remote endpoint
PUT syno_v2
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "remote_syno_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "http://192.168.xx.xx:8080/synonym/word"
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "remote_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}
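III. Modifying the plugin source to load the dictionary from MySQL/Oracle
1. Download the plugin source
https://github.com/bells/elasticsearch-analysis-dynamic-synonym
2. Modify the plugin source (Oracle is used as the example; for MySQL, change the configuration accordingly)
1) Add the JDBC configuration file
Create a config directory in the project root and add the configuration file config\jdbc-reload.properties: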
jdbc.url=jdbc:oracle:thin:@192.168.xx.xx:1521:xe
jdbc.user=test
jdbc.password=test
jdbc.reload.synonym.sql=SELECT word FROM TEST.SYNONYM_WORD WHERE STATUS = 0
jdbc.lastModified.synonym.sql=SELECT MAX(UPDATE_TIME) AS last_modify_dt FROM TEST.SYNONYM_WORD
jdbc.driver=oracle.jdbc.driver.OracleDriver
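A missing key in this file only shows up later as a failed query, so it can help to validate the file up front. Below is a minimal Java sketch of such a check. JdbcConfigCheck is a hypothetical helper and not part of the plugin. The key names are the six used in the file above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical startup check for jdbc-reload.properties; not part of the plugin.
final class JdbcConfigCheck {
    static final String[] REQUIRED_KEYS = {
        "jdbc.url", "jdbc.user", "jdbc.password", "jdbc.driver",
        "jdbc.reload.synonym.sql", "jdbc.lastModified.synonym.sql"
    };

    // Returns the required keys that are absent or blank in the given properties text.
    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "jdbc.url=jdbc:oracle:thin:@192.168.xx.xx:1521:xe\njdbc.user=test\n";
        // Prints the keys that still need to be configured.
        System.out.println(missingKeys(sample));
    }
}
```

An empty result means every key the DBRemoteSynonymFile class reads is present.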
2) Create the DBRemoteSynonymFile class
Create the DBRemoteSynonymFile class in the analysis directory.
Exact location:
src\main\java\com\bellszhu\elasticsearch\plugin\synonym\analysis\DBRemoteSynonymFile.java
Contents:
package com.bellszhu.elasticsearch.plugin.synonym.analysis;

import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.elasticsearch.common.io.PathUtils;
import org.elasticsearch.env.Environment;

import java.io.*;
import java.nio.file.Path;
import java.sql.*;
import java.util.ArrayList;
import java.util.Properties;

/**
 * com.bellszhu.elasticsearch.plugin.synonym.analysis
 * author: Yic.z
 * date: 2020-08-04
 */
public class DBRemoteSynonymFile implements SynonymFile {

    // Name of the configuration file
    private final static String DB_PROPERTIES = "jdbc-reload.properties";

    private static Logger logger = LogManager.getLogger("dynamic-synonym");

    private String format;
    private boolean expand;
    private boolean lenient;
    private Analyzer analyzer;
    private Environment env;

    // Database settings
    private String location;
    private long lastModified;
    private Connection connection = null;
    private Statement statement = null;
    private Properties props;
    private Path conf_dir;

    DBRemoteSynonymFile(Environment env, Analyzer analyzer,
                        boolean expand, boolean lenient, String format, String location) {
        this.analyzer = analyzer;
        this.expand = expand;
        this.lenient = lenient;
        this.format = format;
        this.env = env;
        this.location = location;
        this.props = new Properties();

        // Resolve <directory containing the plugin jar>/config/jdbc-reload.properties
        Path filePath = PathUtils.get(
                new File(DynamicSynonymPlugin.class.getProtectionDomain()
                        .getCodeSource().getLocation().getPath()).getParent(),
                "config").toAbsolutePath();
        this.conf_dir = filePath.resolve(DB_PROPERTIES);

        // try-with-resources closes the stream once the properties are loaded
        try (InputStream input = new FileInputStream(conf_dir.toFile())) {
            props.load(input);
        } catch (FileNotFoundException e) {
            logger.error("jdbc-reload.properties not found", e);
        } catch (IOException e) {
            logger.error("fail to load the jdbc-reload.properties", e);
        }

        // Record the current last-modified time
        isNeedReloadSynonymMap();
    }

    /**
     * Loads the synonym dictionary into a SynonymMap.
     * @return SynonymMap
     */
    @Override
    public SynonymMap reloadSynonymMap() {
        try {
            logger.info("start reload synonym from {}.", location);
            Reader rulesReader = getReader();
            SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(
                    rulesReader, format, expand, lenient, analyzer);
            return parser.build();
        } catch (Exception e) {
            // The throwable goes last so that {} is filled with the location
            logger.error("reload synonym {} error!", location, e);
            throw new IllegalArgumentException("could not reload synonyms from the database to build synonyms", e);
        }
    }

    /**
     * Decides whether the synonyms need to be reloaded.
     * @return true or false
     */
    @Override
    public boolean isNeedReloadSynonymMap() {
        try {
            Long lastModify = getLastModify();
            // null means the query failed or the table is empty
            if (lastModify != null && lastModified < lastModify) {
                lastModified = lastModify;
                return true;
            }
        } catch (Exception e) {
            logger.error("failed to check whether the synonyms need a reload", e);
        }
        return false;
    }

    /**
     * Opens the JDBC connection on first use (or when it is no longer valid) and reuses it afterwards.
     */
    private Statement getStatement() throws ClassNotFoundException, SQLException {
        if (connection == null || statement == null || !connection.isValid(5)) {
            Class.forName(props.getProperty("jdbc.driver"));
            connection = DriverManager.getConnection(
                    props.getProperty("jdbc.url"),
                    props.getProperty("jdbc.user"),
                    props.getProperty("jdbc.password"));
            statement = connection.createStatement();
        }
        return statement;
    }

    /**
     * Returns the time the synonym dictionary was last modified,
     * used to decide whether the synonyms need to be reloaded.
     *
     * @return last modification time in milliseconds, or null if unknown
     */
    public Long getLastModify() {
        Long last_modify_long = null;
        try (ResultSet resultSet = getStatement().executeQuery(
                props.getProperty("jdbc.lastModified.synonym.sql"))) {
            while (resultSet.next()) {
                Timestamp last_modify_dt = resultSet.getTimestamp("last_modify_dt");
                // MAX(...) is NULL when the table has no rows
                if (last_modify_dt != null) {
                    last_modify_long = last_modify_dt.getTime();
                }
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("failed to read the last modification time of the synonym dictionary", e);
        }
        return last_modify_long;
    }

    /**
     * Queries the synonyms stored in the database.
     * @return DBData
     */
    public ArrayList<String> getDBData() {
        ArrayList<String> arrayList = new ArrayList<>();
        try (ResultSet resultSet = getStatement().executeQuery(
                props.getProperty("jdbc.reload.synonym.sql"))) {
            while (resultSet.next()) {
                arrayList.add(resultSet.getString("word"));
            }
        } catch (ClassNotFoundException | SQLException e) {
            logger.error("failed to query the synonyms from the database", e);
        }
        return arrayList;
    }

    /**
     * Loads the synonym dictionary as a Reader, one database row per line.
     * @return Reader
     */
    @Override
    public Reader getReader() {
        StringBuilder sb = new StringBuilder();
        try {
            for (String word : getDBData()) {
                logger.info("load the synonym from db, {}", word);
                sb.append(word).append(System.getProperty("line.separator"));
            }
        } catch (Exception e) {
            logger.error("reload synonym from db failed", e);
        }
        return new StringReader(sb.toString());
    }
}
Modify the getSynonymFile method in the DynamicSynonymTokenFilterFactory class,
adding a branch that selects the direct database connection:
SynonymFile getSynonymFile(Analyzer analyzer) {
    try {
        SynonymFile synonymFile;
        if (location.equals("fromDB")) {
            // New branch: load synonyms directly from the database
            synonymFile = new DBRemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
        } else if (location.startsWith("http://") || location.startsWith("https://")) {
            synonymFile = new RemoteSynonymFile(environment, analyzer, expand, lenient, format, location);
        } else {
            synonymFile = new LocalSynonymFile(environment, analyzer, expand, lenient, format, location);
        }
        if (scheduledFuture == null) {
            // Check for changes every `interval` seconds
            scheduledFuture = pool.scheduleAtFixedRate(new Monitor(synonymFile), interval, interval, TimeUnit.SECONDS);
        }
        return synonymFile;
    } catch (Exception e) {
        throw new IllegalArgumentException("failed to get synonyms : " + location, e);
    }
}
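The branch logic above means that the value of synonyms_path alone selects the source. A standalone sketch of that rule, which can be tested without the plugin on the classpath, looks like this. SynonymSource and its Kind enum are hypothetical names introduced here for illustration only.

```java
// Hypothetical standalone version of the location dispatch, for illustration only.
final class SynonymSource {
    enum Kind { DATABASE, REMOTE, LOCAL }

    // Mirrors the order of the checks: the fromDB marker first, then URLs, then local files.
    static Kind kindOf(String location) {
        if ("fromDB".equals(location)) {
            return Kind.DATABASE;
        } else if (location.startsWith("http://") || location.startsWith("https://")) {
            return Kind.REMOTE;
        }
        return Kind.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(kindOf("fromDB"));
        System.out.println(kindOf("http://host:port/synonym.txt"));
        System.out.println(kindOf("synonym.txt"));
    }
}
```

Note that the comparison with fromDB is case-sensitive, so the index settings must use exactly that spelling.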
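3) Add the database driver as a Maven dependency
Add the database dependency to pom.xml:
<!-- Added Oracle dependency -->
<dependency>
    <groupId>com.oracle.ojdbc</groupId>
    <artifactId>ojdbc8</artifactId>
    <version>19.3.0.0</version>
</dependency>
Set the plugin version to match the ES version:
<version>7.2.0</version>
In src\main\assemblies\plugin.xml, add configuration so that the database dependency is packaged together with the plugin.
Add the following dependencySet (in a Maven assembly descriptor it belongs inside the dependencySets element):
<dependencySet>
    <outputDirectory/>
    <useProjectArtifact>true</useProjectArtifact>
    <useTransitiveFiltering>true</useTransitiveFiltering>
    <includes>
        <include>com.oracle.ojdbc:ojdbc8</include>
    </includes>
</dependencySet>
In an empty spot of the same file, add the following so that the configuration file is packaged as well:
<fileSets>
    <fileSet>
        <directory>${project.basedir}/config</directory>
        <outputDirectory>config</outputDirectory>
    </fileSet>
</fileSets>
4) Package the project with Maven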
3. Create the index
Example of using the modified plugin:
PUT syno_v2
{
  "settings": {
    "index": {
      "number_of_shards": "3",
      "number_of_replicas": "1",
      "max_result_window": "200000",
      "analysis": {
        "filter": {
          "remote_syno_filter": {
            "type": "dynamic_synonym",
            "synonyms_path": "fromDB",
            "interval": 120
          }
        },
        "analyzer": {
          "ik_max_syno": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase", "remote_syno_filter"]
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keyword": {
        "type": "text",
        "analyzer": "ik_max_syno"
      }
    }
  }
}
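4. Notes
If the following error appears after deploying the modified plugin:
java.security.AccessControlException: access denied ("java.net.SocketPermission" "172.16.xxx.xxx:3306" "connect,resolve")
it is caused by the Java security policy that applies to the plugin jar (I did not dig into the details). The fix is as follows:
1. Create a file named socketPolicy.policy in the config directory of the plugin source (replace the host and port with those of your own database):
grant {
    permission java.net.SocketPermission "business.mysql.youboy.com:3306", "connect,resolve";
};
2. On the server, add the following line to the jvm.options file in the ES config directory so that it points at the file above (adjust the path to where the policy file is actually deployed):
-Djava.security.policy=/data/elasticsearch-6.5.3/plugins/ik/config/socketPolicy.policy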
5. References
https://blog.csdn.net/weixin_43315211/article/details/100144968
PS: the link above is an article on hot-updating the dictionary of the IK analyzer.