Elasticsearch full-text search with Chinese word segmentation (pagination + highlighted results)
1. Introduction to full-text search
Elasticsearch's full text queries come in the following main types:
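1.1 Match query (match query)
// The Java snippets in this section assume this static import:
import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = matchQuery(
    "name",                 // field
    "kimchy elasticsearch"  // text
);
DSL query:
GET /_search
{
  "query": {
    "match": {
      "message": "this is a test"
    }
  }
}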
1.2 Multi-field query (multi_match query)
The multi-field version of the match query: it runs the same text against several fields.
QueryBuilder qb = multiMatchQuery(
    "kimchy elasticsearch", // text
    "user", "message"       // fields (one or more)
);
DSL query:
GET /_search
{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": ["subject", "message"]
    }
  }
}
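1.3 Common terms query (common_terms query)
A more specialized query that gives more weight to uncommon words than to very frequent ones. Note that this query type has been deprecated since Elasticsearch 7.3.
QueryBuilder qb = commonTermsQuery(
    "name",   // field
    "kimchy"  // value
);
DSL query:
GET /_search
{
  "query": {
    "common": {
      "body": {
        "query": "this is bonsai cool",
        "cutoff_frequency": 0.001
      }
    }
  }
}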
1.4 Query string query (query_string query)
A query tied closely to the Lucene query syntax. It lets you use operators such as AND, OR and NOT and search several fields within a single query string. It is intended for expert users only.
QueryBuilder qb = queryStringQuery("+kimchy -elasticsearch"); // text
DSL query:
GET /_search
{
  "query": {
    "query_string": {
      "default_field": "content",
      "query": "this AND that OR thus"
    }
  }
}
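All four query types above can be used for full-text search. In practice the multi-field query (multi_match query) is the one used most often, so the rest of this article focuses on it.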
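2. Multi-field full-text matching with the multi_match query
2.1 Search service
The service analyzes a single keyword, matches its tokens against several fields, pages the results, and highlights the fields that were hit.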
// Imports needed by the enclosing service class (Elasticsearch 7.x high-level REST client).
// StringUtils, CollectionUtil, JSONObject and log come from the project's own dependencies.
import java.io.IOException;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.Operator;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;

private <T> SearchDto<T> getResult(ShipQueryDto shipQueryDto, String indexName, Class<T> clazz)
        throws IOException, IllegalAccessException {
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices(indexName);
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();

    // Highlighting
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    // 0 fragments: return the whole field value with the matches marked up
    highlightBuilder.numOfFragments(0);
    highlightBuilder.preTags("<span style='color:red;'>");
    highlightBuilder.postTags("</span>");
    highlightBuilder.highlighterType("plain");
    for (String name : EsSmartIndexHelper.classMapMap.get(clazz).keySet()) {
        highlightBuilder.field(name).requireFieldMatch(false);
    }
    sourceBuilder.highlighter(highlightBuilder);

    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    if (StringUtils.isNotEmpty(shipQueryDto.getKeys())) {
        boolQueryBuilder.must(QueryBuilders.multiMatchQuery(shipQueryDto.getKeys())
                .fields(EsSmartIndexHelper.classMapMap.get(clazz))
                .type(MultiMatchQueryBuilder.Type.CROSS_FIELDS)
                // .minimumShouldMatch("70%")
                // Analyze the keyword with the finest-grained segmentation
                .analyzer("ik_max_word")
                .operator(Operator.OR));
    }
    sourceBuilder.query(boolQueryBuilder);

    // Pagination: pageNum is 1-based, from is a zero-based offset
    int from = (shipQueryDto.getPageNum() - 1) * shipQueryDto.getPageSize();
    sourceBuilder.from(from);
    sourceBuilder.size(shipQueryDto.getPageSize());
    sourceBuilder.trackTotalHits(true);
    searchRequest.source(sourceBuilder);

    // Diagnostic output belongs at debug level, not error level
    log.debug("Query DSL: {}", searchRequest.source());
    SearchResponse searchRes = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    log.debug("Raw response: {}", searchRes);

    SearchHit[] hits = searchRes.getHits().getHits();
    List<T> searchShipCbgkDtos = new ArrayList<>();
    for (SearchHit hit : hits) {
        String json = hit.getSourceAsString();
        T shipDto = JSONObject.parseObject(json, clazz);
        // Overwrite the plain values with their highlighted versions
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (CollectionUtil.isNotEmpty(highlightFields)) {
            // Fields inherited from the direct superclass, then the class's own fields
            applyHighlights(shipDto, clazz.getSuperclass().getDeclaredFields(), highlightFields);
            applyHighlights(shipDto, clazz.getDeclaredFields(), highlightFields);
        }
        searchShipCbgkDtos.add(shipDto);
    }

    SearchDto<T> searchDto = new SearchDto<>();
    searchDto.setTotal(searchRes.getHits().getTotalHits().value);
    searchDto.setSearchShips(searchShipCbgkDtos);
    return searchDto;
}

/** Copies the highlighted text onto every DTO field whose name appears in the highlight map. */
private static void applyHighlights(Object dto, Field[] fields, Map<String, HighlightField> highlightFields)
        throws IllegalAccessException {
    for (Field field : fields) {
        HighlightField highlightField = highlightFields.get(field.getName());
        if (highlightField == null) {
            continue;
        }
        StringBuilder text = new StringBuilder();
        for (Text fragment : highlightField.fragments()) {
            text.append(fragment.toString());
        }
        field.setAccessible(true);
        // Assumes the target field is a String, as in the original code
        field.set(dto, text.toString());
    }
}

@Override
public SearchDto<SearchShipCbgkDto> searchShip(ShipQueryDto shipQueryDto)
        throws IOException, IllegalAccessException {
    return getResult(shipQueryDto, EsIndex.INDEX_SEAT_SEARCH_SHIP_CBGK.getStatus(), SearchShipCbgkDto.class);
}
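The `from` computation in the service above turns a 1-based page number into a zero-based offset. The sketch below isolates that arithmetic and adds input validation that the original code does not perform. The names `PageOffsetSketch`, `fromOffset` and `MAX_RESULT_WINDOW` are illustrative and not part of the original project; the 10,000 limit assumes the index keeps Elasticsearch's default `index.max_result_window` setting.

```java
final class PageOffsetSketch {
    // Assumption: the index uses the default index.max_result_window of 10,000.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Converts a 1-based page number into the zero-based "from" offset. */
    static int fromOffset(int pageNum, int pageSize) {
        if (pageNum < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNum and pageSize must be >= 1");
        }
        // Use long arithmetic so that large page numbers cannot overflow int.
        long from = (long) (pageNum - 1) * pageSize;
        // from + size must stay within the result window, otherwise the search is rejected.
        if (from + pageSize > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(fromOffset(1, 10)); // 0
        System.out.println(fromOffset(3, 10)); // 20
    }
}
```

Rejecting a bad page up front gives the caller a clear error instead of a failed search request. For paging deeper than the result window, a different mechanism such as `search_after` would be needed.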
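import lombok.Data;
import java.util.List;

/**
 * Search result wrapper.
 * @param <T> type of the returned items
 */
@Data
public class SearchDto<T> {
    /** Total number of hits in this index */
    private Long total;
    /** Items returned from this index */
    private List<T> searchShips;
}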
import java.util.HashMap;
import java.util.Map;

/**
 * Fields matched by the full-text search, with their weights (boosts).
 */
public class EsSmartIndexHelper {
    public static Map<String, Float> shipCbgkfields = new HashMap<String, Float>();
    public static HashMap<Class<? extends BaseSearchDto>, Map<String, Float>> classMapMap =
            new HashMap<Class<? extends BaseSearchDto>, Map<String, Float>>();

    static {
        // Ship index
        classMapMap.put(SearchShipCbgkDto.class, shipCbgkfields);
        shipCbgkfields.put("shipName", 2.5f);
        // "shipId",
        shipCbgkfields.put("shipRegistryPort", 1.8f);
        // "shipOwnerId",
        shipCbgkfields.put("shipOwnerName", 1.5f);
        shipCbgkfields.put("shipOwnerSex", 1f);
        shipCbgkfields.put("shipOwnerTel", 1f);
        shipCbgkfields.put("shipOwnerIdNumber", 1.1f);
        shipCbgkfields.put("deptId", 1f);
        shipCbgkfields.put("createTime", 1f);
        shipCbgkfields.put("bdsTerminalNo", 1.3f);
        shipCbgkfields.put("mmsi", 1.3f);
    }
}
2.2 The search DSL
Request sent to the search endpoint:
GET /index/queryShip?keys=琼海口渔&pageNum=1&pageSize=10
DSL query:
GET index_test_search_ship/_search
{
  "from": 0,
  "size": 20,
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "琼海口渔",
            "fields": [
              "bdsTerminalNo^1.3", "createTime^1.0", "deptId^1.0", "mmsi^1.3", "shipName^2.5",
              "shipOwnerIdNumber^1.1", "shipOwnerName^1.5", "shipOwnerSex^1.0", "shipOwnerTel^1.0",
              "shipRegistryPort^1.8"
            ],
            "type": "cross_fields",
            "operator": "OR",
            "analyzer": "ik_max_word",
            "slop": 0,
            "prefix_length": 0,
            "max_expansions": 50,
            "zero_terms_query": "NONE",
            "auto_generate_synonyms_phrase_query": true,
            "fuzzy_transpositions": true,
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "track_total_hits": 2147483647,
  "highlight": {
    "pre_tags": ["<span style='color:red;'>"],
    "post_tags": ["</span>"],
    "number_of_fragments": 0,
    "type": "plain",
    "require_field_match": false,
    "fields": {
      "shipOwnerName": {},
      "shipOwnerTel": {},
      "createTime": {},
      "mmsi": {},
      "bdsTerminalNo": {},
      "deptId": {},
      "shipName": {},
      "shipOwnerSex": {},
      "shipOwnerIdNumber": {},
      "shipRegistryPort": {}
    }
  }
}
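The `fields` array in the DSL above is the weight map from `EsSmartIndexHelper` rendered in `field^boost` form. The Elasticsearch client performs that serialization itself; the following plain-Java sketch only illustrates the mapping. The class `BoostedFieldsSketch` and the method `toBoostedFields` are hypothetical helpers written for this illustration, and sorting by field name is an assumption made to mirror the order seen in the DSL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BoostedFieldsSketch {
    /** Renders each (field, weight) pair as "field^weight", ordered by field name. */
    static List<String> toBoostedFields(Map<String, Float> weights) {
        List<String> result = new ArrayList<>();
        // TreeMap iterates its keys in natural String order.
        for (Map.Entry<String, Float> entry : new TreeMap<>(weights).entrySet()) {
            result.add(entry.getKey() + "^" + entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new HashMap<>();
        weights.put("shipName", 2.5f);
        weights.put("shipRegistryPort", 1.8f);
        weights.put("createTime", 1f);
        System.out.println(toBoostedFields(weights));
    }
}
```

A higher boost makes a hit in that field contribute more to the relevance score, which is why matches in `shipName` (2.5) rank above matches in lower-weighted fields.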
2.3 Raw JSON response
{
  "took": 4,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 20, "relation": "eq"},
    "max_score": 7.7624564,
    "hits": [
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "bpMhcYQB4gQEvltnaqX-", "_score": 7.7624564,
        "_source": {"shipId": "01", "shipName": "琼海口渔", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "b5MhcYQB4gQEvltnbaUM", "_score": 7.7624564,
        "_source": {"shipId": "01", "shipName": "琼海口渔", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "U5PBb4QB4gQEvltnIKV-", "_score": 7.0790462,
        "_source": {"shipId": "01", "shipName": "013234琼海口渔", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["013234<span style='color:red;'>琼海口渔</span>"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "VJPEb4QB4gQEvltnm6Uz", "_score": 7.0790462,
        "_source": {"shipId": "01", "shipName": "013913琼海口渔", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["013913<span style='color:red;'>琼海口渔</span>"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "bZMhcYQB4gQEvltnQKVb", "_score": 7.0790462,
        "_source": {"shipId": "01", "shipName": "琼海口渔013", "shipOwnerName": "", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>013"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "a5MccYQB4gQEvltnY6Ur", "_score": 7.0790462,
        "_source": {"shipId": "01", "shipName": "琼海口渔013", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>013"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "bJMccYQB4gQEvltnZaV1", "_score": 7.0790462,
        "_source": {"shipId": "01", "shipName": "琼海口渔013", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>013"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "VZPbb4QB4gQEvltnraU6", "_score": 6.506234,
        "_source": {"shipId": "01", "shipName": "013913琼海口渔", "shipOwnerName": "013913琼海口渔", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipOwnerName": ["013913<span style='color:red;'>琼海口渔</span>"], "shipName": ["013913<span style='color:red;'>琼海口渔</span>"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "apMWcYQB4gQEvltnT6Vt", "_score": 6.019184,
        "_source": {"shipId": "01", "shipName": "琼海口渔013 李宁", "shipOwnerName": "12341", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>013 李宁"]}
      },
      {
        "_index": "index_test_search_ship", "_type": "_doc", "_id": "cpNRcYQB4gQEvltnQ6Xw", "_score": 6.019184,
        "_source": {"shipId": "01", "shipName": "琼海口渔013 李宁", "shipOwnerName": "李宁", "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "fullText": "01 解决12345 时代 15173934187 430525199408136134"},
        "highlight": {"shipName": ["<span style='color:red;'>琼海口渔</span>013 李宁"]}
      }
    ]
  }
}
2.4 Formatted API response
{
  "code": "SUCCESS",
  "businessCode": "0",
  "message": "操作成功",
  "data": {
    "total": 20,
    "searchShips": [
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "013234<span style='color:red;'>琼海口渔</span>", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "013913<span style='color:red;'>琼海口渔</span>", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>013", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>013", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>013", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "013913<span style='color:red;'>琼海口渔</span>", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "013913<span style='color:red;'>琼海口渔</span>", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>013 李宁", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "12341", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null},
      {"shipId": "01", "shipName": "<span style='color:red;'>琼海口渔</span>013 李宁", "shipRegistryPort": null, "shipOwnerId": null, "shipOwnerName": "李宁", "shipOwnerSex": null, "shipOwnerTel": "15173934187", "shipOwnerIdNumber": "430525199408136134", "deptId": null, "createTime": null, "bdsTerminalNo": null, "mmsi": null}
    ]
  }
}
As the output shows, every returned field that matched at least one token of the analyzed keyword comes back highlighted.
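The highlighted values reach the DTO because the service copies each highlight entry onto the field of the same name through reflection. The sketch below shows that step in isolation, using a plain `Map<String, List<String>>` as a stand-in for Elasticsearch's highlight map. The class `HighlightMergeSketch`, its nested `BaseDto` and `ShipDto`, and the `<em>` tags are assumptions made for illustration. Unlike the original service, this version walks the whole class hierarchy rather than one superclass level and skips fields that are not `String`, since assigning a `String` to a field of another type would fail.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class HighlightMergeSketch {
    // Hypothetical DTOs standing in for the project's search DTO and its base class.
    static class BaseDto { String shipName; }
    static class ShipDto extends BaseDto { String shipOwnerName; Integer deptId; }

    /** Overwrites each String field that has a highlight entry with the joined fragments. */
    static void applyHighlights(Object dto, Map<String, List<String>> highlights) {
        // Walk from the concrete class up to, but not including, Object.
        for (Class<?> c = dto.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                List<String> fragments = highlights.get(field.getName());
                boolean writable = field.getType() == String.class
                        && !Modifier.isStatic(field.getModifiers());
                if (fragments == null || !writable) {
                    continue; // no highlight for this field, or not safe to overwrite
                }
                try {
                    field.setAccessible(true);
                    field.set(dto, String.join("", fragments));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot set field " + field.getName(), e);
                }
            }
        }
    }

    public static void main(String[] args) {
        ShipDto dto = new ShipDto();
        dto.shipName = "ship013";
        dto.shipOwnerName = "owner";
        Map<String, List<String>> highlights = new HashMap<>();
        highlights.put("shipName", Arrays.asList("<em>ship</em>013"));
        applyHighlights(dto, highlights);
        System.out.println(dto.shipName + " / " + dto.shipOwnerName);
    }
}
```

Fields without a highlight entry keep the value parsed from `_source`, which matches the API response above where only the matched fields carry the `<span>` markup.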