Advanced, Part 18: Deep Dive into Search Techniques: Proximity Matching with the slop Parameter, How It Works, and Experiments
A first taste
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 1
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
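What does slop (the number of moves) mean?
The terms of the query string (the search text) may have to be shifted some number of positions before they line up with a document. That number of moves is the slop.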
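A worked example of slop moves
A concrete example: count how many moves a query string needs before it matches a document, then set slop accordingly.
Document: hello world, java is very good, spark is also very good.
Searching for java spark with a plain match_phrase finds nothing, because the two terms are not adjacent in the document.
If we specify slop, the terms of java spark are allowed to move in an attempt to match the doc:
java   is     very   good   spark  is
java   spark
java   -->    spark                       (1 move)
java          -->    spark                (2 moves)
java                 -->    spark         (3 moves)
Here the slop is 3: within the phrase java spark, spark has to move 3 positions before the phrase matches the doc.
slop does not mean that the query terms move exactly that many times to match a doc. It is the maximum number of moves the query terms may make while trying to match a doc.
So with slop set to 3, the match succeeds: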
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "title": {
        "query": "java spark",
        "slop": 3
      }
    }
  }
}
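This matches the doc above, and it is returned in the results.
With slop set to 2, however, spark may move at most 2 positions, which is not enough to reach its place in the doc, so the doc is not returned.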
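The move counting above can be reproduced with a few lines of Python. This is only a simplified model of sloppy phrase matching, not Elasticsearch's actual implementation: the `tokenize` and `min_slop` helpers are hypothetical names, the tokenizer is a rough stand-in for a real analyzer, and the model ignores edge cases such as a term repeated inside the query. It treats the required slop as the spread between the largest and smallest shift of the query terms.

```python
from itertools import product
import re


def tokenize(text):
    # Rough stand-in for an analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def min_slop(query, doc):
    """Smallest slop at which this simplified model matches, or None."""
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    # Every position at which each query term occurs in the document.
    occurrences = []
    for term in q_terms:
        found = [i for i, word in enumerate(d_terms) if word == term]
        if not found:
            return None  # a missing term can never be matched by moving
        occurrences.append(found)
    best = None
    for combo in product(*occurrences):
        # How far each term sits from its slot in the query phrase.
        shifts = [doc_pos - query_pos for query_pos, doc_pos in enumerate(combo)]
        cost = max(shifts) - min(shifts)
        best = cost if best is None else min(best, cost)
    return best


doc = "hello world, java is very good, spark is also very good."
print(min_slop("java spark", doc))  # prints 3
```

Under this model a document matches whenever `min_slop(query, doc)` is not `None` and is no greater than the `slop` passed in the request, which is why 3 succeeds and 2 fails in the example above.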
Experiments: verifying what slop means
Experiment 1
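GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
Note: this request is the same as in Experiment 3, which returns one hit. An empty result fits a slop below 3, so the slop value recorded for this run may differ from what was actually executed.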
Experiment 2
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 2
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": { "total": 0, "max_score": null, "hits": [] }
}
Experiment 3
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "spark data",
        "slop": 3
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.21824157,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.21824157,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith"
        }
      }
    ]
  }
}
Document content: spark is best big data solution based on scala ,an programming language similar to java spark
spark  is     best   big    data
spark  data
spark  -->    data                 (1 move)
spark         -->    data          (2 moves)
spark                -->    data   (3 moves)
Experiment 4 (extended)
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "data spark",
        "slop": 5
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.154366,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.154366,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith"
        }
      }
    ]
  }
}
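spark  is     best   big    data
data   spark
       data/spark                  (1 move: data lands on spark's position)
spark  data                        (2 moves: the two terms have swapped)
spark  -->    data                 (3 moves)
spark         -->    data          (4 moves)
spark                -->    data   (5 moves)
Here the query terms are in the opposite order from the document. Swapping them costs 2 moves, and data then needs 3 more to reach its place, so the required slop is 5.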
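The cost of the reversed order can be checked by looking at each term's shift, that is, its position in the document minus its slot in the query. The sketch below is an illustration under assumptions, not Elasticsearch code: `required_slop` is a hypothetical helper, the positions are typed in by hand, and only the first occurrence of spark (position 0) is considered.

```python
def required_slop(query_terms, doc_positions):
    # Shift of each term: its position in the doc minus its slot in the query.
    shifts = {term: doc_positions[term] - slot
              for slot, term in enumerate(query_terms)}
    # The phrase as a whole may slide freely, so only the spread matters.
    return shifts, max(shifts.values()) - min(shifts.values())


# Positions of the two terms in "spark is best big data ..."
positions = {"spark": 0, "data": 4}

print(required_slop(["spark", "data"], positions))
# ({'spark': 0, 'data': 3}, 3)
print(required_slop(["data", "spark"], positions))
# ({'data': 4, 'spark': -1}, 5)
```

In the reversed query spark has to move one position to the left while data moves four to the right, and the two shifts add up to the 5 moves counted in the diagram.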
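With a slop search, the closer together the keywords are in a document, the higher its relevance score. The following experiment demonstrates this.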
Result (the request that produced it is not shown; three documents are returned, ranked by score):
{
  "took": 4,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1.3728157,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 1.3728157,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.5753642,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "1",
        "_score": 0.28582606,
        "_source": {
          "articleID": "XHDK-A-1293-#fJ3",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-01",
          "tag": [ "java", "hadoop" ],
          "tag_cnt": 2,
          "view_cnt": 30,
          "title": "this is java and elasticsearch blog",
          "content": "i like to write best elasticsearch article",
          "sub_title": "learning more courses",
          "author_first_name": "Peter",
          "author_last_name": "Smith",
          "new_author_last_name": "Smith",
          "new_author_first_name": "Peter"
        }
      }
    ]
  }
}
Experiment
GET /forum/article/_search
{
  "query": {
    "match_phrase": {
      "content": {
        "query": "java best",
        "slop": 15
      }
    }
  }
}
Result:
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.65380025,
    "hits": [
      {
        "_index": "forum",
        "_type": "article",
        "_id": "2",
        "_score": 0.65380025,
        "_source": {
          "articleID": "KDKE-B-9947-#kL5",
          "userID": 1,
          "hidden": false,
          "postDate": "2017-01-02",
          "tag": [ "java" ],
          "tag_cnt": 1,
          "view_cnt": 50,
          "title": "this is java blog",
          "content": "i think java is the best programming language",
          "sub_title": "learned a lot of course",
          "author_first_name": "Smith",
          "author_last_name": "Williams",
          "new_author_last_name": "Williams",
          "new_author_first_name": "Smith"
        }
      },
      {
        "_index": "forum",
        "_type": "article",
        "_id": "5",
        "_score": 0.07111243,
        "_source": {
          "articleID": "DHJK-B-1395-#Ky5",
          "userID": 3,
          "hidden": false,
          "postDate": "2017-03-01",
          "tag": [ "elasticsearch" ],
          "tag_cnt": 1,
          "view_cnt": 10,
          "title": "this is spark blog",
          "content": "spark is best big data solution based on scala ,an programming language similar to java spark",
          "sub_title": "haha, hello world",
          "author_first_name": "Tonny",
          "author_last_name": "Peter Smith",
          "new_author_last_name": "Peter Smith",
          "new_author_first_name": "Tonny"
        }
      }
    ]
  }
}
Both documents match, but document 2, where java and best are only a few words apart, scores far higher (0.65380025) than document 5 (0.07111243), where the two terms are far apart.
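In fact, a phrase match with slop is a proximity match.
1. phrase match: the phrase java spark must appear in the doc exactly as written.
2. proximity match: java and spark may be some distance apart, but the closer they are, the higher the doc ranks.
The terms of the search phrase are moved until they line up with the document content.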
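The ranking effect can be sketched with the weight that Lucene's classic similarity assigns to a sloppy match, 1 / (distance + 1). This is a rough illustration only: `proximity_weight` and `rank` are hypothetical helpers, and the real score also depends on factors such as term rarity and field length, so the numbers below are not Elasticsearch scores. The distances are worked out by hand for the query java best: in document 2, java is at position 2 and best at position 5, giving a distance of 2; in document 5, best is at position 2 and java at position 14, giving a distance of 13.

```python
def proximity_weight(distance):
    # Closer terms give a larger weight; an exact phrase (distance 0) gives 1.0.
    return 1.0 / (1 + distance)


def rank(distances, slop):
    # Keep only documents reachable within `slop` moves, best match first.
    hits = {doc_id: proximity_weight(d)
            for doc_id, d in distances.items() if d <= slop}
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)


# Hand-computed move distances for the query "java best" (see lead-in).
distances = {"2": 2, "5": 13}

for doc_id, weight in rank(distances, slop=15):
    print(doc_id, round(weight, 4))
# 2 0.3333
# 5 0.0714
```

With a smaller slop such as 5, `rank(distances, slop=5)` keeps only document 2, because document 5 needs 13 moves.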