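Focused on demystifying the core technologies of big data and container cloud, and able to provide full-stack big data + cloud-native platform consulting. Please keep following this blog series. For academic exchange, feel free to get in touch at any time. For more content, follow the WeChat official account 《数据云技术社区》 (Data Cloud Technology Community).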

1 Building the example

POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"author_first_name" : "Peter", "author_last_name" : "Smith"} }
{ "update": { "_id": "2"} }
{ "doc" : {"author_first_name" : "Smith", "author_last_name" : "Williams"} }
{ "update": { "_id": "3"} }
{ "doc" : {"author_first_name" : "Jack", "author_last_name" : "Ma"} }
{ "update": { "_id": "4"} }
{ "doc" : {"author_first_name" : "Robbin", "author_last_name" : "Li"} }
{ "update": { "_id": "5"} }
{ "doc" : {"author_first_name" : "Tonny", "author_last_name" : "Peter Smith"} }

To implement cross-fields search, add two new name fields that copy their values into a combined field with copy_to:
PUT /forum/_mapping/article
{
  "properties": {
    "new_author_first_name": {"type": "string", "copy_to": "new_author_full_name"},
    "new_author_last_name": {"type": "string", "copy_to": "new_author_full_name"},
    "new_author_full_name": {"type": "string"}
  }
}

In practice, the effect is not great. (On Elasticsearch 5.x and later, the string type is replaced by text.)
POST /forum/article/_bulk
{ "update": { "_id": "1"} }
{ "doc" : {"new_author_first_name" : "Peter", "new_author_last_name" : "Smith"} }
{ "update": { "_id": "2"} }
{ "doc" : {"new_author_first_name" : "Smith", "new_author_last_name" : "Williams"} }
{ "update": { "_id": "3"} }
{ "doc" : {"new_author_first_name" : "Jack", "new_author_last_name" : "Ma"} }
{ "update": { "_id": "4"} }
{ "doc" : {"new_author_first_name" : "Robbin", "new_author_last_name" : "Li"} }
{ "update": { "_id": "5"} }
{ "doc" : {"new_author_first_name" : "Tonny", "new_author_last_name" : "Peter Smith"} }

Resulting new_author_full_name values:
doc 1 --> Peter Smith
doc 2 --> Smith Williams
doc 3 --> Jack Ma
doc 4 --> Robbin Li
doc 5 --> Tonny Peter Smith

GET /forum/article/_search
{
  "query": {"match": {"new_author_full_name": "Peter Smith"}}
}

Testing phrase matching:
POST /forum/article/5/_update
{
  "doc": {"content": "spark is best big data solution based on scala ,an programming language similar to java spark"}
}

With a plain match query on content, docs that contain only java are returned too, which is not the result we want:
GET /forum/article/_search
{
  "query": {"match": {"content": "java spark"}}
}
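Both searches above behave this way because a match query analyzes the query text into terms. With its default OR operator, it returns any doc that contains at least one of those terms. The Python sketch below models that under simplifying assumptions:

- `analyze` is a rough stand-in for the standard analyzer (lowercase, then split on non-alphanumerics).
- copy_to is modeled by joining the first and last name.
- Only the two content values visible in this article's search responses (docs 2 and 5) are used.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

def match_or(query, docs):
    # match query with the default OR operator: a doc matches if it shares any term with the query
    terms = set(analyze(query))
    return sorted(doc_id for doc_id, text in docs.items() if terms & set(analyze(text)))

# new_author_first_name / new_author_last_name from the bulk update above
authors = {
    1: ("Peter", "Smith"),
    2: ("Smith", "Williams"),
    3: ("Jack", "Ma"),
    4: ("Robbin", "Li"),
    5: ("Tonny", "Peter Smith"),
}
# copy_to modeled as "both values end up in new_author_full_name"
full_names = {doc_id: " ".join(names) for doc_id, names in authors.items()}

# content values taken from the search responses shown later in this article
contents = {
    2: "i think java is the best programming language",
    5: "spark is best big data solution based on scala ,an programming language similar to java spark",
}

print(match_or("Peter Smith", full_names))  # [1, 2, 5]
print(match_or("java spark", contents))     # [2, 5]
```

Doc 2 ("Smith Williams") matches "Peter Smith", and doc 2 matches "java spark" with java alone. This is exactly the looseness that match_phrase removes.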

2 Phrase matching (match_phrase)

  • Requirement: return only docs that contain the phrase "java spark"; docs that contain only java must not be returned.
GET /forum/article/_search
{
  "query": {"match_phrase": {"content": "java spark"}}
}
  • What term position means:
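hello world, java spark       doc1
hi, spark java                doc2

Terms and their positions, as doc(position):
hello       doc1(0)
hi          doc2(0)
world       doc1(1)
java        doc1(2) doc2(2)
spark       doc1(3) doc2(1)

To see which position the analyzer assigns to each term, run:

GET _analyze
{
  "text": "hello world, java spark",
  "analyzer": "standard"
}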
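match_phrase relies on these positions. A doc matches only if the query terms occur at consecutive positions, in query order. Here is a minimal sketch of that check over a positional inverted index. It again assumes `analyze` roughly approximates the standard analyzer, and it uses plain Python dicts where Lucene uses postings lists. `build_index` and `match_phrase` are hypothetical helpers.

```python
import re
from collections import defaultdict

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    # Positional inverted index: term -> {doc_id: [positions]}
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(analyze(text)):
            index[term][doc_id].append(pos)
    return index

def match_phrase(index, phrase):
    # A doc matches if the terms sit at positions p, p+1, p+2, ... in query order
    terms = analyze(phrase)
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])  # the doc must contain every term
    hits = []
    for doc_id in sorted(candidates):
        for start in index[terms[0]][doc_id]:
            if all(start + i in index[t][doc_id] for i, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return hits

docs = {"doc1": "hello world, java spark", "doc2": "hi, spark java"}
index = build_index(docs)
print(dict(index["java"]))                # {'doc1': [2], 'doc2': [2]}
print(match_phrase(index, "java spark"))  # ['doc1']
```

Both docs contain java and spark. Only doc1 has spark at the position directly after java, so only doc1 is a phrase match.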

3 Proximity matching (slop)

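  • Slop is the number of moves the terms of the query string have to make before they line up with a document.
  • More precisely, slop is not the exact number of moves a particular match needs. It is the maximum number of moves the query terms are allowed to make when trying to match a doc.
  • In a slop search, the closer the terms are to each other in a doc, the higher its relevance score.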
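One way to make "number of moves" concrete is to compare shifted positions. For each query term, subtract its offset in the query from its position in the doc. The spread (max minus min) of those shifted positions is how far the terms must move. A doc matches when the smallest such spread is at most slop.

The sketch below is a simplified model of Lucene's sloppy phrase matching under that assumption, and it ignores repeated query terms. It is applied to doc 5's content. `phrase_distance` and `matches_with_slop` are hypothetical helpers, and `analyze` only approximates the standard analyzer.

```python
import re
from itertools import product

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_distance(text, phrase):
    """Smallest number of moves needed to line the query terms up with the doc.

    Each term's doc position is shifted by its offset in the query; the spread
    (max - min) of the shifted positions is the distance. None if a term is absent.
    """
    tokens = analyze(text)
    terms = analyze(phrase)
    positions = [[p for p, tok in enumerate(tokens) if tok == t] for t in terms]
    if any(not ps for ps in positions):
        return None
    best = None
    for combo in product(*positions):  # one occurrence per query term
        shifted = [p - offset for offset, p in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best

def matches_with_slop(text, phrase, slop):
    d = phrase_distance(text, phrase)
    return d is not None and d <= slop

doc5 = "spark is best big data solution based on scala ,an programming language similar to java spark"
print(phrase_distance(doc5, "spark data"))       # 3
print(matches_with_slop(doc5, "spark data", 2))  # False
print(matches_with_slop(doc5, "spark data", 3))  # True
```

In doc 5, spark is at position 0 and data at position 4. The term data has to move from query position 1 to 4, which takes three moves, so slop 3 is the smallest value that matches.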
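GET /forum/article/_search
{
  "query": {"match_phrase": {"content": {"query": "spark data", "slop": 3}}}
}

Doc 5 content: spark is best big data solution based on scala ,an programming language similar to java spark

spark(0) is(1) best(2) big(3) data(4) ...   <- doc 5
spark(0) data(1)                             <- query
spark(0) --> data(2)                         <- move 1
spark(0) --> data(3)                         <- move 2
spark(0) --> data(4)                         <- move 3: lines up with data in doc 5, so slop 3 is enough

A second example, with a larger slop:

GET /forum/article/_search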
{
  "query": {"match_phrase": {"content": {"query": "java best", "slop": 15}}}
}

Response:

{
  "took": 3, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 0.65380025,
    "hits": [
      {"_index": "forum", "_type": "article", "_id": "2", "_score": 0.65380025,
       "_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "2017-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith"}},
      {"_index": "forum", "_type": "article", "_id": "5", "_score": 0.07111243,
       "_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "2017-03-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny"}}
    ]
  }
}

Both docs match. Doc 2, where java and best are close together, scores far higher than doc 5, where best comes long before java.
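The ranking above follows from how a sloppy match is scored. Lucene weighs each sloppy match by roughly 1 / (1 + distance), so closer terms contribute more. The sketch below computes that distance and weight for the two-term phrase "java best" in docs 2 and 5.

It deliberately leaves out idf, field-length norms and the rest of the real _score. Its numbers are therefore not the _score values in the response; only the order is meant to agree. `two_term_distance` is a hypothetical helper, and `analyze` only approximates the standard analyzer.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

def two_term_distance(text, first, second):
    # Moves needed so that `second` sits directly after `first`:
    # |(pos_second - 1) - pos_first|, minimised over all occurrences
    tokens = analyze(text)
    a = [p for p, t in enumerate(tokens) if t == first]
    b = [p for p, t in enumerate(tokens) if t == second]
    if not a or not b:
        return None
    return min(abs((pb - 1) - pa) for pa in a for pb in b)

docs = {
    2: "i think java is the best programming language",
    5: "spark is best big data solution based on scala ,an programming language similar to java spark",
}

slop = 15
results = {}
for doc_id, text in docs.items():
    d = two_term_distance(text, "java", "best")
    if d is not None and d <= slop:
        # sloppy weight: the closer the match, the larger its contribution
        results[doc_id] = (d, 1 / (1 + d))

ranking = sorted(results, key=lambda k: results[k][1], reverse=True)
print({k: v[0] for k, v in results.items()})  # {2: 2, 5: 13}
print(ranking)                                # [2, 5]
```

Doc 5 needs 13 moves because best (position 2) precedes java (position 14). That is still within slop 15, so doc 5 is returned, but with a much smaller weight than doc 2.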

4 Prioritizing recall

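  • Prioritizing recall means that for the query java spark, docs containing java are returned, docs containing spark are returned, and docs containing both are returned. Precision is still taken into account: docs that contain both java and spark, with the two terms closer together, rank first.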
GET /forum/article/_search
{
  "query": {
    "bool": {
      "must": [{"match": {"content": "java spark"}}],
      "should": [{"match_phrase": {"content": {"query": "java spark", "slop": 50}}}]
    }
  }
}

Response:

{
  "took": 5, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.258609,
    "hits": [
      {"_index": "forum", "_type": "article", "_id": "5", "_score": 1.258609,
       "_source": {"articleID": "DHJK-B-1395-#Ky5","userID": 3,"hidden": false,"postDate": "2017-03-01","tag": ["elasticsearch"],"tag_cnt": 1,"view_cnt": 10,"title": "this is spark blog","content": "spark is best big data solution based on scala ,an programming language similar to java spark","sub_title": "haha, hello world","author_first_name": "Tonny","author_last_name": "Peter Smith","new_author_last_name": "Peter Smith","new_author_first_name": "Tonny","followers": ["Jack","Robbin Li"]}},
      {"_index": "forum", "_type": "article", "_id": "2", "_score": 0.68640786,
       "_source": {"articleID": "KDKE-B-9947-#kL5","userID": 1,"hidden": false,"postDate": "2017-01-02","tag": ["java"],"tag_cnt": 1,"view_cnt": 50,"title": "this is java blog","content": "i think java is the best programming language","sub_title": "learned a lot of course","author_first_name": "Smith","author_last_name": "Williams","new_author_last_name": "Williams","new_author_first_name": "Smith","followers": ["Tom","Jack"]}}
    ]
  }
}

Doc 5 contains java and spark next to each other, so it also matches the should clause and ranks first. Doc 2 contains only java; it is still returned through the must clause, but with a lower score.
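The bool query splits the work between its two clauses. The must clause (match, OR semantics) decides which docs are returned, which is the recall part. The should clause (match_phrase with slop) only adds score to docs that also satisfy the phrase, which is the precision part.

Below is a toy model of that split. The must score is simply the number of distinct query terms present, and the should bonus is 1 / (1 + distance) within slop. Both are hypothetical stand-ins for the real relevance scores, and this sketch only handles two-term queries.

```python
import re

def analyze(text):
    # Rough stand-in for the standard analyzer: lowercase, then split on non-alphanumerics
    return re.findall(r"[a-z0-9]+", text.lower())

def two_term_distance(tokens, first, second):
    # Moves needed to place `second` directly after `first`; None if either term is missing
    a = [p for p, t in enumerate(tokens) if t == first]
    b = [p for p, t in enumerate(tokens) if t == second]
    if not a or not b:
        return None
    return min(abs((pb - 1) - pa) for pa in a for pb in b)

def search(docs, query, slop):
    first, second = analyze(query)  # toy version: exactly two query terms
    scored = {}
    for doc_id, text in docs.items():
        tokens = analyze(text)
        must = len({first, second} & set(tokens))  # hypothetical stand-in for the match score
        if must == 0:
            continue  # must clause: docs with no query term are not returned
        d = two_term_distance(tokens, first, second)
        should = 1 / (1 + d) if d is not None and d <= slop else 0.0  # proximity bonus
        scored[doc_id] = must + should
    return sorted(scored, key=scored.get, reverse=True)

docs = {
    2: "i think java is the best programming language",
    5: "spark is best big data solution based on scala ,an programming language similar to java spark",
}
print(search(docs, "java spark", slop=50))  # [5, 2]
```

A large slop such as 50 keeps the should clause permissive. Any doc containing both terms gets some bonus, and the bonus shrinks as the terms drift apart.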

5 Summary

A short written note: reviewing the old to learn the new.


Reposted from: https://juejin.im/post/5d62ab6f5188253961299c74
