Contents

Overview
Official documentation
What is an ngram
What is an edge ngram
How ngrams enable index-time search suggestions
Example

Overview

Continuing to learn ES from instructor 中华石杉; this is part 23.

Course URL: https://www.roncoo.com/view/55

Official documentation

NGram Tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html

NGram Token Filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html

Edge NGram Tokenizer:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenizer.html

Edge NGram Token Filter:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-edgengram-tokenfilter.html

What is an ngram

Take the word quick. Its ngrams at each of the five possible lengths are:

ngram length=1: q u i c k
ngram length=2: qu ui ic ck
ngram length=3: qui uic ick
ngram length=4: quic uick
ngram length=5: quick

Each of these pieces is called an ngram.
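To make the splitting concrete, here is a minimal Python sketch. It assumes plain character-level ngrams of one fixed length; the function name ngrams is hypothetical and is not part of any Elasticsearch API.

```python
def ngrams(word: str, n: int) -> list[str]:
    """Return every contiguous substring of `word` of length `n` (a sliding window)."""
    # A word of length L has L - n + 1 windows of size n, and none if n > L.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    # Reproduce the list above for "quick", lengths 1 through 5.
    for n in range(1, len("quick") + 1):
        print(f"ngram length={n}:", " ".join(ngrams("quick", n)))
```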

What is an edge ngram

For quick, anchor at the first letter and generate ngrams from there:

q
qu
qui
quic
quick

This way of splitting is called edge ngram.
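A minimal Python sketch of edge ngram generation. The parameters min_gram and max_gram are named after the settings of the edge_ngram token filter, but the function edge_ngrams itself is a hypothetical illustration, not Elasticsearch's implementation.

```python
from typing import Optional


def edge_ngrams(word: str, min_gram: int = 1, max_gram: Optional[int] = None) -> list[str]:
    """Return the prefixes of `word` whose lengths fall in [min_gram, max_gram]."""
    if max_gram is None:
        max_gram = len(word)
    # A prefix cannot be longer than the word itself.
    upper = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, upper + 1)]


if __name__ == "__main__":
    print(edge_ngrams("quick"))             # every prefix of "quick"
    print(edge_ngrams("helloworld", 1, 3))  # only prefixes of length 1 to 3
```

The second call mirrors the helloworld case with min_gram = 1 and max_gram = 3: only the first three prefixes are produced.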

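Edge ngram splits every word further into such prefixes, and these ngrams are then used to implement prefix-based search suggestions.

For example, take two docs:
doc1 hello world
doc2 hello we

Split them with edge ngram: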

h
he
hel
hell
hello -------> matches doc1, doc2

w -------> matches doc1, doc2
wo
wor
worl
world -------> matches doc1
we ---------> matches doc2

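Search with hello w:

hello --> matches the term hello in doc1
w --> matches the term w in doc1

doc1 contains both hello and w, and their positions also line up (w comes right after hello), so doc1, hello world, is returned. By the same reasoning doc2, hello we, is returned too, because w is also an edge ngram of we at the position right after hello.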

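How ngrams enable index-time search suggestions

At search time, you no longer take a prefix and scan the inverted index for every term that starts with it. Instead, the prefix is simply looked up in the inverted index as an ordinary term, because it was already stored as one at index time. If it matches, that's it, exactly like a full-text match query.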
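The difference can be sketched with a toy in-memory inverted index in Python. This is a simplified model, not how Lucene stores terms: words are split on whitespace and lowercased, edge ngrams up to length 20 are indexed, and scan_prefix only stands in for prefix-style term expansion (Lucene actually seeks into a sorted term dictionary rather than checking every term). All function names are hypothetical.

```python
from collections import defaultdict


def edge_ngrams(word: str, max_gram: int = 20) -> list[str]:
    """Prefixes of `word` of length 1..max_gram."""
    return [word[:n] for n in range(1, min(max_gram, len(word)) + 1)]


def build_index(docs: dict[int, str], use_edge_ngrams: bool = True) -> dict[str, set[int]]:
    """Map each term to the set of doc ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # Index time: with edge ngrams, every prefix becomes a term of its own.
            terms = edge_ngrams(word) if use_edge_ngrams else [word]
            for term in terms:
                index[term].add(doc_id)
    return dict(index)


def lookup_prefix(ngram_index: dict[str, set[int]], prefix: str) -> set[int]:
    """Search time with edge ngrams: the prefix is an ordinary term, so one lookup."""
    return ngram_index.get(prefix.lower(), set())


def scan_prefix(plain_index: dict[str, set[int]], prefix: str) -> set[int]:
    """Without ngrams: every term starting with the prefix has to be collected."""
    prefix = prefix.lower()
    result: set[int] = set()
    for term, doc_ids in plain_index.items():
        if term.startswith(prefix):
            result |= doc_ids
    return result


if __name__ == "__main__":
    docs = {1: "hello world", 2: "hello we"}
    print(lookup_prefix(build_index(docs), "w"))
    print(scan_prefix(build_index(docs, use_edge_ngrams=False), "w"))
```

Both calls find the same documents; the point is that the first one is a single dictionary lookup, paid for by a larger index.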

Example

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  }
}

Take helloworld with the settings

min_gram = 1
max_gram = 3

edge_ngram splits it into the following:

h
he
hel

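Key concept: autocomplete is used as the index-time analyzer, while a separate search_analyzer can be set for queries. See: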

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-analyzer.html

GET /my_index/_analyze
{
  "analyzer": "autocomplete",
  "text": "hello world"
}
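As a rough preview of what this request returns, the Python sketch below approximates the autocomplete analyzer. A regex split on word characters stands in for the standard tokenizer (an assumption; the real tokenizer follows Unicode text segmentation rules), followed by lowercasing and edge ngrams with min_gram 1 and max_gram 20. Each gram keeps the position of the word it came from, which is what later lets phrase matching work. The function name is hypothetical.

```python
import re


def autocomplete_analyze(text: str, min_gram: int = 1, max_gram: int = 20) -> list[tuple[str, int]]:
    """Approximate tokenizer + lowercase + edge_ngram, returning (token, position) pairs."""
    tokens: list[tuple[str, int]] = []
    # \w+ is a simplification of the standard tokenizer.
    for position, word in enumerate(re.findall(r"\w+", text)):
        word = word.lower()
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            # All grams of one word share that word's position.
            tokens.append((word[:n], position))
    return tokens


if __name__ == "__main__":
    for token, position in autocomplete_analyze("hello world"):
        print(position, token)
```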

Set the mapping: documents are indexed with autocomplete, but queries still use the standard analyzer:

PUT /my_index/_mapping/my_type
{
  "properties": {
    "content": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}
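Why search with standard rather than autocomplete? A toy Python comparison, under simplifying assumptions (a regex word split stands in for the standard tokenizer; function names are hypothetical): if the query were also edge-ngrammed, hello J would expand into h, he, hel, hell, hello and j, so a single-letter gram such as h would match any document containing a word that starts with h. With the standard analyzer the query stays as the two terms hello and j, and the prefix matching comes only from the grams stored at index time.

```python
import re


def standard_analyze(text: str) -> list[str]:
    """Rough stand-in for the standard analyzer: split on word characters, lowercase."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def autocomplete_analyze(text: str, max_gram: int = 20) -> list[str]:
    """Rough stand-in for the autocomplete analyzer: standard split plus edge ngrams."""
    return [w[:n] for w in standard_analyze(text) for n in range(1, min(max_gram, len(w)) + 1)]


if __name__ == "__main__":
    query = "hello J"
    print("search_analyzer=standard:", standard_analyze(query))
    print("if autocomplete were used:", autocomplete_analyze(query))
    # Every standard query term is among the grams stored for "hello Jack".
    indexed = set(autocomplete_analyze("hello Jack"))
    print(all(term in indexed for term in standard_analyze(query)))
```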

Index some test data:

PUT /my_index/my_type/1
{
  "content": "hello Jack"
}

PUT /my_index/my_type/2
{
  "content": "hello John"
}

PUT /my_index/my_type/3
{
  "content": "hello Jose"
}

Query:

GET /my_index/my_type/_search
{
  "query": {
    "match": {
      "content": "hello J"
    }
  }
}

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 0.2876821,
        "_source": { "content": "hello John" }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": { "content": "hello Jack" }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "3",
        "_score": 0.2876821,
        "_source": { "content": "hello Jose" }
      }
    ]
  }
}

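With match, documents that contain only hello are also returned, since this is a full-text search; they just get a lower score.
match_phrase is recommended instead: it requires every term to be present, with positions exactly one apart, which is what we want here.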
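A toy Python model of the difference, assuming the content field is indexed with edge ngrams (the autocomplete analyzer) and queried with the standard analyzer. The fourth document, hello world, is hypothetical and only there to show a document that contains hello but no word starting with j. The score is just a count of matched query terms, not Elasticsearch's BM25 relevance score, and all function names are hypothetical.

```python
import re
from collections import defaultdict

# Positional inverted index: term -> doc_id -> set of word positions.
Index = dict[str, dict[int, set[int]]]


def words(text: str) -> list[str]:
    """Rough stand-in for the standard tokenizer plus lowercase filter."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def build_index(docs: dict[int, str], max_gram: int = 20) -> Index:
    """Index time: store every edge ngram at the position of the word it came from."""
    index: Index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for pos, word in enumerate(words(text)):
            for n in range(1, min(max_gram, len(word)) + 1):
                index[word[:n]][doc_id].add(pos)
    return index


def match(index: Index, query: str) -> dict[int, int]:
    """A doc matches if it contains ANY query term; toy score = number of terms matched."""
    scores: dict[int, int] = defaultdict(int)
    for term in words(query):
        for doc_id in index.get(term, {}):
            scores[doc_id] += 1
    return dict(scores)


def match_phrase(index: Index, query: str) -> set[int]:
    """A doc matches only if ALL query terms appear at consecutive positions."""
    terms = words(query)
    if not terms:
        return set()
    hits: set[int] = set()
    for doc_id, starts in index.get(terms[0], {}).items():
        if any(
            all(start + i in index.get(t, {}).get(doc_id, set()) for i, t in enumerate(terms))
            for start in starts
        ):
            hits.add(doc_id)
    return hits


if __name__ == "__main__":
    docs = {1: "hello Jack", 2: "hello John", 3: "hello Jose", 4: "hello world"}
    index = build_index(docs)
    print(match(index, "hello J"))
    print(match_phrase(index, "hello J"))
```

match still returns document 4, only with a lower toy score, while match_phrase drops it because no gram j sits right after hello.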

Plain-Talk Elasticsearch 23: Deep Dive into Search Techniques, Implementing Index-Time Search Suggestions with the ngram Tokenization Mechanism