Sorting Sphinx search results by weight in PHP: Sphinx Ranking Mode (translation)

My English is rough, so please bear with me.

Search results ranking

Ranking overview

Ranking (aka weighting) of the search results can be defined as a process of computing a so-called relevance (aka weight) for every given matched document with regards to a given query that matched it. So relevance is in the end just a number attached to every document that estimates how relevant the document is to the query. Search results can then be sorted based on this number and/or some additional parameters, so that the most sought after results would come up higher on the results page.

Ranking (also called weighting) is the process of computing a so-called relevance (also called weight) for each result that a query matched. Relevance is an estimated number attached to each matched document that indicates how closely the document relates to the query keywords. Search results can then be sorted by this number together with other parameters, so that the most relevant results come first.

There is no single standard one-size-fits-all way to rank any document in any scenario. Moreover, there can not ever be such a way, because relevance is subjective. As in, what seems relevant to you might not seem relevant to me. Hence, in the general case it's not just hard to compute, it's theoretically impossible.

No single ranking standard fits every scenario, and arguably none ever can, because relevance is highly subjective: what is strongly relevant to you may not be relevant to me at all. So in general it is not merely hard to compute; in theory it is impossible.

So ranking in Sphinx is configurable. It has a notion of a so-called ranker. A ranker can formally be defined as a function that takes document and query as its input and produces a relevance value as output. In layman's terms, a ranker controls exactly how (using which specific algorithm) Sphinx will assign weights to the document.

Ranking in Sphinx is therefore configurable. Sphinx has the concept of a ranker (which I render in Chinese as 排序器): a function that takes a matched document and the query as input and outputs a relevance value. In short, a ranker determines exactly how the weight of each document is computed.
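The idea of a ranker as a pluggable function can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names `Ranker`, `rank_none`, `rank_toy_wordcount` and `search` are hypothetical and not part of Sphinx, documents are plain strings, and matching is a naive "all keywords present" check.

```python
from typing import Callable, Dict, List, Tuple

# A ranker maps (document, query) to a numeric weight.
# These names are illustrative only; they are not Sphinx APIs.
Ranker = Callable[[str, str], int]


def rank_none(document: str, query: str) -> int:
    """Mimics the idea of 'no ranking': every match gets weight 1."""
    return 1


def rank_toy_wordcount(document: str, query: str) -> int:
    """Toy ranker: counts how often the query keywords occur in the document."""
    words = document.lower().split()
    return sum(words.count(keyword) for keyword in query.lower().split())


def search(documents: Dict[int, str], query: str, ranker: Ranker) -> List[Tuple[int, int]]:
    """Return (doc_id, weight) pairs for matching documents, best first."""
    keywords = query.lower().split()
    matches = []
    for doc_id, text in documents.items():
        words = text.lower().split()
        # Naive matching: every keyword must occur somewhere in the document.
        if all(keyword in words for keyword in keywords):
            matches.append((doc_id, ranker(text, query)))
    # Sort by weight descending, then by id ascending as a tie-breaker.
    return sorted(matches, key=lambda match: (-match[1], match[0]))


docs = {
    1: "hyde park",
    2: "the hyde park cafe near hyde park corner",
    3: "london zoo",
}
```

Swapping the ranker changes only the weights and therefore the order: `search(docs, "hyde park", rank_toy_wordcount)` puts document 2 first, while `search(docs, "hyde park", rank_none)` gives both matches weight 1 and falls back to id order.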

Previously, this ranking function was rigidly bound to the matching mode. So in the legacy matching modes (that is, SPH_MATCH_ALL, SPH_MATCH_ANY, SPH_MATCH_PHRASE, and SPH_MATCH_BOOLEAN) you can not choose the ranker. You can only do that in the SPH_MATCH_EXTENDED mode. (Which is the only mode in SphinxQL and the suggested mode in SphinxAPI anyway.) To choose a non-default ranker you can either use SetRankingMode() with SphinxAPI, or the OPTION ranker clause in the SELECT statement when using SphinxQL.

In the past the ranking function was rigidly tied to the matching mode, so in the legacy matching modes (SPH_MATCH_ALL, SPH_MATCH_ANY, SPH_MATCH_PHRASE, and SPH_MATCH_BOOLEAN) you cannot choose the ranker. You can only choose it in SPH_MATCH_EXTENDED mode, which is the only mode in SphinxQL and the recommended mode in SphinxAPI. To select a non-default ranker, call SetRankingMode() in SphinxAPI, or set the ranker option in SphinxQL.

As a sidenote, legacy matching modes are internally implemented via the unified syntax anyway. When you use one of those modes, Sphinx just internally adjusts the query and sets the associated ranker, then executes the query using the very same unified code path.

Note that the legacy matching modes are implemented internally on top of the unified syntax. When you use one of them, Sphinx simply adjusts the query internally, sets the associated ranker, and then executes the query through the same code path.

Available built-in rankers

Sphinx ships with a number of built-in rankers suited for different purposes. A number of them use two factors, phrase proximity (aka LCS) and BM25. Phrase proximity works on the keyword positions, while BM25 works on the keyword frequencies. Basically, the better the degree of the phrase match between the document body and the query, the higher is the phrase proximity (it maxes out when the document contains the entire query as a verbatim quote). And BM25 is higher when the document contains more rare words. We'll save the detailed discussion for later.

Sphinx ships with a set of built-in rankers for different purposes. Several of them are based on two factors: phrase proximity (also called LCS) and BM25. Phrase proximity depends on keyword positions, while BM25 depends on keyword frequencies. Basically, the closer the phrase match between the query and the document, the higher the phrase proximity (it is highest when the document contains the whole query verbatim). BM25 is higher when the document contains more rare words. The details are discussed later.
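The two factors can be made concrete with a small Python sketch. It is a simplified illustration, not Sphinx's actual implementation: `phrase_proximity` and `idf` are hypothetical names, tokenization is a plain whitespace split, phrase proximity is approximated as the longest run of consecutive words shared by the query and the field, and the IDF formula is a common textbook variant rather than the one Sphinx uses.

```python
import math


def phrase_proximity(query: str, field_text: str) -> int:
    """Length, in words, of the longest run of consecutive words that appears
    in both the query and the field in the same order (a simplified LCS)."""
    q = query.lower().split()
    d = field_text.lower().split()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and d[j-1].
    prev = [0] * (len(d) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(d) + 1)
        for j in range(1, len(d) + 1):
            if q[i - 1] == d[j - 1]:
                # Extend the run that ended one word earlier in both sequences.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def idf(total_docs: int, docs_with_term: int) -> float:
    """Textbook inverse document frequency: rarer terms get larger values.
    This is an assumed formula for illustration, not Sphinx's own."""
    return math.log((total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5) + 1)
```

With this sketch the proximity value peaks when the field contains the whole query verbatim, and a keyword found in 1 of 1000 documents contributes far more than one found in 500 of them, which matches the intuition described above.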

Currently implemented rankers are:

The rankers currently built in are:

1. SPH_RANK_PROXIMITY_BM25, the default ranking mode that uses and combines both phrase proximity and BM25 ranking.

1. SPH_RANK_PROXIMITY_BM25: the default ranker, based on both phrase proximity and BM25 ranking.

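2. SPH_RANK_BM25, statistical ranking mode which uses BM25 ranking only (similar to most other full-text engines). This mode is faster but may result in worse quality on queries which contain more than 1 keyword.

2. SPH_RANK_BM25: a statistical mode that uses only the BM25 factor (similar to most other full-text engines). It is faster, but result quality may be worse for queries with more than one keyword (BM25 only looks at keyword frequencies, whereas what we often want most is a single exact hit).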

3. SPH_RANK_NONE, no ranking mode. This mode is obviously the fastest. A weight of 1 is assigned to all matches. This is sometimes called boolean searching that just matches the documents but does not rank them.

3. SPH_RANK_NONE: a mode with no ranking at all. It is obviously the fastest; every matched document gets a weight of 1. It is sometimes called boolean searching, because it only matches documents and does not rank them.

4. SPH_RANK_WORDCOUNT, ranking by the keyword occurrences count. This ranker computes the per-field keyword occurrence counts, then multiplies them by field weights, and sums the resulting values.

4. SPH_RANK_WORDCOUNT: ranks by the number of keyword occurrences. It counts the keyword occurrences in each field, multiplies each count by the field's weight, and sums the results.

5. SPH_RANK_PROXIMITY, added in version 0.9.9-rc1, returns raw phrase proximity value as a result. This mode is internally used to emulate SPH_MATCH_ALL queries.

5. SPH_RANK_PROXIMITY: returns the raw phrase proximity value of each document for the query. It is used internally to emulate SPH_MATCH_ALL queries.

6. SPH_RANK_MATCHANY, added in version 0.9.9-rc1, returns rank as it was computed in SPH_MATCH_ANY mode earlier, and is internally used to emulate SPH_MATCH_ANY queries.

6. SPH_RANK_MATCHANY: returns the rank as it used to be computed in the SPH_MATCH_ANY matching mode, and is used internally to emulate SPH_MATCH_ANY queries.

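7. SPH_RANK_FIELDMASK, added in version 0.9.9-rc2, returns a 32-bit mask with N-th bit corresponding to N-th fulltext field, numbering from 0. The bit will only be set when the respective field has any keyword occurrences satisfying the query.

7. SPH_RANK_FIELDMASK: returns a 32-bit mask in which each bit corresponds to one full-text field, numbered from 0 (unused bits presumably stay 0). A bit is set to 1 only when the corresponding field contains keyword occurrences that satisfy the query.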

8. SPH_RANK_SPH04, added in version 1.10-beta, is generally based on the default SPH_RANK_PROXIMITY_BM25 ranker, but additionally boosts the matches when they occur in the very beginning or the very end of a text field. Thus, if a field equals the exact query, SPH04 should rank it higher than a field that contains the exact query but is not equal to it. (For instance, when the query is "Hyde Park", a document entitled "Hyde Park" should be ranked higher than one entitled "Hyde Park, London" or "The Hyde Park Cafe".)

8. SPH_RANK_SPH04: based on the default SPH_RANK_PROXIMITY_BM25 ranker, but it boosts a document's relevance when the match occurs at the very beginning or the very end of a text field. So a document with a field that equals the query exactly should rank above a document whose field merely contains the query. (For example, for the query "Hyde Park", a document titled "Hyde Park" ranks above "Hyde Park, London" or "The Hyde Park Cafe".)

9. SPH_RANK_EXPR, added in version 2.0.2-beta, lets you specify the ranking formula in run time. It exposes a number of internal text factors and lets you define how the final weight should be computed from those factors.

9. SPH_RANK_EXPR: lets you specify the ranking formula at run time. It exposes a number of internal text factors and lets you define how the final weight is computed from them.
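Two of the simpler rankers in the list above, SPH_RANK_WORDCOUNT and SPH_RANK_FIELDMASK, are easy to mimic in Python. The sketch below follows only the one-line descriptions given here; the function names are hypothetical, tokenization is a naive whitespace split, a missing field weight is assumed to default to 1, and "satisfying the query" is reduced to "any keyword occurs in the field".

```python
from typing import Dict, List


def wordcount_weight(
    fields: Dict[str, str],
    keywords: List[str],
    field_weights: Dict[str, int],
) -> int:
    """Per-field keyword occurrence counts, multiplied by the field weight
    and summed, as in the SPH_RANK_WORDCOUNT description."""
    total = 0
    for name, text in fields.items():
        words = text.lower().split()
        occurrences = sum(words.count(keyword.lower()) for keyword in keywords)
        # Assumption: a field without an explicit weight counts with weight 1.
        total += occurrences * field_weights.get(name, 1)
    return total


def fieldmask(field_texts: List[str], keywords: List[str]) -> int:
    """Bit N is set when the N-th field (numbered from 0) contains at least
    one of the keywords, as in the SPH_RANK_FIELDMASK description."""
    if len(field_texts) > 32:
        raise ValueError("a 32-bit mask can describe at most 32 fields")
    mask = 0
    for n, text in enumerate(field_texts):
        words = set(text.lower().split())
        if any(keyword.lower() in words for keyword in keywords):
            mask |= 1 << n
    return mask


doc_fields = {"title": "hyde park", "body": "a cafe in hyde park near the park gate"}
```

For example, `wordcount_weight(doc_fields, ["hyde", "park"], {"title": 10, "body": 1})` counts 2 occurrences in the title and 3 in the body, giving 2 * 10 + 3 * 1 = 23, and `fieldmask(["hyde park", "london zoo", "park gate"], ["hyde", "park"])` sets bits 0 and 2, giving 5.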

You should specify the SPH_RANK_ prefix and use capital letters only when using the SetRankingMode() call from the SphinxAPI. The API ports expose these as global constants. Using SphinxQL syntax, the prefix should be omitted and the ranker name is case insensitive.

When you call SetRankingMode() from SphinxAPI, specify the ranker with the SPH_RANK_ prefix and in capital letters; the API ports define these modes as global constants. In SphinxQL the prefix must be omitted and the ranker name is case insensitive (that is, the name is passed as the value of the ranker option).

Example:

// SphinxAPI
$client->SetRankingMode ( SPH_RANK_SPH04 );

// SphinxQL
mysql_query ( "SELECT ... OPTION ranker=sph04" );
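The naming rule (upper-case SPH_RANK_ constant in SphinxAPI, prefix-less and case-insensitive name in SphinxQL) can be expressed as a pair of small conversion helpers. This is a Python sketch with hypothetical helper names; it only manipulates strings and does not talk to Sphinx. Note that it ignores the formula that the expr ranker additionally needs.

```python
# Ranker names as listed in this article, in SphinxQL (prefix-less) form.
RANKER_NAMES = (
    "proximity_bm25", "bm25", "none", "wordcount", "proximity",
    "matchany", "fieldmask", "sph04", "expr",
)

API_PREFIX = "SPH_RANK_"


def api_constant_name(sphinxql_name: str) -> str:
    """Turn a SphinxQL ranker name (any letter case) into the SphinxAPI
    constant name, e.g. 'Sph04' -> 'SPH_RANK_SPH04'."""
    name = sphinxql_name.strip().lower()
    if name not in RANKER_NAMES:
        raise ValueError(f"unknown ranker: {sphinxql_name!r}")
    return API_PREFIX + name.upper()


def sphinxql_option(api_name: str) -> str:
    """Turn a SphinxAPI constant name into a SphinxQL OPTION clause.
    The expr ranker also needs a formula, which this sketch does not add."""
    if not api_name.startswith(API_PREFIX):
        raise ValueError(f"expected a {API_PREFIX} constant, got {api_name!r}")
    name = api_name[len(API_PREFIX):].lower()
    if name not in RANKER_NAMES:
        raise ValueError(f"unknown ranker: {api_name!r}")
    return f"OPTION ranker={name}"
```

For instance, `sphinxql_option("SPH_RANK_SPH04")` yields the `OPTION ranker=sph04` clause used in the example above, and `api_constant_name("sph04")` maps it back to `SPH_RANK_SPH04`.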
