Learning to Rank

Pointwise & Pairwise & Listwise

Introduction to Learning to Rank
http://www.cnblogs.com/bentuwuying/p/6681943.html
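Learning to rank applies supervised learning to ranking. The basic idea is to convert the ranking problem into a regression or classification problem. The methods fall into three broad families (pointwise, pairwise, and listwise) and differ from traditional ranking methods that are based on a ranking function.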

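LTR (Learning to Rank) methods fall into three categories: pointwise, pairwise, and listwise. Pointwise and pairwise methods convert the ranking problem into regression, classification, or ordinal classification. Listwise methods treat the entire result list for a query q as a single training instance. The three approaches differ mainly in their loss functions.
[Note] These three LTR approaches originate in information retrieval (IR); here they are applied to training deep models for recommender systems.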

The discussion below analyzes the three approaches from the information retrieval perspective.

pointwise

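Pointwise: your model will try to learn the score of each query-document pair.
A pointwise model converts the ranking problem into regression, classification, or ordinal classification on a single result.
Key characteristics:
Pointwise methods simply apply standard classification, regression, or ordinal regression to model the relevance of a single document for a given query. This ignores some properties of ranking. For example, a ranking is defined over the set of documents for a given query, whereas pointwise methods consider only the absolute relevance of each document in isolation. In addition, the top few results (top N) matter most to ranking quality, and pointwise methods do not account for this.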
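
The pointwise idea can be sketched as plain regression on individual query-document examples. This is a minimal illustration, not a specific published algorithm: the linear scorer, the squared-error loss, the toy feature vectors, and all function names are assumptions made for the example. Note that the loss never looks at which query a document belongs to, which is exactly the limitation described above.

```python
# Pointwise sketch: every (query, document) example is scored and penalized
# independently. All names and data here are hypothetical.

def score(weights, features):
    """Linear relevance score for one query-document feature vector."""
    return sum(w * x for w, x in zip(weights, features))

def pointwise_squared_loss(weights, examples):
    """Mean squared error between predicted score and relevance label."""
    total = 0.0
    for features, label in examples:
        total += (score(weights, features) - label) ** 2
    return total / len(examples)

def gradient_step(weights, examples, lr=0.1):
    """One gradient descent step on the mean squared error."""
    grad = [0.0] * len(weights)
    for features, label in examples:
        err = score(weights, features) - label
        for i, x in enumerate(features):
            grad[i] += 2.0 * err * x / len(examples)
    return [w - lr * g for w, g in zip(weights, grad)]

# Toy data: (features, graded relevance label). Query boundaries are ignored.
examples = [([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
weights = [0.0, 0.0]
initial_loss = pointwise_squared_loss(weights, examples)
for _ in range(200):
    weights = gradient_step(weights, examples)
final_loss = pointwise_squared_loss(weights, examples)
print(initial_loss, final_loss)
```

At prediction time the documents of a query would simply be sorted by `score`, so the ordering is only a by-product of the per-document fit.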

pairwise

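Pairwise: your model will learn the relationship between a pair of documents with different relevance levels under the same query. The output of your model is used to compare the quality of different documents. The cost function to minimize measures how often the pairwise preferences are predicted incorrectly. For the pairwise approach, the LambdaRank paper (hosted on microsoft.com) is suggested reading.
A pairwise model converts the ranking problem into regression, classification, or ordinal classification on pairs of results.
Key characteristics:
Compared with pointwise methods, pairwise methods rank by considering the relative relevance between two documents, which is a step forward. However, a loss function based on the relative relevance of document pairs can differ substantially from the metrics that actually measure ranking quality, and is sometimes even negatively correlated with them. In addition, some pairwise methods do not account for the importance of the top few positions to the overall ranking, nor for the effect that the differing sizes of the document sets of different queries have on the result. Some pairwise methods do address these issues; IR SVM, for example, improves on Ranking SVM with respect to exactly these shortcomings.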
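
A pairwise loss can be sketched as follows. The example uses a logistic loss on score differences in the style of RankNet; this choice, the function names, and the toy scores are assumptions for illustration only. The second half illustrates the weakness noted above: a mistake at the top of the list and a mistake at the bottom receive the same pairwise penalty.

```python
import math
from itertools import combinations

def pairwise_logistic_loss(scores, labels):
    """Average logistic loss over pairs of documents with different labels.

    For a pair where document `hi` is more relevant than document `lo`, the
    loss is log(1 + exp(-(s_hi - s_lo))), small when s_hi is well above s_lo.
    """
    losses = []
    for i, j in combinations(range(len(scores)), 2):
        if labels[i] == labels[j]:
            continue  # equal relevance expresses no preference
        hi, lo = (i, j) if labels[i] > labels[j] else (j, i)
        margin = scores[hi] - scores[lo]
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses) if losses else 0.0

def count_misordered_pairs(scores, labels):
    """Number of pairs whose score order contradicts their label order."""
    return sum(1 for i, j in combinations(range(len(scores)), 2)
               if (labels[i] - labels[j]) * (scores[i] - scores[j]) < 0)

# Documents of one query, listed from most to least relevant.
labels = [3, 2, 1, 0]
top_swap = [3.0, 4.0, 2.0, 1.0]     # the two best documents are swapped
bottom_swap = [4.0, 3.0, 1.0, 2.0]  # the two worst documents are swapped

loss_top = pairwise_logistic_loss(top_swap, labels)
loss_bottom = pairwise_logistic_loss(bottom_swap, labels)
errors_top = count_misordered_pairs(top_swap, labels)
errors_bottom = count_misordered_pairs(bottom_swap, labels)
print(loss_top, loss_bottom, errors_top, errors_bottom)
```

Both score lists have one misordered pair and the same loss value up to floating-point rounding, even though a position-sensitive metric would penalize the swap at the top more heavily.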

listwise

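Listwise: the cost function measures the correctness of the provided list in the validation/test set.
Unlike pointwise and pairwise methods, a listwise model considers the whole sequence of documents for a given query and directly optimizes the document ordering produced by the model so that it is as close as possible to the true ordering.
Key characteristics:
Compared with pointwise and pairwise methods, listwise methods directly optimize the ordering of the entire document set for a given query, and therefore largely overcome the shortcomings of the approaches above. Among listwise methods, LambdaMART (an improvement on RankNet and LambdaRank) achieved the best performance in the Yahoo Learning to Rank Challenge.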
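
To make "the whole list is one training instance" concrete, here is a small sketch of a listwise loss. It uses a ListNet-style top-one formulation (cross-entropy between the softmax of the labels and the softmax of the scores), which is swapped in purely for illustration because it is compact; it is not LambdaMART, and the function names and toy values are assumptions.

```python
import math

def softmax(values):
    """Numerically stable softmax over one query's list of values."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_cross_entropy(scores, labels):
    """Cross-entropy between softmax(labels) and softmax(scores).

    The loss is a function of all documents of the query at once: changing
    one score changes the predicted probability of every document.
    """
    target = softmax([float(label) for label in labels])
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [2, 1, 0]  # graded relevance of the documents of one query
aligned = listwise_cross_entropy([2.0, 1.0, 0.0], labels)
flat = listwise_cross_entropy([0.0, 0.0, 0.0], labels)
inverted = listwise_cross_entropy([0.0, 1.0, 2.0], labels)
print(aligned, flat, inverted)
```

The loss is lowest when the score distribution matches the label distribution and grows as the predicted ordering moves away from the true one.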

[Main reference]
Pointwise & Pairwise & Listwise

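Taxonomy of ranking algorithms -> Learning Method -> Pointwise & Pairwise & Listwise
Learning to rank
Evaluation metrics:
NDCG: Normalized Discounted Cumulative Gain.
MAP (Mean Average Precision): another metric frequently used in IR.
MRR (Mean Reciprocal Rank): for each query, take the reciprocal of the rank at which the correct answer appears in the evaluated system's results as the score, then average over all queries. It is relatively simple.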
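
The three metrics above can be sketched in a few lines. The definitions used here are common variants and are assumptions rather than something the notes specify: NDCG uses gain 2^rel - 1 with a log2(rank + 1) discount, and average precision is normalized by the number of relevant documents that appear in the ranked list. Each input list holds relevance labels in the order the system ranked the documents.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top k positions (ranks start at 1)."""
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG divided by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

def average_precision(binary_relevances):
    """Mean of precision@rank taken at each rank holding a relevant document."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(binary_relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(ranked_lists):
    """Average precision averaged over all queries."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)

def mean_reciprocal_rank(ranked_lists):
    """Reciprocal rank of the first relevant result, averaged over all queries."""
    total = 0.0
    for relevances in ranked_lists:
        for rank, rel in enumerate(relevances, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)

print(ndcg_at_k([3, 2, 1, 0], 4))                       # perfect ordering: 1.0
print(ndcg_at_k([0, 1, 2, 3], 4))                       # reversed ordering: below 1.0
print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
print(mean_reciprocal_rank([[0, 1, 0], [1, 0, 0]]))     # (1/2 + 1) / 2 = 0.75
```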
