Paper Sharing: Recommendation Algorithms

Source

This paper was written by three heavyweights from CMU; as soon as I saw "Carnegie Mellon," I jumped in. It was published in 2014 at an ACM conference under the title "Question Recommendation with Constraints for MOOC".

Overview

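The usual questions when reading a paper: What do the authors propose? What problem does it solve? How do they solve it?
(1) The paper proposes a matrix factorization model to predict students' preferences for questions. Unlike traditional recommendation algorithms, it must also satisfy several constraints at the same time, which makes the problem look a bit like a constrained optimization problem; the paper solves this part with a max cost flow model.
(2) The setting is a MOOC discussion forum: the system recommends questions to participants so that they get practice and every question gets answered. In the end, the proposed model outperforms all of the baselines (as usual, the baselines carry the paper).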

Model

Context-Aware Matrix Factorization

The model first has to predict the relevance between a student and a question. It does so by combining three kinds of features: student features, question features, and implicit feedback.

Student Feature

First, the formula:
$$\hat{r}_{u,q}=\text{bias}+(P_u+\phi_u\Phi + \theta_u\Theta + \Gamma_\gamma)^T Q_q$$
The variables are explained one by one below.

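The left-hand side, $\hat{r}_{u,q}$, is the predicted relevance; $u$ denotes a user (student) and $q$ denotes a question.
$\phi_u$ is the total number of questions the student has answered so far, which partly reflects the student's engagement in the forum.
$\theta_u$ is the number of questions the student participated in last week, reflecting recent (local-level) activity.
$\gamma$ is the student's registration time, which can be viewed as a motivation proxy.
$\Phi$ and $\Theta$ are the corresponding feature parameter vectors; $P_u$ and $Q_q$ are the latent factor vectors of the student and the question, respectively; $\Gamma_\gamma$ is the latent vector representing registration time $\gamma$.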
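To make the student-feature formula concrete, here is a minimal sketch in plain Python that evaluates $\hat{r}_{u,q}$ for a single student-question pair. The helper names (`dot`, `add`, `scale`, `student_score`) and the toy numbers are hypothetical; in the actual model every vector here would be a learned parameter.

```python
# Minimal sketch of the student-feature model:
#   r_hat = bias + (P_u + phi_u*Phi + theta_u*Theta + Gamma_gamma)^T Q_q
# Vectors are plain Python lists of the same (latent) dimension.

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def add(*vecs):
    """Element-wise sum of several equal-length vectors."""
    return [sum(xs) for xs in zip(*vecs)]

def scale(c, v):
    """Multiply vector v by scalar c."""
    return [c * x for x in v]

def student_score(bias, P_u, phi_u, Phi, theta_u, Theta, Gamma_gamma, Q_q):
    # Student-side vector: latent factors shifted by the scaled feature vectors.
    user_vec = add(P_u, scale(phi_u, Phi), scale(theta_u, Theta), Gamma_gamma)
    return bias + dot(user_vec, Q_q)

# Toy example (illustrative numbers only).
score = student_score(
    bias=0.1,
    P_u=[0.2, -0.1], phi_u=12, Phi=[0.01, 0.02],
    theta_u=3, Theta=[0.05, 0.0],
    Gamma_gamma=[0.0, 0.1], Q_q=[1.0, 0.5],
)
```

Because the counts $\phi_u$ and $\theta_u$ only scale shared vectors, two students with identical $P_u$ can still receive different scores purely through their activity levels.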

Question Feature

$$\hat{r}_{u,q}=\text{bias}+P_u^T(Q_q + \delta_q\Delta + l_qL)$$

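$\delta_q$ is the number of replies or comments the question currently has.
$l_q$ is the length of the question (word count), used as a proxy for its difficulty (a rather naive assumption).
The remaining parameter vectors ($\Delta$ and $L$) are defined analogously to those above.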
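The question-side formula can be sketched the same way. `question_score` and the toy values below are illustrative assumptions, not numbers from the paper.

```python
def question_score(bias, P_u, Q_q, delta_q, Delta, l_q, L):
    """r_hat = bias + P_u^T (Q_q + delta_q*Delta + l_q*L)."""
    # Question-side vector: latent factors shifted by reply count and length features.
    item_vec = [q + delta_q * d + l_q * l for q, d, l in zip(Q_q, Delta, L)]
    return bias + sum(p * x for p, x in zip(P_u, item_vec))

# Toy example: 4 replies, 100 words (illustrative numbers only).
score = question_score(
    bias=0.0, P_u=[1.0, 2.0], Q_q=[0.1, 0.2],
    delta_q=4, Delta=[0.05, 0.0], l_q=100, L=[0.0, 0.001],
)
```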

Implicit Feedback

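This term captures a student's preference for a question, i.e., how likely the student is to participate in it. Let $U(q)$ denote the set of all students who have participated in question $q$, and let $\varphi_v$ be the latent vector of student $v$ that encodes this implicit feedback (preference).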
$$\hat{r}_{u,q}=\text{bias}+P_u^T\left(Q_q + \frac{1}{\sqrt{|U(q)|}}\sum_{v\in U(q)}\varphi_v\right)$$
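The implicit-feedback term sums the vectors of everyone who has participated in the question and normalizes by $1/\sqrt{|U(q)|}$, the same style of normalization used in SVD++-type models. A small sketch, assuming $\varphi_v$ is stored per student in a dictionary (a hypothetical layout) and that a question with no participants contributes a zero vector (also an assumption):

```python
import math

def implicit_term(U_q, varphi, dim):
    """Return (1/sqrt(|U(q)|)) * sum_{v in U(q)} varphi[v] as a list of length dim."""
    if not U_q:
        # Assumption: no participants yet, so the term contributes nothing.
        return [0.0] * dim
    norm = 1.0 / math.sqrt(len(U_q))
    total = [0.0] * dim
    for v in U_q:
        total = [t + x for t, x in zip(total, varphi[v])]
    return [norm * t for t in total]

def implicit_score(bias, P_u, Q_q, U_q, varphi):
    """r_hat = bias + P_u^T (Q_q + implicit term)."""
    y = implicit_term(U_q, varphi, len(Q_q))
    return bias + sum(p * (q + yi) for p, q, yi in zip(P_u, Q_q, y))

# Four participants with identical toy vectors: the sum is [2, 2], scaled by 1/sqrt(4).
varphi = {s: [0.5, 0.5] for s in ["a", "b", "c", "d"]}
U_q = ["a", "b", "c", "d"]
```

The square-root normalization keeps heavily discussed questions from dominating purely because many vectors are summed, while still letting them grow somewhat with participation.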

Combining the three equations above gives:
$$\hat{r}_{u,q}=\text{bias}+(P_u+\phi_u\Phi + \theta_u\Theta + \Gamma_\gamma)^T\left(Q_q + \delta_q\Delta + l_qL + \frac{1}{\sqrt{|U(q)|}}\sum_{v\in U(q)}\varphi_v\right)$$
Since relevance should lie between 0 and 1, the score is mapped through a sigmoid function:
$$f(\hat{r}_{u,q}) = \frac{e^{\hat{r}_{u,q}}}{e^{\hat{r}_{u,q}}+1}$$
Finally, the model is trained with the logistic loss plus an L2-norm regularization term.
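Putting the pieces together, a training objective along the lines described (combined score, sigmoid, logistic loss, L2 regularization) might look like the sketch below. The post only says that logistic loss plus an L2 term is used, so the exact regularizer, the weight `lam`, and the `eps` guard are assumptions, and all function names are hypothetical.

```python
import math

def sigmoid(x):
    # e^x / (e^x + 1), evaluated in a form that avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (e + 1.0)

def combined_score(bias, user_parts, item_parts):
    """user_parts: already-scaled student-side vectors, e.g. [P_u, phi_u*Phi, theta_u*Theta, Gamma_gamma];
    item_parts: question-side vectors, e.g. [Q_q, delta_q*Delta, l_q*L, implicit term]."""
    u = [sum(xs) for xs in zip(*user_parts)]
    v = [sum(xs) for xs in zip(*item_parts)]
    return bias + sum(a * b for a, b in zip(u, v))

def logistic_loss(label, raw_score, eps=1e-12):
    """Binary cross-entropy of a label in {0, 1} against sigmoid(raw_score)."""
    p = sigmoid(raw_score)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def objective(samples, params, lam):
    """samples: list of (label, raw score); params: parameter vectors to regularize."""
    data_term = sum(logistic_loss(y, s) for y, s in samples)
    l2_term = lam * sum(x * x for vec in params for x in vec)
    return data_term + l2_term
```

In practice the raw scores would come from `combined_score`, and the objective would be minimized with gradient descent over all latent and feature vectors.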

Max Cost Flow Constraint Filtering

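This is where the paper's key ideas come in: load balancing and expertise matching. These two constraints are what set this paper apart from traditional recommendation work. Define $f_{u,q}\in\{0,1\}$ as an indicator of whether question $q$ is recommended to student $u$.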

Load balancing is about using students' effort sensibly: no student should be given too many questions, and no available effort should go unused. This gives $\forall u,\ L_u\leq \sum_{q}f_{u,q}\leq R_u$ and $\forall q,\ 0\leq \sum_{u}f_{u,q}\leq M_q$, where all of the bounds are fixed in advance.
Expertise matching requires recommendations to match each student's ability. The constraint $\forall q,\ \exists u: B_u\cdot f_{u,q} \geq H_q$ must hold, i.e., each question must be recommended to at least one student who is capable of solving it. In addition, to avoid wasting expertise, we minimize $\sum_{u}\sum_{q}\mathbb{I}(B_u\cdot f_{u,q} \geq H_q)(B_u - H_q)$.
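Both kinds of constraints can be checked directly for a candidate assignment $f$. The hypothetical helper `check_constraints` below tests load balancing and expertise matching and returns the wasted-expertise term being minimized. It evaluates the indicator literally as written, so it assumes $H_q > 0$; otherwise pairs with $f_{u,q}=0$ would also be counted.

```python
def check_constraints(f, L, R, M, B, H):
    """f: dict (u, q) -> 0/1; L, R, B: dicts over students; M, H: dicts over questions.
    Returns (feasible, waste); waste is None when a constraint is violated."""
    users, questions = list(B), list(H)
    # Load balancing on students: L_u <= sum_q f[u, q] <= R_u.
    for u in users:
        load = sum(f.get((u, q), 0) for q in questions)
        if not (L[u] <= load <= R[u]):
            return False, None
    # Load balancing on questions: sum_u f[u, q] <= M_q.
    for q in questions:
        if sum(f.get((u, q), 0) for u in users) > M[q]:
            return False, None
    # Expertise matching: some recommended student satisfies B_u * f[u, q] >= H_q.
    for q in questions:
        if not any(B[u] * f.get((u, q), 0) >= H[q] for u in users):
            return False, None
    # Wasted expertise: sum of (B_u - H_q) over pairs where B_u * f[u, q] >= H_q.
    waste = sum(B[u] - H[q] for u in users for q in questions
                if B[u] * f.get((u, q), 0) >= H[q])
    return True, waste

# Toy instance (illustrative numbers only).
B = {"a": 5, "b": 2}
H = {"x": 3, "y": 1}
L = {"a": 1, "b": 1}
R = {"a": 2, "b": 2}
M = {"x": 2, "y": 2}
```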

Taking all of the constraints above into account yields the full optimization problem.

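This formulation looks like a linear programming problem, and the accompanying figure (the corresponding flow network) is equivalent to it.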
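To illustrate the flow view, here is a simplified max-cost-flow sketch: the source connects to each student with capacity $R_u$, each student connects to each candidate question with capacity 1 and gain $\hat{r}_{u,q}$, and each question connects to the sink with capacity $M_q$. It negates the gains and runs successive shortest paths with Bellman-Ford, stopping once another augmenting path would no longer increase the total relevance. This is a simplification under stated assumptions, not the paper's exact network: it ignores the lower bounds $L_u$ and the expertise-matching constraint, and `max_cost_assignment` is a hypothetical name.

```python
import math

def max_cost_assignment(relevance, R, M):
    """relevance: dict (u, q) -> predicted score; R: dict u -> max questions per student;
    M: dict q -> max students per question. Returns the set of recommended (u, q) pairs."""
    users, questions = sorted(R), sorted(M)
    # Node ids: 0 = source, 1 = sink, then one node per student, then one per question.
    node = {("u", u): 2 + i for i, u in enumerate(users)}
    node.update({("q", q): 2 + len(users) + j for j, q in enumerate(questions)})
    n = 2 + len(users) + len(questions)
    graph = [[] for _ in range(n)]  # node -> list of outgoing edge ids
    head, cap, cost = [], [], []    # per edge: target node, residual capacity, cost

    def add_edge(a, b, c, w):
        # Forward edge (even id) and its residual twin (odd id = forward id ^ 1).
        for x, y, cc, ww in ((a, b, c, w), (b, a, 0, -w)):
            graph[x].append(len(head))
            head.append(y)
            cap.append(cc)
            cost.append(ww)

    for u in users:
        add_edge(0, node[("u", u)], R[u], 0.0)
    for q in questions:
        add_edge(node[("q", q)], 1, M[q], 0.0)
    pair_edge = {}
    for (u, q), r in relevance.items():
        pair_edge[(u, q)] = len(head)
        # Cost = -relevance, so a minimum-cost flow maximizes total relevance.
        add_edge(node[("u", u)], node[("q", q)], 1, -r)

    while True:
        # Bellman-Ford: cheapest path from the source in the residual graph.
        dist, prev = [math.inf] * n, [-1] * n
        dist[0] = 0.0
        for _ in range(n - 1):
            updated = False
            for x in range(n):
                if dist[x] == math.inf:
                    continue
                for e in graph[x]:
                    if cap[e] > 0 and dist[x] + cost[e] < dist[head[e]] - 1e-12:
                        dist[head[e]] = dist[x] + cost[e]
                        prev[head[e]] = e
                        updated = True
            if not updated:
                break
        # Stop if the sink is unreachable or one more unit would not add relevance.
        if dist[1] == math.inf or dist[1] >= 0:
            break
        # Bottleneck capacity along the path, walking back from the sink.
        flow, v = math.inf, 1
        while v != 0:
            e = prev[v]
            flow = min(flow, cap[e])
            v = head[e ^ 1]
        # Push the flow and update residual capacities.
        v = 1
        while v != 0:
            e = prev[v]
            cap[e] -= flow
            cap[e ^ 1] += flow
            v = head[e ^ 1]
    # A unit-capacity student-question edge is used iff its residual capacity is 0.
    return {pair for pair, e in pair_edge.items() if cap[e] == 0}
```

Note how the second augmenting path can reroute an earlier choice through a residual edge: a greedy assignment would not do this, which is the main reason to use a flow formulation.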

Experiments

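The paper uses Mean Average Precision (MAP) as the evaluation metric. The hardness $H_q$ of a question is defined as its length; $R_u$ is the total number of questions the student has participated in, and $L_u=1$. A student's ability is set to $B_u=\max\{H_q \mid q\in T(u)\}$, where $T(u)$ is the set of questions the student has participated in so far.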
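These settings and the evaluation metric are straightforward to compute from participation logs. The sketch below assumes `history` maps each student to $T(u)$ and `length` maps each question to its length (both hypothetical layouts), and it uses the standard untruncated definition of average precision; whether the paper truncates rankings at some cutoff is not stated here.

```python
def experiment_settings(history, length):
    """history: dict student -> set of question ids T(u); length: dict question -> length.
    Returns (H, B, R, L) as dicts."""
    H = dict(length)  # hardness H_q = question length
    # B_u = max hardness among questions the student has participated in
    # (students with empty histories are skipped, since the max is undefined).
    B = {u: max(H[q] for q in T) for u, T in history.items() if T}
    R = {u: len(T) for u, T in history.items()}  # upper bound = past participation count
    L = {u: 1 for u in history}                  # lower bound fixed at 1
    return H, B, R, L

def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, truth):
    """rankings: dict student -> ranked question list; truth: dict student -> relevant set."""
    aps = [average_precision(rankings[u], truth.get(u, set())) for u in rankings]
    return sum(aps) / len(aps) if aps else 0.0
```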

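The authors found that the flow network only needs to keep the 20% of edges with the highest relevance, because 80% of the edges have relevance scores close to 0.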
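Pruning the candidate student-question edges before building the flow network can be sketched as follows. `keep_top_edges` is a hypothetical helper, and rounding the 20% cutoff up with `math.ceil` is an assumption.

```python
import math

def keep_top_edges(relevance, fraction=0.2):
    """Keep the ceil(fraction * |E|) highest-relevance (u, q) edges."""
    k = math.ceil(fraction * len(relevance))
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])
```

Shrinking the edge set this way directly reduces the cost of each shortest-path search in the flow solver, and the discarded near-zero edges would contribute almost nothing to the objective anyway.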

Review

To improve my English writing, I make a point of rewriting a paper's abstract and conclusion every week!

Abstract

MOOCs have recently experienced a boom in interest. However, the efficiency of their discussion forums has become a problem: how to get posted questions answered, and how to recommend questions to students according to their preferences. Traditional recommender systems do not quite apply because of two constraints: (1) Load balancing: students should not be overburdened with too many questions. (2) Expertise matching: students should not be asked to solve problems they are not capable of solving. We therefore propose a constrained question recommendation system. First, we design a context-aware matrix factorization model to predict the relevance score between a student and a question; we then build a max cost flow model to optimize the recommendations under these constraints. Experimental results show that our model outperforms the baselines.

Conclusion

In this paper, we formulated the constrained question recommendation problem, which simultaneously considers load balancing and expertise matching. Our framework employs two models: (1) a context-aware matrix factorization model to predict students' preferences for questions and (2) a max cost flow model to optimize the overall benefit under the constraints. Experimental results show that our model significantly outperforms the baselines. However, how to approximate students' abilities, question difficulty, and student capacity remains a challenge. In the future, it will also be important to generalize our model to online settings.