[Paper Reading Notes] 2018_IJCAI_A Dual-Embedding Based Deep Latent Factor Model for Recommendation (IJCAI, 2018.07.13) -- Weiyu Cheng, Yanyan Shen, Yanmin Zhu, Linpeng Huang

Paper link: https://doi.org/10.24963/ijcai.2018/462
Venue: IJCAI
Publish time: 13 July 2018
Affiliation: Shanghai Jiao Tong University
Datasets
MovieLens 1M https://grouplens.org/datasets/movielens/1m/
Amazon Music http://jmcauley.ucsd.edu/data/amazon/
Code

Contributions (my understanding)

The biggest novelty of this paper is the dual embedding:
1. Previous models represent a user with a single user latent embedding.
In addition to the original embedding, this paper also represents the user through the items the user has rated, adding an item-based embedding for each user.
2. Items are handled symmetrically: each item additionally gets a user-based embedding.

Abstract (contributions)

1 This paper proposes a dual-embedding based deep latent factor model named DELF for recommendation with implicit feedback.
2 In addition to learning a single embedding for a user (resp. item), we represent each user (resp. item) with an additional embedding from the perspective of the interacted items (resp. users).
3 We employ an attentive neural method to discriminate the importance of interacted users/items for dual-embedding learning.
4 We further introduce a neural network architecture to incorporate the dual embeddings for recommendation.
5 A novel attempt of DELF is to model each user-item interaction with four deep representations that are subtly fused for preference prediction.

1 Introduction

1 An important challenge of applying latent factor models to implicit-feedback-based recommendation is: how to learn appropriate embeddings for users and items given scarce negative feedback? Since all the observed interactions are positive implicit feedback, learning user and item embeddings with only positive feedback will result in significant overfitting.
2 Limitations of prior work.
3 This paper's work; it overlaps with the abstract, just explained in more detail.

2 Preliminaries

We consider a user-item interaction matrix $R \in \mathbb{R}^{M \times N}$ constructed from users' implicit feedback, where $M$ and $N$ are the numbers of users and items, respectively. $R_{ui} = 1$ indicates an interaction between user $u$ and item $i$, and $R_{ui} = 0$ means no interaction is observed.

2.1 Neural Collaborative Filtering

1 Matrix Factorization (MF)

2 Neural Collaborative Filtering (NCF)

Note that the original NCF paper ensembles MLP and MF to obtain the NeuMF model. In this paper, we focus on developing a single CF model for recommendation, though the proposed model can be ensembled with other models for better performance.
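To make the contrast concrete, here is a minimal numpy sketch (not the authors' code; layer sizes and weights are illustrative) of MF's inner-product scoring versus the MLP interaction learned by NCF:

```python
import numpy as np

# Toy embeddings (illustrative sizes; in practice these are learned).
K = 8
rng = np.random.default_rng(0)
p_u = rng.random(K)   # user latent vector
q_i = rng.random(K)   # item latent vector

# MF: the preference score is simply the inner product of the embeddings.
mf_score = p_u @ q_i

# NCF (MLP variant): learn the interaction function with a feedforward net
# over the concatenated embeddings (hypothetical one-hidden-layer net).
W1, b1 = rng.standard_normal((16, 2 * K)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal(16) * 0.1, 0.0
h = np.maximum(0, W1 @ np.concatenate([p_u, q_i]) + b1)  # ReLU hidden layer
ncf_score = 1 / (1 + np.exp(-(W2 @ h + b2)))             # sigmoid in [0, 1]
```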

2.2 NSVD & SVD++

NSVD

1 NSVD [Paterek, 2007] models users based on the items they have rated. Formally, each item is associated with two latent vectors $q_i$ and $y_i$. The preference score of user $u$ to item $i$ is estimated as:

$$\hat{R}_{ui} = b_u + b_i + q_i^T \left( \frac{1}{\sqrt{|R(u)|}} \sum_{j \in R(u)} y_j \right)$$

where $R(u)$ is the set of items rated by user $u$, and $b_u$ and $b_i$ are bias terms.
2 Limitation:
the main issue of NSVD is that two users who have rated the same set of items, even with entirely different ratings, are forced to have the same representation.

SVD++

SVD++ [Koren, 2008] is proposed for recommendation with explicit ratings, and estimates user-item preferences as follows:

$$\hat{R}_{ui} = b_u + b_i + q_i^T \left( p_u + \frac{1}{\sqrt{|R(u)|}} \sum_{j \in R(u)} y_j \right)$$

where $p_u$ is a latent factor. SVD++ leverages the NSVD-based representation to adjust the user latent factor rather than to represent the user. We observe that NSVD-based latent factors are determined by users' rated items, which helps avoid false negatives from noisy implicit feedback and makes them more robust than explicitly parameterized factors.
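A minimal numpy sketch of the two estimators above (standard forms; names and sizes are illustrative, not the paper's code):

```python
import numpy as np

M, N, K = 5, 10, 8
rng = np.random.default_rng(0)
Q = rng.random((N, K))            # item factors q_i
Y = rng.random((N, K))            # item factors y_i used to represent users
P = rng.random((M, K))            # explicit user factors p_u (SVD++ only)
b_u, b_i = rng.random(M), rng.random(N)

def nsvd_score(u, i, rated):
    """NSVD: the user is represented purely by the items in R(u)."""
    user_rep = Y[rated].sum(axis=0) / np.sqrt(len(rated))
    return b_u[u] + b_i[i] + Q[i] @ user_rep

def svdpp_score(u, i, rated):
    """SVD++: the item-based representation adjusts the explicit factor p_u."""
    user_rep = P[u] + Y[rated].sum(axis=0) / np.sqrt(len(rated))
    return b_u[u] + b_i[i] + Q[i] @ user_rep

print(nsvd_score(0, 3, [1, 4, 7]), svdpp_score(0, 3, [1, 4, 7]))
```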

3 DELF

Figure 1: Dual-Embedding based Deep Latent Factor Model

3.1 Model


DELF estimates the preference score as $\hat{R}_{ui} = f(u, i \mid \Theta)$, where $\Theta$ denotes the latent factors of $u$ and $i$, and $f$ denotes the interaction function. Figure 1 illustrates the design of $\Theta$ and $f$ in DELF.

Input Layer

(1) Single-embedding based latent factor models simply associate $u$ and $i$ with their one-hot representations $\mathbf{u}$ and $\mathbf{i}$.
(2) In addition to the one-hot vectors, DELF also incorporates the binary interaction vectors $R_{u*}$ and $R_{*i}$ from the observed interactions for $u$ and $i$, respectively.
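A toy sketch of these four inputs for one $(u, i)$ pair, assuming a small interaction matrix $R$:

```python
import numpy as np

# Toy interaction matrix: 4 users x 6 items.
R = np.zeros((4, 6))
R[1, [0, 2, 5]] = 1        # user 1 interacted with items 0, 2, 5

u, i = 1, 2
u_onehot = np.eye(4)[u]    # primitive user input (one-hot)
i_onehot = np.eye(6)[i]    # primitive item input (one-hot)
R_u = R[u]                 # R_{u*}: binary vector of items u interacted with
R_i = R[:, i]              # R_{*i}: binary vector of users who chose item i
```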

Embedding Layer

(1) The embedding layer projects each feature vector from the input layer into a dense vector representation.
(2) The primitive feature vector embeddings (i.e., of $\mathbf{u}$ and $\mathbf{i}$) can be obtained by referring to the embedding matrix as follows:

$$p_u = P^T \mathbf{u} \tag{8}$$

where $P \in \mathbb{R}^{M \times K}$ denotes the user embedding matrix and $K$ is the dimension of user embeddings. Similarly, $q_i$ can be obtained from the item embedding matrix $Q$.
(3) NSVD averages the factors of rated items to represent a user. However, different items can reflect user preferences to different degrees. Therefore, we employ the attention mechanism [Bahdanau et al., 2014] to automatically discriminate the importance of the interacted items, as defined below:

$$m_u = \sum_{i \in R(u)} \alpha_i y_i$$

where $m_u$ is the item-based user embedding and $\alpha_i$ is the attention score for item $i$ rated by user $u$. Here we parameterize the attention score for item $i$ by:


where $W_a$ and $b_a$ denote the weight matrix and bias vector, respectively, and $h_a$ is a context vector.
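A hedged numpy sketch of the attentive pooling. The paper's exact score parameterization is not reproduced above; this sketch assumes the common context-vector form $a_i = h_a^T\,\mathrm{ReLU}(W_a y_i + b_a)$ followed by a softmax over $R(u)$:

```python
import numpy as np

rng = np.random.default_rng(0)
K, A = 8, 16                      # embedding / attention sizes (illustrative)
rated = [0, 2, 5]                 # R(u): items rated by user u
Y = rng.random((10, K))           # item embeddings y_i
W_a, b_a = rng.standard_normal((A, K)) * 0.1, np.zeros(A)  # weights, bias
h_a = rng.random(A)               # context vector

# Assumed parameterization: context vector over a ReLU-transformed item factor.
scores = np.array([h_a @ np.maximum(0, W_a @ Y[j] + b_a) for j in rated])
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over rated items
m_u = alpha @ Y[rated]                          # attention-weighted sum
```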

Pairwise Neural Interaction Layers

(1) Instead of using a single network structure, we model the interactions between the two kinds of user/item embeddings separately, and obtain four deep representations for the different embedding interactions (see the sketch after this list). Formally,

$$h^j = \phi^j_L\left(\phi^j_{L-1}\left(\cdots \phi^j_1\left(z^j_0\right)\cdots\right)\right), \qquad \phi^j_l(z) = \delta^j_l\left(W^j_l z + b^j_l\right)$$

where $j \in \{1, 2, 3, 4\}$; $h^j$ is the deep representation of the embedding interaction learned by the $j$-th feedforward neural network; $\phi^j_l$ is the $l$-th layer in network $j$; and $W^j_l$, $b^j_l$, and $\delta^j_l$ denote the weight matrix, bias vector, and activation function of layer $l$ in network $j$, respectively.

(2) An insight of DELF is that the primitive and additional embeddings should be of varying importance to the final preference score under different circumstances.
(3) Modeling the embedding interactions separately prevents the two kinds of embeddings from interfering with each other and hence may benefit the prediction result.
(4) In DELF, we choose the rectifier (ReLU) as the activation function by default if not otherwise specified, which is proven to be non-saturating and yields good performance in deep networks [Glorot et al., 2011].
(5) As for the network structure, we follow the setting proposed by [He et al., 2017] and employ a tower structure for each network, where higher layers have a smaller number of neurons.
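A minimal sketch of the four pairwise towers. The pairing of the primitive and additional embeddings follows the description above; the layer widths and the name `n_i` for the user-based item embedding are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower_mlp(z, widths):
    """One feedforward network phi^j: ReLU layers of shrinking width."""
    h = z
    for w in widths:
        W, b = rng.standard_normal((w, h.shape[0])) * 0.1, np.zeros(w)
        h = np.maximum(0, W @ h + b)   # delta^j_l = ReLU
    return h

K = 8
p_u, m_u = rng.random(K), rng.random(K)  # primitive / item-based user embeddings
q_i, n_i = rng.random(K), rng.random(K)  # primitive / user-based item embeddings

# Four separate towers, one per embedding pair, each with its own weights.
pairs = [(p_u, q_i), (p_u, n_i), (m_u, q_i), (m_u, n_i)]
reps = [tower_mlp(np.concatenate(pair), [64, 32, 16]) for pair in pairs]
```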

Fusion and Prediction

We propose two fusion schemes: an MLP and an empirical scheme.
(1) For the MLP, the combined feature after the fusion layer is formulated as:

$$h_f = \delta_f\left(W_f z_f + b_f\right)$$

where $W_f$, $b_f$, and $\delta_f$ are the weight matrix, bias vector, and activation function, respectively, and $z_f$ is the concatenation of the four latent interaction representations. We dub this model "DELF-MLP".
(2) The empirical scheme follows our observation that the primitive embeddings $p_u$ and $q_i$ should be less expressive with fewer ratings but yield good performance given enough true instances. Hence, for user $u$ and item $i$, we empirically assign non-uniform weights to the four deep representations, where the weights $\lambda_u$ and $\lambda_i$ are hyper-parameters to be tuned via the validation set. We dub this model "DELF-EF".
(3) At last, the output $h_f$ of the fusion layer is transformed into the final prediction score:

$$\hat{R}_{ui} = \delta_p\left(W_p^T h_f + b_p\right)$$

where $W_p$ and $b_p$ are the weight matrix and bias term, respectively, and $\delta_p$ is the sigmoid function, as we expect the prediction score to be in the range $[0, 1]$.
(4) It is worth noting that both NCF and NSVD can be interpreted as special cases of our DELF framework.
(A new model typically extends prior models and should subsume them as special cases.)
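A sketch of the DELF-MLP fusion and prediction path, concatenating the four representations, applying one ReLU fusion layer, and mapping to a sigmoid score (shapes and initialization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
reps = [rng.random(16) for _ in range(4)]   # h^1..h^4 from the four towers

# DELF-MLP fusion: one ReLU layer over the concatenated representations.
z_f = np.concatenate(reps)
W_f, b_f = rng.standard_normal((32, z_f.shape[0])) * 0.1, np.zeros(32)
h_f = np.maximum(0, W_f @ z_f + b_f)

# Prediction: affine map plus sigmoid, so the score lies in [0, 1].
W_p, b_p = rng.standard_normal(32) * 0.1, 0.0
score = 1 / (1 + np.exp(-(W_p @ h_f + b_p)))
```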

3.2 Learning

(1) Both point-wise and pair-wise objective functions are widely used in recommender systems. In this work, we employ a point-wise objective function for simplicity and leave the pair-wise one as future work.
(2) Due to the one-class nature of implicit feedback, we follow [He et al., 2017] and use the binary cross-entropy loss, which is defined as:

$$L = -\sum_{(u,i) \in \mathcal{R}^+ \cup \mathcal{R}^-} R_{ui} \log \hat{R}_{ui} + \left(1 - R_{ui}\right) \log\left(1 - \hat{R}_{ui}\right)$$

where $\mathcal{R}^+$ denotes the set of observed interactions and $\mathcal{R}^-$ denotes a set of sampled negative instances.
(3) To optimize the objective function, we adopt Adam, a variant of stochastic gradient descent (SGD) that dynamically tunes the learning rate during training and leads to faster convergence.
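A sketch of the pointwise training setup: sample unobserved pairs as negatives (4 per positive, a common NCF-style ratio assumed here) and minimize binary cross-entropy; the model predictions below are a stand-in for DELF's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def bce_loss(y_true, y_pred, eps=1e-8):
    """Binary cross-entropy over positive and sampled negative pairs."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Toy interaction matrix; unobserved entries are candidate negatives.
R = np.zeros((4, 6))
R[0, 1] = R[2, 3] = 1
positives = np.argwhere(R == 1)
candidates = np.argwhere(R == 0)
negatives = candidates[rng.choice(len(candidates), 4 * len(positives), replace=False)]

labels = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
preds = rng.random(len(labels))     # stand-in for DELF's predicted scores
print(bce_loss(labels, preds))
```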

4 Experiments

4.1 Experiments Settings

Datasets

MovieLens 1M and Amazon Music. We transformed both datasets to implicit feedback, where each entry is marked as 0 or 1, denoting whether the user has rated the item.

Evaluation Protocol

(1) We employed the widely used leave-one-out evaluation.
(2) We followed the common strategy of randomly sampling 100 items that the user has not interacted with.
(3) We used Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [He et al., 2015] as metrics. The ranked list is truncated at 10 for both metrics.
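A small sketch of this protocol's metric computation: rank the held-out positive among its 100 sampled negatives and compute HR@10 and NDCG@10 (the positive's index here is an assumption of the toy setup):

```python
import numpy as np

def hr_ndcg_at_k(scores, pos_index, k=10):
    """HR@k and NDCG@k for one test user with a single held-out positive."""
    ranked = np.argsort(-scores)                      # indices, best first
    rank = int(np.where(ranked == pos_index)[0][0])   # 0-based rank of positive
    hit = 1.0 if rank < k else 0.0
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return hit, ndcg

rng = np.random.default_rng(0)
scores = rng.random(101)    # held-out positive at index 0 plus 100 negatives
print(hr_ndcg_at_k(scores, pos_index=0))
```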

Compared Methods

-ItemPop

-eALS

-BPR

-MLP

-NeuMF

-DMF

Parameter Settings

4.2 Performance Comparison (RQ1)

4.3 Effects of Key Components (RQ2)

4.4 Hyper-parameter Investigation (RQ3)


5 Conclusion and Future Work

(1) In this paper, we propose a novel deep latent factor model with dual embeddings for recommendation.
(2) In addition to the primary user and item embeddings, we employ an attentive neural method to obtain additional embeddings for users and items based on their interaction vectors from implicit feedback.
(3) In the future, we plan to extend DELF to incorporate auxiliary information. Auxiliary information such as social relations, user reviews, and knowledge bases can be utilized to characterize users/items from different perspectives.

Acknowledgements

References
