[Paper Reading Notes] 2018_IJCAI_A Dual-Embedding Based Deep Latent Factor Model for Recommendation (IJCAI, 2018.07.13) -- Weiyu Cheng, Yanyan Shen, Yanmin Zhu, Linpeng Huang

Paper link: https://doi.org/10.24963/ijcai.2018/462
Venue: IJCAI
Publish time: 13 July 2018
Affiliation: Shanghai Jiao Tong University
Datasets
MovieLens 1M https://grouplens.org/datasets/movielens/1m/
Amazon Music http://jmcauley.ucsd.edu/data/amazon/
Code

Contributions (my understanding)

The biggest novelty of this paper is the dual embedding:
1. Previous models represent a user with a single user latent embedding.
In addition to the original embedding, this paper also represents the user through the items the user has rated, adding an item-based embedding for each user.
2. Items are handled symmetrically: each item additionally gets a user-based embedding.

Abstract (contributions)

1 This paper proposes a dual-embedding based deep latent factor model named DELF for recommendation with implicit feedback.
2 In addition to learning a single embedding for a user (resp. item), we represent each user (resp. item) with an additional embedding from the perspective of the interacted items (resp. users).
3 We employ an attentive neural method to discriminate the importance of interacted users/items for dual-embedding learning.
4 We further introduce a neural network architecture to incorporate the dual embeddings for recommendation.
5 A novel attempt of DELF is to model each user-item interaction with four deep representations that are subtly fused for preference prediction.

1 Introduction

1 An important challenge of applying latent factor models to implicit-feedback-based recommendation is: how to learn appropriate embeddings for users and items given scarce negative feedback? Since all the observed interactions are positive implicit feedback, learning user and item embeddings with only positive feedback will result in significant overfitting.
2 Limitations of prior work.
3 This paper's work; it overlaps with the abstract, just explained in more detail.

2 Preliminaries

We consider a user-item interaction matrix $R \in \mathbb{R}^{M \times N}$ constructed from users' implicit feedback, where $M$ and $N$ are the numbers of users and items, respectively. $R_{ui} = 1$ indicates an interaction between user $u$ and item $i$, and $R_{ui} = 0$ means no interaction is observed.

2.1 Neural Collaborative Filtering

1 Matrix Factorization (MF)

2 Neural Collaborative Filtering (NCF)

Note that the original NCF paper ensembles MLP and MF to obtain the NeuMF model. In this paper, we focus on developing a single CF model for recommendation, though the proposed model can be ensembled with other models for better performance.
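To make the contrast concrete, here is a minimal numpy sketch (not the authors' code; layer sizes and weights are illustrative) of MF's inner-product scoring versus the MLP interaction learned by NCF:

```python
import numpy as np

# Toy embeddings (illustrative sizes; in practice these are learned).
K = 8
rng = np.random.default_rng(0)
p_u = rng.random(K)   # user latent vector
q_i = rng.random(K)   # item latent vector

# MF: the preference score is simply the inner product of the embeddings.
mf_score = p_u @ q_i

# NCF (MLP variant): learn the interaction function with a feedforward net
# over the concatenated embeddings (hypothetical one-hidden-layer net).
W1, b1 = rng.standard_normal((16, 2 * K)) * 0.1, np.zeros(16)
W2, b2 = rng.standard_normal(16) * 0.1, 0.0
h = np.maximum(0, W1 @ np.concatenate([p_u, q_i]) + b1)  # ReLU hidden layer
ncf_score = 1 / (1 + np.exp(-(W2 @ h + b2)))             # sigmoid in [0, 1]
```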

2.2 NSVD & SVD++

NSVD

1 NSVD [Paterek, 2007] models users based on the items they have rated. Formally, each item is associated with two latent vectors $q_i$ and $y_i$. The preference score of user $u$ to item $i$ is estimated as:

$$\hat{R}_{ui} = b_u + b_i + q_i^T \left( \frac{1}{\sqrt{|R(u)|}} \sum_{j \in R(u)} y_j \right)$$

where $R(u)$ is the set of items rated by user $u$, and $b_u$ and $b_i$ are bias terms.
2 Limitation:
the main issue of NSVD is that two users who have rated the same set of items, even with entirely different ratings, are forced to have the same representation.

SVD++

SVD++ [Koren, 2008] is proposed for recommendation with explicit ratings, and estimates user-item preferences as follows:

$$\hat{R}_{ui} = b_u + b_i + q_i^T \left( p_u + \frac{1}{\sqrt{|R(u)|}} \sum_{j \in R(u)} y_j \right)$$

where $p_u$ is a latent factor. SVD++ leverages the NSVD-based representation to adjust the user latent factor rather than to represent the user. We observe that NSVD-based latent factors are determined by users' rated items, which helps avoid false negatives from noisy implicit feedback and makes them more robust than explicitly parameterized factors.
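A minimal numpy sketch of the two estimators above (standard forms; names and sizes are illustrative, not the paper's code):

```python
import numpy as np

M, N, K = 5, 10, 8
rng = np.random.default_rng(0)
Q = rng.random((N, K))            # item factors q_i
Y = rng.random((N, K))            # item factors y_i used to represent users
P = rng.random((M, K))            # explicit user factors p_u (SVD++ only)
b_u, b_i = rng.random(M), rng.random(N)

def nsvd_score(u, i, rated):
    """NSVD: the user is represented purely by the items in R(u)."""
    user_rep = Y[rated].sum(axis=0) / np.sqrt(len(rated))
    return b_u[u] + b_i[i] + Q[i] @ user_rep

def svdpp_score(u, i, rated):
    """SVD++: the item-based representation adjusts the explicit factor p_u."""
    user_rep = P[u] + Y[rated].sum(axis=0) / np.sqrt(len(rated))
    return b_u[u] + b_i[i] + Q[i] @ user_rep

print(nsvd_score(0, 3, [1, 4, 7]), svdpp_score(0, 3, [1, 4, 7]))
```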

3 DELF

Figure 1: Dual-Embedding based Deep Latent Factor Model

3.1 Model


DELF estimates the preference score as $\hat{R}_{ui} = f(u, i \mid \Theta)$, where $\Theta$ denotes the latent factors of $u$ and $i$, and $f$ denotes the interaction function. Figure 1 illustrates the design of $\Theta$ and $f$ in DELF.

Input Layer

(1) Single-embedding based latent factor models simply associate $u$ and $i$ with their one-hot representations $\mathbf{u}$ and $\mathbf{i}$.
(2) In addition to the one-hot vectors, DELF also incorporates the binary interaction vectors $R_{u*}$ and $R_{*i}$ from the observed interactions for $u$ and $i$, respectively.
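A toy sketch of these four inputs for one $(u, i)$ pair, assuming a small interaction matrix $R$:

```python
import numpy as np

# Toy interaction matrix: 4 users x 6 items.
R = np.zeros((4, 6))
R[1, [0, 2, 5]] = 1        # user 1 interacted with items 0, 2, 5

u, i = 1, 2
u_onehot = np.eye(4)[u]    # primitive user input (one-hot)
i_onehot = np.eye(6)[i]    # primitive item input (one-hot)
R_u = R[u]                 # R_{u*}: binary vector of items u interacted with
R_i = R[:, i]              # R_{*i}: binary vector of users who chose item i
```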

Embedding Layer

(1) The embedding layer projects each feature vector from the input layer into a dense vector representation.
(2) The primitive feature vector embeddings (i.e., of $\mathbf{u}$ and $\mathbf{i}$) can be obtained by referring to the embedding matrix as follows:

$$p_u = P^T \mathbf{u} \tag{8}$$

where $P \in \mathbb{R}^{M \times K}$ denotes the user embedding matrix and $K$ is the dimension of user embeddings. Similarly, $q_i$ can be obtained from the item embedding matrix $Q$.
(3) NSVD averages the factors of rated items to represent a user. However, different items can reflect user preferences to different degrees. Therefore, we employ the attention mechanism [Bahdanau et al., 2014] to automatically discriminate the importance of the interacted items, as defined below:

$$m_u = \sum_{i \in R(u)} \alpha_i y_i$$

where $m_u$ is the item-based user embedding and $\alpha_i$ is the attention score for item $i$ rated by user $u$. Here we parameterize the attention score for item $i$ by:


where $W_a$ and $b_a$ denote the weight matrix and bias vector, respectively, and $h_a$ is a context vector.
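A hedged numpy sketch of the attentive pooling. The paper's exact score parameterization is not reproduced above; this sketch assumes the common context-vector form $a_i = h_a^T\,\mathrm{ReLU}(W_a y_i + b_a)$ followed by a softmax over $R(u)$:

```python
import numpy as np

rng = np.random.default_rng(0)
K, A = 8, 16                      # embedding / attention sizes (illustrative)
rated = [0, 2, 5]                 # R(u): items rated by user u
Y = rng.random((10, K))           # item embeddings y_i
W_a, b_a = rng.standard_normal((A, K)) * 0.1, np.zeros(A)  # weights, bias
h_a = rng.random(A)               # context vector

# Assumed parameterization: context vector over a ReLU-transformed item factor.
scores = np.array([h_a @ np.maximum(0, W_a @ Y[j] + b_a) for j in rated])
alpha = np.exp(scores) / np.exp(scores).sum()   # softmax over rated items
m_u = alpha @ Y[rated]                          # attention-weighted sum
```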

Pairwise Neural Interaction Layers

(1) Instead of using a single network structure, we model the interactions between the two kinds of user/item embeddings separately, and obtain four deep representations for the different embedding interactions (see the sketch after this list). Formally,

$$h^j = \phi^j_L\left(\phi^j_{L-1}\left(\cdots \phi^j_1\left(z^j_0\right)\cdots\right)\right), \qquad \phi^j_l(z) = \delta^j_l\left(W^j_l z + b^j_l\right)$$

where $j \in \{1, 2, 3, 4\}$; $h^j$ is the deep representation of the embedding interaction learned by the $j$-th feedforward neural network; $\phi^j_l$ is the $l$-th layer in network $j$; and $W^j_l$, $b^j_l$, and $\delta^j_l$ denote the weight matrix, bias vector, and activation function of layer $l$ in network $j$, respectively.

(2) An insight of DELF is that the primitive and additional embeddings should be of varying importance to the final preference score under different circumstances.
(3) Modeling the embedding interactions separately prevents the two kinds of embeddings from interfering with each other and hence may benefit the prediction result.
(4) In DELF, we choose the rectifier (ReLU) as the activation function by default if not otherwise specified, which is proven to be non-saturating and yields good performance in deep networks [Glorot et al., 2011].
(5) As for the network structure, we follow the setting proposed by [He et al., 2017] and employ a tower structure for each network, where higher layers have a smaller number of neurons.
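A minimal sketch of the four pairwise towers. The pairing of the primitive and additional embeddings follows the description above; the layer widths and the name `n_i` for the user-based item embedding are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def tower_mlp(z, widths):
    """One feedforward network phi^j: ReLU layers of shrinking width."""
    h = z
    for w in widths:
        W, b = rng.standard_normal((w, h.shape[0])) * 0.1, np.zeros(w)
        h = np.maximum(0, W @ h + b)   # delta^j_l = ReLU
    return h

K = 8
p_u, m_u = rng.random(K), rng.random(K)  # primitive / item-based user embeddings
q_i, n_i = rng.random(K), rng.random(K)  # primitive / user-based item embeddings

# Four separate towers, one per embedding pair, each with its own weights.
pairs = [(p_u, q_i), (p_u, n_i), (m_u, q_i), (m_u, n_i)]
reps = [tower_mlp(np.concatenate(pair), [64, 32, 16]) for pair in pairs]
```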

Fusion and Prediction

We propose two fusion schemes: an MLP and an empirical scheme.
(1) For the MLP, the combined feature after the fusion layer is formulated as:

$$h_f = \delta_f\left(W_f z_f + b_f\right)$$

where $W_f$, $b_f$, and $\delta_f$ are the weight matrix, bias vector, and activation function, respectively, and $z_f$ is the concatenation of the four latent interaction representations. We dub this model "DELF-MLP".
(2) The empirical scheme follows our observation that the primitive embeddings $p_u$ and $q_i$ should be less expressive with fewer ratings but yield good performance given enough true instances. Hence, for user $u$ and item $i$, we empirically assign non-uniform weights to the four deep representations, where the weights $\lambda_u$ and $\lambda_i$ are hyper-parameters to be tuned via the validation set. We dub this model "DELF-EF".
(3) At last, the output $h_f$ of the fusion layer is transformed into the final prediction score:

$$\hat{R}_{ui} = \delta_p\left(W_p^T h_f + b_p\right)$$

where $W_p$ and $b_p$ are the weight matrix and bias term, respectively, and $\delta_p$ is the sigmoid function, as we expect the prediction score to be in the range $[0, 1]$.
(4) It is worth noting that both NCF and NSVD can be interpreted as special cases of our DELF framework.
(A new model typically extends prior models and should subsume them as special cases.)
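A sketch of the DELF-MLP fusion and prediction path, concatenating the four representations, applying one ReLU fusion layer, and mapping to a sigmoid score (shapes and initialization are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
reps = [rng.random(16) for _ in range(4)]   # h^1..h^4 from the four towers

# DELF-MLP fusion: one ReLU layer over the concatenated representations.
z_f = np.concatenate(reps)
W_f, b_f = rng.standard_normal((32, z_f.shape[0])) * 0.1, np.zeros(32)
h_f = np.maximum(0, W_f @ z_f + b_f)

# Prediction: affine map plus sigmoid, so the score lies in [0, 1].
W_p, b_p = rng.standard_normal(32) * 0.1, 0.0
score = 1 / (1 + np.exp(-(W_p @ h_f + b_p)))
```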

3.2 Learning

(1) Both point-wise and pair-wise objective functions are widely used in recommender systems. In this work, we employ a point-wise objective function for simplicity and leave the pair-wise one as future work.
(2) Due to the one-class nature of implicit feedback, we follow [He et al., 2017] and use the binary cross-entropy loss, which is defined as:

$$L = -\sum_{(u,i) \in \mathcal{R}^+ \cup \mathcal{R}^-} R_{ui} \log \hat{R}_{ui} + \left(1 - R_{ui}\right) \log\left(1 - \hat{R}_{ui}\right)$$

where $\mathcal{R}^+$ denotes the set of observed interactions and $\mathcal{R}^-$ denotes a set of sampled negative instances.
(3) To optimize the objective function, we adopt Adam, a variant of stochastic gradient descent (SGD) that dynamically tunes the learning rate during training and leads to faster convergence.
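A sketch of the pointwise training setup: sample unobserved pairs as negatives (4 per positive, a common NCF-style ratio assumed here) and minimize binary cross-entropy; the model predictions below are a stand-in for DELF's outputs:

```python
import numpy as np

rng = np.random.default_rng(0)

def bce_loss(y_true, y_pred, eps=1e-8):
    """Binary cross-entropy over positive and sampled negative pairs."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Toy interaction matrix; unobserved entries are candidate negatives.
R = np.zeros((4, 6))
R[0, 1] = R[2, 3] = 1
positives = np.argwhere(R == 1)
candidates = np.argwhere(R == 0)
negatives = candidates[rng.choice(len(candidates), 4 * len(positives), replace=False)]

labels = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
preds = rng.random(len(labels))     # stand-in for DELF's predicted scores
print(bce_loss(labels, preds))
```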

4 Experiments

4.1 Experiments Settings

Datasets

MovieLens 1M and Amazon Music. We transformed both datasets to implicit feedback, where each entry is marked as 0 or 1, denoting whether the user has rated the item.

Evaluation Protocol

(1) We employed the widely used leave-one-out evaluation.
(2) We followed the common strategy of randomly sampling 100 items that the user has not interacted with.
(3) We used Hit Ratio (HR) and Normalized Discounted Cumulative Gain (NDCG) [He et al., 2015] as metrics. The ranked list is truncated at 10 for both metrics.
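A small sketch of this protocol's metric computation: rank the held-out positive among its 100 sampled negatives and compute HR@10 and NDCG@10 (the positive's index here is an assumption of the toy setup):

```python
import numpy as np

def hr_ndcg_at_k(scores, pos_index, k=10):
    """HR@k and NDCG@k for one test user with a single held-out positive."""
    ranked = np.argsort(-scores)                      # indices, best first
    rank = int(np.where(ranked == pos_index)[0][0])   # 0-based rank of positive
    hit = 1.0 if rank < k else 0.0
    ndcg = 1.0 / np.log2(rank + 2) if rank < k else 0.0
    return hit, ndcg

rng = np.random.default_rng(0)
scores = rng.random(101)    # held-out positive at index 0 plus 100 negatives
print(hr_ndcg_at_k(scores, pos_index=0))
```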

Compared Methods

-ItemPop

-eALS

-BPR

-MLP

-NeuMF

-DMF

Parameter Settings

4.2 Performance Comparison (RQ1)

4.3 Effects of Key Components (RQ2)

4.4 Hyper-parameter Investigation (RQ3)


5 Conclusion and Future Work

(1) In this paper, we propose a novel deep latent factor model with dual embeddings for recommendation.
(2) In addition to the primary user and item embeddings, we employ an attentive neural method to obtain additional embeddings for users and items based on their interaction vectors from implicit feedback.
(3) In the future, we plan to extend DELF to incorporate auxiliary information. Auxiliary information such as social relations, user reviews, and knowledge bases can be utilized to characterize users/items from different perspectives.

Acknowledgements

References
