潜在因子模型

什么是推荐系统? (What is a recommendation system?)

A recommendation system is any rating system which predicts an individual’s preferred choices, based on available data. Recommendation systems are utilized in a variety of services, such as video streaming, online shopping, and social media. Typically, the system provides the recommendation to the users based on its prediction of the rating a user would give to an item. Recommendation systems can be categorized by two aspects, the utilized information and the prediction models.

推荐系统是任何可根据可用数据预测个人偏爱选择的评分系统。 推荐系统用于各种服务中,例如视频流,在线购物和社交媒体。 通常,系统基于其对用户将给项目的评级的预测向用户提供推荐。 推荐系统可以从两个方面进行分类,即所使用的信息和预测模型。

内容过滤与协作过滤 (Content filtering vs. collaborative filtering)

The two major recommendation approaches, content filtering and collaborative filtering, mainly differ according to the information utilized for rating prediction. Figure 1 shows a set of movie rating data together with some tags for users and movies. Note that the ratings from users to movies from a graph structure: users and movies are the vertices, and ratings are the edges of the graph. In this example, a content filtering approach leverages the tag attributes on the movies and users. By querying the tags, we know Alice is a big fan of Marvel movies, and Iron Man is a Marvel movie, thus the Iron Man series will be a good recommendation for her. A specific example of a content filtering scoring system is TF-IDF, used for ranking document searches.

内容过滤和协作过滤这两种主要的推荐方法主要根据用于等级预测的信息而有所不同。 图1显示了一组电影分级数据以及一些用于用户和电影的标签。 请注意,用户对图表结构中电影的评分:用户和电影是顶点,评分是图表的边缘。 在此示例中,内容过滤方法利用了电影和用户上的标签属性。 通过查询标签,我们知道爱丽丝是漫威电影的忠实粉丝,而钢铁侠是漫威电影,因此《 钢铁侠》系列将是对她的不错推荐。 内容过滤评分系统的特定示例是TF-IDF,用于对文档搜索进行排名。

The tags on the users and movies may not always be available. When the tag data is sparse, the content filtering method can be unreliable. On the other hand, collaborative filtering approaches mainly rely on the user behavior data (e.g. rating record or movie watching history). In the example above, both Alice and Bob love the movie Toy Story which was rated 5 and 4.5 by them, respectively. Based on these rating records, it can be inferred that these two users may share similar preferences. Now considering that Bob loves Iron Man, we can expect similar behavior from Alice, and recommend Iron Man to Alice. K-nearest neighbors (KNN) is a typical collaborative filtering approach. The collaborative filtering, however, has the so-called Cold Start problem — it cannot produce a recommendation for new users that have no rating records.

用户和电影上的标签可能并不总是可用。 当标签数据稀疏时,内容过滤方法可能不可靠。 另一方面,协作过滤方法主要依赖于用户行为数据(例如,评级记录或电影观看历史记录)。 在上面的示例中,爱丽丝(Alice)和鲍勃(Bob)都喜欢电影《 玩具总动员 》( Toy Story) ,他们分别将其评为5和4.5。 根据这些评分记录,可以推断出这两个用户可能共享相似的偏好。 现在考虑到鲍勃喜欢钢铁侠 ,我们可以期待爱丽丝的类似行为,并向爱丽丝推荐钢铁侠 。 K最近邻( KNN )是一种典型的协作过滤方法。 但是,协作式过滤存在所谓的“冷启动”问题-无法为没有评分记录的新用户提供推荐。

Memory-based vs. model-based

基于内存与基于模型

Recommendation systems can also be categorized into memory-based and model-based approaches depending on the implicity of the utilized features. In the examples above, all the user and movie features are given explicitly, which allows us to directly match movies and users based on their tags. However sometimes a deep learning model is needed to extract the latent features from the information of users and movies (e.g. an NLP model to categorize the movies based on its outline). A model-based recommendation system utilizes machine learning models for prediction. While a memory-based recommendation system mainly leverages the explicit features.

推荐系统还可以根据所使用功能的隐含性分为基于内存和基于模型的方法。 在上面的示例中,所有用户和电影功能都被明确给出,这使我们能够根据他们的标签直接匹配电影和用户。 然而,有时需要深度学习模型来从用户和电影的信息中提取潜在特征(例如,用于基于电影轮廓对电影进行分类的NLP模型)。 基于模型的推荐系统利用机器学习模型进行预测。 基于内存的推荐系统主要利用显式功能。

Some typical examples of different types of recommendation systems are shown below.

不同类型的推荐系统的一些典型示例如下所示。

(Image by Author)
(图片由作者提供)

图分解如何用于推荐系统 (How graph factorization works for recommendation systems)

As discussed above, a collaborative filtering method, such as KNN, can predict the movie rating without knowing the attributes of the movies and users. However, such an approach may not generate accurate predictions when the rating data is sparse, i.e., the typical user has rated only a few movies. In Fig. 1, KNN cannot predict Alice’s rating of Iron Man if Bob has not rated Toy Story, since there would be no path between Alice and Bob, and there would be no “neighbor” per se for Alice. To address this challenge, the graph factorization approach [1] combines the model-based method with the collaborative filtering method to improve prediction accuracy when the rating record is sparse.

如上所述,协作过滤方法(例如KNN)可以在不知道电影和用户属性的情况下预测电影收视率。 但是,当分级数据稀疏(即,典型用户仅对几部电影进行分级)时,这种方法可能无法生成准确的预测。 在图1中,如果鲍勃未对《 玩具总动员》进行评分,则KNN无法预测爱丽丝对钢铁侠的评分,因为爱丽丝与鲍勃之间不会有任何联系,爱丽丝本身就不会有“邻居”。 为了解决这一挑战,图因式分解方法[1]将基于模型的方法与协作过滤方法相结合,以在评分记录稀疏时提高预测精度。

Fig. 2 illustrates the intuition of the graph factorization method. Graph factorization is also known as a latent factor or matrix factorization method. The objective is to obtain a vector for each user and movie, which represents their latent features. In this example (Fig. 2), the dimension of the vectors is specified as 2. The two elements of the vector for a movie x(i) represent the degrees of romance and action content of this movie, and the two elements of the vector for a user θ(j) represent the user’s degree of preference for romance and action content, respectively. The prediction of the rating that user j would give to movie i is the dot product of x(i) and θ(j), as we expect a better alignment of these two vectors indicates a higher degree of preference from the user to the movie.

图2说明了图形分解方法的直觉。 图分解也被称为潜在因子或矩阵分解方法。 目的是为每个用户和电影获得一个向量,以表示它们的潜在特征。 在此示例中(图2),向量的维数指定为2。电影x ( i )的向量的两个元素表示该电影的浪漫程度和动作内容,而电影的两个元素代表电影的浪漫程度。用户的向量θ ( j )分别表示用户对浪漫和动作内容的偏爱程度。 用户j对电影i的评级的预测是x ( i )和θ ( j )的点积,因为我们期望这两个向量更好的对齐表示用户对电影的偏好程度更高。

θ(j) and x(i) represent the latent factor vectors. The rating predictions for Alice are computed from the latent factors and shown in orange next to the real values.θ(j)和x(i)表示潜在因子向量。 根据潜在因子计算出Alice的评分预测,并在实际值旁边显示为橙色。

The example in Fig.2 is only meant to illustrate the intuition of the graph factorization approach. In practice, the meaning of each vector element is usually unknown. The vectors are actually randomly initialized and get “trained” by minimizing the loss function below [1].

图2中的示例仅用于说明图形分解方法的直觉。 实际上,每个向量元素的含义通常是未知的。 向量实际上是随机初始化的,并且通过最小化低于[1]的损失函数来进行“训练”。

where M is the number of rating records, n is the dimension of the latent factor vector, y(i,j) is the rating that user j gave to movie i , and

潜在因子模型_如何使用潜在因子模型在图形数据库中构建推荐系统相关推荐

  1. unity 构建迷宫_教程:使用GameDraw在Unity中构建迷宫游戏关卡

    unity 构建迷宫 GameDraw is a 3D modeling extension for Unity developed by Mixed Dimensions that reduces ...

  2. 如何识别媒体偏见_描述性语言理解,以识别文本中的潜在偏见

    如何识别媒体偏见 TGumGum can do to bring change by utilizing our Natural Language Processing technology to s ...

  3. 【自然语言处理】潜在语义分析【上】潜在语义分析

    有任何的书写错误.排版错误.概念错误等,希望大家包含指正. 由于字数限制,分成两篇博客. [自然语言处理]潜在语义分析[上]潜在语义分析 [自然语言处理]潜在语义分析[下]概率潜在语义分析 基础概念 ...

  4. 交替最小二乘矩阵分解_使用交替最小二乘矩阵分解与pyspark建立推荐系统

    交替最小二乘矩阵分解 pyspark上的动手推荐系统 (Hands-on recommender system on pyspark) Recommender System is an informa ...

  5. python进阶项目设计_推荐系统进阶:设计和构建推荐系统流程综述(1)

    内容目录推荐系统应用场景概述 为什么需要推荐? 推荐系统的目标? 推荐系统的工作? 推荐系统的基本模型 构建推荐系统的方法 相关参考? 1.推荐系统概述以及它们如何提供有效形式的定向营销 推荐系统 推 ...

  6. adobe air管理员_了解Adobe AIR,第二部分:构建客户管理应用

    adobe air管理员 In our previous tutorial, we created a personal notes storage database using HTML, CSS, ...

  7. python3中的int类型占64位_在windows 10 64位计算机中,默认情况下,numpy数组数据类型将以int32形式出现...

    最初的海报Prana问了一个非常好的问题."为什么在64位计算机上,整数默认设置为32位?"在 据我所知,简短的回答是:"因为它的设计是错误的". 显然,64位 ...

  8. 闪亮蔚蓝_在R中构建第一个闪亮的Web应用

    闪亮蔚蓝 数据科学 (DATA SCIENCE) Do you want to make your R code publicly available for others to use? If yo ...

  9. keras构建卷积神经网络_在Keras中构建,加载和保存卷积神经网络

    keras构建卷积神经网络 This article is aimed at people who want to learn or review how to build a basic Convo ...

最新文章

  1. android动画的实现原理,Android动画的实现原理 .
  2. git 创建邮箱 用户名_git设置用户名和邮箱
  3. python 打开文件-Python打开文件的方式
  4. 无心剑随感《爱心教育》
  5. Linux系统下MYSQL主从同步
  6. Python 机器学习——解决过拟合的方法
  7. docker php7 mysql分开,Docker nginx+php74+mysql57, 并安装gd和mysql扩展
  8. 美国航空航天局(NASA)高度集成WebFOCUS和SharePoint
  9. 杭州·云栖 2050 大会日程(5.25-5.27)
  10. Linux无线投屏软件,scrcpy - 手机无线投屏到电脑
  11. uniapp 支付宝小程序 获取用户信息 ISV权限不足
  12. 数据总线和地址总线区别是什么?作用分别是什么
  13. DH坐标系的建立与DH表—机器人学
  14. Share Your Music - HTML5 Music Web App
  15. linux的系统监视器图片_用Jetson Nano构建一个价值60美元的人脸识别系统
  16. 仿百度音乐html5,js仿百度音乐全选操作
  17. 2014年10月25日深圳彩讯科技和北京宇信易诚的笔试记录
  18. 关于手机和固话号码正则表达式
  19. mysql数据库压缩图片_MySQL8.0.20压缩版本安装教程图片加文字详解
  20. 36氪上的这七家程序员网站你都了解吗?

热门文章

  1. 小福利,用selenium模块爬取qq音乐歌单
  2. 7-2 二叉搜索树的删除操作
  3. 已经包含头文件却仍然显示未定义标识符
  4. 解决导入maven项目之后pom.xml中的project标签报错:批量删除没有下载完全的pom依赖bat脚本
  5. 项管(十六)——文档管理、配置管理、知识管理、变更管理
  6. 阿里2021年面经汇总
  7. 中国石斑鱼养殖产量不断上升,捕捞产量逐渐下降「图」
  8. 微信气泡主题设置_华为手机微信怎么设置气泡? 怎样改微信的气泡和主题
  9. Python爬虫之Requests模块巩固深入案例
  10. Android 加固应用