AutoRec: Autoencoders Meet Collaborative Filtering — Paper Walkthrough and a PyTorch Implementation
1. The Original Paper
2. Overview
Collaborative filtering models aim to exploit users' preferences over items to provide personalized recommendations. AutoRec is a compact collaborative filtering model built on an autoencoder. The authors argue that AutoRec has representational and computational advantages over prior neural approaches that apply restricted Boltzmann machines to collaborative filtering, and they show empirically that it outperforms the then state-of-the-art methods.
3. The Autoencoder Model
4. The AutoRec Model
Overall:
1. Model input
Item-based: each item is described by the vector of ratings all users have given it. (User-based: each user is represented by that user's ratings of all items.)
2. Model output
The values at the corresponding positions of the reconstructed input vector are taken as the predicted ratings.
3. Optimization objective
The second term is a regularization term added to prevent overfitting. Note that the loss in the first term is computed only over observed entries. Unobserved (missing) values are initially filled with a default value, e.g. 3 in a 1–5 rating scale.
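For reference, the regularized objective from the AutoRec paper (written here for ratings restricted to observed entries) is:

```latex
\min_{\theta} \sum_{i=1}^{n} \left\lVert \mathbf{r}^{(i)} - h\!\left(\mathbf{r}^{(i)};\theta\right) \right\rVert_{\mathcal{O}}^{2}
\;+\; \frac{\lambda}{2} \left( \lVert W \rVert_{F}^{2} + \lVert V \rVert_{F}^{2} \right),
\qquad
h(\mathbf{r};\theta) = f\!\left( W \cdot g\!\left( V\mathbf{r} + \boldsymbol{\mu} \right) + \mathbf{b} \right)
```

Here the norm with subscript O means the squared error is accumulated only over observed ratings, g and f are the encoder and decoder activations (sigmoid and identity, respectively, in the implementation below), and λ weights the Frobenius-norm regularizer on the weight matrices V and W.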
The goal of an autoencoder (AE) is to learn a model whose output is as close to its input as possible; with squared error as the loss function, the objective is as shown above. In essence, the AE learns a vector representation of the original input: the input layer → hidden layer step is the encoding, the hidden layer → output layer step is the decoding, and the model is trained to reconstruct the original data as faithfully as possible. The AE is an unsupervised learning method with many uses, such as feature extraction and data compression.
The idea of AutoRec is to apply an AE directly to learn compressed vector representations of the rows or columns of the rating matrix R, yielding two variants: user-based AutoRec and item-based AutoRec. For item-based AutoRec the input is each column of R, i.e. each item is described by the vector of all users' ratings of it; for user-based AutoRec the input is each row of R.
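As a minimal sketch of this setup (using a hypothetical toy matrix; the full implementation follows in section 5), item-based input vectors are the columns of R, and the reconstruction loss is masked to observed entries:

```python
import torch

# Toy rating matrix R: 4 users x 3 items, 0 = unobserved
R = torch.tensor([[5., 0., 3.],
                  [4., 2., 0.],
                  [0., 1., 4.],
                  [3., 0., 5.]])
mask = (R > 0).float()          # 1 where a rating was observed

# Item-based AutoRec input: each column of R (one vector per item)
item_vectors = R.t()            # shape (num_items, num_users)
print(item_vectors[0])          # item 0 described by all users' ratings

# Masked squared-error loss: only observed entries contribute.
# Pretend the model predicts the default value 3 everywhere.
reconstruction = torch.full_like(R, 3.0)
loss = ((reconstruction - R) * mask).pow(2).sum()
print(loss)                     # squared error over the 8 observed ratings
```

Entries where mask is 0 contribute nothing to the loss, which is exactly how the missing-value default avoids polluting training.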
5. Implementation (PyTorch)
autorec.py:

```python
import torch
import torch.nn as nn
import numpy as np
import math
import argparse
import torch.utils.data as Data
import torch.optim as optim

from data import get_data


class Autorec(nn.Module):
    def __init__(self, args, num_users, num_items):
        super(Autorec, self).__init__()
        self.args = args
        self.num_users = num_users
        self.num_items = num_items
        self.hidden_units = args.hidden_units
        self.lambda_value = args.lambda_value
        # Encoder: weights V and bias mu, sigmoid activation g(.)
        self.encoder = nn.Sequential(
            nn.Linear(self.num_items, self.hidden_units),
            nn.Sigmoid()
        )
        # Decoder: weights W and bias b, identity activation f(.)
        self.decoder = nn.Sequential(
            nn.Linear(self.hidden_units, self.num_items),
        )

    def forward(self, torch_input):
        encoder = self.encoder(torch_input)
        decoder = self.decoder(encoder)
        return decoder

    def loss(self, decoder, input, optimizer, mask_input):
        cost = 0
        temp2 = 0
        # Squared error over observed entries only (mask == 1)
        cost += ((decoder - input) * mask_input).pow(2).sum()
        rmse = cost
        # L2 regularization over the weight matrices (dim == 2 excludes biases)
        for i in optimizer.param_groups:
            for j in i['params']:
                if j.data.dim() == 2:
                    temp2 += torch.t(j.data).pow(2).sum()
        cost += temp2 * self.lambda_value * 0.5
        return cost, rmse


def train(epoch):
    RMSE = 0
    cost_all = 0
    for step, (batch_x, batch_mask_x, batch_y) in enumerate(loader):
        batch_x = batch_x.type(torch.FloatTensor).cuda()
        batch_mask_x = batch_mask_x.type(torch.FloatTensor).cuda()

        decoder = rec(batch_x)
        loss, rmse = rec.loss(decoder=decoder, input=batch_x,
                              optimizer=optimer, mask_input=batch_mask_x)
        optimer.zero_grad()
        loss.backward()
        optimer.step()
        cost_all += loss
        RMSE += rmse

    RMSE = np.sqrt(RMSE.detach().cpu().numpy() / (train_mask_r == 1).sum())
    print('epoch ', epoch, ' train RMSE : ', RMSE)


def tst(epoch):
    # Named tst rather than test so pytest does not collect it (see section 6)
    test_r_tensor = torch.from_numpy(test_r).type(torch.FloatTensor).cuda()
    test_mask_r_tensor = torch.from_numpy(test_mask_r).type(torch.FloatTensor).cuda()

    decoder = rec(test_r_tensor)
    # decoder = torch.from_numpy(np.clip(decoder.detach().cpu().numpy(), a_min=1, a_max=5)).cuda()

    # Users/items that appear only in the test set get the default rating 3
    unseen_user_test_list = list(user_test_set - user_train_set)
    unseen_item_test_list = list(item_test_set - item_train_set)
    for user in unseen_user_test_list:
        for item in unseen_item_test_list:
            if test_mask_r[user, item] == 1:
                decoder[user, item] = 3

    mse = ((decoder - test_r_tensor) * test_mask_r_tensor).pow(2).sum()
    RMSE = mse.detach().cpu().numpy() / (test_mask_r == 1).sum()
    RMSE = np.sqrt(RMSE)
    print('epoch ', epoch, ' test RMSE : ', RMSE)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='I-AutoRec')
    parser.add_argument('--hidden_units', type=int, default=500)
    parser.add_argument('--lambda_value', type=float, default=1)
    parser.add_argument('--train_epoch', type=int, default=100)
    parser.add_argument('--batch_size', type=int, default=100)
    parser.add_argument('--optimizer_method', choices=['Adam', 'RMSProp'], default='Adam')
    parser.add_argument('--grad_clip', type=bool, default=False)
    parser.add_argument('--base_lr', type=float, default=1e-3)
    parser.add_argument('--decay_epoch_step', type=int, default=50,
                        help="decay the learning rate for each n epochs")
    parser.add_argument('--random_seed', type=int, default=1000)
    parser.add_argument('--display_step', type=int, default=1)
    args = parser.parse_args()

    np.random.seed(args.random_seed)

    # MovieLens ml-1m statistics
    data_name = 'ml-1m'
    num_users = 6040
    num_items = 3952
    num_total_ratings = 1000209
    train_ratio = 0.9
    path = "./%s" % data_name + "/"

    train_r, train_mask_r, test_r, test_mask_r, user_train_set, item_train_set, \
        user_test_set, item_test_set = get_data(path, num_users, num_items,
                                                num_total_ratings, train_ratio)

    args.cuda = torch.cuda.is_available()
    rec = Autorec(args, num_users, num_items)
    if args.cuda:
        rec.cuda()

    optimer = optim.Adam(rec.parameters(), lr=args.base_lr, weight_decay=1e-4)
    num_batch = int(math.ceil(num_users / args.batch_size))

    # Each sample is (ratings row, mask row, ratings row) for one user
    torch_dataset = Data.TensorDataset(torch.from_numpy(train_r),
                                       torch.from_numpy(train_mask_r),
                                       torch.from_numpy(train_r))
    loader = Data.DataLoader(dataset=torch_dataset,
                             batch_size=args.batch_size,
                             shuffle=True)

    for epoch in range(args.train_epoch):
        train(epoch=epoch)
        tst(epoch=epoch)
```
data.py (the data-loading helper imported above):

```python
import numpy as np


def get_data(path, num_users, num_items, num_total_ratings, train_ratio):
    fp = open(path + "ratings1.dat")

    user_train_set = set()
    user_test_set = set()
    item_train_set = set()
    item_test_set = set()

    train_r = np.zeros((num_users, num_items))
    test_r = np.zeros((num_users, num_items))
    train_mask_r = np.zeros((num_users, num_items))
    test_mask_r = np.zeros((num_users, num_items))

    # Randomly split the rating records into train/test by line index
    random_perm_idx = np.random.permutation(num_total_ratings)
    train_idx = random_perm_idx[0:int(num_total_ratings * train_ratio)]
    test_idx = random_perm_idx[int(num_total_ratings * train_ratio):]

    lines = fp.readlines()

    ''' Train '''
    for itr in train_idx:
        line = lines[itr]
        user, item, rating, _ = line.split("::")
        user_idx = int(user) - 1
        item_idx = int(item) - 1
        train_r[user_idx, item_idx] = int(float(rating))
        train_mask_r[user_idx, item_idx] = 1
        user_train_set.add(user_idx)
        item_train_set.add(item_idx)

    ''' Test '''
    for itr in test_idx:
        line = lines[itr]
        user, item, rating, _ = line.split("::")
        user_idx = int(user) - 1
        item_idx = int(item) - 1
        test_r[user_idx, item_idx] = int(float(rating))
        test_mask_r[user_idx, item_idx] = 1
        user_test_set.add(user_idx)
        item_test_set.add(item_idx)

    return (train_r, train_mask_r, test_r, test_mask_r,
            user_train_set, item_train_set, user_test_set, item_test_set)
```
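Each line of the ml-1m ratings file has the documented form UserID::MovieID::Rating::Timestamp, with 1-based IDs. A quick check of the parsing logic used above, on a sample line in that format:

```python
# Sample record in the ml-1m "::"-delimited format: UserID::MovieID::Rating::Timestamp
line = "1::1193::5::978300760\n"

user, item, rating, _ = line.split("::")
user_idx = int(user) - 1    # IDs are 1-based in the file, 0-based in the matrix
item_idx = int(item) - 1
print(user_idx, item_idx, int(float(rating)))   # 0 1192 5
```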
6. Caveats
The original project lives at: https://github.com/NeWnIx5991/AutoRec-for-CF
Dataset: the classic MovieLens ml-1m dataset, available at https://grouplens.org/datasets/movielens/1m/
Note: after downloading the project, be sure to swap in the full dataset; the original repository ships with only a partial one, and all sorts of errors occur if you keep it. Also rename the test function (e.g. to tst as above), otherwise pytest reports a "fixture 'XXX' not found" error.
7. Experimental Results
E:\anaconda\envs\torch\python.exe E:/BaiduNetdiskDownload/RecommenderSystem/code/AutoRec-for-CF-master/autorec.py
epoch 0 train RMSE : 1.4532232262357314
epoch 0 test RMSE : 1.2679235205400494
epoch 1 train RMSE : 1.0296993451276202
epoch 1 test RMSE : 1.1995174287675012
epoch 2 train RMSE : 1.0229532861149255
epoch 2 test RMSE : 1.1508297274562664
epoch 3 train RMSE : 1.0085456859550435
epoch 3 test RMSE : 1.1550811536991492
epoch 4 train RMSE : 0.9963796179722617
epoch 4 test RMSE : 1.1347365766156279
epoch 5 train RMSE : 0.9846128982943366
epoch 5 test RMSE : 1.1130812983582472
epoch 6 train RMSE : 0.9695787398021294
epoch 6 test RMSE : 1.1072469562511127
epoch 7 train RMSE : 0.9563792287052298
epoch 7 test RMSE : 1.1077498493937814
epoch 8 train RMSE : 0.9427777515765096
epoch 8 test RMSE : 1.0879296747238765
epoch 9 train RMSE : 0.9304574301006198
epoch 9 test RMSE : 1.078709233246811
epoch 10 train RMSE : 0.9186056288838194
epoch 10 test RMSE : 1.06446998390852
epoch 11 train RMSE : 0.9075696469504302
epoch 11 test RMSE : 1.0598787686424866
epoch 12 train RMSE : 0.8959690322182777
epoch 12 test RMSE : 1.0418823906889079
epoch 13 train RMSE : 0.8839866079448584
epoch 13 test RMSE : 1.033346155325933
epoch 14 train RMSE : 0.8727465366014069
epoch 14 test RMSE : 1.0156245517767202
epoch 15 train RMSE : 0.8628805261661804
epoch 15 test RMSE : 1.0042657072865355
epoch 16 train RMSE : 0.8532305914977587
epoch 16 test RMSE : 0.9934019970003054
epoch 17 train RMSE : 0.8449145521866606
epoch 17 test RMSE : 0.9840979296495956
epoch 18 train RMSE : 0.8377069863203649
epoch 18 test RMSE : 0.9783103793522518
epoch 19 train RMSE : 0.8326468652230488
epoch 19 test RMSE : 0.965798921263207
epoch 20 train RMSE : 0.8283320803487217
epoch 20 test RMSE : 0.9585498168573225
epoch 21 train RMSE : 0.8247225146553356
epoch 21 test RMSE : 0.9574704538884139
epoch 22 train RMSE : 0.8215921855497417
epoch 22 test RMSE : 0.9473140392358408
epoch 23 train RMSE : 0.8191938955908276
epoch 23 test RMSE : 0.9466399959967757
epoch 24 train RMSE : 0.8181787190758416
epoch 24 test RMSE : 0.9405850874246285
epoch 25 train RMSE : 0.8178176910766458
epoch 25 test RMSE : 0.9356826406181583
epoch 26 train RMSE : 0.8182136802939787
epoch 26 test RMSE : 0.9352580186302085
epoch 27 train RMSE : 0.8195191591614224
epoch 27 test RMSE : 0.9291412130496122
epoch 28 train RMSE : 0.8209243967283013
epoch 28 test RMSE : 0.9273676325344586
epoch 29 train RMSE : 0.823027627859796
epoch 29 test RMSE : 0.9251557008175119
epoch 30 train RMSE : 0.8257307811947707
epoch 30 test RMSE : 0.923276715764178
epoch 31 train RMSE : 0.8288149883045541
epoch 31 test RMSE : 0.920683615515029
epoch 32 train RMSE : 0.8319576979904993
epoch 32 test RMSE : 0.9191865841138853
epoch 33 train RMSE : 0.8350872504684178
epoch 33 test RMSE : 0.9178032421569582
epoch 34 train RMSE : 0.8387002228342569
epoch 34 test RMSE : 0.9153994420939402
epoch 35 train RMSE : 0.8428106634581468
epoch 35 test RMSE : 0.9164203690056911
epoch 36 train RMSE : 0.8467025316069907
epoch 36 test RMSE : 0.9138069580398925
epoch 37 train RMSE : 0.8510923181880208
epoch 37 test RMSE : 0.9143302207364493
epoch 38 train RMSE : 0.855530632388373
epoch 38 test RMSE : 0.9114283550582871
epoch 39 train RMSE : 0.8602994364737621
epoch 39 test RMSE : 0.9140058818640407
epoch 40 train RMSE : 0.8646202327472946
epoch 40 test RMSE : 0.9133212823477805
epoch 41 train RMSE : 0.8683725500891368
epoch 41 test RMSE : 0.9127708712271082
epoch 42 train RMSE : 0.8727748969223958
epoch 42 test RMSE : 0.9119627894069224
epoch 43 train RMSE : 0.877316884493616
epoch 43 test RMSE : 0.9110502578671521
epoch 44 train RMSE : 0.8813463293203347
epoch 44 test RMSE : 0.9115397570906938
epoch 45 train RMSE : 0.885477096697663
epoch 45 test RMSE : 0.9107259516070817
epoch 46 train RMSE : 0.8894116935154472
epoch 46 test RMSE : 0.9112814549485093
epoch 47 train RMSE : 0.8935350517783681
epoch 47 test RMSE : 0.911027280684672
epoch 48 train RMSE : 0.8975502448757641
epoch 48 test RMSE : 0.9106882999041371
epoch 49 train RMSE : 0.9019011253887129
epoch 49 test RMSE : 0.9107605572095009
epoch 50 train RMSE : 0.9058899677559987
epoch 50 test RMSE : 0.911235940249753
epoch 51 train RMSE : 0.9097496391570119
epoch 51 test RMSE : 0.9114631910800207
epoch 52 train RMSE : 0.9140489843531912
epoch 52 test RMSE : 0.9131060849256843
epoch 53 train RMSE : 0.9179506673476494
epoch 53 test RMSE : 0.9120023583339368
epoch 54 train RMSE : 0.9216045842612224
epoch 54 test RMSE : 0.9118077945760321
epoch 55 train RMSE : 0.9246190807008362
epoch 55 test RMSE : 0.9133990180742992
epoch 56 train RMSE : 0.9278485195807951
epoch 56 test RMSE : 0.9130024025645888
epoch 57 train RMSE : 0.9309482200407767
epoch 57 test RMSE : 0.9136437269687981
epoch 58 train RMSE : 0.9340916725035929
epoch 58 test RMSE : 0.9145315075876512
epoch 59 train RMSE : 0.9370355097132806
epoch 59 test RMSE : 0.9142838326337236
epoch 60 train RMSE : 0.9403535140399197
epoch 60 test RMSE : 0.9152356416143697
epoch 61 train RMSE : 0.9435875984062677
epoch 61 test RMSE : 0.9161550451266373
epoch 62 train RMSE : 0.9468883629238184
epoch 62 test RMSE : 0.9155188499148691
epoch 63 train RMSE : 0.9501853335548086
epoch 63 test RMSE : 0.916305170339178
epoch 64 train RMSE : 0.9530729790504919
epoch 64 test RMSE : 0.916063645102569
epoch 65 train RMSE : 0.9558041265126143
epoch 65 test RMSE : 0.9171976134906477
epoch 66 train RMSE : 0.9583907994771408
epoch 66 test RMSE : 0.9170194413545861
epoch 67 train RMSE : 0.9613132379715423
epoch 67 test RMSE : 0.9180920392158843
epoch 68 train RMSE : 0.9640272702271127
epoch 68 test RMSE : 0.9183530186284395
epoch 69 train RMSE : 0.9665782181323211
epoch 69 test RMSE : 0.9192261819745662
epoch 70 train RMSE : 0.9692210616895247
epoch 70 test RMSE : 0.9187284504886061
epoch 71 train RMSE : 0.9715793110102142
epoch 71 test RMSE : 0.9191151591955541
epoch 72 train RMSE : 0.9742516682237872
epoch 72 test RMSE : 0.9195442625579975
epoch 73 train RMSE : 0.9765343878287162
epoch 73 test RMSE : 0.9205765018019466
epoch 74 train RMSE : 0.9787128984083284
epoch 74 test RMSE : 0.921274235187087
epoch 75 train RMSE : 0.981015068450287
epoch 75 test RMSE : 0.9218371509741228
epoch 76 train RMSE : 0.983066948620157
epoch 76 test RMSE : 0.9216484770956767
epoch 77 train RMSE : 0.9854644565681877
epoch 77 test RMSE : 0.9211555310301334
epoch 78 train RMSE : 0.9877888471285392
epoch 78 test RMSE : 0.9217264850595405
epoch 79 train RMSE : 0.990003747077749
epoch 79 test RMSE : 0.9226378543458116
epoch 80 train RMSE : 0.9921190223573366
epoch 80 test RMSE : 0.9233916791045184
epoch 81 train RMSE : 0.9940462245877457
epoch 81 test RMSE : 0.9234482672800399
epoch 82 train RMSE : 0.9960442855543167
epoch 82 test RMSE : 0.9244115951634168
epoch 83 train RMSE : 0.9980369203221262
epoch 83 test RMSE : 0.924170286837872
epoch 84 train RMSE : 1.0001366287907674
epoch 84 test RMSE : 0.9257114046845268
epoch 85 train RMSE : 1.002109659907066
epoch 85 test RMSE : 0.9253818544897614
epoch 86 train RMSE : 1.0037769726147
epoch 86 test RMSE : 0.9257810973256935
epoch 87 train RMSE : 1.0053872423878687
epoch 87 test RMSE : 0.9265887189727885
epoch 88 train RMSE : 1.0071591571352527
epoch 88 test RMSE : 0.926667153959291
epoch 89 train RMSE : 1.0087698782656849
epoch 89 test RMSE : 0.9264720868106042
epoch 90 train RMSE : 1.0102730956282084
epoch 90 test RMSE : 0.928398156378113
epoch 91 train RMSE : 1.0117523605366079
epoch 91 test RMSE : 0.9271410365309368
epoch 92 train RMSE : 1.0134238480712479
epoch 92 test RMSE : 0.9281471968303375
epoch 93 train RMSE : 1.015465350392488
epoch 93 test RMSE : 0.9280799542311516
epoch 94 train RMSE : 1.0173334158756722
epoch 94 test RMSE : 0.9287938740954761
epoch 95 train RMSE : 1.0194187492323359
epoch 95 test RMSE : 0.9295269929967344
epoch 96 train RMSE : 1.021440385215036
epoch 96 test RMSE : 0.929660255781569
epoch 97 train RMSE : 1.0229325170553767
epoch 97 test RMSE : 0.9295415721702701
epoch 98 train RMSE : 1.0245354836239178
epoch 98 test RMSE : 0.9293829956196558
epoch 99 train RMSE : 1.0260938936289916
epoch 99 test RMSE : 0.9312116639307958

Process finished with exit code 0
Train RMSE reaches its minimum of 0.8178176910766458 at around epoch 25, while test RMSE reaches its minimum of 0.9106882999041371 at epoch 48; beyond that point the model begins to overfit.
8. Summary
AutoRec uses a single-hidden-layer autoencoder to generalize over user or item ratings, which gives the model a degree of generalization and expressive power. Because its structure is so simple, however, its expressive capacity is limited.
From a deep-learning perspective, AutoRec opened the door to applying deep learning to recommendation problems and provided a blueprint for building more complex deep networks.
Reference: https://zhuanlan.zhihu.com/p/129891661