Personalized Ranking Metric Embedding for Next New POI Recommendation

Introduction:

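This paper uses metric embedding to map each POI into a low-dimensional Euclidean space and models POI-to-POI transitions as a Markov chain, with the Euclidean distance between two POIs measuring the strength of their sequential relation. It then introduces a pairwise-ranking variant of metric embedding that can rank candidate POIs in the latent space, and finally proposes the Personalized Ranking Metric Embedding (PRME) algorithm, which jointly considers sequential information and individual preference. Because people tend to visit POIs close to their current location, the model is further extended with a spatial factor, giving the PRME-G model.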

How the method works:

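The paper uses two datasets: Foursquare check-ins within Singapore and Gowalla check-ins within California and Nevada. Before use, the data are preprocessed by removing users who have visited fewer than 10 POIs and POIs that have been visited by fewer than 10 users. Statistics on the data lead to the following three observations: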

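Users tend to explore new POIs.
Temporal locality: the time interval between a user's visits to two consecutive POIs is usually short.
Spatial locality: two POIs that a user visits consecutively are usually not far apart.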
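The preprocessing step described before the list can be sketched as follows. This is a minimal illustration rather than the paper's actual script: the function name `filter_checkins`, the (user, poi) pair format, and the choice of a single filtering pass are all assumptions, since the post does not say whether the filter is applied repeatedly until stable.

```python
from collections import defaultdict


def filter_checkins(checkins, min_pois=10, min_users=10):
    """Keep check-ins whose user visited >= min_pois distinct POIs and whose
    POI was visited by >= min_users distinct users (single pass, assumed).

    checkins: list of (user, poi) pairs (hypothetical input format).
    """
    pois_of_user = defaultdict(set)
    users_of_poi = defaultdict(set)
    for user, poi in checkins:
        pois_of_user[user].add(poi)
        users_of_poi[poi].add(user)
    return [
        (user, poi)
        for user, poi in checkins
        if len(pois_of_user[user]) >= min_pois
        and len(users_of_poi[poi]) >= min_users
    ]
```

With the default thresholds of 10 this mirrors the rule stated above; smaller thresholds are convenient for trying it on toy data.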

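When two check-ins happen within a short time, it is reasonable to assume a Markov property: the next POI is strongly influenced by the current one. Based on this short-term Markov property and users' tendency to explore new POIs, the recommendation problem addressed by the paper can be defined as follows: given a user u and the user's current location l, recommend a new POI chosen from those u has not visited before. If the task were simply to recommend the next POI, recommending the POIs that u visits most often could already achieve fairly high accuracy. That approach does not apply when the recommended POI must be new: transition probabilities have to be inferred from much sparser historical data, so next new POI recommendation is harder than next POI recommendation.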

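We first describe the metric embedding algorithm with pairwise ranking, which models transitions between locations. Metric embedding is well suited to sparse data and to unobserved data. Each real-world POI is represented by a point in a K-dimensional latent space, and the Euclidean distance between two POIs in that space represents the probability of a transition between them: the smaller the distance, the larger the probability. With all POIs embedded in the latent space, the model can estimate transition probabilities and can also assign meaningful probabilities to transitions that were never observed. Each POI's position is a K-dimensional vector, and the task is to infer these vectors from the visit sequences. The transition probability from the current POI to a candidate is proportional to the exponential of the negative squared distance between the two, normalized over all POIs.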
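The formula itself did not survive in this post. A form consistent with the description, and with the squared distance computed by Edis in the code further down, would be the following, where X(l) denotes the K-dimensional vector of POI l and L is the set of all POIs (both symbols are notation assumed here):

```latex
\hat{P}(l_j \mid l_i) = \frac{\exp\left(-\lVert X(l_j) - X(l_i) \rVert^2\right)}{Z(l_i)},
\qquad
Z(l_i) = \sum_{n=1}^{|L|} \exp\left(-\lVert X(l_n) - X(l_i) \rVert^2\right)
```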

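The expression above can only describe POI transitions that have been observed, and the observed data are very sparse. To make the learned vectors reflect the actual transition probabilities, unobserved data must be exploited as well. We assume that an observed next POI is more strongly related to the current POI than an unobserved one, so observed POIs should rank higher than unobserved POIs. This ordering is the basis for learning the ranking.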

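The goal of POI recommendation is to rank all POIs and recommend the top-ranked one. This lets us simplify the probabilistic formulation: given the current POI, the normalization term is the same for every candidate, so ranking by probability reduces to ranking by distance. An observed next POI should rank above an unobserved one exactly when its squared distance to the current POI is smaller.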
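A small numerical check of this simplification is sketched below. It assumes the softmax-over-negative-squared-distance form of the transition probability; the function name `transition_probs` and the random toy embeddings are illustrative only.

```python
import numpy as np


def transition_probs(X, current):
    """Return squared distances from POI `current` to every POI, and the
    corresponding normalized transition probabilities."""
    sq_dist = np.sum((X - X[current]) ** 2, axis=1)
    weights = np.exp(-sq_dist)
    return sq_dist, weights / weights.sum()


rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(6, 3))  # 6 hypothetical POIs in a 3-dimensional space
sq_dist, probs = transition_probs(X, current=0)

# Ascending distance and descending probability give the same ordering,
# so the normalization term never has to be computed for ranking.
same_order = np.array_equal(np.argsort(sq_dist), np.argsort(-probs))
```

Dropping the normalizer is what makes ranking cheap: only distances to the candidates are needed.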

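Next comes the personalized ranking metric embedding algorithm. The next POI depends not only on the current location but also on the user's preference. We introduce a second latent space in which both users and POIs are embedded. The Euclidean distance between user u and location l in this space represents how much u likes l: the smaller the distance, the stronger the preference and the more likely the visit. Combining sequential information and personal preference, the score for user u choosing l as the next POI is a weighted sum of the preference distance and the sequential distance, with a weight between 0 and 1 (the code below uses 0.2 for preference and 0.8 for sequence).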
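Written out, under the assumption that both component distances are squared Euclidean distances in their respective spaces (the superscripts P and S and the symbol α are notation chosen here to match the two dictionaries and the 0.2/0.8 weights in the code), the combined score would look like:

```latex
D_{u,l^c,l} = \alpha\, D^{P}_{u,l} + (1 - \alpha)\, D^{S}_{l^c,l},
\qquad
D^{P}_{u,l} = \lVert X^{P}(u) - X^{P}(l) \rVert^2,
\quad
D^{S}_{l^c,l} = \lVert X^{S}(l^c) - X^{S}(l) \rVert^2
```

A smaller D means the candidate l is ranked higher for user u at current location l^c.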

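As noted earlier, the Markov property only shows up when two visits are close in time. When the gap between the current visit and the next one exceeds a threshold (6 hours in the code below), the sequential information is ignored and only the user's preference is used.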
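The thresholded score can be sketched as a small function. The name `prme_distance`, its argument names, and the defaults `alpha=0.2` and `tau=6.0` are assumptions taken from the constants that appear in the code listing of this post, not from the paper.

```python
import numpy as np


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(diff @ diff)


def prme_distance(user_p, cand_p, cur_s, cand_s, gap_hours, alpha=0.2, tau=6.0):
    """Combined distance: preference only when the time gap reaches tau,
    otherwise a weighted sum of preference and sequential distances."""
    pref = sq_dist(user_p, cand_p)
    if gap_hours >= tau:
        return pref
    return alpha * pref + (1 - alpha) * sq_dist(cur_s, cand_s)
```

For example, with a preference distance of 1 and a sequential distance of 4, a one-hour gap yields 0.2 * 1 + 0.8 * 4, while a ten-hour gap yields the preference distance alone.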

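Finally, the geographic factor is added to the model. A geographic coefficient w is computed from the distance between the current POI and the candidate next POI: the closer the two locations, the smaller w and the more likely the visit. As before, when the time gap is too large, the current POI is assumed not to influence the next visit, so the geographic factor is dropped as well. In the code below, w = (1 + d)^0.25, where d is the haversine distance in kilometers, and w multiplies the combined distance.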
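Putting the pieces together, a form of the PRME-G score consistent with the code in this post would be the following. The symbols τ (time threshold), Δ (time gap between the two visits) and d (geographic distance) are notation assumed here, and the exponent 0.25 is the value used in the code:

```latex
w_{l^c,l} = \left(1 + d_{l^c,l}\right)^{0.25},
\qquad
D^{G}_{u,l^c,l} =
\begin{cases}
D^{P}_{u,l} & \text{if } \Delta(l, l^c) \ge \tau \\
w_{l^c,l}\left(\alpha\, D^{P}_{u,l} + (1 - \alpha)\, D^{S}_{l^c,l}\right) & \text{otherwise}
\end{cases}
```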

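The optimization criterion follows Bayesian Personalized Ranking (BPR): the parameters are estimated by maximizing the posterior probability, the conditional probability of each pairwise ordering is modeled with the logistic function, and a Gaussian prior on the parameters contributes a regularization term that prevents overfitting.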
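The objective is not reproduced in this post. In the usual BPR form it would read as below, where Θ collects all user and POI vectors, l_i is an observed next POI, l_j an unobserved one, and λ is the regularization weight (notation assumed here):

```latex
\Theta^{*} = \arg\max_{\Theta} \sum_{(u,\, l^c,\, l_i,\, l_j)}
\log \sigma\left(D_{u,l^c,l_j} - D_{u,l^c,l_i}\right) - \lambda \lVert \Theta \rVert^2,
\qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```

Maximizing this pushes the distance of the observed POI below that of the unobserved one for every sampled tuple.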

Algorithm implementation:

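Running gradient descent directly on the full objective is computationally expensive, so the ranking principle described earlier is applied by sampling. For a user u, current location lc, and observed next location li, randomly pick a location lj that has not been observed as a next visit. The probability that u, at lc, visits the observed li should be greater than the probability of visiting the unobserved lj, so the quantity to minimize becomes z, the distance score of li minus the distance score of lj.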

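When z is at its minimum, its first term (the distance for li) is as small as possible, which means a high probability, and its second term (the distance for lj) is as large as possible, which means a low probability. This is the desired behavior, so the parameters are updated by stochastic gradient steps on this objective together with the regularization term.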
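The update rule is missing from this post. A reconstruction that matches what the code computes is given below, using z' for the negated difference (the variable the code calls z), γ for the learning rate and λ for the regularization constant. The symbols are assumed, and constant factors of the regularizer are folded into λ as the code does:

```latex
z' = D_{u,l^c,l_j} - D_{u,l^c,l_i} = -z,
\qquad
\frac{\partial}{\partial z'} \log \sigma(z') = 1 - \sigma(z'),
\qquad
\Theta \leftarrow \Theta + \gamma \left( \left(1 - \sigma(z')\right) \frac{\partial z'}{\partial \Theta} - \lambda\, \Theta \right)
```

For the plain PRME score, for instance, the derivative with respect to the user vector is

```latex
\frac{\partial z'}{\partial X^{P}(u)} = 2\alpha \left( X^{P}(l_i) - X^{P}(l_j) \right)
```

which moves the user toward the observed POI and away from the sampled unobserved one.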

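In the implementation, we first build the data tuples, each consisting of the user, the current location, and the observed next location, and randomly choose an unobserved location for each. The vectors of users and POIs are then randomly initialized from a zero-mean normal distribution N(0, 0.01) (the code passes 0.01 as the standard deviation to np.random.normal), in two latent spaces: one for sequential relations and one for user preference. The parameters are then updated with the rule above until convergence, that is, until the loss stops decreasing (the code simply runs a fixed 500 passes), and the learned coordinates of users and POIs in both spaces are returned.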

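At test time, to predict which POI user u will visit next, D is computed for every POI the user has not yet visited, using the coordinates learned in the two spaces. The candidates are sorted by D, and the POI with the smallest D is recommended.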
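The ranking step can be sketched in isolation as follows. The function name `recommend`, the dict-of-scores input, and the top-k generalization are assumptions for illustration; the code in this post only takes the single best candidate.

```python
def recommend(scores, visited, k=1):
    """Return the k unvisited POIs with the smallest distance score D.

    scores:  dict mapping POI id -> D (smaller means more likely).
    visited: set of POI ids the user has already visited.
    """
    candidates = [(d, poi) for poi, d in scores.items() if poi not in visited]
    return [poi for d, poi in sorted(candidates)[:k]]
```

With k=1 this is the same selection as taking the minimum over the unvisited candidates.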

The datasets can be downloaded from http://www.ntu.edu.sg/home/gaocong/data/poidata.zip. The code is as follows:

import numpy as np
from math import radians, cos, sin, asin, sqrt, pow

DIM = 60        # embedding dimension K
ALPHA = 0.2     # weight of the preference distance; 1 - ALPHA weights the sequential distance
TAU = 6         # time threshold in hours
LR = 0.005      # learning rate
REG = 0.006     # regularization coefficient
EPOCHS = 500    # number of passes over the training tuples


def readLines(fileName):
    # Return the stripped lines of a text file.
    with open(fileName, 'r') as fr:
        return [line.strip() for line in fr]


def getUser():
    return readLines("user.txt")


def getShop():
    return readLines("shop.txt")


def parseTime(lineArr):
    # Timestamp in hours: day index * 24 + hour + minute / 60.
    hour, minute = lineArr[3].split(':')[:2]
    return float(lineArr[4]) * 24 + float(hour) + float(minute) / 60.0


def getTrainTuple(fileName):
    # Build [user, previous POI, next POI, previous time, next time] tuples and,
    # for each user and POI, the list of POIs observed immediately after it.
    data = []
    observedPOI = {}
    exUser = exShop = exTime = ''
    for line in readLines(fileName):
        lineArr = line.split('\t')
        user, shop, time = lineArr[0], lineArr[1], parseTime(lineArr)
        if user == exUser:
            data.append([user, exShop, shop, exTime, time])
            observedPOI.setdefault(user, {}).setdefault(exShop, []).append(shop)
        else:
            exUser = user
        exShop = shop
        exTime = time
    return data, observedPOI


def getTestTuple(fileName):
    # Same tuples as getTrainTuple, without the observed-transition index.
    return getTrainTuple(fileName)[0]


def initVec():
    # userP and shopP live in the preference space, shopS in the sequential space.
    shop = getShop()
    userP = {u: np.random.normal(0, 0.01, DIM) for u in getUser()}
    shopP = {s: np.random.normal(0, 0.01, DIM) for s in shop}
    shopS = {s: np.random.normal(0, 0.01, DIM) for s in shop}
    return userP, shopP, shopS


def loadFileWithDic(fileName):
    # Read vectors written by saveVec. str(ndarray) wraps long vectors over
    # several lines, so values are accumulated until DIM of them have been read.
    data = {}
    key, arr = '', []
    for line in readLines(fileName):
        if not arr:
            key, line = line.split('\t', 1)
        arr.extend(float(x) for x in line.replace('[', ' ').replace(']', ' ').split())
        if len(arr) == DIM:
            data[key] = np.array(arr)
            arr = []
    return data


def saveVec(fileName, vec):
    with open(fileName, 'w') as fw:
        for key in vec:
            fw.write(str(key) + '\t' + str(vec[key]) + '\n')


def getVisited(fileName):
    # Map each user to the list of distinct POIs visited in the given file.
    visited = {}
    for line in readLines(fileName):
        user, shop = line.split('\t')[:2]
        if shop not in visited.setdefault(user, []):
            visited[user].append(shop)
    return visited


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(float(-x)))


def Edis(a, b):
    # Squared Euclidean distance between two vectors.
    diff = a - b
    return float(np.dot(diff, diff))


def haversine(lon1, lat1, lon2, lat2):
    # Great-circle distance in kilometers between two (longitude, latitude) points.
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    c = 2 * asin(sqrt(a))
    r = 6371  # mean Earth radius in km
    return c * r


def geoWeight(position, a, b):
    # Geographic coefficient w = (1 + d)^0.25, d being the distance between POIs a and b.
    d = haversine(position[a]['lon'], position[a]['lat'],
                  position[b]['lon'], position[b]['lat'])
    return pow(1 + d, 0.25)


def getPosition():
    # Collect the "lat,lon" coordinates of every POI from the Foursquare files.
    fileList = ['New/FourSquare/train.txt', 'New/FourSquare/test.txt', 'New/FourSquare/tune.txt']
    position = {}
    for fileName in fileList:
        for line in readLines(fileName):
            lineArr = line.split('\t')
            shop = lineArr[1]
            if shop not in position:
                lat, lon = lineArr[2].split(',')[:2]
                position[shop] = {'lat': float(lat), 'lon': float(lon)}
    return position


def sampleUnobserved(shop, observedPOI, user, exShop):
    # Draw a random POI that was never observed right after exShop for this user.
    shopJ = shop[np.random.randint(len(shop))]
    while shopJ == exShop or shopJ in observedPOI[user][exShop]:
        shopJ = shop[np.random.randint(len(shop))]
    return shopJ


def fit(position=None):
    # Shared SGD loop: PRME when position is None, PRME-G otherwise.
    userP, shopP, shopS = initVec()
    data, observedPOI = getTrainTuple('train.txt')
    shop = getShop()
    for i in range(EPOCHS):
        for user, exShop, Cshop, exTime, time in data:
            shopJ = sampleUnobserved(shop, observedPOI, user, exShop)
            u, pi, pj = userP[user], shopP[Cshop], shopP[shopJ]
            sc, si, sj = shopS[exShop], shopS[Cshop], shopS[shopJ]
            seq = time - exTime < TAU   # use sequential information only for short gaps
            a = ALPHA if seq else 1.0
            w1 = w2 = 1.0
            if seq and position is not None:
                w1 = geoWeight(position, exShop, Cshop)
                w2 = geoWeight(position, exShop, shopJ)
            # z: distance of the unobserved POI minus distance of the observed POI.
            z = a * (w2 * Edis(u, pj) - w1 * Edis(u, pi))
            if seq:
                z += (1 - a) * (w2 * Edis(sc, sj) - w1 * Edis(sc, si))
            d = 1 - sigmoid(z)          # derivative of log(sigmoid(z)) with respect to z
            # Gradients of z, all computed before any vector is modified.
            grads = [(u, 2 * a * (w2 * (u - pj) - w1 * (u - pi))),
                     (pi, 2 * a * w1 * (u - pi)),
                     (pj, 2 * a * w2 * (pj - u))]
            if seq:
                b = 2 * (1 - a)
                grads += [(sc, b * (w2 * (sc - sj) - w1 * (sc - si))),
                          (si, b * w1 * (sc - si)),
                          (sj, b * w2 * (sj - sc))]
            for vec, g in grads:
                vec += LR * (d * g - REG * vec)   # in-place gradient ascent step
        print("Pass " + str(i + 1) + " is done!")
    return userP, shopP, shopS


def train():
    userP, shopP, shopS = fit()
    saveVec('userP1000.txt', userP)
    saveVec('shopP1000.txt', shopP)
    saveVec('shopS1000.txt', shopS)
    return userP, shopP, shopS


def trainG():
    userP, shopP, shopS = fit(getPosition())
    saveVec('userP.txt', userP)
    saveVec('shopP.txt', shopP)
    saveVec('shopS.txt', shopS)
    return userP, shopP, shopS


def evaluate(userP, shopP, shopS, position=None):
    # Top-1 accuracy of next new POI prediction over the test tuples.
    data = getTestTuple("test.txt")
    visited = getVisited("train.txt")
    user = set(getUser())
    shop = getShop()
    shopSet = set(shop)
    allNum = corNum = 0
    for Cuser, exShop, Cshop, exTime, time in data:
        if Cuser not in user or exShop not in shopSet or Cshop not in shopSet:
            continue
        seen = visited.setdefault(Cuser, [])
        if Cshop in seen or Cshop == exShop:
            continue                    # only new POIs count as test cases
        allNum += 1
        if exShop not in seen:
            seen.append(exShop)
        poss = {}
        for pShop in shop:
            if pShop in seen or pShop == exShop:
                continue
            pref = Edis(userP[Cuser], shopP[pShop])
            if time - exTime < TAU:
                w = geoWeight(position, exShop, pShop) if position is not None else 1.0
                poss[pShop] = w * (ALPHA * pref + (1 - ALPHA) * Edis(shopS[exShop], shopS[pShop]))
            else:
                poss[pShop] = pref
        ans = min(poss.items(), key=lambda x: x[1])[0]
        if ans == Cshop:
            corNum += 1
        print(str(corNum) + " : " + str(allNum))
    if allNum:
        print("The correct rate is " + str(100.0 * corNum / allNum) + "%.")


def test():
    # To reuse saved vectors instead of retraining, load them with
    # loadFileWithDic('userP1000.txt') etc. and pass them to evaluate().
    evaluate(*train())


def testG():
    evaluate(*trainG(), position=getPosition())


if __name__ == '__main__':
    test()      # call testG() instead to run PRME-G

If you find any problems in the code, corrections are welcome.
