Paper

Translating Embeddings for Modeling Multi-relational Data

TransE

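Algorithm overview

Core idea

Entity vector + relation vector = entity vector (h + l = t): the relation vector l acts as a translation from the head entity h to the tail entity t.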
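
As a minimal sketch of this idea, the dissimilarity d(h + l, t) can be computed with NumPy. The function name transe_distance and the toy vectors are assumptions made for illustration; they come from neither the paper nor the reference code.

```python
import numpy as np

def transe_distance(h, l, t, order=1):
    """Dissimilarity d(h + l, t): the L1 (order=1) or L2 (order=2) norm of h + l - t."""
    return float(np.linalg.norm(h + l - t, ord=order))

# Toy 3-dimensional embeddings (made-up values).
h = np.array([0.1, 0.2, 0.3])
l = np.array([0.4, 0.0, -0.1])
t_good = h + l                      # satisfies h + l = t exactly
t_bad = np.array([1.0, 1.0, 1.0])   # an unrelated tail

d_good = transe_distance(h, l, t_good)
d_bad = transe_distance(h, l, t_bad)
```

A triple that holds should get a small distance (d_good), while an implausible one should get a larger distance (d_bad).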

Tips

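Entity vectors need to be L2-normalized during training; otherwise the loss can be reduced trivially by inflating the norms of the entity vectors. In the paper, the relation vector (l) is normalized only once, at initialization.
Positive sample: an original triple from the training set; this is the d(h + l, t) term in the formula.
Negative sample: a triple obtained by randomly replacing either h or t, but not both at once; this is the d(h' + l, t') term in the formula.
The distance d is either the L1 norm or the L2 norm.
Training uses stochastic gradient descent (SGD).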
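
The tips above can be sketched roughly as follows. The helper names corrupt and margin_loss, the (h, l, t) tuple order, the margin value and the toy entities are all assumptions made for illustration; the reference code further down organizes its data differently.

```python
import random
import numpy as np

def corrupt(triple, entities, rng):
    """Build a negative sample: replace either the head or the tail, never both."""
    h, l, t = triple
    if rng.random() < 0.5:
        h = rng.choice([e for e in entities if e != h])
    else:
        t = rng.choice([e for e in entities if e != t])
    return (h, l, t)

def margin_loss(pos, neg, ent, rel, margin=1.0, order=1):
    """[margin + d(h + l, t) - d(h' + l, t')]_+ for one (positive, negative) pair."""
    def d(triple):
        h, l, t = triple
        return float(np.linalg.norm(ent[h] + rel[l] - ent[t], ord=order))
    return max(0.0, margin + d(pos) - d(neg))

# Toy data: random embeddings, entity vectors scaled to unit L2 norm.
rng = random.Random(0)
np_rng = np.random.default_rng(0)
entities = ["paris", "france", "tokyo", "japan"]
ent = {}
for e in entities:
    v = np_rng.normal(size=4)
    ent[e] = v / np.linalg.norm(v)
rel = {"capital_of": np_rng.normal(size=4)}

pos = ("paris", "capital_of", "france")
neg = corrupt(pos, entities, rng)
loss = margin_loss(pos, neg, ent, rel)
```

An SGD step would then move the embeddings along the negative gradient of this loss whenever it is positive, and re-normalize the entity vectors afterwards.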

Reference code

https://github.com/wuxiyu/transE/blob/master/tranE.py
Key code snippet

from copy import deepcopy
from numpy import where

# update() is a method of the model class in the reference file;
# distanceL1, distanceL2 and norm are helper functions defined there.
def update(self, Tbatch):
    copyEntityList = deepcopy(self.entityList)
    copyRelationList = deepcopy(self.relationList)

    for tripletWithCorruptedTriplet in Tbatch:
        # tripletWithCorruptedTriplet is a tuple of (original triplet, corrupted triplet);
        # each triplet is indexed as [0] head, [1] tail, [2] relation.
        headEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVector = copyEntityList[tripletWithCorruptedTriplet[0][1]]
        relationVector = copyRelationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTriplet = copyEntityList[tripletWithCorruptedTriplet[1][1]]

        # The same vectors as they were before this batch; gradients are computed from these.
        headEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][0]]
        tailEntityVectorBeforeBatch = self.entityList[tripletWithCorruptedTriplet[0][1]]
        relationVectorBeforeBatch = self.relationList[tripletWithCorruptedTriplet[0][2]]
        headEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][0]]
        tailEntityVectorWithCorruptedTripletBeforeBatch = self.entityList[tripletWithCorruptedTriplet[1][1]]

        if self.L1:
            distTriplet = distanceL1(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL1(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)
        else:
            distTriplet = distanceL2(headEntityVectorBeforeBatch, tailEntityVectorBeforeBatch, relationVectorBeforeBatch)
            distCorruptedTriplet = distanceL2(headEntityVectorWithCorruptedTripletBeforeBatch, tailEntityVectorWithCorruptedTripletBeforeBatch, relationVectorBeforeBatch)

        eg = self.margin + distTriplet - distCorruptedTriplet
        if eg > 0:  # [x]+ keeps only the positive part (hinge)
            self.loss += eg
            if self.L1:
                # The gradient of the L1 distance is the elementwise sign of (t - h - r).
                # where() replaces the original per-dimension loop (>= 0 maps to 1, otherwise -1).
                # The step is scaled by the learning rate, which the original L1 branch dropped.
                tempPositive = self.learingRate * where(tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch >= 0, 1, -1)
                tempNegtative = self.learingRate * where(tailEntityVectorWithCorruptedTripletBeforeBatch - headEntityVectorWithCorruptedTripletBeforeBatch - relationVectorBeforeBatch >= 0, 1, -1)
            else:
                tempPositive = 2 * self.learingRate * (tailEntityVectorBeforeBatch - headEntityVectorBeforeBatch - relationVectorBeforeBatch)
                tempNegtative = 2 * self.learingRate * (tailEntityVectorWithCorruptedTripletBeforeBatch - headEntityVectorWithCorruptedTripletBeforeBatch - relationVectorBeforeBatch)

            headEntityVector = headEntityVector + tempPositive
            tailEntityVector = tailEntityVector - tempPositive
            relationVector = relationVector + tempPositive - tempNegtative
            headEntityVectorWithCorruptedTriplet = headEntityVectorWithCorruptedTriplet - tempNegtative
            tailEntityVectorWithCorruptedTriplet = tailEntityVectorWithCorruptedTriplet + tempNegtative

            # Normalize only the vectors that were just updated,
            # rather than all of them at once as in the original paper.
            copyEntityList[tripletWithCorruptedTriplet[0][0]] = norm(headEntityVector)
            copyEntityList[tripletWithCorruptedTriplet[0][1]] = norm(tailEntityVector)
            copyRelationList[tripletWithCorruptedTriplet[0][2]] = norm(relationVector)
            copyEntityList[tripletWithCorruptedTriplet[1][0]] = norm(headEntityVectorWithCorruptedTriplet)
            copyEntityList[tripletWithCorruptedTriplet[1][1]] = norm(tailEntityVectorWithCorruptedTriplet)

    self.entityList = copyEntityList
    self.relationList = copyRelationList

Evaluation metrics

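Reposted from https://blog.csdn.net/hello_acm/article/details/95070669?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.channel_param
1. Mean rank
For each testing triple, taking tail-entity prediction as the example, replace t in (h, r, t) with every entity in the knowledge graph and compute a score for each candidate with the function f_r(h, t). Then sort the resulting scores in ascending order.
Lower values of f are better, so the earlier a candidate appears in this ordering, the better.
The key step: for each testing triple, find the position of the correct answer (the true t) in that ordering, for example t1 ranks 100th, t2 ranks 200th, t3 ranks 60th, and so on. Averaging these ranks gives the Mean rank.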
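
A minimal sketch of this ranking procedure, assuming the score f_r(h, t) is the TransE distance and that embeddings are stored in plain dictionaries. The names tail_rank and mean_rank and the toy 2-dimensional embeddings are hypothetical.

```python
import numpy as np

def tail_rank(h, r, t, ent, rel, order=1):
    """1-based rank of the true tail t among all candidate tails (lower score first)."""
    names = list(ent)
    scores = np.array(
        [np.linalg.norm(ent[h] + rel[r] - ent[c], ord=order) for c in names]
    )
    true_score = scores[names.index(t)]
    # Rank = 1 + number of candidates that score strictly better than the true tail.
    return int(np.sum(scores < true_score)) + 1

def mean_rank(ranks):
    """Average of the ranks of the correct answers."""
    return sum(ranks) / len(ranks)

# Toy embeddings (made-up values).
ent = {"a": np.array([0.0, 0.0]), "b": np.array([1.0, 0.0]), "c": np.array([3.0, 0.0])}
rel = {"r": np.array([1.0, 0.0])}

rank_b = tail_rank("a", "r", "b", ent, rel)   # a + r lands exactly on b
rank_c = tail_rank("a", "r", "c", ent, rel)   # c is the farthest candidate
example = mean_rank([100, 200, 60])           # the ranks used in the text above
```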

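2. MRR (mean reciprocal rank) is a widely used metric for evaluating search algorithms: if the first result is the match, the score is 1; if the second result is the match, the score is 0.5; if the n-th result is the match, the score is 1/n; if nothing matches, the score is 0. The final score is the average of these scores over all queries. For link prediction, n is the rank of the correct entity for each testing triple.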
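
As a small sketch, MRR can be computed from a list of ranks like this. The function name mean_reciprocal_rank and the convention of using None for "no match" are assumptions made for illustration.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; a rank of None (no match) contributes 0."""
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)

# Three queries whose correct answers rank 1st, 2nd and 4th: (1 + 0.5 + 0.25) / 3.
mrr = mean_reciprocal_rank([1, 2, 4])
```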

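3. Hits@10
Sort the candidates by their f values as described above, then check for each testing triple whether the correct answer is among the first ten. If it is, add 1 to a counter. Hits@10 is the number of triples whose correct answer ranks in the top ten, divided by the total number of testing triples.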
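
A minimal sketch of this count, given the rank of the correct answer for each testing triple. The function name hits_at_k and the sample ranks are made up for illustration.

```python
def hits_at_k(ranks, k=10):
    """Fraction of testing triples whose correct answer ranks within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Of these five ranks, only 3 and 10 fall within the top ten.
hits = hits_at_k([100, 200, 60, 3, 10])
```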

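Issues

When constructing negative samples, a 1-to-N (one-to-many) relation can produce wrong negatives: randomly replacing the tail may yield a triple that is actually true.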
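
One possible way to reduce this problem, sketched here under assumptions, is to reject corrupted triples that already appear in the set of known true triples. This filtering step is not part of the reference code above; the function name corrupt_filtered, the max_tries parameter and the toy triples are hypothetical.

```python
import random

def corrupt_filtered(triple, entities, known, rng, max_tries=100):
    """Corrupt the head or the tail, skipping candidates that are known true triples."""
    h, l, t = triple
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, l, t) if rng.random() < 0.5 else (h, l, e)
        if candidate not in known:
            return candidate
    return None  # no valid negative found within max_tries

# A 1-to-N relation: one head with two true tails.
known = {("obama", "child", "malia"), ("obama", "child", "sasha")}
entities = ["obama", "malia", "sasha", "paris"]

neg = corrupt_filtered(("obama", "child", "malia"), entities, known, random.Random(0))
```

Without the filter, replacing the tail "malia" with "sasha" would be treated as a negative sample even though that triple is true.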

