PCA modelling with sklearn

Note: All the code for the below can be found here.

Previously I wrote an article on how we can use graph networks to help provide Champion recommendations in the game League of Legends (LoL). The technique is known as “User-user collaborative filtering”, where we utilise the information we know about a person to find similar users and then base our recommendation on what we know they like.

To help illustrate this, we’ll use the classic Amazon example. Imagine that you have added a PS4 and the latest FIFA game to your Amazon basket. The algorithm looks at all users who have previously bought a PS4 and FIFA together and finds which other items they tend to have in their basket, e.g. the latest NFL game, Madden, which is then recommended to you.

Today, we’re looking at a different form of recommendation algorithm known as a “Content Based Model”. This technique instead connects items together based on their similarities, e.g. if you’re buying a PS4 sports game produced by EA then here are some other PS4 sports games produced by EA. It is a good fit when you have no information about user preferences, such as when a product has just launched.

However, there are almost 150 LoL Champions and we don’t want to spend all our time labeling them with all the various attributes we would need to make this work. So instead, what we are going to do is “describe” the Champions using their in-game statistics, such as their average kills per game or how much objective damage they do.

To do this, we can analyse 150,000 Diamond games. Note that I’ve limited this to Top, Middle and ADC players only, because the statistics of supports and junglers are inherently different (e.g. low gold from minions).
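
As a rough sketch of this preparation step, the per-game rows can be filtered to the farming lanes and then averaged per Champion. The table below, its column names and its role labels are all invented for illustration; the article does not show the real dataset's layout.

```python
import pandas as pd

# Hypothetical per-game rows (values and column names are assumptions).
games = pd.DataFrame({
    "champion": ["Fiora", "Fiora", "Katarina", "Katarina", "Sion"],
    "role": ["TOP", "TOP", "MIDDLE", "MIDDLE", "SUPPORT"],
    "kills": [8, 6, 11, 9, 1],
    "goldEarned": [14000, 12000, 13000, 12500, 7000],
})

# Keep only the farming lanes, then average each statistic per champion.
farming_lanes = {"TOP", "MIDDLE", "ADC"}
lane_games = games[games["role"].isin(farming_lanes)]
champion_stats = lane_games.groupby("champion")[["kills", "goldEarned"]].mean()

print(champion_stats)
```

Each row of `champion_stats` is then one Champion's "description", built purely from in-game numbers.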

After averaging the data for all Champions, the first thing to note is that there are some very clear correlations between many of the statistics. It shouldn’t be a surprise that attributes such as “killingSprees” and “kills” are almost perfectly correlated (the former is how many times a player has been on a killing spree, the latter is how many kills they got in total that game).

Graph illustrating the multicollinearity issue that occurs with such a large number of attributes.
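
A quick way to spot this kind of multicollinearity is a pairwise correlation matrix. The sketch below uses synthetic numbers, not the article's data: `killingSprees` is deliberately generated from `kills` so that the two columns are strongly correlated, while `damageSelfMitigated` is generated independently.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic averaged stats for 150 hypothetical champions.
kills = rng.normal(6.0, 2.0, size=150)
stats = pd.DataFrame({
    "kills": kills,
    # Built from kills plus a little noise, so it is nearly redundant.
    "killingSprees": 0.25 * kills + rng.normal(0.0, 0.1, size=150),
    # Generated independently of the other two columns.
    "damageSelfMitigated": rng.normal(20000.0, 5000.0, size=150),
})

# Pearson correlation between every pair of columns.
corr = stats.corr()
print(corr.round(2))
```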

A common approach to deal with this level of multicollinearity is either exclusion (pick kills, delete killingSprees) or aggregation (kills * killingSprees). However, there is a better solution known as Principal Component Analysis (PCA), which can extract the core relationship between these attributes without manual intervention or the removal of potential key drivers.

PCA is a fairly complex subject that requires an understanding of eigenvectors and eigenvalues, and there are plenty of great articles on it, so I won’t labour the point here. Instead, I will say that PCA tries to capture as much of the variance in the data as possible whilst minimising the number of variables used.

The percentage of the original data’s variance explained by each component; the percentages sum to 100%.

After fitting PCA to the dataset, we find that well over 30% of the variance in the data is captured by a single component, just over 16% by the second component, 11% or so by the third, and so on.
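
For readers who want to reproduce this kind of breakdown, here is a minimal sketch using scikit-learn's `PCA`. The data is synthetic (two invented "play-style" factors driving five statistics), and standardising the columns first is my assumption; the article does not say how the real data was scaled.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_champions = 150

# Two hidden factors (invented) that drive the observed statistics.
carry = rng.normal(size=n_champions)
tank = rng.normal(size=n_champions)

def noisy(factor):
    """Return the factor plus a small amount of independent noise."""
    return factor + 0.2 * rng.normal(size=factor.shape)

# Three columns follow the "carry" factor, two follow the "tank" factor.
X = np.column_stack([noisy(carry), noisy(carry), noisy(carry),
                     noisy(tank), noisy(tank)])

# Standardise so no statistic dominates purely because of its units.
pca = PCA().fit(StandardScaler().fit_transform(X))
ratios = pca.explained_variance_ratio_

for i, ratio in enumerate(ratios, start=1):
    print(f"component {i}: {ratio:.1%} of the variance")
```

Because all components are kept, the ratios sum to 100%, and scikit-learn returns them sorted from largest to smallest.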

But what are these components? To help understand what they are made of and where they have come from, take a look at the graph below illustrating which variables are part of the first component. It’s clear that goldEarned is the largest contributor to this component, alongside objective damage, the largest multi-kill achieved, the number of killing sprees, damage dealt and total kills. It’s safe to say that this component is capturing the variables relating to stomping lane. If we add on the fact that “physical” damage is specified, you can almost see the Fiora/Riven/Trynd one tricks appearing in front of your eyes.

Graph illustrating which of the original variables are most highly correlated with the first component.
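
One way to inspect what a component is made of is to look at its loadings, the weights in `pca.components_`. The sketch below again uses synthetic data; the column names are borrowed from the article, but the numbers are invented. The sign of a component is arbitrary, so the ranking uses absolute values.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n_champions = 150
carry = rng.normal(size=n_champions)  # invented hidden factor
tank = rng.normal(size=n_champions)   # invented hidden factor

def noisy(factor):
    """Return the factor plus a small amount of independent noise."""
    return factor + 0.2 * rng.normal(size=factor.shape)

stats = pd.DataFrame({
    "goldEarned": noisy(carry),
    "kills": noisy(carry),
    "killingSprees": noisy(carry),
    "turretKills": noisy(tank),
    "damageSelfMitigated": noisy(tank),
})

pca = PCA().fit(StandardScaler().fit_transform(stats))

# Weight of each original statistic in the first component.
loadings = pd.Series(pca.components_[0], index=stats.columns)
ranked = loadings.abs().sort_values(ascending=False)
print(ranked.round(2))
```

In this toy setup the three "carry" statistics dominate the first component, which mirrors how goldEarned, kills and killing sprees group together in the article's graph.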

The 2nd component comprises two main attributes: towers taken and damage self-mitigated (blocked, parried, immune, reduced, etc.). You may be wondering how this all relates to content based recommendation models! Well, what we now have are two components that contain over 50% of the variance between the Champions. These can be considered as proxies for descriptions, where instead of “sports game” we have “Champion who kills everyone” and “produced by EA” becomes “high turret damage”! We can then plot these descriptive components out in a 2D space and start to see how it all comes together (warning, big old graph coming at you for visibility):

2D representation of the first two components, which can be used as the base for a recommendation engine. Champions are coloured depending on their main role, but the data is not necessarily gathered from players in that position.
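
Coordinates for a plot like this can be obtained by keeping only the first two components. This is a sketch on random placeholder numbers, so the positions it prints mean nothing; only the shape of the workflow is the point. The scaling step is again an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
champions = ["Fiora", "Tryndamere", "Katarina", "Karthus", "Sion", "Maokai"]

# Placeholder for averaged stats: 6 champions x 4 statistics (random values).
X = rng.normal(size=(len(champions), 4))

# Project every champion onto the first two principal components.
coords = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

for name, (x, y) in zip(champions, coords):
    print(f"{name:<12} PC1={x:+.2f}  PC2={y:+.2f}")
```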

Note: Although “Support” champions are shown here in yellow, the data is actually derived from farming lanes only. I.e. the Zilean data you see above is from when the Champion is played in either Top, Mid or as the APC.

Those of you paying attention will note that component 1 is inverted: high damage/kills scores low on the X-axis. Component 2 is not inverted, so a high number on the Y-axis indicates lots of turret taking and damage mitigation. To make sure it has worked as expected, take a look at the Champions in the top left (i.e. those that do lots of physical damage, take towers and mitigate damage): Fiora and Tryndamere (Trynd’s ult counts as damage mitigation). How about the bottom centre, where we see Katarina and Karthus, who score relatively high on damage and kills but aren’t smashing turrets or mitigating damage? Sounds right to me.

The next step is simple: the recommendation is the Champion with the shortest Euclidean (straight-line) distance from the Champion the player currently plays. You play a lot of Taric? Try Maokai. Akali? How about Fizz? Unkillable Dr. Mundo? You’ll love our boy Sion.
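
The lookup itself can be sketched in a few lines. The coordinates below are made up so that the pairs from the text sit close together; they are not the real component scores, and `recommend` is a hypothetical helper name.

```python
import numpy as np

# Hypothetical 2D component scores, invented for illustration.
coords = {
    "Taric": (1.8, 0.9),
    "Maokai": (1.6, 1.1),
    "Akali": (-0.9, -1.2),
    "Fizz": (-0.7, -1.0),
    "Fiora": (-2.0, 2.1),
}

def recommend(champion, coords):
    """Return the other champion with the smallest Euclidean distance."""
    origin = np.array(coords[champion])
    distances = {
        other: float(np.linalg.norm(np.array(point) - origin))
        for other, point in coords.items()
        if other != champion  # never recommend the champion itself
    }
    return min(distances, key=distances.get)

print(recommend("Taric", coords))  # Maokai
print(recommend("Akali", coords))  # Fizz
```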

If we wanted to expand on this, we’d move to higher dimensions. If you go back to the graph showing how much variance is captured in each component, I’d say there’s an argument to build the model based on 3, maybe even 5 dimensions. The rest works the same, but given that the visualisation becomes tricky we’ll leave it there for now!
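
To show that nothing else changes in higher dimensions, the sketch below fits PCA once on random placeholder data, prints the cumulative variance captured by 2, 3 and 5 components, and runs the same nearest-neighbour lookup on the first few components. The helper `nearest` and the data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# Placeholder data: 150 champions x 10 averaged statistics (random values).
X_scaled = StandardScaler().fit_transform(rng.normal(size=(150, 10)))

pca = PCA().fit(X_scaled)
scores = pca.transform(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

def nearest(index, n_dims):
    """Index of the closest other row, using only the first n_dims components."""
    diffs = scores[:, :n_dims] - scores[index, :n_dims]
    distances = np.linalg.norm(diffs, axis=1)
    distances[index] = np.inf  # exclude the champion itself
    return int(np.argmin(distances))

for n_dims in (2, 3, 5):
    captured = float(cumulative[n_dims - 1])
    print(f"{n_dims} dims: {captured:.1%} of variance, nearest to row 0 is row {nearest(0, n_dims)}")
```

The trade-off is the one described above: more components capture more of the variance, but the result can no longer be drawn on a single 2D chart.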

I hope this provides another insight into potential recommendation types that may be worth exploring and the benefits PCA provides. Although I use League of Legends as my domain, these techniques can easily be applied to any other field. I recommend going back up to the large graph, finding your main and seeing whether you agree that the Champions surrounding it have a similar play-style. Let me know in the comments below!

Thanks for getting to the bottom of my article! My name is Jack J. and I’m a professional Data Scientist, writer and founder of the League of Legends analytics site JUNG.GG. You can also find me on my blog LeagueOfData, where I post less Data Science-heavy articles; it’s also the best place to get in contact with me.

Translated from: https://towardsdatascience.com/pca-and-content-based-modelling-for-champion-recommendation-league-of-legends-80e909e56672
