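A couple of weekends ago I started scraping the BBC live text feed of the Bayern Munich/Barcelona match, initially starting with just the fouls and building a graph of them.

I've spent a bit more time on it since then and have managed to model several other events as well, including attempts, goals, cards and free kicks.

I started out doing this just for the Bayern Munich/Barcelona match, but realised it wasn't particularly difficult to extend this and graph the events for every match in the 2014/2015 Champions League.

To do that we first need to download the page for each match. I downloaded this page and then wrote a simple Python script to get the appropriate URIs: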

from bs4 import BeautifulSoup
from soupselect import select
import bs4

soup = BeautifulSoup(open("data/results", "r"))
matches = select(soup, "a.report")
for match in matches:
    print "http://www.bbc.co.uk/%s" %(match.get("href"))

I then piped the output of running this script into wget:

find_all_matches.py | xargs wget -P data/raw
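
If you'd rather not depend on soupselect, the link-extraction step can be sketched in Python 3 with only the standard library. This is a minimal sketch and not the script used above: the `ReportLinkParser` class, the `extract_report_urls` function and the sample markup are made up for illustration. The only thing it borrows from the script is the assumption that match report links are `<a>` tags carrying the class `report`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.bbc.co.uk/"


class ReportLinkParser(HTMLParser):
    """Collects the href of every <a> tag whose class list contains 'report'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        href = attributes.get("href")
        if "report" in classes and href:
            self.hrefs.append(href)


def extract_report_urls(html, base_url=BASE_URL):
    """Returns absolute URLs for all report links found in the HTML string."""
    parser = ReportLinkParser()
    parser.feed(html)
    # urljoin avoids a doubled slash when the href already starts with "/"
    return [urljoin(base_url, href) for href in parser.hrefs]


# Hypothetical markup standing in for the downloaded results page
sample = """
<a class="report" href="/sport/0/football/11111111">Report</a>
<a class="fixture" href="/sport/0/football/22222222">Fixture</a>
<a class="report extra" href="/sport/0/football/33333333">Report</a>
"""
report_urls = extract_report_urls(sample)
for url in report_urls:
    print(url)
```

Printing one URL per line keeps the output suitable for piping into xargs, as in the command above.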

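It was relatively straightforward to update the scraping and import code to handle multiple matches. The whole process from end to end looks like this:

Most of the code lives in the "scraping magic" stage, where I go through all the events and pull out the appropriate elements that we can link together in the graph.

For example, the free kick and foul events are typically next to each other, so we look to pull out the two players involved, the type of card issued, the time of the incident and the match in which the event occurred.

I used Python's Beautiful Soup library for this task, but there's no reason you couldn't use another set of tools.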
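
To make the idea of linking adjacent foul and free kick events concrete, here is a rough sketch. It is not the scraping code from the repository: the commentary wording ("Foul by ... (team)." and "... (team) wins a free kick"), the `pair_fouls` helper and the sample events are all assumptions for illustration, and card handling is left out to keep it short.

```python
import re

# Assumed commentary formats; the real feed may word these differently.
FOUL = re.compile(r"^Foul by (?P<player>.+?) \((?P<team>.+?)\)\.")
FREE_KICK = re.compile(r"^(?P<player>.+?) \((?P<team>.+?)\) wins a free kick")


def pair_fouls(events, match):
    """Pairs neighbouring foul/free kick lines into single foul records.

    events: list of (time, text) tuples in the order they appear in the feed.
    match:  a label for the match the events belong to.
    """
    paired = []
    i = 0
    while i < len(events) - 1:
        (time_a, text_a), (time_b, text_b) = events[i], events[i + 1]
        # Try "foul then free kick" first, then the reverse order.
        foul, kick, foul_time = FOUL.match(text_a), FREE_KICK.match(text_b), time_a
        if not (foul and kick):
            foul, kick, foul_time = FOUL.match(text_b), FREE_KICK.match(text_a), time_b
        if foul and kick:
            paired.append({
                "fouler": foul.group("player"),
                "fouled": kick.group("player"),
                "time": foul_time,
                "match": match,
            })
            i += 2  # both lines are consumed so they cannot be paired again
        else:
            i += 1
    return paired


# Made-up sample events, not taken from a real match
events = [
    ("12:01", "Foul by Player A (Team X)."),
    ("12:01", "Player B (Team Y) wins a free kick in the defensive half."),
    ("15:30", "Attempt missed. Player C (Team X) right footed shot."),
    ("20:10", "Player A (Team X) wins a free kick on the left wing."),
    ("20:10", "Foul by Player B (Team Y)."),
]
fouls = pair_fouls(events, "Team X vs Team Y")
```

Consuming both lines once they are paired matters: without it, a free kick line could be matched a second time against the foul that follows it.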

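The README page shows how to create your own version of the graph, but here's an overview of the graph, produced using Rik's meta graph query:

Here are some of my favourite queries so far:

Which player with more than 10 shots has the best conversion rate?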

match (a:Attempt)<-[:HAD_ATTEMPT]-(app)<-[:MADE_APPEARANCE]-(player),(app)-[:FOR_TEAM]-(team)
WITH player, COUNT(*) as times, COLLECT(a) AS attempts, team
WITH player, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY conversionRate DESC
LIMIT 10

==> +------------------------------------------------------------------------------------+
==> | player.name           | team.name            | goals | times | conversionRate      |
==> +------------------------------------------------------------------------------------+
==> | "Luiz Adriano"        | "Shakhtar Donetsk"   | 9     | 14    | 0.6428571428571429  |
==> | "Yacine Brahimi"      | "FC Porto"           | 5     | 13    | 0.38461538461538464 |
==> | "Mario Mandzukic"     | "Atlético de Madrid" | 5     | 14    | 0.35714285714285715 |
==> | "Sergio Agüero"       | "Manchester City"    | 6     | 18    | 0.3333333333333333  |
==> | "Karim Benzema"       | "Real Madrid"        | 6     | 19    | 0.3157894736842105  |
==> | "Klaas-Jan Huntelaar" | "FC Schalke 04"      | 5     | 16    | 0.3125              |
==> | "Neymar"              | "Barcelona"          | 9     | 29    | 0.3103448275862069  |
==> | "Thomas Müller"       | "FC Bayern München"  | 7     | 24    | 0.2916666666666667  |
==> | "Jackson Martínez"    | "FC Porto"           | 7     | 24    | 0.2916666666666667  |
==> | "Callum McGregor"     | "Celtic"             | 3     | 11    | 0.2727272727272727  |
==> +------------------------------------------------------------------------------------+
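
For anyone less familiar with Cypher, the aggregation in that query can be mirrored in plain Python. This is only an illustrative sketch: the `conversion_rates` function is hypothetical, and the attempt lists are rebuilt from the goal and shot totals shown in the table above rather than from the real event data.

```python
from collections import Counter


def conversion_rates(attempts, min_attempts=10):
    """Computes goals / attempts per player, best conversion rate first.

    attempts: iterable of (player, is_goal) tuples, one per shot.
    Only players with MORE than min_attempts shots are kept,
    matching the WHERE times > 10 clause in the query.
    """
    times = Counter()
    goals = Counter()
    for player, is_goal in attempts:
        times[player] += 1
        if is_goal:
            goals[player] += 1
    rows = [
        (player, goals[player], count, goals[player] / count)
        for player, count in times.items()
        if count > min_attempts
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Totals taken from the table above: 9 goals from 14 shots and 9 from 29.
attempts = (
    [("Luiz Adriano", shot < 9) for shot in range(14)]
    + [("Neymar", shot < 9) for shot in range(29)]
    # A hypothetical player with too few shots to pass the filter
    + [("Hypothetical Player", True) for _ in range(3)]
)
rows = conversion_rates(attempts)
```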

Which players got immediate revenge for a foul?

match (firstFoul:Foul)-[:COMMITTED_AGAINST]->(app1)<-[:MADE_APPEARANCE]-(revengeFouler),(app1)-[:IN_MATCH]->(match), (firstFoulerApp)-[:COMMITTED_FOUL]->(firstFoul),(app1)-[:COMMITTED_FOUL]->(revengeFoul)-[:COMMITTED_AGAINST]->(firstFoulerApp),(firstFouler)-[:MADE_APPEARANCE]->(firstFoulerApp)
WHERE (firstFoul)-[:NEXT]->(revengeFoul)
RETURN firstFouler.name AS firstFouler, revengeFouler.name AS revengeFouler, firstFoul.time, revengeFoul.time, match.home + " vs " + match.away

==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | firstFouler         | revengeFouler               | firstFoul.time | revengeFoul.time | match.home + " vs " + match.away        |
==> +---------------------------------------------------------------------------------------------------------------------------------+
==> | "Derk Boerrigter"   | "Jean Philippe Mendy"       | "88:48"        | "89:42"          | "Celtic vs NK Maribor"                  |
==> | "Mario Suárez"      | "Pajtim Kasami"             | "27:17"        | "32:38"          | "Olympiakos vs Atlético de Madrid"      |
==> | "Aleksandr Volodko" | "Casemiro"                  | "39:27"        | "44:32"          | "FC Porto vs BATE Borisov"              |
==> | "Thomas Müller"     | "Mario Fernandes"           | "87:22"        | "88:31"          | "CSKA Moscow vs FC Bayern München"      |
==> | "Vinicius"          | "Marco Verratti"            | "56:36"        | "58:00"          | "APOEL Nicosia vs Paris Saint Germain"  |
==> | "Lasse Schöne"      | "Dani Alves"                | "84:08"        | "86:18"          | "Barcelona vs Ajax"                     |
==> | "Nick Viergever"    | "Dani Alves"                | "57:22"        | "60:37"          | "Barcelona vs Ajax"                     |
==> | "Nani"              | "Atsuto Uchida"             | "6:10"         | "8:40"           | "FC Schalke 04 vs Sporting Lisbon"      |
==> | "Andreas Samaris"   | "Yannick Ferreira-Carrasco" | "89:21"        | "90:00 +4:21"    | "Monaco vs Benfica"                     |
==> | "Simon Kroon"       | "Guillherme Siqueira"       | "84:05"        | "90:00 +0:29"    | "Atlético de Madrid vs Malmö FF"        |
==> | "Mario Suárez"      | "Isaac Thelin"              | "32:02"        | "38:47"          | "Atlético de Madrid vs Malmö FF"        |
==> | "Hakan Balta"       | "Henrikh Mkhitaryan"        | "62:09"        | "64:14"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Marco Reus"        | "Selcuk Inan"               | "36:17"        | "44:03"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Hakan Balta"       | "Sven Bender"               | "10:57"        | "12:51"          | "Borussia Dortmund vs Galatasaray"      |
==> | "Vinicius"          | "Edinson Cavani"            | "87:56"        | "90:00 +1:25"    | "Paris Saint Germain vs APOEL Nicosia"  |
==> | "Jackson Martínez"  | "Carlos Gurpegi"            | "64:55"        | "66:17"          | "Athletic Club vs FC Porto"             |
==> | "Nani"              | "Chinedu Obasi"             | "1:30"         | "4:47"           | "Sporting Lisbon vs FC Schalke 04"      |
==> | "Vitali Rodionov"   | "Bruno Martins Indi"        | "52:16"        | "60:08"          | "BATE Borisov vs FC Porto"              |
==> | "Raheem Sterling"   | "Behrang Safari"            | "29:00"        | "33:27"          | "Liverpool vs FC Basel"                 |
==> | "Derlis González"   | "Fábio Coentrão"            | "52:55"        | "57:59"          | "FC Basel vs Real Madrid"               |
==> | "Josip Drmic"       | "Lisandro López"            | "15:04"        | "17:35"          | "Benfica vs Bayer 04 Leverkusen"        |
==> | "Fred"              | "Bastian Schweinsteiger"    | "6:04"         | "9:28"           | "Shakhtar Donetsk vs FC Bayern München" |
==> | "Alex Sandro"       | "Derlis González"           | "4:07"         | "7:28"           | "FC Basel vs FC Porto"                  |
==> | "Luca Zuffi"        | "Ruben Neves"               | "73:49"        | "84:44"          | "FC Porto vs FC Basel"                  |
==> | "Marco Verratti"    | "Oscar"                     | "28:49"        | "34:04"          | "Chelsea vs Paris Saint Germain"        |
==> | "Cristiano Ronaldo" | "Jesús Gámez"               | "20:59"        | "25:37"          | "Real Madrid vs Atlético de Madrid"     |
==> | "Bernardo Silva"    | "Álvaro Morata"             | "49:20"        | "62:31"          | "Monaco vs Juventus"                    |
==> | "Arturo Vidal"      | "Fabinho"                   | "38:19"        | "45:00"          | "Monaco vs Juventus"                    |
==> +---------------------------------------------------------------------------------------------------------------------------------+

Which players took the longest to get revenge for a foul?

match (foul1:Foul)-[:COMMITTED_AGAINST]->(app1)-[:COMMITTED_FOUL]->(foul2)-[:COMMITTED_AGAINST]->(app2)-[:COMMITTED_FOUL]->(foul1),(player1)-[:MADE_APPEARANCE]->(app1), (player2)-[:MADE_APPEARANCE]->(app2),(foul1)-[:COMMITTED_IN_MATCH]->(match:Match)<-[:COMMITTED_IN_MATCH]-(foul2)
WHERE (foul1)-[:NEXT*]->(foul2)
WITH match, foul1, player1, player2, foul2 ORDER BY foul1.sortableTime, foul2.sortableTime
WITH match, foul1, player1, player2, COLLECT(foul2) AS revenge
WITH match, foul1,  player1,player2,  revenge[0] AS revengeFoul
RETURN player1.name, player2.name, foul1.time, revengeFoul.time, revengeFoul.sortableTime - foul1.sortableTime AS secondsWaited, match.home + " vs " + match.away AS match
ORDER BY secondsWaited DESC
LIMIT 5

==> +---------------------------------------------------------------------------------------------------------------------------+
==> | player1.name      | player2.name        | foul1.time | revengeFoul.time | secondsWaited | match                           |
==> +---------------------------------------------------------------------------------------------------------------------------+
==> | "Stefan Johansen" | "Ondrej Duda"       | "1:30"     | "82:11"          | 4841          | "Legia Warsaw vs Celtic"        |
==> | "Neymar"          | "Vinicius"          | "2:35"     | "80:08"          | 4653          | "Barcelona vs APOEL Nicosia"    |
==> | "Jérémy Toulalan" | "Stefan Kießling"   | "9:19"     | "86:37"          | 4638          | "Monaco vs Bayer 04 Leverkusen" |
==> | "Nabil Dirar"     | "Domenico Criscito" | "6:32"     | "82:39"          | 4567          | "Zenit St Petersburg vs Monaco" |
==> | "Nabil Dirar"     | "Eliseu"            | "7:20"     | "81:30"          | 4450          | "Monaco vs Benfica"             |
==> +---------------------------------------------------------------------------------------------------------------------------+
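
The query above relies on a numeric `sortableTime` property next to the display `time`. The post doesn't show how it is computed, but the `secondsWaited` values in the table are consistent with it being the number of seconds since kick-off (for example, "82:11" minus "1:30" gives 4841). Here is a minimal sketch of such a conversion; the `to_sortable_time` function is hypothetical, and adding stoppage time such as "90:00 +4:21" on top of the base time is my assumption.

```python
import re

# "MM:SS" with an optional " +MM:SS" stoppage-time suffix
TIME = re.compile(r"^(\d+):(\d{2})(?: \+(\d+):(\d{2}))?$")


def to_sortable_time(display_time):
    """Converts a display time such as '82:11' or '90:00 +4:21' to seconds."""
    parsed = TIME.match(display_time)
    if parsed is None:
        raise ValueError("Unrecognised time format: %r" % display_time)
    minutes, seconds, extra_minutes, extra_seconds = parsed.groups()
    total = int(minutes) * 60 + int(seconds)
    if extra_minutes is not None:
        # Assumption: stoppage time is simply added to the base time
        total += int(extra_minutes) * 60 + int(extra_seconds)
    return total


# Reproduces the first row of the table above
seconds_waited = to_sortable_time("82:11") - to_sortable_time("1:30")
```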

Who's had the most shots?

match (team)<-[:FOR_TEAM]-(app)<-[appRel:MADE_APPEARANCE]-(player:Player)
optional match (a:Attempt)<-[att:HAD_ATTEMPT]-(app)
WITH player, COUNT( DISTINCT appRel) AS apps, COUNT(att) as times, COLLECT(a) AS attempts, team
WITH player,apps, times, LENGTH([a in attempts WHERE a:Goal]) AS goals, team
WHERE times > 10
RETURN player.name, team.name, apps, goals, times, (goals * 1.0 / times) AS conversionRate
ORDER BY times DESC
LIMIT 10

==> +-------------------------------------------------------------------------------------------+
==> | player.name          | team.name             | apps | goals | times | conversionRate      |
==> +-------------------------------------------------------------------------------------------+
==> | "Cristiano Ronaldo"  | "Real Madrid"         | 12   | 10    | 69    | 0.14492753623188406 |
==> | "Lionel Messi"       | "Barcelona"           | 12   | 10    | 51    | 0.19607843137254902 |
==> | "Robert Lewandowski" | "FC Bayern München"   | 12   | 6     | 43    | 0.13953488372093023 |
==> | "Carlos Tévez"       | "Juventus"            | 12   | 7     | 34    | 0.20588235294117646 |
==> | "Gareth Bale"        | "Real Madrid"         | 10   | 2     | 32    | 0.0625              |
==> | "Luis Suárez"        | "Barcelona"           | 9    | 6     | 30    | 0.2                 |
==> | "Neymar"             | "Barcelona"           | 11   | 9     | 29    | 0.3103448275862069  |
==> | "Hakan Calhanoglu"   | "Bayer 04 Leverkusen" | 8    | 2     | 29    | 0.06896551724137931 |
==> | "Edinson Cavani"     | "Paris Saint Germain" | 8    | 6     | 27    | 0.2222222222222222  |
==> | "Alexis Sánchez"     | "Arsenal"             | 9    | 4     | 25    | 0.16                |
==> +-------------------------------------------------------------------------------------------+

Maybe you can think of some cooler ones? I'd love to see them. Grab the code from GitHub and give it a try.

Original article: https://www.javacodegeeks.com/2015/06/neo4j-the-bbc-champions-league-graph.html
