Visualization Analysis of NBA MVP Trends over the Years, Based on Python and SQLite
Contents
1 Project Background and Significance
2 Project Innovations
3 Project Design
Part 1: Data Scraping
- Fetching page data with urllib
- Parsing the pages and extracting the target data with bs4's BeautifulSoup
- Combining re regular expressions with bs4 to pull out the desired data
Part 2: Data Visualization
4 Implementation
Part 1: Data Scraping
Part 2: Data Visualization
9 Member Reflections
Part 2: Data Visualization
All of the data is first fetched back out of the database, and then plotted with matplotlib.pyplot.
matplotlib.pyplot is a collection of command-style functions that make matplotlib feel much like MATLAB. Each pyplot function makes some change to a figure: creating a figure, creating a plotting area within a figure, adding a line to a plotting area, and so on. pyplot keeps state across function calls, so it can always track things like the current figure and the current plotting area, and its plotting functions act directly on the current axes (a matplotlib term for a component of the figure, not the coordinate system of mathematics).
The program then produces the following figures:
- NBA MVP points trend over the years
- NBA MVP points bar chart
- NBA MVP assists trend over the years
- NBA MVP assists bar chart
- NBA MVP rebounds trend over the years
- NBA MVP rebounds bar chart
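A minimal sketch of how trend-line and bar charts like those listed above could be produced with pyplot. The season labels and per-game values here are made-up placeholder data, not the scraped dataset:

```python
# Sketch of the trend-line and bar charts, using hypothetical MVP
# scoring data (illustrative values only, not the scraped results).
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

years = ["2016-17", "2017-18", "2018-19", "2019-20"]
scores = [30.0, 28.5, 27.0, 29.5]  # placeholder points-per-game values

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(years, scores, marker="o")  # trend line over the seasons
ax1.set_title("NBA MVP scoring trend")
ax2.bar(years, scores)               # same values as a bar chart
ax2.set_title("NBA MVP scoring by season")
fig.savefig("mvp_scores.png")
```

In the project itself, the `years` and `scores` lists would come from the `nba` table rather than being hard-coded.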
4 Implementation
Part 1: Data Scraping
The scraping proceeds in the following steps:
- Fetch the page data with urllib
- Parse the page and extract the target data with bs4's BeautifulSoup
- Combine re regular expressions with bs4 to pull out the desired fields
- Store the data with sqlite3
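The sqlite3 storage step listed above can be sketched with the standard library alone. This uses the project's (year, name, score, assist, rebound) schema; the inserted row is a single illustrative example, and the in-memory database stands in for the project's mvp.db file:

```python
import sqlite3

# Sketch of the sqlite3 storage step, using the project's
# (year, name, score, assist, rebound) schema and one sample row.
conn = sqlite3.connect(":memory:")  # in-memory DB for illustration
cur = conn.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS nba (
        year    TEXT,
        name    TEXT,
        score   REAL,
        assist  REAL,
        rebound REAL
    )
""")
# A parameterized insert avoids manually quoting text columns
# the way string formatting requires.
row = ("2018-19", "Giannis Antetokounmpo", 27.7, 5.9, 12.5)
cur.execute("INSERT INTO nba VALUES (?, ?, ?, ?, ?)", row)
conn.commit()
```

Using `?` placeholders lets sqlite3 handle quoting and escaping, which is the main design difference from building the SQL string by hand.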
Part 2: Data Visualization
- Fetch all of the data back out of the database
- Visualize it with matplotlib.pyplot
# Data Scraping
## The scraping proceeds in the following steps
* Fetch the page data with urllib
* Parse the page and extract the target data with bs4's BeautifulSoup
* Combine re regular expressions with bs4 to pull out the desired data
* Store the data with sqlite3

```python
# -*- coding: utf-8 -*-
#-------------------------------------------------------------------------------
# Name:        NBA
# Description: Scrape NBA MVP stats and store them in SQLite
# Author:      zhouzikang
# Date:        2020-01-09
#-------------------------------------------------------------------------------
import sqlite3                        # SQLite database operations
from bs4 import BeautifulSoup         # HTML parsing / data extraction
import urllib.request, urllib.error   # fetch the page for a given URL
import re                             # regular expressions for text matching


def main():
    baseurl = "http://www.stat-nba.com/award/item0.html"
    # 1. Scrape the page and parse the data
    datalist = getData(baseurl)
    # 2. Save the data
    dbpath = "mvp.db"
    saveData2DB(datalist, dbpath)


# Regular expressions matching the elements we want
findPlayer = re.compile(r'[\s\S]+/player/[\s\S]+')
findYear = re.compile(r'current season change_color col0 row[\s\S]+')
findScore = re.compile(r'normal pts change_color col23 row[\s\S]+')
findAssist = re.compile(r'normal ast change_color col18 row[\s\S]+')
findRebound = re.compile(r'normal trb change_color col15 row[\s\S]+')


# Fetch one page
def askURL(url):
    head = {  # browser-like headers so the server treats us as a normal client
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36"
    }
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html


# Parse the page and collect the data
def getData(baseurl):
    datalist = []
    players = []
    years = []
    scores = []
    assists = []
    rebounds = []
    html = askURL(baseurl)
    soup = BeautifulSoup(html, "html.parser")

    # MVP names
    for item in soup.find_all('a', href=findPlayer):
        item = re.sub(r'</a>', " ", str(item))
        item = re.sub(r'<a[\s\S]+>', " ", item)
        players.append(item.strip())

    # MVP seasons
    for item in soup.find_all('td', class_=findYear):
        item = re.sub(r'</td>', " ", str(item))
        item = re.sub(r'<td[\s\S]+>', " ", item)
        years.append(item.strip())

    # Points in the MVP season
    for item in soup.find_all('td', class_=findScore):
        item = re.sub(r'</td>', " ", str(item))
        item = re.sub(r'<td[\s\S]+>', " ", item)
        scores.append(item.strip())

    # Assists in the MVP season
    for item in soup.find_all('td', class_=findAssist):
        item = re.sub(r'</td>', " ", str(item))
        item = re.sub(r'<td[\s\S]+>', " ", item)
        assists.append(item.strip())

    # Rebounds in the MVP season
    for item in soup.find_all('td', class_=findRebound):
        item = re.sub(r'</td>', " ", str(item))
        item = re.sub(r'<td[\s\S]+>', " ", item)
        rebounds.append(item.strip())

    # Pack the scraped columns into rows
    for i in range(len(players)):
        data = [years[i], players[i], scores[i], assists[i], rebounds[i]]
        datalist.append(data)
    return datalist


def saveData2DB(datalist, dbpath):
    init_db(dbpath)
    conn = sqlite3.connect(dbpath)
    cur = conn.cursor()
    for data in datalist:
        for index in range(len(data)):
            if index >= 2:   # score/assist/rebound are numeric; leave unquoted
                continue
            data[index] = '"' + str(data[index]) + '"'
        sql = '''
            insert into nba (
            year, name, score, assist, rebound)
            values(%s)''' % ",".join(data)
        cur.execute(sql)
        conn.commit()
    cur.close()
    conn.close()


def init_db(dbpath):
    sql = '''
        create table if not exists nba(
            year varchar,
            name varchar,
            score numeric,
            assist numeric,
            rebound numeric
        )
    '''
    conn = sqlite3.connect(dbpath)
    cursor = conn.cursor()
    cursor.execute(sql)
    conn.commit()
    conn.close()


if __name__ == "__main__":
    main()
```

# Data Visualization
## Fetch all data from the database
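The notebook is cut off after the "fetch all data from the database" header, so the visualization cell itself is missing. A minimal sketch of what that fetch step might look like, assuming the `nba` table schema from the scraping code above; the in-memory database and sample rows here are stand-ins for the project's mvp.db:

```python
import sqlite3

# Hypothetical continuation: pull every column back out of the nba table.
# A tiny in-memory stand-in for mvp.db keeps the sketch self-contained.
def fetch_all(dbpath=":memory:"):
    conn = sqlite3.connect(dbpath)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS nba "
                "(year TEXT, name TEXT, score REAL, assist REAL, rebound REAL)")
    cur.executemany("INSERT INTO nba VALUES (?, ?, ?, ?, ?)", [
        ("2015-16", "Stephen Curry", 30.1, 6.7, 5.4),      # sample rows
        ("2016-17", "Russell Westbrook", 31.6, 10.4, 10.7),
    ])
    rows = cur.execute("SELECT year, name, score, assist, rebound "
                       "FROM nba ORDER BY year").fetchall()
    conn.close()
    return rows

rows = fetch_all()
years  = [r[0] for r in rows]
scores = [r[2] for r in rows]  # per-stat lists like these would feed pyplot
```

Each stat column (score, assist, rebound) would be unpacked into its own list this way and handed to the corresponding trend-line and bar-chart calls.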