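Table of Contents
1 Project Background and Significance
2 Project Innovations
3 Project Design
Part 1: Data Scraping

  1. Fetching web page data with urllib
  2. Parsing the page and extracting the desired data with BeautifulSoup from bs4
  3. Extracting the desired values with re regular expressions combined with bs4
Part 2: Data Visualization
4 Implementation
Part 1: Data Scraping
Part 2: Data Visualization
9 Member Reflections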
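Part 2: Data Visualization
All of the data is read back out of the database.
Plotting is done with matplotlib.pyplot.
matplotlib.pyplot is a collection of command-style functions that looks and feels much like MATLAB. Each pyplot function makes some change to a figure: for example, it creates a figure, creates a plotting area in a figure, or adds a line to a plotting area. pyplot preserves state across function calls, so it can always keep track of things such as the current figure and the current plotting area, and the plotting functions act directly on the current axes. (Here "axes" is a matplotlib term for a component of a figure, not the coordinate system of mathematics.) (This article is reposted from http://www.biyezuopin.vip/onews.asp?id=14765.)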
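The "current figure" and "current axes" bookkeeping described above can be seen in a few lines. This is only a minimal sketch: the data values are made up, and the Agg backend is chosen purely so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

plt.figure()                    # creates a figure and makes it the current figure
plt.plot([1, 2, 3], [2, 4, 8])  # creates axes in the current figure and draws one line
ax = plt.gca()                  # the current axes that pyplot has been tracking
plt.title("demo")               # no axes argument: the call acts on the current axes

line_count = len(ax.get_lines())         # lines drawn on the current axes
title_text = ax.get_title()              # title set through the pyplot call
fig_is_current = plt.gcf() is ax.figure  # current figure owns the current axes
print(line_count, title_text, fig_is_current)

plt.close("all")  # release the figure once it is no longer needed
```

None of the plotting calls receives a figure or axes argument; pyplot routes each one to whatever is current at that moment.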
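The following charts are then produced:
Trend of NBA MVP scoring over the years
Bar chart of NBA MVP scoring over the years
Trend of NBA MVP assists over the years
Bar chart of NBA MVP assists over the years
Trend of NBA MVP rebounds over the years
Bar chart of NBA MVP rebounds over the years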
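One possible way to produce a trend chart and a bar chart for a single statistic is sketched below. The function name `plot_metric`, the output file names, and the numbers are all hypothetical placeholders rather than the project's actual code or data.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt


def plot_metric(years, values, label, prefix):
    """Save a trend (line) chart and a bar chart for one metric; return the paths."""
    outputs = []
    for kind in ("trend", "bar"):
        fig, ax = plt.subplots(figsize=(10, 4))
        if kind == "trend":
            ax.plot(years, values, marker="o")
        else:
            ax.bar(years, values)
        ax.set_xlabel("Season")
        ax.set_ylabel(label)
        ax.set_title(f"NBA MVP {label} by season ({kind})")
        ax.tick_params(axis="x", labelrotation=45)  # keep season labels readable
        fig.tight_layout()
        path = f"{prefix}_{kind}.png"
        fig.savefig(path)
        plt.close(fig)
        outputs.append(path)
    return outputs


# Placeholder numbers for illustration only, not real MVP statistics.
years = ["2000", "2001", "2002"]
points = [25.0, 28.0, 27.0]
out_dir = tempfile.mkdtemp()
files = plot_metric(years, points, "points per game", os.path.join(out_dir, "mvp_points"))
print(files)
```

Calling the same function with the assist and rebound columns would give the remaining four charts.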

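4 Implementation
Part 1: Data Scraping
Scraping proceeds in the following steps:
  1. Fetch the web page data with urllib
  2. Parse the page and pick out the desired data with BeautifulSoup from bs4
  3. Use re regular expressions together with bs4 to extract the desired values
  4. Store the data with sqlite3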
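The four steps can be sketched end to end on a small inline sample. The HTML below is hypothetical markup that only imitates the class names the project's patterns target, the player names and numbers are placeholders, and `fetch_html`, `parse_rows` and `save_rows` are illustrative names. The sketch also reads cell text with `get_text` and inserts with `?` placeholders, which is one option rather than the project's own approach.

```python
import re
import sqlite3
import urllib.request

from bs4 import BeautifulSoup

# Hypothetical sample markup; not copied from the real page.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="current season change_color col0 row0">2018-19</td>
    <td class="normal player change_color col1 row0"><a href="/player/1.html">Player A</a></td>
    <td class="normal trb change_color col15 row0">12.5</td>
    <td class="normal ast change_color col18 row0">5.9</td>
    <td class="normal pts change_color col23 row0">27.7</td>
  </tr>
  <tr>
    <td class="current season change_color col0 row1">2017-18</td>
    <td class="normal player change_color col1 row1"><a href="/player/2.html">Player B</a></td>
    <td class="normal trb change_color col15 row1">5.4</td>
    <td class="normal ast change_color col18 row1">8.8</td>
    <td class="normal pts change_color col23 row1">30.4</td>
  </tr>
</table>
"""

PLAYER_LINK = re.compile(r"/player/")  # regex matched against each link's href


def fetch_html(url):
    """Step 1: download a page with urllib, sending a browser-like User-Agent."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def cell_text(row, css_class):
    """Return the stripped text of the first <td> in row carrying css_class."""
    return row.find("td", class_=css_class).get_text(strip=True)


def parse_rows(html):
    """Steps 2 and 3: parse with BeautifulSoup, select cells by regex and class."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        link = tr.find("a", href=PLAYER_LINK)
        if link is None:
            continue  # skip rows without a player link, such as headers
        rows.append((
            cell_text(tr, "season"),
            link.get_text(strip=True),
            float(cell_text(tr, "pts")),
            float(cell_text(tr, "ast")),
            float(cell_text(tr, "trb")),
        ))
    return rows


def save_rows(conn, rows):
    """Step 4: store the rows with a parameterized INSERT."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nba "
        "(year TEXT, name TEXT, score REAL, assist REAL, rebound REAL)"
    )
    conn.executemany("INSERT INTO nba VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


rows = parse_rows(SAMPLE_HTML)      # fetch_html(url) would supply real HTML here
conn = sqlite3.connect(":memory:")  # in-memory database instead of mvp.db
save_rows(conn, rows)
stored = conn.execute("SELECT year, name, score FROM nba ORDER BY year").fetchall()
print(stored)
```

Passing values through `?` placeholders leaves quoting to sqlite3, so a player name that contains a quote character cannot break the statement.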

Part 2: Data Visualization
Read all of the data back out of the database
Visualize it with matplotlib.pyplot
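Reading the table back and turning it into one list per column, ready to pass to a plotting call, might look like the sketch below. The `nba` table and its column names follow the schema in the scraping code; `load_series` is a hypothetical helper, and the rows are placeholders in an in-memory database rather than the contents of mvp.db.

```python
import sqlite3


def load_series(conn):
    """Read every row of the nba table and split the rows into per-column lists."""
    rows = conn.execute(
        "SELECT year, name, score, assist, rebound FROM nba ORDER BY year"
    ).fetchall()
    columns = ("year", "name", "score", "assist", "rebound")
    if not rows:
        return {column: [] for column in columns}
    # zip(*rows) transposes a list of row tuples into one tuple per column.
    return {column: list(values) for column, values in zip(columns, zip(*rows))}


# Placeholder rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nba (year TEXT, name TEXT, score REAL, assist REAL, rebound REAL)"
)
conn.executemany(
    "INSERT INTO nba VALUES (?, ?, ?, ?, ?)",
    [
        ("2001", "Player B", 28.0, 6.0, 9.0),
        ("2000", "Player A", 25.0, 5.0, 11.0),
    ],
)
series = load_series(conn)
print(series["year"], series["score"])
```

Because `year` is stored as text, `ORDER BY year` sorts lexicographically; this gives chronological order only if every season string uses the same format.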

{"cells": [{"cell_type": "markdown","metadata": {},"source": [
"# Data Scraping\n",
"## Scraping proceeds in the following steps\n",
"* Fetch the web page data with urllib\n",
"* Parse the page and pick out the desired data with BeautifulSoup from bs4\n",
"* Use re regular expressions together with bs4 to extract the desired values\n",
"* Store the data with sqlite3"]},
{"cell_type": "code","execution_count": 1,"metadata": {},"outputs": [],"source": [
"# -*- coding: utf-8 -*-# \n",
"#-------------------------------------------------------------------------------\n",
"# Name:         NBA\n",
"# Description:  \n",
"# Author:       zhouzikang\n",
"# Date:         2020-01-09\n",
"#-------------------------------------------------------------------------------\n",
"import sqlite3  # SQLite database operations\n",
"from bs4 import BeautifulSoup  # HTML parsing and data extraction\n",
"import urllib.request, urllib.error  # fetch the page at a given URL\n",
"import re  # regular expressions for text matching\n",
"\n",
"def main():\n",
"    baseurl = \"http://www.stat-nba.com/award/item0.html\"\n",
"    # 1. Fetch the page and parse the data\n",
"    datalist = getData(baseurl)\n",
"    # 2. Save the data\n",
"    dbpath = \"mvp.db\"\n",
"    saveData2DB(datalist,dbpath)\n",
"\n",
"# Regular expressions\n",
"findPlayer = re.compile(r'[\\s\\S]+/player/[\\s\\S]+')\n",
"\n",
"findYear = re.compile(r'current season change_color col0 row[\\s\\S]+')\n",
"findScore = re.compile(r'normal pts change_color col23 row[\\s\\S]+')\n",
"findAssist = re.compile(r'normal ast change_color col18 row[\\s\\S]+')\n",
"findRebound = re.compile(r'normal trb change_color col15 row[\\s\\S]+')\n",
"\n",
"# Fetch the HTML of the page at the given URL\n",
"def askURL(url):\n",
"    head = {  # browser-like request headers sent to the server\n",
"        \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.67 Safari/537.36\"\n",
"    }\n",
"    # The User-Agent tells the server what kind of client is making the request\n",
"    request = urllib.request.Request(url,headers=head)\n",
"    html = \"\"\n",
"    try:\n",
"        response = urllib.request.urlopen(request)\n",
"        html = response.read().decode(\"utf-8\")\n",
"        #print(html)\n",
"    except urllib.error.URLError as e:\n",
"        if hasattr(e,\"code\"):\n",
"            print(e.code)\n",
"        if hasattr(e,\"reason\"):\n",
"            print(e.reason)\n",
"    return html\n",
"\n",
"\n",
"# Extract the data\n",
"def getData(baseurl):\n",
"    datalist = []\n",
"    players = []\n",
"    years = []\n",
"    scores = []\n",
"    assists = []\n",
"    rebounds = []\n",
"    html = askURL(baseurl)\n",
"    # Parse the page field by field\n",
"    soup = BeautifulSoup(html, \"html.parser\")\n",
"\n",
"    # MVP names\n",
"    for item in soup.find_all('a',href=findPlayer):\n",
"        item = str(item)\n",
"        item = re.sub('</a>',\" \", item)\n",
"        item = re.sub(r'<a[\\s\\S]+>', \" \", item)\n",
"        players.append(item.strip())\n",
"\n",
"    # MVP seasons\n",
"    for item in soup.find_all('td', class_=findYear):\n",
"        item = str(item)\n",
"        item = re.sub('</td>', \" \", item)\n",
"        item = re.sub(r'<td[\\s\\S]+>', \" \", item)\n",
"        years.append(item.strip())\n",
"\n",
"    # Points in the MVP season\n",
"    for item in soup.find_all('td', class_=findScore):\n",
"        item = str(item)\n",
"        item = re.sub('</td>', \" \", item)\n",
"        item = re.sub(r'<td[\\s\\S]+>', \" \", item)\n",
"        scores.append(item.strip())\n",
"\n",
"    # Assists in the MVP season\n",
"    for item in soup.find_all('td', class_=findAssist):\n",
"        item = str(item)\n",
"        item = re.sub('</td>', \" \", item)\n",
"        item = re.sub(r'<td[\\s\\S]+>', \" \", item)\n",
"        assists.append(item.strip())\n",
"\n",
"    # Rebounds in the MVP season\n",
"    for item in soup.find_all('td', class_=findRebound):\n",
"        item = str(item)\n",
"        item = re.sub('</td>', \" \", item)\n",
"        item = re.sub(r'<td[\\s\\S]+>', \" \", item)\n",
"        rebounds.append(item.strip())\n",
"    #print(datalist)\n",
"    # Collect the scraped values into one list per MVP\n",
"    for i in range(len(players)):\n",
"        data = []\n",
"        data.append(years[i])\n",
"        data.append(players[i])\n",
"        data.append(scores[i])\n",
"        data.append(assists[i])\n",
"        data.append(rebounds[i])\n",
"        datalist.append(data)\n",
"    return datalist\n",
"\n",
"def saveData2DB(datalist,dbpath):\n",
"    init_db(dbpath)\n",
"    conn = sqlite3.connect(dbpath)\n",
"    cur = conn.cursor()\n",
"    # Parameterized insert: sqlite3 handles quoting, so names containing quotes are safe\n",
"    sql = '''\n",
"            insert into nba (\n",
"                year,name,score,assist,rebound)\n",
"                values(?,?,?,?,?)'''\n",
"    for data in datalist:\n",
"        cur.execute(sql, data)\n",
"    conn.commit()\n",
"    cur.close()\n",
"    conn.close()\n",
"\n",
"\n",
"def init_db(dbpath):\n",
"    # 'if not exists' lets the script be re-run against an existing mvp.db\n",
"    sql = '''\n",
"        create table if not exists nba(\n",
"            year varchar ,\n",
"            name varchar ,\n",
"            score numeric ,\n",
"            assist numeric ,\n",
"            rebound numeric \n",
"        )\n",
"    '''\n",
"    conn = sqlite3.connect(dbpath)\n",
"    cursor = conn.cursor()\n",
"    cursor.execute(sql)\n",
"    conn.commit()\n",
"    conn.close()\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
"    # Entry point\n",
"    main()"]},
{"cell_type": "markdown","metadata": {},"source": ["# Data Visualization"]},
{"cell_type": "markdown","metadata": {},"source": ["## Read all of the data back out of the database"]











