Python 爬取新浪财经部分股票的历史交易数据

本文仅供学习交流，如有错误纰漏，还请谅解，欢迎大家一起来学习探讨！

参考资料（感谢！）
爬取准备
- 爬取思路
- 模块1：网页表格数据爬取
- 模块2：添加输出数据
源代码（近期可能还要修改...）
- 爬取近一个月的历史交易数据
- 爬取近一年的历史交易数据

参考资料（感谢！）

配角七三—如何抓取网页中的表格:
https://zhuanlan.zhihu.com/p/33986020

爬取准备

import requests
from bs4 import BeautifulSoup
import pandas as pd
import  os
import time
import random

爬取思路

找到数据所在的网页，利用开发者工具，查看网页url，请求状态，源代码等等，然后定位数据元素。随后，进行编程。利用相关函数，模拟访问网页，采集数据，加以处理，并保存至本地。（细节之处不到位，还请见谅，博主还会再找时间另外总结）

模块1：网页表格数据爬取

def get_stock_table(stockcode,i):url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_list = []for table in tables:df_list.append(pd.concat(pd.read_html(table.prettify())))df = pd.concat(df_list)df.columns = df.iloc[0]headers = df.iloc[0]df = pd.DataFrame(df.values[1:], columns=headers)#print(len(df) - 1)   #df中有几行数据if (len(df) - 1 < 22):c =len(df)-1df = add_stock_table(stockcode,i,c,df)else:df =pd.DataFrame(df.values[1:22], columns=headers)df = df.reset_index(drop=True)df.to_excel('...\\'+str(stockcode) +'.xlsx')sleeptime = random.randint(1, 10)#print(sleeptime)time.sleep(sleeptime)

不过以上函数可能有时候获取不到一个月的数据，因此还要再加一个函数，用以添加数据。如果要获取一年的话，就要加一个循环

模块2：添加输出数据

def add_stock_table(stockcode,i,c,df):i = i - 1url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)#print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_addlist = []for table in tables:df_addlist.append(pd.concat(pd.read_html(table.prettify())))df_add = pd.concat(df_addlist)headers = df_add.iloc[0]df_add = pd.DataFrame(df_add.values[1:random.randint(20, 22)-c], columns=headers)#print(df_add)df_sum = df.append(df_add)#print(df_sum)#print(len(df_sum)-1)return df_sum

谨记！本文仅供学习交流，如有错误纰漏，还请原谅，欢迎指教！博主较佛（懒），随缘修改！

源代码（近期可能还要修改…）

注意：
源代码无法直接套用！
.xlsx文件的路径需要修改！！
其他的根据自己的需要做变更。

爬取近一个月的历史交易数据

from bs4 import BeautifulSoup
import requests
import pandas as pd
import  os
import time
import randomdef get_stock_table(stockcode,i):url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_list = []for table in tables:df_list.append(pd.concat(pd.read_html(table.prettify())))df = pd.concat(df_list)df.columns = df.iloc[0]headers = df.iloc[0]df = pd.DataFrame(df.values[1:], columns=headers)#print(len(df) - 1)   #df中有几行数据if (len(df) - 1 < 22):c =len(df)-1df = add_stock_table(stockcode,i,c,df)else:df =pd.DataFrame(df.values[1:22], columns=headers)df = df.reset_index(drop=True)df.to_excel('...\\'+str(stockcode) +'.xlsx')sleeptime = random.randint(1, 10)#print(sleeptime)time.sleep(sleeptime)def add_stock_table(stockcode,i,c,df):i = i - 1url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)#print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_addlist = []for table in tables:df_addlist.append(pd.concat(pd.read_html(table.prettify())))df_add = pd.concat(df_addlist)headers = df_add.iloc[0]df_add = pd.DataFrame(df_add.values[1:random.randint(20, 22)-c], columns=headers)#print(df_add)df_sum = df.append(df_add)#print(df_sum)#print(len(df_sum)-1)return df_sumif __name__ == "__main__":if os.path.exists("...\\601006.xlsx") == True:os.remove("...\\601006.xlsx")stockcode = ['601006', '000046', '601398', '000069', '601939', '000402','000001', '000089', '000027', '399001', '000002', '000800','601111', '600050', '601600', '600028', '601857', '601988','000951', '601919']i=2
index = 1
print("正在爬取month_stock信息...\n")
print("---------------\n")
print("请耐心等待...\n")
for x in stockcode:print(index)get_stock_table(x,i)index +=1

爬取近一年的历史交易数据

from bs4 import BeautifulSoup
import requests
import pandas as pd
import  os
import time
import randomdef get_stock_yeartable(stockcode,s,y):url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '/type/S.phtml?year=' + str(y) + '&jidu=' + str(s)print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_list = []for table in tables:df_list.append(pd.concat(pd.read_html(table.prettify())))df = pd.concat(df_list)df.columns = df.iloc[0]headers = df.iloc[0]df = pd.DataFrame(df.values[1:], columns=headers)#print(len(df) - 1)   #df中有几行数据while len(df)<250:s -=1while s>0:df = add_stock_table(stockcode,s,y,df)s -= 1s = 5y -= 1df = df.reset_index(drop=True)df = pd.DataFrame(df.values[1:250], columns=headers)df.to_excel('D:\\Workplace\\PyCharm\\MySpider\\sh'+str(stockcode) +'.xlsx')sleeptime = random.randint(1, 10)#print(sleeptime)time.sleep(sleeptime)def add_stock_table(stockcode,s,y,df):print(y,"-",s)url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' + str(stockcode) + '/type/S.phtml?year=' + str(y) + '&jidu=' + str(s)print(url)res = requests.get(url)res.encoding = 'gbk'soup = BeautifulSoup(res.text, 'lxml')tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})df_addlist = []for table in tables:df_addlist.append(pd.concat(pd.read_html(table.prettify())))df_add = pd.concat(df_addlist)headers = df_add.iloc[0]df_add = pd.DataFrame(df_add.values[1:], columns=headers)#print(df_add)df_sum = df.append(df_add)#print(df_sum)#print(len(df_sum)-1)return df_sumif __name__ == "__main__":if os.path.exists("D:\\Workplace\\PyCharm\\MySpider\\sh000001.xlsx") == True:os.remove("D:\\Workplace\\PyCharm\\MySpider\\sh000001.xlsx")stockcode = ['000001']
s = 2
y = 2019
index = 1
print("正在爬取year_sh_stock信息...\n")
print("---------------\n")
print("请耐心等待...\n")
for x in stockcode:print(index)get_stock_yeartable(x,s,y)

Python 爬取新浪财经部分股票的历史交易数据相关推荐

使用python爬取天气信息（包括历史天气数据）
使用Python爬虫获取城市天气信息(包括历史天气数据) 使用python爬取历史天气数据文章目录使用Python爬虫获取城市天气信息(包括历史天气数据) 一.准备工作二.完整代码更新一.准 ...
如何用python爬取新浪财经
通过python爬取新浪财经的股票历史成交明细要求通过新浪财经爬取历史数据:http://market.finance.sina.com.cn/transHis.php?symbol=sz0000 ...
python 爬取财经新闻股票_70行python代码爬取新浪财经中股票历史成交明细
70行python代码爬取新浪财经中股票历史成交明细发布时间:2018-07-28 01:55, 浏览次数:635 , 标签: python 最近在研究股票量化,想从每笔成交的明细着手,但历史数据的 ...
爬取网易财经中股票的历史交易数据
爬取网易财经中股票的历史交易数据需求分析得到股票代码股票代码的信息是在东方财富网中获取(http://quote.eastmoney.com/stocklist.html) 得到股票的历史交易记 ...
python爬百度翻译-Python爬取百度翻译（利用json提取数据）
本篇文章给大家带来的内容是关于Python爬取百度翻译(利用json提取数据),有一定的参考价值,有需要的朋友可以参考一下,希望对你有所帮助. 工具:Python 3.6.5.PyCharm开发工具. ...
4ye含泪用python爬取了自己的公众号粉丝数据
4ye含泪用python爬取了自己的公众号粉丝数据小伙伴们好呀,最近本来是在捣鼓Gateway的知识点的,结果被一件事情搞得心不在焉哈哈哈哈,结果不得不先鸽下~ 搞完这件事情再继续哦!! ε=ε= ...
利用Python爬取《囧妈》豆瓣短评数据，并进行snownlp情感分析
利用Python爬取<囧妈>豆瓣短评数据,并进行snownlp情感分析一.电影评论爬取今年的贺岁片<囧妈>上映前后,在豆瓣评论上就有不少网友发表了自己的观点,到底是好评的声 ...
表哥用Python爬取数千条淘宝商品数据后，发现淘宝这些潜规则！
本文记录了笔者用 Python 爬取淘宝某商品的全过程,并对商品数据进行了挖掘与分析,最终得出结论. 项目内容本案例选择商品类目:沙发. 数量:共 100 页 4400 个商品. 筛选条件:天猫. ...
Python爬取影评并进行情感分析和数据可视化
Python爬取影评并进行情感分析和数据可视化文章目录 Python爬取影评并进行情感分析和数据可视化一.引言二.使用requests+BeautifulSoup进行影评的爬取 1.分析界面元素 ...