Getting Options Data

by Harry Sauers

How I get options data for free

An introduction to web scraping for finance

Ever wished you could access historical options data, but got blocked by a paywall? What if you just want it for research, fun, or to develop a personal trading strategy?

In this tutorial, you’ll learn how to use Python and BeautifulSoup to scrape financial data from the Web and build your own dataset.

Getting Started

You should have at least a working knowledge of Python and Web technologies before beginning this tutorial. To build these up, I highly recommend a site like Codecademy for learning new skills or brushing up on old ones.

First, let’s spin up your favorite IDE. Normally I use PyCharm, but for a quick script like this, Repl.it will do the job too. Add a quick print("Hello world") to ensure your environment is set up correctly.

Now we need to figure out a data source.

Unfortunately, Cboe’s awesome options chain data is pretty locked down, even for current delayed quotes. Luckily, Yahoo Finance has solid enough options data on its options pages. We’ll use it for this tutorial. Web scrapers often need some awareness of the page’s content, but the approach is easily adaptable to any data source you want.

Dependencies

We don’t need many external dependencies. We just need the Requests and BeautifulSoup modules in Python. Add these at the top of your program:

from bs4 import BeautifulSoup
import requests

Create a main method:

def main():
    print("Hello World!")


if __name__ == "__main__":
    main()

Scraping HTML

Now you’re ready to start scraping! Inside main(), add these lines to fetch the page’s full HTML:

    data_url = "https://finance.yahoo.com/quote/SPY/options"
    data_html = requests.get(data_url).content
    print(data_html)

This fetches the page’s full HTML content, so we can find the data we want in it. Feel free to give it a run and observe the output.

Feel free to comment out print statements as you go — these are just there to help you understand what the program is doing at any given step.

BeautifulSoup is the perfect tool for working with HTML data in Python. Let’s narrow down the HTML to just the options pricing tables so we can better understand it:

    content = BeautifulSoup(data_html, "html.parser")
    # print(content)
    options_tables = content.find_all("table")
    print(options_tables)

That’s still quite a bit of HTML — we can’t get much out of that, and Yahoo’s code isn’t the most friendly to web scrapers. Let’s break it down into two tables, for calls and puts:

    options_tables = []
    tables = content.find_all("table")
    # the first table holds calls, the second holds puts
    for i in range(0, len(tables)):
        options_tables.append(tables[i])
    print(options_tables)

Yahoo’s data contains options that are pretty deep in- and out-of-the-money, which might be great for certain purposes. I’m only interested in near-the-money options, namely the two calls and two puts closest to the current price.

Let’s find these, using BeautifulSoup and the different markup Yahoo gives table rows for in-the-money and out-of-the-money options:

    # The expiration date is added later, in "Cleaning up", once a datestamp exists.
    calls = options_tables[0].find_all("tr")[1:]  # first row is header
    itm_calls = []
    otm_calls = []
    for call_option in calls:
        if "in-the-money" in str(call_option):
            itm_calls.append(call_option)
        else:
            otm_calls.append(call_option)
    itm_call = itm_calls[-1]  # in-the-money call nearest the current price
    otm_call = otm_calls[0]  # out-of-the-money call nearest the current price
    print(str(itm_call) + " \n\n " + str(otm_call))
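
The "in-the-money" marker is a detail of Yahoo’s markup and could change. As a hedged alternative, the same selection can be made from strike prices alone, assuming you already know the price of the underlying. The function name nearest_calls and the spot argument are my own illustration, not part of the program above:

```python
def nearest_calls(strikes, spot):
    """Return (itm_strike, otm_strike) for the calls closest to spot.

    A call is in the money when its strike is below the underlying price.
    A strike equal to spot is grouped with the out-of-the-money side here.
    Either value is None when no strike falls on that side.
    """
    itm = [s for s in strikes if s < spot]
    otm = [s for s in strikes if s >= spot]
    return (max(itm) if itm else None, min(otm) if otm else None)


# Hypothetical strikes and price, for illustration only.
print(nearest_calls([287.0, 288.0, 289.0, 290.0, 291.0], 289.6))
```

For puts the comparison flips: a put is in the money when its strike is above the underlying price.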

Now, we have the table entries for the two options nearest to the money in HTML. Let’s scrape the pricing data, volume, and implied volatility from the first call option:

    itm_call_data = []
    for td in BeautifulSoup(str(itm_call), "html.parser").find_all("td"):
        itm_call_data.append(td.text)
    print(itm_call_data)
    # last_trade keeps only the date part of the cell, as in the put code below
    itm_call_info = {'contract': itm_call_data[0], 'last_trade': itm_call_data[1][:10],
                     'strike': itm_call_data[2], 'last': itm_call_data[3],
                     'bid': itm_call_data[4], 'ask': itm_call_data[5],
                     'volume': itm_call_data[8], 'iv': itm_call_data[10]}
    print(itm_call_info)

Adapt this code for the next call option:

    # otm call
    otm_call_data = []
    for td in BeautifulSoup(str(otm_call), "html.parser").find_all("td"):
        otm_call_data.append(td.text)
    # print(otm_call_data)
    otm_call_info = {'contract': otm_call_data[0], 'last_trade': otm_call_data[1][:10],
                     'strike': otm_call_data[2], 'last': otm_call_data[3],
                     'bid': otm_call_data[4], 'ask': otm_call_data[5],
                     'volume': otm_call_data[8], 'iv': otm_call_data[10]}
    print(otm_call_info)

Give your program a run!

You now have dictionaries of the two near-the-money call options. All that’s left is to scrape the table of put options for the same data:

    puts = options_tables[1].find_all("tr")[1:]  # first row is header
    itm_puts = []
    otm_puts = []
    for put_option in puts:
        if "in-the-money" in str(put_option):
            itm_puts.append(put_option)
        else:
            otm_puts.append(put_option)
    itm_put = itm_puts[0]
    otm_put = otm_puts[-1]
    # print(str(itm_put) + " \n\n " + str(otm_put) + "\n\n")
    itm_put_data = []
    for td in BeautifulSoup(str(itm_put), "html.parser").find_all("td"):
        itm_put_data.append(td.text)
    # print(itm_put_data)
    itm_put_info = {'contract': itm_put_data[0],
                    'last_trade': itm_put_data[1][:10],
                    'strike': itm_put_data[2], 'last': itm_put_data[3],
                    'bid': itm_put_data[4], 'ask': itm_put_data[5],
                    'volume': itm_put_data[8], 'iv': itm_put_data[10]}
    # print(itm_put_info)
    # otm put
    otm_put_data = []
    for td in BeautifulSoup(str(otm_put), "html.parser").find_all("td"):
        otm_put_data.append(td.text)
    # print(otm_put_data)
    otm_put_info = {'contract': otm_put_data[0],
                    'last_trade': otm_put_data[1][:10],
                    'strike': otm_put_data[2], 'last': otm_put_data[3],
                    'bid': otm_put_data[4], 'ask': otm_put_data[5],
                    'volume': otm_put_data[8], 'iv': otm_put_data[10]}
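
The four dictionaries are built the same way, so the column mapping could live in one place. Here is a minimal sketch of that idea. The names FIELDS and row_to_info are hypothetical, and the column positions are simply the ones used in the code above:

```python
# Column positions in a Yahoo options row, as used in the tutorial code.
FIELDS = {'contract': 0, 'last_trade': 1, 'strike': 2, 'last': 3,
          'bid': 4, 'ask': 5, 'volume': 8, 'iv': 10}


def row_to_info(cells):
    """Map the list of cell strings from one table row to a dictionary."""
    info = {name: cells[index] for name, index in FIELDS.items()}
    info['last_trade'] = info['last_trade'][:10]  # keep the date, drop the time
    return info


# Sample cells modelled on the tutorial's output; '-' marks columns we skip.
sample = ['SPY190417P00290000', '2019-04-15 4:14PM EDT', '290.00', '0.77',
          '0.75', '0.78', '-', '-', '11,310', '-', '7.30%']
print(row_to_info(sample))
```

With BeautifulSoup, the cell list for a row would come from something like [td.text for td in itm_put.find_all("td")].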

Congratulations! You just scraped data for all near-the-money options of the S&P 500 ETF, and can view them like this:

    print("\n\n")
    print(itm_call_info)
    print(otm_call_info)
    print(itm_put_info)
    print(otm_put_info)

Give your program a run — you should get data like this printed to the console:

{'contract': 'SPY190417C00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '1.46', 'bid': '1.48', 'ask': '1.50', 'volume': '4,646', 'iv': '8.94%'}
{'contract': 'SPY190417C00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.80', 'bid': '0.82', 'ask': '0.83', 'volume': '38,491', 'iv': '8.06%'}
{'contract': 'SPY190417P00290000', 'last_trade': '2019-04-15', 'strike': '290.00', 'last': '0.77', 'bid': '0.75', 'ask': '0.78', 'volume': '11,310', 'iv': '7.30%'}
{'contract': 'SPY190417P00289000', 'last_trade': '2019-04-15', 'strike': '289.00', 'last': '0.41', 'bid': '0.40', 'ask': '0.42', 'volume': '44,319', 'iv': '7.79%'}
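
Every value here is still a string, including the thousands separator in volume and the percent sign in iv. If you want numbers for analysis, a conversion step along these lines should work. This is only a sketch: to_numbers is a name I made up, and it assumes every field is present and numeric, so a placeholder such as a dash in a cell would raise a ValueError.

```python
def to_numbers(info):
    """Return a copy of an option dictionary with numeric fields parsed."""
    parsed = dict(info)
    for key in ('strike', 'last', 'bid', 'ask'):
        parsed[key] = float(info[key].replace(',', ''))
    parsed['volume'] = int(info['volume'].replace(',', ''))
    # a percentage string such as '8.94%' becomes the fraction 0.0894
    parsed['iv'] = float(info['iv'].rstrip('%').replace(',', '')) / 100
    return parsed


row = {'contract': 'SPY190417C00289000', 'last_trade': '2019-04-15',
       'strike': '289.00', 'last': '1.46', 'bid': '1.48', 'ask': '1.50',
       'volume': '4,646', 'iv': '8.94%'}
print(to_numbers(row))
```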

Setting up recurring data collection

Yahoo returns options for one expiration date at a time, set by the date parameter at the end of the URL: https://finance.yahoo.com/quote/SPY/options?date=1555459200

This is a Unix timestamp, so we’ll need to generate or scrape one, rather than hardcoding it in our program.
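
To see what that number means, here is a quick standalone check using only the standard library. It reads the stamp in UTC so the result does not depend on your machine’s timezone:

```python
import datetime

stamp = 1555459200  # the value from the URL above

# Interpret the stamp in UTC rather than local time.
expiry = datetime.datetime.fromtimestamp(stamp, tz=datetime.timezone.utc)
print(expiry.isoformat())  # 2019-04-17T00:00:00+00:00

# Going the other way: midnight UTC on a given date back to a Unix timestamp.
midnight = datetime.datetime(2019, 4, 17, tzinfo=datetime.timezone.utc)
print(int(midnight.timestamp()))  # 1555459200
```

So this particular stamp is midnight UTC on the contract’s expiration date, April 17, 2019.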

Add some dependencies:

import datetime, time

Let’s write a quick script to generate and verify a Unix timestamp for our next set of options:

def get_datestamp():
    options_url = "https://finance.yahoo.com/quote/SPY/options?date="
    today = int(time.time())
    # print(today)
    date = datetime.datetime.fromtimestamp(today)
    yy = date.year
    mm = date.month
    dd = date.day

The above code holds the base URL of the page we are scraping and creates a datetime.datetime object for today, split into year, month, and day values for us to use below.

Let’s increment this date by one day, so we don’t get options that have already expired. Adding a timedelta, rather than incrementing dd directly, keeps the date valid at the end of a month:

    options_day = datetime.date(yy, mm, dd) + datetime.timedelta(days=1)

Now, we need to convert it back into a Unix timestamp and make sure it’s a valid date for options contracts:

    # time.mktime interprets the date in the machine's local timezone
    datestamp = int(time.mktime(options_day.timetuple()))
    # print(datestamp)
    # print(datetime.datetime.fromtimestamp(datestamp))

    # vet timestamp, then return if valid
    for i in range(0, 7):
        test_req = requests.get(options_url + str(datestamp)).content
        content = BeautifulSoup(test_req, "html.parser")
        # print(content)
        tables = content.find_all("table")

        if tables != []:
            # print(datestamp)
            return str(datestamp)
        else:
            # print("Bad datestamp!")
            options_day += datetime.timedelta(days=1)
            datestamp = int(time.mktime(options_day.timetuple()))

    return str(-1)
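
The date arithmetic in this step can be tried on its own, without any network calls. The sketch below generates the candidate timestamps such a loop would test. It assumes Yahoo expects midnight UTC, which is what the example timestamp in the URL above works out to, and candidate_datestamps is a name I made up for illustration:

```python
import calendar
import datetime


def candidate_datestamps(start, days=7):
    """Yield midnight-UTC Unix timestamps for the `days` dates after `start`."""
    for offset in range(1, days + 1):
        day = start + datetime.timedelta(days=offset)  # safe across month ends
        yield calendar.timegm(day.timetuple())


# Starting on the last day of a month rolls over into the next one.
stamps = list(candidate_datestamps(datetime.date(2019, 4, 30), days=3))
print([datetime.datetime.fromtimestamp(s, tz=datetime.timezone.utc).date().isoformat()
       for s in stamps])
```

calendar.timegm always treats the date as UTC, whereas time.mktime uses the local timezone of whatever machine runs the script.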

Let’s adapt our main() method (which we’ll rename to fetch_options() shortly) to use a dynamic timestamp to fetch options data, rather than whatever Yahoo wants to give us as the default.

Change this line:

    data_url = "https://finance.yahoo.com/quote/SPY/options"

To this:

    datestamp = get_datestamp()
    data_url = "https://finance.yahoo.com/quote/SPY/options?date=" + datestamp

Congratulations! You just scraped real-world options data from the web.

Now we need to do some simple file I/O and set up a timer to record this data each day after market close.

Improving the program

Rename main() to fetch_options() and add these lines to the bottom:

    options_list = {'calls': {'itm': itm_call_info, 'otm': otm_call_info},
                    'puts': {'itm': itm_put_info, 'otm': otm_put_info},
                    'date': datetime.date.fromtimestamp(time.time()).strftime("%Y-%m-%d")}
    return options_list

Create a new method called schedule(). We’ll use this to control when we scrape for options: once every twenty-four hours, after market close. Add this code to schedule our first job, which runs right away and then schedules its own successor for the next market close:

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()


def schedule():
    # the first job runs immediately; run() schedules each later one
    scheduler.add_job(func=run, trigger="date", run_date=datetime.datetime.now())
    scheduler.start()

In your if __name__ == "__main__": statement, delete main() and add a call to schedule() to set up your first scheduled job.

Create another method called run(). This is where we’ll handle the bulk of our operations, including actually saving the market data. Add this to the body of run():

    today = int(time.time())
    date = datetime.datetime.fromtimestamp(today)
    yy = date.year
    mm = date.month
    dd = date.day

    # must use 12:30 for Unix time instead of 4:30 NY time
    # (this is a naive datetime in the machine's local timezone; adjust the hour for your server)
    next_close = datetime.datetime(yy, mm, dd, 12, 30)

    # do operations here
    """ This is where we'll write our last bit of code. """

    # schedule next job
    scheduler.add_job(func=run, trigger="date", run_date=next_close)

    print("Job scheduled! | " + str(next_close))

This lets our code call itself in the future, so we can just put it on a server and build up our options data each day. Add this code to actually fetch data under """ This is where we'll write our last bit of code. """:

    options = {}

    # ensures option data doesn't break the program if internet is out
    try:
        if next_close > datetime.datetime.now():
            print("Market is still open! Waiting until after close...")
        else:
            # ensures program was run after market hours
            if next_close < datetime.datetime.now():
                # timedelta avoids an invalid date at the end of a month
                next_close += datetime.timedelta(days=1)
                options = fetch_options()
                print(options)
                # write to file
                write_to_csv(options)
    except Exception as err:
        print("Check your connection and try again.", err)
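
The scheduling rule here boils down to "run at the next 12:30, today if it hasn’t passed yet, otherwise tomorrow." Here is a small sketch of that rule as a pure function you can test with fixed times. The name next_run_time is hypothetical, and the 12:30 default just mirrors the value used above:

```python
import datetime


def next_run_time(now, hour=12, minute=30):
    """Return the next occurrence of hour:minute strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += datetime.timedelta(days=1)  # rolls over month and year ends
    return candidate


print(next_run_time(datetime.datetime(2019, 4, 30, 9, 0)))   # 2019-04-30 12:30:00
print(next_run_time(datetime.datetime(2019, 4, 30, 18, 0)))  # 2019-05-01 12:30:00
```

Note that this doesn’t skip weekends or market holidays, and neither does the code above. On those days the job would just collect the same data again.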

Saving data

You may have noticed that write_to_csv isn’t implemented yet. No worries — let’s take care of that here:

def write_to_csv(options_data):
    import csv
    # append one row per run; newline='' lets the csv module control line endings
    with open('options.csv', 'a', newline='') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',')
        spamwriter.writerow([str(options_data)])
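
Writing the whole dictionary as one string works, but it leaves all the parsing to whoever reads the file. As an alternative sketch, not what the tutorial’s code does, csv.DictWriter can write one row per contract with named columns. The names COLUMNS and write_rows are my own, and the example writes to an in-memory buffer instead of options.csv:

```python
import csv
import io

COLUMNS = ['date', 'side', 'moneyness', 'contract', 'strike',
           'last', 'bid', 'ask', 'volume', 'iv']


def write_rows(options_data, csvfile):
    """Write one CSV row per contract from the nested options dictionary."""
    # extrasaction='ignore' drops any keys that are not listed in COLUMNS
    writer = csv.DictWriter(csvfile, fieldnames=COLUMNS, extrasaction='ignore')
    if csvfile.tell() == 0:
        writer.writeheader()  # header only when nothing has been written yet
    for side in ('calls', 'puts'):
        for moneyness, info in options_data[side].items():
            writer.writerow({'date': options_data['date'], 'side': side,
                             'moneyness': moneyness, **info})


sample = {'date': '2019-04-15',
          'calls': {'itm': {'contract': 'SPY190417C00289000', 'strike': '289.00',
                            'last': '1.46', 'bid': '1.48', 'ask': '1.50',
                            'volume': '4,646', 'iv': '8.94%'}},
          'puts': {}}
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

The volume value contains a comma, so the csv module quotes it automatically.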

Cleaning up

As options contracts are time-sensitive, we might want to add a field for their expiration date. This field is not included in the table rows we scraped.

Add this line near the top of fetch_options() to save and format the expiration date:

    # place this after datestamp = get_datestamp() so the timestamp is reused
    expiration = datetime.datetime.fromtimestamp(int(datestamp)).strftime("%Y-%m-%d")

Add 'expiration': expiration to the end of each option_info dictionary like so:

    itm_call_info = {'contract': itm_call_data[0], 'last_trade': itm_call_data[1][:10],
                     'strike': itm_call_data[2], 'last': itm_call_data[3],
                     'bid': itm_call_data[4], 'ask': itm_call_data[5],
                     'volume': itm_call_data[8], 'iv': itm_call_data[10],
                     'expiration': expiration}

Give your new program a run — it’ll scrape the latest options data and write it to a .csv file as a string representation of a dictionary. The .csv file will be ready to be parsed by a backtesting program or served to users through a webapp. Congratulations!
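
Since each row holds the string form of a dictionary, a reader has to turn that string back into a dictionary. One reasonably safe way to do that is ast.literal_eval, which only accepts Python literals. This is a sketch under that assumption: read_options is a hypothetical name, and the round trip uses an in-memory buffer in place of options.csv.

```python
import ast
import csv
import io


def read_options(csvfile):
    """Parse rows written as str(dict) back into dictionaries."""
    return [ast.literal_eval(row[0]) for row in csv.reader(csvfile) if row]


# Round trip through an in-memory file, mirroring what write_to_csv stores.
record = {'calls': {'itm': {'strike': '289.00', 'volume': '4,646'}},
          'date': '2019-04-15'}
buffer = io.StringIO()
csv.writer(buffer).writerow([str(record)])
buffer.seek(0)
print(read_options(buffer) == [record])  # True
```

With the real file, you would pass open('options.csv', newline='') instead of the buffer.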

Translated from: https://www.freecodecamp.org/news/how-i-get-options-data-for-free-fba22d395cc8/
