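A stock data crawler (Shanghai and Shenzhen markets) and stock-picking strategy testing framework, with data from Yahoo YQL and Sina Finance.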

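  • Crawls quote data for every stock on the Shanghai and Shenzhen exchanges over a chosen date range.
  • Runs stock-picking tests for a given strategy on given dates.
  • Computes the actual outcome of each stock-picking test, including a comparison with the CSI 300 index.
  • Saves data to JSON and CSV files.
  • Supports defining stock-picking strategies as expressions.
  • Supports multi-threaded processing.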

Code

main.py

from stockholm import Stockholm
import option
import os


def checkFoldPermission(path):
    ## the default store path is a placeholder for ~/tmp/stockholm_export
    if(path == 'USER_HOME/tmp/stockholm_export'):
        path = os.path.expanduser('~') + '/tmp/stockholm_export'
    try:
        if not os.path.exists(path):
            os.makedirs(path)
        else:
            ## probe write permission with a throw-away file
            txt = open(path + os.sep + "test.txt", "w")
            txt.write("test")
            txt.close()
            os.remove(path + os.sep + "test.txt")
    except Exception as e:
        print(e)
        return False
    return True


def main():
    args = option.parser.parse_args()
    if not checkFoldPermission(args.store_path):
        print('\nPermission denied: %s' % args.store_path)
        print('Please make sure you have the permission to save the data!\n')
    else:
        print('Stockholm is starting...\n')
        stockh = Stockholm(args)
        stockh.run()
        print('Stockholm is done...\n')


if __name__ == '__main__':
    main()

option.py

import argparse
import datetime


def get_date_str(offset):
    if(offset is None):
        offset = 0
    date_str = (datetime.datetime.today() + datetime.timedelta(days=offset)).strftime("%Y-%m-%d")
    return date_str


_default = dict(
    reload_data='Y',
    gen_portfolio='N',
    output_type='json',
    charset='utf-8',
    test_date_range=60,
    start_date=get_date_str(-90),
    end_date=get_date_str(None),
    target_date=get_date_str(None),
    store_path='USER_HOME/tmp/stockholm_export',
    thread=10,
    testfile_path='./portfolio_test.txt',
    db_name='stockholm',
    methods=''
)

parser = argparse.ArgumentParser(description='A stock crawler and portfolio testing framework.')
parser.add_argument('--reload', type=str, default=_default['reload_data'], dest='reload_data',
                    help='Reload the stock data or not (Y/N), Default: %s' % _default['reload_data'])
parser.add_argument('--portfolio', type=str, default=_default['gen_portfolio'], dest='gen_portfolio',
                    help='Generate the portfolio or not (Y/N), Default: %s' % _default['gen_portfolio'])
parser.add_argument('--output', type=str, default=_default['output_type'], dest='output_type',
                    help='Data output type (json/csv/all), Default: %s' % _default['output_type'])
parser.add_argument('--charset', type=str, default=_default['charset'], dest='charset',
                    help='Data output charset (utf-8/gbk), Default: %s' % _default['charset'])
parser.add_argument('--testrange', type=int, default=_default['test_date_range'], dest='test_date_range',
                    help='Test date range(days): %s' % _default['test_date_range'])
parser.add_argument('--startdate', type=str, default=_default['start_date'], dest='start_date',
                    help='Data loading start date, Default: %s' % _default['start_date'])
parser.add_argument('--enddate', type=str, default=_default['end_date'], dest='end_date',
                    help='Data loading end date, Default: %s' % _default['end_date'])
parser.add_argument('--targetdate', type=str, default=_default['target_date'], dest='target_date',
                    help='Portfolio generating target date, Default: %s' % _default['target_date'])
parser.add_argument('--storepath', type=str, default=_default['store_path'], dest='store_path',
                    help='Data file store path, Default: %s' % _default['store_path'])
parser.add_argument('--thread', type=int, default=_default['thread'], dest='thread',
                    help='Thread number, Default: %s' % _default['thread'])
parser.add_argument('--testfile', type=str, default=_default['testfile_path'], dest='testfile_path',
                    help='Portfolio test file path, Default: %s' % _default['testfile_path'])
parser.add_argument('--dbname', type=str, default=_default['db_name'], dest='db_name',
                    help='MongoDB DB name, Default: %s' % _default['db_name'])
parser.add_argument('--methods', type=str, default=_default['methods'], dest='methods',
                    help='Target methods for back testing, Default: %s' % _default['methods'])


def main():
    args = parser.parse_args()
    print(args)


if __name__ == '__main__':
    main()

stockholm.py

#coding:utf-8
import requests
import json
import datetime
import timeit
import time
import io
import os
import csv
import re
from pymongo import MongoClient
from multiprocessing.dummy import Pool as ThreadPool
from functools import partial


class Stockholm(object):

    def __init__(self, args):
        ## flag of if need to reload all stock data
        self.reload_data = args.reload_data
        ## flag of if need to generate portfolio
        self.gen_portfolio = args.gen_portfolio
        ## type of output file json/csv or both
        self.output_type = args.output_type
        ## charset of output file utf-8/gbk
        self.charset = args.charset
        ## portfolio testing date range(# of days)
        self.test_date_range = args.test_date_range
        ## stock data loading start date(e.g. 2014-09-14)
        self.start_date = args.start_date
        ## stock data loading end date
        self.end_date = args.end_date
        ## portfolio generating target date
        self.target_date = args.target_date
        ## thread number
        self.thread = args.thread
        ## data file store path
        if(args.store_path == 'USER_HOME/tmp/stockholm_export'):
            self.export_folder = os.path.expanduser('~') + '/tmp/stockholm_export'
        else:
            self.export_folder = args.store_path
        ## portfolio testing file path
        self.testfile_path = args.testfile_path
        ## methods for back testing
        self.methods = args.methods

        ## for getting quote symbols
        self.all_quotes_url = 'http://money.finance.sina.com.cn/d/api/openapi_proxy.php'
        ## for loading quote data
        self.yql_url = 'http://query.yahooapis.com/v1/public/yql'
        ## export file name
        self.export_file_name = 'stockholm_export'

        ## index names (kept in Chinese): SSE Composite, SZSE Component, CSI 300
        self.index_array = ['000001.SS', '399001.SZ', '000300.SS']
        self.sh000001 = {'Symbol': '000001.SS', 'Name': '上证指数'}
        self.sz399001 = {'Symbol': '399001.SZ', 'Name': '深证成指'}
        self.sh000300 = {'Symbol': '000300.SS', 'Name': '沪深300'}
        ## self.sz399005 = {'Symbol': '399005.SZ', 'Name': '中小板指'}
        ## self.sz399006 = {'Symbol': '399006.SZ', 'Name': '创业板指'}

        ## mongodb info
        self.mongo_url = 'localhost'
        self.mongo_port = 27017
        self.database_name = args.db_name
        self.collection_name = 'testing_method'

    def get_columns(self, quote):
        columns = []
        if(quote is not None):
            for key in quote.keys():
                if(key == 'Data'):
                    for data_key in quote['Data'][-1]:
                        columns.append("data." + data_key)
                else:
                    columns.append(key)
            columns.sort()
        return columns

    def get_profit_rate(self, price1, price2):
        if(price1 == 0):
            return None
        else:
            return round((price2-price1)/price1, 5)

    def get_MA(self, number_array):
        total = 0
        n = 0
        for num in number_array:
            if num is not None and num != 0:
                n += 1
                total += num
        ## avoid ZeroDivisionError when every value is None or 0
        if n == 0:
            return None
        return round(total/n, 3)

    def convert_value_check(self, exp):
        ## "day(-1).{KDJ_J}" -> "quote['Data'][target_idx-1]['KDJ_J']"
        ## (only day(0) and negative offsets are supported)
        val = exp.replace('day', 'quote[\'Data\']').replace('(0)', '(-0)')
        val = re.sub(r'\(((-)?\d+)\)', r'[target_idx\g<1>]', val)
        val = re.sub(r'\.\{((-)?\w+)\}', r"['\g<1>']", val)
        return val

    def convert_null_check(self, exp):
        p = re.compile(r'\((-)?\d+...\w+\}')
        iterator = p.finditer(exp.replace('(0)', '(-0)'))
        array = []
        for match in iterator:
            v = 'quote[\'Data\']' + match.group()
            v = re.sub(r'\(((-)?\d+)\)', r'[target_idx\g<1>]', v)
            v = re.sub(r'\.\{((-)?\w+)\}', r"['\g<1>']", v)
            v += ' is not None'
            array.append(v)
        val = ' and '.join(array)
        return val

    class KDJ():
        def _avg(self, array):
            length = len(array)
            return sum(array)/length

        def _getMA(self, values, window):
            ## seed with the mean of the first window, then (2*prev + cur)/3
            array = []
            x = window
            while x <= len(values):
                if(x-window == 0):
                    curmb = self._avg(values[x-window:x])
                else:
                    curmb = (array[-1]*2+values[x-1])/3
                array.append(round(curmb,3))
                x += 1
            return array

        def _getRSV(self, arrays):
            ## 9-day RSV = (close - lowest low) / (highest high - lowest low) * 100
            rsv = []
            x = 9
            while x <= len(arrays):
                window = arrays[x-9:x]
                high = max(d['High'] for d in window)
                low = min(d['Low'] for d in window)
                close = arrays[x-1]['Close']
                if(high == low):
                    ## flat 9-day range (e.g. consecutive limit-up days) would divide
                    ## by zero; using a neutral 50 here is an assumption
                    rsv.append(50.0)
                else:
                    rsv.append((close-low)/(high-low)*100)
                x += 1
            return rsv

        def getKDJ(self, quote_data):
            if(len(quote_data) > 12):
                rsv = self._getRSV(quote_data)
                k = self._getMA(rsv,3)
                d = self._getMA(k,3)
                j = list(map(lambda x: round(3*x[0]-2*x[1],3), zip(k[2:], d)))

                for idx, data in enumerate(quote_data[0:12]):
                    data['KDJ_K'] = None
                    data['KDJ_D'] = None
                    data['KDJ_J'] = None
                for idx, data in enumerate(quote_data[12:]):
                    data['KDJ_K'] = k[2:][idx]
                    data['KDJ_D'] = d[idx]
                    ## J is clamped to [0, 100]
                    if(j[idx] > 100):
                        data['KDJ_J'] = 100
                    elif(j[idx] < 0):
                        data['KDJ_J'] = 0
                    else:
                        data['KDJ_J'] = j[idx]
            return quote_data

    def load_all_quote_symbol(self):
        print("load_all_quote_symbol start..." + "\n")
        start = timeit.default_timer()

        all_quotes = []
        all_quotes.append(self.sh000001)
        all_quotes.append(self.sz399001)
        all_quotes.append(self.sh000300)
        ## all_quotes.append(self.sz399005)
        ## all_quotes.append(self.sz399006)

        try:
            count = 1
            while (count < 100):
                ## Sina paginates the A-share list, 500 symbols per page
                para_val = '[["hq","hs_a","",0,' + str(count) + ',500]]'
                r_params = {'__s': para_val}
                r = requests.get(self.all_quotes_url, params=r_params)
                items = r.json()[0]['items']
                if(len(items) == 0):
                    break
                for item in items:
                    quote = {}
                    code = item[0]
                    name = item[2]
                    ## convert quote code, e.g. sh600000 -> 600000.SS
                    if(code.find('sh') > -1):
                        code = code[2:] + '.SS'
                    elif(code.find('sz') > -1):
                        code = code[2:] + '.SZ'
                    ## convert quote code end
                    quote['Symbol'] = code
                    quote['Name'] = name
                    all_quotes.append(quote)
                count += 1
        except Exception as e:
            print("Error: Failed to load all stock symbol..." + "\n")
            print(e)

        print("load_all_quote_symbol end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        return all_quotes

    def load_quote_info(self, quote, is_retry):
        print("load_quote_info start..." + "\n")
        start = timeit.default_timer()

        if(quote is not None and quote['Symbol'] is not None):
            yquery = 'select * from yahoo.finance.quotes where symbol = "' + quote['Symbol'].lower() + '"'
            r_params = {'q': yquery, 'format': 'json', 'env': 'http://datatables.org/alltables.env'}
            r = requests.get(self.yql_url, params=r_params)
            ## print(r.url)
            ## print(r.text)
            rjson = r.json()
            try:
                quote_info = rjson['query']['results']['quote']
                quote['LastTradeDate'] = quote_info['LastTradeDate']
                quote['LastTradePrice'] = quote_info['LastTradePriceOnly']
                quote['PreviousClose'] = quote_info['PreviousClose']
                quote['Open'] = quote_info['Open']
                quote['DaysLow'] = quote_info['DaysLow']
                quote['DaysHigh'] = quote_info['DaysHigh']
                quote['Change'] = quote_info['Change']
                quote['ChangeinPercent'] = quote_info['ChangeinPercent']
                quote['Volume'] = quote_info['Volume']
                quote['MarketCap'] = quote_info['MarketCapitalization']
                quote['StockExchange'] = quote_info['StockExchange']
            except Exception as e:
                print("Error: Failed to load stock info... " + quote['Symbol'] + "/" + quote['Name'] + "\n")
                print(str(e) + "\n")
                if(not is_retry):
                    time.sleep(1)
                    self.load_quote_info(quote, True) ## retry once for network issue

        ## print(quote)
        print("load_quote_info end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        return quote

    def load_all_quote_info(self, all_quotes):
        print("load_all_quote_info start...")
        start = timeit.default_timer()
        for idx, quote in enumerate(all_quotes):
            print("#" + str(idx + 1))
            self.load_quote_info(quote, False)
        print("load_all_quote_info end... time cost: " + str(round(timeit.default_timer() - start)) + "s")
        return all_quotes

    def load_quote_data(self, quote, start_date, end_date, is_retry, counter):
        ## print("load_quote_data start..." + "\n")
        start = timeit.default_timer()

        if(quote is not None and quote['Symbol'] is not None):
            yquery = 'select * from yahoo.finance.historicaldata where symbol = "' + quote['Symbol'].upper() + '" and startDate = "' + start_date + '" and endDate = "' + end_date + '"'
            r_params = {'q': yquery, 'format': 'json', 'env': 'http://datatables.org/alltables.env'}
            try:
                r = requests.get(self.yql_url, params=r_params)
                ## print(r.url)
                ## print(r.text)
                rjson = r.json()
                quote_data = rjson['query']['results']['quote']
                ## Yahoo returns newest first; store oldest first
                quote_data.reverse()
                quote['Data'] = quote_data
                if(not is_retry):
                    counter.append(1)
            except Exception:
                print("Error: Failed to load stock data... " + quote['Symbol'] + "/" + quote['Name'] + "\n")
                if(not is_retry):
                    time.sleep(2)
                    self.load_quote_data(quote, start_date, end_date, True, counter) ## retry once for network issue

            print("load_quote_data " + quote['Symbol'] + "/" + quote['Name'] + " end..." + "\n")
            ## print("time cost: " + str(round(timeit.default_timer() - start)) + "s." + "\n")
            ## print("total count: " + str(len(counter)) + "\n")
        return quote

    def load_all_quote_data(self, all_quotes, start_date, end_date):
        print("load_all_quote_data start..." + "\n")
        start = timeit.default_timer()

        counter = []
        mapfunc = partial(self.load_quote_data, start_date=start_date, end_date=end_date, is_retry=False, counter=counter)
        pool = ThreadPool(self.thread)
        pool.map(mapfunc, all_quotes) ## multi-threads executing
        pool.close()
        pool.join()

        print("load_all_quote_data end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        return all_quotes

    def data_process(self, all_quotes):
        print("data_process start..." + "\n")
        kdj = self.KDJ()
        start = timeit.default_timer()

        for quote in all_quotes:
            ## board type (kept in Chinese): ChiNext / SME board / main board
            if(quote['Symbol'].startswith('300')):
                quote['Type'] = '创业板'
            elif(quote['Symbol'].startswith('002')):
                quote['Type'] = '中小板'
            else:
                quote['Type'] = '主板'

            if('Data' in quote):
                try:
                    temp_data = []
                    for quote_data in quote['Data']:
                        ## drop rows whose Volume is '000' (presumably suspended days); index rows are always kept
                        if(quote_data['Volume'] != '000' or quote_data['Symbol'] in self.index_array):
                            d = {}
                            d['Open'] = float(quote_data['Open'])
                            ## d['Adj_Close'] = float(quote_data['Adj_Close'])
                            d['Close'] = float(quote_data['Close'])
                            d['High'] = float(quote_data['High'])
                            d['Low'] = float(quote_data['Low'])
                            d['Volume'] = int(quote_data['Volume'])
                            d['Date'] = quote_data['Date']
                            temp_data.append(d)
                    quote['Data'] = temp_data
                except KeyError as e:
                    print("Data Process: Key Error")
                    print(e)
                    print(quote)

        ## calculate Change / 5 10 20 30 Day MA
        for quote in all_quotes:
            if('Data' in quote):
                try:
                    for i, quote_data in enumerate(quote['Data']):
                        if(i > 0):
                            quote_data['Change'] = self.get_profit_rate(quote['Data'][i-1]['Close'], quote_data['Close'])
                            quote_data['Vol_Change'] = self.get_profit_rate(quote['Data'][i-1]['Volume'], quote_data['Volume'])
                        else:
                            quote_data['Change'] = None
                            quote_data['Vol_Change'] = None

                    last_5_array = []
                    last_10_array = []
                    last_20_array = []
                    last_30_array = []
                    for i, quote_data in enumerate(quote['Data']):
                        last_5_array.append(quote_data['Close'])
                        last_10_array.append(quote_data['Close'])
                        last_20_array.append(quote_data['Close'])
                        last_30_array.append(quote_data['Close'])
                        quote_data['MA_5'] = None
                        quote_data['MA_10'] = None
                        quote_data['MA_20'] = None
                        quote_data['MA_30'] = None

                        ## trim each window to exactly N closes before averaging
                        if(i < 4):
                            continue
                        if(len(last_5_array) > 5):
                            last_5_array.pop(0)
                        quote_data['MA_5'] = self.get_MA(last_5_array)

                        if(i < 9):
                            continue
                        if(len(last_10_array) > 10):
                            last_10_array.pop(0)
                        quote_data['MA_10'] = self.get_MA(last_10_array)

                        if(i < 19):
                            continue
                        if(len(last_20_array) > 20):
                            last_20_array.pop(0)
                        quote_data['MA_20'] = self.get_MA(last_20_array)

                        if(i < 29):
                            continue
                        if(len(last_30_array) > 30):
                            last_30_array.pop(0)
                        quote_data['MA_30'] = self.get_MA(last_30_array)
                except KeyError as e:
                    print("Key Error")
                    print(e)
                    print(quote)

        ## calculate KDJ
        for quote in all_quotes:
            if('Data' in quote):
                try:
                    kdj.getKDJ(quote['Data'])
                except KeyError as e:
                    print("Key Error")
                    print(e)
                    print(quote)

        print("data_process end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")

    def data_export(self, all_quotes, export_type_array, file_name):
        start = timeit.default_timer()
        directory = self.export_folder
        if(file_name is None):
            file_name = self.export_file_name
        if not os.path.exists(directory):
            os.makedirs(directory)

        if(all_quotes is None or len(all_quotes) == 0):
            print("no data to export...\n")

        if('json' in export_type_array):
            print("start export to JSON file...\n")
            with io.open(directory + '/' + file_name + '.json', 'w', encoding=self.charset) as f:
                json.dump(all_quotes, f, ensure_ascii=False)

        if('csv' in export_type_array):
            print("start export to CSV file...\n")
            columns = []
            if(all_quotes is not None and len(all_quotes) > 0):
                columns = self.get_columns(all_quotes[0])
            ## newline='' keeps the csv module from writing blank lines on Windows
            with open(directory + '/' + file_name + '.csv', 'w', encoding=self.charset, newline='') as csv_file:
                writer = csv.writer(csv_file)
                writer.writerow(columns)
                for quote in (all_quotes or []):
                    if('Data' in quote):
                        for quote_data in quote['Data']:
                            try:
                                ## missing fields become empty cells so columns stay aligned
                                line = []
                                for column in columns:
                                    if(column.find('data.') > -1):
                                        line.append(quote_data.get(column[5:], ''))
                                    else:
                                        line.append(quote.get(column, ''))
                                writer.writerow(line)
                            except Exception as e:
                                print(e)
                                print("write csv error: " + str(quote))

        if('mongo' in export_type_array):
            print("start export to MongoDB...\n")

        print("export is complete... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")

    def file_data_load(self):
        print("file_data_load start..." + "\n")
        start = timeit.default_timer()
        directory = self.export_folder
        file_name = self.export_file_name

        ## portfolio testing always reads the JSON export, in the export charset
        with io.open(directory + '/' + file_name + '.json', 'r', encoding=self.charset) as f:
            all_quotes_data = json.load(f)

        print("file_data_load end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        return all_quotes_data

    def check_date(self, all_quotes, date):
        ## a date is valid (a trading day) if any index has data for it
        is_date_valid = False
        for quote in all_quotes:
            if(quote['Symbol'] in self.index_array):
                for quote_data in quote.get('Data', []):
                    if(quote_data['Date'] == date):
                        is_date_valid = True
        if not is_date_valid:
            print(date + " is not valid...\n")
        return is_date_valid

    def quote_pick(self, all_quotes, target_date, methods):
        print("quote_pick start..." + "\n")
        start = timeit.default_timer()

        results = []
        data_issue_count = 0

        for quote in all_quotes:
            try:
                if(quote['Symbol'] in self.index_array):
                    results.append(quote)
                    continue

                target_idx = None
                for idx, quote_data in enumerate(quote['Data']):
                    if(quote_data['Date'] == target_date):
                        target_idx = idx
                if(target_idx is None):
                    ## print(quote['Name'] + " data is not available at this date..." + "\n")
                    data_issue_count+=1
                    continue

                ## pick logic ##
                ## eval() sees the local names `quote` and `target_idx`;
                ## the first matching method wins
                valid = False
                for method in methods:
                    ## print(method['name'])
                    ## null_check = eval(method['null_check'])
                    try:
                        value_check = eval(method['value_check'])
                        if(value_check):
                            quote['Method'] = method['name']
                            results.append(quote)
                            valid = True
                            break
                    except Exception:
                        valid = False
                if(valid):
                    continue
                ## pick logic end ##

            except KeyError as e:
                ## print("KeyError: " + quote['Name'] + " data is not available..." + "\n")
                data_issue_count+=1

        print("quote_pick end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        print(str(data_issue_count) + " quotes of data is not available...\n")
        return results

    def profit_test(self, selected_quotes, target_date):
        print("profit_test start..." + "\n")
        start = timeit.default_timer()

        results = []
        INDEX = None
        INDEX_idx = 0

        ## locate the CSI 300 benchmark and its row for the target date
        for quote in selected_quotes:
            if(quote['Symbol'] == self.sh000300['Symbol']):
                INDEX = quote
                for idx, quote_data in enumerate(quote['Data']):
                    if(quote_data['Date'] == target_date):
                        INDEX_idx = idx
                        break

        for quote in selected_quotes:
            target_idx = None
            if(quote['Symbol'] in self.index_array):
                continue

            for idx, quote_data in enumerate(quote['Data']):
                if(quote_data['Date'] == target_date):
                    target_idx = idx
            if(target_idx is None):
                print(quote['Name'] + " data is not available for testing..." + "\n")
                continue

            test = {}
            test['Name'] = quote['Name']
            test['Symbol'] = quote['Symbol']
            test['Method'] = quote['Method']
            test['Type'] = quote['Type']
            if('KDJ_K' in quote['Data'][target_idx]):
                test['KDJ_K'] = quote['Data'][target_idx]['KDJ_K']
                test['KDJ_D'] = quote['Data'][target_idx]['KDJ_D']
                test['KDJ_J'] = quote['Data'][target_idx]['KDJ_J']
            test['Close'] = quote['Data'][target_idx]['Close']
            test['Change'] = quote['Data'][target_idx]['Change']
            test['Vol_Change'] = quote['Data'][target_idx]['Vol_Change']
            test['MA_5'] = quote['Data'][target_idx]['MA_5']
            test['MA_10'] = quote['Data'][target_idx]['MA_10']
            test['MA_20'] = quote['Data'][target_idx]['MA_20']
            test['MA_30'] = quote['Data'][target_idx]['MA_30']
            test['Data'] = [{}]

            ## returns 1..10 trading days after the target date, vs. CSI 300
            for i in range(1,11):
                if(target_idx+i >= len(quote['Data'])):
                    print(quote['Name'] + " data is not available for " + str(i) + " day testing..." + "\n")
                    break
                day2day_profit = self.get_profit_rate(quote['Data'][target_idx]['Close'], quote['Data'][target_idx+i]['Close'])
                test['Data'][0]['Day_' + str(i) + '_Profit'] = day2day_profit
                if(INDEX is not None and INDEX_idx+i < len(INDEX['Data'])):
                    day2day_INDEX_change = self.get_profit_rate(INDEX['Data'][INDEX_idx]['Close'], INDEX['Data'][INDEX_idx+i]['Close'])
                    test['Data'][0]['Day_' + str(i) + '_INDEX_Change'] = day2day_INDEX_change
                    test['Data'][0]['Day_' + str(i) + '_Differ'] = day2day_profit-day2day_INDEX_change

            results.append(test)

        print("profit_test end... time cost: " + str(round(timeit.default_timer() - start)) + "s" + "\n")
        return results

    def data_load(self, start_date, end_date, output_types):
        all_quotes = self.load_all_quote_symbol()
        print("total " + str(len(all_quotes)) + " quotes are loaded..." + "\n")
        ## self.load_all_quote_info(all_quotes)
        self.load_all_quote_data(all_quotes, start_date, end_date)
        self.data_process(all_quotes)
        self.data_export(all_quotes, output_types, None)

    def data_test(self, target_date, test_range, output_types):
        ## loading test methods
        methods = []
        path = self.testfile_path

        ## from mongodb
        if(path == 'mongodb'):
            print("Load testing methods from Mongodb...\n")
            client = MongoClient(self.mongo_url, self.mongo_port)
            db = client[self.database_name]
            col = db[self.collection_name]
            q = None
            if(len(self.methods) > 0):
                applied_methods = list(map(int, self.methods.split(',')))
                q = {"method_id": {"$in": applied_methods}}
            for doc in col.find(q, ['name','desc','method']):
                print(doc)
                m = {'name': doc['name'], 'value_check': self.convert_value_check(doc['method'])}
                methods.append(m)

        ## from test file
        else:
            if not os.path.exists(path):
                print("Portfolio test file does not exist, testing is aborted...\n")
                return
            with io.open(path, 'r', encoding='utf-8') as f:
                for line in f:
                    if(line.startswith('##') or len(line.strip()) == 0):
                        continue
                    line = line.strip().strip('\n')
                    name = line[line.find('[')+1:line.find(']:')]
                    value = line[line.find(']:')+2:]
                    m = {'name': name, 'value_check': self.convert_value_check(value)}
                    methods.append(m)

        if(len(methods) == 0):
            print("No method is loaded, testing is aborted...\n")
            return

        ## portfolio testing
        all_quotes = self.file_data_load()
        target_date_time = datetime.datetime.strptime(target_date, "%Y-%m-%d")
        for i in range(test_range):
            date = (target_date_time - datetime.timedelta(days=i)).strftime("%Y-%m-%d")
            is_date_valid = self.check_date(all_quotes, date)
            if is_date_valid:
                selected_quotes = self.quote_pick(all_quotes, date, methods)
                res = self.profit_test(selected_quotes, date)
                self.data_export(res, output_types, 'result_' + date)

    def run(self):
        ## output types
        output_types = []
        if(self.output_type == "json"):
            output_types.append("json")
        elif(self.output_type == "csv"):
            output_types.append("csv")
        elif(self.output_type == "all"):
            output_types = ["json", "csv"]

        ## loading stock data
        if(self.reload_data == 'Y'):
            print("Start loading stock data...\n")
            self.data_load(self.start_date, self.end_date, output_types)

        ## test & generate portfolio
        if(self.gen_portfolio == 'Y'):
            print("Start portfolio testing...\n")
            self.data_test(self.target_date, self.test_date_range, output_types)
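The MA_5/MA_10/MA_20/MA_30 fields computed in data_process are simple moving averages of the closing price. Below is a minimal standalone sketch of the same idea, assuming a plain list of closes. `rolling_ma` is a hypothetical helper, not part of the project. It uses `collections.deque(maxlen=n)` so that, once full, the window always holds exactly n closes. Unlike get_MA, it does not skip None or zero values.

```python
from collections import deque
from typing import List, Optional


def rolling_ma(closes: List[float], window: int) -> List[Optional[float]]:
    """Return an n-day simple moving average aligned with `closes`.

    Entries before the window is full are None, like MA_n for the
    first n-1 rows of each stock.
    """
    buf = deque(maxlen=window)  # appending past maxlen drops the oldest close
    result = []
    for close in closes:
        buf.append(close)
        if len(buf) < window:
            result.append(None)
        else:
            result.append(round(sum(buf) / window, 3))
    return result


if __name__ == "__main__":
    print(rolling_ma([10.0, 10.5, 11.0, 10.8, 11.2, 11.6], 5))
```

With a bounded deque, no manual pop bookkeeping is needed, which is the part of a hand-rolled window that is easiest to get off by one.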

mongo_scripts.txt

use stockholm

db.counters.insert({_id: "method_id", seq: 0})

function getNextSequence(name) {
    var ret = db.counters.findAndModify({
        query: { _id: name },
        update: { $inc: { seq: 1 } },
        new: true
    });
    return ret.seq;
}

db.testing_method.insert({"method_id": getNextSequence("method_id"), "name":"Test method 1", "desc":"This is a test method.", "user_name":"Stockholm", "user_id":"dtnium@gmail.com", "creation_date": new Date(), "modification_date": new Date(), "method":"day(-2).{KDJ_J}<20 and day(-1).{KDJ_J}<20 and day(0).{KDJ_J}-day(-1).{KDJ_J}>=40 and day(0).{Vol_Change}>=1 and day(0).{MA_10}*1.05>day(0).{Close}"})

db.testing_method.insert({"method_id": getNextSequence("method_id"), "name":"Test method 2", "desc":"This is a test method.", "user_name":"Stockholm", "user_id":"dtnium@gmail.com", "creation_date": new Date(), "modification_date": new Date(), "method":"day(-2).{KDJ_J}-day(-1).{KDJ_J}>20 and day(0).{KDJ_J}-day(-1).{KDJ_J}>20 and day(-1).{KDJ_J}<50 and day(0).{Vol_Change}<=1"})

portfolio_test.txt

## Portfolio selection methodology sample file
[Test method 1]:day(-2).{KDJ_J}<20 and day(-1).{KDJ_J}<20 and day(0).{KDJ_J}-day(-1).{KDJ_J}>=40 and day(0).{Vol_Change}>=1 and day(0).{MA_10}*1.05>day(0).{Close}
[Test method 2]:day(-2).{KDJ_J}-day(-1).{KDJ_J}>20 and day(0).{KDJ_J}-day(-1).{KDJ_J}>20 and day(-1).{KDJ_J}<50 and day(0).{Vol_Change}<=1
##[Test method 3]:50<day(-1).{KDJ_J}<80 and day(-2).{KDJ_J}<day(-1).{KDJ_J} and day(0).{KDJ_J}<day(-1).{KDJ_J}
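Each strategy line has the form `[name]:expression`. In the expression, `day(n).{FIELD}` means the value of FIELD n trading days relative to the target date, where n is 0 or negative. stockholm.py rewrites these expressions with regular expressions and evaluates them with `eval()`.

The sketch below does the same translation, assuming quote rows are dicts like those in the JSON export. `parse_method_line`, `to_python_expr` and `matches` are hypothetical names. Two details are additions not taken from the original code:

* A look-back check returns False when there is not enough history before the target date.
* The translated string is evaluated against an explicit namespace.

Like the original, it still relies on `eval()`, so only run strategy files you trust.

```python
import re
from typing import Dict, List, Tuple


def parse_method_line(line: str) -> Tuple[str, str]:
    """Split '[name]:expression' into (name, expression)."""
    line = line.strip()
    name = line[line.find('[') + 1:line.find(']:')]
    expr = line[line.find(']:') + 2:]
    return name, expr


def to_python_expr(expr: str) -> str:
    """Rewrite day(-n).{FIELD} as rows[target_idx-n]['FIELD']."""
    # day(0) -> day(-0) so every offset carries an explicit sign
    expr = expr.replace('day(0)', 'day(-0)')
    expr = re.sub(r'day\((-\d+)\)', r'rows[target_idx\1]', expr)
    return re.sub(r'\.\{(\w+)\}', r"['\1']", expr)


def matches(expr: str, rows: List[Dict[str, float]], target_idx: int) -> bool:
    """Evaluate a strategy on rows[target_idx]; missing/None data means no match."""
    offsets = [int(n) for n in re.findall(r'day\((-?\d+)\)', expr)]
    if offsets and target_idx + min(offsets) < 0:
        return False  # not enough history before the target date
    code = to_python_expr(expr)
    try:
        return bool(eval(code, {}, {'rows': rows, 'target_idx': target_idx}))
    except (KeyError, IndexError, TypeError):
        return False


if __name__ == "__main__":
    line = ("[Test method 1]:day(-2).{KDJ_J}<20 and day(-1).{KDJ_J}<20 and "
            "day(0).{KDJ_J}-day(-1).{KDJ_J}>=40 and day(0).{Vol_Change}>=1 and "
            "day(0).{MA_10}*1.05>day(0).{Close}")
    name, expr = parse_method_line(line)
    rows = [
        {'KDJ_J': 15.0, 'Vol_Change': 0.1, 'MA_10': 12.0, 'Close': 11.5},
        {'KDJ_J': 10.0, 'Vol_Change': -0.2, 'MA_10': 12.0, 'Close': 11.4},
        {'KDJ_J': 55.0, 'Vol_Change': 1.2, 'MA_10': 12.0, 'Close': 12.0},
    ]
    print(name, matches(expr, rows, 2))
```

In this example the last row satisfies every clause of Test method 1:

* J was 15 and then 10, both below 20.
* J rose by 45, at least 40.
* Volume more than doubled (Vol_Change 1.2 >= 1).
* The close of 12.0 is below 1.05 × MA_10 = 12.6.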

Runtime arguments (example)

--storepath c://test --output csv   --startdate 2015-09-01 --enddate 2015-12-07 --charset utf-8 --testfile ./portfolio_test.txt --reload Y --portfolio Y --thread 10

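What it can do

If you want to work with Shanghai and Shenzhen market data, it can export quote data and several technical indicators for every Shanghai and Shenzhen A-share within a given date range. This includes symbol, name, open, close, high, low, volume, moving averages, KDJ, and more.

Known issues

Quote data currently comes from Yahoo YQL, and the time at which each day's data is updated is not very stable (usually around midnight China time).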

Environment

Python 3.4 or later

pip install requests
pip install pymongo

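Usage

python main.py [-h] [--reload {Y,N}] [--portfolio {Y,N}] [--output {json,csv,all}] [--charset {utf-8,gbk}] [--storepath PATH] [--thread NUM] [--startdate yyyy-MM-dd] [--enddate yyyy-MM-dd] [--targetdate yyyy-MM-dd] [--testrange NUM] [--testfile PATH] [--dbname NAME] [--methods IDS]

Optional arguments

  -h, --help                  show this help message and exit
  --reload {Y,N}              whether to re-crawl the stock data, default: Y
  --portfolio {Y,N}           whether to generate stock-picking test results, default: N
  --output {json,csv,all}     output file format, default: json
  --charset {utf-8,gbk}       output file encoding, default: utf-8
  --storepath PATH            output folder, default: ~/tmp/stockholm_export
  --thread NUM                number of threads, default: 10
  --startdate yyyy-MM-dd      start date for crawling data, default: current system date minus 90 days (e.g. 2015-01-01)
  --enddate yyyy-MM-dd        end date for crawling data, default: current system date
  --targetdate yyyy-MM-dd     target date for testing the stock-picking strategy, default: current system date
  --testrange NUM             number of days in the test date range, default: 60
  --testfile PATH             strategy test file path ("mongodb" loads methods from MongoDB instead), default: ./portfolio_test.txt
  --dbname NAME               MongoDB database name, default: stockholm
  --methods IDS               comma-separated MongoDB method_id values to back-test, default: empty (all methods)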
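The {Y,N}, {json,csv,all} and {utf-8,gbk} value sets in this list are not enforced by option.py, which declares these options as plain `type=str`. An invalid value is silently accepted: `--reload yes` simply skips the reload step, and `--output xml` exports nothing.

Below is a minimal sketch of how argparse's `choices` parameter could enforce the sets. It is trimmed to three options. `build_parser` is a hypothetical name, and the dest names and defaults are copied from option.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical variant of option.py's parser that rejects unknown values."""
    parser = argparse.ArgumentParser(
        description='A stock crawler and portfolio testing framework.')
    parser.add_argument('--reload', dest='reload_data', default='Y',
                        choices=['Y', 'N'],
                        help='Reload the stock data or not, Default: %(default)s')
    parser.add_argument('--portfolio', dest='gen_portfolio', default='N',
                        choices=['Y', 'N'],
                        help='Generate the portfolio or not, Default: %(default)s')
    parser.add_argument('--output', dest='output_type', default='json',
                        choices=['json', 'csv', 'all'],
                        help='Data output type, Default: %(default)s')
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(['--reload=N', '--portfolio=Y'])
    print(args.reload_data, args.gen_portfolio, args.output_type)  # N Y json
```

With `choices`, a bad value makes argparse print an error listing the allowed values and exit with status 2, instead of continuing with a value the rest of the program ignores.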

Available data / formats

Quote data:

[
  {
    "Symbol": "600000.SS",
    "Name": "浦发银行",
    "Data": [
      {"Vol_Change": null, "MA_10": null, "Date": "2015-03-26", "High": 15.58, "Open": 15.15, "Volume": 282340700, "Close": 15.36, "Change": null, "Low": 15.04},
      {"Vol_Change": -0.22726, "MA_10": null, "Date": "2015-03-27", "High": 15.55, "Open": 15.32, "Volume": 218174900, "Close": 15.36, "Change": 0.0, "Low": 15.17}
    ]
  }
]

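Per-stock fields: Symbol; Name; Type (board: 创业板 ChiNext, 中小板 SME board, 主板 main board).

Per-day fields (in Data):

  • Date: trading date
  • Open, Close, High, Low: opening price, closing price, intraday high, intraday low
  • Volume: trading volume
  • Change: price change vs. the previous close, as a fraction (0.05 = 5%)
  • Vol_Change: volume change vs. the previous day, as a fraction
  • MA_5, MA_10, MA_20, MA_30: 5-, 10-, 20- and 30-day moving averages of the close
  • KDJ_K, KDJ_D, KDJ_J: KDJ indicator K, D and J values; J is clamped to 0-100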

Stock-picking strategy test data:

[
  {
    "Symbol": "600000.SS",
    "Name": "浦发银行",
    "Close": 14.51,
    "Change": 0.06456,
    "Vol_Change": 2.39592,
    "MA_10": 14.171,
    "KDJ_K": 37.65,
    "KDJ_D": 33.427,
    "KDJ_J": 46.096,
    "Data": [
      {
        "Day_5_Differ": 0.01869,
        "Day_9_Profit": 0.08546,
        "Day_1_Profit": -0.02826,
        "Day_1_INDEX_Change": -0.00484,
        "Day_3_INDEX_Change": 0.01557,
        "Day_5_INDEX_Change": 0.04747,
        "Day_3_Differ": 0.02647,
        "Day_9_INDEX_Change": 0.1003,
        "Day_5_Profit": 0.06616,
        "Day_3_Profit": 0.04204,
        "Day_1_Differ": -0.02342,
        "Day_9_Differ": -0.014840000000000006
      }
    ]
  }
]

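All rates are fractions (0.05 = 5%):

  • Close: closing price on the target date
  • Change: price change vs. the previous close
  • Vol_Change: volume change vs. the previous day
  • MA_10: 10-day moving average
  • KDJ_K, KDJ_D, KDJ_J: KDJ indicator K, D and J values
  • Day_1_Profit: return 1 trading day after the target date
  • Day_1_INDEX_Change: CSI 300 change over the same day
  • Day_1_Differ: relative return, i.e. Day_1_Profit minus Day_1_INDEX_Change
  • Day_n_Profit, Day_n_INDEX_Change, Day_n_Differ: the same measures after n trading days (n = 1 to 10)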
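Each Day_n value is a simple return measured n trading days (rows) after the target date, rounded to 5 decimals as in `get_profit_rate`. Day_n_Differ is the stock's return minus the CSI 300 return over the same span, left unrounded as in the sample above.

Below is a minimal sketch under those assumptions. `profit_rate` and `day_n_metrics` are hypothetical names, and both close lists are assumed to start at the target date.

```python
from typing import Dict, List, Optional


def profit_rate(price1: float, price2: float) -> Optional[float]:
    """(price2 - price1) / price1 rounded to 5 decimals; None if price1 is 0."""
    if price1 == 0:
        return None
    return round((price2 - price1) / price1, 5)


def day_n_metrics(stock_closes: List[float], index_closes: List[float],
                  max_days: int = 10) -> Dict[str, Optional[float]]:
    """Closes start at the target date; element 0 is the buy-in close."""
    out = {}
    for n in range(1, max_days + 1):
        if n >= len(stock_closes):
            break  # not enough stock data after the target date
        profit = profit_rate(stock_closes[0], stock_closes[n])
        out['Day_%d_Profit' % n] = profit
        if n < len(index_closes):
            change = profit_rate(index_closes[0], index_closes[n])
            out['Day_%d_INDEX_Change' % n] = change
            if profit is not None and change is not None:
                out['Day_%d_Differ' % n] = profit - change
    return out


if __name__ == "__main__":
    print(day_n_metrics([10.0, 10.5, 9.5], [100.0, 101.0, 99.0]))
```

A positive Day_n_Differ means the pick beat the CSI 300 over that horizon.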

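Quote data crawling example

Fetch quote data for all Shanghai and Shenzhen stocks for the 90 calendar days (not 90 trading days) before the current date.

When the run finishes, the data is saved to ~/tmp/stockholm_export/stockholm_export.json (under the current user's home folder).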

python main.py

To export a CSV file instead:

python main.py --output=csv
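The JSON export is a single array of quote objects and can be read back with the standard json module. Below is a small sketch, assuming the default file name and a UTF-8 export. `load_export` and `latest_closes` are hypothetical helpers; the second one returns the most recent close per symbol. The Data list is in ascending date order because load_quote_data reverses Yahoo's newest-first list.

```python
import json
import os
from typing import Dict, List


def load_export(path: str, encoding: str = 'utf-8') -> List[dict]:
    """Read the JSON array written by data_export."""
    with open(path, encoding=encoding) as f:
        return json.load(f)


def latest_closes(quotes: List[dict]) -> Dict[str, float]:
    """Map Symbol -> last available Close; quotes without data are skipped."""
    result = {}
    for quote in quotes:
        data = quote.get('Data') or []
        if data:
            result[quote['Symbol']] = data[-1]['Close']
    return result


if __name__ == "__main__":
    path = os.path.expanduser('~/tmp/stockholm_export/stockholm_export.json')
    if os.path.exists(path):
        for symbol, close in sorted(latest_closes(load_export(path)).items())[:10]:
            print(symbol, close)
```

If the data was exported with `--charset gbk`, pass `encoding='gbk'` to `load_export`.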

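Stock-picking strategy test example

The sample strategy file looks like this (it is included with the source code).

Strategy "method 1" picks a stock when all of these hold:

  • The KDJ J value two trading days ago is below 20.
  • The J value on the previous trading day is below 20.
  • The J value on the current trading day is at least 40 higher than on the previous trading day.
  • The current day's volume change is at least 100% (volume at least doubled).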

## Portfolio selection methodology sample file
[method 1]:day(-2).{KDJ_J}<20 and day(-1).{KDJ_J}<20 and day(0).{KDJ_J}-day(-1).{KDJ_J}>=40 and day(0).{Vol_Change}>=1

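Run the strategy test with the current system date as the target date, going back 60 calendar days.

The command below runs the test without re-crawling the quote data.

When it finishes, the test results are saved to ~/tmp/stockholm_export/, one file per day.

File names follow the pattern result_yyyy-MM-dd.json (for example result_2015-03-24.json).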

python main.py --reload=N --portfolio=Y
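Each result file is a JSON array of per-stock records, each with a single-element Data list, so the daily files can be combined to see how each method did on average. Below is a minimal sketch, assuming JSON output with the default UTF-8 charset in the default folder. `load_results` and `average_differ` are hypothetical helpers; the second one averages Day_n_Differ over every record that has that field.

```python
import glob
import json
import os
from collections import defaultdict
from typing import Dict, Iterable, List


def load_results(folder: str) -> List[dict]:
    """Concatenate the records of every result_yyyy-MM-dd.json in `folder`."""
    records = []
    for path in sorted(glob.glob(os.path.join(folder, 'result_*.json'))):
        with open(path, encoding='utf-8') as f:
            records.extend(json.load(f))
    return records


def average_differ(records: Iterable[dict], day: int = 5) -> Dict[str, float]:
    """Mean Day_<day>_Differ per Method, skipping records without that field."""
    key = 'Day_%d_Differ' % day
    buckets = defaultdict(list)
    for rec in records:
        data = rec.get('Data') or [{}]
        value = data[0].get(key)
        if value is not None:
            buckets[rec.get('Method', '?')].append(value)
    return {method: sum(v) / len(v) for method, v in buckets.items()}


if __name__ == "__main__":
    folder = os.path.expanduser('~/tmp/stockholm_export')
    for method, avg in average_differ(load_results(folder)).items():
        print(method, round(avg, 5))
```

Records near the end of the data range may lack the longer horizons, which is why missing fields are skipped rather than counted as zero.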

By editing the strategy formulas in the test file, you can test how any strategy would have picked stocks over the chosen date range.
