Table of Contents

  • Preface:
    • 1. Import required packages
    • 2. Data preprocessing
    • 3. Build the model
    • 4. Model training
    • 5. Inspect the results
    • 6. Other packages commonly used at work

Preface:

In quantitative investing, it is more common to work with returns than with raw stock prices. Returns better reflect the performance of an investment, and they serve as a key metric when evaluating individual stocks, portfolios, and strategies alike.

A return is the percentage gain or loss an investment produces over a given period. By computing returns, investors can compare the performance of different assets or strategies and judge their relative merits.

For stocks, returns are computed from changes in the share price. Common definitions include the simple return and the log return.

The simple return can be computed as:

simple return = (ending price - starting price) / starting price

The log return can be computed as:

log return = ln(ending price / starting price)

Returns let investors assess profitability, risk, and excess performance relative to the market or a benchmark, which in turn supports strategy design, risk management, and decision making.

Note that a stock price is only a snapshot at a single point in time, whereas a return incorporates both the price change and the time dimension, so it reflects the effect of an investment more fully. This is why computing returns is the standard and more meaningful practice in quantitative investing.
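As a quick illustration of the two formulas, here is a minimal sketch using pandas and a hypothetical price series (independent of the tushare data used later):

```python
import numpy as np
import pandas as pd

# A hypothetical daily closing-price series
prices = pd.Series([10.0, 10.5, 10.29, 10.8])

# Simple return per period: (P_t - P_{t-1}) / P_{t-1}
simple_ret = prices.pct_change()

# Log return per period: ln(P_t / P_{t-1})
log_ret = np.log(prices / prices.shift(1))

# Whole-period returns, straight from the formulas above
total_simple = (prices.iloc[-1] - prices.iloc[0]) / prices.iloc[0]
total_log = np.log(prices.iloc[-1] / prices.iloc[0])

# Log returns are additive across periods: their sum equals the total log return
assert np.isclose(log_ret.sum(), total_log)
print(round(total_simple, 4), round(total_log, 4))  # → 0.08 0.077
```

The additivity shown in the assertion is one practical reason log returns are popular in quantitative work: multi-period returns can be aggregated by simple summation.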

1. Import required packages

# Import required libraries and modules
from tscv import gap_train_test_split
from catboost import CatBoostRegressor
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import warnings
import tushare as ts
warnings.filterwarnings('ignore')
from torch.optim import Adam, SGD
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, ParameterGrid
from torch.utils.data import DataLoader, TensorDataset
from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras.layers import Dense, Dropout, LSTM
from keras.wrappers.scikit_learn import KerasRegressor
from keras import backend
!pip install chinese_calendar
from chinese_calendar import is_holiday
'''
!pip install scikit-learn
!pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
'''

start_date = '20040101'
end_date = '20230421'
# Initialize the tushare pro API
pro = ts.pro_api('mycode')
# Get daily trading data for the stock
train_df = pro.daily(ts_code='000001.SZ', start_date=start_date, end_date=end_date)
# Get PE ratio data for the stock
pe_data = pro.query('daily_basic', ts_code='000001.SZ', start_date=start_date, end_date=end_date, fields='ts_code,trade_date,pe')
# Get turnover rate data for the stock
turnover_data = pro.query('daily_basic', ts_code='000001.SZ', start_date=start_date, end_date=end_date, fields='ts_code,trade_date,turnover_rate')
# Merge the PE ratio and turnover rate data into train_df
train_df = pd.merge(train_df, pe_data, on=['ts_code', 'trade_date'], how='left')
train_df = pd.merge(train_df, turnover_data, on=['ts_code', 'trade_date'], how='left')
display(train_df)

#####################################################################################
# Build a continuous daily calendar and left-join the trading data onto it,
# so every calendar day gets a row (non-trading days are left as NaN)
start_date = pd.to_datetime(start_date)
end_date = pd.to_datetime(end_date)
dates = pd.date_range(start=start_date, end=end_date, freq='D')
A = pd.DataFrame({'Date': dates})
B_data = train_df
B_columns = ['ts_code', 'trade_date', 'open', 'high', 'low', 'close', 'pre_close', 'change', 'pct_chg', 'vol', 'amount', 'pe', 'turnover_rate']
B = pd.DataFrame(B_data, columns=B_columns)
B['trade_date'] = pd.to_datetime(B['trade_date'], format='%Y%m%d')
merged = A.merge(B, how='left', left_on='Date', right_on='trade_date')
merged_with_holidays = merged

# Add a holiday flag column (chinese_calendar.is_holiday returns a boolean)
merged_with_holidays['HolidayName'] = merged_with_holidays['Date'].apply(lambda x: is_holiday(x))
# Optionally map the boolean to a label or to 0/1:
# merged_with_holidays['HolidayName'] = merged_with_holidays['HolidayName'].map({True: 1, False: 0})
merged_with_holidays['Year'] = pd.to_datetime(merged_with_holidays['Date']).dt.year
merged_with_holidays['Month'] = pd.to_datetime(merged_with_holidays['Date']).dt.month
merged_with_holidays['Day'] = pd.to_datetime(merged_with_holidays['Date']).dt.day
original_data = merged_with_holidays[['Date', 'Year', 'Month', 'Day', 'open', 'high', 'low', 'close', 'pre_close', 'change', 'pct_chg', 'vol', 'amount', 'pe', 'turnover_rate', 'HolidayName']].copy()

2. Data preprocessing

# For each year, add lagged features from each of the previous five years.
# The shift distance is the number of calendar rows covered by the
# intervening years; because the data sits on a continuous daily grid,
# shifting by that many rows aligns each date with the same date i years ago.
for year_num in original_data['Year'].unique():
    for i in range(1, 6):
        year = year_num - i
        prefix = f'ya{i:02}'
        # Total row count of the i calendar years ending at `year`
        # (equivalent to the original if/elif chain for i = 1..5)
        lenyear = sum(len(original_data[original_data['Year'] == year - k]) for k in range(i))
        # Shift every market column back by that many rows
        for col in ['open', 'high', 'low', 'close', 'pre_close', 'change',
                    'pct_chg', 'vol', 'amount', 'pe', 'turnover_rate', 'HolidayName']:
            original_data[prefix + '_' + col] = original_data[col].shift(lenyear)
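The lag construction above leans on `Series.shift(n)`, where `n` is the row count of the intervening calendar years: since the calendar grid has exactly one row per day, shifting by a whole year's length lines each date up with the same calendar date a year earlier. A minimal toy sketch of the idea (hypothetical two-year daily grid, not the tushare data):

```python
import pandas as pd

# Hypothetical continuous daily grid spanning two years
dates = pd.date_range('2021-01-01', '2022-12-31', freq='D')
df = pd.DataFrame({'Date': dates, 'close': range(len(dates))})
df['Year'] = df['Date'].dt.year

# Shift by the number of rows in the previous year to build a one-year lag
len_prev_year = len(df[df['Year'] == 2021])  # 365 rows (2021 is not a leap year)
df['ya01_close'] = df['close'].shift(len_prev_year)

# Each 2022 row now carries the value from the same calendar date in 2021
row = df[df['Date'] == '2022-03-15'].iloc[0]
prev = df[df['Date'] == '2021-03-15'].iloc[0]
print(row['ya01_close'] == prev['close'])  # → True
```

Note the alignment only holds exactly on a gap-free daily grid, which is why the code first left-joins the trading data onto a continuous calendar; leap years shift the alignment by a day.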

3. Build the model

# Split the data 80/20 into observation data and a validation set;
# the observation data is further split into train and test sets
split_size = round(len(original_data) * 0.20)  # validation set size
merged_pandas_df = original_data[:-split_size].copy()
merged_pandas_df = merged_pandas_df.fillna(method='pad', axis=0)
merged_pandas_df = merged_pandas_df[merged_pandas_df['Year'] >= 2010]

# Hold out the last 60 rows of the observation data as a test window
# test_split = round(split_size * 0.20)
test_split = 60

# Same-day market columns ('open', 'high', 'low', 'pre_close', 'change',
# 'pct_chg', 'vol', 'amount', 'pe', 'turnover_rate', 'HolidayName') are
# excluded: they are unknown at prediction time. Only the date parts and
# the five years of lagged features are used.
feature_columns = ['Year', 'Month', 'Day'] + [
    f'ya{i:02}_{col}' for i in range(1, 6)
    for col in ['open', 'high', 'low', 'close', 'pre_close', 'change',
                'pct_chg', 'vol', 'amount', 'pe', 'turnover_rate', 'HolidayName']]
X = merged_pandas_df[feature_columns][:-test_split]
y = merged_pandas_df['close'][:-test_split]
# Observation data: gapped train/test split
X_train, X_test, y_train, y_test = gap_train_test_split(X, y, test_size=2, gap_size=25)
# Categorical features must all appear in X (the original list also named
# the same-day 'HolidayName' column, which is not among the features)
cat_features = ['Year', 'Month', 'Day', 'ya01_HolidayName', 'ya02_HolidayName',
                'ya03_HolidayName', 'ya04_HolidayName', 'ya05_HolidayName']
X_verification = merged_pandas_df[feature_columns][-test_split:]
y_verification = merged_pandas_df['close'][-test_split:]
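`gap_train_test_split` (from the tscv package) discards a block of `gap_size` samples between the chronological training and test portions, so that autocorrelated days near the boundary cannot leak information into the test set. The same idea can be sketched in plain Python (a simplified illustration with hypothetical sizes, not the tscv implementation):

```python
def gapped_split(X, y, test_size, gap_size):
    """Chronological split: train | gap (discarded) | test."""
    cut = len(X) - test_size
    X_train, y_train = X[:cut - gap_size], y[:cut - gap_size]
    X_test, y_test = X[cut:], y[cut:]
    return X_train, X_test, y_train, y_test

# Toy data: 100 chronological samples
X = list(range(100))
y = [v * 2 for v in X]
X_tr, X_te, y_tr, y_te = gapped_split(X, y, test_size=2, gap_size=25)
print(len(X_tr), len(X_te), X_te)  # → 73 2 [98, 99]
```

With `test_size=2, gap_size=25` as in the code above, the 25 samples immediately before the test window are dropped from training entirely.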

4. Model training

# Train the CatBoost regressor
cat_model_01 = CatBoostRegressor(iterations=20000, learning_rate=0.03, depth=6,
                                 l2_leaf_reg=3, loss_function='MAE',
                                 eval_metric='MAE', random_seed=23)
cat_model_01 = cat_model_01.fit(X_train, y_train, cat_features=cat_features)
cat_model_01.score(X_test, y_test)
# Reference: https://blog.csdn.net/weixin_42305672/article/details/111252715

#######################################################
# Evaluate on the validation set
y_verification_pred = cat_model_01.predict(X_verification)
mse = mean_squared_error(y_verification, y_verification_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_verification, y_verification_pred)
print('Validation set:\n')
print("MSE:", mse)
print("RMSE:", rmse)
print("R2 score:", r2)
print('Model score:', cat_model_01.score(X_verification, y_verification))

5. Inspect the results

#############################################################################################
# Plot the validation set: predicted vs. true closing prices
X_verification_copy = X_verification.copy()
X_verification_copy['Date'] = (X_verification_copy['Year'].astype('str')
                               + X_verification_copy['Month'].astype('str')
                               + X_verification_copy['Day'].astype('str'))
X_verification_copy['y_true'] = y_verification
X_verification_copy['y_pred'] = y_verification_pred
tmp_verification = X_verification_copy
x_xticks = tmp_verification['Date']
y_true = tmp_verification['y_true']
y_pred = tmp_verification['y_pred']
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
use_model = cat_model_01
plt.figure(figsize=(30, 10))
plt.plot(x_xticks, y_true, color='r', label='y_true')
plt.plot(x_xticks, y_pred, color='g', label='y_pred')
plt.tick_params(labelsize=20, rotation=90)
plt.legend()  # add a legend
plt.show()    # display the figure

6. Other packages commonly used at work

from pyspark.sql.window import Window
from pyspark.sql.functions import stddev, avg, pandas_udf, PandasUDFType, expr, array_contains, collect_set, substring, countDistinct, year, month, sum, lag, explode, lit, ceil, posexplode, quarter, first, asc, array, array_intersect, array_distinct, array_except, coalesce, lead
from pyspark.sql.functions import regexp_replace, lower, col, udf, split, when, count, struct, max, collect_list, weekofyear, lpad, date_format, to_date, desc
from pyspark.sql.types import StringType, ArrayType, IntegerType, DoubleType, StructType, StructField, TimestampType
from pyspark.ml import Pipeline
from pyspark.ml.linalg import VectorUDT, Vectors, SparseVector, DenseVector
from pyspark.ml.evaluation import RegressionEvaluator, ClusteringEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder, TrainValidationSplit
from pyspark.ml.regression import RandomForestRegressor, RandomForestRegressionModel, LinearRegression, GBTRegressor, GBTRegressionModel
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import Tokenizer, StopWordsRemover, Word2Vec, VectorAssembler, Word2VecModel, StringIndexer, OneHotEncoder, VectorIndexer
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV, KFold, TimeSeriesSplit
from tscv import gap_train_test_split
from sklearn.svm import SVR
from datetime import datetime
from numpy import dot
from numpy.linalg import norm
from operator import and_
from functools import reduce
from jellyfish import jaro_winkler_similarity
# from xgboost import XGBRegressor
# from catboost import CatBoostRegressor
from lightgbm import LGBMRegressor
import csv
import os
import re
import math
import jieba
import joblib
import random
import warnings
import numpy as np
import networkx as nx
import pyspark.pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
warnings.filterwarnings('ignore')
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
'''
import joblib
joblib.dump('test', '/tmp/t.txt')
joblib.dump('testtest', '/tmp/t.txt')
'''
