[Finance] [Random Forest] Using a random forest to classify futures price movements (up/down)

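  • RF-RF_train3year3month
    • Load the data
    • Split into training and test sets: 3 years + 3 months, and so on
    • Select specific columns
    • Exponential smoothing
    • Feature Extraction - Technical Indicators
    • Prepare the data with a prediction horizon of 10 days
    • Train on each split in turn
    • Inspect the metrics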

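References: "jmartinezheras/reproduce-stock-market-direction-random-forests" and "Regression tasks with random forests (data preprocessing, MAPE evaluation, visualization, feature importance, plots of predicted vs. actual values)".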

RF-RF_train3year3month

Load the data

%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7, 4.5)  # Make the default figures a bit bigger

import numpy as np
import random

# Let's make this notebook reproducible
np.random.seed(42)
random.seed(42)

import pandas_techinal_indicators as ta  # https://github.com/Crypto-toolbox/pandas-technical-indicators/blob/master/technical_indicators.py
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, confusion_matrix, recall_score, accuracy_score
from sklearn.model_selection import train_test_split

df_org = pd.read_csv(r'F:/data/market_index_data/hengSheng_0404.csv')
df_org.head()

Split into training and test sets: 3 years + 3 months, and so on

Why split the data this way? See:
M'Ng J, Mehralizadeh M. Forecasting East Asian Indices Futures via a Novel Hybrid of Wavelet-PCA Denoising and Artificial Neural Network Models[J]. PLOS ONE, 2016, 11.

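train_ptr = []  # first row of each training window
test_ptr = []   # first row of each test window
end_ptr = []    # first row after each test window

# Month indices counted from the base year 2006 used in the loop below:
# 2006-02 (train start), 2009-02 (test start), 2009-05 (test end)
date_flag = [2, 3*12+2, 3*12+5]
# date_flag = [12, 3*12+12, 3*12+12+3]  # alternative start: 2006-12, 2009-12, 2010-03

for i in range(0, len(df_org)):
    num = (df_org.iloc[i]['year'] - 2006) * 12 + df_org.iloc[i]['month']
    # Record the first row of each target month, then move the target 3 months forward
    if num == date_flag[0]:
        train_ptr.append(i)
        date_flag[0] += 3
    if num == date_flag[1]:
        test_ptr.append(i)
        date_flag[1] += 3
    if num == date_flag[2]:
        end_ptr.append(i)
        date_flag[2] += 3

print(len(end_ptr))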
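The pointer logic above can be tried out on synthetic data. The following is a minimal sketch, not the code used in this post: `walk_forward_bounds` is a hypothetical helper and the business-day calendar is made up. It looks up the first row of each target month and returns (train start, test start, test end) row positions for a 36-month training span and a 3-month test span, stepping 3 months at a time.

```python
import pandas as pd

def walk_forward_bounds(dates, start_month, train_months=36, test_months=3, step=3):
    """Return a list of (train_start, test_start, test_end) row positions.

    `dates` is a sorted DatetimeIndex and `start_month` a monthly pandas Period.
    Hypothetical helper, for illustration only.
    """
    months = dates.to_period('M')
    # First row position of every month that appears in the data
    first_row = {}
    for pos, month in enumerate(months):
        first_row.setdefault(month, pos)

    bounds = []
    month = start_month
    while True:
        test_start = month + train_months
        test_end = test_start + test_months
        # Stop as soon as one of the three boundary months is missing
        if month not in first_row or test_start not in first_row or test_end not in first_row:
            break
        bounds.append((first_row[month], first_row[test_start], first_row[test_end]))
        month = month + step
    return bounds

# Made-up calendar of business days, 2006 through 2010
dates = pd.bdate_range('2006-01-02', '2010-12-31')
bounds = walk_forward_bounds(dates, pd.Period('2006-02', 'M'))

for train_start, test_start, test_end in bounds[:3]:
    print(dates[train_start].date(), dates[test_start].date(), dates[test_end].date())
```

Each test window ends where the next one begins, so the test predictions can later be concatenated into one continuous series. The training loop further down always starts from the first training pointer, which gives an expanding window; slicing from the k-th training pointer instead would give a sliding 3-year window.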

Select specific columns

aapl = df_org[['Open', 'High', 'Low', 'Close', 'Volume']]
aapl.head()

Exponential smoothing

def get_exp_preprocessing(df, alpha=0.9):
    edata = df.ewm(alpha=alpha).mean()
    return edata

saapl = get_exp_preprocessing(aapl)
saapl.head()  # saapl stands for smoothed aapl
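As a quick check of what `ewm(alpha=...).mean()` computes: with the pandas default `adjust=True`, each smoothed value is a weighted average of the current and all earlier values, with weights (1 - alpha)^i for the value i steps back. The sketch below uses made-up prices and a hand-written helper (`manual_ewm_mean`, hypothetical) to compare against pandas.

```python
import numpy as np
import pandas as pd

def manual_ewm_mean(values, alpha):
    """Weighted average with weights (1 - alpha)**i, where i = 0 is the newest value."""
    values = np.asarray(values, dtype=float)
    out = np.empty_like(values)
    for t in range(len(values)):
        weights = (1.0 - alpha) ** np.arange(t + 1)  # newest value first
        window = values[t::-1]                       # values[t], values[t-1], ..., values[0]
        out[t] = np.sum(weights * window) / np.sum(weights)
    return out

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0])  # made-up closing prices
smoothed = prices.ewm(alpha=0.9).mean()
manual = manual_ewm_mean(prices.values, alpha=0.9)

print(np.allclose(smoothed.values, manual))
```

With alpha = 0.9 the newest observation carries roughly 90% of the total weight, so the smoothing applied here is light.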

Feature Extraction - Technical Indicators

def feature_extraction(data):
    # Compute each indicator over several look-back windows
    for x in [5, 14, 26, 44, 66]:
    # for x in [14]:
        data = ta.relative_strength_index(data, n=x)
        data = ta.stochastic_oscillator_d(data, n=x)
        data = ta.accumulation_distribution(data, n=x)
        data = ta.average_true_range(data, n=x)
        data = ta.momentum(data, n=x)
        data = ta.money_flow_index(data, n=x)
        data = ta.rate_of_change(data, n=x)
        data = ta.on_balance_volume(data, n=x)
        data = ta.commodity_channel_index(data, n=x)
        data = ta.ease_of_movement(data, n=x)
        data = ta.trix(data, n=x)
        data = ta.vortex_indicator(data, n=x)

    # Close relative to its exponential moving averages
    data['ema50'] = data['Close'] / data['Close'].ewm(50).mean()
    data['ema21'] = data['Close'] / data['Close'].ewm(21).mean()
    data['ema14'] = data['Close'] / data['Close'].ewm(14).mean()
    data['ema5'] = data['Close'] / data['Close'].ewm(5).mean()

    # Williams %R is missing
    data = ta.macd(data, n_fast=12, n_slow=26)

    # Drop the raw price/volume columns; 'Close' is still needed for the label
    del(data['Open'])
    del(data['High'])
    del(data['Low'])
    del(data['Volume'])

    return data

def compute_prediction_int(df, n):
    # 1 if the close n rows ahead is >= the current close, else 0
    pred = (df.shift(-n)['Close'] >= df['Close'])
    pred = pred.iloc[:-n]
    return pred.astype(int)

def prepare_data(df, horizon):
    data = feature_extraction(df).dropna().iloc[:-horizon]
    data['pred'] = compute_prediction_int(data, n=horizon)
    del(data['Close'])
    return data.dropna()
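To make the labelling rule concrete, here is a tiny illustration with made-up closing prices and a horizon of 2 rows instead of 10. The function body is the same as `compute_prediction_int` above; only the toy data is an assumption.

```python
import pandas as pd

def compute_prediction_int(df, n):
    # 1 if the close n rows ahead is >= the current close, else 0
    pred = (df.shift(-n)['Close'] >= df['Close'])
    pred = pred.iloc[:-n]  # the last n rows have no future close to compare with
    return pred.astype(int)

toy = pd.DataFrame({'Close': [10.0, 11.0, 9.0, 12.0, 8.0, 13.0]})  # made-up prices
labels = compute_prediction_int(toy, n=2)

print(labels.tolist())  # [0, 1, 0, 1]
```

The first label is 0 because the close two rows later (9.0) is below 10.0; the second is 1 because 12.0 is at least 11.0. The last two rows get no label, which is why the prepared data ends `horizon` rows before the raw data does.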

Prepare the data with a prediction horizon of 10 days

# Label: did the closing price rise (or stay level) 10 days later?
data = prepare_data(saapl, 10)

y = data['pred']

# remove the output from the input
features = [x for x in data.columns if x not in ['gain', 'pred']]
X = data[features]

print(list(data.columns))

Train on each split in turn

rf = RandomForestClassifier(n_jobs=-1, n_estimators=200, random_state=42)

accuracy_his = []
recall_his = []
f1_his = []
precision_his = []

all_p = np.array([])     # predictions from all test windows, concatenated
all_prob = np.array([])  # class probabilities from all test windows, concatenated

for k in range(len(end_ptr)):
    if end_ptr[k] >= len(X):
        break
    # Train on all data from the very first training row up to the current test window
    X_train = X[train_ptr[0]:test_ptr[k]]
    y_train = y[train_ptr[0]:test_ptr[k]]
    X_test = X[test_ptr[k]:end_ptr[k]]
    y_test = y[test_ptr[k]:end_ptr[k]]
    # Print the row range actually used: training starts at train_ptr[0]
    print('\nDataSet No.{} data row {}-{}-{}'.format(k, train_ptr[0], test_ptr[k], end_ptr[k]))

    rf.fit(X_train, y_train.values.ravel())
    pred = rf.predict(X_test)
    prob = rf.predict_proba(X_test)
    if k == 0:
        all_prob = prob
    else:
        all_prob = np.concatenate((all_prob, prob), axis=0)
    all_p = np.concatenate((all_p, pred))

    precision = precision_score(y_pred=pred, y_true=y_test)
    recall = recall_score(y_pred=pred, y_true=y_test)
    f1 = f1_score(y_pred=pred, y_true=y_test)
    accuracy = accuracy_score(y_pred=pred, y_true=y_test)
    print('precision: {0:1.2f}, recall: {1:1.2f}, f1: {2:1.2f}, accuracy: {3:1.2f}'.format(precision, recall, f1, accuracy))
    accuracy_his.append(accuracy)
    recall_his.append(recall)
    f1_his.append(f1)
    precision_his.append(precision)

    confusion = confusion_matrix(y_pred=pred, y_true=y_test)
    print('Confusion Matrix')
    print(confusion)
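One thing to watch in the loop above: the pointers are row numbers counted on `df_org`, while `X` and `y` come out of `prepare_data`, which calls `dropna()`. A plain slice such as `X[a:b]` selects by position, so if `dropna()` removed leading rows, position `a` in `X` is no longer row `a` of `df_org`. Whether this matters here depends on how many rows were dropped, which the post does not show. The sketch below, on a made-up frame, shows the difference between positional and label-based selection.

```python
import numpy as np
import pandas as pd

# Made-up frame: the first two rows are NaN, as indicator warm-up rows often are
raw = pd.DataFrame({'feature': [np.nan, np.nan, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
prepared = raw.dropna()  # index labels 2..7 survive, positions restart at 0

start, stop = 4, 6  # row numbers counted on the raw frame

by_position = prepared.iloc[start:stop]   # same rows a plain prepared[start:stop] would pick
by_label = prepared.loc[start:stop - 1]   # .loc is inclusive of the end label

print(by_position.index.tolist())  # [6, 7]
print(by_label.index.tolist())     # [4, 5]
```

If the index of `X` still carries the original row numbers, selecting with `.loc` (or computing the pointers on `X` itself) keeps the windows aligned with the calendar months they are meant to cover.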

Inspect the metrics

print(np.mean(accuracy_his))
print(np.mean(recall_his))
print(np.mean(precision_his))

# Ground truth covering every test window that was actually evaluated
n_sets = len(accuracy_his)
y_eval = y[test_ptr[0]:end_ptr[n_sets - 1]]

# scikit-learn metrics take (y_true, y_pred); keyword arguments keep the order explicit
Acc = accuracy_score(y_true=y_eval, y_pred=all_p)
print('------------- DataSet:, Accuracy:{:.5f} -------------'.format(Acc))
Pc = precision_score(y_true=y_eval, y_pred=all_p)
print('------------- DataSet:, Precision:{:.5f} -------------'.format(Pc))
Recall = recall_score(y_true=y_eval, y_pred=all_p)
print('------------- DataSet:, Recall:{:.5f} -------------'.format(Recall))
f1 = f1_score(y_true=y_eval, y_pred=all_p)
print('------------- DataSet:, F1:{:.5f} -------------'.format(f1))

year_accuracy_his = []
year_precision_his = []
year_recall_his = []
year_f1_his = []

def score_block(tag, left, right):
    # Score all_p[left:right] against the matching slice of the test labels;
    # left/right are offsets relative to the first test row
    item_y = y_eval[left:right]
    item_p = all_p[left:right]
    Acc = accuracy_score(y_true=item_y, y_pred=item_p)
    year_accuracy_his.append(Acc)
    print('------------- DataSet:{}, Accuracy:{:.5f} -------------'.format(tag, Acc))
    Pc = precision_score(y_true=item_y, y_pred=item_p)
    year_precision_his.append(Pc)
    print('------------- DataSet:{}, Precision:{:.5f} -------------'.format(tag, Pc))
    Recall = recall_score(y_true=item_y, y_pred=item_p)
    year_recall_his.append(Recall)
    print('------------- DataSet:{}, Recall:{:.5f} -------------'.format(tag, Recall))
    f1 = f1_score(y_true=item_y, y_pred=item_p)
    year_f1_his.append(f1)
    print('------------- DataSet:{}, F1:{:.5f} -------------'.format(tag, f1))

# Group the quarterly test windows into blocks of four (one year each)
for i in range(n_sets // 4):
    score_block(i, test_ptr[i*4] - test_ptr[0], end_ptr[i*4+3] - test_ptr[0])

# Remaining windows that do not fill a whole year
if n_sets % 4 != 0:
    score_block('final', test_ptr[n_sets - n_sets % 4] - test_ptr[0], end_ptr[n_sets - 1] - test_ptr[0])

plt.figure(figsize=(20,7))
plt.plot(np.arange(len(year_accuracy_his)), year_accuracy_his, label='year_accuracy')
plt.plot(np.arange(len(year_recall_his)), year_recall_his, label='year_recall')
plt.plot(np.arange(len(year_f1_his)), year_f1_his, label='year_f1_his')
plt.plot(np.arange(len(year_precision_his)), year_precision_his, label='year_precision')
# plt.grid(True, ls=':', c='r')
plt.axhline(y=0.5, c='r', ls='--', lw=2)
plt.legend();
plt.show()

plt.figure(figsize=(20,7))
plt.plot(np.arange(len(accuracy_his)), accuracy_his, label='accuracy')
plt.plot(np.arange(len(recall_his)), recall_his, label='recall')
plt.plot(np.arange(len(f1_his)), f1_his, label='f1')
plt.plot(np.arange(len(precision_his)), precision_his, label='precision')
plt.grid(True, ls=':', c='r')
plt.axhline(y=0.5, c='r', ls='--', lw=2)
plt.legend();
plt.show()
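A note on reading these scores: scikit-learn's metric functions take the ground truth first and the predictions second. Accuracy is unaffected by the order, but precision and recall swap roles if the two arguments are exchanged. A minimal sketch with made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]  # made-up ground truth
y_pred = [1, 1, 0, 1, 1, 1, 0, 0]  # made-up predictions

precision = precision_score(y_true=y_true, y_pred=y_pred)  # 2 true positives / 5 predicted positives
recall = recall_score(y_true=y_true, y_pred=y_pred)        # 2 true positives / 3 actual positives

# Exchanging the arguments turns "precision" into what is really recall
swapped_precision = precision_score(y_true=y_pred, y_pred=y_true)

print(precision, recall, swapped_precision)
print(accuracy_score(y_true, y_pred) == accuracy_score(y_pred, y_true))
```

Passing `y_true=` and `y_pred=` by keyword, as the training loop does, avoids the mix-up.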
