[Literature Reading] The role of news sentiment in oil futures returns and volatility forecasting
0、Abstract
In this paper, we extract the qualitative information from crude oil news headlines, and develop a novel VMD-BiLSTM model with investor sentiment indicator for crude oil forecasting.
In this paper, the authors extract qualitative information from crude oil news headlines and develop a novel VMD-BiLSTM model for crude oil price forecasting.
First, we construct a sentiment score considering cumulative effect from contextual data of oil news texts.
First, they construct a sentiment score that accounts for the cumulative effect.
Then, we adopt an event-based method and GARCH model to investigate the impact of news sentiment on returns and volatility. A non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes.
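Second, they use an event-based method and a GARCH model to investigate the impact of the sentiment indicator on returns and volatility. A non-recursive signal decomposition method (VMD) is applied to decompose historical crude oil returns and volatility.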
After that, a bidirectional long short-term memory neural networks (BiLSTM) is introduced as the deep learning prediction model that integrates both the qualitative and quantitative model inputs.
Next, a bidirectional long short-term memory (BiLSTM) neural network is applied as the deep learning prediction model, combining qualitative and quantitative inputs.
Our empirical results indicate that the shock of news sentiment significantly causes the fluctuation of oil futures prices, and news sentiment has an asymmetric impact on the volatility of oil futures. The incorporation of sentiment score is always helpful for improving the forecasting performances in all benchmark scenarios. Specifically, our proposed data-decomposition based deep learning model is more effective than several econometric and machine learning models.
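The empirical results show that news sentiment shocks significantly cause fluctuations in crude oil prices, and that news sentiment has an asymmetric impact on the volatility of oil futures; including the sentiment score helps improve forecasting performance.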
1、Introduction
It is universally acknowledged that the crude oil futures market is a typical risk aggregation market that attracts worldwide attentions: The oil future price movements are identified to be more likely exposed to global political events and receive simultaneous shocks from other financial asset markets (Leduc and Sill, 2004; Considine and Larson, 2001).
As is well known, the crude oil futures market is a typical risk aggregation market that attracts worldwide attention: oil futures prices are exposed to global political events and to simultaneous shocks from other financial markets.
On the other hand, the prices of global financial assets also receive positive feedbacks from oil future price movements and disturbances (Hamilton and Wu, 2014; Teterin et al., 2016).
Conversely, global financial asset prices also receive positive feedback from oil futures prices.
For crude oil safety management and assets allocation strategy concerns, the precise prediction for oil future market returns and risks is able to provide useful guideline for policy makers and investors.
Therefore, precise prediction of oil futures market returns and risks is useful.
However, the nonlinearity property of oil future market price is formulated by different types of factors, such as supply and demand relationship (Kilian, 2009), international events (Zhao et al., 2016) and investor sentiment (Qadan and Nama, 2018), which makes it a tough task in oil returns and volatilities prediction for the complex structures of oil price.
However, the complex mechanisms behind oil prices make oil returns and volatilities hard to predict.
An abundant amount of studies is devoted to predict the oil returns and volatilities utilizing the historical time-series data of oil market and economic related influencing factors. For example, Fan et al. (2008) use historical observations of WTI and Brent crude oil time-series data to predict the future oil prices based on genetic algorithm. (Shin et al., 2013) introduce semi-supervised learning approach to investigate the impact factors that affect the oil price movements, including OPEC and SAUDI oil production, USD exchange rates and Producer price index etc.
Many studies are devoted to predicting oil returns and volatilities.
However, some underlying factors, such as investor sentiment, may act as potential causes of oil price changes and fluctuations (Du et al., 2016; Qadan and Nama, 2018), which is hard to assess and calculate in empirical works due to non-quantization characteristic of market sentiment and reaction.
However, some underlying factors, such as investor sentiment, are hard to quantify.
Scholars have attempted to find out appropriate proxies for the investor concerns and sentiments of financial market. For example, Baker and Wurgler (2007) utilize a combination of financial indices to quantify the investor sentiment in stock market, including stock trading volume, mutual fund flows and IPO volume etc. Smales (2017) introduces CBOE Volatility Index (VIX) as a measure of investor fears and investigates the relationship between VIX and stock returns. Kostopoulos et al. (2020) apply Google search volumes as a proxy for trading intensities of individual investors in German.
Scholars have tried to construct proxies for investor concerns and sentiments.
However, the previous measurements of investor sentiments show less effectiveness in providing untapped information for assets returns and volatilities prediction due to the following weakness (Li et al., 2019).
However, the previous measurements of investor sentiments show less effectiveness in providing untapped information.
First, official indices and statistics, such as transaction volume, are identified to provide less unexplored information about investor attentions, which is mainly due to its less consistency with the individual traders (Deng et al., 2012).
First, official indices and statistics, such as transaction volume, provide little unexplored (untapped) information.
Second, the intensity and volume data of search engine contain too much investors-irrelevant noise (Limnios and You, 2018). As a result, the sentiment indicators calculated by search indices may show less effectiveness and confidence level in financial assets prediction.
Second, search engine data contain too much irrelevant noise.
Natural Language Processing (NLP) techniques and big available dataset provide a novel framework for investor sentiment indicator constructions. By crawling the news headlines from hubs and websites for energy news, news dataset of crude oil can be tokenized. Utilizing the headline documents, daily investor sentiments are scored generated based on vector space models (Salton et al., 1975). Finally, returns and volatilities of crude oil are predicted by incorporating the daily polarity score of market sentiment.
NLP techniques and large datasets provide a novel framework for constructing investor sentiment indicators.
Sentiment index based on news headlines has the following advantages: First, news headlines reflect key information of investor attention, which can be measured and obtained efficiently through NLP techniques.
Advantages: news headlines reflect key information about investor attention.
Second, sentiment index calculated by news headline contains less noise and irrelevant information, which is helpful to improve the reliability of indicator construction (Nassirtoussi et al., 2015).
They also contain less irrelevant noise.
In this paper, we formally investigate the impact of news sentiment on oil futures returns and volatility by an event-based method and GARCH model estimations. Overall, the daily investor sentiment of crude oil is computed by NLP technique in this paper and act as a novel predictor for crude oil future returns and volatilities.
Several types of forecasting methods have been applied to oil future returns and volatility prediction by previous works, such as econometric models (Klein and Walther, 2016) and machine learning approaches (Yu et al., 2008; Tang et al., 2015; Yu et al., 2017).
Many methods have previously been applied to oil futures returns and volatility prediction.
However, the econometric or machine learning typed predictors achieve inferior forecasting performance in comparison with the newly introduced deep learning approach (Mallqui and Fernandes, 2018).
However, they all perform worse than the deep learning approach.
Utilizing the artificial neural networks consisting of multiple hidden layers, deep learning model shows superior time-series data predictability over its counterparts (LeCun et al., 2015). In recent years, deep learning has been applied broadly in crude oil time-series data prediction. For example, Zhao et al. (2017) apply a novel stacked denoising autoencoders (SDAE) for crude oil forecasting based on a large dataset of exogenous influencing parameters. Luo et al. (2019) employ a novel convolutional neural networks (CNN) model to improve the short-term prediction performance for crude oil market.
The paper lists several examples from the deep learning literature.
Since crude oil market returns and volatilities are non-stationary time-series data and consistent with complex influencing factors, the prediction accuracies of the proposed models may suffer due to the high volatilities. In recent studies, a novel ensemble forecasting method, namely “Decomposition and Ensemble”, has been developed to handle the task of irregular and non-stationary time-series data prediction (Bergmeir et al., 2016; Risse, 2019). This method decomposes the original time-series data into several stationary cycles, which can be estimated by forecasting models individually and finally integrated to generate the forecasting output. Among all the decomposition approaches, empirical mode decomposition (EMD) typed method is the predominant approach utilized in current empirical works (Wen et al., 2017; Santhosh et al., 2019).
EMD: addresses the poor prediction accuracy caused by the high volatility of the return and volatility time series.
EMD-type models may cause the mode-mixing problem.
First, qualitative information is extracted from news headlines, which can uncover untapped information.
2、Methodology
2.1 Research Framework
2.2 Data collection and preprocessing
Each news headline is treated as a document and represented as a vector.
Vector length = the number of distinct words in the corresponding news headline.
number of documents that contain the word in the collection. Specifically, the TF-IDF score of word x in a document is calculated as follows in Eq. (1):
Compute TF*IDF: it evaluates the relative importance of a word.
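The notes do not reproduce Eq. (1), so the following is only a minimal sketch of TF-IDF in its common textbook form, tf × log(N / df), which is an assumption and may differ from the paper's exact weighting. The function name `tf_idf` and the sample headlines are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: score} dict per tokenized document.

    Assumed textbook form (not taken from the paper):
    score = (count of word in doc / doc length) * log(N / df(word)).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


# Hypothetical headlines, already lower-cased and tokenized.
headlines = [
    "oil prices rise on supply cut".split(),
    "oil prices fall as demand weakens".split(),
    "opec agrees supply cut".split(),
]
weights = tf_idf(headlines)
```

Each dict in `weights` has one entry per distinct word in the headline, which matches the vector length described in the notes. Words that appear in fewer headlines (such as "rise") receive a higher weight than words shared across headlines (such as "oil").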
In terms of the crude oil price data, we select the daily returns of the Brent crude oil futures contracts as well as the 7-day volatility as the prediction targets.
Daily log returns and 7-day average volatility are the prediction targets.
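As a rough illustration of the two targets, the sketch below computes daily log returns and a rolling 7-day volatility. The notes do not give the paper's exact volatility definition, so the sample standard deviation of the last seven daily returns is an assumption here, and the price series is made up.

```python
import math
import statistics


def log_returns(prices):
    """Daily log return: ln(P_t / P_{t-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


def rolling_volatility(returns, window=7):
    """Sample standard deviation of the last `window` returns.

    Assumption: the paper's 7-day volatility may be defined differently.
    """
    return [
        statistics.stdev(returns[i - window + 1 : i + 1])
        for i in range(window - 1, len(returns))
    ]


# Hypothetical closing prices, not real Brent data.
prices = [60.0, 61.2, 60.5, 62.0, 61.8, 63.1, 62.4, 64.0, 63.5]
returns = log_returns(prices)
vol = rolling_volatility(returns)
```

Nine prices give eight returns and therefore two 7-day volatility values; the first six days have no volatility value because the window is not yet full.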
Orthogonalization
2.3 Sentiment analysis
In this study, we employ the Sentimentr package in R to calculate the sentiment of each processed news headline. The Sentimentr package returns the polarity score in the range of [−1.0, 1.0] for each document.
The news is considered as positive news if its polarity score is above zero, otherwise, it is considered as negative news. In general, the more negative the polarity score, the more negative the news; the more positive the polarity score, the more positive the news.
As pointed by previous studies, news often has a rather continuous effect on the investor's sentiment in the actual futures market (Akhtar et al., 2013). That is to say, the public sentiment on a specific day is shaped by the combination of news on the day and that in previous few days. However, the more recent news is more influential than the old news. Considering this situation, we formulate a cumulative sentiment score (CSS) following Kiritchenko et al. (2014) and Chowdhury et al. (2014). In this study, we assume any piece of news will have a significant impact on the investor sentiment for seven days, and that its impact exponentially declines each day after its release, which is consistent with the actual situation of news impact (Huang et al., 2014).
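The CSS formula itself is not reproduced in these notes. The sketch below shows one way to implement the idea described above: each day's score is the sum of the last seven daily polarity scores, weighted by an exponentially declining factor. The decay rate of 1 per day, the function name, and the sample scores are all assumptions, not the paper's specification.

```python
import math


def cumulative_sentiment(daily_scores, horizon=7, decay=1.0):
    """Exponentially decayed sum of the last `horizon` daily scores.

    Assumed form: CSS_t = sum_{lag=0}^{horizon-1} exp(-decay * lag) * S_{t-lag}.
    `decay` is a hypothetical parameter; the paper's rate is not given here.
    """
    css = []
    for t in range(len(daily_scores)):
        total = 0.0
        for lag in range(horizon):
            if t - lag < 0:
                break  # no earlier history available
            total += math.exp(-decay * lag) * daily_scores[t - lag]
        css.append(total)
    return css


# Hypothetical daily polarity scores in [-1, 1].
daily = [0.5, 0.0, -0.2, 0.0, 0.0, 0.0, 0.0, 0.0]
css = cumulative_sentiment(daily)
```

In this example the positive news on day 0 still contributes on day 1 with weight exp(-1), and it drops out entirely on day 7, once it falls outside the seven-day window.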
2.4 Data decomposition
According to previous literature, decomposing the original time series data into sub-series modes with different economic implications can help the neural networks capture its tendency and cyclicity (Wang et al., 2014). In this study, we employ variational mode decomposition (VMD) in the data decomposition process for the daily returns and 7-day volatility time series of Brent crude oil. In general, VMD is a non-recursive optimization technique that decomposes the original input signal f(t) into a series of discrete and stationary intrinsic modes uk through Wiener filtering and Hilbert transform (Liu et al., 2016). The optimization procedure is as follows (Zhang et al., 2017):
Step 1: Calculate the Hilbert transform of each mode uk and transform into respective uni-sided frequency spectrum.
Step 2: Alter the frequency spectrum of each mode uk to narrow frequency baseband.
Step 3: Conduct the H1 Gaussian smoothness on the demodulated signal to obtain the bandwidth of each mode uk.
The optimal solution is obtained using the alternating direction method of multipliers (ADMM) (Hestenes, 1969) and the original input signal f(t) is decomposed into K intrinsic modes.
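The notes omit the equations, so for reference the three steps above correspond to the constrained variational problem of the standard VMD formulation, written out below. The notation is assumed rather than copied from the paper: ω_k is the center frequency of mode u_k, δ(t) is the Dirac delta, and * denotes convolution.

```latex
\min_{\{u_k\},\{\omega_k\}} \;
\sum_{k=1}^{K}
\left\|
  \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right]
  e^{-j \omega_k t}
\right\|_2^2
\quad \text{s.t.} \quad
\sum_{k=1}^{K} u_k(t) = f(t)
```

The term in square brackets is the analytic signal of u_k obtained through the Hilbert transform (Step 1), the factor e^{-jω_k t} shifts its spectrum to baseband (Step 2), and the squared L2 norm of the time derivative measures the bandwidth (Step 3).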
2.5 Deep learning forecasting model: BiLSTM