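[Literature Reading] The role of news sentiment in oil futures returns and volatility forecasting

0、Abstract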

In this paper, we extract qualitative information from crude oil news headlines and develop a novel VMD-BiLSTM model with an investor sentiment indicator for crude oil forecasting.

In this paper, we extract qualitative information from crude oil news headlines and develop a novel VMD-BiLSTM model for crude oil price forecasting.

First, we construct a sentiment score that accounts for the cumulative effect of news, using contextual data from oil news texts.

First, we construct a sentiment score that takes the cumulative effect into account.

Then, we adopt an event-based method and a GARCH model to investigate the impact of news sentiment on returns and volatility. A non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes.

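Second, we use an event-based method and a GARCH model to investigate the impact of the sentiment indicator on returns and volatility. A non-recursive signal decomposition method (VMD) is applied to decompose historical crude oil returns and volatility.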

After that, a bidirectional long short-term memory neural network (BiLSTM) is introduced as the deep learning prediction model, integrating both qualitative and quantitative model inputs.

Then, a bidirectional long short-term memory neural network is applied as the deep learning forecasting model, combining qualitative and quantitative inputs.

Our empirical results indicate that news sentiment shocks significantly drive fluctuations in oil futures prices, and that news sentiment has an asymmetric impact on the volatility of oil futures. Incorporating the sentiment score improves forecasting performance in all benchmark scenarios. Moreover, our proposed decomposition-based deep learning model is more effective than several econometric and machine learning models.

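Our empirical results show that news sentiment shocks significantly drive fluctuations in crude oil prices, and that news sentiment has an asymmetric impact on the volatility of oil futures; incorporating the sentiment score helps improve forecasting performance.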

1、Introduction

It is widely acknowledged that the crude oil futures market is a typical risk aggregation market that attracts worldwide attention: oil futures price movements have been found to be highly exposed to global political events and to receive simultaneous shocks from other financial asset markets (Leduc and Sill, 2004; Considine and Larson, 2001).

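As is well known, the crude oil futures market is a typical risk aggregation market that attracts worldwide attention: oil futures price movements are exposed to global political events and to simultaneous shocks from other financial markets.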

On the other hand, the prices of global financial assets also receive positive feedback from oil futures price movements and disturbances (Hamilton and Wu, 2014; Teterin et al., 2016).

Conversely, global financial asset prices also receive positive feedback from oil futures prices.

For crude oil safety management and asset allocation strategies, precise prediction of oil futures market returns and risks can provide useful guidance for policymakers and investors.

Therefore, precise prediction of oil futures market returns and risks is useful.

However, the nonlinear behavior of oil futures prices is shaped by many types of factors, such as supply and demand (Kilian, 2009), international events (Zhao et al., 2016) and investor sentiment (Qadan and Nama, 2018). This complex structure makes predicting oil returns and volatilities a difficult task.

However, this complex mechanism makes predicting oil returns and volatilities difficult.

A large body of studies is devoted to predicting oil returns and volatilities using historical time-series data from the oil market and related economic factors. For example, Fan et al. (2008) use historical WTI and Brent crude oil time series to predict future oil prices based on a genetic algorithm. Shin et al. (2013) introduce a semi-supervised learning approach to investigate the factors that affect oil price movements, including OPEC and Saudi oil production, USD exchange rates and the producer price index.

A large body of studies is devoted to predicting oil returns and volatilities.

However, some underlying factors, such as investor sentiment, may be potential causes of oil price changes and fluctuations (Du et al., 2016; Qadan and Nama, 2018). These factors are hard to assess and measure in empirical work because market sentiment and reactions are not directly quantifiable.

However, some latent factors, such as investor sentiment, are difficult to quantify.

Scholars have attempted to find appropriate proxies for investor concerns and sentiment in financial markets. For example, Baker and Wurgler (2007) use a combination of financial indices to quantify investor sentiment in the stock market, including stock trading volume, mutual fund flows and IPO volume. Smales (2017) uses the CBOE Volatility Index (VIX) as a measure of investor fear and investigates the relationship between the VIX and stock returns. Kostopoulos et al. (2020) use Google search volumes as a proxy for the trading intensity of individual investors in Germany.

Scholars have tried to construct indicators of investor concerns and sentiment.

However, previous measures of investor sentiment have proven less effective at providing untapped information for predicting asset returns and volatilities, owing to the following weaknesses (Li et al., 2019).

However, previous measures of investor sentiment are less effective at providing untapped information.

First, official indices and statistics, such as transaction volume, have been found to provide little unexplored information about investor attention, mainly because they are poorly aligned with the behavior of individual traders (Deng et al., 2012).

First, official indices and statistics, such as trading volume, can hardly provide unexplored (untapped) information.

Second, search engine intensity and volume data contain too much investor-irrelevant noise (Limnios and You, 2018). As a result, sentiment indicators calculated from search indices may be less effective and less reliable in financial asset prediction.

Second, search engine data contain too much irrelevant noise.

Natural language processing (NLP) techniques and large available datasets provide a novel framework for constructing investor sentiment indicators. By crawling news headlines from energy news hubs and websites, a crude oil news dataset can be built and tokenized. From the headline documents, daily investor sentiment scores are generated based on vector space models (Salton et al., 1975). Finally, crude oil returns and volatilities are predicted by incorporating the daily polarity score of market sentiment.

NLP techniques and large datasets provide a novel framework for constructing investor sentiment indicators.

A sentiment index based on news headlines has the following advantages. First, news headlines reflect key information about investor attention, which can be measured and obtained efficiently with NLP techniques.

Advantage: news headlines reflect key information about investor attention.

Second, a sentiment index calculated from news headlines contains less noise and irrelevant information, which helps improve the reliability of the indicator (Nassirtoussi et al., 2015).

Less irrelevant noise.

In this paper, we formally investigate the impact of news sentiment on oil futures returns and volatility using an event-based method and GARCH model estimations. Overall, the daily investor sentiment for crude oil is computed with NLP techniques and serves as a novel predictor of crude oil futures returns and volatilities.

Previous works have applied several types of forecasting methods to oil futures returns and volatility prediction, such as econometric models (Klein and Walther, 2016) and machine learning approaches (Yu et al., 2008; Tang et al., 2015; Yu et al., 2017).

Many methods have previously been applied to oil futures returns and volatility prediction.

However, econometric and machine learning predictors achieve inferior forecasting performance compared with the newly introduced deep learning approach (Mallqui and Fernandes, 2018).

However, all of them perform worse than the deep learning approach.

Using artificial neural networks with multiple hidden layers, deep learning models show better time-series predictability than their counterparts (LeCun et al., 2015). In recent years, deep learning has been widely applied to crude oil time-series prediction. For example, Zhao et al. (2017) apply a novel stacked denoising autoencoder (SDAE) to crude oil forecasting based on a large dataset of exogenous influencing factors. Luo et al. (2019) employ a novel convolutional neural network (CNN) model to improve short-term prediction performance for the crude oil market.

Lists some deep learning literature.

Since crude oil market returns and volatilities are non-stationary time series driven by complex influencing factors, the prediction accuracy of the proposed models may suffer from their high volatility. In recent studies, a novel ensemble forecasting method, namely “Decomposition and Ensemble”, has been developed to handle the prediction of irregular and non-stationary time series (Bergmeir et al., 2016; Risse, 2019). This method decomposes the original time series into several stationary cycles, which are forecast individually by the forecasting models and finally integrated to generate the forecasting output. Among all decomposition approaches, empirical mode decomposition (EMD)-type methods are predominant in current empirical work (Wen et al., 2017; Santhosh et al., 2019).

EMD: addresses the poor prediction accuracy caused by the high volatility of the return and volatility time series.
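A minimal sketch of the “Decomposition and Ensemble” idea described above: split the series into additive components, forecast each component separately, then sum the component forecasts. The two-component split (a trailing moving average plus its residual) and the AR(1) component forecasters are hypothetical stand-ins chosen for brevity. They are not EMD, VMD, or the paper's models.

```python
import numpy as np


def decompose(x, window=5):
    """Hypothetical two-component split: trailing moving average + residual.

    The components add back exactly to x (a stand-in for EMD/VMD modes).
    """
    x = np.asarray(x, dtype=float)
    trend = np.array([x[max(0, i - window + 1): i + 1].mean() for i in range(len(x))])
    return [trend, x - trend]


def ar1_forecast(series):
    """One-step-ahead forecast from a least-squares AR(1) with intercept."""
    design = np.column_stack([np.ones(len(series) - 1), series[:-1]])
    coef, *_ = np.linalg.lstsq(design, series[1:], rcond=None)
    return coef[0] + coef[1] * series[-1]


def decompose_and_ensemble(x, window=5):
    """Forecast each component separately, then integrate by summation."""
    return sum(ar1_forecast(c) for c in decompose(x, window))
```

In the paper's pipeline, the decomposition step would be VMD and the component forecaster a BiLSTM. The sketch only shows the decompose, forecast-per-component, and sum structure.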

However, prediction errors may accumulate when the forecasts of the individual decomposed series are combined, which can reduce prediction accuracy (Tang et al., 2015). In addition, EMD-type models may suffer from the mode-mixing problem, in which a single decomposed component contains oscillations of widely different scales, or oscillations of similar scales are spread across different components (Colominas et al., 2014).

EMD-type models may cause the mode-mixing problem.

Based on the above studies, this paper develops a novel VMD-BiLSTM model with an investor sentiment indicator for crude oil forecasting.

First, we extract the qualitative information from crude oil news headlines and conduct sentiment analysis on the contextual data, which provides effective and unexplored information for deep learning forecasting.

First, qualitative information is extracted from news headlines, which can provide untapped information.

Moreover, we adopt an event-based method and a GARCH model to investigate the impact of news sentiment on returns and volatility.

In addition, an event-based method and a GARCH model are adopted to investigate the impact of news sentiment on returns and volatilities.

Second, a non-recursive signal decomposition method, namely variational mode decomposition (VMD), is applied to decompose the historical crude oil return and volatility data into various intrinsic modes. Compared with EMD, the predominant decomposition approach, VMD has been shown to avoid the mode-mixing problem effectively (Dragomiretskiy and Zosso, 2014).

Second, VMD is applied to decompose historical crude oil returns and volatility into various intrinsic modes, which avoids the mode-mixing problem.

Third, a bidirectional long short-term memory neural network (BiLSTM) is introduced as the deep learning prediction model, integrating both qualitative and quantitative model inputs. The proposed BiLSTM model can extract two-way (forward and backward) sequential relationships in the time series data.

第三,BiLSTM can integrate both the qualitative and quantitative model inputs. The proposed BiLSTM model can extract a two-way sequential relationship.

According to our empirical results, news sentiment shocks significantly drive fluctuations in oil futures prices. Specifically, oil futures prices react positively around positive news shocks and show a relatively weak decline around negative news shocks. The GARCH estimations show that news sentiment has an asymmetric impact on the volatility of oil futures. For oil return and volatility forecasting, incorporating the news index improves forecasting performance in all benchmark scenarios. Moreover, our proposed decomposition-based deep learning model is more effective than several econometric and machine learning models.

The major contribution of this paper is that, to the best of our knowledge, it is the first to incorporate an NLP-based oil market sentiment index into oil futures return and volatility prediction. It serves as an initial attempt to improve forecasts by using the hidden but informative signals of irrational behavior in the crude oil market.

Furthermore, we empirically confirm the effectiveness of our proposed hybrid deep learning model for oil return and volatility forecasting. It outperforms several benchmark econometric, machine learning, deep learning and hybrid learning models. The methodology and empirical results of our study shed new light on risk control for oil-related assets based on large-scale online datasets and data-driven approaches.

The rest of this paper is organized as follows. Section 2 presents the research framework, news text analysis methods and forecasting models. Section 3 tests the impact of news sentiment on oil returns and volatility using an event-based method and GARCH model estimations. Section 4 presents the empirical results of oil return and volatility forecasting, including several robustness tests. Finally, Section 5 presents concluding remarks and future directions.

2、Methodology

2.1 Research Framework

The forecasting approach proposed in this study aims to utilize qualitative information extracted from financial news headlines and quantitative information extracted from market time series data to improve the return and volatility forecasting accuracy in the crude oil futures market.

The framework of our proposed approach is shown in Fig. 1. Specifically, there are five major steps, namely data collection, data preprocessing, sentiment analysis, data decomposition, and return and volatility forecasting. These steps are explained in detail in Sections 2.2–2.5.

2.2 Data collection and preprocessing

For this study, we collected two datasets separately: crude oil futures price data and news headlines. For the crude oil price dataset, the daily closing prices of Brent (LCO) crude oil futures contracts were retrieved from Investing.com for the period from January 4, 2010 to September 17, 2019. For the news headlines dataset, we collected all available news related to “Crude oil” from oilprice.com, one of the largest energy news hubs in the world with over 100,000 daily visitors, for the same period as the crude oil price data. Instead of full news articles, we use news headlines for two reasons. First, news headlines provide a sufficient summary of the key news information. Second, headlines contain much less repetition and fewer irrelevant words than the articles themselves (Nassirtoussi et al., 2015).

We first preprocess the raw news headlines by tokenizing them, converting all text to lowercase, and removing stop words and punctuation.

Convert to lowercase; remove stop words and punctuation.

Stop words are the most common words in a language, such as “the”, “a”, “on”, “all” and “is”. Because stop words and punctuation carry no important information about the text, they are removed during preprocessing.
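The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the paper's code. The regular-expression tokenizer and the short `STOP_WORDS` set are hypothetical stand-ins; a real pipeline would use a full stop-word list.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "on", "all", "is", "to", "of", "in", "as", "and"}


def preprocess(headline):
    """Lowercase, keep runs of letters/digits (dropping punctuation), remove stop words."""
    tokens = re.findall(r"[a-z0-9]+", headline.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("Oil Prices Rise as OPEC Cuts Output!")` yields the tokens `oil`, `prices`, `rise`, `opec`, `cuts`, `output`.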

After removing stop words and punctuation, the “bag-of-words” approach is employed to transform news texts into vectors. In this approach, each document (news headline) is represented by a vector, and each word corresponds to an element of the vector.

Each news headline is treated as a document and represented as a vector.

Each word in each headline corresponds to an element of the vector.

The length of each vector equals the number of distinct words across all news headlines in the dataset, i.e., the vocabulary size.

Vector length = the number of distinct words in the whole headline dataset (vocabulary size).
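A minimal bag-of-words sketch, assuming the headlines have already been tokenized. It follows the usual convention: every vector has one element per word in the corpus vocabulary, so all headline vectors share the same length. `build_vocabulary` and `bag_of_words` are illustrative names, not taken from the paper.

```python
from collections import Counter


def build_vocabulary(docs):
    """Sorted list of distinct tokens across all documents (the corpus vocabulary)."""
    return sorted({tok for doc in docs for tok in doc})


def bag_of_words(doc, vocab):
    """Count vector for one tokenized document, one element per vocabulary word."""
    counts = Counter(doc)
    return [counts[word] for word in vocab]
```

With the headlines `["oil", "prices", "rise"]` and `["oil", "demand", "falls", "oil"]`, the vocabulary is `demand, falls, oil, prices, rise`, and the second headline maps to `[1, 1, 2, 0, 0]`.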

In this study, we also use a common weighting technique, Term Frequency-Inverse Document Frequency (TF-IDF), in the vectorization process to evaluate how important a word is to a specific document in a collection of documents. The importance of a word increases proportionally with the number of times it appears in the document, but decreases with the number of documents in the collection that contain the word. Specifically, the TF-IDF score of word x in a document is calculated according to Eq. (1).
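Eq. (1) itself did not survive in these notes. A common textbook form is given below; it is assumed here and is not necessarily identical to the paper's equation. In it, $n_{x,d}$ is the count of word $x$ in document $d$, $N$ is the number of documents, and $\mathrm{df}(x)$ is the number of documents containing $x$:

```latex
\mathrm{tfidf}(x, d) = \mathrm{tf}(x, d) \times \log \frac{N}{\mathrm{df}(x)},
\qquad
\mathrm{tf}(x, d) = \frac{n_{x,d}}{\sum_{x'} n_{x',d}}
```

A word that appears in every document gets $\log(N/N) = 0$, which matches the description that importance decreases with the number of documents containing the word.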

Compute TF×IDF: evaluates the relative importance of a word.
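The same weighting in plain Python, assuming tokenized, non-empty headlines. It uses relative term frequency times log(N/df), which is one common variant. Libraries such as scikit-learn apply smoothed variants by default, so their numbers will differ. This is a sketch, not the paper's implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Per-document TF-IDF weights: (count / doc length) * log(N / df).

    Assumes every document is a non-empty list of tokens.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n_docs / df[w])
                        for w, c in counts.items()})
    return weights
```

For example, with the two headlines `["oil", "prices", "rise"]` and `["oil", "demand", "falls", "oil"]`, the word `oil` appears in both documents and gets weight 0, while `prices` gets (1/3)·log 2 in the first headline.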

For the crude oil price data, we select the daily returns of Brent crude oil futures contracts and the 7-day volatility as the prediction targets.

Daily log returns and 7-day average volatility: prediction targets.
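A sketch of the two prediction targets, taking daily closing prices as input. The log-return definition is standard. The 7-day volatility is assumed here to be the sample standard deviation of the last seven daily log returns, because the notes do not give the paper's exact definition.

```python
import numpy as np


def log_returns(prices):
    """Daily log returns r_t = ln(P_t / P_{t-1})."""
    p = np.asarray(prices, dtype=float)
    return np.diff(np.log(p))


def rolling_volatility(returns, window=7):
    """Assumed 7-day volatility: sample std (ddof=1) over each trailing window."""
    r = np.asarray(returns, dtype=float)
    return np.array([r[i - window + 1: i + 1].std(ddof=1)
                     for i in range(window - 1, len(r))])
```

The first `window - 1` days have no full window, so for `n` returns the volatility series has `n - window + 1` values.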

Orthogonalization

2.3 Sentiment analysis

In this study, we employ the Sentimentr package in R to calculate the sentiment of each processed news headline. The Sentimentr package returns the polarity score in the range of [−1.0, 1.0] for each document.

A news item is considered positive if its polarity score is above zero; otherwise, it is considered negative. In general, the lower the polarity score, the more negative the news, and the higher the polarity score, the more positive the news.

As pointed out by previous studies, news often has a rather continuous effect on investor sentiment in the actual futures market (Akhtar et al., 2013). That is, public sentiment on a given day is shaped by the combination of news on that day and news from the previous few days, with more recent news being more influential than older news. To account for this, we formulate a cumulative sentiment score (CSS) following Kiritchenko et al. (2014) and Chowdhury et al. (2014). In this study, we assume that any piece of news has a significant impact on investor sentiment for seven days, and that its impact declines exponentially each day after its release, which is consistent with the observed behavior of news impact (Huang et al., 2014).
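A minimal sketch of the positive/negative labeling rule and of a 7-day cumulative score with exponentially declining weights. It assumes one aggregated polarity score per day (for example, the mean over that day's headlines) and a hypothetical decay factor `decay=0.5`. The paper's exact CSS formula and decay rate are not given in these notes.

```python
def polarity_label(score):
    """Positive if the polarity score is above zero, otherwise negative."""
    return "positive" if score > 0 else "negative"


def cumulative_sentiment(daily_scores, window=7, decay=0.5):
    """Hypothetical CSS: CSS_t = sum_{i=0}^{window-1} decay**i * S_{t-i}.

    A day's score contributes for `window` days, with weight shrinking
    by the factor `decay` for each day after its release.
    """
    css = []
    for t in range(len(daily_scores)):
        total = 0.0
        for i in range(min(window, t + 1)):
            total += (decay ** i) * daily_scores[t - i]
        css.append(total)
    return css
```

A single positive shock of 1.0 on day 0 yields scores 1.0, 0.5, 0.25, and so on, through day 6. The shock drops out entirely on day 7, matching the seven-day impact assumption.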

2.4 Data decomposition

According to previous literature, decomposing the original time series into sub-series modes with different economic implications can help neural networks capture its trend and cyclicity (Wang et al., 2014). In this study, we employ variational mode decomposition (VMD) to decompose the daily return and 7-day volatility time series of Brent crude oil. In general, VMD is a non-recursive optimization technique that decomposes the original input signal f(t) into a series of discrete and stationary intrinsic modes u_k through Wiener filtering and the Hilbert transform (Liu et al., 2016). The optimization procedure is as follows (Zhang et al., 2017):

Step 1: Compute the Hilbert transform of each mode u_k to obtain its unilateral (one-sided) frequency spectrum.

Step 2: Shift the frequency spectrum of each mode u_k to baseband by mixing it with an exponential tuned to the mode's estimated center frequency.

Step 3: Estimate the bandwidth of each mode u_k through the H1 Gaussian smoothness of the demodulated signal, i.e., the squared L2-norm of its gradient.
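For reference, Steps 1 to 3 combine into the constrained variational problem of Dragomiretskiy and Zosso (2014), followed by the ADMM updates used to solve it. These are stated from that paper rather than from these notes. Here $\omega_k$ is the center frequency of mode $u_k$, $\delta$ is the Dirac distribution, $*$ denotes convolution, hats denote Fourier transforms, $\alpha$ is the bandwidth penalty, $\lambda$ is the Lagrange multiplier, and $\tau$ is the dual-ascent step:

```latex
\begin{aligned}
&\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2
\quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t), \\
&\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i<k} \hat{u}_i^{n+1}(\omega) - \sum_{i>k} \hat{u}_i^{n}(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha(\omega - \omega_k^{n})^2}, \\
&\omega_k^{n+1} = \frac{\int_0^\infty \omega \, |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}{\int_0^\infty |\hat{u}_k^{n+1}(\omega)|^2 \, d\omega}, \\
&\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \tau \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right).
\end{aligned}
```

The mode update is a Wiener filter centered on $\omega_k$, and the center-frequency update is the power-weighted mean frequency of the mode.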

The optimal solution is obtained using the alternating direction method of multipliers (ADMM) (Hestenes, 1969), and the original input signal f(t) is decomposed into K intrinsic modes.
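A compact numpy sketch of these ADMM iterations, written for illustration and not taken from the paper or any VMD library. It makes several simplifying assumptions. There is no mirror extension of the signal, so boundary effects are ignored. Frequencies are in cycles per sample. Initial center frequencies are spread uniformly. The default `tau=0` means the reconstruction constraint is only approximately enforced. `K=2` and `alpha=2000.0` are illustrative values.

```python
import numpy as np


def vmd(f, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD via ADMM; returns (modes, center_freqs)."""
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.fftfreq(N)                     # bin frequencies in [-0.5, 0.5)
    pos = freqs >= 0                              # Nyquist bin (even N) is treated as negative
    # One-sided (analytic) spectrum: keep DC, double positive bins, drop negatives.
    f_hat = np.fft.fft(f)
    f_hat_plus = np.where(pos, f_hat, 0.0)
    f_hat_plus[freqs > 0] *= 2.0
    u_hat = np.zeros((K, N), dtype=complex)       # mode spectra
    omega = 0.5 / K * np.arange(K)                # initial center frequencies
    lam = np.zeros(N, dtype=complex)              # Lagrange multiplier spectrum
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Other modes: already-updated ones (i < k) and previous ones (i > k).
            others = u_hat.sum(axis=0) - u_hat[k]
            # Wiener-filter update centered on the current center frequency.
            u_hat[k] = (f_hat_plus - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            u_hat[k][~pos] = 0.0
            # Center frequency = power-weighted mean over non-negative frequencies.
            power = np.abs(u_hat[k][pos]) ** 2
            if power.sum() > 0:
                omega[k] = np.sum(freqs[pos] * power) / power.sum()
        # Dual ascent on the constraint sum_k u_k = f (no effect when tau == 0).
        lam = lam + tau * (f_hat_plus - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2)
        if change / max(np.sum(np.abs(u_prev) ** 2), 1e-12) < tol:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, omega
```

For a synthetic signal made of two cosines at 0.02 and 0.2 cycles per sample over 500 samples, `vmd(signal, K=2)` should return center frequencies close to 0.02 and 0.2, and modes that sum back to approximately the original signal. Applied to the return or volatility series, each mode would then be forecast separately.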

2.5 Deep learning forecasting model: BiLSTM
