An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm
Introduction
Many studies have sought to explain why irrational decisions tend to be biased in the same direction instead of reflecting rational expectations (De Long, Shleifer, Summers, Waldmann, 1990; Kahneman, Tversky, 1979; Shiller, 2003; Shiller, Fischer, Friedman, 1984). Investor sentiment has become a focus of behavioral finance in recent years (Antoniou, Doukas, Subrahmanyam, 2013; Barber, Odean, 2012; Sun, Najand, Shen, 2016; Tetlock, 2007; Yang, Mo, Zhu, 2014). On the one hand, it directly challenges the assumption that financial market participants are rational; on the other hand, it has inspired researchers to design novel trading strategies that exploit premiums caused by investors’ irrational, sentiment-driven behavior.
The aim of this research is to reveal the intrinsic relationship between investor sentiment and market returns using the inverse reinforcement learning (IRL) approach, and to design an effective trading system based on it. In this study we answer three questions: (i) What is the interaction mechanism between investor sentiment and the financial market? (ii) Is there an inherent mapping between this interaction mechanism and future financial market directions? (iii) Can we take advantage of this interaction mechanism to consistently beat the market? Most of the effort in this field has focused on building direct relationships between investor sentiment proxies and future financial market movements (Bollen, Mao, Zeng, 2011; Kurov, 2010; Yang, Song, Mo, Datta, Deane, 2015), and these relationship indicators are then used to design trading strategies (Yang, Mo, Liu, & Kirilenko, 2017). In this paper, by contrast, we model the financial market dynamics as a Markov decision process and regard investor sentiment as a series of actions taken at different market states. Assuming there exists an intrinsic market reward function governing this process, we extract the reward function using a Gaussian process based inverse reinforcement learning algorithm (Qiao & Beling, 2011) and use the rewards to forecast the direction of future market returns.
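To make the Markov decision process framing concrete, the sketch below turns aligned series of returns, volatility, and sentiment shocks into (state, action) pairs and estimates empirical transition probabilities from them. It is only an illustration under stated assumptions: the bin edges, the use of the shock sign as the action, and the function names `discretize`, `build_trajectory`, and `estimate_transitions` are hypothetical and are not taken from the paper, and the sketch does not implement the Gaussian process based IRL step itself.

```python
from collections import Counter, defaultdict


def discretize(value, thresholds):
    """Return the index of the bin that `value` falls into."""
    return sum(value > t for t in thresholds)


def build_trajectory(returns, volatilities, sentiment_shocks):
    """Turn aligned series into (state, action) pairs.

    State: (return bin, volatility bin). Action: sign of the sentiment shock.
    The bin edges are illustrative assumptions, not the paper's values.
    """
    ret_edges = [-0.001, 0.001]  # down / flat / up
    vol_edges = [0.01]           # low / high
    trajectory = []
    for r, v, s in zip(returns, volatilities, sentiment_shocks):
        state = (discretize(r, ret_edges), discretize(v, vol_edges))
        action = (s > 0) - (s < 0)  # -1, 0, or +1
        trajectory.append((state, action))
    return trajectory


def estimate_transitions(trajectory):
    """Empirical P(next_state | state, action) from consecutive observations."""
    counts = defaultdict(Counter)
    for (state, action), (next_state, _) in zip(trajectory, trajectory[1:]):
        counts[(state, action)][next_state] += 1
    return {
        key: {s2: n / sum(c.values()) for s2, n in c.items()}
        for key, c in counts.items()
    }


# Toy data, made up purely for illustration.
returns = [0.002, -0.003, 0.0005, 0.004, -0.002, 0.002]
volatilities = [0.008, 0.015, 0.009, 0.012, 0.02, 0.008]
shocks = [0.4, -0.6, 0.0, 0.7, -0.2, 0.4]

trajectory = build_trajectory(returns, volatilities, shocks)
transitions = estimate_transitions(trajectory)
```

An IRL method would take observed trajectories of this kind, together with a transition model, as input when inferring a reward over states; the discretization granularity is a design choice that trades state resolution against the number of observations per state.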
The intuition behind this approach rests on the evidence that investor sentiment has a significant influence on financial market movements. When investors are optimistic about the market, they tend to take long positions and contribute to a market boom. When investors are pessimistic about the future of the market, they tend to take short positions and contribute to a market downturn. Investor sentiment also adjusts itself to these market movements when asset values deviate too far from fundamental values. We posit that, for this process, the reward function from the inverse reinforcement learning framework is “the most succinct, robust, and transferable definition of the task” (Abbeel & Ng, 2004), assuming the market as a whole follows a Markov process. In other words, the reward function is a feature that can be extracted from observations of past interactions between investor sentiment and financial market movements. This process naturally filters out the sentiment signals that do not generate market reactions, so the reward function should contain cleaner information about these interactions than the raw sentiment measures, and reward-based signals should therefore improve the quality of the forecast. Based on this intuition, we hypothesize that the market sentiment reward function is a good feature space for predicting future financial market directions, and that a profitable strategy can be designed from the sentiment rewards extracted by this inverse learning process.
In this paper, we use news sentiment from the Thomson Reuters news analytics database as the proxy for investor sentiment toward the general U.S. market. We apply the Gaussian process based inverse reinforcement learning (GPIRL) method (Qiao & Beling, 2011) to past observations of market states and investor sentiment shocks. In this process, a preference graph is constructed to capture the distinctive state transition choices of the Markov decision process. One distinguishing feature of this GPIRL method is that it requires fewer observations than linear IRL methods. It is also less susceptible to observation noise, which is prevalent in financial market data. We then compare the performance of the sentiment-reward signals with other sentiment signals such as the raw sentiment score (Feuerriegel, Prendinger, 2016; Sun, Najand, Shen, 2016; Yang, Mo, Zhu, 2014), sentiment shock (Song, Liu, Yang, 2017; Yang, Song, Mo, Datta, Deane, 2015), and sentiment trend (Song, Liu et al., 2017), using the support vector machine (SVM) method to generate trading signals. The out-of-sample tests show significant improvement over the existing sentiment signals. Moreover, we examine the sensitivity of the sentiment reward-based signal to the choice of machine learning method by comparing Random Forest, Boosting, k-NN (k-nearest neighbors), Decision Tree, and Bagging with SVM. We show that the SVM method combined with the sentiment reward signals outperforms all the other methods, suggesting an optimal sentiment reward-based trading system.
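The out-of-sample signal generation described above can be sketched as a walk-forward loop in which a classifier is fitted on a trailing window of features and then predicts the next direction. The sketch below is a hedged illustration, not the paper's system: it swaps in a small k-NN classifier (one of the comparison methods named above) for the SVM so that it runs on the standard library alone, and the synthetic features, the window length of 20, and the names `knn_predict`, `walk_forward`, and `hit_rate` are all assumptions.

```python
import math
import random


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote (+1 / -1) among the k nearest training points."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    vote = sum(label for _, label in nearest)
    return 1 if vote >= 0 else -1


def walk_forward(features, next_directions, window=20, k=3):
    """Predict each direction using only the preceding `window` observations."""
    signals = []
    for t in range(window, len(features)):
        train_x = features[t - window:t]
        train_y = next_directions[t - window:t]
        signals.append(knn_predict(train_x, train_y, features[t], k))
    return signals


def hit_rate(signals, realized):
    """Fraction of periods in which the predicted direction was correct."""
    return sum(s == r for s, r in zip(signals, realized)) / len(signals)


# Synthetic stand-in data (an assumption): the first feature is weakly
# informative about the next direction, the second is pure noise.
rng = random.Random(0)
next_directions = [rng.choice([-1, 1]) for _ in range(120)]
features = [(d * 0.5 + rng.gauss(0, 0.3), rng.gauss(0, 1.0))
            for d in next_directions]

window = 20
signals = walk_forward(features, next_directions, window=window)
accuracy = hit_rate(signals, next_directions[window:])
```

Here a signal of +1 would correspond to a long position and -1 to a short position for the next period. Replacing `knn_predict` with another classifier leaves the loop unchanged, which is what makes a like-for-like comparison across methods straightforward.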
The major contribution of the paper is a news sentiment based trading system in which the trading signals are generated from the market’s rewards to investor sentiment. We argue that investor sentiment signals are mostly very noisy and that markets often do not react to them. Two sources of noise hinder model forecasting: a) news sentiment is only a proxy for the “true” investor sentiment, which is unobservable, so proxies are deployed to measure it indirectly (Antweiler, Frank, 2004; Barber, Odean, 2012); b) the news sentiment measure is itself noisy, because many expressions may be used in news articles but not all of them accurately describe market conditions (Feuerriegel, Heitzmann, Neumann, 2015; Feuerriegel, & Neumann). Although efforts to filter the noise in sentiment measures show improvement (Song, Almahdi, & Yang, 2017), the effect of direct filtering is rather limited (Song, Liu et al., 2017). In this study we propose an investor sentiment reward-based trading system aimed at extracting only the signals that generate either negative or positive market responses. This filtering is realized by training a supervised learning model on how the market rewards sentiment signals: if the market does not reward certain news sentiment signals, the learned model automatically filters them out. As a result, the model captures only the effect of “true” investor sentiment on the market. Moreover, this reward extraction mechanism is based not only on market returns but also on market volatility, which yields a succinct and robust feature space. To the best of our knowledge, this approach is the first to represent the influence of investor sentiment on market returns in the reward space. It can be broadly applied to trading systems based on other market sentiment proxies.
The rest of the paper is organized as follows. In Section 2, we review the literature on investor sentiment and news sentiment, as well as on machine learning and reinforcement learning based trading strategies. In Section 3, we introduce the Gaussian process based inverse reinforcement learning technique, construct the state space and action space, and present our proposed trading strategy. In Section 4, we compare the performance of this sentiment reward-based trading system with other existing sentiment filtering methods and with some popular passive trading strategies. In the last section, we conclude and highlight the contributions of the paper.
Section snippets
Literature review
In this section we review two strands of related literature to set the background of our work. First, we review the reinforcement learning and inverse reinforcement learning approaches applied to financial market forecasting and trading. Second, we examine recent work that combines investor sentiment and machine learning methods to build trading systems.
Methodology and data
In this section, we propose a sentiment reward-based inverse reinforcement learning approach to estimate the reward function based on past observations. We use a support vector machine (SVM) to classify these rewards into up-trend or down-trend indicators. We also discuss how to estimate the reward function using the Gaussian process based inverse reinforcement learning technique, followed by how to construct the state space and action space in our case, as well as the design of a trading strategy. At
Experiments and discussions
In this section, we perform a number of experiments to evaluate the system parameter choices and assess the subsequent performance of the proposed sentiment reward-based trading system.
We first discuss the training data used in the experiments. As discussed in Section 3.3, we use a moving window of a certain number of hours to learn the rewards, and then we use a number of learned rewards to predict the market direction of the next period under a stationarity assumption. This moving window sample
Conclusion
The main contributions of this study can be summarized as follows: (i) We model the interaction mechanism between investor sentiment and market return using the Gaussian process inverse reinforcement learning method with a preference graph, which fits the situation in which market states respond differently to investor sentiment shocks. (ii) We propose an inverse reinforcement learning method to extract market rewards toward investor sentiment and show that the sentiment rewards provide high quality
References (49)
- et al. Twitter mood predicts the stock market. Journal of Computational Science (2011)
- et al. The hasty wisdom of the mob: How market sentiment predicts stock market behavior. Expert Systems with Applications (2017)
- et al. News-based trading strategies. Decision Support Systems (2016)
- et al. The time-varying nature of social media sentiments in modeling stock returns. Decision Support Systems (2017)
- Investor sentiment and the stock market’s reaction to monetary policy. Journal of Banking & Finance (2010)
- et al. Stop-loss strategies with serial correlation, regime switching, and transaction costs. Journal of Financial Markets (2017)
- et al. Contextual sentiment analysis for social media genres. Knowledge-Based Systems (2016)
- et al. Quantifying stocktwits semantic terms’ trading behavior in financial markets. Expert Systems with Applications: An International Journal (2015)
- et al. The impact of microblogging data for stock market prediction: Using twitter to predict returns, volatility, trading volume and survey sentiment indices. Expert Systems with Applications (2017)
- et al. Negation scope detection in sentiment analysis: decision support for news-driven trading. Decision Support Systems (2016)