A novel data-driven stock price trend prediction system☆
Introduction
Stock price trend prediction is a classic and interesting topic that has attracted researchers and participants from multiple disciplines, including economics, financial engineering, statistics, operations research, and machine learning. Although much effort has been devoted to it over the past several decades (Abarbanell & Bernard, 1992; Adam, Marcet, & Nicolini, 2016; Adebiyi, Adewumi, & Ayo, 2014; Blume, Easley, & O’hara, 1994; Göçken, Özçalıcı, Boru, & Dosdoğru, 2016), accurately forecasting the stock price, or even just its direction of movement, remains difficult, even with advanced machine learning techniques. For instance, Kim (2003) used support vector machines to predict the direction of daily stock price movements in Korea, obtaining a hit rate of 56%. Schumaker and Chen (2009) incorporated text mining into stock price forecasting, achieving a hit rate of 57%. Tsai and Wang (2009) combined decision trees and neural networks to make predictions for the Taiwan stock market; the accuracy of the hybrid model is around 70%. However, their test data sets were relatively small, including only dozens of stocks. According to a recent empirical study (Gerlein, McGinnity, Belatreche, & Coleman, 2016), the prediction accuracies of several machine learning models (such as C4.5, K*, and logistic model trees) are in the range of 48% to 54%.
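To make the hit rates quoted above concrete, the following minimal sketch computes a hit rate as the fraction of periods in which the predicted direction matches the realized direction. The function names and the price series are illustrative assumptions and are not taken from any of the cited studies, which may define their metrics differently (for example, in how flat days are treated).

```python
def directions(prices):
    """Map consecutive closing prices to +1 (up) or -1 (down or flat)."""
    return [1 if later > earlier else -1
            for earlier, later in zip(prices, prices[1:])]


def hit_rate(predicted, actual):
    """Fraction of periods whose predicted direction equals the realized one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("need at least one prediction")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(predicted)


# Made-up closing prices, for illustration only.
closes = [10.0, 10.2, 10.1, 10.4, 10.3]
actual = directions(closes)      # realized moves: up, down, up, down
predicted = [1, 1, 1, -1]        # hypothetical model output
print(hit_rate(predicted, actual))
```

Under this definition, a model that guesses at random on a balanced series would be expected to score about 50%, which is why hit rates in the mid-50s are reported as modest gains.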
Traditional technical analysts have developed many indices and sequential analytical methods that may reflect trends in stock price movements. However, technical analysis contradicts the efficient-market hypothesis, and technical analysts cannot make generalised inferences about its accuracy. For example, the efficient-market hypothesis states that as long as the market is weak-form efficient, the price of a stock follows the random walk model (Fama, 1995) and cannot be predicted by analyzing past prices. Meanwhile, prices are affected by many macroeconomic factors, the fundamental factors of companies, and the involvement of public investors. Therefore, one criticism of technical analysis is that it considers only the transactional data of stocks and completely ignores the fundamental factors of companies (Nassirtoussi, Aghabozorgi, Wah, & Ngo, 2014; Patel, Shah, Thakkar, & Kotecha, 2015), which might be helpful if the market is weak-form efficient.
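For reference, the random walk model mentioned above can be written in one common textbook form. The notation below is an assumption introduced for illustration (drift is omitted) and is not taken from the paper:

```latex
P_t = P_{t-1} + \varepsilon_t,
\qquad \varepsilon_t \ \text{i.i.d.},\quad \mathbb{E}[\varepsilon_t] = 0,
\qquad\Longrightarrow\qquad
\mathbb{E}\left[P_t \mid P_{t-1}, P_{t-2}, \dots\right] = P_{t-1}.
```

Here $P_t$ is the price at time $t$ and $\varepsilon_t$ is the unpredictable price change. Under this form, the best forecast of the next price given the whole price history is simply the current price, which is the sense in which past prices carry no predictive information.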
The fundamental factors of a company cover many aspects, such as basic financial status, marketing and development strategies, political events, general economic conditions, commodity price indices, interest rate changes, movements of other stock markets, and the expectations and psychology of investors. Comprehensively working out the impact of these compound factors on stock price movements is clearly beyond the capability of human analysts. Researchers have begun to develop text-mining-based methods that can automatically analyze some of these fundamental factors (Nassirtoussi et al., 2014). For example, Schumaker and Chen (2009) extracted information from breaking financial news to increase prediction accuracy. Bollen, Mao, and Zeng (2011) analyzed the mood of investors on Twitter to reveal the sentiments of investors toward some stocks. Ruiz, Hristidis, Castillo, Gionis, and Jaimes (2012) analyzed the correlations between financial time series and micro-blogging activities. Si et al. (2013) proposed a technique that leverages topic-based sentiments from Twitter to facilitate stock market prediction. Even recent deep learning techniques have been introduced for event-driven stock market prediction, where events are extracted from news (Ding, Zhang, Liu, & Duan, 2015). However, automatic fundamental factor analysis has some weaknesses. First, even though the messages or reports are released by the companies, public media, or third-party institutes, there is no guarantee that they are free of misleading information. Second, it is not very clear how strong the correlation is between the released information and stock price movements. Third, when the market is semi-strong-form or strong-form efficient, fundamental factor analysis cannot even bring excess returns (Timmermann & Granger, 2004).
Fortunately, in today’s big data age, the above issues can be bypassed, as a new line of thought, “let the data speak for themselves”, has been proposed and is drawing increasing attention. Unlike the information obtained from newspapers, micro-blogging, and Twitter, the everyday transaction data recorded in trading systems are absolutely realistic. The rapid development of machine learning provides many new opportunities to use these transaction data to predict the trend of stock price movements. In fact, applying machine learning to stock prediction has been studied for over thirty years. The early studies in the 1990s mainly focused on using neural networks to make predictions (Schöneburg, 1990; Zhang, Patuwo, & Hu, 1998), which partially refuted the validity of the efficient market hypothesis (Lawrence, 1997). For example, Tsibouris and Zeidenberg (1995) used neural networks to predict stock prices based only on past stock prices. The performance of these early methods was usually poor because of the limited size of the neural networks. To address this issue, some recent studies resort to the fusion or combination of models (Hadavandi, Shavandi, & Ghanbari, 2010; Tsai & Hsiao, 2010) and to ensemble learning (Ballings, Van den Poel, Hespeels, & Gryp, 2015; Barak, Arjmand, & Ortobelli, 2017; Tsai, Lin, Yen, & Chen, 2011). All of the above studies share a common weak point: their practical applicability is still questionable. They used a small amount of carefully selected and labeled stock data to train and test their models. Since the data do not cover all stocks and their movements in a stock market, the generalisation capabilities of the models are reduced in real applications.
A real stock market carries out a huge number of transactions every day. We cannot expect a real-world computer-aided decision system to rely heavily on humans selecting and labeling the data used for model training. Unsupervised pattern recognition is becoming more and more important in today’s big data age (Wu, Zhu, Wu, & Ding, 2014). If the problem of automatic data preprocessing cannot be solved, the system can hardly be put into real use, even if the learning algorithms inside it are advanced. In this paper, we propose a novel data-driven stock price trend prediction system, Xuanwu. The contribution of Xuanwu is three-fold: (1) it introduces unsupervised pattern recognition methods to generate training samples from raw transaction data without any human intervention; (2) it is a system for real use, in which multiple learning models are trained to meet prediction goals derived from actual user requirements, and its application interface is as widely applicable as possible, accepting any stock and any prediction duration; (3) it provides a simple and easy-to-test framework into which different supervised learning models and feature selection methods can be easily integrated. Experimental results show that the proposed system outperforms some state-of-the-art methods in stock movement prediction, even though the models of the compared methods are trained with carefully human-labeled samples.
The remainder of the paper is organized as follows. Section 2 describes the requirements from the users' perspective. Section 3 illustrates the architecture of the proposed system. Section 4 describes our unsupervised method for generating training samples. Section 5 presents our learning method in detail. Section 6 reports the experimental results and discusses them. Section 7 concludes the paper and points out some future work.
Requirements
Xuanwu assumes that the actual investment activities carried out on the basis of the system's predictions do not have a far-reaching impact on future stock price movements. Therefore, it is especially suitable for small startup investment companies that collect money from a small population and return the profits in fixed contract periods, and whose investment volumes usually do not cause obvious market fluctuations. Otherwise, the prediction will
System architecture
The system architecture of Xuanwu is illustrated in Fig. 1. The main components of the system include a model training tool, a fundamental data module, a prediction API and a visualization tool (XwExplorer).
Training sample generation
In this section, we will present our training sample generation scheme in Xuanwu in detail.
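The scheme itself is not reproduced in this excerpt, so the sketch below does not show the authors' method. It only illustrates, under stated assumptions, the general idea of deriving labeled samples from raw closing prices without manual labeling: each day is labeled by the relative price change over a fixed look-ahead horizon. The function name, the `horizon` and `threshold` parameters, and the three-way labels are all hypothetical.

```python
def label_by_forward_return(closes, horizon, threshold):
    """Label each day by the relative price change `horizon` days ahead.

    Returns a list of (day_index, label) pairs, where label is "up",
    "down", or "flat". Both parameters are illustrative assumptions.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    samples = []
    # The last `horizon` days have no look-ahead price and are skipped.
    for i in range(len(closes) - horizon):
        change = (closes[i + horizon] - closes[i]) / closes[i]
        if change >= threshold:
            label = "up"
        elif change <= -threshold:
            label = "down"
        else:
            label = "flat"
        samples.append((i, label))
    return samples


# Made-up closing prices, for illustration only.
closes = [10.0, 10.5, 11.0, 10.8, 10.0, 10.1]
for day, label in label_by_forward_return(closes, horizon=2, threshold=0.05):
    print(day, label)
```

A fixed-horizon rule like this is the simplest possible choice. A pattern-recognition approach such as the one the paper describes would presumably identify rising and falling segments from the data rather than from a preset horizon.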
Model training
The current version of Xuanwu partially relies on the off-the-shelf classification algorithms implemented in WEKA (Hall et al., 2009) to build learning models, among which we find that the models trained with random forests (Breiman, 2001) perform well. To further improve the performance of the learned classifiers, we focus on two issues in model training: imbalanced class distribution and feature selection.
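This excerpt does not say how Xuanwu handles class imbalance. As a generic illustration only, the sketch below shows random oversampling, one common remedy, in which minority-class samples are duplicated at random until every class is as large as the largest one. The function name and the toy data are assumptions. It is written in plain Python rather than with WEKA so that it stays self-contained.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally sized.

    Generic random oversampling; not necessarily what Xuanwu does.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(members) for members in by_class.values())

    out_samples, out_labels = [], []
    for label, members in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for sample in members + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy feature vectors: four "flat" samples and one "up" sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = ["flat", "flat", "flat", "flat", "up"]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only. Duplicating samples before splitting would leak copies of training samples into the test set and inflate the measured accuracy.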
Evaluation
We evaluate our proposed system Xuanwu on 495 stocks listed on the Shenzhen Growth Enterprise Market in China. The transaction data of these stocks fall within the period from January 25, 2010 to October 1, 2016.
Conclusion and future work
For small startup investment companies, limited funds make it impossible to trade in the stock market frequently. Instead, they are interested in moderate investment periods that last from a week to three months. To address the prediction of the stock price trend over such periods, this paper proposes a novel data-driven system, Xuanwu. The system covers all machine learning processes, from generating training samples from the original transaction data to building the prediction models
Acknowledgements
This research has been supported by the National Natural Science Foundation of China under Grant No. 61603186, the Natural Science Foundation of Jiangsu Province, China, under Grant No. BK20160843, the China Postdoctoral Science Foundation under Grant No. 2016M590457 and 2017T100370, and the Science Foundation of the Science and Technology Commission of the Central Military Commission (Youth Project), China.
References (42)
- et al. A dynamic trading rule based on filtered flag pattern recognition for stock market price forecasting. Expert Systems with Applications (2017)
- et al. Evaluating multiple classifiers for stock price direction prediction. Expert Systems with Applications (2015)
- et al. Fusion of multiple diverse predictors in stock market. Information Fusion (2017)
- et al. Twitter mood predicts the stock market. Journal of Computational Science (2011)
- et al. Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data. Expert Systems with Applications (2015)
- et al. Feature selection for classification. Intelligent Data Analysis (1997)
- et al. Evaluating machine learning classification for financial trading: An empirical approach. Expert Systems with Applications (2016)
- et al. Integrating metaheuristics and artificial neural networks for improved stock price prediction. Expert Systems with Applications (2016)
- et al. Integration of genetic fuzzy systems and artificial neural networks for stock price forecasting. Knowledge-Based Systems (2010)
- et al. Feature selection and parameter optimization of a fuzzy-based stock selection model using genetic algorithms. International Journal of Fuzzy Systems (2012)
- Financial time series forecasting using support vector machines. Neurocomputing
- Short-term stock price prediction based on echo state networks. Expert Systems with Applications
- Stock trend prediction based on fractal feature selection and support vector machine. Expert Systems with Applications
- Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques. Expert Systems with Applications
- Stock price prediction using neural networks: A project report. Neurocomputing
- Efficient market hypothesis and forecasting. International Journal of Forecasting
- Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches. Decision Support Systems
- Predicting stock returns by classifier ensembles. Applied Soft Computing
- Stock market trading rule discovery using pattern recognition and technical analysis. Expert Systems with Applications
- Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting
- Tests of analysts’ overreaction/underreaction to earnings information as an explanation for anomalous stock price behavior. The Journal of Finance
☆ The source code of the system is available at https://sourceforge.net/p/xuanwu/svn/.