A novel data-driven stock price trend prediction system

https://doi.org/10.1016/j.eswa.2017.12.026

Highlights

  • A data-driven stock price trend prediction system is designed and implemented.

  • Models are trained from historical data using random forest with feature selection.

  • Training data are created by unsupervised morphological pattern recognition.

Abstract

This paper proposes a novel stock price trend prediction system that can predict both the stock price movement and the interval of its growth (or decline) rate within predefined prediction durations. It uses an unsupervised heuristic algorithm to cut the raw transaction data of each stock into multiple clips of a predefined fixed length and classifies them into four main classes (Up, Down, Flat, and Unknown) according to the shapes of their close prices. Clips in the Up and Down classes can be further classified into levels reflecting the extent of their growth (or decline) rates with respect to both the close price and the relative return rate. The features of the clips include their prices and technical indices. The prediction models are trained on these clips using a combination of random forests, imbalanced learning, and feature selection. Evaluations on seven years of transaction data from the Shenzhen Growth Enterprise Market (China) show that the proposed system can make effective predictions, is robust to market volatility, and outperforms some existing methods in terms of accuracy and return per trade.
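The abstract does not spell out the heuristic that cuts and labels clips, so the following Python sketch only illustrates the general idea under stated assumptions: clips are consecutive, non-overlapping windows of close prices, and a clip's shape is judged by fitting a least-squares line. The thresholds `flat_tol` and `min_r2`, and all function names, are hypothetical and are not Xuanwu's actual rule.

```python
from statistics import mean


def cut_clips(closes, length):
    """Cut a close-price series into consecutive, non-overlapping clips of a fixed
    length (length >= 2); a trailing remainder shorter than `length` is dropped."""
    return [closes[i:i + length] for i in range(0, len(closes) - length + 1, length)]


def fit_line(ys):
    """Return the least-squares slope and R^2 of ys against time steps 0..n-1."""
    n = len(ys)
    x_bar, y_bar = (n - 1) / 2, mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    slope = sxy / sxx
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    ss_res = sum((y - (y_bar + slope * (x - x_bar))) ** 2 for x, y in enumerate(ys))
    r2 = 1.0 if ss_tot == 0 else 1 - ss_res / ss_tot
    return slope, r2


def label_clip(clip, flat_tol=0.0005, min_r2=0.6):
    """Hypothetical shape rule (not Xuanwu's): Flat if the per-step trend is
    negligible, Up/Down if a clear linear trend fits well, Unknown otherwise."""
    slope, r2 = fit_line(clip)
    rel_slope = slope / clip[0]  # trend per step as a fraction of the first close
    if abs(rel_slope) < flat_tol:
        return "Flat"
    if r2 < min_r2:
        return "Unknown"
    return "Up" if rel_slope > 0 else "Down"


closes = [10, 10.5, 11, 11.5, 12] + [12, 11.5, 11, 10.5, 10] + [10] * 5 + [10, 12, 9, 13, 8]
print([label_clip(c) for c in cut_clips(closes, 5)])  # ['Up', 'Down', 'Flat', 'Unknown']
```

Checking flatness before goodness of fit is deliberate in this sketch: a nearly constant but noisy clip has a low R² simply because there is almost no trend to explain, so it should be labeled Flat rather than Unknown.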

Introduction

Stock price trend prediction is a classic and interesting topic that has attracted many researchers and participants in multiple disciplines such as economics, financial engineering, statistics, operations research, and machine learning. Although much effort has been devoted to it during the past several decades (Abarbanell, Bernard, 1992, Adam, Marcet, Nicolini, 2016, Adebiyi, Adewumi, Ayo, 2014, Blume, Easley, O’hara, 1994, Göçken, Özçalıcı, Boru, Dosdoğru, 2016), accurately forecasting the stock price, or even the direction of its movement, remains difficult, despite the use of some advanced machine learning techniques. For instance, Kim (2003) used support vector machines to predict the direction of daily stock price movements in Korea, obtaining a hit rate of 56%. Schumaker and Chen (2009) incorporated text mining into stock price forecasting, achieving a hit rate of 57%. Tsai and Wang (2009) combined decision trees and neural networks to predict the Taiwan stock market; their hybrid model achieved an accuracy of around 70%. However, the test data sets in these studies were relatively small, covering only dozens of stocks. According to a recent empirical study (Gerlein, McGinnity, Belatreche, & Coleman, 2016), the prediction accuracies of several machine learning models (such as C4.5, K*, and logistic model trees) range from 48% to 54%.

Traditional technical analysts have developed many indices and sequential analytical methods that may reflect trends in stock price movements. However, technical analysis contradicts the efficient-market hypothesis, and its practitioners cannot make generalised inferences about its accuracy. Specifically, the efficient-market hypothesis states that as long as the market is weak-form efficient, the price of a stock follows a random walk (Fama, 1995) and cannot be predicted by analyzing past prices. Meanwhile, prices are affected by many macroeconomic factors, the fundamental factors of companies, and the involvement of public investors. Therefore, a common criticism of technical analysis is that it considers only the transactional data of stocks and completely ignores the fundamental factors of companies (Nassirtoussi, Aghabozorgi, Wah, Ngo, 2014, Patel, Shah, Thakkar, Kotecha, 2015), which might be helpful if the market is only weak-form efficient.
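A toy simulation makes the random-walk argument concrete. Assuming that daily price changes are independent Gaussian draws (a deliberately simplified model, not a claim about real markets), a rule that predicts each move's direction from the previous move should score close to 50%. The function names and parameters below are illustrative.

```python
import random


def simulate_random_walk(n_steps, start=100.0, step_sd=1.0, seed=0):
    """Toy price path whose daily changes are independent Gaussian draws."""
    rng = random.Random(seed)
    prices = [start]
    for _ in range(n_steps):
        prices.append(prices[-1] + rng.gauss(0.0, step_sd))
    return prices


def momentum_hit_rate(prices):
    """Predict that each move has the same direction as the previous move
    and return the fraction of correct direction calls."""
    moves = [b - a for a, b in zip(prices, prices[1:])]
    hits = sum((prev > 0) == (nxt > 0) for prev, nxt in zip(moves, moves[1:]))
    return hits / (len(moves) - 1)


print(round(momentum_hit_rate(simulate_random_walk(20_000)), 3))
```

With independent increments, past prices carry no information about the sign of the next move, so the printed hit rate stays near 0.5 for any seed, up to sampling noise. On real data, a persistent deviation from 0.5 is what a predictive model has to demonstrate.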

The fundamental factors of a company cover many aspects, such as basic financial status, marketing and development strategies, political events, general economic conditions, commodity price indices, interest rate changes, movements of other stock markets, and the expectations and psychology of investors. Comprehensively working out the impact of these compound factors on stock price movements is clearly beyond the capability of human analysts. Researchers have begun to develop text-mining methods that can automatically analyze some of these fundamental factors (Nassirtoussi et al., 2014). For example, Schumaker and Chen (2009) extracted information from breaking financial news to increase prediction accuracy. Bollen, Mao, and Zeng (2011) analyzed investor mood on Twitter to reveal investor sentiment toward some stocks. Ruiz, Hristidis, Castillo, Gionis, and Jaimes (2012) analyzed the correlations between financial time series and micro-blogging activity. Si et al. (2013) proposed a technique that leverages topic-based sentiments from Twitter to facilitate stock market prediction. Even recent deep learning techniques have been introduced for event-driven stock market prediction, where events are extracted from news (Ding, Zhang, Liu, & Duan, 2015). However, automatic fundamental factor analysis has several weaknesses. First, even when messages or reports are released by the companies themselves, public media, or third-party institutes, there is no guarantee that they contain no misleading information. Second, it is not clear how strong the correlation is between the released information and stock price movements. Third, when the market is semi-strong-form or strong-form efficient, fundamental factor analysis cannot even bring excess returns (Timmermann & Granger, 2004).

Fortunately, in today’s big data age, the above issues can be bypassed, as a new line of thought, “let the data speak for themselves”, has been proposed and has drawn increasing attention. Unlike the information obtained from newspapers, micro-blogging, and Twitter, the everyday transaction data recorded in trading systems are genuine records of actual trades. The rapid development of machine learning provides many new opportunities to use these transaction data to predict stock price trends. In fact, applying machine learning to stock prediction has been studied for over thirty years. Early studies in the 1990s mainly focused on using neural networks to make predictions (Schöneburg, 1990, Zhang, Patuwo, Hu, 1998), which partially refuted the validity of the efficient-market hypothesis (Lawrence, 1997). For example, Tsibouris and Zeidenberg (1995) used neural networks to predict stock prices based only on past prices. The performance of these early methods was usually poor because of the limited size of the neural networks. To address this issue, some recent studies resort to the fusion or combination of models (Hadavandi, Shavandi, Ghanbari, 2010, Tsai, Hsiao, 2010) and ensemble learning (Ballings, Van den Poel, Hespeels, Gryp, 2015, Barak, Arjmand, Ortobelli, 2017, Tsai, Lin, Yen, Chen, 2011). All of the above studies share a common weakness: their practical applicability remains questionable. They trained and tested models on small amounts of carefully selected and labeled stock data. Since such data do not cover all stocks and their movements in a stock market, the generalisation capability of the models is reduced in real applications.

A real stock market carries out a huge number of transactions every day. A real-world computer-aided decision system cannot be expected to rely heavily on humans to select and label the data used for model training. Unsupervised pattern recognition is becoming increasingly important in today’s big data age (Wu, Zhu, Wu, & Ding, 2014). If the problem of automatic data preprocessing cannot be solved, the system can hardly be put into real use, even if the learning algorithms inside are advanced. In this paper, we propose a novel data-driven stock price trend prediction system, Xuanwu.¹ The contribution of Xuanwu is three-fold: (1) it introduces unsupervised pattern recognition methods to generate training samples from raw transaction data without any human intervention; (2) it is designed for real-world use, with multiple learning models trained to meet prediction goals derived from actual user requirements, and with an application interface designed for maximum availability, suitable for any stock and any prediction duration; (3) it provides a simple, easy-to-test framework into which different supervised learning models and feature selection methods can be easily integrated. Experimental results show that the proposed system outperforms some state-of-the-art methods in stock movement prediction, even though the models of the compared methods are trained with carefully human-labeled samples.
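Contribution (3) describes a framework into which different learners and feature selectors can be plugged. One way to express such a contract in Python is with `typing.Protocol`. The interface names, the `Pipeline` class, and the two trivial components below are hypothetical illustrations, not Xuanwu's actual API (the system builds on WEKA, which is written in Java).

```python
from collections import Counter
from typing import List, Protocol, Sequence

Matrix = List[Sequence[float]]


class FeatureSelector(Protocol):
    def fit(self, X: Matrix, y: List[str]) -> List[int]:
        """Return the indices of the features to keep."""
        ...


class Classifier(Protocol):
    def fit(self, X: Matrix, y: List[str]) -> None: ...
    def predict(self, X: Matrix) -> List[str]: ...


class KeepAll:
    """Trivial selector: keeps every feature."""
    def fit(self, X: Matrix, y: List[str]) -> List[int]:
        return list(range(len(X[0])))


class MajorityClass:
    """Trivial baseline: always predicts the most frequent training label."""
    def fit(self, X: Matrix, y: List[str]) -> None:
        self.label = Counter(y).most_common(1)[0][0]

    def predict(self, X: Matrix) -> List[str]:
        return [self.label for _ in X]


class Pipeline:
    """Runs any selector, then trains any classifier on the selected columns."""
    def __init__(self, selector: FeatureSelector, model: Classifier):
        self.selector, self.model = selector, model
        self.keep: List[int] = []

    def _project(self, X: Matrix) -> Matrix:
        return [[row[i] for i in self.keep] for row in X]

    def fit(self, X: Matrix, y: List[str]) -> "Pipeline":
        self.keep = self.selector.fit(X, y)
        self.model.fit(self._project(X), y)
        return self

    def predict(self, X: Matrix) -> List[str]:
        return self.model.predict(self._project(X))


pipe = Pipeline(KeepAll(), MajorityClass()).fit([[1, 2], [3, 4], [5, 6]], ["Up", "Up", "Down"])
print(pipe.predict([[0, 0]]))  # ['Up']
```

Because `Pipeline` depends only on the `fit`/`predict` signatures, a different selector or classifier can be swapped in without changes elsewhere.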

The remainder of the paper is organized as follows. Section 2 describes the requirements from the users’ perspective. Section 3 illustrates the architecture of the proposed system. Section 4 describes our unsupervised method for generating training samples. Section 5 presents our learning method in detail. Section 6 presents the experimental results and discusses them. Section 7 concludes the paper and outlines future work.

Section snippets

Requirements

Xuanwu assumes that investment activities carried out on the basis of its prediction results do not have a far-reaching impact on future stock price movements. Therefore, it is especially suitable for small startup investment companies that collect money from a small population and return the profits in fixed contract periods, and whose investment volumes usually do not cause obvious market fluctuations. Otherwise, the prediction will

System architecture

The system architecture of Xuanwu is illustrated in Fig. 1. The main components of the system include a model training tool, a fundamental data module, a prediction API and a visualization tool (XwExplorer).
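The component list suggests that the prediction API serves models produced by the model training tool. A minimal sketch, assuming one trained model per prediction duration (measured in trading days) and treating a model as any callable from a feature vector to a class label, might look like the following. `PredictionAPI`, `register`, the stock code `"STOCK_A"`, and the returned dictionary layout are all hypothetical.

```python
from typing import Callable, Dict, List

Features = List[float]
Model = Callable[[Features], str]


class PredictionAPI:
    """Hypothetical front end: one trained model per prediction duration
    (in trading days), selected when a stock is queried."""

    def __init__(self) -> None:
        self.models: Dict[int, Model] = {}

    def register(self, duration: int, model: Model) -> None:
        self.models[duration] = model

    def predict(self, stock_code: str, duration: int, features: Features) -> dict:
        if duration not in self.models:
            raise KeyError(f"no model trained for a {duration}-day duration")
        return {"stock": stock_code, "duration": duration,
                "trend": self.models[duration](features)}


api = PredictionAPI()
# A stand-in "model": Up if the last feature exceeds the first one.
api.register(5, lambda f: "Up" if f[-1] > f[0] else "Down")
print(api.predict("STOCK_A", 5, [10.0, 10.4]))
# {'stock': 'STOCK_A', 'duration': 5, 'trend': 'Up'}
```

Keeping models keyed by duration lets the training tool add a new prediction horizon without changing how callers query the API.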

Training sample generation

In this section, we present the training sample generation scheme of Xuanwu in detail.
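According to the abstract, clips in the Up and Down classes are further graded by the extent of their growth (or decline) with respect to both the close price and the relative return rate. The sketch below assumes that the relative return rate is the stock's growth rate minus that of a benchmark index over the same clip, and that levels come from fixed bin edges on the absolute rate. Both the definition and the edges `(0.05, 0.10, 0.20)` are hypothetical placeholders.

```python
from bisect import bisect_right


def growth_rate(clip):
    """Growth (positive) or decline (negative) of the close price over a clip."""
    return clip[-1] / clip[0] - 1


def relative_return(clip, index_clip):
    """Assumed definition: the stock's growth rate minus the benchmark
    index's growth rate over the same trading days."""
    return growth_rate(clip) - growth_rate(index_clip)


def level(rate, edges=(0.05, 0.10, 0.20)):
    """Map |rate| to a level 0..len(edges); the bin edges are placeholders."""
    return bisect_right(edges, abs(rate))


up_clip, index_clip = [10.0, 10.6, 11.5], [100.0, 104.0, 108.0]
print(level(growth_rate(up_clip)), level(relative_return(up_clip, index_clip)))  # 2 1
```

In this example the stock rises 15% (level 2), but the index also rises 8%, so the relative return of about 7% falls into level 1.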

Model training

The current version of Xuanwu partly relies on the off-the-shelf classification algorithms implemented in WEKA (Hall et al., 2009) to build learning models; among these, we find that random forest models (Breiman, 2001) perform well. To further improve the performance of the learned classifiers, we focus on two issues in model training: imbalanced class distribution and feature selection.
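The excerpt names the two issues but not the specific techniques used to address them. As stand-ins, the Python sketch below uses random oversampling of minority classes for the imbalance problem and ranks features by their best single-threshold information gain for feature selection. Both choices and all function names are assumptions; the balanced, reduced data would then be passed to a random forest learner, such as the one in WEKA.

```python
import math
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Resample minority-class rows with replacement until every class has as
    many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        X_out.extend(rows + extra)
        y_out.extend([label] * target)
    return X_out, y_out


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, y, threshold):
    """Information gain of splitting one numeric feature at `threshold`."""
    left = [lab for v, lab in zip(column, y) if v <= threshold]
    right = [lab for v, lab in zip(column, y) if v > threshold]
    if not left or not right:
        return 0.0
    n = len(y)
    return entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)


def select_top_k(X, y, k):
    """Score each feature by its best single-threshold information gain and
    return the indices of the k highest-scoring features, in column order."""
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        scores.append((max(info_gain(column, y, t) for t in set(column)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


X = [[0.1, 5.0, 1.0], [0.2, 3.0, 1.0], [0.9, 4.0, 1.0], [0.8, 6.0, 1.0]]
y = ["Down", "Down", "Up", "Up"]
print(select_top_k(X, y, 1))  # [0]
```

In practice, oversampling should be applied only to the training split, so that duplicated rows never leak into the evaluation data.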

Evaluation

We evaluate Xuanwu on 495 stocks in the Shenzhen Growth Enterprise Market in China. The transaction data of these stocks span the period from January 25, 2010 to October 1, 2016.
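The abstract reports accuracy and return per trade, but this excerpt does not define how trades are simulated. A minimal sketch, assuming a position is opened at the start of every clip predicted Up and closed at its end, with no transaction costs, is shown below. The trading rule, function names, and sample numbers are illustrative only and are not results from the paper.

```python
def accuracy(predicted, actual):
    """Fraction of clips whose predicted class equals the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def return_per_trade(predicted, realized_returns):
    """Assumed trading rule: buy at the start of every clip predicted 'Up',
    sell at its end, ignore costs, and average the realized returns."""
    trades = [r for p, r in zip(predicted, realized_returns) if p == "Up"]
    return sum(trades) / len(trades) if trades else 0.0


predicted = ["Up", "Down", "Up", "Flat"]
actual = ["Up", "Down", "Down", "Flat"]
realized = [0.08, -0.05, -0.02, 0.01]  # made-up clip returns for illustration
print(accuracy(predicted, actual), round(return_per_trade(predicted, realized), 4))  # 0.75 0.03
```

The two metrics can disagree: a model can be accurate on Flat and Down clips yet earn little per trade if its Up calls are the ones it gets wrong.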

Conclusion and future work

For small startup investment companies, limited funds make it impossible to trade in the stock market frequently. Instead, they are interested in moderate investment periods lasting from one week to three months. To address the prediction of stock price trends over such periods, this paper proposes a novel data-driven system, Xuanwu. The system covers the entire machine learning process, from generating training samples from the original transaction data to building the prediction models

Acknowledgements

This research has been supported by the National Natural Science Foundation of China under Grant No. 61603186, the Natural Science Foundation of Jiangsu Province, China, under Grant No. BK20160843, the China Postdoctoral Science Foundation under Grant No. 2016M590457 and 2017T100370, and the Science Foundation of the Science and Technology Commission of the Central Military Commission (Youth Project), China.

References (42)

  • K.-j. Kim

    Financial time series forecasting using support vector machines

    Neurocomputing

    (2003)
  • X. Lin et al.

    Short-term stock price prediction based on echo state networks

    Expert Systems with Applications

    (2009)
  • L.-P. Ni et al.

    Stock trend prediction based on fractal feature selection and support vector machine

    Expert Systems with Applications

    (2011)
  • J. Patel et al.

    Predicting stock and stock price index movement using trend deterministic data preparation and machine learning techniques

    Expert Systems with Applications

    (2015)
  • E. Schöneburg

    Stock price prediction using neural networks: A project report

    Neurocomputing

    (1990)
  • A. Timmermann et al.

    Efficient market hypothesis and forecasting

    International Journal of Forecasting

    (2004)
  • C.-F. Tsai et al.

    Combining multiple feature selection methods for stock prediction: Union, intersection, and multi-intersection approaches

    Decision Support Systems

    (2010)
  • C.-F. Tsai et al.

    Predicting stock returns by classifier ensembles

    Applied Soft Computing

    (2011)
  • J.-L. Wang et al.

    Stock market trading rule discovery using pattern recognition and technical analysis

    Expert Systems with Applications

    (2007)
  • G. Zhang et al.

    Forecasting with artificial neural networks: The state of the art

    International Journal of Forecasting

    (1998)
  • J.S. Abarbanell et al.

    Tests of analysts’ overreaction/underreaction to earnings information as an explanation for anomalous stock price behavior

    The Journal of Finance

    (1992)

¹ The source code of the system is available at https://sourceforge.net/p/xuanwu/svn/.
