Elsevier

Expert Systems with Applications

Volume 67, January 2017, Pages 126-139
Forecasting daily stock market return using dimensionality reduction

https://doi.org/10.1016/j.eswa.2016.09.027

Highlights

  • A data mining procedure to forecast daily stock market return is proposed.

  • The raw data includes 60 financial and economic features over a 10-year period.

  • Combining ANNs with PCA gives slightly higher classification accuracy.

  • Combining ANNs with PCA provides significantly higher risk-adjusted profits.

Abstract

In financial markets, it is both important and challenging to forecast the daily direction of the stock market return. Among the few studies that focus on predicting daily stock market returns, the data mining procedures utilized are either incomplete or inefficient, especially when a large number of features are involved. This paper presents a complete and efficient data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. Three mature dimensionality reduction techniques, namely principal component analysis (PCA), fuzzy robust principal component analysis (FRPCA), and kernel-based principal component analysis (KPCA), are applied to the whole data set to simplify and rearrange the original data structure. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data using each of the three dimensionality reduction methods. Artificial neural networks (ANNs) are then used with the thirty-six transformed data sets for classification to forecast the daily direction of future market returns. Moreover, the three dimensionality reduction methods are compared with respect to the natural data set. A group of hypothesis tests is then performed over the classification and simulation results. The tests show that combining the ANNs with PCA gives slightly higher classification accuracy than the other two combinations. They also show that the trading strategies guided by the classification procedure based on PCA and ANNs gain significantly higher risk-adjusted profits than the comparison benchmarks, and slightly higher profits than the strategies guided by the forecasts based on the FRPCA and KPCA models.

Section snippets

Introduction and methodology

Analyzing stock market movements is extremely challenging for both investors and researchers. This is mainly due to the stock market essentially being a dynamic, nonlinear, nonstationary, nonparametric, noisy, and chaotic system (Deboeck, 1994, Yaser and Atiya, 1996). In fact, stock markets are affected by many highly interrelated factors. These factors include: 1) economic variables, such as interest rates, exchange rates, monetary growth rates, commodity prices, and general economic

Data description

The data set utilized for this study involves the daily direction (UP or DOWN) of the closing price of the SPDR S&P 500 ETF (ticker: SPY) as the output, along with 60 financial and economic factors as the potential features. These daily data are collected from 2518 trading days between June 1, 2003 and May 31, 2013. The 60 potential features can be divided into 10 groups, including the SPY return for the current day and three previous days, the relative difference in percentage of the SPY

PCA

A number of linear and nonlinear techniques have been developed to embed high-dimensional data into a lower-dimensional space without much loss of information. Among them, PCA is the most popular unsupervised linear technique for dimensionality reduction. Jolliffe (1986) gives an authoritative and accessible account of this methodology. As one of the earliest multivariate techniques, PCA aims to construct a low-dimensional representation of the data while keeping the maximal variance and

The ANN classifiers

Artificial Neural Networks (ANNs) were invented to mimic the human brain by carefully defining and designing the network architecture, including the number of network layers, the types of connections among the network layers, the numbers of the neurons in each layer, the learning algorithm, the learning rate, weights between neurons, and the various neuron activation functions. ANNs function like a black box that can output prediction or classification results based on the input information.
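The design choices listed above (layers, neurons per layer, learning rate, weights, activation functions) can be made concrete with a small sketch. This is not the network used in the paper: the single hidden layer, the tanh and sigmoid activations, the learning rate, the epoch count, and the synthetic data are all assumptions chosen only for illustration, and the helper names `train_mlp` and `predict_up` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, n_hidden=8, lr=0.5, epochs=500, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent.

    Architecture and hyperparameters are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.5, size=(d, n_hidden))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)       # hidden -> output weights
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                    # hidden activations
        p = sigmoid(H @ W2 + b2)                    # predicted P(UP)
        err = (p - y) / n                           # d(mean cross-entropy)/d(logit)
        grad_W2 = H.T @ err
        grad_b2 = err.sum()
        dA = np.outer(err, W2) * (1.0 - H ** 2)     # backprop through tanh
        grad_W1 = X.T @ dA
        grad_b1 = dA.sum(axis=0)
        W1 -= lr * grad_W1
        b1 -= lr * grad_b1
        W2 -= lr * grad_W2
        b2 -= lr * grad_b2
    return W1, b1, W2, b2

def predict_up(params, X):
    """Return 1 (UP) or 0 (DOWN) for each row of X."""
    W1, b1, W2, b2 = params
    H = np.tanh(X @ W1 + b1)
    return (sigmoid(H @ W2 + b2) >= 0.5).astype(int)

# Synthetic stand-in data (assumption): 400 days, 4 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)

# Chronological split: earlier rows train, later rows test.
X_train, y_train = X[:300], y[:300]
X_test, y_test = X[300:], y[300:]

params = train_mlp(X_train, y_train)
preds = predict_up(params, X_test)
hit_rate = float(np.mean(preds == y_test))  # share of correctly predicted directions
print(f"hit rate on held-out days: {hit_rate:.3f}")
```

The split keeps the rows in order rather than shuffling them, which is the usual precaution with time series so that the classifier is never evaluated on days earlier than those it was trained on.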

An

Use PCA, FRPCA, and KPCA to reduce the dimensionality

Background modeling details for the PCA, FRPCA, and KPCA dimensionality reduction techniques are provided in Sections 3.1, 3.2, and 3.3, respectively. The following sections apply each previously described technique to the datasets being tested.
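As a rough illustration of the linear case only, the sketch below reduces a feature matrix with plain PCA via the singular value decomposition. It assumes standardized features and synthetic data shaped like the study's (2518 days by 60 features); the helper name `pca_reduce` is hypothetical, and FRPCA and KPCA are not shown.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal directions.

    Returns the component scores, the principal directions, and the
    fraction of total variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the principal directions, ordered by singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    scores = Xc @ components.T
    return scores, components, explained[:n_components]

# Synthetic stand-in for the cleaned data (assumption): 60 observed
# features driven by 5 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2518, 5))
mixing = rng.normal(size=(5, 60))
X = latent @ mixing + 0.1 * rng.normal(size=(2518, 60))

# Standardizing before PCA is an assumption of this sketch.
X = (X - X.mean(axis=0)) / X.std(axis=0)

scores, components, ratio = pca_reduce(X, n_components=5)
print(scores.shape)                  # (2518, 5)
print(f"variance kept: {ratio.sum():.3f}")
```

Varying `n_components` yields data sets at different levels of reduction, each of which can then be passed to a classifier in place of the original features.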

Results

The performance of the ANN classifier is measured by the percentage of days on which it correctly predicts the direction of the SPY for the next day. Table 3 includes four sections. The leftmost section lists twelve values; each value is the number of principal components used to generate one of the twelve new data sets for each of the three dimensionality reduction methods. Moreover, each of the twelve numbers is selected from Table 1 according to the

Trading simulation

After using the ANNs to predict the daily SPY direction, it is natural to carry out a trading simulation to see if the higher predictability implies higher profitability. Given that this research study is based on predicting the direction of S&P 500 ETF (SPY) daily returns, we modified the trading strategy for classification models defined by Enke and Thawornwong (2005) as follows:

If $UP_{t+1} = 1$, fully invest in stocks or maintain, and receive the actual stock return for the day $t+1$ (i.e., $SPY_{t+1}$);
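A minimal sketch of how such a rule could be simulated follows. Only the UP branch is stated in this excerpt, so the return earned on days forecast as DOWN is left as a caller-supplied placeholder (`cash_returns`, defaulting to zero); that default, the function names, and the sample numbers are assumptions of the sketch, not the paper's specification.

```python
def simulate_strategy(predicted_up, spy_returns, cash_returns=None):
    """Return the list of daily strategy returns.

    predicted_up[i] is the forecast for day i (1 = UP, otherwise DOWN),
    spy_returns[i] is the realized SPY return on that day.
    cash_returns[i] is what the strategy earns when out of stocks;
    it is a placeholder (assumption) and defaults to 0.0.
    """
    if len(predicted_up) != len(spy_returns):
        raise ValueError("forecasts and returns must have the same length")
    if cash_returns is None:
        cash_returns = [0.0] * len(spy_returns)
    daily = []
    for up, r_spy, r_cash in zip(predicted_up, spy_returns, cash_returns):
        # UP forecast: hold stocks and receive the SPY return for the day.
        daily.append(r_spy if up == 1 else r_cash)
    return daily

def cumulative_return(daily):
    """Compound a sequence of simple daily returns."""
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r
    return growth - 1.0

# Made-up forecasts and returns, for illustration only.
signals = [1, 0, 1, 1, 0]
spy = [0.010, -0.020, 0.005, -0.003, 0.012]

daily = simulate_strategy(signals, spy)
total = cumulative_return(daily)
buy_and_hold = cumulative_return(spy)
print(daily)
print(f"strategy: {total:.6f}  buy-and-hold: {buy_and_hold:.6f}")
```

Comparing `total` with `buy_and_hold` is the simplest way to see whether better direction forecasts translate into better returns; a risk-adjusted comparison would additionally account for the variability of the daily returns.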

Conclusion

This research presents a comprehensive and efficient process for forecasting the daily direction of the stock market return. The process starts with data cleaning and data preprocessing, and concludes with an analysis of forecasting and simulation results. Often, researchers look to apply the simplest set of algorithms to the smallest amount of data while obtaining both the most accurate forecasting results and the highest risk-adjusted profits. To achieve this goal, three dimensionality reduction

References (70)

  • S.H. Chun et al.

    Data mining for financial prediction and trading: Application to single and multiple markets

    Expert Systems with Applications

    (2004)
  • D. Enke et al.

    The use of data mining and neural networks for forecasting stock market returns

    Expert Systems with Applications

    (2005)
  • P.H. Franses et al.

    Additive outliers, GARCH and forecasting volatility

    International Journal of Forecasting

    (1999)
  • E. Guresen et al.

    Using artificial neural network models in stock market index prediction

    Expert Systems with Applications

    (2011)
  • J.V. Hansen et al.

    Data mining of time series using stacked generalizers

    Neurocomputing

    (2002)
  • M. Jensen

    Some anomalous evidence regarding market efficiency

    Journal of Financial Economics

    (1978)
  • Y. Kara et al.

    Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange

    Expert Systems with Applications

    (2011)
  • Y. Kim et al.

    Developing a rule change trading system for the futures market using rough set analysis

    Expert Systems with Applications

    (2016)
  • K.J. Kim et al.

    Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index

    Expert Systems with Applications

    (2000)
  • M. Lam

    Neural network techniques for financial performance prediction: Integrating fundamental and technical analysis

    Decision Support Systems

    (2004)
  • M.T. Leung et al.

    Forecasting stock indices: A comparison of classification and level estimation models

    International Journal of Forecasting

    (2000)
  • S.A. Monfared et al.

    Volatility forecasting using a hybrid GJR-GARCH neural network model

    Procedia Computer Science

    (2014)
  • N. O'Connor et al.

    A neural network approach to predicting stock exchange movements using external factors

    Knowledge-Based Systems

    (2006)
  • E. Oja et al.

    On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix

    Journal of Mathematical Analysis and Applications

    (1985)
  • J. Patel et al.

    Predicting stock market index using fusion of machine learning techniques

    Expert Systems with Applications

    (2015)
  • A.M. Rather et al.

    Recurrent neural network and a hybrid model for prediction of stock returns

    Expert Systems with Applications

    (2015)
  • N. Sarantis

    Nonlinearities, cyclical behavior and predictability in stock markets: International evidence

    International Journal of Forecasting

    (2001)
  • L. Shen et al.

    Applying rough sets to market timing decisions

    Decision Support Systems

    (2004)
  • S. Thawornwong et al.

    The adaptive selection of financial and economic variables for use with artificial neural networks

    Neurocomputing

    (2004)
  • M. Ture et al.

    Comparison of four different time series methods to forecast hepatitis A virus infection

    Expert Systems with Applications

    (2006)
  • B. Vanstone et al.

    An empirical methodology for developing stock market trading systems using artificial neural networks

    Expert Systems with Applications

    (2009)
  • A. Vellido et al.

    Segmentation of the on-line shopping market using neural networks

    Expert Systems with Applications

    (1999)
  • J.Z. Wang et al.

    Forecasting stock indices with back propagation neural network

    Expert Systems with Applications

    (2011)
  • T.N. Yang et al.

    Robust algorithms for principal component analysis

    Pattern Recognition Letters

    (1999)
  • L.A. Zadeh

    Fuzzy sets

    Information and Control

    (1965)