Forecasting daily stock market return using dimensionality reduction

doi:10.1016/j.eswa.2016.09.027

Expert Systems with Applications

Volume 67, January 2017, Pages 126-139

https://doi.org/10.1016/j.eswa.2016.09.027

Highlights

• A data mining procedure to forecast daily stock market return is proposed.
• The raw data includes 60 financial and economic features over a 10-year period.
• Combining ANNs with PCA gives slightly higher classification accuracy.
• Combining ANNs with PCA provides significantly higher risk-adjusted profits.

Abstract

In financial markets, forecasting the daily direction of the stock market return is both important and challenging. Among the few studies that focus on predicting daily stock market returns, the data mining procedures used are either incomplete or inefficient, especially when a large number of features is involved. This paper presents a complete and efficient data mining process to forecast the daily direction of the S&P 500 Index ETF (SPY) return based on 60 financial and economic features. Three mature dimensionality reduction techniques, principal component analysis (PCA), fuzzy robust principal component analysis (FRPCA), and kernel-based principal component analysis (KPCA), are applied to the whole data set to simplify and rearrange the original data structure. Corresponding to different levels of dimensionality reduction, twelve new data sets are generated from the entire cleaned data set with each of the three methods. Artificial neural networks (ANNs) are then used with the thirty-six transformed data sets for classification to forecast the daily direction of future market returns. Moreover, the three dimensionality reduction methods are compared with respect to the natural data set. A group of hypothesis tests is then performed over the classification and simulation results. The tests show that combining the ANNs with PCA gives slightly higher classification accuracy than the other two combinations. They also show that the trading strategies guided by the classification procedures based on PCA and ANNs gain significantly higher risk-adjusted profits than the comparison benchmarks, and slightly higher profits than the strategies guided by the forecasts based on the FRPCA and KPCA models.

Section snippets

Introduction and methodology

Analyzing stock market movements is extremely challenging for both investors and researchers, mainly because the stock market is essentially a dynamic, nonlinear, nonstationary, nonparametric, noisy, and chaotic system (Deboeck, 1994; Yaser and Atiya, 1996). In fact, stock markets are affected by many highly interrelated factors. These factors include: 1) economic variables, such as interest rates, exchange rates, monetary growth rates, commodity prices, and general economic

Data description

The data set utilized for this study involves the daily direction (UP or DOWN) of the closing price of the SPDR S&P 500 ETF (ticker: SPY) as the output, along with 60 financial and economic factors as the potential features. These daily data are collected from 2518 trading days between June 1, 2003 and May 31, 2013. The 60 potential features can be divided into 10 groups, including the SPY return for the current day and three previous days, the relative difference in percentage of the SPY
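As a minimal sketch of how the UP/DOWN output and a daily return feature could be derived from a series of closing prices, consider the following. The function names are illustrative, and treating an unchanged close as DOWN is an assumption made here; the snippet above does not say how ties are handled.

```python
def simple_returns(closes):
    """Day-over-day simple returns; the result has one fewer element than closes."""
    return [(curr - prev) / prev for prev, curr in zip(closes, closes[1:])]


def daily_direction(closes):
    """Label each day after the first as 1 (UP) or 0 (DOWN).

    Assumption: a close equal to the previous close is labeled DOWN.
    """
    return [1 if curr > prev else 0 for prev, curr in zip(closes, closes[1:])]


# Made-up closing prices, for illustration only.
closes = [100.0, 101.5, 101.0, 101.0, 102.2]
print(daily_direction(closes))  # [1, 0, 0, 1]
```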

PCA

A number of linear and nonlinear techniques have been developed to embed high-dimensional data into a lower-dimensional space without much loss of information. Among them, PCA is the most popular unsupervised linear technique for dimensionality reduction. Jolliffe (1986) gives an authoritative and accessible account of this methodology. As one of the earliest multivariate techniques, PCA aims to construct a low-dimensional representation of the data while keeping the maximal variance and
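The idea of projecting onto the directions of maximal variance can be sketched with a singular value decomposition of the centered data. This is a generic textbook PCA written with NumPy, not the paper's implementation; the function name, the synthetic data, and the choice to center without rescaling the features are all assumptions.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project X (rows = observations) onto its first n_components principal axes."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    variances = S ** 2 / (X.shape[0] - 1)        # variance along each direction
    ratio = variances / variances.sum()          # share of total variance
    scores = Xc @ components.T                   # low-dimensional representation
    return scores, components, ratio[:n_components]


# Synthetic data with one strongly correlated pair of features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

scores, components, ratio = pca_fit_transform(X, 2)
print(scores.shape)  # (200, 2)
```

Varying `n_components` gives reduced data sets at different levels of dimensionality reduction, and `ratio` reports how much of the total variance each retained component carries.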

The ANN classifiers

Artificial neural networks (ANNs) were invented to mimic the human brain by carefully defining and designing the network architecture, including the number of network layers, the types of connections among the layers, the number of neurons in each layer, the learning algorithm, the learning rate, the weights between neurons, and the various neuron activation functions. ANNs function like a black box that outputs prediction or classification results based on the input information.
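To make the architectural terms concrete, the sketch below computes the forward pass of a network with one hidden layer and sigmoid activations. The layer sizes, weights, and the 0.5 decision threshold are illustrative assumptions, not the configuration used in the paper, and no training step is shown.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass: inputs -> one hidden layer -> a single output in (0, 1)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


def classify(p, threshold=0.5):
    """Map the network output to 1 (UP) or 0 (DOWN); the threshold is an assumption."""
    return 1 if p >= threshold else 0


# Two inputs, two hidden neurons, one output neuron; weights are made up.
p = forward(
    x=[0.5, -1.0],
    hidden_weights=[[0.2, -0.4], [0.7, 0.1]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.5, -2.0],
    output_bias=0.3,
)
print(classify(p))
```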

Use PCA, FRPCA, and KPCA to reduce the dimensionality

Background modeling details for the PCA, FRPCA, and KPCA dimensionality reduction techniques are provided in Sections 3.1, 3.2, and 3.3, respectively. The following sections apply each previously described technique to the datasets being tested.

Results

The performance of the ANN classifier is measured by the percentage of days on which it correctly predicts the direction of the SPY for the next day. Table 3 includes four sections. The leftmost section lists twelve values; each value is the number of principal components used to generate one of the twelve new data sets for each of the three dimensionality reduction methods. Moreover, each of the twelve numbers is selected from Table 1 according to the
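The accuracy measure described above amounts to a hit rate over the test days. A minimal sketch, with an illustrative function name and made-up labels:

```python
def hit_rate(predicted, actual):
    """Fraction of days on which the predicted direction equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


# 1 = UP, 0 = DOWN; two of the four forecasts match.
print(hit_rate([1, 0, 1, 1], [1, 1, 1, 0]))  # 0.5
```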

Trading simulation

After using the ANNs to predict the daily SPY direction, it is natural to carry out a trading simulation to see if the higher predictability implies higher profitability. Given that this research study is based on predicting the direction of S&P 500 ETF (SPY) daily returns, we modified the trading strategy for classification models defined by Enke and Thawornwong (2005) as follows:

If $UP_{t+1} = 1$, fully invest in stocks or maintain, and receive the actual stock return for the day $t+1$ (i.e., $SPY_{t+1}$);
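Only the rule for an UP forecast is reproduced in this snippet. As a minimal sketch, the function below applies that rule and takes the return earned on all other days from a caller-supplied series, `fallback_returns`. That parameter is a hypothetical placeholder and does not represent the paper's rule for the remaining case; the function name and the sample numbers are likewise illustrative.

```python
def strategy_returns(up_forecasts, spy_returns, fallback_returns):
    """Daily and cumulative returns of a forecast-guided strategy.

    up_forecasts[i] is the forecast for the same day as spy_returns[i].
    On days forecast UP (1) the strategy earns the SPY return for that day.
    On other days it earns fallback_returns[i], a placeholder supplied by
    the caller (assumption; not taken from the source).
    """
    if not (len(up_forecasts) == len(spy_returns) == len(fallback_returns)):
        raise ValueError("all inputs must have the same length")
    daily = [
        spy if up == 1 else fallback
        for up, spy, fallback in zip(up_forecasts, spy_returns, fallback_returns)
    ]
    growth = 1.0
    for r in daily:
        growth *= 1.0 + r          # compound the daily returns
    return daily, growth - 1.0


daily, total = strategy_returns(
    up_forecasts=[1, 0, 1],
    spy_returns=[0.01, -0.02, 0.005],
    fallback_returns=[0.0, 0.0, 0.0],   # placeholder: zero return when not invested
)
print(daily)  # [0.01, 0.0, 0.005]
```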

Conclusion

This research presents a comprehensive and efficient process for forecasting the daily direction of the stock market return. The process starts with data cleaning and data preprocessing, and concludes with an analysis of forecasting and simulation results. Often, researchers look to apply the simplest set of algorithms to the least amount of data with both the most accurate forecasting results and the highest risk-adjusted profits. To achieve this goal, three dimensionality reduction

References (70)

G. Armano et al. A hybrid genetic-neural architecture for stock indexes forecasting. Information Sciences (2005)
D. Bao et al. Intelligent stock trading system by turning point confirming and probabilistic reasoning. Expert Systems with Applications (2008)
S. Barak et al. Wrapper ANFIS-ICA method to do stock market timing and feature selection on the basis of Japanese Candlestick. Expert Systems with Applications (2015)
Q. Cao et al. A comparison between Fama and French's model and artificial neural networks in predicting the Chinese stock market. Computers & Operations Research (2005)
R. Cervelló-Royo et al. Stock market trading rule based on pattern recognition and technical analysis: Forecasting the DJIA index with intraday data. Expert Systems with Applications (2015)
T. Chavarnakul et al. Intelligent technical analysis based equivolume charting for stock trading using neural networks. Expert Systems with Applications (2008)
T.L. Chen et al. An intelligent pattern recognition model for supporting investment decisions in stock market. Information Sciences (2016)
A.S. Chen et al. Application of neural networks to an emerging financial market: Forecasting and trading the Taiwan stock index. Computers & Operations Research (2003)
W.C. Chiang et al. An adaptive stock index trading decision support system. Expert Systems with Applications (2016)
K. Chourmouziadis et al. An intelligent short term stock trading fuzzy system for assisting investors in portfolio management. Expert Systems with Applications (2016)
S.H. Chun et al. Data mining for financial prediction and trading: Application to single and multiple markets. Expert Systems with Applications (2004)
D. Enke et al. The use of data mining and neural networks for forecasting stock market returns. Expert Systems with Applications (2005)
P.H. Franses et al. Additive outliers, GARCH and forecasting volatility. International Journal of Forecasting (1999)
E. Guresen et al. Using artificial neural network models in stock market index prediction. Expert Systems with Applications (2011)
J.V. Hansen et al. Data mining of time series using stacked generalizers. Neurocomputing (2002)
M. Jensen. Some anomalous evidence regarding market efficiency. Journal of Financial Economics (1978)
Y. Kara et al. Predicting direction of stock price index movement using artificial neural networks and support vector machines: The sample of the Istanbul stock exchange. Expert Systems with Applications (2011)
Y. Kim et al. Developing a rule change trading system for the futures market using rough set analysis. Expert Systems with Applications (2016)
K.J. Kim et al. Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications (2000)
M. Lam. Neural network techniques for financial performance prediction: Integrating fundamental and technical analysis. Decision Support Systems (2004)
M.T. Leung et al. Forecasting stock indices: A comparison of classification and level estimation models. International Journal of Forecasting (2000)
S.A. Monfared et al. Volatility forecasting using a hybrid GJR-GARCH neural network model. Procedia Computer Science (2014)
N. O'Connor et al. A neural network approach to predicting stock exchange movements using external factors. Knowledge-Based Systems (2006)
E. Oja et al. On stochastic approximation of the eigenvectors and eigenvalues of the expectation of a random matrix. Journal of Mathematical Analysis and Applications (1985)
J. Patel et al. Predicting stock market index using fusion of machine learning techniques. Expert Systems with Applications (2015)
A.M. Rather et al. Recurrent neural network and a hybrid model for prediction of stock returns. Expert Systems with Applications (2015)
N. Sarantis. Nonlinearities, cyclical behavior and predictability in stock markets: International evidence. International Journal of Forecasting (2001)
L. Shen et al. Applying rough sets to market timing decisions. Decision Support Systems (2004)
S. Thawornwong et al. The adaptive selection of financial and economic variables for use with artificial neural networks. Neurocomputing (2004)
M. Ture et al. Comparison of four different time series methods to forecast hepatitis A virus infection. Expert Systems with Applications (2006)
B. Vanstone et al. An empirical methodology for developing stock market trading systems using artificial neural networks. Expert Systems with Applications (2009)
A. Vellido et al. Segmentation of the on-line shopping market using neural networks. Expert Systems with Applications (1999)
J.Z. Wang et al. Forecasting stock indices with back propagation neural network. Expert Systems with Applications (2011)
T.N. Yang et al. Robust algorithms for principal component analysis. Pattern Recognition Letters (1999)
L.A. Zadeh. Fuzzy sets. Information and Control (1965)

