Neurocomputing

Volume 157, 1 June 2015, Pages 231-242

A neural network based linear ensemble framework for time series forecasting

https://doi.org/10.1016/j.neucom.2015.01.012

Abstract

Combining time series forecasts from several models is a fruitful alternative to using only a single model. It has been widely documented in the literature that a combined forecast improves the overall accuracy to a great extent and is often better than the forecast of each component model. The accuracy of a linear combination of forecasts primarily depends on the associated combining weights. Despite extensive research in this direction, finding the most appropriate weights remains very challenging. This paper proposes a linear combination method for time series forecasting that determines the combining weights through a novel neural network structure. The designed neural network successively recognizes the weight patterns of the constituent models from their past forecasting records and then predicts the desired set of combining weights. Empirical results on eight real-world time series show that our approach provides significantly better forecasting accuracy than the component models and other well-recognized linear combination schemes. These findings are also verified through ranking methods and a non-parametric statistical test.

Introduction

Combining forecasts from different time series models started in the late sixties with the pioneering work of Bates and Granger [1], and since then this approach has been extensively analyzed in the forecasting literature. The combining methodology provides a much better alternative to using only a single model for generating the forecasts [2], [3]. The default technique of forecasting is to test several potential models on the in-sample dataset and then select the best among them for generating the desired out-of-sample forecasts. Despite being the most intuitive, this approach has a number of serious limitations. First, a time series seldom has the independent and identically distributed (i.i.d.) property that is a fundamental requirement of realistic statistical processes [4]. As a result, the model that performed best on the in-sample dataset might not always provide the best forecasts for the unseen future values. Second, a forecasting model is specific to the nature of the time series, i.e. whether the series is generated from a linear or nonlinear process, follows a stationary or nonstationary distribution, contains trend, seasonal, or cyclical patterns, etc. Estimating the exact nature requires a large number of historical observations, but in practice only a small sample of data is available, so the fitted model may be inappropriate. Third, a time series is a dynamical process that keeps changing continuously with a high degree of uncertainty and may even exhibit regime switches. This jeopardizes the validity of the forecasting model when new observations are added to the available data. Finally, a particular model is always prone to faulty assumptions, implementation biases, and errors in parameter estimation, which considerably affect the desired forecasts [5]. These limitations of the single-model approach motivated the exploration of various forecast combination techniques. A combination of forecasts benefits from the inter-model diversities, mitigates the risks of using an isolated model, and compensates for the drawbacks of the individual models. It has been observed in numerous studies that a combination of multiple forecasts improves the forecasting accuracy to a large extent and often turns out to be superior to each constituent model [6], [7].

During the past few decades, there has been overwhelming interest in combining time series forecasts, which has consequently led to the development of a large number of forecast combination techniques. A majority of them form a weighted linear combination of the component forecasts. The statistical averaging techniques, e.g. simple average, trimmed mean, Winsorized mean, median, etc., are the most basic ensemble methods, as they do not explicitly determine the combining weights. Many studies have found that these fairly simple methods reasonably outperform a number of more advanced combining schemes [8], [9], [10]. Jose and Winkler [8] meticulously studied the performances of these four statistical ensemble techniques in combining forecasts for the widely popular M3 competition datasets [6]. The outcome of their research was that both the trimmed and the Winsorized mean are competent alternatives to the simple average and median in combining forecasts. However, one major downside of these methods is that they do not consider the relative performances of the individual models and are mainly suitable when the component forecasts have comparable accuracies. There are various other, more sophisticated linear combination methods, which assign weights to the constituent forecasts on the basis of the past forecasting records of the respective models. A common approach is to select each individual weight as the normalized inverse of the in-sample absolute forecasting error of the respective model. This scheme follows the intuitive notion that a model with larger error should get less weight and vice versa. Granger and Ramanathan [11] interpreted the forecast combination methodology in a regression framework, where the time series observations and the individual forecasts are considered the dependent and explanatory variables, respectively. The combining weights are then determined through the Ordinary Least Squares (OLS) regression method. Later, Aksu and Gunter [12], in their extensive study, found that the performance of the OLS method is reasonably improved by considering unbiased weights. A well-known alternative to the OLS technique is the minimum-variance method [13], which determines the weights by minimizing the variance of the combined forecast error. The outperformance method, proposed by Bunn [14], adopts a Bayesian framework of subjective probabilities to assign the combining weights. It assumes that the weight of each component forecast is the probability that the respective model will outperform the others, i.e. produce the least error, in the next trial. It is a robust non-parametric approach that has been found to achieve reasonably good accuracy when few historical data are available or the weights are subject to expert judgment [10].
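As a concrete illustration of two of these conventional schemes, the following minimal Python sketch (not from the paper; the data and function names are our own toy assumptions) computes normalized inverse-error weights and Granger–Ramanathan style OLS weights for three hypothetical component forecasts:

```python
import numpy as np

def inverse_error_weights(abs_errors):
    """Normalized inverse absolute-error weights: models with smaller
    in-sample error receive proportionally larger weights."""
    inv = 1.0 / np.asarray(abs_errors, dtype=float)
    return inv / inv.sum()          # weights sum to one (unbiasedness)

def ols_weights(y, forecasts):
    """Granger-Ramanathan style weights: regress the observed series on
    the individual forecasts (with an intercept) by ordinary least squares."""
    X = np.column_stack([np.ones(len(y)), forecasts])   # add constant term
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w                        # w[0] is the intercept, w[1:] the weights

# Toy illustration: four observations, three hypothetical component models
y = np.array([10.0, 12.0, 11.0, 13.0])
F = np.array([[9.5, 10.8, 9.0],
              [12.2, 11.5, 13.1],
              [10.9, 11.4, 10.2],
              [13.3, 12.6, 14.0]])  # columns: forecasts of the three models
mae = np.mean(np.abs(F - y[:, None]), axis=0)
print(inverse_error_weights(mae))   # the most accurate model gets the largest weight
print(ols_weights(y, F))
```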

Over the past decade, considerable research interest has also been directed toward forming homogeneous ensembles of Artificial Neural Networks (ANNs). Such a framework combines several ANNs with varying architectures in order to achieve better overall accuracy. Evidently, an ensemble of ANNs is beneficial only when there is a considerable amount of disagreement among the ANN outputs. Krogh and Vedelsby [15] formulated a method that relates this disagreement, termed the ensemble ambiguity, to the generalization error in order to construct successful ANN ensembles. In another important work, Zhou et al. [16] closely analyzed the relationship between the ensemble accuracy and the constituent neural networks. They employed a genetic algorithm to develop a method that selectively combines a class of neural networks from the available choices. Their work found that it is a better strategy to selectively combine many neural networks instead of all of them. An intensive review on combining forecasts was provided by Clemen [17]; although somewhat dated now, it is still very helpful in summarizing the development of the subject. Some good recent reviews in this domain are the works of Timmermann [7], De Gooijer and Hyndman [9], De Menezes et al. [10], and Lemke and Gabrys [18].
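The Krogh–Vedelsby relation can be stated compactly. As a sketch in our own notation (for squared error; not reproduced verbatim from this paper): for an ensemble output $\bar{f}(x)=\sum_i w_i f_i(x)$ with $w_i \ge 0$ and $\sum_i w_i = 1$, the ensemble generalization error $E$ satisfies

$$E = \bar{E} - \bar{A},$$

where $\bar{E}=\sum_i w_i E_i$ is the weighted average generalization error of the individual networks and $\bar{A}=\sum_i w_i A_i$ is the weighted average ambiguity, $A_i$ being the expected squared deviation of network $i$'s output from the ensemble output. Since $\bar{A} \ge 0$, the ensemble error never exceeds the average member error, and it shrinks as the members disagree more while their individual errors stay fixed.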

The majority of the works on combining forecasts are devoted to determining the optimal weights by minimizing the Sum of Squared Errors (SSE), calculated from the original dataset and its combined forecast. However, the actual SSE is unknown in advance and hence must be estimated from the available in-sample observations. The most common resolution is to estimate the unknown SSE from some in-sample training and validation sets. But in this approach the determined weights are biased toward the particular validation dataset, so the combined forecasts may be inappropriate if there are significant dissimilarities between the in-sample validation and future observations. Furthermore, the weights are highly sensitive to changes in the validation dataset, so the obtained combined forecasts are often quite unstable. There are some sophisticated combining schemes in the literature that mitigate this problem to some extent. Among them, Recursive Least Squares (RLS) [19], [20] is quite well known; it recursively updates the least squares weights as new observations are added. The common variants of this method are the dynamic RLS and the covariance-addition RLS (RLS-CA), which actually belong to the broad class of Kalman filtering algorithms [19], [20]. But these methods are not straightforward to use and are also computationally quite expensive, which overshadows the accuracy improvements achieved through them. Bunn's outperformance method also imparts a somewhat dynamic nature to weight estimation by performing a number of validation trials on the in-sample dataset. However, there are disputes regarding the interpretation, appropriateness, and validity of the associated outperformance probabilities [17], [21].
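To convey the flavor of this recursive updating, here is a minimal sketch of a generic RLS weight update with a forgetting factor (a standard textbook recursion, not the specific dynamic or RLS-CA variants of [19], [20]; the data and variable names are our own assumptions):

```python
import numpy as np

def rls_update(w, P, x, y, lam=0.99):
    """One RLS step: update combining weights w and the matrix P when a new
    observation y and vector x of component forecasts arrive.
    lam is the forgetting factor (lam=1 gives ordinary recursive LS)."""
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    e = y - w @ x                    # one-step-ahead combination error
    w = w + k * e                    # correct weights in the error direction
    P = (P - np.outer(k, Px)) / lam  # discount old information
    return w, P

# n component models; start from equal weights and a large P
n = 3
w = np.full(n, 1.0 / n)
P = np.eye(n) * 1e3
for x_t, y_t in [(np.array([9.5, 10.8, 9.0]), 10.0),
                 (np.array([12.2, 11.5, 13.1]), 12.0)]:
    w, P = rls_update(w, P, x_t, y_t)
print(w)   # weights drift toward the better-tracking models
```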

This paper proposes a new linear combination method for time series forecasting in which the weights are determined from the forecasting results of the individual models on several in-sample datasets. The models are successively applied in a number of in-sample forecasting trials, and weights are assigned to them in inverse proportion to their obtained absolute errors. A set of in-sample weights of all the models is formed and an ANN is then fitted to this set. We use an ANN to recognize the in-sample weight pattern because of its flexible, data-driven, and model-free structure with remarkably good learning and generalization ability. Thus, the proposed approach interprets the in-sample set of weights as a new time series, whose inherent pattern is identified and learned through a novel ANN model. The desired combining weights are then predicted from this fitted ANN. In this manner, the distinguished learning and generalization ability of the ANN is utilized to dynamically determine the combining weights from several past forecasting records of the component models. The effectiveness of the proposed combination method is tested with four individual models on eight real-world time series. The forecasting performances of the proposed method are compared with those of the individual models as well as other popular conventional linear combination schemes in terms of two well-known absolute error measures.
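Purely as an illustration of this pipeline, the following Python sketch uses made-up error data and scikit-learn's MLPRegressor as a stand-in for the paper's novel ANN structure; the lag order p and all sizes are arbitrary assumptions, not the paper's settings:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Phase 1: absolute errors of n models over m in-sample forecasting trials
# (rows: trials, columns: models); random numbers stand in for real errors.
rng = np.random.default_rng(0)
abs_err = rng.uniform(0.1, 2.0, size=(30, 3))        # m=30 trials, n=3 models

inv = 1.0 / abs_err
W = inv / inv.sum(axis=1, keepdims=True)             # trial-wise inverse-error weights

# Phase 2: treat the weight vectors as a multivariate time series and fit
# an ANN that maps the previous p weight vectors to the next one.
p = 4
X = np.hstack([W[i:len(W) - p + i] for i in range(p)])   # lagged weight vectors
y = W[p:]
ann = MLPRegressor(hidden_layer_sizes=(8,), max_iter=5000, random_state=0)
ann.fit(X, y)

# Phase 3: predict the combining weights for the out-of-sample period,
# then renormalize so they stay nonnegative and sum to one.
w_next = ann.predict(W[-p:].reshape(1, -1))[0]
w_next = np.clip(w_next, 0, None)
w_next /= w_next.sum()
print(w_next)   # apply these weights to the models' out-of-sample forecasts
```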

The remainder of the paper is organized as follows. Section 2 briefly describes the individual forecasting models and Section 3 explains the linear forecast combination methodology. Our proposed combination algorithm is described in Section 4. Section 5 reports the empirical results, followed by a summary and conclusions in Section 6.

Section snippets

The time series forecasting models

A time series is a sequential collection of observations recorded at consecutive time periods. Univariate time series with discrete values are the most widely studied in the literature, where such a series is represented as $Y=[y_1, y_2, \ldots, y_N]^T$, $y_t$ being the observation at time $t$. Time series forecasting is the projection of a desired number of future values through some mathematical model. In the usual forecasting paradigm, an appropriate model is identified from a particular class and it is then used to generate the required forecasts…

Linear combination of time series forecasts

A linear ensemble is the most common approach to combining multiple forecasts [19], [10]. For the dataset $Y=[y_1, y_2, \ldots, y_N]^T$ and its $n$ forecasts $\hat{Y}^{(i)}=[\hat{y}_1^{(i)}, \hat{y}_2^{(i)}, \ldots, \hat{y}_N^{(i)}]^T$ $(i=1,2,\ldots,n)$, the forecasts from a linear combination are given by

$$\hat{y}_k = w_0 + w_1 \hat{y}_k^{(1)} + w_2 \hat{y}_k^{(2)} + \cdots + w_n \hat{y}_k^{(n)}, \quad k=1,2,\ldots,N.$$

The terms $w_i$ $(i=1,2,\ldots,n)$ are the combining weights, which are often assumed to be nonnegative, i.e. $w_i \ge 0\ \forall i$, and unbiased, i.e. $\sum_{i=0}^{n} w_i = 1$. The introduction or non-introduction of the constant term $w_0$ has a notable impact on…
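Given a weight vector, applying the linear combination above is a one-liner; a minimal sketch with our own toy numbers:

```python
import numpy as np

w0 = 0.0                                   # constant term (often omitted)
w = np.array([0.2, 0.5, 0.3])              # unbiased weights: sum to one
F = np.array([[9.5, 10.8, 9.0],            # row k: forecasts of the n models
              [12.2, 11.5, 13.1]])         # for observation k
y_hat = w0 + F @ w                         # combined forecast for each k
print(y_hat)
```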

The proposed linear combination method

We propose a method for linearly combining time series forecasts that attempts to determine the combining weights after analyzing their patterns in successive in-sample forecasting trials. The proposed approach can be divided into the following three phases. First, the individual models are applied in consecutive in-sample forecasting trials and weights are assigned to them on the basis of their inverse absolute forecasting errors. Then, a set of weights is formed that is a sequential collection of these trial-wise weight vectors, interpreted as a new time series…
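For instance (a hypothetical trial, not taken from the paper), if three models obtain absolute errors $e = (2.0,\ 0.5,\ 1.0)$ in a trial, the inverse errors are $(0.5,\ 2.0,\ 1.0)$, whose sum is $3.5$, giving the normalized weights

$$w = \left(\tfrac{0.5}{3.5},\ \tfrac{2.0}{3.5},\ \tfrac{1.0}{3.5}\right) \approx (0.143,\ 0.571,\ 0.286),$$

so the most accurate model in that trial receives the largest weight. Repeating this over successive trials yields the weight series to which the ANN is fitted.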

Empirical results and discussions

This section presents the empirical work performed to study the effectiveness of the proposed linear combination method. Eight discrete time series, representing real-world phenomena, are used in this study. All the raw series are available at the Time Series Data Library (TSDL) [48], an open repository of a wide collection of time series datasets. These eight time series are (1) Lynx, which contains the annual number of lynx trapped in the Mackenzie River district of Northern Canada from 1821 to 1934…

Conclusions

Time series analysis and forecasting is a dynamic research area of fundamental importance in numerous practical fields. Improving the accuracy of time series forecasts is a challenging task that has been receiving continuous research attention for the past few decades. Extensive work has been performed on combining forecasts from several time series models, with the general conclusion that this practice improves the forecasting accuracy to a large extent. Moreover, a forecast combination method…

Acknowledgements

The author is very thankful to the anonymous reviewers for their constructive suggestions, which significantly facilitated the improvement of this paper. In addition, the author also expresses his profound gratitude to the Council of Scientific and Industrial Research (CSIR), India, for the financial support that greatly helped in performing the present research work.

Ratnadip Adhikari received his B.Sc. degree with Mathematics Honors from Assam University, Silchar, India in 2004 and M.Sc. in applied mathematics from the Indian Institute of Technology, Roorkee, India in 2006. After that he obtained an M.Tech. in computer science and technology and a Ph.D. in computer science, both from Jawaharlal Nehru University, New Delhi, India, in 2009 and 2014, respectively. He was awarded the best graduate from Assam University in 2004 and subsequently received a Gold Medal for this achievement. At present, he is working as an Assistant Professor in the Computer Science & Engineering (CSE) Department of the LNM Institute of Information Technology (LNMIIT), Jaipur, Rajasthan, India. His primary research interests include pattern recognition, time series forecasting, data stream classification, and hybrid modeling. His research works are published in various reputed international journals and conferences. He has attended a number of conferences and workshops throughout his academic career.

References (51)

  • R.T. Clemen, Combining forecasts: a review and annotated bibliography, Int. J. Forecast. (1989)
  • C. Lemke et al., Meta-learning for time series forecasting and forecast combination, Neurocomputing (2010)
  • P.S. Freitas et al., Model combination in neural-based forecasting, Eur. J. Oper. Res. (2006)
  • D. Pollock, Recursive estimation in econometrics, Comput. Stat. Data Anal. (2003)
  • G. Zhang et al., Forecasting with artificial neural networks: the state of the art, Int. J. Forecast. (1998)
  • G.P. Zhang, Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing (2003)
  • J.L. Elman, Finding structure in time, Cognit. Sci. (1990)
  • C. Hamzaçebi, Improving artificial neural networks' performance in seasonal time series forecasting, Inf. Sci. (2008)
  • K. Hornik et al., Multilayer feedforward networks are universal approximators, Neural Netw. (1989)
  • J. Zhao et al., Extended Kalman filter-based Elman networks for industrial time series prediction with GPU acceleration, Neurocomputing (2013)
  • G.P. Zhang, A neural network ensemble method with jittered training data for time series forecasting, Inf. Sci. (2007)
  • I.S. Markham et al., The effect of sample size and variability of data on the comparative performance of artificial neural networks and regression, Comput. Oper. Res. (1998)
  • J.M. Bates et al., The combination of forecasts, Oper. Res. Q. (1969)
  • J.S. Armstrong, Principles of Forecasting: A Handbook for Researchers and Practitioners, vol. 30, Kluwer Academic...
  • C.W.J. Granger et al., Improved methods of combining forecasts, J. Forecast. (1984)