A neural network based linear ensemble framework for time series forecasting
Introduction
Combining forecasts from different time series models started in the late sixties with the pioneering work of Bates and Granger [1], and since then this approach has been extensively analyzed in the forecasting literature. The combining methodology provides a much better alternative to relying on a single model for generating the forecasts [2], [3]. The default technique of forecasting is to test several potential models on the in-sample dataset and then select the best among them for generating the desired out-of-sample forecasts. Despite being the most intuitive, this approach has a number of serious limitations. First, a time series seldom has the independence and identical distribution (i.i.d.) property that is a fundamental assumption of many statistical procedures [4]. As a result, the model that performed best on the in-sample dataset might not provide the best forecasts for the unseen future values. Second, a forecasting model is specific to the nature of the time series, i.e. whether the series is generated from a linear or nonlinear process, follows a stationary or nonstationary distribution, contains trend, seasonal, or cyclical patterns, etc. Estimating the exact nature requires a large number of historical observations, but in practice only a small sample of data is available, and so the fitted model may be inappropriate. Third, a time series is a dynamical process that keeps changing continuously with a high degree of uncertainty and may even exhibit regime switches. This jeopardizes the validity of the forecasting model when new observations are added to the available data. Finally, a particular model is always prone to faulty assumptions, implementation biases, and errors in parameter estimation, which considerably affect the desired forecasts [5]. These discouraging facts about the single-model approach motivated the exploration of various forecast combination techniques.
A combination of forecasts benefits from the inter-model diversities, mitigates the risk of relying on an isolated model, and compensates for the drawbacks of the individual models. Numerous studies have observed that combining multiple forecasts improves the forecasting accuracy to a large extent and often comes out superior to each constituent model [6], [7].
During the past few decades, there has been overwhelming interest in combining time series forecasts, which has consequently led to the development of a large number of forecast combination techniques. A majority of them form a weighted linear combination of the component forecasts. The statistical averaging techniques, e.g. simple average, trimmed mean, Winsorized mean, median, etc., are the most basic ensemble methods, as they do not explicitly determine the combining weights. Many studies found that these fairly simple methods reasonably outperformed a number of more advanced combining schemes [8], [9], [10]. Jose and Winkler [8] meticulously studied the performances of these four statistical ensemble techniques in combining forecasts for the widely popular M3 competition datasets [6]. The outcome of their research was that both the trimmed and Winsorized means are competent alternatives to the simple average and median in combining forecasts. However, one major downside of these methods is that they do not consider the relative performances of the individual models and are mainly suitable when the component forecasts have comparable accuracies. There are various other sophisticated linear combination methods, which assign weights to the constituent forecasts on the basis of the past forecasting records of the respective models. A common approach is to select each individual weight as the normalized inverse of the in-sample absolute forecasting error of the respective model, so that the weights remain unbiased. This scheme follows the intuitive notion that a model with larger error should get a smaller weight and vice versa. Granger and Ramanathan [11] interpreted the forecast combination methodology in a regression framework, where the time series observations and the individual forecasts are considered to be the dependent and explanatory variables, respectively. The combining weights are then determined through the Ordinary Least Squares (OLS) regression method.
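To make the two weighting schemes above concrete, the following sketch (function and variable names are illustrative, not from the original paper) computes inverse-error weights and Granger-Ramanathan-style OLS combining weights with NumPy:

```python
import numpy as np

def inverse_error_weights(abs_errors):
    """Weight each model by the normalized inverse of its in-sample
    absolute forecasting error: w_i proportional to 1/e_i."""
    inv = 1.0 / np.asarray(abs_errors, dtype=float)
    return inv / inv.sum()  # normalization makes the weights sum to one

def ols_weights(y, forecasts):
    """OLS combination in the Granger-Ramanathan spirit: regress the
    observed series y on the n individual forecasts (columns of
    `forecasts`) plus a constant term."""
    X = np.column_stack([np.ones(len(y)), forecasts])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[0] is the constant w0, coef[1:] are the weights

# Example: two models with in-sample mean absolute errors 2.0 and 4.0;
# the model with half the error gets twice the weight.
w = inverse_error_weights([2.0, 4.0])  # → [2/3, 1/3]
```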
Later, in an extensive study, Aksu and Gunter [12] found that the performance of the OLS method improves considerably when the weights are constrained to be unbiased. A well-known alternative to the OLS technique is the minimum-variance method [13], which determines the weights by minimizing the variance of the combined forecast error. The outperformance method, proposed by Bunn [14], adopts a Bayesian framework of subjective probabilities to assign the combining weights. It assumes that the weight of each component forecast is the probability that the respective model will outperform the others, i.e. produce the least error, in the next trial. It is a robust nonparametric approach that achieves reasonably good accuracy when there are few historical data or the weights are subject to expert judgment [10].
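The outperformance probabilities can be estimated empirically as win frequencies over in-sample validation trials; a minimal sketch, with hypothetical names (a fuller Bayesian treatment would also place a prior on these probabilities):

```python
import numpy as np

def outperformance_weights(error_history):
    """Outperformance-style weights: w_i is the fraction of past
    in-sample trials in which model i produced the smallest absolute
    error. `error_history` is a (trials x models) array."""
    E = np.asarray(error_history, dtype=float)
    winners = np.argmin(E, axis=1)                    # best model per trial
    counts = np.bincount(winners, minlength=E.shape[1])
    return counts / counts.sum()

# Example: absolute errors of 3 models over 4 trials.
E = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.4, 0.9],
              [2.0, 1.0, 3.0],
              [0.2, 0.8, 0.3]])
outperformance_weights(E)  # → [0.5, 0.5, 0.0]
```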
Over the past decade, considerable research interest has also been directed toward forming homogeneous ensembles of Artificial Neural Networks (ANNs). Such a framework combines several ANNs with varying architectures in order to achieve better overall accuracy. Evidently, an ensemble of ANNs is beneficial only when there is a considerable amount of disagreement among the ANN outputs. Krogh and Vedelsby [15] formulated a method that relates this disagreement, termed the ensemble ambiguity, to the generalization error in order to construct successful ANN ensembles. In another important work, Zhou et al. [16] closely analyzed the relationship between the ensemble accuracy and the constituent neural networks. They employed a genetic algorithm to develop a method that selectively combines a subset of neural networks from the available choices. Their work found that it is a better strategy to combine many neural networks selectively, rather than all of them. An intensive review on combining forecasts was provided by Clemen [17]; though somewhat dated now, it is still very helpful in summarizing the development of the subject. Some good recent reviews in this domain are the works of Timmermann [7], De Gooijer and Hyndman [9], De Menezes et al. [10], and Lemke and Gabrys [18].
The majority of works on combining forecasts are devoted to determining the optimal weights by minimizing the Sum of Squared Error (SSE), calculated from the original dataset and its combined forecast. However, the actual SSE is unknown in advance and hence must be estimated from the available in-sample observations. The most common resolution is to estimate the unknown SSE from some in-sample training and validation sets. However, in this approach the determined weights are biased toward the particular validation dataset, and so the combined forecasts may be inappropriate if there are significant dissimilarities between the in-sample validation and future observations. Furthermore, the weights are highly sensitive to changes in the validation dataset, and so the obtained combined forecasts are often quite unstable. There are some sophisticated combining schemes in the literature that mitigate this problem to some extent. Among them, Recursive Least Squares (RLS) [19], [20] is quite well known; it recursively updates the least squares weights as new observations arrive. The common variants of this method are the dynamic RLS and covariance addition RLS (RLS-CA), which actually belong to the broad class of Kalman filtering algorithms [19], [20]. However, these methods are not straightforward to use and are also computationally quite expensive, which overshadows the accuracy improvements achieved through them. Bunn's outperformance method also imparts a somewhat dynamic nature to weight estimation by performing a number of validation trials on the in-sample dataset. However, there are disputes regarding the interpretation, appropriateness, and validity of the associated outperformance probabilities [17], [21].
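For reference, a standard RLS weight update can be sketched as follows; this is the textbook recursion with a forgetting factor (which gives the dynamic variant), not necessarily the exact implementation of [19], [20]:

```python
import numpy as np

def rls_update(w, P, x, y, lam=0.98):
    """One Recursive Least Squares step: update the combining weights
    `w` and the inverse-covariance matrix `P` when a new observation
    `y` and its vector of individual forecasts `x` arrive. `lam` is a
    forgetting factor; lam < 1 discounts older observations."""
    x = np.asarray(x, dtype=float)
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    e = y - w @ x                    # a priori forecast error
    w = w + k * e                    # weight update
    P = (P - np.outer(k, Px)) / lam  # inverse-covariance update
    return w, P
```

With `lam=1.0` the recursion reproduces ordinary least squares on all data seen so far, at O(n^2) cost per step instead of refitting from scratch.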
This paper proposes a new linear combination method for time series forecasting in which the weights are determined from the forecasting results of the individual models on several in-sample datasets. The models are successively applied in a number of in-sample forecasting trials, and weights are assigned to them in inverse proportion to their obtained absolute errors. A set of in-sample weights of all models is formed, and an ANN is then fitted to this set. We use an ANN to recognize the in-sample weight pattern because of its flexible, data-driven, and model-free structure with remarkably good learning and generalization ability. Thus, the proposed approach interprets the in-sample set of weights as a new time series, whose inherent pattern is identified and learned through a novel ANN model. The desired combining weights are then predicted from this fitted ANN. In this manner, the distinguished learning and generalization ability of the ANN is utilized to dynamically determine the combining weights from several past forecasting records of the component models. The effectiveness of the proposed combination method is tested with four individual models on eight real-world time series. The forecasting performances of the proposed method are compared with those of the individual models as well as other popular conventional linear combination schemes, in terms of two well-known absolute error measures.
The remainder of the paper is organized as follows. Section 2 briefly describes the individual forecasting models and Section 3 explains the linear forecast combination methodology. Our proposed combination algorithm is described in Section 4. Section 5 reports the empirical results, followed by summary and conclusions in Section 6.
Section snippets
The time series forecasting models
A time series is a sequential collection of observations, recorded at consecutive time periods. Univariate time series with discrete values are most widely studied in the literature, where such a series is represented as \(Y = \{y_1, y_2, \ldots, y_N\}\), \(y_t\) being the observation at time \(t\). Time series forecasting is the projection of a desired number of future values through some mathematical model. In the usual forecasting paradigm, an appropriate model is identified from a particular class and it is then used to
Linear combination of time series forecasts
A linear ensemble is the most common approach of combining multiple forecasts [19], [10]. For the dataset \(Y = \{y_1, y_2, \ldots, y_N\}\) and its n forecasts \(\hat{y}_t^{(1)}, \hat{y}_t^{(2)}, \ldots, \hat{y}_t^{(n)}\), the forecasts from a linear combination are given by
\[
\hat{y}_t = w_0 + \sum_{i=1}^{n} w_i \hat{y}_t^{(i)}, \quad t = 1, 2, \ldots, N.
\]
The terms \(w_i\) are the combining weights, which are often assumed to be nonnegative, i.e. \(w_i \geq 0\ \forall i\), and unbiased, i.e. \(\sum_{i=1}^{n} w_i = 1\). The introduction or non-introduction of the constant term \(w_0\) has notable impact on
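A weighted linear combination of this form, under the usual nonnegativity and unbiasedness assumptions, can be sketched as (names are illustrative):

```python
import numpy as np

def linear_combination(forecasts, weights, w0=0.0):
    """Combined forecast: w0 plus the weighted sum of the n individual
    forecasts. `forecasts` is a (T x n) array, one column per model."""
    F = np.asarray(forecasts, dtype=float)
    w = np.asarray(weights, dtype=float)
    assert np.all(w >= 0), "weights must be nonnegative"
    assert np.isclose(w.sum(), 1.0), "weights must sum to one (unbiased)"
    return w0 + F @ w

# Example: two models forecasting two periods, equal weights.
F = np.array([[1.0, 3.0],
              [2.0, 4.0]])
linear_combination(F, [0.5, 0.5])  # → [2.0, 3.0]
```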
The proposed linear combination method
We propose a method for linearly combining time series forecasts that attempts to determine the combining weights after analyzing their patterns in successive in-sample forecasting trials. The proposed approach can be divided into the following three phases. First, the individual models are applied to consecutive in-sample forecasting trials and weights are assigned to them on the basis of their inverse absolute forecasting errors. Then, a set of weights is formed that is a sequential
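A minimal illustrative sketch of this idea, assuming one-step in-sample trials and a small single-hidden-layer network standing in for the paper's ANN model (all names and hyperparameters are hypothetical, not the author's exact design):

```python
import numpy as np

def inverse_error_weight_series(y, forecasts_per_trial):
    """Phases 1-2: for each in-sample trial, weight each model in
    inverse proportion to its absolute error, yielding a sequential
    set (time series) of weight vectors. `forecasts_per_trial` is a
    (trials x models) array of one-step forecasts, `y` the observed values."""
    errors = np.abs(np.asarray(forecasts_per_trial) - np.asarray(y)[:, None])
    inv = 1.0 / (errors + 1e-12)              # guard against zero error
    return inv / inv.sum(axis=1, keepdims=True)

def fit_weight_predictor(W, hidden=8, lags=3, epochs=2000, lr=0.05, seed=0):
    """Phase 3 (sketch): fit a one-hidden-layer tanh network to the
    weight series W (trials x models), mapping the last `lags` weight
    vectors to the next one, by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    T, n = W.shape
    X = np.array([W[t - lags:t].ravel() for t in range(lags, T)])
    Y = W[lags:]
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, n)); b2 = np.zeros(n)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        out = H @ W2 + b2
        G = 2 * (out - Y) / len(X)            # gradient of MSE w.r.t. output
        W2 -= lr * H.T @ G; b2 -= lr * G.sum(0)
        GH = (G @ W2.T) * (1 - H ** 2)        # backprop through tanh
        W1 -= lr * X.T @ GH; b1 -= lr * GH.sum(0)
    def predict_next():
        """Predict the combining weights for the next (out-of-sample) step."""
        h = np.tanh(W[-lags:].ravel() @ W1 + b1)
        w = np.clip(h @ W2 + b2, 0, None)     # enforce nonnegativity
        return w / w.sum()                    # renormalize to unbiased weights
    return predict_next
```

The clipping and renormalization at prediction time keep the final weights consistent with the nonnegativity and unbiasedness assumptions of the linear ensemble.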
Empirical results and discussions
This section presents the empirical works, performed to study the effectiveness of the proposed linear combination method. Eight discrete time series, representing real-world phenomena, are used in this study. All the raw series are available at the Time Series Data Library (TSDL) [48], an open repository of a wide collection of time series datasets. These eight time series are (1) Lynx contains the annual number of lynx trapped in the Mackenzie River district of Northern Canada from 1821 to
Conclusions
Time series analysis and forecasting is a dynamic research area, having fundamental importance in numerous practical fields. Improving the accuracy of time series forecasts is a challenging task that has been gaining continuous research attention for the past few decades. Extensive work has been performed on combining forecasts from several time series models, with the general conclusion that this practice improves the forecasting accuracy to a large extent. Moreover, a forecast combination method
Acknowledgements
The author is very thankful to the anonymous reviewers for their constructive suggestions, which significantly facilitated the improvement of this paper. In addition, the author expresses his profound gratitude to the Council of Scientific and Industrial Research (CSIR), India, for the financial support that greatly helped in performing the present research work.
References (51)
- et al., Forecast combinations of computational intelligence and linear models for the NN5 time series forecasting competition, Int. J. Forecast. (2011)
- et al., To combine or not to combine: selecting among forecasts and their combinations, Int. J. Forecast. (2005)
- et al., A novel neural network ensemble architecture for time series forecasting, Neurocomputing (2011)
- et al., The M3-Competition: results, conclusions and implications, Int. J. Forecast. (2000)
- Forecast combinations, Handb. Econ. Forecast. (2006)
- et al., Simple robust averages of forecasts: some empirical results, Int. J. Forecast. (2008)
- et al., 25 years of time series forecasting, Int. J. Forecast. (2006)
- et al., Review of guidelines for the use of combined forecasts, Eur. J. Oper. Res. (2000)
- et al., An empirical analysis of the accuracy of SA, ERLS and NRLS combination forecasts, Int. J. Forecast. (1992)
- et al., Ensembling neural networks: many could be better than all, Artif. Intell. (2002)
- Combining forecasts: a review and annotated bibliography, Int. J. Forecast.
- Meta-learning for time series forecasting and forecast combination, Neurocomputing
- Model combination in neural-based forecasting, Eur. J. Oper. Res.
- Recursive estimation in econometrics, Comput. Stat. Data Anal.
- Forecasting with artificial neural networks: the state of the art, Int. J. Forecast.
- Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing
- Finding structure in time, Cognit. Sci.
- Improving artificial neural networks' performance in seasonal time series forecasting, Inf. Sci.
- Multilayer feedforward networks are universal approximators, Neural Netw.
- Extended Kalman filter-based Elman networks for industrial time series prediction with GPU acceleration, Neurocomputing
- A neural network ensemble method with jittered training data for time series forecasting, Inf. Sci.
- The effect of sample size and variability of data on the comparative performance of artificial neural networks and regression, Comput. Oper. Res.
- The combination of forecasts, Oper. Res. Q.
- Improved methods of combining forecasts, J. Forecast.
Ratnadip Adhikari received his B.Sc. degree with Mathematics Honors from Assam University, Silchar, India in 2004 and M.Sc. in applied mathematics from Indian Institute of Technology, Roorkee, India in 2006. After that he obtained M.Tech. in computer science and technology and Ph.D. in computer science, both from Jawaharlal Nehru University, New Delhi, India in 2009 and 2014, respectively. He was adjudged the best graduate of Assam University in 2004 and subsequently received a Gold Medal for this achievement. At present, he is working as an Assistant Professor in the Computer Science & Engineering (CSE) Department of the LNM Institute of Information Technology (LNMIIT), Jaipur, Rajasthan, India. His primary research interests include pattern recognition, time series forecasting, data stream classification, and hybrid modeling. His research work has been published in various reputed international journals and conferences. He has attended a number of conferences and workshops throughout his academic career.