Model combination in neural-based forecasting

https://doi.org/10.1016/j.ejor.2005.06.057

Abstract

This paper discusses different ways of combining neural predictive models or neural-based forecasts. The proposed approaches consider Gaussian radial basis function networks, which can be efficiently identified and estimated through recursive/adaptive methods. The usual framework for linearly combining estimates from different models is extended, to cope with the case where the forecasting errors from those models are correlated. A prefiltering methodology is proposed, addressing the problems raised by heavily nonstationary time series. Moreover, the paper discusses two approaches for decision-making from forecasting models: either inferring decisions from combined predictive estimates, or combining prescriptive solutions derived from different forecasting models.

Introduction

Time series forecasting is a common goal in data mining applications, where most often the recorded data is indexed in time, and the variables or attributes have distributional properties and correlation effects that are nonstationary in time. Without sacrificing predictive accuracy, the models used should not be too complex, should be both flexible and robust, and the methods used to estimate them should be efficient. It is therefore convenient to depart from the classic point of view of identifying a single, "clearly best" model, which might require a high computational burden for its identification and optimization.

Most references in the literature on neural-based forecasting follow that traditional paradigm, usually referring to the application of multilayer perceptrons (see [28] for a review). These are highly nonlinear models requiring optimization in a high-dimensional space of a nonlinear least-squares cost function, inevitably with many local optima (see [18] for a comprehensive introduction to the optimization of neural networks). Parameter updating in those models, given new data, is cumbersome and, most importantly, their direct application to nonstationary data is inadequate (see [23], [24]).

This paper seeks to discuss alternative approaches where, while still using supervised neural models, one can achieve good, if not better, forecasting performance through efficient recursive estimation and adaptive identification methods. All the parametric predictive models proposed here, neural or otherwise, are based on time-varying linear parameters that can be efficiently estimated in a recursive manner. Furthermore, we concur with the viewpoint that two or more suboptimal models, linearly composed or linearly combined, may in general constitute a better alternative to the optimization of a single neural model in terms of predictive accuracy, efficiency and robustness.
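As an aside on what "efficiently estimated in a recursive manner" can look like in practice, the sketch below implements recursive least squares with a forgetting factor for a model that is linear in its parameters. This is a minimal illustration on our part, not code from the paper; the forgetting factor value is an assumption.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.99):
    """One recursive least-squares step with forgetting factor lam.

    theta : current parameter vector, shape (n,)
    P     : current (scaled) inverse information matrix, shape (n, n)
    phi   : regressor vector at this time step, shape (n,)
    y     : observed target at this time step
    """
    y_hat = phi @ theta                   # a-priori one-step-ahead forecast
    e = y - y_hat                         # one-step-ahead error
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)         # gain vector
    theta = theta + k * e                 # parameter update
    # lam < 1 discounts older data, which is what lets the estimate
    # track time-varying parameters.
    P = (P - np.outer(k, Pphi)) / lam
    return theta, P, e
```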

In most of the literature about time series forecasting and neural supervised learning, a classic paradigm is followed, where all effort is directed to the identification and estimation of a single model, in some sense optimal within a class of many possible models, different in structure, in size or in parameterization. The rationale behind this paradigm is the assumption that a “best” model can be conveniently identified for a given problem.

In real-world problems, the true model is likely to be unknown and some choices and assumptions have to be made such that the problem under study can be acceptably modelled and the underlying optimization problem solved. During this process, there are some issues that might be hard to sort out, such as choosing appropriate model selection criteria. In particular, if the chosen model is too complex or overparameterized, it can learn the noise intrinsic to the data, thus causing poor generalization performance, i.e., producing poor results when applied to new data (see, e.g., [3]).

As alternatives to the classic paradigm, several approaches have been proposed in which multiple models are explored and combined. This tends to reduce the risk implicit in relying on just one model which, even if optimized, may be limited in its capabilities with respect to the characteristics of the data and of the problem itself.

One may identify three fundamental ways of combining models to yield a final estimate (a toy sketch contrasting them follows the list):

  • model mixing, where the chosen estimate is computed as a combination of estimates from different models;

  • model synthesis, where the chosen estimate is obtained from the (linear) combination of different, partial models, estimated in conjunction or in sequence; and,

  • model switching, where the chosen estimate is selected from the estimates of different models.
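The toy sketch below, our own illustration rather than anything from the paper, makes the distinction concrete for a set of one-step-ahead estimates; all numbers are placeholders.

```python
import numpy as np

# Estimates of the same quantity produced by three different models,
# with combination weights assumed given (placeholder values).
y_hat = np.array([10.2, 9.7, 10.5])    # individual model estimates
w = np.array([0.5, 0.3, 0.2])          # weights summing to one

# Model mixing: the final estimate is a weighted combination.
mix = w @ y_hat

# Model switching: the final estimate is selected from the candidates,
# e.g. that of the model with the smallest recent squared error.
recent_mse = np.array([1.3, 0.8, 2.1])
switch = y_hat[np.argmin(recent_mse)]

# Model synthesis is different in kind: the partial models themselves are
# combined, e.g. y_hat_synth = trend_model(x) + residual_model(x),
# with the parts estimated in conjunction or in sequence.
```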

The main goal of the present work is to propose some guidelines for combining neural models or neural-based forecasts. While most of the optimization problems reported here are related to time series forecasting, some may be adapted to other problems, such as classification or clustering. The main models used in the paper are Gaussian radial basis function networks, a class of supervised neural networks that may be conveniently used as filtering models, and not just as strictly regressive models (Section 2). We propose some new ways of using the model mixing approach (Section 3), provided the data is reasonably stationary, or the model synthesis approach (Section 4), when the data is clearly nonstationary. As prediction is just a means for supporting decision-making, in Section 5 we discuss whether one should combine the predictive models or the prescriptive ones.

Section snippets

Model specification

Supervised neural networks can be used as nonlinear autoregressive models, with the input patterns $x_k$ built from sequences of observations of a time series: $x_k = [\,y_{k-p}\ \cdots\ y_{k-2}\ y_{k-1}\,]^{\mathsf T}$. The network outputs are viewed as predicted estimates, in particular one-step-ahead forecasts, $\hat{y}_k \equiv \hat{y}_{k|k-1}$. This scheme can be easily adapted to longer horizons or to hybrid (causal and autoregressive) models.
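A minimal sketch of this autoregressive setup, assuming a Gaussian radial basis function layer with given centres and a common width (both placeholders; the paper's identification procedure is not reproduced here):

```python
import numpy as np

def lag_vectors(y, p):
    """Build input patterns x_k = [y_{k-p}, ..., y_{k-2}, y_{k-1}]^T."""
    return np.array([y[k - p:k] for k in range(p, len(y))])

def rbf_features(X, centres, width):
    """Gaussian RBF activations for each input pattern (row of X)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# One-step-ahead forecasts are then linear in the activations:
#   y_hat_{k|k-1} = rbf_features(x_k, centres, width) @ w
```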

The one-step-ahead forecasting errors are defined as $e_k \equiv e_{k|k-1} = y_k - \hat{y}_k$ and are used to optimize the model, for a given
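Continuing the sketch above, the linear output weights can be obtained by least squares on these errors; a batch fit is shown for brevity (in place of the recursive estimation the paper favours), and the series, lag order, centres and width are all assumed toy values.

```python
import numpy as np

# Toy setup (assumed values), reusing lag_vectors and rbf_features
# from the sketch above.
y = np.sin(0.1 * np.arange(200)) + 0.1 * np.random.default_rng(1).normal(size=200)
p, width = 3, 1.0
X = lag_vectors(y, p)
centres = X[:: len(X) // 10][:10]      # a crude subset of patterns as centres

Phi = rbf_features(X, centres, width)
y_target = y[p:]                       # y_k, aligned with each x_k

# Least-squares fit of the linear output weights.
w, *_ = np.linalg.lstsq(Phi, y_target, rcond=None)

# One-step-ahead forecasts and errors e_{k|k-1} = y_k - y_hat_{k|k-1}.
y_hat = Phi @ w
e = y_target - y_hat
```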

The combination of estimates

Probably, the most common approach to model combination is model mixing, where one combines the estimates produced by the individual models through weights. The first studies go back to Bates and Granger [2], and possibly others, who considered the linear combination of two different forecasting models. This approach was later extended to more models in [11], [20], [26]. Since then, many contributions have emerged on the linear combination of supervised neural models, including [15], [21] for
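For reference, the classical variance-minimizing weights, in the form that accommodates correlated errors across models, solve $\min_w w^{\mathsf T} \Sigma w$ subject to $\mathbf{1}^{\mathsf T} w = 1$, giving $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{\mathsf T} \Sigma^{-1} \mathbf{1})$, where $\Sigma$ is the covariance matrix of the individual one-step-ahead errors. A minimal sketch, assuming a matrix of past errors is available:

```python
import numpy as np

def combining_weights(errors):
    """Variance-minimizing combination weights that sum to one.

    errors : (T, m) matrix of one-step-ahead errors from m models.
    Correlation between the models' errors enters through the
    off-diagonal terms of the covariance matrix.
    """
    sigma = np.cov(errors, rowvar=False)   # (m, m) error covariance
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)       # Sigma^{-1} 1
    return w / (ones @ w)                  # normalize to sum to one

# Combined forecast: y_hat_comb = forecasts @ combining_weights(errors)
```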

Asymmetrical costs

Prediction is just a means, albeit a very important one, of supporting decision-making. In this section, we wish to discuss two possible combining approaches in the context of optimal decision-making (contrasted in the sketch after the list):

  • (CPred) first, combine (using optimal weights) the predictive estimates produced by distinct models and then determine the corresponding best decision; or,

  • (CDec) first, for each of the predictive models, determine the corresponding best decision, and then combine (using optimal weights) those
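Below is a toy sketch contrasting the two orderings under an asymmetric linear cost, where under-prediction costs c_u per unit and over-prediction costs c_o per unit; for such a cost, the best decision for a given predictive distribution is its c_u/(c_u + c_o) quantile. The distributions, weights and costs are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
c_u, c_o = 4.0, 1.0                    # asymmetric unit costs (assumed)
q = c_u / (c_u + c_o)                  # critical quantile, here 0.8

# Predictive samples from two models (placeholders for real forecasts).
samples = [rng.normal(100.0, 8.0, 5000), rng.normal(110.0, 12.0, 5000)]
w = np.array([0.6, 0.4])               # combination weights

# CPred: combine the predictive distributions first (a weighted mixture),
# then take the cost-optimal decision, i.e. the q-quantile of the mixture.
n = [int(wi * 5000) for wi in w]
mixture = np.concatenate([s[:k] for s, k in zip(samples, n)])
d_cpred = np.quantile(mixture, q)

# CDec: take the cost-optimal decision per model, then combine decisions.
d_cdec = w @ np.array([np.quantile(s, q) for s in samples])

# The two generally differ, because the quantile is a nonlinear
# functional of the predictive distribution.
```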

Conclusions

Computational efficiency is of critical importance when dealing with large data sets, large collections of data, or streamed data. Further difficulties arise when the data is noisy, nonstationary and nonlinear, requiring more complex and flexible, yet robust, forecasting models. Model optimality is very difficult, or virtually impossible, to achieve, but through the optimal combination of suboptimal solutions one may hope to obtain better-quality solutions efficiently.

In this paper, we

References (29)

  • K.H. Chan et al., A note on trend removal methods: The case of polynomial regression versus variate differencing, Econometrica (1977).

  • F.X. Diebold et al., Deterministic vs. stochastic trend in US GNP, yet again, American Economic Review (1996).

  • R.C. Eberhart, J. Kennedy, A new optimizer using Particle Swarm Theory, in: Proc. of the Sixth International Symposium...

  • P.S.A. Freitas, Combinação de Modelos Neuronais na Previsão de Séries Temporais, M.Sc. thesis, (in Portuguese), FCUL,...