Model combination in neural-based forecasting

https://doi.org/10.1016/j.ejor.2005.06.057

Abstract

This paper discusses different ways of combining neural predictive models or neural-based forecasts. The proposed approaches consider Gaussian radial basis function networks, which can be efficiently identified and estimated through recursive/adaptive methods. The usual framework for linearly combining estimates from different models is extended, to cope with the case where the forecasting errors from those models are correlated. A prefiltering methodology is proposed, addressing the problems raised by heavily nonstationary time series. Moreover, the paper discusses two approaches for decision-making from forecasting models: either inferring decisions from combined predictive estimates, or combining prescriptive solutions derived from different forecasting models.

Introduction

Time series forecasting is a common goal in data mining applications, where most often the recorded data is indexed in time, and the variables or attributes have distributional properties and correlation effects that are nonstationary in time. Without sacrificing predictive accuracy, the models used should not be too complex, should be both flexible and robust, and the methods used to estimate them should be efficient. It is therefore convenient to depart from the classic point of view of identifying a single, "clearly best" model, which might require a high computational burden for its identification and optimization.

Most references in the literature on neural-based forecasting follow that traditional paradigm, usually referring to the application of multilayer perceptrons (see [28] for a review). These are highly nonlinear models requiring optimization in a high-dimensional space of a nonlinear least-squares cost function, inevitably with many local optima (see [18] for a comprehensive introduction to the optimization of neural networks). Parameter updating in those models, given new data, is cumbersome and, most importantly, their direct application to nonstationary data is inadequate (see [23], [24]).

This paper seeks to discuss alternative approaches where, while still using supervised neural models, one can achieve good, if not better, forecasting performance through efficient recursive estimation and adaptive identification methods. All the parametric predictive models proposed here, neural or otherwise, are based on time-varying linear parameters that can be efficiently estimated in a recursive manner. Furthermore, we concur with the viewpoint that two or more suboptimal models, linearly composed or linearly combined, may in general constitute a better alternative to the optimization of a single neural model in terms of predictive accuracy, efficiency and robustness.
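As an aside on what "efficiently estimated in a recursive manner" can look like in practice, the sketch below implements recursive least squares with a forgetting factor for a model that is linear in its parameters. This is a minimal illustration on our part, not code from the paper; the forgetting factor value is an assumption.

```python
import numpy as np

def rls_update(theta, P, phi, y, lam=0.99):
    """One recursive least-squares step with forgetting factor lam.

    theta : current parameter vector, shape (n,)
    P     : current (scaled) inverse information matrix, shape (n, n)
    phi   : regressor vector at this time step, shape (n,)
    y     : observed target at this time step
    """
    y_hat = phi @ theta                   # a-priori one-step-ahead forecast
    e = y - y_hat                         # one-step-ahead error
    Pphi = P @ phi
    k = Pphi / (lam + phi @ Pphi)         # gain vector
    theta = theta + k * e                 # parameter update
    # lam < 1 discounts older data, which is what lets the estimate
    # track time-varying parameters.
    P = (P - np.outer(k, Pphi)) / lam
    return theta, P, e
```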

In most of the literature about time series forecasting and neural supervised learning, a classic paradigm is followed, where all effort is directed to the identification and estimation of a single model, in some sense optimal within a class of many possible models, different in structure, in size or in parameterization. The rationale behind this paradigm is the assumption that a “best” model can be conveniently identified for a given problem.

In real-world problems, the true model is likely to be unknown and some choices and assumptions have to be made such that the problem under study can be acceptably modelled and the underlying optimization problem solved. During this process, there are some issues that might be hard to sort out, such as choosing appropriate model selection criteria. In particular, if the chosen model is too complex or overparameterized, it can learn the noise intrinsic to the data, thus causing poor generalization performance, i.e., producing poor results when applied to new data (see, e.g., [3]).

As alternatives to the classic paradigm, several approaches have been proposed in which multiple models are explored and combined. This tends to reduce the risk implicit in relying on just one model which, even if optimized, may be limited in its capabilities with respect to the characteristics of the data and of the problem itself.

One may identify three fundamental ways of combining models to yield a final estimate (a toy sketch contrasting them follows the list):

  • model mixing, where the chosen estimate is computed as a combination of estimates from different models;

  • model synthesis, where the chosen estimate is obtained from the (linear) combination of different, partial models, estimated in conjunction or in sequence; and,

  • model switching, where the chosen estimate is selected from the estimates of different models.
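The toy sketch below, our own illustration rather than anything from the paper, makes the distinction concrete for a set of one-step-ahead estimates; all numbers are placeholders.

```python
import numpy as np

# Estimates of the same quantity produced by three different models,
# with combination weights assumed given (placeholder values).
y_hat = np.array([10.2, 9.7, 10.5])    # individual model estimates
w = np.array([0.5, 0.3, 0.2])          # weights summing to one

# Model mixing: the final estimate is a weighted combination.
mix = w @ y_hat

# Model switching: the final estimate is selected from the candidates,
# e.g. that of the model with the smallest recent squared error.
recent_mse = np.array([1.3, 0.8, 2.1])
switch = y_hat[np.argmin(recent_mse)]

# Model synthesis is different in kind: the partial models themselves are
# combined, e.g. y_hat_synth = trend_model(x) + residual_model(x),
# with the parts estimated in conjunction or in sequence.
```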

The main goal of the present work is to propose some guidelines for combining neural models or neural-based forecasts. While most of the optimization problems reported here are related to time series forecasting, some may be adapted to other problems, such as classification or clustering. The main models used in the paper are Gaussian radial basis function networks, a class of supervised neural networks that may be conveniently used as filtering models, and not just as strictly regressive models (Section 2). We propose some new ways of using the model mixing approach (Section 3), provided the data is reasonably stationary, or the model synthesis approach (Section 4), when the data is clearly nonstationary. As prediction is just a means for supporting decision-making, in Section 5 we discuss whether one should combine the predictive models or the prescriptive ones.

Section snippets

Model specification

Supervised neural networks can be used as nonlinear autoregressive models, with the input patterns $x_k$ built from sequences of observations of a time series: $x_k = [\,y_{k-p}\ \cdots\ y_{k-2}\ y_{k-1}\,]^{\mathsf T}$. The network outputs are viewed as predicted estimates, in particular one-step-ahead forecasts, $\hat{y}_k \equiv \hat{y}_{k|k-1}$. This scheme can be easily adapted to longer horizons or to hybrid (causal and autoregressive) models.
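A minimal sketch of this autoregressive setup, assuming a Gaussian radial basis function layer with given centres and a common width (both placeholders; the paper's identification procedure is not reproduced here):

```python
import numpy as np

def lag_vectors(y, p):
    """Build input patterns x_k = [y_{k-p}, ..., y_{k-2}, y_{k-1}]^T."""
    return np.array([y[k - p:k] for k in range(p, len(y))])

def rbf_features(X, centres, width):
    """Gaussian RBF activations for each input pattern (row of X)."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

# One-step-ahead forecasts are then linear in the activations:
#   y_hat_{k|k-1} = rbf_features(x_k, centres, width) @ w
```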

The one-step-ahead forecasting errors are defined as $e_k \equiv e_{k|k-1} = y_k - \hat{y}_k$ and are used to optimize the model, for a given
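Continuing the sketch above, the linear output weights can be obtained by least squares on these errors; a batch fit is shown for brevity (in place of the recursive estimation the paper favours), and the series, lag order, centres and width are all assumed toy values.

```python
import numpy as np

# Toy setup (assumed values), reusing lag_vectors and rbf_features
# from the sketch above.
y = np.sin(0.1 * np.arange(200)) + 0.1 * np.random.default_rng(1).normal(size=200)
p, width = 3, 1.0
X = lag_vectors(y, p)
centres = X[:: len(X) // 10][:10]      # a crude subset of patterns as centres

Phi = rbf_features(X, centres, width)
y_target = y[p:]                       # y_k, aligned with each x_k

# Least-squares fit of the linear output weights.
w, *_ = np.linalg.lstsq(Phi, y_target, rcond=None)

# One-step-ahead forecasts and errors e_{k|k-1} = y_k - y_hat_{k|k-1}.
y_hat = Phi @ w
e = y_target - y_hat
```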

The combination of estimates

Probably, the most common approach to model combination is model mixing, where one combines the estimates produced by the individual models through weights. The first studies go back to Bates and Granger [2], and possibly others, who considered the linear combination of two different forecasting models. This approach was later extended to more models in [11], [20], [26]. Since then, many contributions have emerged on the linear combination of supervised neural models, including [15], [21] for
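For reference, the classical variance-minimizing weights, in the form that accommodates correlated errors across models, solve $\min_w w^{\mathsf T} \Sigma w$ subject to $\mathbf{1}^{\mathsf T} w = 1$, giving $w = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^{\mathsf T} \Sigma^{-1} \mathbf{1})$, where $\Sigma$ is the covariance matrix of the individual one-step-ahead errors. A minimal sketch, assuming a matrix of past errors is available:

```python
import numpy as np

def combining_weights(errors):
    """Variance-minimizing combination weights that sum to one.

    errors : (T, m) matrix of one-step-ahead errors from m models.
    Correlation between the models' errors enters through the
    off-diagonal terms of the covariance matrix.
    """
    sigma = np.cov(errors, rowvar=False)   # (m, m) error covariance
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)       # Sigma^{-1} 1
    return w / (ones @ w)                  # normalize to sum to one

# Combined forecast: y_hat_comb = forecasts @ combining_weights(errors)
```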

Asymmetrical costs

Prediction is just a means, albeit a very important one, of supporting decision-making. In this section, we wish to discuss two possible combining approaches in the context of optimal decision-making (contrasted in the sketch after the list):

  • (CPred) first, combine (using optimal weights) the predictive estimates produced by distinct models and then determine the corresponding best decision; or,

  • (CDec) first, for each of the predictive models, determine the corresponding best decision, and then combine (using optimal weights) those
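Below is a toy sketch contrasting the two orderings under an asymmetric linear cost, where under-prediction costs c_u per unit and over-prediction costs c_o per unit; for such a cost, the best decision for a given predictive distribution is its c_u/(c_u + c_o) quantile. The distributions, weights and costs are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
c_u, c_o = 4.0, 1.0                    # asymmetric unit costs (assumed)
q = c_u / (c_u + c_o)                  # critical quantile, here 0.8

# Predictive samples from two models (placeholders for real forecasts).
samples = [rng.normal(100.0, 8.0, 5000), rng.normal(110.0, 12.0, 5000)]
w = np.array([0.6, 0.4])               # combination weights

# CPred: combine the predictive distributions first (a weighted mixture),
# then take the cost-optimal decision, i.e. the q-quantile of the mixture.
n = [int(wi * 5000) for wi in w]
mixture = np.concatenate([s[:k] for s, k in zip(samples, n)])
d_cpred = np.quantile(mixture, q)

# CDec: take the cost-optimal decision per model, then combine decisions.
d_cdec = w @ np.array([np.quantile(s, q) for s in samples])

# The two generally differ, because the quantile is a nonlinear
# functional of the predictive distribution.
```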

Conclusions

Computational efficiency is of critical importance when dealing with large data sets, large collections of data, or streamed data. Further difficulties arise when the data is noisy, nonstationary and nonlinear, requiring more complex and flexible, yet robust, forecasting models. Model optimality is very difficult, or virtually impossible, to achieve, but through the optimal combination of suboptimal solutions one may hope to obtain better-quality solutions efficiently.

In this paper, we

References (29)

  • K.H. Chan et al., A note on trend removal methods: The case of polynomial regression versus variate differencing, Econometrica (1977).

  • F.X. Diebold et al., Deterministic vs. stochastic trend in US GNP, yet again, American Economic Review (1996).

  • R.C. Eberhart, J. Kennedy, A new optimizer using Particle Swarm Theory, in: Proc. of the Sixth International Symposium...

  • P.S.A. Freitas, Combinação de Modelos Neuronais na Previsão de Séries Temporais, M.Sc. thesis, (in Portuguese), FCUL,...