Elsevier

Applied Soft Computing

Volume 11, Issue 2, March 2011, Pages 1529-1539
Fast meta-models for local fusion of multiple predictive models

https://doi.org/10.1016/j.asoc.2008.03.006

Abstract

Fusing the outputs of an ensemble of diverse predictive models usually boosts overall prediction accuracy. Such fusion is guided by each model's local performance, i.e., each model's prediction accuracy in the neighborhood of the probe point. Therefore, for each probe we instantiate a customized fusion mechanism. The fusion mechanism is a meta-model, i.e., a model that operates one level above the object-level models whose predictions we want to fuse. Like these models, such a meta-model is defined by structural and parametric information.

In this paper, we focus on the definition of the parametric information for a given structure. For each probe point, we either retrieve or compute the parameters to instantiate the associated meta-model. The retrieval approach is based on a CART-derived segmentation of the probe's state space, which contains the meta-model parameters. The computation approach is based on a run-time evaluation of each model's local performance in the neighborhood of the probe. We explore various structures for the meta-model, and for each structure we compare the pre-compiled (retrieval) or run-time (computation) approaches.

We demonstrate this fusion methodology in the context of multiple neural network models. However, our methodology is broadly applicable to other predictive modeling approaches. This fusion method is illustrated in the development of highly accurate models for emissions, efficiency, and load prediction in a complex power plant. The locally weighted fusion method boosts the predictive performance by 30–50% over the baseline single model approach for the various prediction targets. Relative to this approach, typical fusion strategies that use averaging or globally weighting schemes only produce a 2–6% performance boost over the same baseline.

Introduction

Developing accurate predictive models is a critical task in many applications. On one hand, predictive models are broadly applied to pattern analysis and classification. On the other hand, accurate predictive models of systems and processes are increasingly critical to the efficient design, utility, maintainability, and management of complex engineered systems. These models are typically used to capture the inherent transfer relationship between the inputs and outputs of a system. Here we use "system" to denote any entity in which some attributes (outputs) are influenced by other attributes (inputs) through an inherent functional mapping. Such a system model is often a necessary step in optimizing or managing certain behaviors of the system: only when an accurate model is developed can we reliably approximate the behavior of the system of interest. Modeling techniques can typically be classified as physics-based first-principles methods (e.g. lumped-parameter models) or data-driven methods (e.g. neural network models).

In general, given a system input x, and a system response t(x) we construct a system model f(x). The model is always an approximation to the underlying system, and includes an error component ε(x), where t(x)=f(x)+ε(x). In some applications, f(x) may be derived through physics-based approaches such as finite elements or computational fluid dynamics. Typically, data-driven modeling is applied for modeling complex systems for which the physics is not well understood, or for which a physics-based model is very difficult to develop and maintain at the accuracy levels required of the application.
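To make the decomposition t(x) = f(x) + ε(x) concrete, here is a small numerical sketch (not from the paper; the polynomial fit is a hypothetical stand-in for a trained data-driven model, and the noise level is illustrative):

```python
# Illustrative sketch: a synthetic system t(x) = f(x) + eps(x), where f is the
# underlying mapping and eps(x) is data noise. All names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed "true" system mapping (unknown in practice)
    return np.sin(x)

x = rng.uniform(0.0, 3.0, 200)
eps = rng.normal(0.0, 0.1, 200)   # data noise eps(x)
t = f(x) + eps                    # observed system response

# A data-driven model can only approximate f; its residuals mix the model's
# approximation error with the irreducible data noise.
coeffs = np.polyfit(x, t, deg=3)      # simple stand-in for a trained model
pred = np.polyval(coeffs, x)
residual_std = np.std(t - pred)
```

The residual spread is bounded below by the noise level of ε(x), regardless of how good the fitted model is.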

Regression and other generalized linear modeling techniques, well known in the statistics literature, are commonly used to build data-driven models [1]. Neural networks are nonlinear universal function approximators often used for generalized regression. Like other data-driven modeling techniques, a neural network is trained and validated on historical data to learn a regression function f(x). The prediction uncertainty of a neural network stems from several factors:

  1. data noise ε(x);

  2. model parameter misspecification, caused by

     (a) variation due to randomly sampling the training set,

     (b) non-deterministic training results, and

     (c) varying initial conditions during training; and

  3. model structure misspecification, e.g. not enough neurons.

While the issues mentioned above are common in data-driven models, they are also unavoidable in physics-based modeling, due to simplifications of the model structure and uncertainties in parameter estimation.

Since the models one derives inevitably contain errors, research has been conducted to quantify and calibrate them. In the neural network domain, prediction error has been studied in the context of computing confidence intervals for neural networks [2], [3], [4]. These studies include techniques from the field of nonlinear regression, for instance the so-called "delta method" and "sandwich method" [5], and the "bootstrap method" [6]. The bootstrap method leverages computational resources to provide a good estimate of the variance due to model parameter misspecification without assuming a normal probability distribution for the error term.
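The bootstrap variance estimate mentioned above can be sketched as follows (a minimal illustration, assuming a polynomial model as a stand-in for a neural network; the replicate count and noise level are arbitrary choices, not the paper's):

```python
# Bootstrap estimate of prediction variance due to parameter misspecification:
# resample the training set with replacement, refit the model on each resample,
# and use the spread of the resulting predictions at a probe point.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 150)
t = np.sin(x) + rng.normal(0.0, 0.1, x.size)   # noisy training data

def fit_and_predict(xs, ts, probe):
    # Hypothetical stand-in for training a neural network and predicting
    return np.polyval(np.polyfit(xs, ts, deg=3), probe)

probe = 1.5
boot_preds = []
for _ in range(50):                         # 50 bootstrap replicates
    idx = rng.integers(0, x.size, x.size)   # resample with replacement
    boot_preds.append(fit_and_predict(x[idx], t[idx], probe))

# Spread across replicates estimates the parameter-misspecification variance,
# with no normality assumption on the error term.
boot_var = np.var(boot_preds)
```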

A natural question is how one can mitigate these errors in the derived predictive models. Researchers have tackled this problem from different angles, among which fusion of multiple models has become an increasingly attractive and effective strategy. Given the various uncertainties associated with model accuracy, the members of an ensemble of diverse models can complement each other, and fusing them can yield an overall model with lower uncertainty and higher accuracy than any single model. A principal advantage of combining multiple models is that it avoids the loss of information that would result from selecting only the best-performing model. Finding the best way to exploit the information contained in an ensemble of imperfect estimators is central to the idea of fusing multiple models for improved performance, and is the main focus of this paper.

In our previous work [24], we introduced a method for fusing multiple models based on their measured local performance in the neighborhood of interest. In our experiments we obtained promising results, with improvements of 18–38% over the average of 30 neural network predictors, which we used as our baseline. However, identifying local performance online can be a costly procedure that may not be suitable when response time is an issue. In this paper, we present different ways of performing such local fusion. In particular, we propose the use of classification and regression trees (CART) [25] to create and represent a segmentation of the probe's state space. This partition contains the parameters needed to instantiate the corresponding meta-model.
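The retrieval idea above can be sketched with a deliberately tiny example, under assumptions not taken from the paper: a one-split, CART-style partition of a 1-D state space whose leaves store fusion weights derived from each model's local error. A real CART would split recursively on multiple inputs; this stump only illustrates the pre-compiled lookup.

```python
# Pre-compiled (retrieval) fusion sketch: a stump-like partition whose leaves
# hold inverse-error fusion weights for two object-level models. Hypothetical
# data and weighting rule; not the paper's exact scheme.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 100)
t = np.sin(3.0 * x)

# Two imperfect models, each more accurate on one half of the state space.
pred_a = t + np.where(x < 1.0, 0.05, 0.40) * rng.normal(size=x.size)
pred_b = t + np.where(x < 1.0, 0.40, 0.05) * rng.normal(size=x.size)

def leaf_weights(mask):
    # Inverse-MSE weighting on the records falling in this leaf (an assumption)
    errs = np.array([np.mean((pred_a[mask] - t[mask]) ** 2),
                     np.mean((pred_b[mask] - t[mask]) ** 2)])
    w = 1.0 / (errs + 1e-12)
    return w / w.sum()

split = 1.0                                  # threshold a CART fit would find here
tree = {"split": split,
        "left":  leaf_weights(x < split),
        "right": leaf_weights(x >= split)}

def fuse(probe_idx):
    # Retrieval step: look up the leaf's stored weights, then combine outputs
    w = tree["left"] if x[probe_idx] < tree["split"] else tree["right"]
    return w[0] * pred_a[probe_idx] + w[1] * pred_b[probe_idx]
```

Each leaf favors the model that is locally more accurate, so the fused output tracks the better model on each side of the split without any run-time neighborhood search.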

To appropriately compare this new approach with our previous work, we use the same object-level models (multiple neural networks) and the same application (emissions, efficiency, and load predictions in a complex power plant) as in [24]. We show that this local fusion method in general is very effective for boosting model prediction accuracy. Although the fusion strategy is demonstrated in the context of multiple neural network models, it can be easily applied to other predictive modeling techniques as well.

In Section 2, we briefly review related work in the literature. In Section 3, we describe the general methodology and process for generating and fusing multiple neural network predictors, along with the different fusion strategies. In Section 4, we present the results of a set of experiments on predicting emissions, efficiency, and load in a complex real-world power plant. In Section 5, we present our conclusions and describe potential future work.


Related work

Fusion of multiple estimators to improve performance has been a vital research topic in the machine learning community. The idea of combining multiple models is to leverage their complementary predictive characteristics. Therefore, we need multiple diverse models in order to fuse them and get better performance. Various techniques have been proposed in the literature for creation of diverse models, such as: using randomized initial conditions, different topologies (i.e. model structure),

Fusion as a model

In this section, we describe the high-level framework and the process of developing multiple models and fusing them based on local performance.

Training of multiple models

The training data for each model is obtained by resampling from the original data with replacement to create a bootstrap sample; the size of each new data set is the same as that of the original. This is illustrated in Fig. 4. A small portion of the historical dataset is randomly selected and excluded for model verification, and the remainder of the historical dataset is sampled with replacement to yield m synthetic training datasets, each of the same size, to train each of the m models. The standardized
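The resampling step described above can be sketched as follows (the dataset, split sizes, and m = 30 are illustrative choices; m = 30 matches the predictor count mentioned in the introduction, but the rest is not from the paper):

```python
# Sketch of generating m bootstrap training sets: hold out a verification
# split, then resample the remainder with replacement, same size each time.
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 30                           # n historical records, m models
data = rng.normal(size=(n, 4))           # hypothetical historical dataset

# Randomly select a small portion for verification; exclude it from training.
perm = rng.permutation(n)
verify, remain = data[perm[:50]], data[perm[50:]]

# Each synthetic training set resamples `remain` with replacement and has the
# same size as `remain`; one set per model to be trained.
train_sets = [remain[rng.integers(0, remain.shape[0], remain.shape[0])]
              for _ in range(m)]
```

Because each set is drawn independently, the m models see overlapping but distinct data, which is one of the diversity mechanisms the ensemble relies on.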

Local fusion strategies

As we have defined the meta-model for fusion, we introduce different ways of estimating the model parameters in this section.

One approach is to estimate each model's weights and mean error for a probe point Q based on the probe's neighbors (peers). We described a particular method in our previous work [24]. In this paper, we present other neighborhood-based approaches and compare their results.

Another approach is to pre-compile the parameters into a model, for instance a
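The neighborhood-based (run-time) approach described above can be sketched as follows, under assumptions not from the paper: k nearest neighbors by input distance and inverse-MSE weighting stand in for whatever distance metric and weighting rule a real deployment would use.

```python
# Run-time (computation) fusion sketch: find the probe's k nearest historical
# neighbors and weight each model by its inverse error on that neighborhood.
# k and the weighting rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
x_hist = np.linspace(0.0, 2.0, 200)      # historical inputs
t_hist = np.cos(2.0 * x_hist)            # historical targets

# Stored predictions of two object-level models on the historical data;
# model 1 is assumed more accurate overall in this toy setup.
preds = np.stack([t_hist + 0.30 * rng.normal(size=x_hist.size),
                  t_hist + 0.05 * rng.normal(size=x_hist.size)])

def local_weights(probe, k=15):
    # Neighborhood of the probe: k nearest historical inputs
    nbrs = np.argsort(np.abs(x_hist - probe))[:k]
    # Each model's local error on those neighbors
    mse = np.mean((preds[:, nbrs] - t_hist[nbrs]) ** 2, axis=1)
    w = 1.0 / (mse + 1e-12)
    return w / w.sum()

w = local_weights(1.0)   # weights are recomputed for every probe point
```

The cost of the neighbor search at every probe is exactly what motivates pre-compiling the weights into a CART-derived partition instead.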

Experimental results

To demonstrate the effectiveness of the fusion strategy proposed in this paper, our locally weighted fusion techniques are compared to other techniques, specifically simple average combination and weighted aggregation based on global performance. Weighted fusion based on global performance is realized by extending the neighborhood of any probe point to the entire historical dataset. In this method, the weights for the aggregation are constant for every probe point, reducing the computational
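The globally weighted baseline described above can be sketched as follows (inverse-MSE weighting over the whole historical set is an illustrative choice, not necessarily the paper's exact scheme):

```python
# Globally weighted fusion sketch: the "neighborhood" is the entire historical
# dataset, so the weights are computed once and reused for every probe point.
import numpy as np

rng = np.random.default_rng(5)
t_hist = np.sin(np.linspace(0.0, 2.0, 200))
preds = np.stack([t_hist + 0.30 * rng.normal(size=t_hist.size),
                  t_hist + 0.10 * rng.normal(size=t_hist.size)])

mse = np.mean((preds - t_hist) ** 2, axis=1)   # one global error per model
w_global = (1.0 / mse) / np.sum(1.0 / mse)     # constant for every probe

def fuse(model_outputs):
    # model_outputs: one prediction per model at the probe point
    return float(np.dot(w_global, model_outputs))

out = fuse(np.array([0.0, 1.0]))
```

Because the weights never change, this baseline is cheap at run time but cannot adapt to regions where the globally weaker model happens to be locally stronger, which is precisely the gap local fusion targets.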

Conclusions and future work

There are several factors that cause prediction uncertainties for both data-driven and physics-based models. One effective way of reducing such uncertainties and therefore boosting prediction accuracy is by effectively fusing the outputs from a set of multiple diverse models. A key issue here is the design of an effective fusion strategy.

Techniques based on voting, ranking, weighting, fuzzy templates, and Dempster–Shafer theory have been studied in the pattern classification area. In

References (26)

  • A. Verikas et al.

    Soft combination of neural classifiers: a comparative study

    Pattern Recognition Letters

    (1999)
  • S. Hashem

    Optimal linear combinations of neural networks

    Neural Networks

    (1997)
  • D.H. Wolpert

    Stacked generalization

    Neural Networks

    (1992)
  • P. McCullagh et al.

    Generalized Linear Models

    (1989)
  • G. Papadopoulos et al.

    Confidence estimation methods for neural networks: a practical comparison

    IEEE Transactions on Neural Networks

    (2001)
  • T. Heskes

    Practical confidence and prediction intervals

    Advances in Neural Information Processing Systems

    (1997)
  • R. De Veaux et al.

    Prediction intervals for neural networks via nonlinear regression

    Technometrics

    (1998)
  • D. Lowe et al.

    Point-wise confidence interval estimation by neural networks: a comparative study based on automotive engine calibration

    Neural Computing and Applications

    (1999)
  • R. Tibshirani

    A comparison of some error estimates for neural network models

    Neural Computation

    (1996)
  • B. Efron et al.

    An Introduction to the Bootstrap

    (1993)
  • L. Breiman

    Bagging predictors

    Machine Learning

    (1996)
  • A.J.C. Sharkey

    On combining artificial neural nets

    Connection Science

    (1996)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)