Elsevier

Applied Soft Computing

Volume 11, Issue 2, March 2011, Pages 1529-1539
Fast meta-models for local fusion of multiple predictive models

https://doi.org/10.1016/j.asoc.2008.03.006

Abstract

Fusing the outputs of an ensemble of diverse predictive models usually boosts overall prediction accuracy. Such fusion is guided by each model's local performance, i.e., each model's prediction accuracy in the neighborhood of the probe point. Therefore, for each probe we instantiate a customized fusion mechanism. The fusion mechanism is a meta-model, i.e., a model that operates one level above the object-level models whose predictions we want to fuse. Like these models, such a meta-model is defined by structural and parametric information.

In this paper, we focus on the definition of the parametric information for a given structure. For each probe point, we either retrieve or compute the parameters to instantiate the associated meta-model. The retrieval approach is based on a CART-derived segmentation of the probe's state space, which contains the meta-model parameters. The computation approach is based on a run-time evaluation of each model's local performance in the neighborhood of the probe. We explore various structures for the meta-model, and for each structure we compare the pre-compiled (retrieval) or run-time (computation) approaches.

We demonstrate this fusion methodology in the context of multiple neural network models. However, our methodology is broadly applicable to other predictive modeling approaches. This fusion method is illustrated in the development of highly accurate models for emissions, efficiency, and load prediction in a complex power plant. The locally weighted fusion method boosts the predictive performance by 30–50% over the baseline single model approach for the various prediction targets. Relative to this approach, typical fusion strategies that use averaging or globally weighting schemes only produce a 2–6% performance boost over the same baseline.

Introduction

Developing accurate predictive models is a critical task in many applications. On one hand, predictive models are broadly applied to pattern analysis and classification. On the other hand, accurate predictive models of systems and processes are increasingly critical to the efficient design, utility, maintainability, and management of complex engineered systems. These models are typically used to capture the inherent transfer relationship between the inputs and outputs of a system. Here we use "system" to denote any entity in which some attributes (outputs) are influenced by other attributes (inputs) through an inherent functional mapping. Such a system model is often a necessary step in optimizing or managing certain behaviors of the system: only when an accurate model is developed can we reliably approximate the behavior of the system of interest. Modeling techniques can typically be classified as physics-based first-principles methods (e.g. lumped-parameter models) or data-driven methods (e.g. neural network models).

In general, given a system input x, and a system response t(x) we construct a system model f(x). The model is always an approximation to the underlying system, and includes an error component ε(x), where t(x)=f(x)+ε(x). In some applications, f(x) may be derived through physics-based approaches such as finite elements or computational fluid dynamics. Typically, data-driven modeling is applied for modeling complex systems for which the physics is not well understood, or for which a physics-based model is very difficult to develop and maintain at the accuracy levels required of the application.
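To make the decomposition t(x) = f(x) + ε(x) concrete, here is a small numerical sketch (not from the paper; the polynomial fit is a hypothetical stand-in for a trained data-driven model, and the noise level is illustrative):

```python
# Illustrative sketch: a synthetic system t(x) = f(x) + eps(x), where f is the
# underlying mapping and eps(x) is data noise. All names here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Assumed "true" system mapping (unknown in practice)
    return np.sin(x)

x = rng.uniform(0.0, 3.0, 200)
eps = rng.normal(0.0, 0.1, 200)   # data noise eps(x)
t = f(x) + eps                    # observed system response

# A data-driven model can only approximate f; its residuals mix the model's
# approximation error with the irreducible data noise.
coeffs = np.polyfit(x, t, deg=3)      # simple stand-in for a trained model
pred = np.polyval(coeffs, x)
residual_std = np.std(t - pred)
```

The residual spread is bounded below by the noise level of ε(x), regardless of how good the fitted model is.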

Regression and other generalized linear modeling techniques, well known in the statistics literature, are commonly used to build data-driven models [1]. Neural networks are nonlinear universal function approximators often used for generalized regression. Like other data-driven modeling techniques, a neural network is trained and validated on historical data to learn a regression function f(x). The prediction uncertainty of a neural network stems from several factors:

  1. data noise ε(x);

  2. model parameter misspecification, caused by

     (a) variation due to randomly sampling the training set,

     (b) non-deterministic training results, and

     (c) varying initial conditions during training; and

  3. model structure misspecification, e.g. not enough neurons.

While the issues mentioned above are common in data-driven models, they are also unavoidable in physics-based modeling, due to simplifications of the model structure and uncertainties in parameter estimation.

Since the models one derives inevitably contain errors, research has been conducted to quantify and calibrate them. In the neural network domain, prediction error has been studied in the context of computing confidence intervals for neural networks [2], [3], [4]. These studies include techniques from the field of nonlinear regression, for instance the so-called "delta method" and "sandwich method" [5], and the "bootstrap method" [6]. The bootstrap method leverages computational resources to provide a good estimate of the variance due to model parameter misspecification without assuming a normal probability distribution for the error term.
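The bootstrap variance estimate mentioned above can be sketched as follows (a minimal illustration, assuming a polynomial model as a stand-in for a neural network; the replicate count and noise level are arbitrary choices, not the paper's):

```python
# Bootstrap estimate of prediction variance due to parameter misspecification:
# resample the training set with replacement, refit the model on each resample,
# and use the spread of the resulting predictions at a probe point.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 3.0, 150)
t = np.sin(x) + rng.normal(0.0, 0.1, x.size)   # noisy training data

def fit_and_predict(xs, ts, probe):
    # Hypothetical stand-in for training a neural network and predicting
    return np.polyval(np.polyfit(xs, ts, deg=3), probe)

probe = 1.5
boot_preds = []
for _ in range(50):                         # 50 bootstrap replicates
    idx = rng.integers(0, x.size, x.size)   # resample with replacement
    boot_preds.append(fit_and_predict(x[idx], t[idx], probe))

# Spread across replicates estimates the parameter-misspecification variance,
# with no normality assumption on the error term.
boot_var = np.var(boot_preds)
```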

A natural question is how one can mitigate these errors in the derived predictive models. Researchers have tackled this problem from different angles, among which fusion of multiple models has become an increasingly attractive and effective strategy. Given the various uncertainties associated with model accuracy, the members of an ensemble of diverse models can complement each other, and fusing them can yield an overall model with lower uncertainty and higher accuracy than any single model. A principal advantage of combining multiple models is that it avoids the loss of information that would result from selecting only the best-performing model. Finding the best way to exploit the information contained in an ensemble of imperfect estimators is central to the idea of fusing multiple models for improved performance, and is the main focus of this paper.

In our previous work [24], we introduced a method for fusing multiple models based on their measured local performance in the neighborhood of interest. In our experiments we obtained promising results, with improvements of 18–38% over the average of 30 neural network predictors, which we used as our baseline. However, identifying local performance online can be a costly procedure that may not be suitable when response time is an issue. In this paper, we present different ways of performing such local fusion. In particular, we propose the use of classification and regression trees (CART) [25] to create and represent a segmentation of the probe's state space. This partition contains the parameters needed to instantiate the corresponding meta-model.
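The retrieval idea above can be sketched with a deliberately tiny example, under assumptions not taken from the paper: a one-split, CART-style partition of a 1-D state space whose leaves store fusion weights derived from each model's local error. A real CART would split recursively on multiple inputs; this stump only illustrates the pre-compiled lookup.

```python
# Pre-compiled (retrieval) fusion sketch: a stump-like partition whose leaves
# hold inverse-error fusion weights for two object-level models. Hypothetical
# data and weighting rule; not the paper's exact scheme.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0, 100)
t = np.sin(3.0 * x)

# Two imperfect models, each more accurate on one half of the state space.
pred_a = t + np.where(x < 1.0, 0.05, 0.40) * rng.normal(size=x.size)
pred_b = t + np.where(x < 1.0, 0.40, 0.05) * rng.normal(size=x.size)

def leaf_weights(mask):
    # Inverse-MSE weighting on the records falling in this leaf (an assumption)
    errs = np.array([np.mean((pred_a[mask] - t[mask]) ** 2),
                     np.mean((pred_b[mask] - t[mask]) ** 2)])
    w = 1.0 / (errs + 1e-12)
    return w / w.sum()

split = 1.0                                  # threshold a CART fit would find here
tree = {"split": split,
        "left":  leaf_weights(x < split),
        "right": leaf_weights(x >= split)}

def fuse(probe_idx):
    # Retrieval step: look up the leaf's stored weights, then combine outputs
    w = tree["left"] if x[probe_idx] < tree["split"] else tree["right"]
    return w[0] * pred_a[probe_idx] + w[1] * pred_b[probe_idx]
```

Each leaf favors the model that is locally more accurate, so the fused output tracks the better model on each side of the split without any run-time neighborhood search.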

To appropriately compare this new approach with our previous work, we use the same object-level models (multiple neural networks) and the same application (emissions, efficiency, and load predictions in a complex power plant) as in [24]. We show that this local fusion method in general is very effective for boosting model prediction accuracy. Although the fusion strategy is demonstrated in the context of multiple neural network models, it can be easily applied to other predictive modeling techniques as well.

In Section 2, we briefly review related work in the literature. In Section 3, we describe the general methodology and process for generating and fusing multiple neural network predictors, along with the different fusion strategies. In Section 4, we present the results of a set of experiments on predicting emissions, efficiency, and load in a complex real-world power plant. In Section 5, we present our conclusions and describe potential future work.


Related work

Fusion of multiple estimators to improve performance has been a vital research topic in the machine learning community. The idea of combining multiple models is to leverage their complementary predictive characteristics. Therefore, we need multiple diverse models in order to fuse them and get better performance. Various techniques have been proposed in the literature for creation of diverse models, such as: using randomized initial conditions, different topologies (i.e. model structure),

Fusion as a model

In this section, we describe the high-level framework and the process of developing multiple models and fusing them based on local performance.

Training of multiple models

The training data for each model is obtained by resampling from the original data with replacement to create a bootstrap sample; the size of each new data set is the same as that of the original. This is illustrated in Fig. 4. A small portion of the historical dataset is randomly selected and excluded for model verification, and the remainder of the historical dataset is sampled with replacement to yield m synthetic training datasets, each of the same size, to train each of the m models. The standardized
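The resampling step described above can be sketched as follows (the dataset, split sizes, and m = 30 are illustrative choices; m = 30 matches the predictor count mentioned in the introduction, but the rest is not from the paper):

```python
# Sketch of generating m bootstrap training sets: hold out a verification
# split, then resample the remainder with replacement, same size each time.
import numpy as np

rng = np.random.default_rng(3)
n, m = 500, 30                           # n historical records, m models
data = rng.normal(size=(n, 4))           # hypothetical historical dataset

# Randomly select a small portion for verification; exclude it from training.
perm = rng.permutation(n)
verify, remain = data[perm[:50]], data[perm[50:]]

# Each synthetic training set resamples `remain` with replacement and has the
# same size as `remain`; one set per model to be trained.
train_sets = [remain[rng.integers(0, remain.shape[0], remain.shape[0])]
              for _ in range(m)]
```

Because each set is drawn independently, the m models see overlapping but distinct data, which is one of the diversity mechanisms the ensemble relies on.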

Local fusion strategies

As we have defined the meta-model for fusion, we introduce different ways of estimating the model parameters in this section.

One approach is to estimate each model's weights and mean error for a probe point Q based on the probe's neighbors (peers). We described a particular method in our previous work [24]. In this paper, we present other neighborhood-based approaches and compare their results.

Another approach is to pre-compile the parameters into a model, for instance a
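The neighborhood-based (run-time) approach described above can be sketched as follows, under assumptions not from the paper: k nearest neighbors by input distance and inverse-MSE weighting stand in for whatever distance metric and weighting rule a real deployment would use.

```python
# Run-time (computation) fusion sketch: find the probe's k nearest historical
# neighbors and weight each model by its inverse error on that neighborhood.
# k and the weighting rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
x_hist = np.linspace(0.0, 2.0, 200)      # historical inputs
t_hist = np.cos(2.0 * x_hist)            # historical targets

# Stored predictions of two object-level models on the historical data;
# model 1 is assumed more accurate overall in this toy setup.
preds = np.stack([t_hist + 0.30 * rng.normal(size=x_hist.size),
                  t_hist + 0.05 * rng.normal(size=x_hist.size)])

def local_weights(probe, k=15):
    # Neighborhood of the probe: k nearest historical inputs
    nbrs = np.argsort(np.abs(x_hist - probe))[:k]
    # Each model's local error on those neighbors
    mse = np.mean((preds[:, nbrs] - t_hist[nbrs]) ** 2, axis=1)
    w = 1.0 / (mse + 1e-12)
    return w / w.sum()

w = local_weights(1.0)   # weights are recomputed for every probe point
```

The cost of the neighbor search at every probe is exactly what motivates pre-compiling the weights into a CART-derived partition instead.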

Experimental results

To demonstrate the effectiveness of the fusion strategy proposed in this paper, our locally weighted fusion techniques are compared to other techniques, specifically simple average combination and weighted aggregation based on global performance. Weighted fusion based on global performance is realized by extending the neighborhood of any probe point to the entire historical dataset. In this method, the weights for the aggregation are constant for every probe point, reducing the computational
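The globally weighted baseline described above can be sketched as follows (inverse-MSE weighting over the whole historical set is an illustrative choice, not necessarily the paper's exact scheme):

```python
# Globally weighted fusion sketch: the "neighborhood" is the entire historical
# dataset, so the weights are computed once and reused for every probe point.
import numpy as np

rng = np.random.default_rng(5)
t_hist = np.sin(np.linspace(0.0, 2.0, 200))
preds = np.stack([t_hist + 0.30 * rng.normal(size=t_hist.size),
                  t_hist + 0.10 * rng.normal(size=t_hist.size)])

mse = np.mean((preds - t_hist) ** 2, axis=1)   # one global error per model
w_global = (1.0 / mse) / np.sum(1.0 / mse)     # constant for every probe

def fuse(model_outputs):
    # model_outputs: one prediction per model at the probe point
    return float(np.dot(w_global, model_outputs))

out = fuse(np.array([0.0, 1.0]))
```

Because the weights never change, this baseline is cheap at run time but cannot adapt to regions where the globally weaker model happens to be locally stronger, which is precisely the gap local fusion targets.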

Conclusions and future work

There are several factors that cause prediction uncertainties for both data-driven and physics-based models. One effective way of reducing such uncertainties and therefore boosting prediction accuracy is by effectively fusing the outputs from a set of multiple diverse models. A key issue here is the design of an effective fusion strategy.

Techniques based on voting, ranking, weighting, fuzzy templates, and Dempster–Shafer theory have been studied in the pattern classification area. In

References (26)

  • A. Verikas et al.

    Soft combination of neural classifiers: a comparative study

    Pattern Recognition Letters

    (1999)
  • S. Hashem

    Optimal linear combinations of neural networks

    Neural Networks

    (1997)
  • D.H. Wolpert

    Stacked generalization

    Neural Networks

    (1992)
  • P. McCullagh et al.

    Generalized Linear Models

    (1989)
  • G. Papadopoulos et al.

    Confidence estimation methods for neural networks: a practical comparison

    IEEE Transactions on Neural Networks

    (2001)
  • T. Heskes

    Practical confidence and prediction intervals

    Advances in Neural Information Processing Systems

    (1997)
  • R. De Veaux et al.

    Prediction intervals for neural networks via nonlinear regression

    Technometrics

    (1998)
  • D. Lowe et al.

    Point-wise confidence interval estimation by neural networks: a comparative study based on automotive engine calibration

    Neural Computing and Applications

    (1999)
  • R. Tibshirani

    A comparison of some error estimates for neural network models

    Neural Computation

    (1996)
  • B. Efron et al.

    An Introduction to the Bootstrap

    (1993)
  • L. Breiman

    Bagging predictors

    Machine Learning

    (1996)
  • A.J.C. Sharkey

    On combining artificial neural nets

    Connection Science

    (1996)
  • L. Breiman

    Random forests

    Machine Learning

    (2001)