
Neurocomputing

Volume 304, 23 August 2018, Pages 1-11

A fast and efficient conformal regressor with regularized extreme learning machine

https://doi.org/10.1016/j.neucom.2018.04.012

Abstract

A conformal regressor combines conformal prediction and a traditional regressor for point predictions. It produces a valid prediction interval for a new testing input such that the probability of the target output not being included in the prediction interval is no more than a preset significance level. Although conformal prediction is both theoretically and empirically valid, one main drawback of the existing conformal regressors is their computational inefficiency. This paper proposes a novel fast and efficient conformal regressor named LW-JP-RELM, which combines local-weighted jackknife prediction (LW-JP), a new variant of conformal prediction, with the regularized extreme learning machine (RELM). The development of our learning algorithm is important for the applications of both extreme learning machine and conformal prediction. On the one hand, LW-JP-RELM complements ELM with interval predictions that satisfy a given level of confidence. On the other hand, the underlying learning process and the outstanding learning ability of RELM make LW-JP-RELM a very fast and informationally efficient conformal regressor. In the experiments, the empirical validity and informational efficiency of our method were compared with those of the state-of-the-art on 20 public data sets, and the results confirmed that LW-JP-RELM is a competitive and promising conformal regressor.

Introduction

Extreme learning machine (ELM) addresses the question of how to train feed-forward neural networks quickly without losing learning ability and predictive performance. It is one of the fastest and most popular learning algorithms nowadays; its distinguishing features are that the parameters of the hidden nodes are randomly assigned without being tuned and the output weights are determined analytically [1], [2], [3].

The basic ELM was first proposed by Huang et al. in 2004 [4]. Since then, many variants of ELM have been developed to improve the original idea from many points of view. To make ELM more stable, Huang et al. applied Tikhonov regularization to ELM, which yields the regularized ELM [3]. Chen et al. came up with a method to modify the sigmoid activation function [5]. Yuan et al. proposed a new way to solve for the output weights [6]. To make ELM more flexible, generalize well, and be able to adjust its structure automatically and dynamically, different kinds of incremental and pruned ELMs have been proposed and studied by many researchers; representative works are CI-ELM [7] and OP-ELM [8]. The recently proposed PCI-ELM, EPCI-ELM and DCI-ELM are more flexible and dynamic methods of this kind [9]. To tackle the overfitting problem and take advantage of different ELMs, various kinds of ELM ensembles have been studied and examined [10], [11]. To improve the unsupervised learning ability of ELM, many related unsupervised learning methods have also been proposed, such as the self-organizing-map-based extreme learning machine [12], [13]. Moreover, numerous techniques, such as Bayesian analysis, evolutionary computation and fuzzy computation, have been combined with ELM to produce further variants and improvements of the original ELM [14], [15], [16], [17], [18], [19], [20], [21].

Because of its fast learning speed and excellent performance with little human intervention [22], [23], ELM has attracted more and more attention recently and has progressed dramatically over the past decade, from theoretical analysis [24], [25], [26], [27], [28] to practical applications in many fields, such as medical applications, image processing and system modeling [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]. For readers who are interested in the history and insights of ELM, we recommend [41] and [42].

To date, almost all of the ELMs for regression have been designed only for point predictions, i.e. they provide a single point prediction for every new input [36]. However, in many regression problems, not only point predictions but also confidence intervals for new data are needed [43], [44], [45], [46]. Moreover, especially in risk-sensitive situations, the probability of the true label not being included in the prediction interval should be under control, i.e. it should be no more than a significance level preset by the user. This property of an interval predictor is called validity [47]. Although a few recent ELM works have been proposed to tackle this problem, both of these methods are based on Bayesian approaches and require prior assumptions on the parameters [43], [44]. If the prior assumptions are correct in a particular application, Bayesian methods can give satisfying confidence intervals and enjoy the property of being valid; if not, the intervals output for new data cannot be trusted, as has been shown in many experiments such as those in [48], [49]. As such, methods outside the Bayesian framework should be employed to make ELM produce valid prediction intervals, which is the first motivation for the learning algorithm proposed in this paper.

In this paper, we combine conformal prediction with the regularized ELM (RELM), rather than other ELM variants, to build the conformal regressor. There are three reasons for this choice. First, RELM learns fast and performs well on various kinds of data sets and applications, which makes it a good option as the underlying algorithm of conformal prediction. Second, the efficient computation of RELM's leave-one-out predictions on the training set makes the whole learning framework learn even faster. Third, as this is the first paper to combine an ELM regressor with conformal prediction, RELM is a good choice for a first attempt and may inspire and motivate more future work.

Conformal prediction [49], [50] is a learning method designed to complement the predictions of traditional learning algorithms, called underlying algorithms, with confidence measures. A combination of conformal prediction with a particular underlying algorithm for regression is a conformal regressor, which can produce prediction intervals for the inputs of the testing data. Under the standard assumption that the data are sampled independently and identically distributed, any conformal regressor is valid [49]. Since the framework of conformal prediction was developed, many conformal regressors have been proposed and shown to be empirically valid and useful in many real-world applications [45], [46], [51], [52], [53]. Although conformal regressors are guaranteed to be valid both theoretically and empirically, the main drawback of the existing conformal regressors is their computational inefficiency, which results from the computational framework of conformal prediction and the training time of the underlying learning algorithms [45], [46].

To overcome the computational inefficiency exhibited by conformal prediction, one way is to modify its learning framework. To date, several modifications of the framework have been proposed, including inductive conformal prediction [45], cross-conformal prediction [46], split conformal prediction and jackknife prediction [54]. Inductive conformal prediction and split conformal prediction also have solid theoretical bases and are more computationally efficient than the original conformal prediction, but they lose some informational efficiency, as they partition the training set into two parts for the two stages of the conformal prediction process. Cross-conformal prediction and jackknife prediction make full use of the training data; although not theoretically valid, they have been shown to be empirically valid and informationally efficient in empirical studies. Another way to speed up the learning process is to employ a fast and accurate learning method as the underlying algorithm, which is the reason for employing extreme learning machine in this paper.
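To make the jackknife idea concrete, the following is a minimal Python sketch of jackknife prediction intervals. It illustrates the general scheme of [54], not this paper's algorithm; `fit_fn` is a hypothetical stand-in for any underlying regressor that, given training data, returns a prediction function.

```python
import numpy as np

def jackknife_interval(X, y, x_new, fit_fn, alpha=0.1):
    """Jackknife prediction interval at significance level alpha.

    fit_fn(X, y) is assumed to return a callable predict(X) -> array.
    """
    l = len(y)
    residuals = np.empty(l)
    for i in range(l):
        mask = np.arange(l) != i
        predict = fit_fn(X[mask], y[mask])         # retrain without example i
        residuals[i] = abs(y[i] - predict(X[i:i + 1])[0])
    # Half-width: the ceil((1 - alpha)(l + 1))-th smallest LOO residual.
    k = min(int(np.ceil((1 - alpha) * (l + 1))), l)
    half_width = np.sort(residuals)[k - 1]
    y_hat = fit_fn(X, y)(x_new.reshape(1, -1))[0]  # full-data point prediction
    return y_hat - half_width, y_hat + half_width
```

Note that the loop retrains the regressor l times, which is exactly the computational burden that a fast underlying algorithm with closed-form leave-one-out predictions removes.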

Besides computational efficiency, another issue in implementing conformal prediction in the real world is informational efficiency, which is reflected by the average length of the intervals produced by the conformal regressor and by whether the length varies with the variance of the target output. The shorter the average interval length and the more consistent the interval length is with the variance of the target output, the more information a conformal regressor conveys. To make the prediction intervals more informative, local-weighted conformal inference, also called local-weighted conformal prediction, has been proposed based on local-weighted residuals [54]. Local-weighted conformal inference needs another learning algorithm to estimate the variance of the target output given the input vector, which adds another computational burden to regression conformal prediction.
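Under the same assumptions as the sketch above, the local weighting amounts to normalizing the leave-one-out residuals by a difficulty estimate ρ(x), so that intervals widen where the target is noisier. Here `rho_train` and `rho_new` are hypothetical outputs of whatever auxiliary estimator is used.

```python
import numpy as np

def locally_weighted_interval(y_hat_new, loo_residuals, rho_train, rho_new, alpha=0.1):
    """Interval y_hat_new +/- q * rho_new, with q a conformal quantile of the
    leave-one-out residuals normalized by the local difficulty rho."""
    scores = np.sort(loo_residuals / rho_train)   # normalized nonconformity scores
    k = min(int(np.ceil((1 - alpha) * (len(scores) + 1))), len(scores))
    q = scores[k - 1]
    return y_hat_new - q * rho_new, y_hat_new + q * rho_new
```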

Although the extreme learning machine learns very quickly, to the best of our knowledge there are no studies using the extreme learning machine as the underlying algorithm to accelerate the computation of conformal prediction.

In order to get the best of both worlds, this paper proposes a novel conformal regressor named LW-JP-RELM, a combination of local-weighted jackknife prediction (LW-JP) [54] and RELM. Although the original approach and other variants of conformal prediction can also be combined with RELM, there is an additional advantage to using LW-JP. Specifically, the intervals produced by LW-JP are calculated from the leave-one-out predictions of the underlying algorithm on the training set. Unlike many other underlying algorithms, for which obtaining the leave-one-out predictions on the training set takes a long time, RELM can calculate these predictions very quickly [55]. In effect, the computational complexity of obtaining the predictions is equal to that of training RELM, which means that when the training data are large the predictions come as a bonus for RELM and LW-JP-RELM can be very fast.
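The closed form behind this speed can be sketched with the standard PRESS identity for ridge-type solutions, which is the kind of result referenced in [55] (our illustration, with λ playing the role of 1/C in RELM notation): a single fit yields all leave-one-out residuals at once.

```python
import numpy as np

def relm_loo(H, y, lam=1e-3):
    """All leave-one-out predictions of a ridge solution on the hidden-layer
    output matrix H, obtained from a single fit via the PRESS identity."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    P = H @ np.linalg.solve(A, H.T)        # hat matrix P = H (H^T H + lam I)^-1 H^T
    loo_residuals = (y - P @ y) / (1.0 - np.diag(P))
    loo_predictions = y - loo_residuals    # LOO prediction = y_i minus its LOO residual
    return loo_predictions, loo_residuals
```

Compared with the explicit retraining loop in the jackknife sketch earlier, this replaces l model fits with one matrix factorization.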

All in all, the learning characteristics of RELM and the computational framework of LW-JP make the two methods an excellent combination, which we propose and examine empirically in this paper.

The rest of this paper is organized as follows. Section 2 first briefly introduces conformal prediction and then presents the learning framework of local-weighted jackknife prediction, the framework used in this paper. Section 3 reviews the regularized extreme learning machine and the formulas for its leave-one-out prediction and error. In Section 4, LW-JP-RELM is developed and its computational complexity is presented. The experiments are reported in Section 5 and, finally, the conclusions of this paper are drawn in Section 6.


The framework of conformal prediction and local-weighted jackknife prediction

In the rest of this paper, we denote the training set as $z_l = \{(x_i, y_i),\ i = 1, 2, \ldots, l\}$, where $x_i \in \mathbb{R}^n$ is an input vector and $y_i \in \mathbb{R}$ the corresponding output label of $x_i$. Given a testing input $x_{l+1}$, our goal is to construct a prediction set using the training data such that the set will contain the correct label $y_{l+1}$ with a high probability.

Regularized extreme learning machine (RELM)

This section briefly reviews RELM and the formulas related to the leave-one-out prediction and error.

The output function of the single-hidden-layer feedforward neural network (SLFN) for regression is
$$f(x) = \sum_{j=1}^{L} \beta_j\, g(w_j^T x + b_j),$$
where $x \in \mathbb{R}^n$ is the input vector, $L$ the number of hidden nodes, $g(w_j^T x + b_j)$ the $j$th activation function with parameters $(w_j, b_j) \in \mathbb{R}^n \times \mathbb{R}$, and $\beta_j \in \mathbb{R}$ the weight connecting the $j$th hidden node to the output node. To make the SLFN learn from data,
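A minimal RELM sketch matching the output function above, assuming a sigmoid activation and the regularized least-squares solution $\beta = (I/C + H^T H)^{-1} H^T y$ of [3]; the parameter values are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_relm(X, y, L=100, C=1e3, seed=0):
    """Train a RELM: random hidden parameters, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input weights w_j, never tuned
    b = rng.standard_normal(L)                 # random biases b_j
    H = sigmoid(X @ W + b)                     # hidden-layer output matrix
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ y)
    predict = lambda X_new: sigmoid(X_new @ W + b) @ beta
    return predict, H
```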

Local-weighted jackknife prediction with regularized extreme learning machine (LW-JP-RELM)

As in Algorithm 2, there are two parts where RELMs are needed. In the first part, a RELM regression function is learned from the training data and $r_i$ for $i = 1, 2, \ldots, l$ are calculated. In the second part, $\ln(r_i)$ for $i = 1, 2, \ldots, l$ are taken as the output values of the corresponding inputs and another RELM is trained on these data. However, instead of using two totally different RELMs, we use a common $H$ for the two regressors. As such, the only differences between the two regressors are their regularization
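The following sketch shows one way to assemble the two parts just described. It is our reading of this snippet, not the paper's Algorithm 2 (which is not reproduced here); parameter names are assumptions. Both regressors reuse the same $H$ and differ only in their regularization terms.

```python
import numpy as np

def lw_jp_relm_intervals(H, H_new, y, y_hat_new, C1=1e3, C2=1e3, alpha=0.1):
    """Intervals y_hat_new +/- q * exp(H_new beta2), reusing H for both RELMs."""
    L, l = H.shape[1], len(y)
    # Part 1: absolute LOO residuals r_i of the point-prediction RELM (PRESS identity).
    P = H @ np.linalg.solve(np.eye(L) / C1 + H.T @ H, H.T)
    r = np.abs((y - P @ y) / (1.0 - np.diag(P)))
    # Part 2: a second RELM trained on ln(r_i), with the same H but a different C.
    beta2 = np.linalg.solve(np.eye(L) / C2 + H.T @ H, H.T @ np.log(r + 1e-12))
    rho_train, rho_new = np.exp(H @ beta2), np.exp(H_new @ beta2)
    # Conformal quantile of the normalized residuals.
    scores = np.sort(r / rho_train)
    q = scores[min(int(np.ceil((1 - alpha) * (l + 1))), l) - 1]
    return y_hat_new - q * rho_new, y_hat_new + q * rho_new
```

Reusing $H$ means the second regressor costs only one more linear solve, which is consistent with the computational-efficiency argument made throughout the paper.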

Experiments

This section presents the empirical studies of LW-JP-RELM.

First, we perform experiments on a synthetic heteroscedastic data set to show that LW-JP-RELM is valid and how the variance of y influences the length of the prediction interval. Second, we compare LW-JP-RELM with five other algorithms, including two state-of-the-art conformal regressors, by measuring the error rates and the average sizes of the prediction intervals of each method on 20 public data sets.
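As an illustration only (the paper's actual synthetic set is not specified in this snippet), a heteroscedastic data set can be generated with a noise scale that grows with |x|, so a valid and informationally efficient regressor should produce intervals that widen accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=(1000, 1))
noise_scale = 0.1 + 0.2 * np.abs(x[:, 0])            # variance of y grows with |x|
y = np.sin(x[:, 0]) + noise_scale * rng.standard_normal(1000)
```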

All the

Conclusion

Conformal prediction is a promising method to complement traditional learning algorithms with a valid confidence measure. As the original approach to regression conformal prediction is very time consuming, faster algorithms need to be developed. This paper introduces a fast and efficient algorithm combining the ideas of jackknife prediction, a variant of conformal prediction, and extreme learning machine. Experiments on the synthetic data set and the public data sets showed that our algorithm

Acknowledgment

The authors would like to thank the anonymous editor and reviewers for their valuable comments and suggestions which improved this work.

The authors also gratefully acknowledge the support of the Application Basis and Cutting-edge Technology Research Projects of Tianjin City, China (14JCYBJC21800), the 2014 Annual China Public Industry (Meteorological) Research Project (GYHY201406004) and the China Meteorological Administration project Development and Application of the Software System for a New Generation Weather Radar Building


References (63)

  • X. Liu et al., A comparative analysis of support vector machines and extreme learning machines, Neural Netw. (2012)
  • J. Chorowski et al., Review and performance comparison of SVM- and ELM-based classifiers, Neurocomputing (2014)
  • Y. Wang et al., A study on effectiveness of extreme learning machine, Neurocomputing (2011)
  • D. Wang et al., An oscillation bound of the generalization performance of extreme learning machine and corresponding analysis, Neurocomputing (2015)
  • Q. He et al., Clustering in extreme learning machine feature space, Neurocomputing (2014)
  • X. Liu et al., Multiple kernel extreme learning machine, Neurocomputing (2015)
  • S. Liao et al., Meta-ELM: ELM with ELM hidden nodes, Neurocomputing (2014)
  • X. Li et al., Extreme learning machine based transfer learning for data classification, Neurocomputing (2016)
  • N. Wang et al., Constructive multi-output extreme learning machine with application to large tanker motion dynamics identification, Neurocomputing (2014)
  • G. Huang et al., Trends in extreme learning machines: a review, Neural Netw. (2015)
  • W. Zong et al., Face recognition based on extreme learning machine, Neurocomputing (2011)
  • S.F. Mahmood et al., FASTA-ELM: a fast adaptive shrinkage/thresholding algorithm for extreme learning machine and its application to gender recognition, Neurocomputing (2017)
  • L. Zhang et al., Saliency detection via extreme learning machine, Neurocomputing (2016)
  • Z. Shang et al., Confidence-weighted extreme learning machine for regression problems, Neurocomputing (2015)
  • H. Papadopoulos et al., Reliable prediction intervals with regression neural networks, Neural Netw. (2011)
  • Z. Shao et al., Efficient leave-one-out cross-validation-based regularized extreme learning machine, Neurocomputing (2016)
  • B. Frénay et al., Parameter-insensitive kernel in extreme learning for non-linear support vector regression, Neurocomputing (2011)
  • G.B. Huang et al., Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. (2011)
  • G.B. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B (Cybern.) (2012)
  • G.B. Huang et al., Extreme learning machine: a new learning scheme of feedforward neural networks, Proceedings of the IEEE International Joint Conference on Neural Networks (2004)
  • Z.X. Chen et al., A modified extreme learning machine with sigmoidal activation functions, Neural Comput. Appl. (2013)

Di Wang received his B.E. degree in electrical engineering and its automation from Tianjin University, China, in 2012. He is now a Ph.D. candidate at School of Electrical and Information Engineering, Tianjin University. His current research interests include extreme learning machine, conformal prediction, and machine learning.

Ping Wang is a professor at School of Electrical and Information Engineering, Tianjin University, China. She is a Ph.D. supervisor in control science and engineering. Her research interests include pattern recognition and its application, image understanding and moving objects tracking.

Junzhi Shi received his B.E. degree from Qufu Normal University, China, in 2012. He is now a Ph.D. candidate at School of Electrical and Information Engineering, Tianjin University, China. His current research interests include image recognition, conformal prediction and machine learning.
