
Neurocomputing

Volume 304, 23 August 2018, Pages 1-11

A fast and efficient conformal regressor with regularized extreme learning machine

https://doi.org/10.1016/j.neucom.2018.04.012

Abstract

A conformal regressor combines conformal prediction and a traditional regressor for point predictions. It produces a valid prediction interval for a new testing input such that the probability of the target output not being included in the prediction interval is no more than a preset significance level. Although conformal prediction is both theoretically and empirically valid, one main drawback of the existing conformal regressors is their computational inefficiency. This paper proposes a novel fast and efficient conformal regressor named LW-JP-RELM, which combines local-weighted jackknife prediction (LW-JP), a new variant of conformal prediction, with the regularized extreme learning machine (RELM). The development of our learning algorithm is important for the applications of both extreme learning machine and conformal prediction. On the one hand, LW-JP-RELM complements ELM with interval predictions that satisfy a given level of confidence. On the other hand, the underlying learning process and the outstanding learning ability of RELM make LW-JP-RELM a very fast and informationally efficient conformal regressor. In the experiments, the empirical validity and informational efficiency of our method were compared with those of the state-of-the-art on 20 public data sets, and the results confirmed that LW-JP-RELM is a competitive and promising conformal regressor.

Introduction

Extreme learning machine (ELM) addresses the question of how to train feed-forward neural networks quickly without losing learning ability and predictive performance. It is one of the fastest and most popular learning algorithms nowadays; its distinguishing features are that the parameters of the hidden nodes are randomly assigned without being tuned and the output weights are determined analytically [1], [2], [3].

The basic ELM was first proposed by Huang et al. in 2004 [4]. Since then, many variants of ELM have been developed to improve the original idea from many points of view. To make ELM more stable, Huang et al. applied Tikhonov regularization to ELM, which yields the regularized ELM [3]. Chen et al. came up with a method to modify the sigmoid activation function [5]. Yuan et al. proposed a new way to solve for the output weights [6]. To make ELM more flexible, generalize well, and be able to adjust its structure automatically and dynamically, different kinds of incremental and pruned ELMs have been proposed and studied by many researchers; representative works are CI-ELM [7] and OP-ELM [8]. The recently proposed PCI-ELM, EPCI-ELM and DCI-ELM are more flexible and dynamic methods of this kind [9]. To tackle the overfitting problem and take advantage of different ELMs, various kinds of ELM ensembles have been studied and examined [10], [11]. To improve the unsupervised learning ability of ELM, many related unsupervised learning methods have also been proposed, such as the self-organizing-map-based extreme learning machine [12], [13]. Moreover, numerous techniques, such as Bayesian analysis, evolutionary computation and fuzzy computation, have been combined with ELM to produce further variants and improvements of the original ELM [14], [15], [16], [17], [18], [19], [20], [21].

Because of its fast learning speed and excellent performance with little human intervention [22], [23], ELM has attracted more and more attention recently and has progressed dramatically over the past decade, from theoretical analysis [24], [25], [26], [27], [28] to practical applications in many fields, such as medical applications, image processing and system modeling [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40]. For readers who are interested in the history and insights of ELM, we recommend [41] and [42].

To date, almost all of the ELMs for regression have been designed only for point predictions, i.e. they provide a single point prediction for every new input [36]. However, in many regression problems, not only point predictions but also confidence intervals for new data are needed [43], [44], [45], [46]. Moreover, especially in risk-sensitive situations, the probability of the true label not being included in the prediction interval should be under control, i.e. it should be no more than a significance level preset by the user. This property of an interval predictor is called validity [47]. Although a few recent ELM works have been proposed to tackle this problem, both of these methods are based on Bayesian approaches and require prior assumptions on the parameters [43], [44]. If the prior assumptions are correct in a particular application, Bayesian methods can give satisfying confidence intervals and enjoy the property of being valid; if not, the intervals output for new data cannot be trusted, as has been shown in many experiments such as those in [48], [49]. As such, methods outside the Bayesian framework should be employed to make ELM produce valid prediction intervals, which is the first motivation for the learning algorithm proposed in this paper.

In this paper, we combine conformal prediction with the regularized ELM (RELM), rather than other ELM variants, to build the conformal regressor. There are three reasons for this choice. First, RELM learns fast and performs well on various kinds of data sets and applications, which makes it a good option as the underlying algorithm of conformal prediction. Second, the efficient computation of RELM's leave-one-out predictions on the training set makes the whole learning framework learn even faster. Third, as this is the first paper to combine an ELM regressor with conformal prediction, RELM is a good choice for a first attempt and may inspire and motivate more future work.

Conformal prediction [49], [50] is a learning method designed to complement the predictions of traditional learning algorithms, called underlying algorithms, with confidence measures. A combination of conformal prediction with a particular underlying algorithm for regression is a conformal regressor, which can produce prediction intervals for the inputs of the testing data. Under the standard assumption that the data are sampled independently and identically distributed, any conformal regressor is valid [49]. Since the framework of conformal prediction was developed, many conformal regressors have been proposed and shown to be empirically valid and useful in many real-world applications [45], [46], [51], [52], [53]. Although conformal regressors are guaranteed to be valid both theoretically and empirically, the main drawback of the existing conformal regressors is their computational inefficiency, which results from the computational framework of conformal prediction and the training time of the underlying learning algorithms [45], [46].

To overcome the computational inefficiency exhibited by conformal prediction, one way is to modify its learning framework. To date, several modifications of the framework have been proposed, including inductive conformal prediction [45], cross-conformal prediction [46], split conformal prediction and jackknife prediction [54]. Inductive conformal prediction and split conformal prediction also have solid theoretical bases and are more computationally efficient than the original conformal prediction, but they lose some informational efficiency, as they partition the training set into two parts for the two stages of the conformal prediction process. Cross-conformal prediction and jackknife prediction make full use of the training data; although not theoretically valid, they have been shown to be empirically valid and informationally efficient in empirical studies. Another way to speed up the learning process is to employ a fast and accurate learning method as the underlying algorithm, which is the reason for employing extreme learning machine in this paper.
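To make the jackknife idea concrete, the following is a minimal Python sketch of jackknife prediction intervals. It illustrates the general scheme of [54], not this paper's algorithm; `fit_fn` is a hypothetical stand-in for any underlying regressor that, given training data, returns a prediction function.

```python
import numpy as np

def jackknife_interval(X, y, x_new, fit_fn, alpha=0.1):
    """Jackknife prediction interval at significance level alpha.

    fit_fn(X, y) is assumed to return a callable predict(X) -> array.
    """
    l = len(y)
    residuals = np.empty(l)
    for i in range(l):
        mask = np.arange(l) != i
        predict = fit_fn(X[mask], y[mask])         # retrain without example i
        residuals[i] = abs(y[i] - predict(X[i:i + 1])[0])
    # Half-width: the ceil((1 - alpha)(l + 1))-th smallest LOO residual.
    k = min(int(np.ceil((1 - alpha) * (l + 1))), l)
    half_width = np.sort(residuals)[k - 1]
    y_hat = fit_fn(X, y)(x_new.reshape(1, -1))[0]  # full-data point prediction
    return y_hat - half_width, y_hat + half_width
```

Note that the loop retrains the regressor l times, which is exactly the computational burden that a fast underlying algorithm with closed-form leave-one-out predictions removes.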

Besides computational efficiency, another issue in implementing conformal prediction in the real world is informational efficiency, which is reflected by the average length of the intervals produced by the conformal regressor and by whether the length varies with the variance of the target output. The shorter the average interval length and the more consistent the interval length is with the variance of the target output, the more information a conformal regressor conveys. To make the prediction intervals more informative, local-weighted conformal inference, also called local-weighted conformal prediction, has been proposed based on local-weighted residuals [54]. Local-weighted conformal inference needs another learning algorithm to estimate the variance of the target output given the input vector, which adds another computational burden to regression conformal prediction.
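Under the same assumptions as the sketch above, the local weighting amounts to normalizing the leave-one-out residuals by a difficulty estimate ρ(x), so that intervals widen where the target is noisier. Here `rho_train` and `rho_new` are hypothetical outputs of whatever auxiliary estimator is used.

```python
import numpy as np

def locally_weighted_interval(y_hat_new, loo_residuals, rho_train, rho_new, alpha=0.1):
    """Interval y_hat_new +/- q * rho_new, with q a conformal quantile of the
    leave-one-out residuals normalized by the local difficulty rho."""
    scores = np.sort(loo_residuals / rho_train)   # normalized nonconformity scores
    k = min(int(np.ceil((1 - alpha) * (len(scores) + 1))), len(scores))
    q = scores[k - 1]
    return y_hat_new - q * rho_new, y_hat_new + q * rho_new
```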

Although the extreme learning machine learns very quickly, to the best of our knowledge there are no studies using the extreme learning machine as the underlying algorithm to accelerate the computation of conformal prediction.

In order to get the best of both worlds, this paper proposes a novel conformal regressor named LW-JP-RELM, a combination of local-weighted jackknife prediction (LW-JP) [54] and RELM. Although the original approach and other variants of conformal prediction can also be combined with RELM, there is an additional advantage to using LW-JP. Specifically, the intervals produced by LW-JP are calculated from the leave-one-out predictions of the underlying algorithm on the training set. Unlike many other underlying algorithms, for which obtaining the leave-one-out predictions on the training set takes a long time, RELM can calculate these predictions very quickly [55]. In effect, the computational complexity of obtaining the predictions is equal to that of training RELM, which means that when the training data are large the predictions come as a bonus for RELM and LW-JP-RELM can be very fast.
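The closed form behind this speed can be sketched with the standard PRESS identity for ridge-type solutions, which is the kind of result referenced in [55] (our illustration, with λ playing the role of 1/C in RELM notation): a single fit yields all leave-one-out residuals at once.

```python
import numpy as np

def relm_loo(H, y, lam=1e-3):
    """All leave-one-out predictions of a ridge solution on the hidden-layer
    output matrix H, obtained from a single fit via the PRESS identity."""
    A = H.T @ H + lam * np.eye(H.shape[1])
    P = H @ np.linalg.solve(A, H.T)        # hat matrix P = H (H^T H + lam I)^-1 H^T
    loo_residuals = (y - P @ y) / (1.0 - np.diag(P))
    loo_predictions = y - loo_residuals    # LOO prediction = y_i minus its LOO residual
    return loo_predictions, loo_residuals
```

Compared with the explicit retraining loop in the jackknife sketch earlier, this replaces l model fits with one matrix factorization.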

All in all, the learning characteristics of RELM and the computational framework of LW-JP make the two methods an excellent combination, which we propose and examine empirically in this paper.

The rest of this paper is organized as follows. Section 2 first briefly introduces conformal prediction and then presents the learning framework of local-weighted jackknife prediction, the framework used in this paper. Section 3 reviews the regularized extreme learning machine and the formulas for its leave-one-out prediction and error. In Section 4, LW-JP-RELM is developed and its computational complexity is presented. The experiments are reported in Section 5 and, finally, the conclusions of this paper are drawn in Section 6.


The framework of conformal prediction and local-weighted jackknife prediction

In the rest of this paper, we denote the training set as $z_l = \{(x_i, y_i),\ i = 1, 2, \ldots, l\}$, where $x_i \in \mathbb{R}^n$ is an input vector and $y_i \in \mathbb{R}$ the corresponding output label of $x_i$. Given a testing input $x_{l+1}$, our goal is to construct a prediction set using the training data such that the set will contain the correct label $y_{l+1}$ with a high probability.

Regularized extreme learning machine (RELM)

This section briefly reviews RELM and the formulas related to the leave-one-out prediction and error.

The output function of the single-hidden-layer feedforward neural network (SLFN) for regression is
$$f(x) = \sum_{j=1}^{L} \beta_j\, g(w_j^T x + b_j),$$
where $x \in \mathbb{R}^n$ is the input vector, $L$ the number of hidden nodes, $g(w_j^T x + b_j)$ the $j$th activation function with parameters $(w_j, b_j) \in \mathbb{R}^n \times \mathbb{R}$, and $\beta_j \in \mathbb{R}$ the weight connecting the $j$th hidden node to the output node. To make the SLFN learn from data,
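A minimal RELM sketch matching the output function above, assuming a sigmoid activation and the regularized least-squares solution $\beta = (I/C + H^T H)^{-1} H^T y$ of [3]; the parameter values are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_relm(X, y, L=100, C=1e3, seed=0):
    """Train a RELM: random hidden parameters, closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))   # random input weights w_j, never tuned
    b = rng.standard_normal(L)                 # random biases b_j
    H = sigmoid(X @ W + b)                     # hidden-layer output matrix
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ y)
    predict = lambda X_new: sigmoid(X_new @ W + b) @ beta
    return predict, H
```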

Local-weighted jackknife prediction with regularized extreme learning machine (LW-JP-RELM)

As in Algorithm 2, there are two parts where RELMs are needed. In the first part, a RELM regression function is learned from the training data and $r_i$ for $i = 1, 2, \ldots, l$ are calculated. In the second part, $\ln(r_i)$ for $i = 1, 2, \ldots, l$ are taken as the output values of the corresponding inputs and another RELM is trained on these data. However, instead of using two totally different RELMs, we use a common $H$ for the two regressors. As such, the only differences between the two regressors are their regularization
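The following sketch shows one way to assemble the two parts just described. It is our reading of this snippet, not the paper's Algorithm 2 (which is not reproduced here); parameter names are assumptions. Both regressors reuse the same $H$ and differ only in their regularization terms.

```python
import numpy as np

def lw_jp_relm_intervals(H, H_new, y, y_hat_new, C1=1e3, C2=1e3, alpha=0.1):
    """Intervals y_hat_new +/- q * exp(H_new beta2), reusing H for both RELMs."""
    L, l = H.shape[1], len(y)
    # Part 1: absolute LOO residuals r_i of the point-prediction RELM (PRESS identity).
    P = H @ np.linalg.solve(np.eye(L) / C1 + H.T @ H, H.T)
    r = np.abs((y - P @ y) / (1.0 - np.diag(P)))
    # Part 2: a second RELM trained on ln(r_i), with the same H but a different C.
    beta2 = np.linalg.solve(np.eye(L) / C2 + H.T @ H, H.T @ np.log(r + 1e-12))
    rho_train, rho_new = np.exp(H @ beta2), np.exp(H_new @ beta2)
    # Conformal quantile of the normalized residuals.
    scores = np.sort(r / rho_train)
    q = scores[min(int(np.ceil((1 - alpha) * (l + 1))), l) - 1]
    return y_hat_new - q * rho_new, y_hat_new + q * rho_new
```

Reusing $H$ means the second regressor costs only one more linear solve, which is consistent with the computational-efficiency argument made throughout the paper.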

Experiments

This section presents the empirical studies of LW-JP-RELM.

First, we perform experiments on a synthetic heteroscedastic data set to show that LW-JP-RELM is valid and how the variance of y influences the length of the prediction interval. Second, we compare LW-JP-RELM with five other algorithms, including two state-of-the-art conformal regressors, by measuring the error rates and the average sizes of the prediction intervals of each method on 20 public data sets.
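As an illustration only (the paper's actual synthetic set is not specified in this snippet), a heteroscedastic data set can be generated with a noise scale that grows with |x|, so a valid and informationally efficient regressor should produce intervals that widen accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-5.0, 5.0, size=(1000, 1))
noise_scale = 0.1 + 0.2 * np.abs(x[:, 0])            # variance of y grows with |x|
y = np.sin(x[:, 0]) + noise_scale * rng.standard_normal(1000)
```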

All the

Conclusion

Conformal prediction is a promising method to complement traditional learning algorithms with a valid confidence measure. As the original approach to regression conformal prediction is very time consuming, faster algorithms need to be developed. This paper introduces a fast and efficient algorithm combining the ideas of jackknife prediction, a variant of conformal prediction, and extreme learning machine. Experiments on the synthetic data set and the public data sets showed that our algorithm

Acknowledgment

The authors would like to thank the anonymous editor and reviewers for their valuable comments and suggestions which improved this work.

The authors also gratefully acknowledge the support of the Application Basis and Cutting-edge Technology Research Projects of Tianjin City, China (14JCYBJC21800), the 2014 Annual China Public Industry (Meteorological) Research Project (GYHY201406004) and the China Meteorological Administration project Development and Application of the Software System for a New Generation Weather Radar Building


References (63)

  • X. Liu et al., A comparative analysis of support vector machines and extreme learning machines, Neural Netw. (2012)
  • J. Chorowski et al., Review and performance comparison of SVM- and ELM-based classifiers, Neurocomputing (2014)
  • Y. Wang et al., A study on effectiveness of extreme learning machine, Neurocomputing (2011)
  • D. Wang et al., An oscillation bound of the generalization performance of extreme learning machine and corresponding analysis, Neurocomputing (2015)
  • Q. He et al., Clustering in extreme learning machine feature space, Neurocomputing (2014)
  • X. Liu et al., Multiple kernel extreme learning machine, Neurocomputing (2015)
  • S. Liao et al., Meta-ELM: ELM with ELM hidden nodes, Neurocomputing (2014)
  • X. Li et al., Extreme learning machine based transfer learning for data classification, Neurocomputing (2016)
  • N. Wang et al., Constructive multi-output extreme learning machine with application to large tanker motion dynamics identification, Neurocomputing (2014)
  • G. Huang et al., Trends in extreme learning machines: a review, Neural Netw. (2015)
  • W. Zong et al., Face recognition based on extreme learning machine, Neurocomputing (2011)
  • S.F. Mahmood et al., FASTA-ELM: a fast adaptive shrinkage/thresholding algorithm for extreme learning machine and its application to gender recognition, Neurocomputing (2017)
  • L. Zhang et al., Saliency detection via extreme learning machine, Neurocomputing (2016)
  • Z. Shang et al., Confidence-weighted extreme learning machine for regression problems, Neurocomputing (2015)
  • H. Papadopoulos et al., Reliable prediction intervals with regression neural networks, Neural Netw. (2011)
  • Z. Shao et al., Efficient leave-one-out cross-validation-based regularized extreme learning machine, Neurocomputing (2016)
  • B. Frénay et al., Parameter-insensitive kernel in extreme learning for non-linear support vector regression, Neurocomputing (2011)
  • G.B. Huang et al., Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. (2011)
  • G.B. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B (Cybern.) (2012)
  • G.B. Huang et al., Extreme learning machine: a new learning scheme of feedforward neural networks, Proceedings of the IEEE International Joint Conference on Neural Networks (2004)
  • Z.X. Chen et al., A modified extreme learning machine with sigmoidal activation functions, Neural Comput. Appl. (2013)

Di Wang received his B.E. degree in electrical engineering and its automation from Tianjin University, China, in 2012. He is now a Ph.D. candidate at School of Electrical and Information Engineering, Tianjin University. His current research interests include extreme learning machine, conformal prediction, and machine learning.

Ping Wang is a professor at School of Electrical and Information Engineering, Tianjin University, China. She is a Ph.D. supervisor in control science and engineering. Her research interests include pattern recognition and its application, image understanding and moving objects tracking.

Junzhi Shi received his B.E. degree from Qufu Normal University, China, in 2012. He is now a Ph.D. candidate at School of Electrical and Information Engineering, Tianjin University, China. His current research interests include image recognition, conformal prediction and machine learning.
