Building selective ensembles of Randomization Based Neural Networks with the successive projections algorithm
Introduction
Randomization Based Neural Networks (RNNs) are a class of Neural Networks (NNs) in which several parameters are randomly assigned. The success of RNNs can be observed in various domains [1]. The idea of randomly assigning neural network parameters is shared by different models, such as Random Vector Functional Link (RVFL) networks [2], radial basis function neural networks with randomly generated centres [3], the Liquid State Machine [4] and the Feedforward Neural Network with Random Weights (FNNRW) [5]. Classical neural network training approaches usually tune the parameters based on the derivatives of a loss function. Since the power of most NNs relies on the nonlinear functions in the hidden units, training the most common NN architectures amounts to a nonlinear least-squares problem, which is usually solved iteratively, converges slowly, and often reaches only a local minimum [6]. Randomization-based methods avoid this problem by randomly fixing the network configuration or some of the network parameters, or by randomly corrupting the input data or the parameters during training [7].
As a result of randomly assigning the learning parameters, some suboptimal input weights may be drawn, which can harm both the generalization ability and the performance stability of the NN [8]. To overcome this problem, feature selection, neuron pruning and ensemble methods are among the most widely used strategies. Feature selection methods aim to discard redundant information in the feature set, yielding more concise models that are less likely to overfit. The success of such strategies in RNNs is documented in publications such as [9], [10]. Alternatively, redundant information may be discarded by pruning the RNN itself: hidden nodes with similar responses are removed, which also produces less complex models (with fewer hidden neurons) and improved generalization capability. The RNN pruning methods in [11], [12], [13] show the impact of this procedure on RNN performance.
In a different direction, ensemble methods combine several models into a single one. This improves generalization and is the key idea behind successful learning algorithms such as random forests [14]. According to [15], ensemble strategies may be especially suitable for RNNs, since such networks are highly unstable. Ensemble methods for RNNs have been proposed in recent papers such as [16], [17], [18] (see [7] for a survey on ensembles of RNNs). Although the performance of these methods is promising, ensembling results in a more complex model. To mitigate this drawback, some researchers proposed strategies in which the final ensemble is composed of a subset of all generated models, a procedure known as selective ensembling. In [19], the authors performed several experiments suggesting that selective ensembles may improve the generalization capability of ensemble models while reducing their complexity. This hypothesis is also supported by the results obtained in [16].
Inspired by these results, in this paper we propose an RNN selective ensemble method that uses feature selection and pruning strategies to reduce the complexity of the final model. In the proposed method, named Selective Ensemble of RNN using the Successive Projections Algorithm (SERS), we employ the Successive Projections Algorithm (SPA) in three different tasks: (1) selecting relevant features; (2) pruning unnecessary hidden neurons; and (3) selecting ensemble members. Although SPA was originally developed as a feature selection technique, it can also be employed for RNN pruning. In this context, the main contribution of this paper consists of extending the usage of SPA to the selection of ensemble members and combining the three aforementioned tasks into the proposed SERS method. Experiments were carried out on benchmark regression datasets, and the results showed that SERS achieved performance comparable to some recently proposed RNN ensemble methods while producing less complex models.
The remaining sections of this paper are organized as follows. Section 2 presents the basic concepts of Randomization Based Neural Networks. Section 3 describes the Successive Projections Algorithm. Section 4 introduces the proposed method. Section 5 shows the results of numerical experiments conducted to illustrate the application of the proposed method. Concluding remarks are given in Section 6.
Section snippets
Randomization Based Neural Networks
In addition to the seminal studies of Rosenblatt [20] on the Perceptron model, the paper published by Schmidt et al. [5] was the first to investigate the effect of randomly setting a NN's hidden weights on its performance. In the proposed method, named Feedforward Neural Network with Random Weights (FNNRW) [5], the training procedure is divided into two main steps: (1) random feature mapping and (2) solving for the linear parameters.
Suppose a Single Hidden Layer Feedforward Neural Network (SLFNN)
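The two-step FNNRW procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the uniform weight range, sigmoid activation, and pseudo-inverse solver are assumed, conventional choices.

```python
# Sketch of FNNRW training on a single-hidden-layer network (assumed details:
# uniform random weights in [-1, 1], sigmoid activation, pseudo-inverse solve):
# (1) hidden weights and biases are drawn at random and never updated;
# (2) output weights come from a linear least-squares fit on the hidden outputs.
import numpy as np

rng = np.random.default_rng(0)

def train_fnnrw(X, y, n_hidden):
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # random feature mapping
    beta = np.linalg.pinv(H) @ y                             # linear least-squares step
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Only `beta` is fitted to the data; this is what makes training a single linear solve rather than an iterative nonlinear optimization.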
Successive Projections Algorithm
The Successive Projections Algorithm (SPA) was originally proposed for feature selection in the context of multivariate linear regression models for spectroscopic analysis [25] and has found many applications over the years, as described in a recent review paper [26]. In this section, we provide a brief description of SPA and its usage for feature selection in regression tasks. For further details, the reader is referred to [27].
Let x_d denote the d-th column of the matrix X. The goal is to find a
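The core of SPA as described above is a chain of projections over the columns of X. A minimal sketch, under the assumption that each iteration keeps the candidate column with the largest norm after projecting onto the orthogonal complement of the previously selected column, which minimizes collinearity among the selected variables:

```python
# Simplified SPA selection chain (illustrative; the full algorithm in [25]
# also compares chains from different starting columns and picks the best).
import numpy as np

def spa_chain(X, start, n_select):
    """Return indices of n_select columns of X chosen by successive projections."""
    selected = [start]
    P = X.astype(float).copy()
    for _ in range(n_select - 1):
        xk = P[:, selected[-1]]
        # Project every column onto the orthogonal complement of the last pick.
        P = P - np.outer(xk, (P.T @ xk) / (xk @ xk))
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -np.inf  # exclude already-selected columns
        selected.append(int(np.argmax(norms)))
    return selected
```

Because each new column is chosen after removing the component explained by the previous picks, a column that duplicates an already-selected one projects to (near) zero and is never chosen.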
Selective ensemble of RNNs using SPA
Although SPA was originally proposed for feature selection, extending its application to either neuron pruning or ensemble model selection requires no modification of the original SPA formulation. As stated in Section 2, the training procedure for an RNN model can be divided into two main steps. In the first step, the feature vectors undergo a nonlinear transformation and are projected into a new feature space; this is performed by the hidden layer neurons. After that, the
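One way to reuse SPA unchanged for the ensemble selection task, following the idea above, is to stack each candidate member's validation predictions as a column and run the same successive-projection selection over those columns, so that the retained members are the least mutually redundant. The routine below is a simplified stand-in for the full method (an assumption for illustration, not the authors' implementation), with a plain average as the combiner:

```python
# Hypothetical SPA-style ensemble selection: treat each member's prediction
# vector as a "feature" column and keep the least-redundant subset.
import numpy as np

def select_members(pred_matrix, n_keep):
    """pred_matrix: (n_samples, n_members); returns indices of kept members."""
    P = pred_matrix.astype(float).copy()
    kept = [int(np.argmax(np.linalg.norm(P, axis=0)))]  # start from largest column
    for _ in range(n_keep - 1):
        xk = P[:, kept[-1]]
        P = P - np.outer(xk, (P.T @ xk) / (xk @ xk))  # project out last pick
        norms = np.linalg.norm(P, axis=0)
        norms[kept] = -np.inf  # never re-select a kept member
        kept.append(int(np.argmax(norms)))
    return kept

def ensemble_predict(pred_matrix, kept):
    return pred_matrix[:, kept].mean(axis=1)  # simple average of kept members
```

Members whose predictions nearly duplicate an already-kept member project to near zero and are skipped, which is exactly the redundancy-pruning behaviour that makes the resulting selective ensemble smaller.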
Experiments and results
To assess the performance of SERS, two sets of experiments were conducted. The first set is designed to highlight the role of SPA in each task of the proposed method: the first experiment evaluates the effectiveness of SPA for feature selection, the second verifies its pruning capability, and the third verifies its ensemble selection capability. In the second set of experiments, we compare SERS to similar
Conclusions
In this paper we propose a method to build parsimonious ensembles of RNNs for regression problems. The proposed method, named SERS, consists of three steps and employs the Successive Projections Algorithm (SPA) in each of them to perform three different tasks: feature selection in step 1, pruning of hidden neurons in step 2, and ensemble selection in step 3. All three tasks aim to reduce the complexity of the final model without compromising its accuracy.
Two sets of numerical experiments
Acknowledgment
The authors would like to thank the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support (grants 303714/2014-0 and 305048/2016-3).
References (36)
- et al., Classification with reject option for software defect prediction, Appl. Soft Comput. (2016)
- et al., A survey of randomized algorithms for training neural networks, Inf. Sci. (2016)
- et al., Genetic ensemble of extreme learning machine, Neurocomputing (2014)
- et al., Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI, Neurocomputing (2014)
- et al., A new pruning method for extreme learning machines via genetic algorithms, Appl. Soft Comput. (2016)
- et al., LARSEN-ELM: selective ensemble of extreme learning machines using LARS for blended data, Neurocomputing (2015)
- et al., Evolutionary extreme learning machine, Pattern Recogn. (2005)
- et al., Ensemble delta test-extreme learning machine (DT-ELM) for regression, Neurocomputing (2014)
- et al., Ensembling neural networks: many could be better than all, Artif. Intell. (2002)
- et al., Random vector functional link network for short-term electricity load demand forecasting, Inf. Sci. (2016)
- A comprehensive evaluation of random vector functional link networks, Inf. Sci.
- Extreme learning machine: theory and applications, Neurocomputing
- The successive projections algorithm for variable selection in spectroscopic multicomponent analysis, Chemom. Intell. Lab. Syst.
- The successive projections algorithm, Trends Anal. Chem.
- The successive projections algorithm for interval selection in trilinear partial least-squares with residual bilinearization, Anal. Chim. Acta
- Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw.
- Radial basis functions, multi-variable functional interpolation and adaptive networks, Complex Syst.