Building selective ensembles of Randomization Based Neural Networks with the successive projections algorithm
Introduction
Randomization Based Neural Networks (RNNs) are a class of Neural Networks (NNs) in which several parameters are randomly assigned. The success of RNNs can be observed in various domains [1]. The idea of randomly assigning neural network parameters is shared by different models, such as Random Vector Functional Link (RVFL) networks [2], radial basis function neural networks with randomly generated centres [3], the Liquid State Machine [4] and the Feedforward Neural Network with Random Weights (FNNRW) [5]. Classical neural network training approaches usually tune the parameters based on the derivatives of a loss function. Since the power of most NNs relies on the nonlinear functions in the hidden units, training the most common NN architectures amounts to a nonlinear least-squares problem, which is usually solved iteratively, converges slowly, and often reaches only a local minimum [6]. Randomization-based methods avoid this problem by randomly fixing the network configuration or some of the network parameters, or by randomly corrupting the input data or the parameters during training [7].
As a result of randomly assigning the learning parameters, some suboptimal input weights may be drawn, which can harm both the generalization ability and the performance stability of the NN [8]. To overcome this problem, feature selection, neuron pruning and ensemble methods are among the most widely used strategies. Feature selection methods aim to discard redundant information in the feature set, yielding more concise models that are less likely to overfit. The success of such strategies in RNNs is documented in publications such as [9], [10]. Alternatively, redundant information may be discarded by pruning the RNN itself: hidden nodes with similar responses are removed, which also produces less complex models (with fewer hidden neurons) and improved generalization capability. The RNN pruning methods in [11], [12], [13] show the impact of this procedure on RNN performance.
In a different direction, ensemble methods combine several models into a single one. This improves generalization and is the key idea behind successful learning algorithms such as random forests [14]. According to [15], ensemble strategies may be especially suitable for RNNs, since such networks are highly unstable. Ensemble methods for RNNs have been proposed in recent papers such as [16], [17], [18] (see [7] for a survey on ensembles of RNNs). Although the performance of these methods is promising, ensembling results in a more complex model. To mitigate this drawback, some researchers proposed strategies in which the final ensemble is composed of a subset of all generated models, a procedure known as selective ensembling. In [19], the authors performed several experiments suggesting that selective ensembles may improve the generalization capability of ensemble models while reducing their complexity. This hypothesis is also supported by the results obtained in [16].
Inspired by these results, in this paper we propose an RNN selective ensemble method that uses feature selection and pruning strategies to reduce the complexity of the final model. In the proposed method, named Selective Ensemble of RNN using the Successive Projections Algorithm (SERS), we employ the Successive Projections Algorithm (SPA) in three different tasks: (1) selecting relevant features; (2) pruning unnecessary hidden neurons; and (3) selecting ensemble members. Although SPA was originally developed as a feature selection technique, it can also be employed for RNN pruning. In this context, the main contribution of this paper consists of extending the usage of SPA to the selection of ensemble members and combining the three aforementioned tasks into the proposed SERS method. Experiments were carried out on benchmark regression datasets, and the results showed that SERS achieved performance comparable to some recently proposed RNN ensemble methods while producing less complex models.
The remaining sections of this paper are organized as follows. Section 2 presents the basic concepts of Randomization Based Neural Networks. Section 3 describes the Successive Projections Algorithm. Section 4 introduces the proposed method. Section 5 shows the results of numerical experiments conducted to illustrate the application of the proposed method. Concluding remarks are given in Section 6.
Section snippets
Randomization Based Neural Networks
In addition to the seminal studies of Rosenblatt [20] on the Perceptron model, the paper published by Schmidt et al. [5] was the first to investigate the effect of randomly setting a NN's hidden weights on its performance. In the proposed method, named Feedforward Neural Network with Random Weights (FNNRW) [5], the training procedure is divided into two main steps: (1) random feature mapping and (2) solving for the linear parameters.
Suppose a Single Hidden Layer Feedforward Neural Network (SLFNN)
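The two-step FNNRW procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the uniform weight range, sigmoid activation, and pseudo-inverse solver are assumed, conventional choices.

```python
# Sketch of FNNRW training on a single-hidden-layer network (assumed details:
# uniform random weights in [-1, 1], sigmoid activation, pseudo-inverse solve):
# (1) hidden weights and biases are drawn at random and never updated;
# (2) output weights come from a linear least-squares fit on the hidden outputs.
import numpy as np

rng = np.random.default_rng(0)

def train_fnnrw(X, y, n_hidden):
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # random feature mapping
    beta = np.linalg.pinv(H) @ y                             # linear least-squares step
    return W, b, beta

def predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Only `beta` is fitted to the data; this is what makes training a single linear solve rather than an iterative nonlinear optimization.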
Successive Projections Algorithm
The Successive Projections Algorithm (SPA) was originally proposed for feature selection in the context of multivariate linear regression models for spectroscopic analysis [25] and has found many applications over the years, as described in a recent review paper [26]. In this section, we provide a brief description of SPA and its usage for feature selection in regression tasks. For further details, the reader is referred to [27].
Let x_d denote the d-th column of the matrix X. The goal is to find a
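The core of SPA as described above is a chain of projections over the columns of X. A minimal sketch, under the assumption that each iteration keeps the candidate column with the largest norm after projecting onto the orthogonal complement of the previously selected column, which minimizes collinearity among the selected variables:

```python
# Simplified SPA selection chain (illustrative; the full algorithm in [25]
# also compares chains from different starting columns and picks the best).
import numpy as np

def spa_chain(X, start, n_select):
    """Return indices of n_select columns of X chosen by successive projections."""
    selected = [start]
    P = X.astype(float).copy()
    for _ in range(n_select - 1):
        xk = P[:, selected[-1]]
        # Project every column onto the orthogonal complement of the last pick.
        P = P - np.outer(xk, (P.T @ xk) / (xk @ xk))
        norms = np.linalg.norm(P, axis=0)
        norms[selected] = -np.inf  # exclude already-selected columns
        selected.append(int(np.argmax(norms)))
    return selected
```

Because each new column is chosen after removing the component explained by the previous picks, a column that duplicates an already-selected one projects to (near) zero and is never chosen.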
Selective ensemble of RNNs using SPA
Although SPA was originally proposed for feature selection, extending its application to either neuron pruning or ensemble model selection requires no modification of the original SPA formulation. As stated in Section 2, the training procedure for an RNN model can be divided into two main steps. In the first step, the feature vectors undergo a nonlinear transformation and are projected into a new feature space; this is performed by the hidden layer neurons. After that, the
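One way to reuse SPA unchanged for the ensemble selection task, following the idea above, is to stack each candidate member's validation predictions as a column and run the same successive-projection selection over those columns, so that the retained members are the least mutually redundant. The routine below is a simplified stand-in for the full method (an assumption for illustration, not the authors' implementation), with a plain average as the combiner:

```python
# Hypothetical SPA-style ensemble selection: treat each member's prediction
# vector as a "feature" column and keep the least-redundant subset.
import numpy as np

def select_members(pred_matrix, n_keep):
    """pred_matrix: (n_samples, n_members); returns indices of kept members."""
    P = pred_matrix.astype(float).copy()
    kept = [int(np.argmax(np.linalg.norm(P, axis=0)))]  # start from largest column
    for _ in range(n_keep - 1):
        xk = P[:, kept[-1]]
        P = P - np.outer(xk, (P.T @ xk) / (xk @ xk))  # project out last pick
        norms = np.linalg.norm(P, axis=0)
        norms[kept] = -np.inf  # never re-select a kept member
        kept.append(int(np.argmax(norms)))
    return kept

def ensemble_predict(pred_matrix, kept):
    return pred_matrix[:, kept].mean(axis=1)  # simple average of kept members
```

Members whose predictions nearly duplicate an already-kept member project to near zero and are skipped, which is exactly the redundancy-pruning behaviour that makes the resulting selective ensemble smaller.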
Experiments and results
To assess the performance of SERS, two sets of experiments were conducted. The first set is designed to highlight the role of SPA in each task of the proposed method: the first experiment evaluates the effectiveness of SPA for feature selection, the second verifies its pruning capability, and the third verifies its ensemble selection capability. In the second set of experiments, we compare SERS to similar
Conclusions
In this paper we propose a method to build parsimonious ensembles of RNNs for regression problems. The proposed method, named SERS, consists of three steps and employs the Successive Projections Algorithm (SPA) in each of them to perform three different tasks: feature selection in step 1, pruning of hidden neurons in step 2, and ensemble selection in step 3. All three tasks aim to reduce the complexity of the final model without compromising its accuracy.
Two sets of numerical experiments
Acknowledgment
The authors would like to thank the Brazilian National Council for Scientific and Technological Development (CNPq) for the financial support (grants 303714/2014-0 and 305048/2016-3).
References (36)
- et al., Classification with reject option for software defect prediction, Appl. Soft Comput. (2016)
- et al., A survey of randomized algorithms for training neural networks, Inf. Sci. (2016)
- et al., Genetic ensemble of extreme learning machine, Neurocomputing (2014)
- et al., Evolutionary ELM wrapper feature selection for Alzheimer's disease CAD on anatomical brain MRI, Neurocomputing (2014)
- et al., A new pruning method for extreme learning machines via genetic algorithms, Appl. Soft Comput. (2016)
- et al., LARSEN-ELM: selective ensemble of extreme learning machines using LARS for blended data, Neurocomputing (2015)
- et al., Evolutionary extreme learning machine, Pattern Recogn. (2005)
- et al., Ensemble delta test-extreme learning machine (DT-ELM) for regression, Neurocomputing (2014)
- et al., Ensembling neural networks: many could be better than all, Artif. Intell. (2002)
- et al., Random vector functional link network for short-term electricity load demand forecasting, Inf. Sci. (2016)
- A comprehensive evaluation of random vector functional link networks, Inf. Sci.
- Extreme learning machine: theory and applications, Neurocomputing
- The successive projections algorithm for variable selection in spectroscopic multicomponent analysis, Chemom. Intell. Lab. Syst.
- The successive projections algorithm, Trends Anal. Chem.
- The successive projections algorithm for interval selection in trilinear partial least-squares with residual bilinearization, Anal. Chim. Acta
- Stochastic choice of basis functions in adaptive function approximation and the functional-link net, IEEE Trans. Neural Netw.
- Radial basis functions, multi-variable functional interpolation and adaptive networks, Complex Syst.