Hierarchical ensemble of Extreme Learning Machine
Introduction
Extreme Learning Machine (ELM) was first proposed by Huang et al. [1] for generalized single-hidden-layer feedforward neural networks (SLFNs). In contrast to traditional neural networks, which require great effort in hyper-parameter tuning, ELM first generates its hidden weights and biases randomly and then computes the output weights analytically by solving a Ridge Regression problem [2], [3]. Owing to these unique characteristics, i.e., fast learning speed, ease of implementation, and universal approximation capability [4], ELM has been widely applied in image recognition [5], [6], [7], remote sensing image classification [8], [9], [10], and protein structure prediction [11].
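As a minimal sketch of the training procedure described above (not the authors' implementation; the sigmoid activation and the regularization constant C are illustrative choices), an ELM draws its hidden weights randomly and obtains the output weights in closed form via ridge regression:

```python
import numpy as np

def elm_train(X, T, n_hidden=100, C=1.0, seed=None):
    """Basic ELM: random hidden layer, ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    # Hidden weights and biases are generated randomly and never tuned
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Output weights: regularized least squares (Ridge Regression)
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For classification, T is typically a one-hot target matrix and the predicted class is the argmax of the output.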
ELM and its variants can also be regarded as Randomized Neural Networks [12]. Owing to the random nature of its hidden layer, ELM potentially yields unstable predictions, so a large number of hidden neurons is required to guarantee its performance. Many approaches have been proposed to address this problem. A straightforward idea is to optimize ELM's hidden-layer parameters with heuristic search, such as Differential Evolution [13], Memetic Algorithms [3], and Evolutionary Multi-objective Algorithms [14]; however, this generally implies a high computational cost. In [15], Kernel-based ELM (KELM) was proposed by treating ELM's hidden mapping as unknown to the user, and it has been shown to outperform the Support Vector Machine (SVM) in many applications.
In recent years, deep learning [16], which learns feature representations via a hierarchical structure, has achieved remarkable success in various fields. Inspired by this, developing deep representation methods based on ELM has attracted increasing attention [4], [17], [18], [19]. For example, multi-layer ELM (ML-ELM) [4] constructs a deep representation by stacking a series of ELM autoencoders (ELM-AEs) sequentially. In a different direction, Zhou and Feng [20] proposed the Deep Forest method (gcForest), a random forest ensemble approach that demonstrates the power of integrating representation learning with ensemble learning and offers a way to build deep representations from traditional methods.
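The stacking idea behind ML-ELM can be sketched as follows (a rough illustration under assumptions: the actual ML-ELM in [4] additionally imposes orthogonality constraints on the random weights, which are omitted here). Each ELM-AE learns output weights that reconstruct its input from a random hidden mapping, and the transpose of those weights serves as the encoder for the next layer:

```python
import numpy as np

def elm_ae_layer(X, n_hidden, C=1.0, seed=None):
    """One ELM autoencoder: solve H @ beta ~= X, then encode as X @ beta.T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    # Output weights reconstruct the input from the random hidden mapping
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ X)
    return X @ beta.T  # encoded features, shape (n_samples, n_hidden)

def ml_elm_features(X, layer_sizes=(64, 32), C=1.0, seed=0):
    """Stack ELM-AEs sequentially to build a deep representation."""
    for i, h in enumerate(layer_sizes):
        X = elm_ae_layer(X, h, C=C, seed=seed + i)
    return X
```

The final representation can then be fed to an ordinary ELM (or ridge classifier) for the supervised stage.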
It is well known that ensemble learning is effective at combining multiple learners to yield better performance. Although ensemble methods have been used to improve ELM, e.g., Voting-based ELM (V-ELM) [21], the integration of representation learning and ensemble learning has not drawn enough attention. In [22], a hierarchical ELM ensemble (H-ELM-E), an ensemble of ensembles, was used to fuse different image features. Similarly, in [12], a trained combiner is used to integrate component ELMs; however, it is essentially weighted voting and is not capable of representation learning.
In this paper, we develop a novel hierarchical ensemble of ELM (HE-ELM) for representation learning. Unlike traditional ensemble methods, which make the final decision with a shallow ensemble, the proposed method includes multiple re-representation layers and adopts two diversity-encouraging strategies to avoid overfitting.
This paper is structured as follows. Section 2 reviews the related works and gives the main motivation. Section 3 describes the proposed method and the two diversity-encouraging strategies. The experimental results are given in Section 4. The conclusions are discussed in Section 5.
Section snippets
Briefs of ELM
Theorem 1. Learning can be performed without iteratively tuning (artificial) hidden nodes (or hundreds of types of biological neurons), even if the modeling of the biological neurons is unknown, as long as they are nonlinear and piecewise continuous; such a network can approximate any continuous target function with arbitrarily small error and can also separate any disjoint regions without tuning the hidden neurons [23].
Consider a data set with $N$ training samples $\{(\mathbf{x}_i, \mathbf{t}_i)\}_{i=1}^{N}$, where $\mathbf{x}_i \in \mathbb{R}^n$ lies in an $n$-dimensional feature space and $\mathbf{t}_i$ is the corresponding class label.
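In the standard ELM notation (assumed here, since the snippet is cut off at this point), let $\mathbf{H}$ denote the hidden-layer output matrix on the training samples, $\mathbf{T}$ the target matrix, and $C$ the ridge regularization parameter; the output weights are then obtained in closed form as:

```latex
\boldsymbol{\beta} = \left( \mathbf{H}^{\top}\mathbf{H} + \frac{\mathbf{I}}{C} \right)^{-1} \mathbf{H}^{\top}\mathbf{T}
```

This closed-form solution is what gives ELM its fast, non-iterative training, in line with Theorem 1.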
Hierarchical ensemble of ELM
In this section, we first introduce the overall architecture of the proposed method and then describe the two diversity-encouraging strategies.
Experiment setup
We evaluate the performance of the proposed HE-ELM on 22 benchmark data sets taken from the UCI repository.1 All data sets are normalized with max-min normalization during preprocessing, and 60% of the labeled samples of each data set are selected randomly for training, with the rest used for testing. We run a group of experiments comparing HE-ELM with the basic Extreme Learning Machine (ELM) [1], the Multilayer Perceptron (MLP) [26], Multiple-Layer Extreme
Conclusions
We have introduced a novel hierarchical ensemble method based on ELM for classification, which uses two re-representation layers that re-represent features by appending the predictions of the component ELMs to the initial features. To encourage diversity among the component ELMs, we introduce two strategies: sparse connection, which randomly disconnects a percentage of hidden connections, and feature bagging, which increases the number of sub-samples by subspace sampling. Simulations
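Based only on the description above (a sketch under assumptions, not the authors' code; component count, dropout rate, and subspace fraction are illustrative), one re-representation step can be written as: train several component ELMs with sparse connections on random feature subspaces, then concatenate their predictions with the initial features:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_component(X, T, n_hidden=50, C=1.0, drop=0.2, seed=None):
    """Component ELM with sparse connection: randomly zero a fraction of hidden weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    W *= rng.random(W.shape) >= drop  # disconnect a percentage of hidden connections
    b = rng.standard_normal(n_hidden)
    H = sigmoid(X @ W + b)
    beta = np.linalg.solve(H.T @ H + np.eye(n_hidden) / C, H.T @ T)
    return lambda Xn: sigmoid(Xn @ W + b) @ beta

def re_represent(X, T, n_components=5, subspace=0.8, seed=0):
    """One re-representation layer: append component predictions to the features."""
    rng = np.random.default_rng(seed)
    preds = []
    for k in range(n_components):
        # Feature bagging: each component sees a random feature subspace
        idx = rng.choice(X.shape[1], size=max(1, int(subspace * X.shape[1])),
                         replace=False)
        model = train_component(X[:, idx], T, seed=seed + k)
        preds.append(model(X[:, idx]))
    return np.hstack([X] + preds)
```

Stacking such layers, with a final ensemble making the decision on the last representation, mirrors the hierarchical structure described in the paper.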
Acknowledgments
We thank the reviewers for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China under grant nos. 61773355 and 61603355, the Fundamental Research Funds for National University, China University of Geosciences (Wuhan), grant no. G1323541717, and the Natural Science Foundation of Hubei Province, China, grant no. 2018CFB528.
References (28)
- et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
- et al., Memetic extreme learning machine, Pattern Recognit. (2016)
- et al., Human action recognition using extreme learning machine based on visual vocabularies, Neurocomputing (2010)
- et al., Face recognition based on extreme learning machine, Neurocomputing (2011)
- et al., Ensemble of extreme learning machines with trained classifier combination and statistical features for hyperspectral data, Neurocomputing (2018)
- et al., Evolutionary extreme learning machine, Pattern Recognit. (2005)
- et al., Voting based extreme learning machine, Inf. Sci. (2012)
- et al., Multilayer feedforward networks are universal approximators, Neural Netw. (1989)
- et al., Extreme learning machine: a new learning scheme of feedforward neural networks, Proceedings of the 2004 IEEE International Joint Conference on Neural Networks, Vol. 2 (2004)
- et al., Extreme learning machine for multilayer perceptron, IEEE Trans. Neural Netw. Learn. Syst. (2016)
- Traffic sign recognition using kernel extreme learning machines with deep perceptual features, IEEE Trans. Intell. Transp. Syst.
- Spectral-spatial classification of hyperspectral image based on kernel extreme learning machine, Remote Sens.
- Kernel-based extreme learning machine for remote-sensing image classification, Remote Sens. Lett.
- Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine, IEEE Geosci. Remote Sens. Lett.