Pattern Recognition Letters

Volume 116, 1 December 2018, Pages 101-106

Hierarchical ensemble of Extreme Learning Machine

https://doi.org/10.1016/j.patrec.2018.06.015

Highlights

  • A novel hierarchical ensemble of ELM that integrates representation learning and ensemble learning.

  • A deep cascade structure is used to re-represent features.

  • Sparse connection and feature bagging are used to encourage the diversity of individual learners.

  • HE-ELM significantly outperforms many existing ensemble and representation learning methods.

Abstract

Extreme Learning Machine (ELM), which was proposed for generalized single-hidden-layer feedforward neural networks, has become a popular research topic due to its unique characteristics. However, the random nature of ELM’s hidden layer leads to unstable performance and requires a large number of hidden neurons, which increases the risk of overfitting. In this paper, we propose a simple but effective ensemble approach, called the Hierarchical Ensemble of Extreme Learning Machine (HE-ELM), to improve ELM. To encourage diversity among the component ELMs, two strategies are adopted: sparse connection of the component ELMs and feature bagging. The resulting architecture integrates representation learning and ensemble learning with relatively few parameters, and consists of independent component ELMs, making it easy to implement, train, and apply in practice. We compare the proposed HE-ELM with existing methods on 22 classification problems, showing that HE-ELM achieves significant improvements in classification accuracy with a reduced risk of overfitting the training data.

Introduction

Extreme Learning Machine (ELM) was first proposed by Huang et al. [1] for generalized single-hidden-layer feedforward neural networks (SLFNs). In contrast to traditional neural networks, which require great effort in hyper-parameter tuning, ELM first generates its hidden weights and biases randomly, and then computes its output weights analytically by solving a ridge regression problem [2], [3]. Due to its unique characteristics, i.e., fast learning speed, ease of implementation, and universal approximation capability [4], ELM has been widely applied in image recognition [5], [6], [7], remote sensing image classification [8], [9], [10], and protein structure prediction [11].
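
To make this two-step training concrete, the following is a minimal NumPy sketch of a basic ELM classifier; the sigmoid activation, regularization value, and one-hot target encoding are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def train_elm(X, T, n_hidden=200, reg=1e-3, seed=None):
    """Minimal ELM sketch: random hidden layer + ridge-regression output weights.
    X: (N, n) inputs; T: (N, c) one-hot targets."""
    rng = np.random.default_rng(seed)
    # Step 1: hidden weights and biases are drawn randomly and never tuned.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    # Step 2: output weights via the closed-form ridge regression solution.
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta  # class scores; argmax along axis 1 gives labels
```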

Actually, ELM and its variants can also be regarded as Randomized Neural Networks [12]. Owing to the random nature of its hidden layer, ELM potentially yields unstable predictions, so a large number of hidden neurons is required to guarantee its performance. To address this problem, many approaches have been proposed to improve ELM. A straightforward idea is to optimize ELM’s hidden-layer parameters by heuristic search, such as Differential Evolution [13], Memetic Algorithms [3], and Evolutionary Multi-objective Algorithms [14]; however, this generally implies a high computational cost. In [15], the Kernel-based ELM (KELM) was proposed by treating ELM’s hidden mapping as unknown to users, and it has been shown to outperform the Support Vector Machine (SVM) in many applications.

In recent years, deep learning [16], which learns feature representations via a hierarchical structure, has achieved remarkable success in various fields. Inspired by this, developing deep representation methods based on ELM has attracted increasing attention [4], [17], [18], [19]. For example, the multi-layer ELM (ML-ELM) [4] constructs a deep representation by stacking a series of ELM autoencoders (ELM-AEs) sequentially. In a different direction, Zhou and Feng [20] proposed the Deep Forest method (gcForest), a random forest ensemble approach, showing the power of integrating representation learning and ensemble learning, and providing a way to build deep representations with traditional methods.
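
As a rough illustration of the ML-ELM idea (a sketch of the general ELM-AE stacking scheme, not the authors' implementation; layer sizes, activation, and regularization are assumptions), each ELM-AE learns output weights that reconstruct its input, and the transpose of those weights projects the data into the next representation:

```python
import numpy as np

def elm_ae_layer(X, n_hidden=100, reg=1e-3, seed=None):
    """One ELM autoencoder: learn beta so that H @ beta reconstructs X,
    then use beta^T to map X into the new feature space."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)
    return X @ beta.T  # (N, n_hidden) re-representation of X

def ml_elm_features(X, layer_sizes=(100, 100)):
    # Stack ELM-AEs sequentially to build a deep representation.
    for k in layer_sizes:
        X = elm_ae_layer(X, n_hidden=k)
    return X
```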

It is well known that ensemble learning is effective at combining multiple learners to yield better performance. Although ensemble methods have been used to improve ELM, e.g., the Voting-based ELM (V-ELM) [21], the integration of representation learning and ensemble learning has not drawn enough attention. In [22], a hierarchical ELM ensemble (H-ELM-E), an ensemble of ensembles, was used to fuse different image features. Similarly, in [12], a trained combiner was used to integrate component ELMs; however, it is essentially weighted voting and cannot perform representation learning.

In this paper, we aim to develop a novel hierarchical ensemble of ELM (HE-ELM) for representation learning. Unlike traditional ensemble methods, which make the final decision from a shallow ensemble, the proposed method includes multiple re-representation layers and adopts two diversity-encouraging strategies to avoid overfitting.

This paper is structured as follows. Section 2 reviews related work and gives the main motivation. Section 3 describes the proposed method and the two diversity-encouraging strategies. Experimental results are given in Section 4, and conclusions are drawn in Section 5.

Section snippets

Briefs of ELM

Theorem 1

Learning can be made without iteratively tuning (artificial) hidden nodes (or hundred types of biological neurons) even though the modeling of biological neurons may be unknown as long as they are nonlinear piecewise continuous, and such a network can approximate any continuous target function with any small error and can also separate any disjoint regions without tuning hidden neurons [23].

Consider a data set with training samples $X = \{x_i\}_{i=1}^{N}$ in $\mathbb{R}^n$ (the n-dimensional feature space) and class labels

Hierarchical ensemble of ELM

In this section, we first introduce the overall architecture of the proposed method, and then describe the two diversity-encouraging strategies, sketched below.
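
As a preview of the two strategies (a minimal NumPy sketch based on the descriptions elsewhere in this paper; the drop rate and subspace ratio are assumed hyper-parameters, not values from the paper):

```python
import numpy as np

def sparse_connection_mask(n_features, n_hidden, drop_rate=0.5, seed=None):
    """Sparse connection: randomly disconnect a percentage of the
    input-to-hidden connections of a component ELM."""
    rng = np.random.default_rng(seed)
    return (rng.random((n_features, n_hidden)) >= drop_rate).astype(float)

def feature_bag(X, subspace_ratio=0.7, seed=None):
    """Feature bagging: train each component ELM on a random feature
    subspace, increasing diversity across the ensemble."""
    rng = np.random.default_rng(seed)
    k = max(1, int(subspace_ratio * X.shape[1]))
    idx = rng.choice(X.shape[1], size=k, replace=False)
    return X[:, idx], idx  # keep idx to apply the same subspace at test time
```

In a component ELM, the mask would be multiplied elementwise into the random hidden weight matrix before computing the hidden-layer output.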

Experiment setup

We evaluate the performance of the proposed HE-ELM on 22 benchmark data sets taken from the UCI repository.1 All data sets are normalized using max-min normalization in preprocessing, and 60% of the labeled samples of each data set are selected randomly for training, with the rest used for testing. We run a group of experiments to compare HE-ELM with the basic Extreme Learning Machine (ELM) [1], the Multilayer Perceptron (MLP) [26], the Multiple-Layer Extreme
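
For reference, the preprocessing and splitting described above amount to the following (a sketch; the function names are ours, not the paper's):

```python
import numpy as np

def max_min_normalize(X):
    # Scale each feature to [0, 1], guarding against constant columns.
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / np.where(mx > mn, mx - mn, 1.0)

def random_split(X, y, train_frac=0.6, seed=None):
    # 60% of the labeled samples for training, the remaining 40% for testing.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```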

Conclusions

We have introduced a novel hierarchical ensemble method based on ELM for classification, which uses two re-representation layers to re-represent features by appending the predictions of the component ELMs to the initial features. To encourage the diversity of the component ELMs, we introduce two strategies: sparse connection, which randomly disconnects a percentage of the hidden connections, and feature bagging, which increases the number of sub-samples by subspace sampling. Simulations
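
A minimal sketch of one such re-representation layer, reusing train_elm and predict_elm from the sketch in the introduction (the number of component ELMs is an assumed hyper-parameter, and the authors' exact layer composition may differ):

```python
import numpy as np

def re_representation_layer(X, T, n_components=5, **elm_kwargs):
    """Train several component ELMs and append their class-score
    predictions to the initial features, producing the augmented
    input for the next layer (or for the final decision layer)."""
    augmented = [X]
    for i in range(n_components):
        W, b, beta = train_elm(X, T, seed=i, **elm_kwargs)
        augmented.append(predict_elm(X, W, b, beta))
    return np.hstack(augmented)
```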

Acknowledgments

We thank the reviewers for their valuable comments and suggestions. This work was partially supported by the National Natural Science Foundation of China under grant nos. 61773355 and 61603355, the Fundamental Research Funds for National University, China University of Geosciences (Wuhan), under grant no. G1323541717, and the Natural Science Foundation of Hubei Province, China, under grant no. 2018CFB528.

References (28)

  • Y. Zeng et al., Traffic sign recognition using kernel extreme learning machines with deep perceptual features, IEEE Trans. Intell. Transp. Syst. (2017)

  • C. Chen et al., Spectral-spatial classification of hyperspectral image based on kernel extreme learning machine, Remote Sens. (2014)

  • M. Pal et al., Kernel-based extreme learning machine for remote-sensing image classification, Remote Sens. Lett. (2013)

  • Q. Lv et al., Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine, IEEE Geosci. Remote Sens. Lett. (2016)