Neurocomputing

Volume 207, 26 September 2016, Pages 560-567

Hessian semi-supervised extreme learning machine

https://doi.org/10.1016/j.neucom.2016.05.039

Abstract

Extreme learning machine (ELM) has emerged as an efficient and effective learning algorithm for classification and regression tasks. Most existing research on ELMs focuses on supervised learning. Recently, researchers have extended ELMs to semi-supervised learning, exploiting both labeled and unlabeled data to enhance learning performance, and have incorporated Laplacian regularization to determine the geometry of the underlying manifold. However, Laplacian regularization lacks extrapolating power and biases the solution towards a constant function. In this paper, we present a novel algorithm called Hessian semi-supervised ELM (HSS-ELM) to enhance the semi-supervised learning of ELM. Unlike Laplacian regularization, Hessian regularization favors functions whose values vary linearly along geodesic distances and preserves the local manifold structure well. This leads to good extrapolating power. Furthermore, HSS-ELM maintains almost all the advantages of the traditional ELM, such as its significant training efficiency and straightforward implementation for multiclass classification problems. The proposed algorithm is tested on publicly available data sets. The experimental results demonstrate that our algorithm is competitive with state-of-the-art semi-supervised learning algorithms in terms of accuracy. Additionally, HSS-ELM requires considerably less training time than semi-supervised SVMs/regularized least-squares methods.

Introduction

The extreme learning machine (ELM) is a relatively new training algorithm for single-hidden-layer feedforward networks (SLFNs) that enables fast training of the network [1]. Many existing SLFN training algorithms, such as back-propagation [2] and the Levenberg–Marquardt algorithm [3], use gradient-descent optimization to adjust the weights and biases of the neurons at both the hidden and output layers of the network.

The support vector machine (SVM), a maximal-margin classifier established under the framework of structural risk minimization [4], [5], is considered one of the most successful algorithms for training SLFNs. The SVM formulation can be solved conveniently since its dual problem is a quadratic program. SVMs have been applied extensively in many applications, mainly owing to their simplicity and stable generalization performance [6], [7], [8], [9].

Recently, Huang et al. [10], [1], [11] proposed the ELM algorithm to train SLFNs. Unlike the conventional approaches, ELM only needs to analytically calculate the output weights, while the input weights and hidden-layer biases are randomly generated. Despite this simplicity, ELM not only reaches the smallest training error but also the smallest norm of the output weights, which leads to good generalization performance [12]. Recent studies show that ELM achieves comparable or even better prediction accuracy than SVM [11], [13], [10]. In recent years, many extensions of the basic ELM have been tailored to specific problems, e.g. online sequential data [14], [15], [16], imbalanced data [17], [18], and noisy/missing data [19], [20], [21].
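To make this training scheme concrete, the following minimal sketch (our illustration, not code from the paper; it assumes a sigmoid activation and uses NumPy) draws the hidden-layer parameters at random and solves for the output weights in closed form via the Moore–Penrose pseudo-inverse:

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, seed=0):
    """Basic ELM: random hidden layer, least-squares output weights.

    X: (n_samples, n_features) inputs; Y: (n_samples, n_outputs) targets.
    """
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                             # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

Because only `beta` is learned, training reduces to a single pseudo-inverse computation, which is the source of ELM's speed.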

Most existing work on ELM focuses on supervised learning, which requires a large number of labeled patterns for classification and regression tasks. In practice, collecting a large amount of labeled data is cumbersome, as it is both expensive and time consuming, while obtaining unlabeled data is easier and more cost effective. To circumvent this problem, semi-supervised learning (SSL) algorithms have been introduced. SSL algorithms take advantage of both labeled and unlabeled data to improve prediction accuracy while saving the labor cost of annotating large amounts of data [22], [23].

Manifold-regularization-based approaches have been widely applied in semi-supervised learning. Manifold regularization enhances semi-supervised learning by exploiting the geometry of the intrinsic data probability distribution. One of the most popular manifold regularizers is Laplacian regularization [24], [25], which uses the graph Laplacian to determine the geometry of the underlying manifold. Laplacian regularization has been employed in various fields such as sparse coding [26], [27], classification [25], [28], and feature selection [29], [30]. Recently, it has also been applied in ELM for semi-supervised learning tasks [31], [32], [33], [34], [35].
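For reference, the sketch below (our illustration; the cited works differ in details such as edge weighting and normalization) shows how a graph Laplacian is typically assembled from a k-nearest-neighbor graph with Gaussian weights:

```python
import numpy as np

def graph_laplacian(X, k=10, sigma=1.0):
    """Build L = D - W from a k-NN graph with Gaussian edge weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                # k nearest neighbors (skip self)
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W                    # L = D - W
```

Penalizing trace(F^T L F) = (1/2) Σ_ij W_ij ||f_i − f_j||² forces nearby points to receive similar predictions, which is exactly the smoothness assumption these methods rely on.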

Although semi-supervised learning methods based on Laplacian regularization perform well, they suffer from a few drawbacks. Their performance deteriorates when only a few labeled examples are available, because Laplacian regularization lacks extrapolating power. Furthermore, it has been reported that Laplacian regularization biases the solution towards a constant function, owing to its constant null space, and cannot preserve the local topology well [36].

In contrast, Hessian regularization has a richer null space and favors learned functions whose values vary linearly along the data manifold. Furthermore, it exploits the intrinsic local geometry of the data manifold well and has better extrapolating power [37], [38], [39], [36]. Thus, Hessian regularization is better suited to semi-supervised learning than Laplacian regularization. It has been implemented extensively in semi-supervised applications such as kernel regression [36], classification [40], [41], [42], sparse coding [43], [44] and feature selection [45].
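A common way to realize this regularizer, following the Hessian-energy construction of [36], is to fit a second-order polynomial over local tangent coordinates at every point and accumulate the squared norm of the estimated Hessian into a matrix B, so that f^T B f approximates the total Hessian energy of f. The sketch below is our simplified rendering of that construction (it omits refinements such as the scaling of cross terms), not the paper's implementation:

```python
import numpy as np
from itertools import combinations_with_replacement

def hessian_energy_matrix(X, k=10, m=2):
    """Simplified Hessian regularization matrix B (after the construction in [36]).

    For each point: take k neighbors, estimate an m-dim tangent space by local
    PCA, fit constant + linear + quadratic terms; the quadratic rows of the fit
    map function values to second derivatives, whose energy accumulates into B.
    """
    n = X.shape[0]
    B = np.zeros((n, n))
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    quad_idx = list(combinations_with_replacement(range(m), 2))
    for i in range(n):
        nbrs = np.argsort(d2[i])[:k]                        # neighborhood incl. the point itself
        Z = X[nbrs] - X[nbrs].mean(0)
        U = np.linalg.svd(Z, full_matrices=False)[2][:m].T  # local PCA -> tangent basis
        T = Z @ U                                           # tangent coordinates, (k, m)
        quad = np.column_stack([T[:, a] * T[:, b] for a, b in quad_idx])
        Phi = np.column_stack([np.ones(k), T, quad])        # quadratic design matrix
        Hi = np.linalg.pinv(Phi)[1 + m:]                    # rows mapping f(nbrs) -> 2nd derivs
        B[np.ix_(nbrs, nbrs)] += Hi.T @ Hi                  # accumulate local Hessian energy
    return B
```

With B in hand, it can be used in place of the graph Laplacian L in any manifold-regularized objective.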

In this paper, we extend ELM to semi-supervised learning problems by introducing Hessian regularization into ELM. Unlike the Laplacian regularization used in previous semi-supervised ELM algorithms, Hessian regularization favors functions whose values vary linearly with respect to geodesic distance and preserves the local manifold well, thereby enhancing the performance of ELM in semi-supervised learning. Furthermore, the proposed algorithm inherits the computational efficiency and learning capability of the traditional ELM, especially for multiclass classification problems. We conducted several experiments on standard data sets to evaluate our algorithm against state-of-the-art semi-supervised algorithms. The results show that the proposed algorithm is competitive in accuracy and requires much less training time than semi-supervised SVM/regularized least-squares methods for multiclass classification problems.
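The exact HSS-ELM formulation appears in Section 3 (not reproduced in this preview), but by analogy with SS-ELM [33] the construction can be sketched as follows: the graph Laplacian in the manifold penalty is replaced by the Hessian energy matrix B, and the output weights retain a closed-form solution. The code below is a hypothetical sketch under that assumption, reusing the helpers above; the trade-off parameters `C` and `lam` are illustrative names, not the paper's notation:

```python
import numpy as np

def hss_elm_train(H, Y, labeled_mask, B, C=1.0, lam=0.01):
    """Hedged HSS-ELM-style solver (by analogy with SS-ELM; see the paper's Sec. 3).

    H: (n, L) hidden-layer outputs for labeled + unlabeled samples.
    Y: (n, c) one-hot targets (rows for unlabeled samples set to zero).
    labeled_mask: boolean (n,) flags marking the labeled rows.
    B: (n, n) Hessian regularization matrix in place of the graph Laplacian.
    """
    n, L = H.shape
    J = np.diag(labeled_mask.astype(float))          # selects the labeled rows
    # ridge term + empirical loss on labeled data + Hessian manifold penalty
    A = np.eye(L) + C * H.T @ J @ H + lam * H.T @ B @ H
    return np.linalg.solve(A, C * H.T @ J @ Y)       # output weights beta
```

Training therefore still amounts to solving a single L×L linear system, which is why the method retains ELM's efficiency even when unlabeled data are included.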

This paper is organized as follows: Section 2 contains the description of ELM, manifold regularization and semi-supervised ELM (SS-ELM). In Section 3, we present our proposed framework which consists of Hessian regularization and HSS-ELM formulation. Experimental results are presented in Section 4. Section 5 concludes the study.

Section snippets

Related work

In this section, we present a brief description of ELM, manifold regularization and the semi-supervised extreme learning machine (SS-ELM), which form the underlying basis of our work.

Proposed framework

In this section, we present our HSS-ELM algorithm in detail. It incorporates the Hessian energy as a regularizer that takes into account the local manifold structure of the data space.

Experimental results

In this section, we present the results of experiments conducted to evaluate the performance of our proposed algorithm against state-of-the-art semi-supervised algorithms, namely the transductive SVM (TSVM) [48], LapSVM [25], LapRLS [25], HesSVM [40] and SS-ELM [33].

Conclusion

In this paper, we proposed a new algorithm called HSS-ELM, in which the traditional ELM is extended to semi-supervised learning. We incorporated Hessian regularization into ELM, which has more favorable properties for semi-supervised learning than Laplacian regularization. Hessian regularization allows functions that extrapolate, i.e. functions whose values are not limited to the range of the training output. This extrapolation ability leads to a significant improvement in performance.

Acknowledgment

This work was supported by the HIR-MOHE Grant No. UM.C/625/1/HIR/MOHE/ENG/42.

References (48)

  • Q. Yu et al., Regularized extreme learning machine for regression with missing data, Neurocomputing (2013)
  • Y. Wang et al., Semi-supervised classification learning by discrimination-aware manifold regularization, Neurocomputing (2015)
  • J. Liu et al., SELM: semi-supervised ELM with application in sparse calibrated location estimation, Neurocomputing (2011)
  • Y. Zhou et al., Semi-supervised extreme learning machine with manifold and pairwise constraints regularization, Neurocomputing (2015)
  • Y. Gu et al., Semi-supervised deep extreme learning machine for Wi-Fi based localization, Neurocomputing (2015)
  • M. Zheng et al., Hessian sparse coding, Neurocomputing (2014)
  • W. Liu et al., Multiview Hessian discriminative sparse coding for image annotation, Comput. Vis. Image Underst. (2014)
  • G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: ...
  • D.E. Rumelhart et al., Learning representations by back-propagating errors, Nature (1986)
  • M. Hagan et al., Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw. (1994)
  • V.N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag New York, Inc., New York, NY, USA, ...
  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)
  • J.A.K. Suykens et al., Least squares support vector machine classifiers, Neural Process. Lett. (1999)
  • G.-B. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B: Cybern. (2012)

Ganesh Krishnasamy received the B.Eng. and M.Eng. degrees in electrical and electronic engineering from Universiti Kebangsaan Malaysia in 2004 and 2007 respectively. He is currently pursuing the Ph.D. degree in electrical engineering with the Department of Electrical Engineering, University of Malaya, Malaysia. His current research interests include computer vision, machine learning and optimization.

Raveendran Paramesran received the B.Sc. and M.Sc. degrees in electrical engineering from South Dakota State University, Brookings, South Dakota, USA in 1984 and 1985 respectively. He was a systems designer with Daktronics, USA, before joining the Department of Electrical Engineering at University of Malaya, Kuala Lumpur, as a lecturer in 1986. In 1992, he received a Ronpaku scholarship from Japan to pursue a Doctorate in Engineering, which he completed in 1994 at the University of Tokushima, Japan. He was promoted to associate professor in 1995 and to professor in 2003. He has successfully supervised the completion of 10 Ph.D. students and 12 M.Eng.Sc. (Masters by research) students. His contributions can be seen in journal publications, conference proceedings, book chapters, and an international patent to predict blood glucose levels using a non-parametric model. His current research interests include image and video analysis, formulation of new image descriptors for image analysis, fast computation of orthogonal moments, analysis of electroencephalography (EEG) signals, and data modeling of substance concentration acquired from non-invasive methods.
