Hessian semi-supervised extreme learning machine
Introduction
The extreme learning machine (ELM) is a relatively new training algorithm for single-hidden-layer feedforward networks (SLFNs) that enables fast training of the network [1]. Many existing SLFN training algorithms, such as back-propagation [2] and the Levenberg–Marquardt algorithm [3], use gradient-descent optimization to adjust the weights and biases of the neurons at both the hidden and output layers of the network.
The support vector machine (SVM), a maximal-margin classifier established under the framework of structural risk minimization [4], [5], is considered one of the most successful algorithms for training SLFNs. The SVM formulation can be solved conveniently, since its dual problem is a quadratic program. SVMs have been applied extensively in many applications, mainly owing to their simplicity and stable generalization performance [6], [7], [8], [9].
Recently, Huang et al. [10], [1], [11] proposed a new algorithm, termed the extreme learning machine (ELM), to train SLFNs. Unlike conventional approaches, ELM only needs to calculate the output weights analytically, while the input weights and hidden-layer biases are generated randomly. Despite this simplicity, ELM reaches not only the smallest training error but also the smallest norm of output weights, which leads to good generalization performance [12]. Recent studies show that ELM has comparable or even better prediction accuracy than SVM [11], [13], [10]. In recent years, many extensions of the basic ELM have been tailored to specific problems, e.g. online sequential data [14], [15], [16], imbalanced data [17], [18] and noisy/missing data [19], [20], [21].
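To make the contrast with gradient-based training concrete, the core ELM training step can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names, the tanh activation and the use of the Moore–Penrose pseudoinverse for the output weights are assumptions consistent with the standard ELM formulation.

```python
import numpy as np

def elm_train(X, Y, n_hidden=100, seed=0):
    """Minimal ELM sketch: random hidden layer, analytic output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ Y                     # least-squares, minimum-norm output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

The pseudoinverse yields the minimum-norm least-squares solution, which is exactly the property credited with ELM's good generalization: the smallest training error together with the smallest norm of output weights.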
Most existing work on ELM focuses on supervised learning, which requires a large number of labeled patterns for classification and regression tasks. In practice, collecting a large amount of labeled data is cumbersome, as it is both expensive and time-consuming, whereas obtaining unlabeled data is easier and more cost-effective. To circumvent this problem, semi-supervised learning (SSL) algorithms have been introduced. SSL algorithms take advantage of both labeled and unlabeled data to improve prediction accuracy while saving the labor cost of annotating large amounts of data [22], [23].
Manifold-regularization-based approaches have been widely applied in semi-supervised learning. Manifold regularization improves semi-supervised learning performance by exploiting the geometry of the intrinsic data probability distribution. One of the most popular manifold regularizers is Laplacian regularization [24], [25], which utilizes the graph Laplacian to capture the geometry of the underlying manifold. Laplacian regularization has been implemented in various fields such as sparse coding [26], [27], classification [25], [28] and feature selection [29], [30]. Recently, Laplacian regularization has also been applied to ELM for semi-supervised learning tasks [31], [32], [33], [34], [35].
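As an illustration of how the graph Laplacian is built in practice, the sketch below constructs the unnormalized Laplacian L = D − W from a k-nearest-neighbour graph with Gaussian edge weights. The function name and the neighbourhood/weighting choices (kNN, Gaussian kernel) are assumptions; they are common defaults rather than the specific construction used in any of the cited works.

```python
import numpy as np

def graph_laplacian(X, k=5, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W from a kNN graph with Gaussian weights."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(d2[i])[1:k + 1]                  # k nearest neighbours, excluding self
        W[i, nn] = np.exp(-d2[i, nn] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)                               # symmetrize the kNN relation
    D = np.diag(W.sum(axis=1))                           # degree matrix
    return D - W
```

The resulting L is symmetric positive semi-definite, and the quadratic form f^T L f sums w_ij (f_i − f_j)^2 over edges, penalizing functions that vary across nearby points; this is how the Laplacian encodes the manifold geometry.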
Although semi-supervised learning methods based on Laplacian regularization perform well, they suffer from a few drawbacks. Their performance degrades when only a few labeled data points are available, because Laplacian regularization lacks extrapolating power. Furthermore, it has been reported that Laplacian regularization biases the solution towards a constant function, owing to its constant null space, and cannot preserve the local topology well [36].
In contrast, Hessian regularization has a richer null space and favors learned functions whose values vary linearly along the data manifold. Furthermore, it exploits the intrinsic local geometry of the data manifold well and has better extrapolating power [37], [38], [39], [36]. Thus, Hessian regularization is better suited for semi-supervised learning than Laplacian regularization. Hessian regularization has been implemented extensively in many semi-supervised applications such as kernel regression [36], classification [40], [41], [42], sparse coding [43], [44] and feature selection [45].
In this paper, we extend ELM to handle semi-supervised learning problems by introducing Hessian regularization into ELM. Unlike Laplacian regularization, which was used in previous semi-supervised ELM algorithms, Hessian regularization favors functions whose values vary linearly with respect to geodesic distance and preserves the local manifold well. Therefore, Hessian regularization enhances the performance of ELM in semi-supervised learning. Furthermore, the proposed algorithm inherits the computational efficiency and learning capability of traditional ELM, especially for multiclass classification problems. We conducted several experiments on standard data sets to evaluate our algorithm against state-of-the-art semi-supervised algorithms. The results show that the proposed algorithm is competitive with other semi-supervised algorithms in accuracy and requires much less training time than semi-supervised SVM/regularized least-squares based methods for multiclass classification problems.
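The precise HSS-ELM formulation appears in Section 3; as a hedged sketch, the snippet below shows the generic way a manifold regularizer enters the ELM output-weight solution. Here M is any symmetric n×n regularization matrix computed over both labeled and unlabeled points (a graph Laplacian for SS-ELM, an estimated Hessian energy matrix for HSS-ELM); the function name, the tanh activation and the closed form follow the usual semi-supervised ELM solution and are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def manifold_elm(X, Y_labeled, M, n_hidden=100, C=1.0, lam=0.1, seed=0):
    """Semi-supervised ELM sketch; the first l rows of X are the labeled points.

    Solves  beta = (I + H^T J H + lam * H^T M H)^(-1) H^T J Y,
    where J puts weight C on labeled rows and 0 on unlabeled rows,
    and M is an n x n manifold regularizer (Laplacian or Hessian energy).
    """
    rng = np.random.default_rng(seed)
    n, l = X.shape[0], Y_labeled.shape[0]
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden outputs for all n points
    J = np.zeros(n)
    J[:l] = C                                        # fitting penalty on labeled rows only
    Y = np.zeros((n, Y_labeled.shape[1]))
    Y[:l] = Y_labeled                                # zero-padded target matrix
    A = np.eye(n_hidden) + H.T @ (J[:, None] * H) + lam * (H.T @ M @ H)
    beta = np.linalg.solve(A, H.T @ (J[:, None] * Y))
    return W, b, beta
```

Because the linear system is of size n_hidden × n_hidden rather than n × n, the unlabeled points influence the solution only through the regularizer term H^T M H, which preserves ELM's computational efficiency.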
This paper is organized as follows: Section 2 contains the description of ELM, manifold regularization and semi-supervised ELM (SS-ELM). In Section 3, we present our proposed framework which consists of Hessian regularization and HSS-ELM formulation. Experimental results are presented in Section 4. Section 5 concludes the study.
Related work
In this section, we present a brief description of ELM, manifold regularization and semi-supervised extreme learning (SS-ELM), which are the underlying basis of our work.
Proposed framework
In this section, we present our algorithm HSS-ELM in detail. Our algorithm incorporates the Hessian energy as a regularizer that takes account of local manifold structure of the data space.
Experimental results
In this section, we present the results of the experiments conducted to evaluate the performance of our proposed algorithm against the state-of-the-art semi-supervised algorithms such as the transductive SVM (TSVM) [48], LapSVM [25], LapRLS [25], HesSVM [40] and SS-ELM [33].
Conclusion
In this paper, we proposed a new algorithm called HSS-ELM, which extends the traditional ELM to semi-supervised learning. We incorporated Hessian regularization into ELM, which has more favorable properties for semi-supervised learning than Laplacian regularization. Hessian regularization allows functions that extrapolate, i.e. functions whose values are not limited to the range of the training output. This extrapolation ability leads to a significant improvement in performance.
Acknowledgment
This work was supported by the HIR-MOHE Grant No. UM.C/625/1/HIR/MOHE/ENG/42.
Ganesh Krishnasamy received the B.Eng. and M.Eng. degrees in electrical and electronic engineering from the Universiti Kebangsaan Malaysia in 2004 and 2007 respectively. He is currently pursuing the Ph.D. degree in electrical engineering with the Department of Electrical Engineering, University of Malaya, Malaysia. His current research interests include the field of computer vision, machine learning and optimization.
References (48)
- et al., Steel plates fault diagnosis on the basis of support vector machines, Neurocomputing (2015)
- et al., Fault detection based on a robust one class support vector machine, Neurocomputing (2014)
- et al., Support vector machines with piecewise linear feature mapping, Neurocomputing (2013)
- et al., Extreme learning machine: theory and applications, Neurocomputing (2006)
- et al., EEG-based vigilance estimation using extreme learning machines, Neurocomputing (2013)
- et al., Online sequential extreme learning machine with forgetting mechanism, Neurocomputing (2012)
- et al., A semi-supervised online sequential extreme learning machine method, Neurocomputing (2016)
- et al., Weighted extreme learning machine for imbalance learning, Neurocomputing (2013)
- et al., Boosting weighted ELM for imbalanced learning, Neurocomputing (2014)
- et al., Robust extreme learning machine, Neurocomputing (2013)
- Regularized extreme learning machine for regression with missing data, Neurocomputing
- Semi-supervised classification learning by discrimination-aware manifold regularization, Neurocomputing
- SELM: semi-supervised ELM with application in sparse calibrated location estimation, Neurocomputing
- Semi-supervised extreme learning machine with manifold and pairwise constraints regularization, Neurocomputing
- Semi-supervised deep extreme learning machine for Wi-Fi based localization, Neurocomputing
- Hessian sparse coding, Neurocomputing
- Multiview Hessian discriminative sparse coding for image annotation, Comput. Vis. Image Underst.
- Learning representations by back-propagating errors, Nature
- Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw.
- Support-vector networks, Mach. Learn.
- Least squares support vector machine classifiers, Neural Process. Lett.
- Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B: Cybern.
Raveendran Paramesran received the B.Sc. and M.Sc. degrees in electrical engineering from South Dakota State University, Brookings, South Dakota, USA in 1984 and 1985 respectively. He was a systems designer with Daktronics, USA before joining the Department of Electrical Engineering at University of Malaya, Kuala Lumpur, as a lecturer in 1986. In 1992, he received a Ronpaku scholarship from Japan to pursue Doctorate in Engineering, which he completed in 1994 at University of Tokushima, Japan. He was promoted as associate professor in 1995 and was promoted as professor in 2003. His research areas include image and video analysis, formulation of new image descriptors for image analysis, fast computation of orthogonal moments, analysis of EEG signals, and data modeling of substance concentration acquired from non-invasive methods. His contributions can be seen in the form of journal publications, conference proceedings, chapters in books and an international patent to predict blood glucose levels using non-parametric model. He has successfully supervised the completion of 10 Ph.D. students and 12 students in M.Eng.Sc. (Masters by research). His current research interests include image and video analysis, formulation of new image descriptors for image analysis, fast computation of orthogonal moments, analysis of electroencephalography signals, and data modeling of substance concentration acquired from non-invasive methods.