Neurocomputing
Volume 72, Issues 13–15, August 2009, Pages 3066–3076

Partial Lanczos extreme learning machine for single-output regression problems

https://doi.org/10.1016/j.neucom.2009.03.016

Abstract

There are two problems preventing the further development of the extreme learning machine (ELM). First, the ill-conditioning of the hidden layer output matrix reduces the stability of ELM. Second, the complexity of the singular value decomposition (SVD) used to compute the Moore–Penrose generalized inverse limits the learning speed of ELM. To address these two problems, this paper proposes the partial Lanczos ELM (PL-ELM), which employs a hybrid of partial Lanczos bidiagonalization and SVD to compute the output weights. Experimental results indicate that, compared with ELM, PL-ELM not only effectively improves stability and generalization performance but also increases learning speed.

Introduction

Recently, Huang et al. [1], [2] have proposed a novel learning algorithm for single hidden layer feedforward networks (SLFNs) named the extreme learning machine (ELM). In ELM, the input weights and hidden layer biases are chosen randomly, and the output weights are determined analytically from the Moore–Penrose generalized inverse of the hidden layer output matrix. ELM provides better generalization performance and an extremely fast learning speed compared with gradient-based learning algorithms. In addition, ELM avoids many difficulties faced by gradient-based learning methods, such as stopping criteria, learning rates, learning epochs and local minima [1], [3]. Owing to these advantages, ELM has been successfully applied in many areas, such as classification [4], function approximation [5], [6], nontechnical loss analysis [7], terrain reconstruction [8] and protein structure prediction [9].

In order to enhance the performance of ELM, many improved models have been proposed. We roughly group these improved models into four types: incremental, optimization, replacement and ensemble. The incremental type creates new hidden layer neurons one by one according to certain criteria, which makes it well suited to streaming data [10], [11], [12]. The optimization type employs techniques such as evolutionary algorithms and linear programming to tune the input weights and hidden layer biases and to optimize the network structure; it achieves good generalization performance and a much more compact network structure [13], [14], [15]. The replacement type replaces the usual activation functions of ELM (sigmoid and RBF) with sine and cosine functions (or other functions), which helps to improve the accuracy and the convergence rate for function approximation problems [6]. In the ensemble type, different ELMs are trained on disjoint subsets of the data but can share the same hidden layer neurons [5], [16].

Although these models improve the performance of ELM to a certain degree, there is still considerable room for improvement in the stability and the learning speed of ELM. We first analyze the critical factor influencing the stability of ELM.

As rigorously proven by Huang et al. [1], the input weights and hidden layer biases of SLFNs can be chosen randomly without tuning, so the hidden layer output matrix $H$ of an SLFN remains unchanged throughout the learning process. Thus, training an SLFN is equivalent to finding the minimal Euclidean norm solution $\beta$ of the linear system $H\beta = T$ [1]:
$$\beta = H^{\dagger}T \quad \text{such that} \quad \beta = \arg\min_{\beta}\|H\beta - T\|_{2} \qquad (1)$$
where $\beta$ denotes the output weight matrix of the trained ELM, $H^{\dagger}$ denotes the Moore–Penrose generalized inverse of the hidden layer output matrix $H$, and $T$ denotes the target vector. The detailed specification of these variables is given in [1].
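For concreteness, the following is a minimal sketch of this training scheme, assuming a sigmoid activation and input weights and biases drawn uniformly at random; the function names and ranges are illustrative, and np.linalg.pinv evaluates the Moore–Penrose generalized inverse through an SVD.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Minimal single-output ELM: random hidden layer, analytic output weights (sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, d = X.shape
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random input weights (not tuned)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases (not tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden layer output matrix, N x n
    beta = np.linalg.pinv(H) @ T                     # minimum-norm solution of H beta = T, cf. Eq. (1)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```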

ELM computes the Moore–Penrose generalized inverse $H^{\dagger}$ in Eq. (1) based on the singular value decomposition (SVD) of $H$ [17]. The SVD of $H \in \mathbb{R}^{N \times n}$, $N \geq n$, is given by
$$H = U\Sigma V^{T} = \sum_{i=1}^{n} u_{i}\sigma_{i}v_{i}^{T} \qquad (2)$$
where $U = (u_{1}, \ldots, u_{n})$, $V = (v_{1}, \ldots, v_{n})$, $\Sigma = \operatorname{diag}(\sigma_{1}, \ldots, \sigma_{n})$; the $\sigma_{i}$, $i = 1, \ldots, n$, are the singular values of $H$ ordered as $\sigma_{1} \geq \cdots \geq \sigma_{n} \geq 0$, and the rank $r$ of $H$ equals the number of strictly positive singular values, i.e. $\sigma_{r} > \sigma_{r+1} = \cdots = \sigma_{n} = 0$.

Based on Eq. (2), the Moore–Penrose generalized inverse $H^{\dagger}$ in Eq. (1) is computed by
$$H^{\dagger} = V\Sigma^{\dagger}U^{T} = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}} \qquad (3)$$
and the output weight matrix of ELM is computed by
$$\beta = H^{\dagger}T = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,T \qquad (4)$$
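Eqs. (2)–(4) can also be evaluated directly; the sketch below, with an illustrative tolerance for deciding the numerical rank, computes the output weights from the SVD of $H$.

```python
import numpy as np

def output_weights_svd(H, T, tol=None):
    """Compute beta = H^+ T from the SVD of H, following Eqs. (2)-(4) (sketch)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)   # H = U diag(s) V^T
    if tol is None:                                    # illustrative rank tolerance
        tol = max(H.shape) * np.finfo(H.dtype).eps * s[0]
    r = int(np.sum(s > tol))                           # numerical rank of H
    # beta = sum_{i=1..r} v_i (u_i^T T) / sigma_i
    return Vt[:r].T @ ((U[:, :r].T @ T) / s[:r])
```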

In practical applications, the target vector $T$ usually contains a certain degree of perturbations. Let $e$ denote the perturbation in $T$ and $\tilde{T} = T + e$ denote the perturbed target vector. The perturbed output weight matrix $\tilde{\beta}$ of ELM can be computed by
$$\tilde{\beta} = H^{\dagger}\tilde{T} = H^{\dagger}T + H^{\dagger}e = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,T + \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,e \qquad (5)$$

When the hidden layer output matrix $H$ has a very large condition number (i.e. $H$ tends to be ill-conditioned), there always exist some very small (positive and close to zero) singular values of $H$. From the two summation terms on the right-hand side of Eq. (5) we can see that, when divided by small singular values, the output weights become very large and tend to be strongly influenced by the perturbation $e$. Large output weights also weaken the generalization capability of ELM, because the trained network behaves very differently when the test data deviate even slightly from the training data. The ill-conditioning of the hidden layer output matrix is therefore the critical factor influencing the stability of ELM.
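A small numerical experiment makes this amplification visible. The construction below is purely illustrative (the matrix sizes, singular value spectrum and noise level are arbitrary choices, not taken from the paper): a perturbation of order $10^{-6}$ in $T$ produces a much larger relative change in the output weights once it is divided by the smallest singular values, as in Eq. (5).

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an ill-conditioned H with singular values 1, 1e-1, ..., 1e-7 (illustrative).
N, n = 200, 8
s = 10.0 ** -np.arange(n)
U, _ = np.linalg.qr(rng.standard_normal((N, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = U @ np.diag(s) @ V.T

T = H @ rng.standard_normal(n)          # noise-free, consistent targets
e = 1e-6 * rng.standard_normal(N)       # small perturbation of T

beta = np.linalg.pinv(H) @ T
beta_pert = np.linalg.pinv(H) @ (T + e)

# The relative change in beta is far larger than the size of e, because the
# components of e aligned with the smallest singular directions are divided
# by sigma_i, as in Eq. (5).
print(np.linalg.norm(beta_pert - beta) / np.linalg.norm(beta))
```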

We then analyze the critical factor limiting the learning speed of ELM. The learning time of ELM is mainly spent computing the Moore–Penrose generalized inverse $H^{\dagger}$ [1]. As mentioned above (see Eqs. (2), (3), (4)), computing $H^{\dagger}$ requires the SVD of $H$. For a matrix $H \in \mathbb{R}^{N \times n}$, the computational complexity of the SVD is $O(4Nn^{2} + 8n^{3})$ [18]. When the number of hidden layer neurons (i.e. $n$) becomes large, the computational cost of the SVD rises significantly. Although some methods have been proposed to compact the network structure of ELM [13], [14], they cause ELM to lose its property of determining the network structure completely independently of the training data. The large computational complexity of the SVD is the critical factor limiting the learning speed of ELM.

To improve the stability of ELM, it is necessary to incorporate regularization methods that reduce the influence of ill-conditioning and perturbations. Truncated SVD (TSVD) and Tikhonov regularization [19], [20] are two widely used regularization methods. As proven in [19], [21], TSVD provides results similar to those of Tikhonov regularization. TSVD keeps only the contributions associated with the $\kappa$ largest singular values and thereby avoids the adverse effects caused by the smallest singular values. According to TSVD, the Moore–Penrose generalized inverse of $H$ can be computed by
$$H^{\dagger} = \sum_{i=1}^{\kappa} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}, \qquad \kappa \leq r = \operatorname{rank}(H) \leq n \qquad (6)$$
where $\kappa$ is the truncation number of TSVD for $H$, usually $\kappa \ll n$.
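A sketch of the TSVD output weights in the sense of Eq. (6), with the truncation number supplied by the user, might look as follows (the function name is illustrative).

```python
import numpy as np

def output_weights_tsvd(H, T, kappa):
    """TSVD output weights: keep only the kappa largest singular triplets, cf. Eq. (6) (sketch)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    kappa = min(kappa, int(np.sum(s > 0)))             # never exceed rank(H)
    return Vt[:kappa].T @ ((U[:, :kappa].T @ T) / s[:kappa])
```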

To improve the learning speed of ELM, it is necessary to reduce the computational complexity of the SVD. Although TSVD provides more stable solutions than the plain SVD, it still depends on the results of the SVD and has a similar computational complexity [22]. Lanczos bidiagonalization (LBD) is an efficient iterative method for computing the SVD of large and ill-conditioned matrices [23], [24], [25]. Given an appropriate number of iterations, we can perform a partial LBD of a matrix. Partial LBD not only makes the computation of the SVD fairly inexpensive but also provides a good approximation to the singular triplets associated with the largest singular values of the matrix [26]. Partial LBD has been applied to the computation of many regularization methods, such as Tikhonov regularization and TSVD [27], [28], [29].
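As an illustration of this idea (not of the PL-ELM procedure itself), the $k$ dominant singular triplets can be approximated with an iterative Lanczos-type solver such as scipy.sparse.linalg.svds and then used in place of the full TSVD expansion; the sketch below assumes $k$ is small relative to $n$.

```python
import numpy as np
from scipy.sparse.linalg import svds

def output_weights_partial(H, T, k):
    """Approximate TSVD output weights from the k dominant singular triplets,
    computed iteratively instead of via a full SVD. Illustrative only; this is
    not the exact PL-ELM algorithm proposed in the paper."""
    U, s, Vt = svds(H, k=k)                  # k largest singular triplets (unsorted)
    order = np.argsort(s)[::-1]              # sort triplets by decreasing sigma
    U, s, Vt = U[:, order], s[order], Vt[order]
    return Vt.T @ ((U.T @ T) / s)
```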

In this paper, we propose an enhanced ELM, called the partial Lanczos ELM (PL-ELM), which computes the output weights by a hybrid of partial LBD and SVD. PL-ELM first applies partial LBD to the hidden layer output matrix $H$ so that the linear system $H\beta = T$ is projected onto a small Krylov subspace; PL-ELM then determines an approximate solution for the output weights from this Krylov subspace. Since the dimension of the Krylov subspace is usually much smaller than that of the hidden layer output matrix, PL-ELM significantly reduces the computational complexity compared with ELM. Moreover, the results of PL-ELM are very similar to those obtained by directly applying TSVD to the linear system $H\beta = T$. Currently, PL-ELM mainly deals with single-output regression problems. We validate the performance of PL-ELM on several benchmark regression data sets.

This paper is organized as follows. Section 2 briefly reviews the partial LBD algorithm; Section 3 proposes the PL-ELM algorithm and then analyzes its computational complexity, relative perturbation bound and parameter choice; Section 4 presents experimental results and discussion; Section 5 concludes the paper and outlines future research.

Section snippets

Review of partial LBD

This section briefly reviews partial LBD. Detailed descriptions of partial LBD can be found in [23], [30], [31] with slightly different notations. The notation used in this section is consistent with that of ELM [1]. Given the matrix $H \in \mathbb{R}^{N \times n}$ and the iteration number $k$ ($k < n$), LBD generates a sequence of Lanczos vectors $u_{j} \in \mathbb{R}^{N}$ and $v_{j} \in \mathbb{R}^{n}$ and scalars $\alpha_{j}$ and $\gamma_{j}$ ($j = 1, \ldots, k$):

Choose a nonzero starting vector $p_{0} \in \mathbb{R}^{N}$, let $\gamma_{1} = \|p_{0}\|_{2}$, $u_{1} = p_{0}/\gamma_{1}$ and $v_{0} \equiv 0$. Then implement the Lanczos iterations for $j = 1, \ldots, k$: $\alpha_{j}v_{j} = H^{T}$
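For orientation, the following sketch implements the standard Golub–Kahan lower bidiagonalization recurrence that this description corresponds to (without reorthogonalization or breakdown handling); the paper's Eqs. (7)–(8) present this scheme in the notation above, so any differences in detail are assumptions of the sketch.

```python
import numpy as np

def partial_lbd(H, p0, k):
    """Partial Golub-Kahan lower bidiagonalization started from p0 (sketch).
    Returns U (N x (k+1)), the lower bidiagonal B ((k+1) x k) with alpha_j on
    the diagonal and gamma_{j+1} on the subdiagonal, and V (n x k)."""
    N, n = H.shape
    U = np.zeros((N, k + 1))
    V = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    gamma = np.linalg.norm(p0)
    U[:, 0] = p0 / gamma                       # u_1 = p_0 / gamma_1
    v_prev = np.zeros(n)                       # v_0 = 0
    for j in range(k):
        # alpha_j v_j = H^T u_j - gamma_j v_{j-1}
        w = H.T @ U[:, j] - (B[j, j - 1] if j > 0 else 0.0) * v_prev
        alpha = np.linalg.norm(w)
        V[:, j] = w / alpha
        B[j, j] = alpha
        # gamma_{j+1} u_{j+1} = H v_j - alpha_j u_j
        z = H @ V[:, j] - alpha * U[:, j]
        gamma = np.linalg.norm(z)
        U[:, j + 1] = z / gamma
        B[j + 1, j] = gamma
        v_prev = V[:, j]
    return U, B, V
```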

The approximate solution to the linear least squares problem

This section computes the approximate solution to the linear least squares problem
$$\min_{x}\|Hx - T\|_{2}, \qquad H \in \mathbb{R}^{N \times n},\; N \geq n$$
in a Krylov subspace based on the hybrid of partial LBD and SVD. We summarize the computation process of the approximate solution as follows.

Given the matrix $H \in \mathbb{R}^{N \times n}$, the target vector $T = [t_{1}, \ldots, t_{N}]^{T} \in \mathbb{R}^{N}$ and the iteration number $k$,

Step 1 Implement $k$ LBD iterations on the matrix $H$ by assigning $T$ as the starting vector (cf. Eqs. (7), (8)). Then, obtain a lower bidiagonal matrix $B_{k} \in \mathbb{R}^{(k+1) \times k}$ and two orthonormal
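Although the remaining steps are not shown here, a typical hybrid LBD/SVD solver of this kind projects the problem onto the Krylov subspace, solves the small bidiagonal least squares problem $\min_{y}\|B_{k}y - \gamma_{1}e_{1}\|_{2}$ through the SVD of $B_{k}$ (optionally truncated), and maps the solution back as $\beta \approx V_{k}y$. The sketch below follows that outline using the partial_lbd function from the previous sketch; it is an assumed reconstruction for illustration, not the paper's exact remaining steps.

```python
import numpy as np

def pl_solve(H, T, k, kappa=None):
    """Hybrid partial-LBD/SVD least squares sketch: project onto the Krylov
    subspace, solve min_y ||B_k y - gamma_1 e_1|| via the (optionally
    truncated) SVD of B_k, and map back with V_k. Illustrative reconstruction,
    not the paper's exact procedure."""
    U, B, V = partial_lbd(H, T, k)              # partial LBD started from T (sketch above)
    gamma1 = np.linalg.norm(T)
    rhs = gamma1 * np.eye(k + 1)[:, 0]          # gamma_1 * e_1 in R^{k+1}
    Ub, s, Vbt = np.linalg.svd(B, full_matrices=False)
    kappa = k if kappa is None else min(kappa, k)
    y = Vbt[:kappa].T @ ((Ub[:, :kappa].T @ rhs) / s[:kappa])
    return V @ y                                # approximate output weights in R^n
```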

Experiments

Besides the proposed PL-ELM, three state-of-the-art regression methods and a modified ELM are employed for performance comparison:

  • SVR: support vector regression [39];

  • ELM: extreme learning machine [1];

  • E-ELM: evolutionary extreme learning machine [13];

  • T-ELM: a modified ELM that directly applies TSVD instead of SVD to compute the Moore–Penrose generalized inverse.

We compare the performance of the five batch learning regression methods on eight benchmark data sets. These data sets are collected from the

Conclusion

In order to improve the stability and the learning speed of ELM, this paper proposes an enhanced ELM named the partial Lanczos ELM (PL-ELM). In PL-ELM, the output weights are computed by a hybrid of partial Lanczos bidiagonalization and SVD, which gives PL-ELM two advantages. First, PL-ELM can effectively filter out the adverse effects caused by the smallest singular values of the ill-conditioned hidden layer output matrix. Second, PL-ELM can significantly reduce computational complexity by

Acknowledgements

This work was supported by the project (60674073) of the National Natural Science Foundation of China, the project (2006BAB14B05) of the National Major Technology R&D Program of China, the project (2006CB403405) of the National Basic Research Program of China (973 Program) and the project (2007AA04Z158) of the National High Technology Research and Development Program of China (863 Program). All of these supports are appreciated. We would also like to thank the associate editor and the two anonymous referees for their

Xiaoliang Tang received the B.S. degree in Electronic and Information Engineering from Dalian University of Technology, Dalian, China, in 2003. He is currently pursuing the Ph.D. degree at the same University. His research interests include semi-supervised learning and neural networks.

References (44)

  • G.B. Huang et al., Can threshold networks be trained directly?, IEEE Trans. Circuits Syst. II Exp. Briefs (2006).
  • G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: ...
  • R.X. Zhang et al., Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis, IEEE ACM Trans. Comput. Biol. Bioinformatics (2007).
  • G.B. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks (2006).
  • A.H. Nizar et al., Power utility nontechnical loss analysis with extreme learning machine method, IEEE Trans. Power Syst. (2008).
  • C.W.T. Yeu et al., A new machine learning paradigm for terrain reconstruction, IEEE Geosci. Remote Sensing Lett. (2006).
  • G.B. Huang et al., Real-time learning capability of neural networks, IEEE Trans. Neural Networks (2006).
  • N.Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Networks (2006).
  • G.H. Golub et al., Matrix Computations (1996).
  • P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (1998).
  • H. Engl et al., Regularization of Inverse Problems (Mathematics and Its Applications) (2000).
  • R.D. Fierro et al., Regularization by truncated total least squares, SIAM J. Sci. Comput. (1997).


Min Han received her B.S. and M.S. degrees from the Department of Electrical Engineering, Dalian University of Technology, in 1982 and 1993, respectively, and her M.S. and Ph.D. degrees from Kyushu University, Japan, in 1996 and 1999. She is a Professor at the School of Electronic and Information Engineering, Dalian University of Technology. Her current research interests are neural networks, pattern recognition and chaos.
