Neurocomputing
Volume 72, Issues 13–15, August 2009, Pages 3066–3076

Partial Lanczos extreme learning machine for single-output regression problems

https://doi.org/10.1016/j.neucom.2009.03.016

Abstract

There are two problems preventing the further development of the extreme learning machine (ELM). First, the ill-conditioning of the hidden layer output matrix reduces the stability of ELM. Second, the complexity of the singular value decomposition (SVD) used to compute the Moore–Penrose generalized inverse limits the learning speed of ELM. To address these two problems, this paper proposes the partial Lanczos ELM (PL-ELM), which employs a hybrid of partial Lanczos bidiagonalization and SVD to compute the output weights. Experimental results indicate that, compared with ELM, PL-ELM not only effectively improves stability and generalization performance but also increases learning speed.

Introduction

Recently, Huang et al. [1], [2] have proposed a novel learning algorithm for single hidden layer feedforward networks (SLFNs) named the extreme learning machine (ELM). In ELM, the input weights and hidden layer biases are chosen randomly, and the output weights are determined analytically from the Moore–Penrose generalized inverse of the hidden layer output matrix. ELM provides better generalization performance and an extremely fast learning speed compared with gradient-based learning algorithms. In addition, ELM avoids many difficulties faced by gradient-based learning methods, such as stopping criteria, learning rates, learning epochs and local minima [1], [3]. Owing to these advantages, ELM has been successfully applied in many areas, such as classification [4], function approximation [5], [6], nontechnical loss analysis [7], terrain reconstruction [8] and protein structure prediction [9].

In order to enhance the performance of ELM, many improved models have been proposed. We roughly group these improved models into four types: incremental, optimization, replacement and ensemble. The incremental type creates new hidden layer neurons one by one according to certain criteria, which makes it well suited to streaming data [10], [11], [12]. The optimization type employs techniques such as evolutionary algorithms and linear programming to tune the input weights and hidden layer biases and to optimize the network structure; it achieves good generalization performance and a much more compact network structure [13], [14], [15]. The replacement type replaces the usual activation functions of ELM (sigmoid and RBF) with sine and cosine functions (or other functions), which helps to improve the accuracy and the convergence rate for function approximation problems [6]. In the ensemble type, different ELMs are trained on disjoint subsets of the data but can share the same hidden layer neurons [5], [16].

Although these models improve the performance of ELM to a certain degree, there is still considerable room for improvement in the stability and the learning speed of ELM. We first analyze the critical factor influencing the stability of ELM.

As rigorously proven by Huang et al. [1], the input weights and hidden layer biases of SLFNs can be chosen randomly without tuning, so the hidden layer output matrix $H$ of an SLFN remains unchanged throughout the learning process. Thus, training an SLFN is equivalent to finding the minimal Euclidean norm solution $\beta$ of the linear system $H\beta = T$ [1]:
$$\beta = H^{\dagger}T \quad \text{such that} \quad \beta = \arg\min_{\beta}\|H\beta - T\|_{2} \qquad (1)$$
where $\beta$ denotes the output weight matrix of the trained ELM, $H^{\dagger}$ denotes the Moore–Penrose generalized inverse of the hidden layer output matrix $H$, and $T$ denotes the target vector. The detailed specification of these variables is given in [1].
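For concreteness, the following is a minimal sketch of this training scheme, assuming a sigmoid activation and input weights and biases drawn uniformly at random; the function names and ranges are illustrative, and np.linalg.pinv evaluates the Moore–Penrose generalized inverse through an SVD.

```python
import numpy as np

def elm_train(X, T, n_hidden, rng=None):
    """Minimal single-output ELM: random hidden layer, analytic output weights (sketch)."""
    rng = np.random.default_rng(0) if rng is None else rng
    N, d = X.shape
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random input weights (not tuned)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases (not tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))           # hidden layer output matrix, N x n
    beta = np.linalg.pinv(H) @ T                     # minimum-norm solution of H beta = T, cf. Eq. (1)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```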

ELM computes the Moore–Penrose generalized inverse $H^{\dagger}$ in Eq. (1) based on the singular value decomposition (SVD) of $H$ [17]. The SVD of $H \in \mathbb{R}^{N \times n}$, $N \geq n$, is given by
$$H = U\Sigma V^{T} = \sum_{i=1}^{n} u_{i}\sigma_{i}v_{i}^{T} \qquad (2)$$
where $U = (u_{1}, \ldots, u_{n})$, $V = (v_{1}, \ldots, v_{n})$, $\Sigma = \operatorname{diag}(\sigma_{1}, \ldots, \sigma_{n})$; the $\sigma_{i}$, $i = 1, \ldots, n$, are the singular values of $H$ ordered as $\sigma_{1} \geq \cdots \geq \sigma_{n} \geq 0$, and the rank $r$ of $H$ equals the number of strictly positive singular values, i.e. $\sigma_{r} > \sigma_{r+1} = \cdots = \sigma_{n} = 0$.

Based on Eq. (2), the Moore–Penrose generalized inverse $H^{\dagger}$ in Eq. (1) is computed by
$$H^{\dagger} = V\Sigma^{\dagger}U^{T} = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}} \qquad (3)$$
and the output weight matrix of ELM is computed by
$$\beta = H^{\dagger}T = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,T \qquad (4)$$
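Eqs. (2)–(4) can also be evaluated directly; the sketch below, with an illustrative tolerance for deciding the numerical rank, computes the output weights from the SVD of $H$.

```python
import numpy as np

def output_weights_svd(H, T, tol=None):
    """Compute beta = H^+ T from the SVD of H, following Eqs. (2)-(4) (sketch)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)   # H = U diag(s) V^T
    if tol is None:                                    # illustrative rank tolerance
        tol = max(H.shape) * np.finfo(H.dtype).eps * s[0]
    r = int(np.sum(s > tol))                           # numerical rank of H
    # beta = sum_{i=1..r} v_i (u_i^T T) / sigma_i
    return Vt[:r].T @ ((U[:, :r].T @ T) / s[:r])
```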

In practical applications, the target vector $T$ usually contains a certain degree of perturbations. Let $e$ denote the perturbation in $T$ and $\tilde{T} = T + e$ denote the perturbed target vector. The perturbed output weight matrix $\tilde{\beta}$ of ELM can be computed by
$$\tilde{\beta} = H^{\dagger}\tilde{T} = H^{\dagger}T + H^{\dagger}e = \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,T + \sum_{i=1}^{r} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}\,e \qquad (5)$$

When the hidden layer output matrix $H$ has a very large condition number (i.e. $H$ tends to be ill-conditioned), there always exist some very small (positive and close to zero) singular values of $H$. From the two summation terms on the right-hand side of Eq. (5) we can see that, when divided by small singular values, the output weights become very large and tend to be strongly influenced by the perturbation $e$. Large output weights also weaken the generalization capability of ELM, because the trained network behaves very differently when the test data deviate even slightly from the training data. The ill-conditioning of the hidden layer output matrix is therefore the critical factor influencing the stability of ELM.
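A small numerical experiment makes this amplification visible. The construction below is purely illustrative (the matrix sizes, singular value spectrum and noise level are arbitrary choices, not taken from the paper): a perturbation of order $10^{-6}$ in $T$ produces a much larger relative change in the output weights once it is divided by the smallest singular values, as in Eq. (5).

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an ill-conditioned H with singular values 1, 1e-1, ..., 1e-7 (illustrative).
N, n = 200, 8
s = 10.0 ** -np.arange(n)
U, _ = np.linalg.qr(rng.standard_normal((N, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
H = U @ np.diag(s) @ V.T

T = H @ rng.standard_normal(n)          # noise-free, consistent targets
e = 1e-6 * rng.standard_normal(N)       # small perturbation of T

beta = np.linalg.pinv(H) @ T
beta_pert = np.linalg.pinv(H) @ (T + e)

# The relative change in beta is far larger than the size of e, because the
# components of e aligned with the smallest singular directions are divided
# by sigma_i, as in Eq. (5).
print(np.linalg.norm(beta_pert - beta) / np.linalg.norm(beta))
```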

We then analyze the critical factor limiting the learning speed of ELM. The learning time of ELM is mainly spent computing the Moore–Penrose generalized inverse $H^{\dagger}$ [1]. As mentioned above (see Eqs. (2), (3), (4)), computing $H^{\dagger}$ requires the SVD of $H$. For a matrix $H \in \mathbb{R}^{N \times n}$, the computational complexity of the SVD is $O(4Nn^{2} + 8n^{3})$ [18]. When the number of hidden layer neurons (i.e. $n$) becomes large, the computational cost of the SVD rises significantly. Although some methods have been proposed to compact the network structure of ELM [13], [14], they cause ELM to lose its property of determining the network structure completely independently of the training data. The large computational complexity of the SVD is the critical factor limiting the learning speed of ELM.

To improve the stability of ELM, it is necessary to incorporate regularization methods that reduce the influence of ill-conditioning and perturbations. Truncated SVD (TSVD) and Tikhonov regularization [19], [20] are two widely used regularization methods. As proven in [19], [21], TSVD provides results similar to those of Tikhonov regularization. TSVD keeps only the contributions associated with the $\kappa$ largest singular values and thereby avoids the adverse effects caused by the smallest singular values. According to TSVD, the Moore–Penrose generalized inverse of $H$ can be computed by
$$H^{\dagger} = \sum_{i=1}^{\kappa} \frac{v_{i}u_{i}^{T}}{\sigma_{i}}, \qquad \kappa \leq r = \operatorname{rank}(H) \leq n \qquad (6)$$
where $\kappa$ is the truncation number of TSVD for $H$, usually $\kappa \ll n$.
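A sketch of the TSVD output weights in the sense of Eq. (6), with the truncation number supplied by the user, might look as follows (the function name is illustrative).

```python
import numpy as np

def output_weights_tsvd(H, T, kappa):
    """TSVD output weights: keep only the kappa largest singular triplets, cf. Eq. (6) (sketch)."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    kappa = min(kappa, int(np.sum(s > 0)))             # never exceed rank(H)
    return Vt[:kappa].T @ ((U[:, :kappa].T @ T) / s[:kappa])
```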

To improve the learning speed of ELM, it is necessary to reduce the computational complexity of the SVD. Although TSVD provides more stable solutions than the plain SVD, it still depends on the results of the SVD and has a similar computational complexity [22]. Lanczos bidiagonalization (LBD) is an efficient iterative method for computing the SVD of large and ill-conditioned matrices [23], [24], [25]. Given an appropriate number of iterations, we can perform a partial LBD of a matrix. Partial LBD not only makes the computation of the SVD fairly inexpensive but also provides a good approximation to the singular triplets associated with the largest singular values of the matrix [26]. Partial LBD has been applied to the computation of many regularization methods, such as Tikhonov regularization and TSVD [27], [28], [29].
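As an illustration of this idea (not of the PL-ELM procedure itself), the $k$ dominant singular triplets can be approximated with an iterative Lanczos-type solver such as scipy.sparse.linalg.svds and then used in place of the full TSVD expansion; the sketch below assumes $k$ is small relative to $n$.

```python
import numpy as np
from scipy.sparse.linalg import svds

def output_weights_partial(H, T, k):
    """Approximate TSVD output weights from the k dominant singular triplets,
    computed iteratively instead of via a full SVD. Illustrative only; this is
    not the exact PL-ELM algorithm proposed in the paper."""
    U, s, Vt = svds(H, k=k)                  # k largest singular triplets (unsorted)
    order = np.argsort(s)[::-1]              # sort triplets by decreasing sigma
    U, s, Vt = U[:, order], s[order], Vt[order]
    return Vt.T @ ((U.T @ T) / s)
```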

In this paper, we propose an enhanced ELM, called the partial Lanczos ELM (PL-ELM), which computes the output weights by a hybrid of partial LBD and SVD. PL-ELM first applies partial LBD to the hidden layer output matrix $H$ so that the linear system $H\beta = T$ is projected onto a small Krylov subspace; PL-ELM then determines an approximate solution for the output weights from this Krylov subspace. Since the dimension of the Krylov subspace is usually much smaller than that of the hidden layer output matrix, PL-ELM significantly reduces the computational complexity compared with ELM. Moreover, the results of PL-ELM are very similar to those obtained by directly applying TSVD to the linear system $H\beta = T$. Currently, PL-ELM mainly deals with single-output regression problems. We validate the performance of PL-ELM on several benchmark regression data sets.

This paper is organized as follows. Section 2 briefly reviews the partial LBD algorithm; Section 3 proposes the PL-ELM algorithm and then analyzes its computational complexity, relative perturbation bound and parameter choice; Section 4 presents experimental results and discussion; Section 5 concludes the paper and outlines future research.

Section snippets

Review of partial LBD

This section briefly reviews partial LBD. Detailed descriptions of partial LBD can be found in [23], [30], [31] with slightly different notations. The notation used in this section is consistent with that of ELM [1]. Given the matrix $H \in \mathbb{R}^{N \times n}$ and the iteration number $k$ ($k < n$), LBD generates a sequence of Lanczos vectors $u_{j} \in \mathbb{R}^{N}$ and $v_{j} \in \mathbb{R}^{n}$ and scalars $\alpha_{j}$ and $\gamma_{j}$ ($j = 1, \ldots, k$):

Choose a nonzero starting vector $p_{0} \in \mathbb{R}^{N}$, let $\gamma_{1} = \|p_{0}\|_{2}$, $u_{1} = p_{0}/\gamma_{1}$ and $v_{0} \equiv 0$. Then implement the Lanczos iterations for $j = 1, \ldots, k$: $\alpha_{j}v_{j} = H^{T}$
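For orientation, the following sketch implements the standard Golub–Kahan lower bidiagonalization recurrence that this description corresponds to (without reorthogonalization or breakdown handling); the paper's Eqs. (7)–(8) present this scheme in the notation above, so any differences in detail are assumptions of the sketch.

```python
import numpy as np

def partial_lbd(H, p0, k):
    """Partial Golub-Kahan lower bidiagonalization started from p0 (sketch).
    Returns U (N x (k+1)), the lower bidiagonal B ((k+1) x k) with alpha_j on
    the diagonal and gamma_{j+1} on the subdiagonal, and V (n x k)."""
    N, n = H.shape
    U = np.zeros((N, k + 1))
    V = np.zeros((n, k))
    B = np.zeros((k + 1, k))
    gamma = np.linalg.norm(p0)
    U[:, 0] = p0 / gamma                       # u_1 = p_0 / gamma_1
    v_prev = np.zeros(n)                       # v_0 = 0
    for j in range(k):
        # alpha_j v_j = H^T u_j - gamma_j v_{j-1}
        w = H.T @ U[:, j] - (B[j, j - 1] if j > 0 else 0.0) * v_prev
        alpha = np.linalg.norm(w)
        V[:, j] = w / alpha
        B[j, j] = alpha
        # gamma_{j+1} u_{j+1} = H v_j - alpha_j u_j
        z = H @ V[:, j] - alpha * U[:, j]
        gamma = np.linalg.norm(z)
        U[:, j + 1] = z / gamma
        B[j + 1, j] = gamma
        v_prev = V[:, j]
    return U, B, V
```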

The approximate solution to the linear least squares problem

This section computes the approximate solution to the linear least squares problem
$$\min_{x}\|Hx - T\|_{2}, \qquad H \in \mathbb{R}^{N \times n},\; N \geq n$$
in a Krylov subspace based on the hybrid of partial LBD and SVD. We summarize the computation process of the approximate solution as follows.

Given the matrix $H \in \mathbb{R}^{N \times n}$, the target vector $T = [t_{1}, \ldots, t_{N}]^{T} \in \mathbb{R}^{N}$ and the iteration number $k$,

Step 1 Implement $k$ LBD iterations on the matrix $H$ by assigning $T$ as the starting vector (cf. Eqs. (7), (8)). Then, obtain a lower bidiagonal matrix $B_{k} \in \mathbb{R}^{(k+1) \times k}$ and two orthonormal
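Although the remaining steps are not shown here, a typical hybrid LBD/SVD solver of this kind projects the problem onto the Krylov subspace, solves the small bidiagonal least squares problem $\min_{y}\|B_{k}y - \gamma_{1}e_{1}\|_{2}$ through the SVD of $B_{k}$ (optionally truncated), and maps the solution back as $\beta \approx V_{k}y$. The sketch below follows that outline using the partial_lbd function from the previous sketch; it is an assumed reconstruction for illustration, not the paper's exact remaining steps.

```python
import numpy as np

def pl_solve(H, T, k, kappa=None):
    """Hybrid partial-LBD/SVD least squares sketch: project onto the Krylov
    subspace, solve min_y ||B_k y - gamma_1 e_1|| via the (optionally
    truncated) SVD of B_k, and map back with V_k. Illustrative reconstruction,
    not the paper's exact procedure."""
    U, B, V = partial_lbd(H, T, k)              # partial LBD started from T (sketch above)
    gamma1 = np.linalg.norm(T)
    rhs = gamma1 * np.eye(k + 1)[:, 0]          # gamma_1 * e_1 in R^{k+1}
    Ub, s, Vbt = np.linalg.svd(B, full_matrices=False)
    kappa = k if kappa is None else min(kappa, k)
    y = Vbt[:kappa].T @ ((Ub[:, :kappa].T @ rhs) / s[:kappa])
    return V @ y                                # approximate output weights in R^n
```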

Experiments

Besides the proposed PL-ELM, three state-of-the-art regression methods and a modified ELM are employed for performance comparison:

  • SVR: support vector regression [39];

  • ELM: extreme learning machine [1];

  • E-ELM: evolutionary extreme learning machine [13];

  • T-ELM: a modified ELM that directly applies TSVD instead of SVD to compute the Moore–Penrose generalized inverse.

We compare the performance of the five batch learning regression methods on eight benchmark data sets. These data sets are collected from the

Conclusion

In order to improve the stability and the learning speed of ELM, this paper proposes an enhanced ELM named the partial Lanczos ELM (PL-ELM). In PL-ELM, the output weights are computed by a hybrid of partial Lanczos bidiagonalization and SVD, which gives PL-ELM two advantages. First, PL-ELM can effectively filter out the adverse effects caused by the smallest singular values of the ill-conditioned hidden layer output matrix. Second, PL-ELM can significantly reduce computational complexity by

Acknowledgements

This work was supported by the project (60674073) of the National Natural Science Foundation of China, the project (2006BAB14B05) of the National Major Technology R&D Program of China, the project (2006CB403405) of the National Basic Research Program of China (973 Program) and the project (2007AA04Z158) of the National High Technology Research and Development Program of China (863 Program). All of these supports are appreciated. We would also like to thank the associate editor and the two anonymous referees for their

Xiaoliang Tang received the B.S. degree in Electronic and Information Engineering from Dalian University of Technology, Dalian, China, in 2003. He is currently pursuing the Ph.D. degree at the same University. His research interests include semi-supervised learning and neural networks.

References (44)

  • G.B. Huang et al., Can threshold networks be trained directly?, IEEE Trans. Circuits Syst. II Exp. Briefs (2006).
  • G.B. Huang, Q.Y. Zhu, C.K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: ...
  • R.X. Zhang et al., Multicategory classification using an extreme learning machine for microarray gene expression cancer diagnosis, IEEE ACM Trans. Comput. Biol. Bioinformatics (2007).
  • G.B. Huang et al., Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Networks (2006).
  • A.H. Nizar et al., Power utility nontechnical loss analysis with extreme learning machine method, IEEE Trans. Power Syst. (2008).
  • C.W.T. Yeu et al., A new machine learning paradigm for terrain reconstruction, IEEE Geosci. Remote Sensing Lett. (2006).
  • G.B. Huang et al., Real-time learning capability of neural networks, IEEE Trans. Neural Networks (2006).
  • N.Y. Liang et al., A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Networks (2006).
  • G.H. Golub et al., Matrix Computations (1996).
  • P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion (1998).
  • H. Engl et al., Regularization of Inverse Problems (Mathematics and Its Applications) (2000).
  • R.D. Fierro et al., Regularization by truncated total least squares, SIAM J. Sci. Comput. (1997).


Min Han received her B.S. and M.S. degrees from the Department of Electrical Engineering, Dalian University of Technology, in 1982 and 1993, respectively, and her M.S. and Ph.D. degrees from Kyushu University, Japan, in 1996 and 1999. She is a Professor at the School of Electronic and Information Engineering, Dalian University of Technology. Her current research interests are neural networks, pattern recognition and chaos.
