Letters
Interlocking of learning and orthonormalization in RRLSA
Introduction
Some neural networks for principal component analysis—e.g. the Generalized Hebbian Algorithm (GHA) [4] or recursive least squares methods [1], [3]—use a sequential method to determine the eigenvectors. The eigenvector with the largest eigenvalue is obtained by applying a single-unit learning rule to the original data. To get the second eigenvector (with the second-largest eigenvalue), the projection onto the first eigenvector is subtracted from each data vector, and a second single-unit network is trained with these “deflated” vectors. This procedure is repeated until the desired number of eigenvectors is extracted.
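The deflation scheme above can be sketched in a few lines, with Oja's single-unit rule standing in for the specific learning rules of [1], [3], [4] (the rule choice, learning rate, and step counts here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def train_single_unit(X, steps=3000, lr=0.01, seed=0):
    """Train one unit with Oja's rule (a stand-in for the
    single-unit learning rules cited in the text)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        x = X[rng.integers(len(X))]    # pick a random data vector
        y = w @ x
        w += lr * y * (x - y * w)      # Oja's single-unit update
    return w / np.linalg.norm(w)

def sequential_pca(X, m):
    """Extract m eigenvectors one after another via deflation."""
    ws = []
    Xd = X.copy()
    for _ in range(m):
        w = train_single_unit(Xd)
        ws.append(w)
        # deflate: subtract the projection onto w from every data vector
        Xd = Xd - np.outer(Xd @ w, w)
    return np.array(ws)
```

Because each unit is trained to convergence before the next deflation, the extracted vectors come out nearly perpendicular to each other, which is exactly the property the sequential variant relies on.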
This method works well if the units are trained sequentially. In this case, the already trained eigenvectors are almost exactly perpendicular to each other, so the deflated data vectors are perpendicular to all of these eigenvectors. In some applications, however, the procedure is applied to all units simultaneously, so all eigenvectors are modified after the presentation of each new input vector. In this case, the orthogonality of the first eigenvectors is not guaranteed, and the deflated vector is not exactly perpendicular to the first eigenvectors. Specifically in methods suited for tracking non-stationary data—like the “robust recursive least squares learning algorithm” (RRLSA) [3] with unlearning—all eigenvector estimates jitter and are therefore only approximately orthogonal to each other.
The fact that the deflated vector is not perpendicular to the previous eigenvectors becomes problematic if the next eigenvector direction has a small eigenvalue. The variance in the direction of the next eigenvector can then be smaller than the spurious variance perpendicular to it, which the deflation introduces when the previous eigenvector estimates are not orthogonal. The next estimated eigenvector would therefore not be perpendicular to the previous eigenvectors but would mostly lie in the subspace spanned by them. It is thus not useful to continue the extraction of eigenvectors after this stage.
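A small numerical illustration of this effect (the vectors and the small-variance direction are made up for the example): deflating with two only approximately orthogonal estimates leaves a residual inside their span that can exceed the genuine component along the next eigenvector direction.

```python
import numpy as np

# Two "previous" eigenvector estimates that are only approximately
# orthogonal, as after simultaneous training with jitter.
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.1, 1.0, 0.0])
w2 /= np.linalg.norm(w2)          # unit length, but w1 @ w2 != 0

# Data vector whose third component (the "next" direction) is tiny.
x = np.array([2.0, 1.0, 0.05])

# Deflation: subtract the projections onto w1 and w2.
d = x - (x @ w1) * w1 - (x @ w2) * w2

# The deflated vector is NOT perpendicular to w1; the leaked component
# is larger than the genuine small-variance component x[2] = 0.05.
print(abs(d @ w1), abs(x[2]))
```

With exactly orthogonal w1 and w2 the residual along w1 would vanish; here it dominates the direction that the next unit is supposed to find.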
This problem can be solved by a subsequent orthonormalization of the eigenvector estimates using the Gram–Schmidt method [2]. However, Gram–Schmidt orthonormalization requires 2nm² operations (additions, multiplications) to orthonormalize m eigenvector estimates of dimension n, so this step dominates the complexity of the entire method. This contribution aims at reducing this effort by interlocking the learning method—specifically RRLSA [3]—with the Gram–Schmidt method, exploiting the fact that the orthonormalization works on vectors that were obtained by applying a learning rule to already orthonormal vectors. RRLSA was chosen as the learning mechanism since—like other recursive least squares methods [1]—it is fast, does not suffer from the accuracy–speed trade-off of least-mean-square methods, and has proved robust in several applications.
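A minimal sketch of the modified Gram–Schmidt step and where the 2nm² operation count comes from (row-vector convention assumed):

```python
import numpy as np

def mgs(W):
    """Modified Gram-Schmidt on the rows of W (m estimates of dimension n).

    The dot product and the update in the inner loop each cost about
    n multiply-adds per (j, k) pair, giving the ~2*n*m**2 total
    mentioned in the text."""
    W = np.array(W, dtype=float)
    m = W.shape[0]
    for k in range(m):
        W[k] /= np.linalg.norm(W[k])
        for j in range(k + 1, m):
            W[j] -= (W[j] @ W[k]) * W[k]  # remove the component along W[k]
    return W
```

Since the learning rule only slightly perturbs vectors that were orthonormal before the step, most of these 2nm² operations act on components that are already nearly zero, which is the redundancy the interlocked method exploits.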
Section 2 describes the method where learning and orthonormalization are separated, Section 3 introduces the method where both steps are interlocked, followed by a discussion in Section 4.
Section snippets
RRLSA with Gram–Schmidt orthonormalization
One step of the RRLSA method [3] takes as input m orthonormal weight vectors of dimension n (the eigenvector estimates) and m scalar values L1,…,Lm, corresponding to the lengths of the weight vectors in the original notation. It produces m vectors which are usually no longer orthogonal. These are orthonormalized by the Gram–Schmidt process, yielding the new weight vectors w1′,…,wm′; the new length values L1′,…,Lm′ are obtained as the norms of the corresponding vectors before their normalization.
For an input vector x, the …
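The separated pipeline of this section can be sketched as follows. The single-unit update below is a plausible generic form (decay of the previous un-normalized vector plus a Hebbian term on the deflated input); it is not guaranteed to be the exact RRLSA rule of [3], and the parameter names alpha and beta follow the experiment section:

```python
import numpy as np

def separated_step(W, L, x, alpha=0.99, beta=1.0):
    """One learning step for all m units, followed by Gram-Schmidt.

    W : (m, n) orthonormal eigenvector estimates (rows)
    L : (m,)   length values
    Returns the new orthonormal estimates and new lengths."""
    m, _ = W.shape
    V = np.empty_like(W, dtype=float)
    xk = x.astype(float).copy()
    for k in range(m):
        y = W[k] @ xk
        V[k] = alpha * L[k] * W[k] + beta * y * xk  # un-normalized update
        xk = xk - y * W[k]                          # deflate for the next unit
    # Gram-Schmidt: the new length is the norm before normalization
    Wn = np.empty_like(V)
    Ln = np.empty(m)
    for k in range(m):
        v = V[k].copy()
        for j in range(k):
            v -= (v @ Wn[j]) * Wn[j]
        Ln[k] = np.linalg.norm(v)
        Wn[k] = v / Ln[k]
    return Wn, Ln
```

Both loops touch all m vectors of dimension n; the second (orthonormalization) loop is the 2nm² bottleneck that the interlocked method attacks.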
Interlocked method
Obviously we can express … as … The interlocked method recursively computes qkj and pk for increasing k and determines … using (4). The equations described below were derived by inserting … into (3) under consideration of the orthonormality assumption. Besides qkj and pk, the recursion uses the auxiliary variables tk, dk, sk, and rkj. In the order of update in the implementation, we obtain …
Discussion
We tested three methods—RRLSA without orthonormalization (RRLSA), RRLSA with modified Gram–Schmidt orthonormalization [2] (RRLSA+MGS), and the interlocked method (INTER)—on a principal component analysis of image windows [4]. In the first experiment we chose n=m=64; in the second, n=144, m=30. The other parameters were identical in both experiments: … training steps, β=1, and α starting at 0.99 and exponentially approaching 1 with a final value of 0.999. The C language data type “double” …
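One plausible reading of this α schedule, assuming the gap 1−α decays geometrically over the run (the functional form is an assumption; the text only states the start value, the limit, and the final value):

```python
import numpy as np

def alpha_schedule(T, a0=0.99, aT=0.999):
    """Gap to 1 decays geometrically from (1 - a0) to (1 - aT)
    over T steps, so alpha rises from a0 toward 1 and reaches aT
    at the last step. The form is an assumption for illustration."""
    t = np.arange(T)
    gamma = ((1.0 - aT) / (1.0 - a0)) ** (1.0 / (T - 1))
    return 1.0 - (1.0 - a0) * gamma ** t
```

Such a schedule gives a short effective memory early on (fast initial convergence) and a long memory later (low steady-state error), which matches the stated motivation for letting α approach 1.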
Conclusions
The method suggested in this work interlocks RRLSA [3] with a subsequent Gram–Schmidt orthonormalization. It leads to a speed-up of approximately 2 for large n, which was confirmed in simulations. The orthonormality error was observed to be somewhat larger than for a separate Gram–Schmidt orthonormalization, but was shown not to increase over time. The method should therefore be suitable for any application where all eigenvectors have to be estimated simultaneously and where the method has to stay …
Acknowledgements
I am grateful to Bärbel Herrnberger for improvements of the manuscript.
References (4)
- T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks (1989)
- S. Bannour et al., Principal component extraction using recursive least squares learning, IEEE Trans. Neural Networks (1995)