
Neurocomputing

Volume 49, Issues 1–4, December 2002, Pages 429-433

Letters
Interlocking of learning and orthonormalization in RRLSA

https://doi.org/10.1016/S0925-2312(02)00671-9

Abstract

In sequential principal component analyzers based on deflation of the input vector, deviations from orthogonality of the previous eigenvector estimates may entail a severe loss of orthogonality in the next stages. A combination of the learning method with subsequent Gram–Schmidt orthonormalization solves this problem, but increases the computational effort. For the “robust recursive least squares learning algorithm” we show how the effort may be reduced by a factor of up to two by interlocking learning and the Gram–Schmidt method.

Introduction

Some neural networks for principal component analysis—e.g. the Generalized Hebbian Algorithm (GHA) [4] or recursive least squares methods [1], [3]—use a sequential method to determine the eigenvectors. The eigenvector with the largest eigenvalue is obtained by applying a single-unit learning rule to the original data. To get the second eigenvector (with the second-largest eigenvalue), the projection onto the first eigenvector is subtracted from each data vector, and a second single-unit network is trained with these “deflated” vectors. This procedure is repeated until the desired number of eigenvectors is extracted.
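To make the deflation step concrete, here is a minimal C sketch that subtracts the projections onto the already estimated unit-length eigenvectors from an input vector before it is handed to the next single-unit learner. The function name and data layout are illustrative only and are not taken from the paper.

```c
#include <stddef.h>

/* Deflate the input vector x (dimension n) against the first m unit-length
 * eigenvector estimates w[0..m-1][0..n-1]: subtract the projection of x onto
 * each w[i]. The result is (approximately) perpendicular to w[0..m-1] and
 * serves as the training input for the next single-unit network.
 * Names and layout are illustrative, not from the paper. */
static void deflate(double *x, double **w, size_t m, size_t n)
{
    for (size_t i = 0; i < m; ++i) {
        double y = 0.0;                 /* y = w_i^T x, projection coefficient */
        for (size_t j = 0; j < n; ++j)
            y += w[i][j] * x[j];
        for (size_t j = 0; j < n; ++j)  /* x <- x - y * w_i */
            x[j] -= y * w[i][j];
    }
}
```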

This method works well if the units are trained sequentially. In this case, the already trained eigenvectors are almost exactly perpendicular to each other, so the deflated data vectors are perpendicular to all of these eigenvectors. In some applications, however, the procedure is applied simultaneously to all units, so all eigenvectors are modified after the presentation of each new input vector. In this case, the orthogonality of the first eigenvectors is not guaranteed, and the deflated vector is not exactly perpendicular to the first eigenvectors. Especially in methods suited to tracking non-stationary data, such as the “robust recursive least squares learning algorithm” (RRLSA) [3] with unlearning, all eigenvector estimates jitter and are therefore only approximately orthogonal to each other.

The fact that the deflated vector is not perpendicular to the previous eigenvectors becomes problematic if the next eigenvector direction has a small eigenvalue. The variance in the direction of the next eigenvector can then be smaller than the variance perpendicular to it, the latter being introduced by deflation when the previous eigenvector estimates are not orthogonal. The next estimated eigenvector would therefore not be perpendicular to the previous eigenvectors but would mostly lie in the subspace spanned by them. It is thus not useful to continue the extraction of eigenvectors beyond this processing stage.

This problem can be solved by a subsequent orthonormalization of the eigenvector estimates using the Gram–Schmidt method [2]. However, Gram–Schmidt orthonormalization requires 2nm² operations (additions and multiplications) to orthonormalize m eigenvector estimates of dimension n, so this step dominates the complexity of the entire method. This contribution aims at reducing this effort by interlocking the learning method (specifically RRLSA [3]) with the Gram–Schmidt method, exploiting the fact that the orthonormalization works on vectors that were obtained by applying a learning rule to already orthonormal vectors. RRLSA was chosen as the learning mechanism because, like other recursive least squares methods [1], it is fast, does not suffer from the accuracy–speed trade-off of least mean square methods, and has proved robust in several applications.
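For reference, a minimal C sketch of the modified Gram–Schmidt orthonormalization of m vectors of dimension n follows; the nested loops make the roughly 2nm² additions and multiplications counted above visible. This is a generic textbook routine, not code from the paper.

```c
#include <math.h>
#include <stddef.h>

/* Modified Gram-Schmidt: orthonormalize the m vectors v[0..m-1] of dimension n
 * in place. Each pair (k, j<k) costs one dot product plus one subtraction of a
 * scaled vector (about 4n operations per pair), so the total is on the order
 * of 2*n*m*m additions and multiplications. */
static void gram_schmidt(double **v, size_t m, size_t n)
{
    for (size_t k = 0; k < m; ++k) {
        for (size_t j = 0; j < k; ++j) {
            double r = 0.0;             /* r = v_j^T v_k */
            for (size_t i = 0; i < n; ++i)
                r += v[j][i] * v[k][i];
            for (size_t i = 0; i < n; ++i)
                v[k][i] -= r * v[j][i]; /* remove the v_j component */
        }
        double norm = 0.0;              /* normalize v_k to unit length */
        for (size_t i = 0; i < n; ++i)
            norm += v[k][i] * v[k][i];
        norm = sqrt(norm);
        for (size_t i = 0; i < n; ++i)
            v[k][i] /= norm;
    }
}
```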

Section 2 describes the method where learning and orthonormalization are separated, Section 3 introduces the method where both steps are interlocked, followed by a discussion in Section 4.

Section snippets

RRLSA with Gram–Schmidt orthonormalization

One step of the RRLSA method [3] takes as input m orthonormal weight vectors w1,…,wm of dimension n (the eigenvector estimates) and m scalar values L1,…,Lm, corresponding to the lengths of the weight vectors in the original notation. It produces m vectors v1,…,vm which are in general no longer orthogonal. These are orthonormalized by the Gram–Schmidt process, yielding the new weight vectors w1′,…,wm′, and the new length values L1′,…,Lm′ are obtained as Lj′=||vj||.

For an input vector x, the
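The snippet above breaks off before the update equations. As a hedged illustration of the separated scheme, the following C sketch assumes the standard RRLSA update with deflation, v_j = αL_j w_j + β y_j x_j with y_j = w_j^T x_j computed on the deflated input x_j. This form is consistent with the auxiliary quantities appearing in Section 3, but it is an assumption rather than a quotation from the paper. The new lengths are L_j′ = ||v_j|| as stated above, and a Gram–Schmidt pass (as in the previous sketch) would then yield the new orthonormal weight vectors.

```c
#include <math.h>
#include <stddef.h>

/* One (assumed) RRLSA learning step with deflation, before orthonormalization.
 * Inputs:  x[0..n-1]      current data vector (modified in place by deflation),
 *          w[j][0..n-1]   m orthonormal eigenvector estimates,
 *          L[0..m-1]      associated length (eigenvalue) estimates,
 *          alpha, beta    forgetting / unlearning parameters.
 * Outputs: v[j][0..n-1]   updated, generally non-orthogonal vectors,
 *          Lnew[0..m-1]   new lengths L'_j = ||v_j||.
 * The update v_j = alpha*L_j*w_j + beta*y_j*x_j (with x_j the deflated input)
 * is an assumption based on the standard RRLSA formulation; the excerpt does
 * not show it explicitly. A Gram-Schmidt pass over v then yields w'. */
static void rrlsa_step(double *x, double **w, double *L,
                       double **v, double *Lnew,
                       size_t m, size_t n, double alpha, double beta)
{
    for (size_t j = 0; j < m; ++j) {
        double y = 0.0;                      /* y_j = w_j^T x_j (deflated x) */
        for (size_t i = 0; i < n; ++i)
            y += w[j][i] * x[i];

        double norm2 = 0.0;
        for (size_t i = 0; i < n; ++i) {
            v[j][i] = alpha * L[j] * w[j][i] + beta * y * x[i];
            norm2  += v[j][i] * v[j][i];
        }
        Lnew[j] = sqrt(norm2);               /* L'_j = ||v_j|| */

        for (size_t i = 0; i < n; ++i)       /* deflation: x <- x - y_j * w_j */
            x[i] -= y * w[j][i];
    }
}
```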

Interlocked method

Obviously we can express w_k′ as
$$w_k' = \sum_{j=1}^{k} q_{kj}\, w_j + p_k\, x. \qquad (4)$$
The interlocked method recursively computes q_{kj} and p_k for increasing k and determines w_k′ using (4). The equations described below were derived by inserting the preceding equations into (3) under the orthonormality assumption $w_i^T w_j = \delta_{ij}$. Besides q_{kj} and p_k, the recursion uses the auxiliary variables t_k, d_k, s_k, and r_{kj}. In the order of update in the implementation, we obtain
$$t_k = t_{k-1} + p_{k-1}^2 \quad\text{with}\quad t_1 = 0,$$
$$d_k = d_{k-1} - y_{k-1}^2 \quad\text{with}\quad d_1 = \|x\|^2,$$
$$s_k = (\alpha L_k + \beta d_k)\, y_k,$$
$$L_k'^2 = (\alpha L_k)^2 \;\ldots$$
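The excerpt stops before the recursions for q_{kj}, p_k, and r_{kj}, so they are not reproduced here. The C sketch below shows only the reconstruction step (4): once the coefficients are available, each new weight vector is assembled directly from the old orthonormal vectors and the input. This assembly costs roughly nm² additions and multiplications in total, about half of the 2nm² of a separate Gram–Schmidt pass; assuming the coefficient recursions themselves need on the order of m² operations, independent of n, this is consistent with the factor of up to two quoted in the abstract.

```c
#include <stddef.h>

/* Reconstruction step of the interlocked method, eq. (4):
 *     w'_k = sum_{j=1..k} q[k][j] * w_j  +  p[k] * x .
 * The recursions producing q and p are not shown in the excerpt and are
 * therefore not reproduced here; this routine only assembles the new weight
 * vectors once the coefficients are known. Cost: about n*m*m additions and
 * multiplications in total. */
static void assemble_weights(double **w_new, double **w, const double *x,
                             double **q, const double *p,
                             size_t m, size_t n)
{
    for (size_t k = 0; k < m; ++k) {
        for (size_t i = 0; i < n; ++i)
            w_new[k][i] = p[k] * x[i];        /* p_k * x */
        for (size_t j = 0; j <= k; ++j)       /* + sum_j q_{kj} * w_j */
            for (size_t i = 0; i < n; ++i)
                w_new[k][i] += q[k][j] * w[j][i];
    }
}
```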

Discussion

We tested three methods—RRLSA without orthonormalization (RRLSA), RRLSA with modified Gram–Schmidt orthonormalization [2] (RRLSA+MGS), and the interlocked method (INTER)—for a principal component analysis of image windows [4]. In the first experiment we chose n=m=64; in the second, n=144 and m=30. The other parameters were identical in both experiments: 100,000 training steps, β=1, and α starting at 0.99 and exponentially approaching 1, with a final value of 0.999. The C language data type “double”
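The schedule of α is only described verbally above. The following C snippet encodes one plausible reading, in which the gap 1−α decays exponentially from 0.01 (α=0.99) to 0.001 (α=0.999) over the training run; this interpretation is an assumption, not the paper's exact schedule.

```c
#include <math.h>

/* One plausible reading of the alpha schedule: 1 - alpha decays exponentially
 * from 0.01 (alpha = 0.99) at step 0 to 0.001 (alpha = 0.999) at step T.
 * The excerpt only states the start and final values, so this is an
 * illustrative assumption. */
static double alpha_at(long t, long T)
{
    const double gap0 = 0.01, gapT = 0.001;
    return 1.0 - gap0 * pow(gapT / gap0, (double)t / (double)T);
}
```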

Conclusions

The method suggested in this work interlocks RRLSA [3] with a subsequent Gram–Schmidt orthonormalization. It leads to a speed-up of approximately 2 for large n, which was confirmed in simulations. The orthonormality error was observed to be somewhat larger than for a separate Gram–Schmidt orthonormalization, but was shown not to increase over time. The method should therefore be suitable for any application where all eigenvectors have to be estimated simultaneously, where the method has to stay

Acknowledgements

I am grateful to Bärbel Herrnberger for improvements of the manuscript.

References (4)


Cited by (12)

  • Modeling of GPS-TEC using QR-decomposition over the low latitude sector during disturbed geomagnetic conditions

    2019, Advances in Space Research
    Citation excerpt:

    Improvement is crucial for applications where significant orthonormal vectors are estimated simultaneously rather than sequentially. In addition, the loss of orthonormality can have a detrimental effect on subsequent processing steps, such as the calculation of distance measures for competition in the EOF method (Möller, 2002). This paper aims to model the variations of storm time TEC from QR decomposition and compare them with GPS-TEC in different solar activity conditions and at different latitudes.

  • A new local PCA-SOM algorithm

    2008, Neurocomputing
  • An extension of neural gas to local PCA

    2004, Neurocomputing
    Citation excerpt:

    Since all eigenvector and eigenvalue estimates are updated simultaneously, the PCA method had to be combined with a subsequent Gram–Schmidt step to avoid a collapse of orthogonality. Gram–Schmidt orthonormalization is costly, but interlocking of learning and orthonormalization can reduce the total computational effort [17] (see Appendix A). In tests with low- and high-dimensional data our method proved to work robustly, and the classification quality in the digit recognition task is comparable to other local PCA methods.
