Letters
Interlocking of learning and orthonormalization in RRLSA
Introduction
Some neural networks for principal component analysis—e.g. the Generalized Hebbian Algorithm (GHA) [4] or recursive least squares methods [1], [3]—use a sequential method to determine the eigenvectors. The eigenvector with the largest eigenvalue is obtained by applying a single-unit learning rule to the original data. To get the second eigenvector (with the second-largest eigenvalue), the projection onto the first eigenvector is subtracted from each data vector, and a second single-unit network is trained with these “deflated” vectors. This procedure is repeated until the desired number of eigenvectors is extracted.
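The deflation scheme above can be sketched in a few lines, with Oja's single-unit rule standing in for the specific learning rules of [1], [3], [4] (the rule choice, learning rate, and step counts here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def train_single_unit(X, steps=3000, lr=0.01, seed=0):
    """Train one unit with Oja's rule (a stand-in for the
    single-unit learning rules cited in the text)."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(X.shape[1])
    w /= np.linalg.norm(w)
    for _ in range(steps):
        x = X[rng.integers(len(X))]    # pick a random data vector
        y = w @ x
        w += lr * y * (x - y * w)      # Oja's single-unit update
    return w / np.linalg.norm(w)

def sequential_pca(X, m):
    """Extract m eigenvectors one after another via deflation."""
    ws = []
    Xd = X.copy()
    for _ in range(m):
        w = train_single_unit(Xd)
        ws.append(w)
        # deflate: subtract the projection onto w from every data vector
        Xd = Xd - np.outer(Xd @ w, w)
    return np.array(ws)
```

Because each unit is trained to convergence before the next deflation, the extracted vectors come out nearly perpendicular to each other, which is exactly the property the sequential variant relies on.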
This method works well if the units are trained sequentially. In this case, the already trained eigenvectors are almost exactly perpendicular to each other, so the deflated data vectors are perpendicular to all of these eigenvectors. In some applications, however, the procedure is applied to all units simultaneously, so all eigenvectors are modified after the presentation of each new input vector. In this case, the orthogonality of the first eigenvectors is not guaranteed, and the deflated vector is not exactly perpendicular to the first eigenvectors. Specifically in methods suited for tracking non-stationary data—like the “robust recursive least squares learning algorithm” (RRLSA) [3] with unlearning—all eigenvector estimates jitter and are therefore only approximately orthogonal to each other.
The fact that the deflated vector is not perpendicular to the previous eigenvectors becomes problematic if the next eigenvector direction has a small eigenvalue. The variance in the direction of the next eigenvector can then be smaller than the spurious variance perpendicular to it, which the deflation introduces when the previous eigenvector estimates are not orthogonal. The next estimated eigenvector would therefore not be perpendicular to the previous eigenvectors but would mostly lie in the subspace spanned by them. It is thus not useful to continue the extraction of eigenvectors after this stage.
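A small numerical illustration of this effect (the vectors and the small-variance direction are made up for the example): deflating with two only approximately orthogonal estimates leaves a residual inside their span that can exceed the genuine component along the next eigenvector direction.

```python
import numpy as np

# Two "previous" eigenvector estimates that are only approximately
# orthogonal, as after simultaneous training with jitter.
w1 = np.array([1.0, 0.0, 0.0])
w2 = np.array([0.1, 1.0, 0.0])
w2 /= np.linalg.norm(w2)          # unit length, but w1 @ w2 != 0

# Data vector whose third component (the "next" direction) is tiny.
x = np.array([2.0, 1.0, 0.05])

# Deflation: subtract the projections onto w1 and w2.
d = x - (x @ w1) * w1 - (x @ w2) * w2

# The deflated vector is NOT perpendicular to w1; the leaked component
# is larger than the genuine small-variance component x[2] = 0.05.
print(abs(d @ w1), abs(x[2]))
```

With exactly orthogonal w1 and w2 the residual along w1 would vanish; here it dominates the direction that the next unit is supposed to find.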
This problem can be solved by a subsequent orthonormalization of the eigenvector estimates using the Gram–Schmidt method [2]. However, Gram–Schmidt orthonormalization requires 2nm² operations (additions, multiplications) to orthonormalize m eigenvector estimates of dimension n, so this step dominates the complexity of the entire method. This contribution aims at reducing this effort by interlocking the learning method—specifically RRLSA [3]—with the Gram–Schmidt method, exploiting the fact that the orthonormalization works on vectors that were obtained by applying a learning rule to already orthonormal vectors. RRLSA was chosen as the learning mechanism since—like other recursive least squares methods [1]—it is fast, does not suffer from the accuracy–speed trade-off of least-mean-square methods, and has proved robust in several applications.
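A minimal sketch of the modified Gram–Schmidt step and where the 2nm² operation count comes from (row-vector convention assumed):

```python
import numpy as np

def mgs(W):
    """Modified Gram-Schmidt on the rows of W (m estimates of dimension n).

    The dot product and the update in the inner loop each cost about
    n multiply-adds per (j, k) pair, giving the ~2*n*m**2 total
    mentioned in the text."""
    W = np.array(W, dtype=float)
    m = W.shape[0]
    for k in range(m):
        W[k] /= np.linalg.norm(W[k])
        for j in range(k + 1, m):
            W[j] -= (W[j] @ W[k]) * W[k]  # remove the component along W[k]
    return W
```

Since the learning rule only slightly perturbs vectors that were orthonormal before the step, most of these 2nm² operations act on components that are already nearly zero, which is the redundancy the interlocked method exploits.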
Section 2 describes the method where learning and orthonormalization are separated, Section 3 introduces the method where both steps are interlocked, followed by a discussion in Section 4.
Section snippets
RRLSA with Gram–Schmidt orthonormalization
One step of the RRLSA method [3] takes as input m orthonormal weight vectors of dimension n (the eigenvector estimates) and m scalar values L1,…,Lm, corresponding to the lengths of the weight vectors in the original notation. It produces m vectors which are usually no longer orthogonal. These are orthonormalized by the Gram–Schmidt process, yielding the new weight vectors w1′,…,wm′; the new length values L1′,…,Lm′ are obtained as the norms of the corresponding vectors before their normalization.
For an input vector x, the …
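The separated pipeline of this section can be sketched as follows. The single-unit update below is a plausible generic form (decay of the previous un-normalized vector plus a Hebbian term on the deflated input); it is not guaranteed to be the exact RRLSA rule of [3], and the parameter names alpha and beta follow the experiment section:

```python
import numpy as np

def separated_step(W, L, x, alpha=0.99, beta=1.0):
    """One learning step for all m units, followed by Gram-Schmidt.

    W : (m, n) orthonormal eigenvector estimates (rows)
    L : (m,)   length values
    Returns the new orthonormal estimates and new lengths."""
    m, _ = W.shape
    V = np.empty_like(W, dtype=float)
    xk = x.astype(float).copy()
    for k in range(m):
        y = W[k] @ xk
        V[k] = alpha * L[k] * W[k] + beta * y * xk  # un-normalized update
        xk = xk - y * W[k]                          # deflate for the next unit
    # Gram-Schmidt: the new length is the norm before normalization
    Wn = np.empty_like(V)
    Ln = np.empty(m)
    for k in range(m):
        v = V[k].copy()
        for j in range(k):
            v -= (v @ Wn[j]) * Wn[j]
        Ln[k] = np.linalg.norm(v)
        Wn[k] = v / Ln[k]
    return Wn, Ln
```

Both loops touch all m vectors of dimension n; the second (orthonormalization) loop is the 2nm² bottleneck that the interlocked method attacks.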
Interlocked method
Obviously we can express … as … The interlocked method recursively computes qkj and pk for increasing k and determines … using (4). The equations described below were derived by inserting … into (3) under consideration of the orthonormality assumption. Besides qkj and pk, the recursion uses the auxiliary variables tk, dk, sk, and rkj. In the order of update in the implementation, we obtain …
Discussion
We tested three methods—RRLSA without orthonormalization (RRLSA), RRLSA with modified Gram–Schmidt orthonormalization [2] (RRLSA+MGS), and the interlocked method (INTER)—on a principal component analysis of image windows [4]. In the first experiment we chose n=m=64; in the second, n=144, m=30. The other parameters were identical in both experiments: … training steps, β=1, and α starting at 0.99 and exponentially approaching 1 with a final value of 0.999. The C language data type “double” …
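One plausible reading of this α schedule, assuming the gap 1−α decays geometrically over the run (the functional form is an assumption; the text only states the start value, the limit, and the final value):

```python
import numpy as np

def alpha_schedule(T, a0=0.99, aT=0.999):
    """Gap to 1 decays geometrically from (1 - a0) to (1 - aT)
    over T steps, so alpha rises from a0 toward 1 and reaches aT
    at the last step. The form is an assumption for illustration."""
    t = np.arange(T)
    gamma = ((1.0 - aT) / (1.0 - a0)) ** (1.0 / (T - 1))
    return 1.0 - (1.0 - a0) * gamma ** t
```

Such a schedule gives a short effective memory early on (fast initial convergence) and a long memory later (low steady-state error), which matches the stated motivation for letting α approach 1.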
Conclusions
The method suggested in this work interlocks RRLSA [3] with a subsequent Gram–Schmidt orthonormalization. It leads to a speed-up of approximately 2 for large n, which was confirmed in simulations. The orthonormality error was observed to be somewhat larger than for a separate Gram–Schmidt orthonormalization, but was shown not to increase over time. The method should therefore be suitable for any application where all eigenvectors have to be estimated simultaneously and where the method has to stay …
Acknowledgements
I am grateful to Bärbel Herrnberger for improvements of the manuscript.
References (4)
- T.D. Sanger, Optimal unsupervised learning in a single-layer linear feedforward neural network, Neural Networks (1989)
- S. Bannour et al., Principal component extraction using recursive least squares learning, IEEE Trans. Neural Networks (1995)