Efficiently updating and tracking the dominant kernel principal components
Introduction
Since the introduction of Support Vector Machines (SVMs) (Vapnik, 1995), many learning algorithms have been transferred to a kernel representation (Schölkopf and Smola, 2002, Suykens et al., 2002). The key benefit is that nonlinearities can be handled while avoiding the need to solve a nonlinear optimization problem. The transfer is accomplished implicitly by means of a nonlinear map into a Reproducing Kernel Hilbert Space (RKHS) (Wahba, 1990).
When N data points x_1, ..., x_N are given, a square symmetric kernel Gram matrix Ω of size N × N with entries Ω_ij = k(x_i, x_j) can be computed, where the kernel k(·,·) provides a similarity measure between pairs of data points. A whole myriad of methods (e.g. Kernel Principal Component Analysis (KPCA) (Schölkopf, Smola, & Müller, 1998), Fixed Size Least Squares Support Vector Machines (FS-LSSVM) (Suykens et al., 2002), denoising (Jade et al., 2003, Mika et al., 1999, Rosipal et al., 2001, Schölkopf et al., 1998), data modelling (Twining & Taylor, 2003), etc.) rely, either directly or indirectly, on the k dominant eigenvectors of Ω, where k ≪ N. The usual tool for computing this eigenspace is the Singular Value Decomposition (SVD) because of its high numerical accuracy (Golub & Van Loan, 1989). The drawback is that standard routines for the batch computation of an SVD require O(N³) operations (or O(kN²), for instance in deflation schemes (Golub & Van Loan, 1989)).
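For reference, the batch route just described can be sketched in a few lines of Python/NumPy (an RBF kernel and a dense symmetric eigensolver are assumed here purely for illustration; the function names are not from the paper):

```python
# Batch computation of the k dominant eigenvectors of a kernel Gram matrix.
# A minimal sketch of the costly route that updating schemes try to avoid.
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Square symmetric Gram matrix with Omega_ij = k(x_i, x_j)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def dominant_eigenspace(Omega, k):
    """k leading eigenvectors/eigenvalues of the symmetric kernel matrix.
    Costs O(N^3) for a dense N x N matrix, which motivates updating schemes."""
    w, V = np.linalg.eigh(Omega)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest
    return V[:, idx], w[idx]

X = np.random.randn(500, 5)               # N = 500 points in 5 dimensions
Omega = rbf_kernel_matrix(X, sigma=2.0)
U, lam = dominant_eigenspace(Omega, k=10)
```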
It is clear that for large datasets (as for example in image processing, computer vision or object recognition), the kernel method, with its powerful advantage of dealing with nonlinearities, becomes computationally severely limited. For large datasets an eigendecomposition of the N × N kernel matrix can simply become too time-consuming. But memory is also an issue, as O(N²) memory units are required. In addition, computations may have to be repeated several times if the problem is nonstationary. Some applications require adaptive algorithms. A truly online algorithm for extracting kernel principal components has not been reported so far, although the problem is important and attempts to tackle it are presently being undertaken (Kim et al., 2003, Kuh, 2001, Rosipal and Girolami, 2001).
The literature of the last three decades shows considerable effort spent in dealing with these issues, with a common denominator called SVD updating (Badeau et al., 2004, Bunch and Nielsen, 1978, Businger, 1970, Chandrasekaran et al., 1997, Levy and Lindenbaum, 2000, Moonen et al., 1992): the process of reusing the previous eigenspace when a new data instance arrives. Most of these are methods in which the eigenspace is incrementally updated when more rows (or more columns) are added to the involved matrix.
Yet, in the particular context of kernel algorithms, where the matrix dimensions correspond to the number of training data points, those updating schemes cannot be applied in a straightforward manner: the kernel matrix expands in both its row and column dimension when a point is added. To our present knowledge, this case has not been addressed yet. In this paper we present a fast online updating and downdating procedure for the dominant eigenspace in the case of a growing symmetric square matrix, with a computational complexity and memory requirement per iteration that scale only linearly with the matrix dimension (for a fixed number of tracked components), and with very limited loss of accuracy. For the situation of subspace tracking, where one keeps the dimension of the subspace fixed, an efficient downsizing mechanism is also proposed.
In Section 2 we propose an updating and downdating scheme for an existing eigenspace. In Section 3 we develop an efficient way to downsize the row dimension of an updated eigenspace. In Section 4 we present experiments on some typical benchmark data sets to demonstrate that accuracy and stability of the eigenvalues and eigenvectors are preserved during and after iteration. Section 5 summarizes the key results.
Section snippets
Updating and downdating
In this section we describe one step of the iterative updating procedure in detail and show that its complexity compares favourably with that of a batch SVD. Due to the particular structure of our updating scheme, downdating is just a special case of updating.
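As a rough illustration of what such a one-step update can look like, the Python/NumPy sketch below uses a generic "expand the basis, solve a small eigenproblem, truncate" construction; it is not necessarily the exact scheme derived in the paper. It assumes the tracked kernel matrix is represented by a rank-k approximation U diag(s) Uᵀ, that a is the vector of kernel evaluations of the new point against the old points, and that b = k(x_new, x_new); all names are illustrative.

```python
# Generic sketch of one update step for a growing symmetric kernel matrix,
# written against the low-rank approximation Omega_n ~ U diag(s) U^T.
# Illustrative only; not necessarily the paper's exact procedure.
import numpy as np

def update_eigenspace(U, s, a, b):
    """U: (n, k) orthonormal basis, s: (k,) eigenvalues,
    a: (n,) kernel evaluations of the new point against the old points,
    b: scalar k(x_new, x_new). Returns an (n+1, k) basis and k eigenvalues."""
    n, k = U.shape
    alpha = U.T @ a                        # component of the new column in the old basis
    a_perp = a - U @ alpha                 # residual orthogonal to the old basis
    rho = np.linalg.norm(a_perp)
    q = a_perp / rho if rho > 1e-12 else np.zeros(n)

    # Orthonormal basis Q of the relevant (k+2)-dimensional subspace of R^{n+1}
    Q = np.zeros((n + 1, k + 2))
    Q[:n, :k] = U
    Q[:n, k] = q
    Q[n, k + 1] = 1.0

    # Small projected matrix M = Q^T Omega_{n+1} Q under the low-rank model
    M = np.zeros((k + 2, k + 2))
    M[:k, :k] = np.diag(s)
    M[:k, k + 1] = alpha
    M[k + 1, :k] = alpha
    M[k, k + 1] = M[k + 1, k] = rho
    M[k + 1, k + 1] = b

    w, V = np.linalg.eigh(M)               # cheap: size independent of n
    idx = np.argsort(w)[::-1][:k]          # keep the k dominant directions
    return Q @ V[:, idx], w[idx]           # roughly O(n k^2) work per update
```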
Downsizing
As mentioned in the previous section, each update increases the row dimension of the eigenspace matrix. This is not practical for online applications that call for a subspace tracker, in which the dimension of the eigenspace must remain constant. This means we must downsize the matrix after each update while preserving orthogonality. In this section we propose an efficient scheme to accomplish this and recover the orthogonal eigenspace, up to a unitary transformation.
In downsizing, the …
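As a minimal sketch of one way such a downsizing step could be realized (again a generic illustration under the rank-k representation U diag(s) Uᵀ used above, not the paper's exact procedure): drop the oldest row of the eigenspace matrix and restore orthonormality with a thin QR factorization followed by a small eigenproblem.

```python
# Generic downsizing sketch: remove the oldest row of the eigenspace matrix
# and re-orthogonalize, updating the eigenvalues accordingly. Illustrative only.
import numpy as np

def downsize_eigenspace(U, s):
    """U: (n+1, k) orthonormal basis, s: (k,) eigenvalues of the tracked matrix.
    Removes the first row (oldest point) and restores an orthonormal basis."""
    Ud = U[1:, :]                            # (n, k), no longer orthonormal
    Q, R = np.linalg.qr(Ud)                  # thin QR, O(n k^2)
    M = R @ np.diag(s) @ R.T                 # small k x k symmetric matrix
    w, V = np.linalg.eigh(M)
    idx = np.argsort(w)[::-1]
    return Q @ V[:, idx], w[idx]             # orthonormal again, eigenvalues updated
```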
Experiments
We implemented the proposed up/downdating and downsizing schemes in Matlab. A straightforward comparison of our (nonoptimized) code with the built-in, non-recursive Matlab routine svds shows that a computational gain is indeed achieved, especially for large dimensions (1000 and more) and few eigenvectors (of the order 10–100). Yet, in the experiments we primarily aim to characterize how well the true eigenvectors are approximated whilst tracking (i.e. up/downdating and downsizing) for the kernel matrix …
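One natural way to quantify this kind of approximation quality is via the principal angles between the tracked subspace and the subspace obtained from a batch SVD. The helper below is a hypothetical evaluation snippet in Python/NumPy, not code from the paper.

```python
# Principal angles between two subspaces, given orthonormal bases of each.
# Small angles indicate that the tracked eigenspace matches the batch solution.
import numpy as np

def subspace_angles_deg(U_tracked, U_batch):
    """Principal angles (in degrees) between the column spaces of two orthonormal bases."""
    sv = np.linalg.svd(U_tracked.T @ U_batch, compute_uv=False)  # cosines of the angles
    sv = np.clip(sv, -1.0, 1.0)
    return np.degrees(np.arccos(sv))
```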
Conclusions
This paper introduced a novel up- and downdating algorithm for the dominant eigenspace of a large-scale square symmetric matrix, in which the adaptation occurs simultaneously in both the rows and the columns. Additionally, a downsizing mechanism was proposed so that the algorithm is also capable of tracking the dominant eigenspace while the dimension remains constant.
The dominant eigenspace of such a matrix is relevant for several kernel-based methods in machine learning (Suykens et al., 2002) …
Acknowledgements
This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven. It is supported by several grants: Research Council KU Leuven: Concerted Research Action GOA-Mefisto-666 (Mathem. Eng.), GOA-Ambiorics, IDO (IOTA Oncology, Genetic networks), Ph.D./postdoc & fellow grants; Flemish Government: Fund for Scientific Research Flanders (Ph.D./postdoc grants, projects G.0407.02 (support vector machines), G.0256.97 (subspace), G.0499.04 (robust statistics), G.0211.05 (nonl.
References (24)
- Chandrasekaran et al. (1997). An eigenspace update algorithm for image analysis. Graphical Models and Image Processing: GMIP.
- Jade et al. (2003). Feature extraction and denoising using kernel PCA. Chemical Engineering Science.
- Twining & Taylor (2003). The use of kernel principal component analysis to model data distributions. Pattern Recognition.
- Badeau et al. (2004). Sliding window adaptive SVD algorithms. IEEE Transactions on Signal Processing.
- Basseville & Nikiforov (1993). Detection of abrupt changes — theory and application.
- Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. URL...
- Bunch & Nielsen (1978). Updating the singular value decomposition. Numerische Mathematik.
- Businger, P. (1970). Updating a singular value decomposition. BIT 10. pp....
- Golub & Van Loan (1989). Matrix computations.
- Comparing different neural network architectures for classifying handwritten digits (1989).
Cited by (43)
A recursive least square algorithm for online kernel principal component extraction
2017, Neurocomputing
Citation Excerpt: Originally stated as a Gram-matrix eigendecomposition problem [2], thus solvable by classical linear algebra methods [4], this technique faces problems with large scale datasets, for which the computational burden involved in Gram-matrix construction and factorization may turn the extraction process infeasible. To address these problems, several authors have proposed incremental [5–7] and more recently online kernel component extraction algorithms [8–10]. Some examples are the online kernel Hebbian algorithm (OKHA) [8] and the subset kernel Hebbian algorithm (SubKHA) [9], which are extensions of the kernel Hebbian algorithm (KHA) [6].

Heterogeneous data analysis: Online learning for medical-image-based diagnosis
2017, Pattern Recognition
Citation Excerpt: For example, in Ref. [47] the use of composite kernels in extracting interesting visual features from images was successfully demonstrated. The criterion for selecting the best kernel function involves finding the kernel that produces the largest eigenvalue [31–35]. Then, the eigenvector corresponding to the maximum eigenvalue provides the optimum solution.

Incremental kernel spectral clustering for online learning of non-stationary data
2014, Neurocomputing
Citation Excerpt: Finally, Section 6 concludes the paper. In contrast with other techniques that compute approximate eigenvectors of large matrices like the Nyström method [32], the work presented in [14] or the above-mentioned algorithms [11] and [26], the eigen-approximation we use to evolve the initial model is model-based [4]. This means that based on a training set (in our case the cluster centroids) out-of-sample eigenvectors are calculated using Eq. (7).

Online prediction model based on the SVD-KPCA method
2013, ISA Transactions

ROIPCA: an online memory-restricted PCA algorithm based on rank-one updates
2023, Information and Inference

Non-linear process monitoring using kernel principal component analysis: A review of the basic and modified techniques with industrial applications
2022, Brazilian Journal of Chemical Engineering