Efficiently updating and tracking the dominant kernel principal components
Introduction
Since the introduction of Support Vector Machines (SVMs) (Vapnik, 1995), many learning algorithms have been transferred to a kernel representation (Schölkopf and Smola, 2002, Suykens et al., 2002). The key benefit is that nonlinearities can be handled while avoiding the need to solve a nonlinear optimization problem. The transfer is accomplished implicitly by means of a nonlinear map into a Reproducing Kernel Hilbert Space (RKHS) (Wahba, 1990).
When N data points x_1, ..., x_N are given, a square symmetric kernel Gram matrix Ω of size N × N with entries Ω_ij = k(x_i, x_j) can be computed, where the kernel k(·,·) provides a similarity measure between pairs of data points. A whole myriad of methods (e.g. Kernel Principal Component Analysis (KPCA) (Schölkopf, Smola, & Müller, 1998), Fixed Size Least Squares Support Vector Machines (FS-LSSVM) (Suykens et al., 2002), denoising (Jade et al., 2003, Mika et al., 1999, Rosipal et al., 2001, Schölkopf et al., 1998), data modelling (Twining & Taylor, 2003), etc.) rely, either directly or indirectly, on the k dominant eigenvectors of Ω, where k ≪ N. The usual tool for computing this eigenspace is the Singular Value Decomposition (SVD) because of its high numerical accuracy (Golub & Van Loan, 1989). The drawback is that standard routines for the batch computation of an SVD require O(N³) operations (or O(kN²), for instance in deflation schemes (Golub & Van Loan, 1989)).
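For reference, the batch route just described can be sketched in a few lines of Python/NumPy (an RBF kernel and a dense symmetric eigensolver are assumed here purely for illustration; the function names are not from the paper):

```python
# Batch computation of the k dominant eigenvectors of a kernel Gram matrix.
# A minimal sketch of the costly route that updating schemes try to avoid.
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    """Square symmetric Gram matrix with Omega_ij = k(x_i, x_j)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def dominant_eigenspace(Omega, k):
    """k leading eigenvectors/eigenvalues of the symmetric kernel matrix.
    Costs O(N^3) for a dense N x N matrix, which motivates updating schemes."""
    w, V = np.linalg.eigh(Omega)          # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # indices of the k largest
    return V[:, idx], w[idx]

X = np.random.randn(500, 5)               # N = 500 points in 5 dimensions
Omega = rbf_kernel_matrix(X, sigma=2.0)
U, lam = dominant_eigenspace(Omega, k=10)
```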
It is clear that for large datasets (as for example in image processing, computer vision or object recognition), the kernel method, with its powerful advantage of dealing with nonlinearities, becomes computationally severely limited. For large datasets an eigendecomposition of the N × N kernel matrix can simply become too time-consuming. But memory is also an issue, as O(N²) memory units are required. In addition, computations may have to be repeated several times if the problem is nonstationary. Some applications require adaptive algorithms. A truly online algorithm for extracting kernel principal components has not been reported so far, although the problem is important and attempts to tackle it are presently being undertaken (Kim et al., 2003, Kuh, 2001, Rosipal and Girolami, 2001).
The literature of the last three decades shows considerable effort spent in dealing with these issues, with a common denominator called SVD updating (Badeau et al., 2004, Bunch and Nielsen, 1978, Businger, 1970, Chandrasekaran et al., 1997, Levy and Lindenbaum, 2000, Moonen et al., 1992): the process of reusing the previous eigenspace when a new data instance arrives. Most of these are methods in which the eigenspace is incrementally updated when more rows (or more columns) are added to the involved matrix.
Yet, in the particular context of kernel algorithms, where the matrix dimensions correspond to the number of training data points, those updating schemes cannot be applied in a straightforward manner: the kernel matrix expands in both its row and column dimension when a point is added. To our present knowledge, this case has not been addressed yet. In this paper we present a fast online updating and downdating procedure for the dominant eigenspace in the case of a growing symmetric square matrix, with a computational complexity and memory requirement per iteration that scale only linearly with the matrix dimension (for a fixed number of tracked components), and with very limited loss of accuracy. For the situation of subspace tracking, where one keeps the dimension of the subspace fixed, an efficient downsizing mechanism is also proposed.
In Section 2 we propose an updating and downdating scheme for an existing eigenspace. In Section 3 we develop an efficient way to downsize the row dimension of an updated eigenspace. In Section 4 we present experiments on some typical benchmark data sets to demonstrate that accuracy and stability of the eigenvalues and eigenvectors are preserved during and after iteration. Section 5 summarizes the key results.
Section snippets
Updating and downdating
In this section we describe one step of the iterative updating procedure in detail and show that its complexity compares favourably with that of a batch SVD. Due to the particular structure of our updating scheme, downdating is just a special case of updating.
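As a rough illustration of what such a one-step update can look like, the Python/NumPy sketch below uses a generic "expand the basis, solve a small eigenproblem, truncate" construction; it is not necessarily the exact scheme derived in the paper. It assumes the tracked kernel matrix is represented by a rank-k approximation U diag(s) Uᵀ, that a is the vector of kernel evaluations of the new point against the old points, and that b = k(x_new, x_new); all names are illustrative.

```python
# Generic sketch of one update step for a growing symmetric kernel matrix,
# written against the low-rank approximation Omega_n ~ U diag(s) U^T.
# Illustrative only; not necessarily the paper's exact procedure.
import numpy as np

def update_eigenspace(U, s, a, b):
    """U: (n, k) orthonormal basis, s: (k,) eigenvalues,
    a: (n,) kernel evaluations of the new point against the old points,
    b: scalar k(x_new, x_new). Returns an (n+1, k) basis and k eigenvalues."""
    n, k = U.shape
    alpha = U.T @ a                        # component of the new column in the old basis
    a_perp = a - U @ alpha                 # residual orthogonal to the old basis
    rho = np.linalg.norm(a_perp)
    q = a_perp / rho if rho > 1e-12 else np.zeros(n)

    # Orthonormal basis Q of the relevant (k+2)-dimensional subspace of R^{n+1}
    Q = np.zeros((n + 1, k + 2))
    Q[:n, :k] = U
    Q[:n, k] = q
    Q[n, k + 1] = 1.0

    # Small projected matrix M = Q^T Omega_{n+1} Q under the low-rank model
    M = np.zeros((k + 2, k + 2))
    M[:k, :k] = np.diag(s)
    M[:k, k + 1] = alpha
    M[k + 1, :k] = alpha
    M[k, k + 1] = M[k + 1, k] = rho
    M[k + 1, k + 1] = b

    w, V = np.linalg.eigh(M)               # cheap: size independent of n
    idx = np.argsort(w)[::-1][:k]          # keep the k dominant directions
    return Q @ V[:, idx], w[idx]           # roughly O(n k^2) work per update
```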
Downsizing
As mentioned in the previous section, each update increases the row dimension of the eigenspace matrix. This is not practical for online applications that call for a subspace tracker, in which the dimension of the eigenspace must remain constant. This means we must downsize the matrix after each update while preserving orthogonality. In this section we propose an efficient scheme to accomplish this and recover the orthogonal eigenspace, up to a unitary transformation.
In downsizing, the …
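As a minimal sketch of one way such a downsizing step could be realized (again a generic illustration under the rank-k representation U diag(s) Uᵀ used above, not the paper's exact procedure): drop the oldest row of the eigenspace matrix and restore orthonormality with a thin QR factorization followed by a small eigenproblem.

```python
# Generic downsizing sketch: remove the oldest row of the eigenspace matrix
# and re-orthogonalize, updating the eigenvalues accordingly. Illustrative only.
import numpy as np

def downsize_eigenspace(U, s):
    """U: (n+1, k) orthonormal basis, s: (k,) eigenvalues of the tracked matrix.
    Removes the first row (oldest point) and restores an orthonormal basis."""
    Ud = U[1:, :]                            # (n, k), no longer orthonormal
    Q, R = np.linalg.qr(Ud)                  # thin QR, O(n k^2)
    M = R @ np.diag(s) @ R.T                 # small k x k symmetric matrix
    w, V = np.linalg.eigh(M)
    idx = np.argsort(w)[::-1]
    return Q @ V[:, idx], w[idx]             # orthonormal again, eigenvalues updated
```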
Experiments
We implemented the proposed up/downdating and downsizing schemes in Matlab. A straightforward comparison of our (nonoptimized) code with the built-in, non-recursive Matlab routine svds shows that a computational gain is indeed achieved, especially for large dimensions (1000 and more) and few eigenvectors (of the order 10–100). Yet, in the experiments we primarily aim to characterize how well the true eigenvectors are approximated whilst tracking (i.e. up/downdating and downsizing) for the kernel matrix …
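One natural way to quantify this kind of approximation quality is via the principal angles between the tracked subspace and the subspace obtained from a batch SVD. The helper below is a hypothetical evaluation snippet in Python/NumPy, not code from the paper.

```python
# Principal angles between two subspaces, given orthonormal bases of each.
# Small angles indicate that the tracked eigenspace matches the batch solution.
import numpy as np

def subspace_angles_deg(U_tracked, U_batch):
    """Principal angles (in degrees) between the column spaces of two orthonormal bases."""
    sv = np.linalg.svd(U_tracked.T @ U_batch, compute_uv=False)  # cosines of the angles
    sv = np.clip(sv, -1.0, 1.0)
    return np.degrees(np.arccos(sv))
```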
Conclusions
This paper introduced a novel up- and downdating algorithm for the dominant eigenspace of a large-scale square symmetric matrix, in which the adaptation occurs simultaneously in both the rows and the columns. Additionally, a downsizing mechanism was proposed so that the algorithm is also capable of tracking the dominant eigenspace while the dimension remains constant.
The dominant eigenspace of such a matrix is relevant for several kernel-based methods in machine learning (Suykens et al., 2002) …
Acknowledgements
This research work was carried out at the ESAT laboratory of the Katholieke Universiteit Leuven. It is supported by several grants: Research Council KU Leuven: Concerted Research Action GOA-Mefisto-666 (Mathem. Eng.), GOA-Ambiorics, IDO (IOTA Oncology, Genetic networks), Ph.D./postdoc & fellow grants; Flemish Government: Fund for Scientific Research Flanders (Ph.D./postdoc grants, projects G.0407.02 (support vector machines), G.0256.97 (subspace), G.0499.04 (robust statistics), G.0211.05 (nonl.
References (24)
- Chandrasekaran et al. (1997). An eigenspace update algorithm for image analysis. Graphical Models and Image Processing: GMIP.
- Jade et al. (2003). Feature extraction and denoising using kernel PCA. Chemical Engineering Science.
- Twining & Taylor (2003). The use of kernel principal component analysis to model data distributions. Pattern Recognition.
- Badeau et al. (2004). Sliding window adaptive SVD algorithms. IEEE Transactions on Signal Processing.
- Basseville & Nikiforov (1993). Detection of abrupt changes — theory and application.
- Blake, C., & Merz, C. (1998). UCI repository of machine learning databases. URL...
- Bunch & Nielsen (1978). Updating the singular value decomposition. Numerische Mathematik.
- Businger, P. (1970). Updating a singular value decomposition. BIT 10. pp....
- Golub & Van Loan (1989). Matrix computations.
- Comparing different neural network architectures for classifying handwritten digits (1989).
Cited by (43)
A recursive least square algorithm for online kernel principal component extraction
2017, Neurocomputing
Citation Excerpt: Originally stated as a Gram-matrix eigendecomposition problem [2], thus solvable by classical linear algebra methods [4], this technique faces problems with large scale datasets, for which the computational burden involved in Gram-matrix construction and factorization may turn the extraction process infeasible. To address these problems, several authors have proposed incremental [5–7] and more recently online kernel component extraction algorithms [8–10]. Some examples are the online kernel Hebbian algorithm (OKHA) [8] and the subset kernel Hebbian algorithm (SubKHA) [9], which are extensions of the kernel Hebbian algorithm (KHA) [6].

Heterogeneous data analysis: Online learning for medical-image-based diagnosis
2017, Pattern Recognition
Citation Excerpt: For example, in Ref. [47] the use of composite kernels in extracting interesting visual features from images was successfully demonstrated. The criterion for selecting the best kernel function involves finding the kernel that produces the largest eigenvalue [31–35]. Then, the eigenvector corresponding to the maximum eigenvalue provides the optimum solution.

Incremental kernel spectral clustering for online learning of non-stationary data
2014, Neurocomputing
Citation Excerpt: Finally, Section 6 concludes the paper. In contrast with other techniques that compute approximate eigenvectors of large matrices like the Nyström method [32], the work presented in [14] or the above-mentioned algorithms [11] and [26], the eigen-approximation we use to evolve the initial model is model-based [4]. This means that based on a training set (in our case the cluster centroids) out-of-sample eigenvectors are calculated using Eq. (7).

Online prediction model based on the SVD-KPCA method
2013, ISA Transactions

ROIPCA: an online memory-restricted PCA algorithm based on rank-one updates
2023, Information and Inference

Non-linear process monitoring using kernel principal component analysis: A review of the basic and modified techniques with industrial applications
2022, Brazilian Journal of Chemical Engineering