Abstract
For Principal Component Analysis in Reproducing Kernel Hilbert Spaces (KPCA), optimization over sets containing only linear combinations of all n-tuples of kernel functions is investigated, where n is a positive integer smaller than the number of data points. Upper bounds are derived on the accuracy of approximating the optimal solution achievable without restrictions on the number of kernel functions. For an increasing number n of kernel functions, the upper bounds decrease as the sum of two terms, one proportional to n^{-1/2} and the other to n^{-1}, with constants depending on the maximum eigenvalue of the Gram matrix of the kernel with respect to the data. Both primal and dual formulations of KPCA are considered. The estimates provide insights into the effectiveness of sparse KPCA techniques, which aim to reduce the computational cost of expansions in terms of kernel units.
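The abstract's bounds quantify how closely an expansion in only n kernel functions can match the unconstrained KPCA optimum, with constants governed by the largest eigenvalue of the Gram matrix. As a minimal illustrative sketch (not the construction analyzed in the paper), the following Python compares the leading eigenvalues of the full centered Gram matrix with those obtained from a random n-point subsample; the Gaussian kernel, the synthetic data, and the random subsampling rule are all assumptions made here for illustration.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Squared Euclidean distances between rows of X and rows of Y,
    # turned into a Gaussian (RBF) Gram matrix.
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-sq / (2.0 * sigma**2))

def top_eigenvalues(K, k):
    # Dual KPCA step: eigen-decompose the Gram matrix after centering
    # the (implicit) feature vectors.
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    Kc = J @ K @ J                        # Gram matrix of centered features
    w = np.linalg.eigvalsh(Kc)            # eigenvalues in ascending order
    return w[::-1][:k]

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))         # synthetic data (assumption)
m = len(X)

# Full problem: all m kernel functions are available.
lam_full = top_eigenvalues(gaussian_kernel(X, X), 2) / m

# n-term ("sparse") surrogate: keep only n kernel functions, here centered
# at a random subsample of the data -- an illustrative choice, not the
# optimization over n-tuples studied in the paper.
n = 20
sub = rng.choice(m, size=n, replace=False)
lam_sparse = top_eigenvalues(gaussian_kernel(X[sub], X[sub]), 2) / n

print("leading eigenvalues, full Gram matrix / m:  ", lam_full)
print("leading eigenvalues, n-point Gram matrix / n:", lam_sparse)
```

Scaling the eigenvalues by the respective sample sizes makes the two spectra comparable; the gap between them is a crude proxy for the loss incurred by restricting the expansion to n kernel units, which is the kind of quantity the paper's upper bounds control.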
Cite this article
Gnecco, G., Sanguineti, M. Accuracy of suboptimal solutions to kernel principal component analysis. Comput Optim Appl 42, 265–287 (2009). https://doi.org/10.1007/s10589-007-9108-y