Abstract
Principal Component Analysis (PCA) is a well-known statistical tool. Kernel PCA is a nonlinear extension of PCA based on the kernel paradigm. In this paper we characterize the projections found by Kernel PCA from an information-theoretic perspective. We prove that Kernel PCA provides maximum entropy projections in the input space when the Gaussian kernel is used for the mapping and a sample estimate of Rényi's entropy based on the Parzen window method is employed. This information-theoretic interpretation motivates the choice of, and specifies, the kernel used for the transformation to feature space.
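The two quantities the abstract connects can be sketched numerically: kernel PCA with a Gaussian kernel, and the Parzen-window sample estimate of Rényi's quadratic entropy, H₂ = −log((1/N²) Σᵢⱼ G_{σ√2}(xᵢ − xⱼ)). The sketch below is not from the paper; the function names are our own, and it only illustrates the standard constructions under the stated assumptions (Gaussian kernel of width σ, quadratic Rényi entropy).

```python
import numpy as np

def gaussian_kernel_matrix(X, sigma):
    """Pairwise Gaussian kernel values exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def kernel_pca(X, sigma, n_components):
    """Project the training points onto the leading kernel principal components."""
    N = X.shape[0]
    K = gaussian_kernel_matrix(X, sigma)
    one = np.full((N, N), 1.0 / N)
    Kc = K - one @ K - K @ one + one @ K @ one   # centering in feature space
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Projection of x_i onto component k is sum_j alpha_j k(x_j, x_i),
    # with alpha scaled so that lambda_k * ||alpha||^2 = 1.
    return Kc @ (vecs / np.sqrt(np.maximum(vals, 1e-12)))

def renyi_quadratic_entropy(X, sigma):
    """Parzen-window estimate of Renyi's quadratic entropy with a Gaussian
    window of width sigma: H2 = -log((1/N^2) sum_ij G_{sigma*sqrt(2)}(x_i - x_j))."""
    N, d = X.shape
    s2 = 2.0 * sigma**2                       # variance after convolving two windows
    K = gaussian_kernel_matrix(X, np.sqrt(s2))
    norm = (2.0 * np.pi * s2) ** (-d / 2.0)   # multivariate Gaussian normalization
    return -np.log(norm * K.mean())
```

Note that the argument of the entropy estimate, the mean of a Gaussian kernel matrix, is built from exactly the same Gram matrix that kernel PCA eigendecomposes; this shared structure is what underlies the paper's result.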
This work was supported in part by Fundação para a Ciência e a Tecnologia (FCT) grant SFRH/BD/18217/2004 and NSF grant ECS-0300340.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Paiva, A.R.C., Xu, JW., Príncipe, J.C. (2006). Kernel Principal Components Are Maximum Entropy Projections. In: Rosca, J., Erdogmus, D., Príncipe, J.C., Haykin, S. (eds) Independent Component Analysis and Blind Signal Separation. ICA 2006. Lecture Notes in Computer Science, vol 3889. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11679363_105
Print ISBN: 978-3-540-32630-4
Online ISBN: 978-3-540-32631-1