Abstract
In this paper, we discuss some equivalences between two recently introduced statistical learning schemes, namely Mercer kernel methods and information theoretic methods. We show that Parzen window-based estimators for some information theoretic cost functions are also cost functions in a corresponding Mercer kernel space. The Mercer kernel is directly related to the Parzen window. Furthermore, we analyze a classification rule based on an information theoretic criterion, and show that this corresponds to a linear classifier in the kernel space. By introducing a weighted Parzen window density estimator, we also formulate the support vector machine in this information theoretic perspective.
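The central equivalence can be illustrated numerically. The sketch below (my own illustration, not the authors' code; variable names such as `V` and `mean_norm_sq` are hypothetical) uses a Gaussian Parzen window, which is also a Mercer kernel: the Parzen-window estimate of ∫p(x)²dx, whose negative logarithm estimates Renyi's quadratic entropy, coincides with the squared norm of the mean of the mapped data points in the corresponding kernel feature space, since (1/N²)Σᵢⱼ⟨φ(xᵢ),φ(xⱼ)⟩ = (1/N²)Σᵢⱼ k(xᵢ,xⱼ).

```python
import numpy as np

def gaussian_kernel(x, y, sigma):
    # Gaussian Mercer kernel; up to normalization, also a Parzen window.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
sigma = 1.0
N = len(X)

# Gram matrix K[i, j] = k(x_i, x_j).
K = np.array([[gaussian_kernel(X[i], X[j], sigma) for j in range(N)]
              for i in range(N)])

# "Information potential": the Parzen-window estimate of integral p(x)^2 dx
# (up to a normalization constant); -log of this estimates Renyi's
# quadratic entropy.
V = K.sum() / N ** 2

# Squared norm of the feature-space mean vector m = (1/N) sum_i phi(x_i):
# ||m||^2 = (1/N^2) sum_ij <phi(x_i), phi(x_j)> = (1/N^2) sum_ij k(x_i, x_j).
mean_norm_sq = np.mean(K)

# The two quantities are identical: the information-theoretic cost is a
# cost function in the Mercer kernel feature space.
assert np.isclose(V, mean_norm_sq)
```

The identity holds by the kernel trick alone, so any positive-definite Parzen window yields the same correspondence; the choice of Gaussian window here is an illustrative assumption.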
Jenssen, R., Eltoft, T., Erdogmus, D. et al. Some Equivalences between Kernel Methods and Information Theoretic Methods. J VLSI Sign Process Syst Sign Image Video Technol 45, 49–65 (2006). https://doi.org/10.1007/s11265-006-9771-8