
Some Equivalences between Kernel Methods and Information Theoretic Methods

Journal of VLSI Signal Processing Systems for Signal, Image and Video Technology 45, 49–65 (2006)

Abstract

In this paper, we discuss some equivalences between two recently introduced statistical learning schemes, namely Mercer kernel methods and information theoretic methods. We show that Parzen window-based estimators for some information theoretic cost functions are also cost functions in a corresponding Mercer kernel space. The Mercer kernel is directly related to the Parzen window. Furthermore, we analyze a classification rule based on an information theoretic criterion, and show that it corresponds to a linear classifier in the kernel space. By introducing a weighted Parzen window density estimator, we also formulate the support vector machine from this information theoretic perspective.
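
The central identity behind these equivalences can be illustrated numerically. A Parzen window estimate of the sample pdf is p̂(x) = (1/N) Σ_i W_σ(x − x_i), and the resulting estimator of the argument of Rényi's quadratic entropy, V = ∫ p̂(x)² dx (the "information potential" of Principe et al.), reduces to (1/N²) Σ_{i,j} W_{σ√2}(x_i − x_j) for a Gaussian window, since convolving two Gaussians of width σ yields a Gaussian of width σ√2. Reading W_{σ√2} as a Mercer kernel, this double sum is exactly ‖m‖², the squared norm of the mean m = (1/N) Σ_i Φ(x_i) of the data mapped into the kernel feature space. The sketch below checks this on synthetic data; it is illustrative only (the sample, window width, and variable names are ours, not the authors'):

    import numpy as np

    # Illustrative check (not the authors' code): the Parzen-window
    # information potential V = integral of p_hat(x)^2 dx equals the
    # mean of the Gram matrix of the induced Mercer kernel, i.e. ||m||^2.

    rng = np.random.default_rng(0)
    x = rng.normal(size=50)   # synthetic 1-D sample
    sigma = 0.5               # Parzen window width (illustrative choice)

    def gauss(u, s):
        # Gaussian window/kernel of scale s evaluated at u
        return np.exp(-u**2 / (2 * s**2)) / (np.sqrt(2 * np.pi) * s)

    # (1) Brute force: numerically integrate the squared Parzen estimate.
    grid = np.linspace(x.min() - 5.0, x.max() + 5.0, 20001)
    dx = grid[1] - grid[0]
    p_hat = gauss(grid[:, None] - x[None, :], sigma).mean(axis=1)
    V_integral = np.sum(p_hat**2) * dx

    # (2) Closed form: convolving two Gaussians of width sigma gives a
    # Gaussian of width sigma*sqrt(2), so V is the mean of the Gram
    # matrix of the Mercer kernel k(xi, xj) = W_{sigma*sqrt(2)}(xi - xj),
    # which is also ||m||^2 for the feature-space mean m.
    K = gauss(x[:, None] - x[None, :], sigma * np.sqrt(2))
    V_kernel = K.mean()

    print(V_integral, V_kernel)   # agree to within integration error

The two numbers agree to several decimal places, and the Rényi quadratic entropy estimate is then Ĥ₂ = −log V. This is what makes entropy-based cost functions computable directly from pairwise kernel evaluations, without ever forming the density estimate explicitly.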




Cite this article

Jenssen, R., Eltoft, T., Erdogmus, D. et al. Some Equivalences between Kernel Methods and Information Theoretic Methods. J VLSI Sign Process Syst Sign Image Video Technol 45, 49–65 (2006). https://doi.org/10.1007/s11265-006-9771-8
