
Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces

  • Chapter
Information Theoretic Learning

Part of the book series: Information Science and Statistics ((ISS))


Abstract

The common problem faced by many data-processing professionals is how best to extract the information contained in data. In our daily lives and in our professions we are bombarded by huge amounts of data, but most often the data themselves are not our primary interest. Data hide, either in temporal structure or in spatial redundancy, important clues to answer the information-processing questions we pose. We use the term information in the colloquial sense here, so it may mean different things to different people, which is fine for now. We all realize that computers and the Web have tremendously accelerated both the amount of data being generated and its accessibility. The pressure to distill information from data will therefore mount at an increasing pace, and old ways of dealing with this problem will be forced to evolve and adapt to the new reality. To many (including the author) this represents nothing less than a paradigm shift, from hypothesis-based to evidence-based science, and it will affect the core design strategies in many disciplines, including learning theory and adaptive systems.
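The chapter title couples information theory with reproducing kernel Hilbert spaces, and what makes that coupling practical is that information quantities can be estimated directly from samples. As a purely illustrative sketch (not taken from the chapter; the function name, bandwidth, and test data are assumptions), the snippet below estimates Rényi's quadratic entropy with a Gaussian Parzen window, the kind of sample-based information estimate this literature builds on:

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=0.5):
    """Sketch: estimate Renyi's quadratic entropy H2 = -log(int p(x)^2 dx)
    for 1-D samples, with p estimated by a Gaussian Parzen window.
    The integral of the squared Parzen estimate reduces to a double sum
    of Gaussians of variance 2*sigma^2 over all sample pairs."""
    x = np.asarray(samples, dtype=float).reshape(-1, 1)
    n = x.shape[0]
    s2 = 2.0 * sigma**2                 # variance after convolving two kernels
    pairwise_sq = (x - x.T) ** 2        # (n, n) squared pairwise differences
    kernel = np.exp(-pairwise_sq / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    information_potential = kernel.sum() / n**2
    return -np.log(information_potential)

# Tightly clustered samples yield lower entropy than spread-out ones.
rng = np.random.default_rng(0)
print(renyi_quadratic_entropy(rng.normal(0.0, 0.1, 200)))  # smaller value
print(renyi_quadratic_entropy(rng.normal(0.0, 2.0, 200)))  # larger value
```

The double sum costs O(N²) kernel evaluations, and the bandwidth sigma trades bias against variance exactly as in ordinary kernel density estimation.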




Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Principe, J.C. (2010). Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_1


  • DOI: https://doi.org/10.1007/978-1-4419-1570-2_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-1569-6

  • Online ISBN: 978-1-4419-1570-2

  • eBook Packages: Computer Science, Computer Science (R0)
