
Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces

  • Chapter
Information Theoretic Learning

Part of the book series: Information Science and Statistics ((ISS))


Abstract

The common problem faced by many data-processing professionals is how best to extract the information contained in data. In our daily lives and in our professions we are bombarded by huge amounts of data, but most often the data themselves are not our primary interest. Data hide, either in temporal structure or in spatial redundancy, important clues to answer the information-processing questions we pose. We use the term information in the colloquial sense here, so it may mean different things to different people, which is fine for now. We all realize that computers and the Web have tremendously accelerated both the amount of data being generated and its accessibility. The pressure to distill information from data will therefore mount at an increasing pace, and old ways of dealing with this problem will be forced to evolve and adapt to the new reality. To many (including the author) this represents nothing less than a paradigm shift, from hypothesis-based to evidence-based science, and it will affect the core design strategies in many disciplines, including learning theory and adaptive systems.
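The chapter title couples information theory with reproducing kernel Hilbert spaces, and what makes that coupling practical is that information quantities can be estimated directly from samples. As a purely illustrative sketch (not taken from the chapter; the function name, bandwidth, and test data are assumptions), the snippet below estimates Rényi's quadratic entropy with a Gaussian Parzen window, the kind of sample-based information estimate this literature builds on:

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=0.5):
    """Sketch: estimate Renyi's quadratic entropy H2 = -log(int p(x)^2 dx)
    for 1-D samples, with p estimated by a Gaussian Parzen window.
    The integral of the squared Parzen estimate reduces to a double sum
    of Gaussians of variance 2*sigma^2 over all sample pairs."""
    x = np.asarray(samples, dtype=float).reshape(-1, 1)
    n = x.shape[0]
    s2 = 2.0 * sigma**2                 # variance after convolving two kernels
    pairwise_sq = (x - x.T) ** 2        # (n, n) squared pairwise differences
    kernel = np.exp(-pairwise_sq / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    information_potential = kernel.sum() / n**2
    return -np.log(information_potential)

# Tightly clustered samples yield lower entropy than spread-out ones.
rng = np.random.default_rng(0)
print(renyi_quadratic_entropy(rng.normal(0.0, 0.1, 200)))  # smaller value
print(renyi_quadratic_entropy(rng.normal(0.0, 2.0, 200)))  # larger value
```

The double sum costs O(N²) kernel evaluations, and the bandwidth sigma trades bias against variance exactly as in ordinary kernel density estimation.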




Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Principe, J.C. (2010). Information Theory, Machine Learning, and Reproducing Kernel Hilbert Spaces. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_1


  • DOI: https://doi.org/10.1007/978-1-4419-1570-2_1

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-1569-6

  • Online ISBN: 978-1-4419-1570-2

  • eBook Packages: Computer Science, Computer Science (R0)
