Abstract
Second-order statistics have formed the basis of learning and adaptation due to their appeal and analytical simplicity. However, in many realistic engineering problems requiring adaptive solutions, it is not sufficient to consider only the second-order statistics of the underlying distributions. Entropy, being the average information content of a distribution, is a better-suited criterion for adaptation purposes, since it allows the designer to manipulate the information content of the signals rather than merely their power. This paper introduces a nonparametric estimator of Renyi's entropy, which can be utilized in any adaptation scenario where entropy plays a role. This nonparametric estimator leads to an interesting analogy between learning and interacting particles in a potential field; learning with second-order statistics turns out to be a special case of this interaction model. We investigate the mathematical properties of this nonparametric entropy estimator, provide batch and stochastic gradient expressions for off-line and on-line adaptation, and illustrate the performance of the corresponding algorithms on examples of supervised and unsupervised training, including time-series prediction and ICA.
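The pairwise-interaction estimator described in the abstract can be sketched for the quadratic case of Renyi's entropy, H2 = -log ∫ p(x)² dx: Parzen windowing with a Gaussian kernel turns the integral into a double sum of Gaussian "potentials" over all sample pairs (the information potential). The snippet below is a minimal illustrative sketch under that assumption; the function name and the 1-D restriction are ours, not the paper's.

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Estimate Renyi's quadratic entropy H2 = -log V from 1-D samples.

    V, the information potential, is the average Gaussian kernel
    evaluation over all sample pairs. Each pair of samples acts like
    a pair of interacting particles contributing a potential term.
    """
    x = np.asarray(samples, dtype=float)
    # Pairwise differences between all samples (N x N matrix).
    diff = x[:, None] - x[None, :]
    # Convolving two Parzen kernels of variance sigma^2 yields
    # a Gaussian of variance 2*sigma^2.
    var = 2.0 * sigma ** 2
    kernel = np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    v = kernel.mean()   # information potential V
    return -np.log(v)   # entropy estimate H2
```

Note the expected behavior: tightly clustered samples yield a large potential V and thus low entropy, while widely spread samples yield a small V and high entropy, which is what makes the quantity usable as an adaptation criterion.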
Cite this article
Erdogmus, D., Principe, J.C. & Hild, K.E. Beyond second-order statistics for learning: A pairwise interaction model for entropy estimation. Natural Computing 1, 85–108 (2002). https://doi.org/10.1023/A:1015064029375