
Beyond second-order statistics for learning: A pairwise interaction model for entropy estimation

Published in: Natural Computing

Abstract

Second-order statistics have formed the basis of learning and adaptation due to their appeal and analytical simplicity. In many realistic engineering problems requiring adaptive solutions, however, it is not sufficient to consider only the second-order statistics of the underlying distributions. Entropy, being the average information content of a distribution, is a better-suited criterion for adaptation purposes, since it allows the designer to manipulate the information content of the signals rather than merely their power. This paper introduces a nonparametric estimator of Renyi's entropy, which can be utilized in any adaptation scenario where entropy plays a role. This nonparametric estimator leads to an interesting analogy between learning and interacting particles in a potential field; it turns out that learning by second-order statistics is a special case of this interaction model. We investigate the mathematical properties of this nonparametric entropy estimator, provide batch and stochastic gradient expressions for off-line and on-line adaptation, and illustrate the performance of the corresponding algorithms in examples of supervised and unsupervised training, including time-series prediction and independent component analysis (ICA).
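As a rough illustration of the kind of estimator the abstract describes, the sketch below computes a Parzen-window estimate of Renyi's quadratic entropy, in which every pair of samples contributes a Gaussian "interaction" term (the pairwise-interaction, or information-potential, form). The kernel width `sigma` and the demo data are illustrative choices, not values taken from the paper.

```python
import numpy as np

def renyi_quadratic_entropy(x, sigma=0.5):
    """Pairwise-interaction estimate of Renyi's quadratic entropy.

    H2(X) ~ -log( (1/N^2) * sum_i sum_j G(x_i - x_j; 2*sigma^2) ),
    where G is a Gaussian kernel (Parzen window with width sigma; the
    pairwise convolution of two such kernels has variance 2*sigma^2).
    The double sum over sample pairs is the "information potential",
    analogous to the potential energy of N interacting particles.
    """
    x = np.asarray(x, dtype=float).reshape(-1)
    n = x.size
    diffs = x[:, None] - x[None, :]            # all N*N pairwise differences
    var = 2.0 * sigma ** 2                     # variance of the convolved kernel
    kernel = np.exp(-diffs ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    information_potential = kernel.sum() / n ** 2
    return -np.log(information_potential)

# A wider distribution should yield a higher entropy estimate.
rng = np.random.default_rng(0)
h_narrow = renyi_quadratic_entropy(rng.normal(0.0, 0.5, size=500))
h_wide = renyi_quadratic_entropy(rng.normal(0.0, 2.0, size=500))
print(h_narrow < h_wide)
```

Because the estimate is a smooth function of the samples, its gradient with respect to adaptive-system parameters can be obtained by the chain rule, which is what enables the batch and stochastic gradient adaptation the abstract mentions.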




Cite this article

Erdogmus, D., Principe, J.C. & Hild, K.E. Beyond second-order statistics for learning: A pairwise interaction model for entropy estimation. Natural Computing 1, 85–108 (2002). https://doi.org/10.1023/A:1015064029375
