Abstract
Second-order statistics have formed the basis of learning and adaptation due to their appeal and analytical simplicity. However, in many realistic engineering problems requiring adaptive solutions, it is not sufficient to consider only the second-order statistics of the underlying distributions. Entropy, being the average information content of a distribution, is a better-suited criterion for adaptation purposes, since it allows the designer to manipulate the information content of the signals rather than merely their power. This paper introduces a nonparametric estimator of Renyi's entropy, which can be utilized in any adaptation scenario where entropy plays a role. This nonparametric estimator leads to an interesting analogy between learning and interacting particles in a potential field; learning with second-order statistics turns out to be a special case of this interaction model. We investigate the mathematical properties of this nonparametric entropy estimator, provide batch and stochastic gradient expressions for off-line and on-line adaptation, and illustrate the performance of the corresponding algorithms on examples of supervised and unsupervised training, including time-series prediction and ICA.
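The pairwise-interaction estimator described in the abstract can be sketched for the quadratic case of Renyi's entropy, H2 = -log ∫ p(x)² dx: Parzen windowing with a Gaussian kernel turns the integral into a double sum of Gaussian "potentials" over all sample pairs (the information potential). The snippet below is a minimal illustrative sketch under that assumption; the function name and the 1-D restriction are ours, not the paper's.

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Estimate Renyi's quadratic entropy H2 = -log V from 1-D samples.

    V, the information potential, is the average Gaussian kernel
    evaluation over all sample pairs. Each pair of samples acts like
    a pair of interacting particles contributing a potential term.
    """
    x = np.asarray(samples, dtype=float)
    # Pairwise differences between all samples (N x N matrix).
    diff = x[:, None] - x[None, :]
    # Convolving two Parzen kernels of variance sigma^2 yields
    # a Gaussian of variance 2*sigma^2.
    var = 2.0 * sigma ** 2
    kernel = np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    v = kernel.mean()   # information potential V
    return -np.log(v)   # entropy estimate H2
```

Note the expected behavior: tightly clustered samples yield a large potential V and thus low entropy, while widely spread samples yield a small V and high entropy, which is what makes the quantity usable as an adaptation criterion.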
Cite this article
Erdogmus, D., Principe, J.C. & Hild, K.E. Beyond second-order statistics for learning: A pairwise interaction model for entropy estimation. Natural Computing 1, 85–108 (2002). https://doi.org/10.1023/A:1015064029375