Abstract
It is evident from Chapter 1 that Shannon’s entropy occupies a central role in information-theoretic studies. Yet the concept of information is so rich that perhaps no single definition can quantify it properly. Moreover, from an engineering perspective, entropy must be estimated from data, which is a nontrivial matter. In this book we concentrate on Alfred Renyi’s seminal work on information theory to derive a set of estimators that apply entropy and divergence as cost functions in adaptation and learning. Therefore, we are mainly interested in computationally simple, nonparametric estimators that are continuous and differentiable in the samples, so that they yield well-behaved gradient algorithms for optimizing the parameters of adaptive systems. Many factors affect the determination of the optimum of the performance surface, such as gradient noise, learning rates, and misadjustment; hence, in these applications the entropy estimator’s bias and variance are not as critical as they are, for instance, in coding or rate-distortion theory. Moreover, in adaptation one is interested only in the extremum (maximum or minimum) of the cost, which creates independence from its actual values, because only relative assessments are necessary. In keeping with our nonparametric goals, what matters most in learning is to develop cost functions or divergence measures that can be derived directly from data, without further assumptions, and that capture as much structure as possible from the data’s probability density function (PDF).
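As a concrete illustration of the kind of estimator pursued in this chapter, the sketch below computes Renyi’s quadratic entropy directly from samples using a Gaussian Parzen window, so the resulting cost is a smooth, differentiable function of the data. This is a minimal sketch under stated assumptions, not the chapter’s final algorithm: the function name renyi_quadratic_entropy, the fixed bandwidth sigma, and the use of NumPy are illustrative choices, and the appropriate kernel bandwidth is data dependent.

```python
import numpy as np

def renyi_quadratic_entropy(samples, sigma=1.0):
    """Plug-in estimate of Renyi's quadratic entropy, H2 = -log E[p(X)],
    with p(.) replaced by a Gaussian Parzen window of bandwidth sigma
    (sigma is a hypothetical, user-chosen parameter)."""
    x = np.asarray(samples, dtype=float).reshape(len(samples), -1)
    n, d = x.shape
    # Pairwise squared distances between all samples.
    diff = x[:, None, :] - x[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Convolving two Gaussian Parzen kernels of width sigma gives a
    # Gaussian of variance 2*sigma^2 evaluated at the sample differences.
    s2 = 2.0 * sigma ** 2
    kernel = np.exp(-sq_dist / (2.0 * s2)) / (2.0 * np.pi * s2) ** (d / 2.0)
    # Mean pairwise interaction; strictly positive, so the log is defined.
    ip = kernel.mean()
    return -np.log(ip)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.normal(size=(500, 1))
    print(renyi_quadratic_entropy(data, sigma=0.5))
```

Because the estimate is built from pairwise Gaussian interactions, its gradient with respect to each sample is available in closed form, which is what makes such a cost usable in gradient-based adaptation.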
References
Aczél J., Daróczy Z., On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, New York, 1975.
Basu A., Lindsay B., Minimum disparity estimation in the continuous case: efficiency, distributions, robustness, Ann. Inst. Statist. Math., 46:683–705, 1994.
Bengtsson I., Zyczkowski K., Geometry of Quantum States, Cambridge University Press, Cambridge, UK, 2006.
Bhattacharyya A., On a measure of divergence between two statistical populations defined by their probability distributions, Bull. Calcutta Math. Soc., 35:99–109, 1943.
Bourbaki N., Topological Vector Spaces, Springer, Berlin, 1987.
Campbell L., A coding theorem and Renyi’s entropy, Inf. Control, 8:423–429, 1965.
Chernoff H., A measure of asymptotic efficiency of tests for a hypothesis based on a sum of observations, Ann. Math. Stat., 23:493–507, 1952.
Cover T., Thomas J., Elements of Information Theory, Wiley, New York, 1991.
Erdogmus D., Information theoretic learning: Renyi’s entropy and its applications to adaptive systems training, Ph.D. dissertation, University of Florida, Gainesville, 2002.
Erdogmus D., Hild K., Principe J., Beyond second order statistics for learning: a pairwise interaction model for entropy estimation, J. Natural Comput., 1(1):85–108, 2003.
Fine S., Scheinberg K., Cristianini N., Shawe-Taylor J., Williamson B., Efficient SVM training using low-rank kernel representations, J. Mach. Learn. Res., 2:243–264, 2001.
Golub G., Van Loan C., Matrix Computations, 3rd ed., The Johns Hopkins University Press, Baltimore, MD, 1996.
Gonzalez T., Clustering to minimize the maximum intercluster distance, Theor. Comput. Sci., 38:293–306, 1985.
Grassberger P., Procaccia I., Characterization of strange attractors, Phys. Rev. Lett., 50(5):346–349, 1983.
Greengard L., Rokhlin V., A fast algorithm for particle simulations, J. Comput. Phys., 73(2):325–348, 1987.
Greengard L., Strain J., The fast Gauss transform, SIAM J. Sci. Statist. Comput., 12(1):79–94, 1991.
Hart P., Moment distributions in economics: an exposition, J. Royal Statist. Soc. Ser. A, 138:423–434, 1975.
Havrda J., Charvát F., Quantification method of classification processes: concept of structural a-entropy, Kybernetika, 3:30, 1967.
Horn D., Gottlieb A., Algorithm for data clustering in pattern recognition problems based on quantum mechanics, Phys. Rev. Lett., 88(1):018702, 2002.
Jizba P., Arimitsu T., The world according to Renyi: thermodynamics of multifractal systems, Ann. Phys., 312:17–59, 2004.
Kapur J., Measures of Information and Their Applications, Wiley Eastern Ltd., New Delhi, 1994.
Kawai A., Fukushige T., $105/Gflops astrophysical N-body simulation with reconfigurable add-in card and hierarchical tree algorithm, in Proc. SC2006, IEEE Computer Society Press, Tampa, FL, 2006.
Kolmogorov A., Sur la notion de la moyenne, Atti della R. Accademia Nazionale dei Lincei, 12:388–391, 1930.
Kullback S., Information Theory and Statistics, Dover, Mineola, NY, 1959.
Lutwak E., Yang D., Zhang G., Cramér–Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Trans. Inf. Theory, 51(2):473–479, 2005.
Nagumo M., Über eine Klasse von Mittelwerten, Japanese J. Math., 7:71, 1930.
Pardo L., Statistical Inference Based on Divergence Measures, Chapman & Hall, Boca Raton, FL, 2006.
Parzen E., On the estimation of a probability density function and the mode, Ann. Math. Statist., 33:1065–1076, 1962.
Principe J., Xu D., Fisher J., Information theoretic learning, in Unsupervised Adaptive Filtering, S. Haykin (Ed.), pp. 265–319, Wiley, New York, 2000.
Rao S., Unsupervised Learning: An Information Theoretic Learning Approach, Ph.D. thesis, University of Florida, Gainesville, 2008.
Renyi A., On measures of entropy and information, Proc. 4th Berkeley Symp. Math. Statist. Prob. 1960, vol. I, University of California Press, Berkeley, pp. 547–561, 1961.
Renyi A., Probability Theory, North-Holland, Amsterdam, 1970.
Renyi A. (Ed.), Selected Papers of Alfred Renyi, vol. 2, Akademiai Kiado, Budapest, 1976.
Renyi A., Some fundamental questions about information theory, in Renyi A. (Ed.), Selected Papers of Alfred Renyi, vol. 2, Akademiai Kiado, Budapest, 1976.
Rudin W., Principles of Mathematical Analysis, McGraw-Hill, New York, 1976.
Seth S., Principe J., On speeding up computation in information theoretic learning, in Proc. IJCNN 2009, Atlanta, GA, 2009.
Silverman B., Density Estimation for Statistics and Data Analysis, Chapman and Hall, London, 1986.
Song K., Renyi information, loglikelihood and an intrinsic distribution measure, J. Stat. Plan. Inference, 93:51–69, 2001.
Torkkola K., Feature extraction by non-parametric mutual information maximization, J. Mach. Learn. Res., 3:1415–1438, 2003.
Tsallis C., Possible generalization of Boltzmann–Gibbs statistics, J. Stat. Phys., 52:479, 1988.
von Neumann J., Mathematical Foundations of Quantum Mechanics, Princeton University Press, Princeton, NJ, 1955.
Xu D., Energy, Entropy and Information Potential for Neural Computation, Ph.D. dissertation, University of Florida, Gainesville, 1999.
Yang C., Duraiswami R., Gumerov N., Davis L., Improved fast Gauss transform and efficient kernel density estimation, in Proc. ICCV 2003, pp. 464–471, 2003.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Xu, D., Erdogmus, D. (2010). Renyi’s Entropy, Divergence and Their Nonparametric Estimators. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_2
DOI: https://doi.org/10.1007/978-1-4419-1570-2_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-1569-6
Online ISBN: 978-1-4419-1570-2
eBook Packages: Computer Science (R0)