Renyi’s Entropy, Divergence and Their Nonparametric Estimators

Information Theoretic Learning

Part of the book series: Information Science and Statistics (ISS)

Abstract

It is evident from Chapter 1 that Shannon’s entropy occupies a central role in information-theoretic studies. Yet the concept of information is so rich that perhaps no single definition can quantify it properly. Moreover, from an engineering perspective, entropy must be estimated from data, which is a nontrivial matter. In this book we concentrate on Alfred Renyi’s seminal work on information theory to derive a set of estimators that allow entropy and divergence to be applied as cost functions in adaptation and learning. We are therefore mainly interested in computationally simple, nonparametric estimators that are continuous and differentiable in terms of the samples, so as to yield well-behaved gradient algorithms that can optimize adaptive system parameters. Many factors affect the determination of the optimum of the performance surface, such as gradient noise, learning rates, and misadjustment; therefore, in these applications the entropy estimator’s bias and variance are not as critical as they are, for instance, in coding or rate-distortion theory. Moreover, in adaptation one is interested only in the extremum (maximum or minimum) of the cost, which creates independence from its actual values, because only relative assessments are necessary. In keeping with our nonparametric goals, what matters most in learning is to develop cost functions or divergence measures that can be computed directly from data, without further assumptions, and that capture as much structure as possible from the data’s probability density function (PDF).
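As a concrete illustration of the kind of sample-based, differentiable estimator the chapter develops, the sketch below estimates Renyi’s quadratic entropy, H_2(X) = -log ∫ p^2(x) dx, by plugging a Gaussian Parzen window in for the unknown PDF, which reduces the integral to a double sum of pairwise kernel evaluations (the information potential). This is a minimal sketch in Python, assuming one-dimensional samples and a user-chosen kernel width; the function name renyi_quadratic_entropy and the parameter kernel_sigma are illustrative choices, not notation from the text.

    import numpy as np

    def renyi_quadratic_entropy(x, kernel_sigma=1.0):
        """Plug-in estimate of H_2(X) = -log((1/N^2) * sum_ij G_{sigma*sqrt(2)}(x_i - x_j))
        for 1-D samples x, using a Gaussian Parzen window of width kernel_sigma."""
        x = np.asarray(x, dtype=float)
        n = x.size
        diffs = x[:, None] - x[None, :]            # all pairwise sample differences
        s2 = 2.0 * kernel_sigma ** 2               # variance of the convolved kernel, (sigma*sqrt(2))^2
        gauss = np.exp(-diffs ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
        info_potential = gauss.sum() / n ** 2      # estimate of the integral of p(x)^2
        return -np.log(info_potential)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        samples = rng.normal(size=500)             # toy data: unit-variance Gaussian
        print("Estimated Renyi quadratic entropy:", renyi_quadratic_entropy(samples))

Because the estimate is a smooth function of every sample, its gradient with respect to the samples exists in closed form, which is precisely the property that gradient-based adaptation of system parameters requires.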

Copyright information

© 2010 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Xu, D., Erdogmus, D. (2010). Renyi’s Entropy, Divergence and Their Nonparametric Estimators. In: Information Theoretic Learning. Information Science and Statistics. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-1570-2_2


  • DOI: https://doi.org/10.1007/978-1-4419-1570-2_2

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-1569-6

  • Online ISBN: 978-1-4419-1570-2

  • eBook Packages: Computer Science, Computer Science (R0)
