Information Theory, Relative Entropy and Statistics

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5363)

Abstract

Shannon’s Information Theory (IT) (1948) definitely established the purely mathematical nature of entropy and relative entropy, in contrast to the previous identification by Boltzmann (1872) of his “H-functional” as the physical entropy of earlier thermodynamicians (Carnot, Clausius, Kelvin). The following recounting is attributed to Shannon (Tribus and McIrvine 1971):

My greatest concern was what to call it. I thought of calling it “information”, but the word was overly used, so I decided to call it “uncertainty”. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”
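
For orientation, the two quantities named in the abstract carry their standard definitions, stated here only as a reminder (the notation is ours, not quoted from the chapter): for a discrete distribution p = (p_1, ..., p_n) and a reference distribution q,

    H(p) = - \sum_i p_i \log p_i                (entropy)
    K(p \| q) = \sum_i p_i \log (p_i / q_i)     (relative entropy)

The relative entropy K(p \| q) is non-negative and vanishes exactly when p = q.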

References

  • Amari, S.-I.: Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer, Heidelberg (1985)
  • Bavaud, F.: The Quasisymmetric Side of Gravity Modelling. Environment and Planning A 34, 61–79 (2002a)
  • Bavaud, F.: Quotient Dissimilarities, Euclidean Embeddability, and Huygens's Weak Principle. In: Jajuga, K., Sokolowski, A., Bock, H.-H. (eds.) Classification, Clustering and Data Analysis, pp. 195–202. Springer, Heidelberg (2002b)
  • Bavaud, F., Xanthos, A.: Thermodynamique et Statistique Textuelle: concepts et illustrations. In: Proceedings of JADT 2002 (6èmes Journées internationales d'Analyse statistique des Données Textuelles), St-Malo (2002)
  • Billingsley, P.: Statistical Inference for Markov Processes. University of Chicago Press, Chicago (1961)
  • Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis. The MIT Press, Cambridge (1975)
  • Boltzmann, L.: Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte der Akademie der Wissenschaften 66, 275–370 (1872)
  • Cardoso, J.-F.: Dependence, Correlation and Gaussianity in Independent Component Analysis. Journal of Machine Learning Research 4, 1177–1203 (2003)
  • Caussinus, H.: Contribution à l'analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse 29, 77–183 (1966)
  • Christensen, R.: Log-Linear Models. Springer, Heidelberg (1990)
  • Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
  • Cramér, H.: Mathematical Methods of Statistics. Princeton University Press, Princeton (1946)
  • Csiszár, I.: I-Divergence Geometry of Probability Distributions and Minimization Problems. The Annals of Probability 3, 146–158 (1975)
  • Csiszár, I., Körner, J.: Towards a general theory of source networks. IEEE Trans. Inform. Theory 26, 155–165 (1980)
  • Csiszár, I., Tusnády, G.: Information Geometry and Alternating Minimization Procedures. Statistics and Decisions (suppl. 1), 205–237 (1984)
  • Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc. B 39, 1–22 (1977)
  • Ferguson, T.S.: Prior Distributions on Spaces of Probability Measures. The Annals of Statistics 2, 615–629 (1974)
  • Jaynes, E.T.: Information theory and statistical mechanics. Physical Review 108, 171–190 (1957)
  • Jaynes, E.T.: Where do we stand on maximum entropy? In: Maximum Entropy Formalism Conference. MIT, Cambridge (1978)
  • Kullback, S.: Information Theory and Statistics. Wiley, Chichester (1959)
  • Lee, T.-W., Girolami, M., Bell, A.J., Sejnowski, T.J.: A Unifying Information-Theoretic Framework for Independent Component Analysis. Computers and Mathematics with Applications 39, 1–21 (2000)
  • Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, Heidelberg (1997)
  • MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)
  • Popper, K.: Conjectures and Refutations. Routledge (1963)
  • Robert, C.P.: The Bayesian Choice, 2nd edn. Springer, Heidelberg (2001)
  • Sanov, I.N.: On the probability of large deviations of random variables. Mat. Sbornik 42, 11–44 (1957) (in Russian); English translation in Sel. Trans. Math. Statist. Probab., pp. 213–244 (1961)
  • Saporta, G.: Probabilités, Analyse de Données et Statistique. Editions Technip, Paris (1990)
  • Simon, G.: Additivity of Information in Exponential Family Power Laws. Journal of the American Statistical Association 68, 478–482 (1973)
  • Shannon, C.E.: A mathematical theory of communication. Bell System Tech. J. 27, 379–423, 623–656 (1948)
  • Tribus, M., McIrvine, E.C.: Energy and Information. Scientific American 224, 178–184 (1971)
  • Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bavaud, F. (2009). Information Theory, Relative Entropy and Statistics. In: Sommaruga, G. (ed.) Formal Theories of Information. Lecture Notes in Computer Science, vol. 5363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00659-3_3

  • DOI: https://doi.org/10.1007/978-3-642-00659-3_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00658-6

  • Online ISBN: 978-3-642-00659-3
