Abstract
Shannon’s Information Theory (IT) (1948) definitively established the purely mathematical nature of entropy and relative entropy, in contrast to Boltzmann’s earlier identification (1872) of his “H-functional” with the physical entropy of the thermodynamicians (Carnot, Clausius, Kelvin). The following recollection is attributed to Shannon (Tribus and McIrvine 1971):
My greatest concern was what to call it. I thought of calling it “information”, but the word was overly used, so I decided to call it “uncertainty”. When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, “You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, nobody knows what entropy really is, so in a debate you will always have the advantage.”
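For reference, since the abstract does not reproduce them, the quantities at stake take the following standard forms (Shannon 1948; Cover and Thomas 1991). For a discrete probability distribution p = (p_1, ..., p_n), the entropy is

H(p) = -\sum_i p_i \log p_i ,

and the relative entropy (Kullback-Leibler divergence) of p with respect to a reference distribution q is

D(p \| q) = \sum_i p_i \log (p_i / q_i) .

Boltzmann’s H-functional is the continuous counterpart H[f] = \int f(v) \log f(v) \, dv for a gas velocity density f; up to sign and Boltzmann’s constant, it is the quantity he identified with thermodynamic entropy.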
References
Amari, S.-I.: Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer, Heidelberg (1985)
Bavaud, F.: The Quasisymmetric Side of Gravity Modelling. Environment and Planning A 34, 61–79 (2002a)
Bavaud, F.: Quotient Dissimilarities, Euclidean Embeddability, and Huygens’s Weak Principle. In: Jajuga, K., Sokolowski, A., Bock, H.-H. (eds.) Classification, Clustering and Data Analysis, pp. 195–202. Springer, Heidelberg (2002b)
Bavaud, F., Xanthos, A.: Thermodynamique et Statistique Textuelle: concepts et illustrations. In: Proceedings of JADT 2002 (6èmes Journées internationales d’Analyse statistique des Données Textuelles), St-Malo (2002)
Billingsley, P.: Statistical Inference for Markov Processes. University of Chicago Press, Chicago (1961)
Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis. The MIT Press, Cambridge (1975)
Boltzmann, L.: Weitere Studien über das Wärmegleichgewicht unter Gasmolekülen. Sitzungsberichte der Akademie der Wissenschaften 66, 275–370 (1872)
Cardoso, J.-F.: Dependence, Correlation and Gaussianity in Independent Component Analysis. Journal of Machine Learning Research 4, 1177–1203 (2003)
Caussinus, H.: Contribution à l’analyse statistique des tableaux de corrélation. Annales de la Faculté des Sciences de Toulouse 29, 77–183 (1966)
Christensen, R.: Log-Linear Models. Springer, Heidelberg (1990)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, Chichester (1991)
Cramér, H.: Mathematical Methods of Statistics. Princeton University Press, Princeton (1946)
Csiszár, I.: I-Divergence Geometry of Probability Distributions and Minimization Problems. The Annals of Probability 3, 146–158 (1975)
Csiszár, I., Körner, J.: Towards a general theory of source networks. IEEE Trans. Inform. Theory 26, 155–165 (1980)
Csiszár, I., Tusnády, G.: Information Geometry and Alternating Minimization Procedures. Statistics and Decisions (suppl. 1), 205–237 (1984)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm. J. Roy. Stat. Soc. B 39, 1–22 (1977)
Ferguson, T.S.: Prior Distributions on Spaces of Probability Measures. The Annals of Statistics 2, 615–629 (1974)
Jaynes, E.T.: Information theory and statistical mechanics. Physical Review 108, 171–190 (1957)
Jaynes, E.T.: Where do we stand on maximum entropy? In: Maximum Entropy Formalism Conference. MIT, Cambridge (1978)
Kullback, S.: Information Theory and Statistics. Wiley, Chichester (1959)
Lee, T.-W., Girolami, M., Bell, A.J., Sejnowski, T.J.: A unifying Information-Theoretic Framework for Independent Component Analysis. Computers and Mathematics with Applications 39, 1–21 (2000)
Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, Heidelberg (1997)
MacKay, D.J.C.: Information Theory, Inference and Learning Algorithms. Cambridge University Press, Cambridge (2003)
Popper, K.: Conjectures and Refutations. Routledge, London (1963)
Robert, C.P.: The Bayesian Choice, 2nd edn. Springer, Heidelberg (2001)
Sanov, I.N.: On the probability of large deviations of random variables. Mat. Sbornik 42, 11–44 (1957) (in Russian); English translation in Sel. Transl. Math. Statist. Probab., pp. 213–244 (1961)
Saporta, G.: Probabilités, Analyse des Données et Statistique. Editions Technip, Paris (1990)
Shannon, C.E.: A mathematical theory of communication. Bell System Tech. J. 27, 379–423, 623–656 (1948)
Simon, G.: Additivity of Information in Exponential Family Power Laws. Journal of the American Statistical Association 68, 478–482 (1973)
Tribus, M., McIrvine, E.C.: Energy and Information. Scientific American 224, 178–184 (1971)
Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, Heidelberg (1995)
© 2009 Springer-Verlag Berlin Heidelberg
Cite this chapter
Bavaud, F. (2009). Information Theory, Relative Entropy and Statistics. In: Sommaruga, G. (ed.) Formal Theories of Information. Lecture Notes in Computer Science, vol. 5363. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00659-3_3
Print ISBN: 978-3-642-00658-6
Online ISBN: 978-3-642-00659-3