Abstract
We give a characterization of Maximum Entropy/Minimum Relative Entropy inference by providing two ‘strong entropy concentration’ theorems. These theorems unify and generalize Jaynes’ ‘concentration phenomenon’ and Van Campenhout and Cover’s ‘conditional limit theorem’. The theorems characterize exactly in what sense a ‘prior’ distribution Q conditioned on a given constraint and the distribution Ṗ minimizing D(P∥Q) over all P satisfying the constraint are ‘close’ to each other. We show how our theorems are related to ‘universal models’ for exponential families, thereby establishing a link with Rissanen’s MDL/stochastic complexity. We then apply our theorems to establish the relationship (A) between entropy concentration and a game-theoretic characterization of Maximum Entropy Inference due to Topsøe and others; (B) between maximum entropy distributions and sequences that are random (in the sense of Martin-Löf/Kolmogorov) with respect to the given constraint. These two applications have strong implications for the use of Maximum Entropy distributions in sequential prediction tasks, both for the logarithmic loss and for general loss functions. We identify circumstances under which Maximum Entropy predictions are almost optimal.
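As a concrete illustration of the minimization the abstract refers to (not taken from the paper itself): for a linear constraint E_P[f(X)] = t, the minimizer Ṗ of D(P∥Q) is an exponential tilt of Q, namely Ṗ(x) ∝ Q(x)·exp(λf(x)), with λ chosen so that the constraint holds. The sketch below computes this numerically for Jaynes' dice example, a uniform prior Q on {1,…,6} with constrained mean 4.5; the function name, bisection bounds, and iteration count are our own illustrative choices.

```python
# Minimal numerical sketch of the I-projection described in the abstract:
# the distribution P-dot minimizing D(P||Q) over all P with E_P[f(X)] = t.
# For a linear constraint the minimizer is the exponential tilt
# P-dot(x) ∝ Q(x) * exp(lambda * f(x)), with lambda set to meet the constraint.
import numpy as np

def i_projection(q, f, t, lo=-50.0, hi=50.0, iters=200):
    """Minimum relative entropy distribution under E_P[f] = t (bisection on lambda)."""
    q, f = np.asarray(q, float), np.asarray(f, float)

    def mean_under_tilt(lam):
        w = q * np.exp(lam * f)          # unnormalized tilted weights
        p = w / w.sum()
        return p @ f                     # E_P[f] under the tilt

    for _ in range(iters):               # E_P[f] is monotone increasing in lambda
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if mean_under_tilt(mid) < t else (lo, mid)
    lam = 0.5 * (lo + hi)
    w = q * np.exp(lam * f)
    return w / w.sum()

q = np.ones(6) / 6                       # uniform 'prior' Q on die faces 1..6
x = np.arange(1, 7)                      # f(x) = x; constraint: mean equals 4.5
p_dot = i_projection(q, x, 4.5)
print(p_dot, p_dot @ x)                  # tilted maximum entropy dist., mean ≈ 4.5
```

Since Q here is uniform, minimizing D(P∥Q) is the same as maximizing the entropy of P, so the printed distribution is the classical maximum entropy solution to the constrained-mean dice problem.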
References
K. Azoury and M. Warmuth. Relative loss bounds for on-line density estimation with the exponential family of distributions. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), pages 31–40. Morgan Kaufmann, 1999.
A. Barron, J. Rissanen, and B. Yu. The minimum description length principle in coding and modeling. IEEE Transactions on Information Theory, 44(6):2743–2760, 1998.
P. Billingsley. Convergence of Probability Measures. Wiley, 1968.
T.M. Cover and J.A. Thomas. Elements of Information Theory. Wiley Interscience, New York, 1991.
I. Csiszár. I-divergence geometry of probability distributions and minimization problems. The Annals of Probability, 3(1):146–158, 1975.
I. Csiszár. Sanov property, generalized I-projection and a conditional limit theorem. The Annals of Probability, 12(3):768–793, 1984.
I. Csiszár. Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. The Annals of Statistics, 19(4):2032–2066, 1991.
M. Feder. Maximum entropy as a special case of the minimum description length criterion. IEEE Transactions on Information Theory, 32(6):847–849, 1986.
W. Feller. An Introduction to Probability Theory and Its Applications, volume 2. Wiley, 1968. Third edition.
P.D. Grünwald. Maximum entropy and the glasses you are looking through. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI 2000). Morgan Kaufmann Publishers, 2000.
P.D. Grünwald. The Minimum Description Length Principle and Reasoning under Uncertainty. PhD thesis, University of Amsterdam, The Netherlands, October 1998. Available as ILLC Dissertation Series 1998-03; see http://www.cwi.nl/~pdg.
P.D. Grünwald. Strong entropy concentration, coding, game theory and randomness. Technical Report 010, EURANDOM, 2001.
E.T. Jaynes. Where do we stand on maximum entropy? In R.D. Levine and M. Tribus, editors, The Maximum Entropy Formalism, pages 15–118. MIT Press, Cambridge, MA, 1978.
E.T. Jaynes. On the rationale of maximum-entropy methods. Proceedings of the IEEE, 70:939–951, 1982.
E.T. Jaynes. Papers on Probability, Statistics and Statistical Physics. Kluwer Academic Publishers, second edition, 1989.
E.T. Jaynes. Probability Theory: The Logic of Science. Available at ftp://bayes.wustl.edu/Jaynes.book/, 1996.
J.N. Kapur and H.K. Kesavan. Entropy Optimization Principles with Applications. Academic Press, Inc., 1992.
R.E. Kass and P.W. Voss. Geometrical Foundations of Asymptotic Inference. Wiley Interscience, 1997.
J. Lafferty. Additive models, boosting and inference for generalized divergences. In Proceedings of the Twelfth Annual Workshop on Computational Learning Theory (COLT '99), 1999.
M. Li and P.M.B. Vitányi. An Introduction to Kolmogorov Complexity and Its Applications. Springer-Verlag, New York, revised and expanded second edition, 1997.
N. Merhav and M. Feder. A strong version of the redundancy-capacity theorem of universal coding. IEEE Transactions on Information Theory, 41(3):714–722, 1995.
J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific Publishing Company, 1989.
J. Rissanen. Strong optimality of the normalized ML models as universal codes, 2001. To appear in IEEE Transactions on Information Theory.
F. Topsøe. Information theoretical optimization techniques. Kybernetika, 15(1), 1979.
J. van Campenhout and T. Cover. Maximum entropy and conditional probability. IEEE Transactions on Information Theory, 27(4):483–489, 1981.
Cite this paper
Grünwald, P. (2001). Strong Entropy Concentration, Game Theory, and Algorithmic Randomness. In: Helmbold, D., Williamson, B. (eds) Computational Learning Theory. COLT 2001. Lecture Notes in Computer Science, vol 2111. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44581-1_21