Abstract
In the minimum description length (MDL) and Bayesian criteria, we construct a description length of data \(z^n = z_1 \cdots z_n\) of length \(n\) such that the length divided by \(n\) converges almost surely to the entropy rate of the source as \(n \to \infty\), assuming each \(z_i\) takes values in a finite set \(A\). In model selection, if we knew the true probability \(P\) of \(z^n \in A^n\), we would choose the model \(F\) that maximizes the posterior probability of \(F\) given \(z^n\). In many situations, however, only the data \(z^n\) are available, so we use \(Q: A^n \to [0,1]\) with \(\sum_{z^n \in A^n} Q(z^n) \leq 1\) in place of \(P\). In this paper, we consider an extension in which each attribute of the data may be either discrete or continuous. The main issue is which \(Q\) qualifies as an alternative to \(P\) in this generalized situation. We propose a condition in terms of the Radon-Nikodym derivative of \(P\) with respect to \(Q\), and give a procedure for constructing such a \(Q\) in the general setting. As a result, we obtain MDL/Bayesian criteria in a general sense.
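To make the finite-alphabet setting concrete, here is a minimal Python sketch (not from the chapter; the function name is ours) of one standard choice of universal measure \(Q\), the Krichevsky-Trofimov mixture. It computes the description length \(-\log_2 Q(z^n)\) sequentially and checks numerically that the per-symbol length approaches the entropy rate of an i.i.d. binary source, which is the convergence property the abstract describes.

```python
import math
import random

def kt_codelength(zn, alphabet_size=2):
    """Description length -log2 Q(z^n) in bits, where Q is the
    Krichevsky-Trofimov mixture over a finite alphabet
    {0, ..., alphabet_size - 1}."""
    counts = [0] * alphabet_size
    bits = 0.0
    for i, z in enumerate(zn):
        # KT predictive probability of the next symbol given counts:
        # (count(z) + 1/2) / (i + |A|/2).
        p = (counts[z] + 0.5) / (i + alphabet_size / 2.0)
        bits += -math.log2(p)
        counts[z] += 1
    return bits

# For an i.i.d. binary source with P(1) = p, the per-symbol
# description length should approach the entropy H(p).
random.seed(0)
p = 0.3
zn = [1 if random.random() < p else 0 for _ in range(100_000)]
entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
print(kt_codelength(zn) / len(zn), "bits/symbol vs entropy", entropy)
```

This \(Q\) is not a probability measure conditioned on any single model, yet \(-\log Q(z^n)\) is within \(O(\log n)\) bits of \(-\log P(z^n)\) for every i.i.d. \(P\); the chapter's question is what analogous condition on \(Q\) suffices when attributes may also be continuous.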
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Suzuki, J. (2013). MDL/Bayesian Criteria Based on Universal Coding/Measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_31
DOI: https://doi.org/10.1007/978-3-642-44958-1_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-44957-4
Online ISBN: 978-3-642-44958-1