Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7070)

Abstract

In the minimum description length (MDL) and Bayesian criteria, we construct a description length for data \(z^n = z_1 \cdots z_n\) of length \(n\) such that the length divided by \(n\) converges almost surely to the entropy rate as \(n \to \infty\), assuming each \(z_i\) lies in a finite set \(A\). In model selection, if we knew the true probability \(P\) of \(z^n \in A^n\), we would choose the model \(F\) that maximizes the posterior probability of \(F\) given \(z^n\). In many situations, however, only the data \(z^n\) are available, so instead of \(P\) we use a measure \(Q: A^n \to [0,1]\) such that \(\sum_{z^n\in A^n}Q(z^n)\leq 1\). In this paper, we consider an extension in which each attribute of the data may be either discrete or continuous. The main issue is which \(Q\) qualifies as an alternative to \(P\) in this generalized situation. We propose a condition in terms of the Radon-Nikodym derivative of \(P\) with respect to \(Q\), and give a procedure for constructing \(Q\) in the general setting. As a result, we obtain MDL/Bayesian criteria in a general sense.
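
The construction of such a \(Q\) is left implicit in the abstract. For the purely finite-alphabet case, one standard universal measure with \(\sum_{z^n \in A^n} Q(z^n) \leq 1\) is the Krichevsky-Trofimov (KT) estimator, whose per-symbol code length \(-\frac{1}{n}\log_2 Q(z^n)\) converges to the entropy rate for i.i.d. sources. The Python sketch below illustrates this finite-alphabet baseline only, not the paper's mixed discrete/continuous construction; the function name and the Bernoulli test source are our own choices, for illustration.

```python
import math
import random
from collections import Counter

def kt_codelength(seq, alphabet_size):
    """Code length -log2 Q(z^n) in bits under the Krichevsky-Trofimov
    sequential estimator: at step t, symbol a is predicted with
    probability (n_t(a) + 1/2) / (t + |A|/2), where n_t(a) counts
    occurrences of a among the first t symbols."""
    counts = Counter()
    bits = 0.0
    for t, symbol in enumerate(seq):  # t = 0, 1, ..., n-1
        p = (counts[symbol] + 0.5) / (t + alphabet_size / 2.0)
        bits -= math.log2(p)
        counts[symbol] += 1
    return bits

if __name__ == "__main__":
    # Biased i.i.d. coin with P(1) = 0.2; entropy rate H(0.2) is about 0.72 bits.
    random.seed(0)
    n = 10000
    z = [1 if random.random() < 0.2 else 0 for _ in range(n)]
    print(kt_codelength(z, 2) / n)  # roughly 0.72-0.73 bits per symbol
```

Since \(Q\) is a (sub)probability measure fixed in advance, \(-\log Q(z^n)\) is a valid description length even though the true \(P\) is unknown; this is the sense in which \(Q\) substitutes for \(P\) in the MDL/Bayesian criteria.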

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Suzuki, J. (2013). MDL/Bayesian Criteria Based on Universal Coding/Measure. In: Dowe, D.L. (ed.) Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence. Lecture Notes in Computer Science, vol. 7070. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-44958-1_31

  • DOI: https://doi.org/10.1007/978-3-642-44958-1_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-44957-4

  • Online ISBN: 978-3-642-44958-1
