An Information Theoretic Optimal Classifier for Semi-supervised Learning

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3177)

Abstract

Model uncertainty refers to the risk associated with basing predictions on only one model. In semi-supervised learning, this uncertainty is greater than in supervised learning (for the same total number of instances) because many data points are unlabelled. An optimal Bayes classifier (OBC) reduces model uncertainty by averaging predictions across the entire model space, weighted by the models’ posterior probabilities. For a given model space and prior distribution, the OBC produces the lowest risk. We propose an information theoretic method to construct an OBC for probabilistic semi-supervised learning using Markov chain Monte Carlo sampling. This contrasts with typical semi-supervised learning, which attempts to find the single most probable model using EM. Empirical results verify that the OBC yields more accurate predictions than the best single model.
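The posterior-weighted averaging described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function names, the three toy models, and their posterior values are all hypothetical, chosen only to show how an averaged prediction can differ from that of the single most probable model. When models are drawn by Markov chain Monte Carlo sampling from the posterior, each sampled model would typically carry equal weight instead of an explicit posterior value.

```python
from typing import Dict, List


def bayes_optimal_predict(
    posteriors: List[float], class_probs: List[Dict[str, float]]
) -> Dict[str, float]:
    """Average per-model class probabilities, weighted by model posterior."""
    total = sum(posteriors)  # normalise in case the weights do not sum to 1
    averaged: Dict[str, float] = {}
    for weight, probs in zip(posteriors, class_probs):
        for label, p in probs.items():
            averaged[label] = averaged.get(label, 0.0) + (weight / total) * p
    return averaged


def map_predict(
    posteriors: List[float], class_probs: List[Dict[str, float]]
) -> Dict[str, float]:
    """Return the class probabilities of the single most probable model."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return class_probs[best]


# Hypothetical toy setting: three models with assumed posterior probabilities.
posteriors = [0.4, 0.3, 0.3]
class_probs = [
    {"+": 1.0, "-": 0.0},  # most probable model predicts "+"
    {"+": 0.0, "-": 1.0},
    {"+": 0.0, "-": 1.0},
]

obc = bayes_optimal_predict(posteriors, class_probs)
single = map_predict(posteriors, class_probs)

obc_label = max(obc, key=obc.get)
map_label = max(single, key=single.get)
print("averaged prediction:", obc_label, obc)
print("single-model prediction:", map_label)
```

In this toy setting the most probable model predicts "+", while the two less probable models together carry more posterior mass and tip the averaged prediction to "-".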

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yin, K., Davidson, I. (2004). An Information Theoretic Optimal Classifier for Semi-supervised Learning. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_110

  • DOI: https://doi.org/10.1007/978-3-540-28651-6_110

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22881-3

  • Online ISBN: 978-3-540-28651-6

  • eBook Packages: Springer Book Archive
