
Co-training from an Incremental EM Perspective

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3177)

Abstract

We study classification when the majority of the data is unlabeled and only a small fraction is labeled: the so-called semi-supervised learning setting. Blum and Mitchell’s co-training [1] is a popular semi-supervised algorithm for problems that offer multiple independent views of the entities to be classified. Web-page classification is one example of a multi-view problem: one view describes a page by the words that occur on it, while another describes it by the words in the hyperlinks that point to it. In co-training, two learners each form a model from the labeled data and then incrementally label small subsets of the unlabeled data for each other. Each learner then re-estimates its model from the labeled data together with the pseudo-labels provided by the other learner. Though some analysis of the algorithm’s performance exists [1], the computation it performs is still not well understood. We propose that each view in co-training is effectively performing incremental EM, as proposed by Neal and Hinton [3], combined with a Bayesian classifier. This analysis suggests improvements over the core co-training algorithm. We introduce variations that converge to the maximum possible classification accuracy faster than the core algorithm, thereby increasing learning efficiency. We empirically verify our claim on a number of data sets in the context of belief network learning.
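To make the algorithm the abstract describes concrete, here is a minimal sketch of Blum-and-Mitchell-style co-training. It uses scikit-learn's GaussianNB as the per-view Bayesian classifier, a stand-in for the belief networks the paper actually studies; the function name co_train and the parameters rounds and k are illustrative choices, not from the paper, and the sketch uses the common pooled simplification in which both views retrain on a shared labeled set rather than strictly separate pools.

import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled, rounds=10, k=5):
    # X1, X2: feature matrices for the two views (n_samples x d_i).
    # y: label array; entries are trusted only where labeled is True.
    # labeled: boolean mask marking the initially labeled examples.
    labels, labeled = y.copy(), labeled.copy()
    clf1, clf2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        # M-step analogue: re-estimate each view's model from the
        # labeled data plus the pseudo-labels accumulated so far.
        clf1.fit(X1[labeled], labels[labeled])
        clf2.fit(X2[labeled], labels[labeled])
        # Partial E-step analogue: each view pseudo-labels only its
        # k most confident unlabeled examples for the other view.
        for clf, X in ((clf1, X1), (clf2, X2)):
            unlabeled = np.flatnonzero(~labeled)
            if unlabeled.size == 0:
                return clf1, clf2
            proba = clf.predict_proba(X[unlabeled])
            top = np.argsort(proba.max(axis=1))[-k:]
            pick = unlabeled[top]
            labels[pick] = clf.classes_[proba[top].argmax(axis=1)]
            labeled[pick] = True
    return clf1, clf2

The comments mark the correspondence the paper draws: pseudo-labeling a small block of unlabeled examples per round is a partial E-step, and re-fitting on the enlarged pool is the M-step, matching the incremental EM scheme of Neal and Hinton [3], in which updating only a subset of the expectations per iteration still makes progress on the underlying objective.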



References

  1. Blum, A., Mitchell, T.: Combining Labeled and Unlabeled Data with Co-training. In: COLT (1998)


  2. Davidson, I., Aminian, M.: Using the Central Limit Theorem for Belief Network Learning. In: Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics (2004)


  3. Neal, R.M., Hinton, G.E.: A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In: Jordan, M.I. (ed.) Learning in Graphical Models. Kluwer (1998)


  4. Muslea, I., Minton, S., Knoblock, C.: Active + Semi-Supervised Learning = Robust Multi-View Learning. In: Proceedings of the 19th International Conference on Machine Learning, pp. 435–442 (2002)


  5. Nigam, K., Ghani, R.: Understanding the Behavior of Co-training. In: Proceedings of the Workshop on Text Mining at the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD (2000)


  6. Seeger, M.: Learning with Labeled and Unlabeled Data. Technical Report, University of Edinburgh (2000)


  7. Bilmes, J.A.: A Gentle Tutorial of the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report TR-97-021, International Computer Science Institute (1998)


  8. Byrne, W., Gunawardana, A.: Comments on “Efficient Training Algorithms for HMM’s Using Incremental Estimation”. IEEE Transactions on Speech and Audio Processing 8(6) (November 2000)




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aminian, M. (2004). Co-training from an Incremental EM Perspective. In: Yang, Z.R., Yin, H., Everson, R.M. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2004. IDEAL 2004. Lecture Notes in Computer Science, vol 3177. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28651-6_114


  • DOI: https://doi.org/10.1007/978-3-540-28651-6_114

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22881-3

  • Online ISBN: 978-3-540-28651-6
