A Comparative Study on the Use of Labeled and Unlabeled Data for Large Margin Classifiers

  • Conference paper
Natural Language Processing – IJCNLP 2004 (IJCNLP 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Abstract

We propose to use both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and to use this model to construct a Fisher kernel. Documents are modeled with the Naive Bayes generative probability. Through text categorization experiments, we empirically show that (a) the Fisher kernel with labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods when a sufficient amount of labeled data is available, (b) the value of additional unlabeled data diminishes once the labeled data set is large enough to estimate a reliable model, (c) using the categories as latent variables is effective, and (d) larger unlabeled training data sets yield better results.
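
The construction can be sketched roughly as follows (a minimal illustration under simplifying assumptions, not the authors' implementation): a multinomial Naive Bayes model is fitted by EM over the pooled labeled and unlabeled documents, and the Fisher scores, i.e. the gradients of the document log-likelihood with respect to the model parameters, serve as features whose inner products define the Fisher kernel for a large margin classifier. The sketch below takes gradients only with respect to the log word parameters and approximates the Fisher information by the identity matrix; the function names em_naive_bayes and fisher_features are illustrative, not from the paper.

import numpy as np

def em_naive_bayes(X_lab, y_lab, X_unlab, n_classes, n_iter=20, alpha=1e-2):
    """Multinomial Naive Bayes estimated with EM on labeled + unlabeled data.

    X_lab, X_unlab: document-term count matrices (documents x vocabulary).
    y_lab: integer class labels of the labeled documents.
    Returns class priors pi (n_classes,) and word distributions theta
    (n_classes x vocabulary), both Laplace-smoothed by alpha.
    """
    resp_lab = np.eye(n_classes)[y_lab]          # labeled responsibilities stay fixed
    # Initialize the parameters from the labeled documents only.
    pi = (resp_lab.sum(axis=0) + alpha) / (resp_lab.sum() + n_classes * alpha)
    counts = resp_lab.T @ X_lab + alpha
    theta = counts / counts.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior class responsibilities for the unlabeled documents.
        log_post = np.log(pi) + X_unlab @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        resp_unlab = np.exp(log_post)
        resp_unlab /= resp_unlab.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors and word distributions from all documents.
        resp = np.vstack([resp_lab, resp_unlab])
        X = np.vstack([X_lab, X_unlab])
        pi = (resp.sum(axis=0) + alpha) / (resp.sum() + n_classes * alpha)
        counts = resp.T @ X + alpha
        theta = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta

def fisher_features(X, pi, theta):
    """Fisher-score features, one row per document.

    With gradients taken w.r.t. the log word parameters and an identity
    approximation of the Fisher information, the score for (class c, word w)
    reduces to p(c | x) * count(x, w).
    """
    log_post = np.log(pi) + X @ np.log(theta).T
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    return np.einsum('dc,dw->dcw', post, X).reshape(X.shape[0], -1)

The Fisher kernel is then the inner product of these feature rows, so a linear large margin classifier (for example a linear SVM) trained on the labeled rows of the resulting feature matrix realizes the classifier sketched above.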

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Takamura, H., Okumura, M. (2005). A Comparative Study on the Use of Labeled and Unlabeled Data for Large Margin Classifiers. In: Su, K.-Y., Tsujii, J., Lee, J.-H., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science (LNAI), vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_48

  • DOI: https://doi.org/10.1007/978-3-540-30211-7_48

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-24475-2

  • Online ISBN: 978-3-540-30211-7

  • eBook Packages: Computer Science, Computer Science (R0)
