Abstract
We propose using both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and then constructing a Fisher kernel from that model. Documents are modeled with the Naive Bayes generative probability. Through text categorization experiments, we empirically show that (a) the Fisher kernel built from labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods when a sufficient amount of labeled data is available, (b) the value of additional unlabeled data diminishes once the labeled data set is large enough to estimate a reliable model, (c) using categories as latent variables is effective, and (d) larger unlabeled training sets yield better results.
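The pipeline the abstract describes can be sketched in two stages: fit a multinomial Naive Bayes model with EM over labeled and unlabeled documents (in the spirit of Nigam et al.), then take each document's Fisher score, the gradient of its log-likelihood with respect to the model parameters, as the feature vector whose inner product defines the kernel. The sketch below is a minimal illustration under those assumptions; the function names, the plain-EM schedule, and the identity-metric Fisher score are illustrative choices, not the authors' implementation.

```python
import numpy as np

def nb_em_fisher(X_lab, y_lab, X_unlab, n_classes, n_iter=20, alpha=1e-2):
    """Semi-supervised multinomial Naive Bayes via EM, then Fisher scores.

    X_lab, X_unlab: (docs, vocab) word-count matrices; y_lab: class labels.
    A sketch of the general recipe, not the paper's exact procedure.
    """
    resp_lab = np.eye(n_classes)[y_lab]  # labeled docs: responsibilities fixed
    resp_unlab = np.full((X_unlab.shape[0], n_classes), 1.0 / n_classes)
    X = np.vstack([X_lab, X_unlab])
    for _ in range(n_iter):
        R = np.vstack([resp_lab, resp_unlab])
        # M-step: class priors and Laplace-smoothed word probabilities.
        prior = R.sum(0) / R.sum()
        counts = R.T @ X + alpha                    # (classes, vocab)
        theta = counts / counts.sum(1, keepdims=True)
        # E-step: update responsibilities for unlabeled docs only.
        log_p = np.log(prior) + X_unlab @ np.log(theta).T
        log_p -= log_p.max(1, keepdims=True)
        p = np.exp(log_p)
        resp_unlab = p / p.sum(1, keepdims=True)

    def fisher_score(x):
        """Gradient of log p(x) w.r.t. the word parameters (identity metric)."""
        log_p = np.log(prior) + x @ np.log(theta).T
        log_p -= log_p.max()
        post = np.exp(log_p)
        post /= post.sum()                          # p(class | x)
        return (post[:, None] * (x / theta)).ravel()

    return fisher_score
```

The resulting kernel between two documents is simply the dot product of their Fisher scores (practical variants often drop the inverse Fisher information matrix, as done here, or approximate it by normalization); those score vectors can be fed directly to any large margin classifier such as an SVM.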
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Takamura, H., Okumura, M. (2005). A Comparative Study on the Use of Labeled and Unlabeled Data for Large Margin Classifiers. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science(), vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_48
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7