Abstract
In this paper, we present an extended variant of the parametric mixture model (PMM), which we recently proposed for multi-class, multi-labeled text categorization. The extended model (EPMM) incorporates latent categories into PMM so that the model's flexibility can be controlled adaptively according to the data while the parametric mixture assumption of the original PMM remains valid. In the multi-label setting, we experimentally compare a Naive Bayes classifier (NB), Support Vector Machines (SVM), PMM, and EPMM in terms of both classification performance and robustness against classification noise. The results show that EPMM achieves higher classification performance than PMM while retaining the advantage of greater robustness against noise than NB and SVM.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kaneda, Y., Ueda, N., Saito, K. (2004). Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30133-2_81
DOI: https://doi.org/10.1007/978-3-540-30133-2_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23206-3
Online ISBN: 978-3-540-30133-2
eBook Packages: Springer Book Archive