Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization

Kaneda, Yuji; Ueda, Naonori; Saito, Kazumi

doi:10.1007/978-3-540-30133-2_81

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3214)

Abstract

In this paper, we propose an extended variant of the parametric mixture model (PMM), which we recently proposed for multi-class and multi-labeled text categorization. In the extended model (EPMM), latent categories are incorporated into the PMM so that it can adaptively control the model's flexibility according to the data while maintaining the validity of the parametric mixture assumption of the original PMM. In the multi-label setting, we experimentally compare a naive Bayes classifier (NB), support vector machines (SVM), PMM, and EPMM in terms of both classification performance and robustness against classification noise. The results show that EPMM achieves higher classification performance than PMM while retaining PMM's advantage of greater robustness against noise than NB and SVM.
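
As a rough illustration of the parametric mixture assumption the abstract refers to, the following is a minimal Python sketch of classification with the original PMM, not EPMM. It assumes the equal-weight form from Ueda and Saito's earlier PMM work: the word distribution of a document carrying a label set is the average of the single-label word distributions for each label in that set, and prediction picks the label set whose mixture best explains the document's word counts. The toy distributions `THETA`, the function names, the exhaustive search over label sets up to `max_labels`, and the absence of any prior over label sets are all illustrative assumptions, not the authors' implementation; EPMM's latent categories are not modeled here.

```python
import math
from itertools import combinations

# Hypothetical toy parameters: per-label word distributions over a 4-word
# vocabulary. Every entry is strictly positive (i.e. already smoothed), so
# math.log below never receives zero.
THETA = {
    "sports":   [0.7, 0.1, 0.1, 0.1],
    "politics": [0.1, 0.7, 0.1, 0.1],
    "economy":  [0.1, 0.1, 0.7, 0.1],
}


def mixture_word_dist(label_set, theta):
    """Equal-weight parametric mixture (assumed PMM1-style): average the
    single-label word distributions theta[l] with weights 1/|label_set|."""
    k = len(label_set)
    vocab_size = len(next(iter(theta.values())))
    return [sum(theta[l][i] for l in label_set) / k for i in range(vocab_size)]


def log_likelihood(counts, label_set, theta):
    """Multinomial log-likelihood of a bag-of-words count vector, dropping
    the multinomial coefficient (it is identical for every label set)."""
    phi = mixture_word_dist(label_set, theta)
    return sum(c * math.log(p) for c, p in zip(counts, phi) if c > 0)


def predict_label_set(counts, theta, max_labels=2):
    """Exhaustively score every label set of size 1..max_labels and return
    the one with the highest likelihood (no prior over label sets)."""
    labels = sorted(theta)
    best, best_ll = None, -math.inf
    for k in range(1, max_labels + 1):
        for label_set in combinations(labels, k):
            ll = log_likelihood(counts, label_set, theta)
            if ll > best_ll:
                best, best_ll = label_set, ll
    return best


if __name__ == "__main__":
    # A document using "sports" and "politics" words equally often.
    print(predict_label_set([5, 5, 0, 0], THETA))  # ('politics', 'sports')
    # A document dominated by "sports" words.
    print(predict_label_set([8, 0, 0, 1], THETA))  # ('sports',)
```

The exhaustive search grows combinatorially with the number of categories, so it only suits toy examples; a practical system would need a cheaper search over label sets, and the PMM papers describe their own procedure, which is not reproduced here.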

References

Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. In: NIPS, vol. 14, pp. 601–608 (2003)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 39, 1–38 (1977)
Hofmann, T.: Probabilistic latent semantic indexing. In: SIGIR 1999, pp. 50–57 (1999)
Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
Joachims, T.: Making large-scale SVM learning practical. In: Advances in Kernel Methods - Support Vector Learning, pp. 41–56 (1999)
Morik, K., Brockhausen, P., Joachims, T.: Combining statistical learning with a knowledge-based approach - a case study in intensive care monitoring. In: ICML 1999, pp. 268–277 (1999)
Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.M.: Text classification from labeled and unlabeled documents using EM. Machine Learning 39, 103–134 (2000)
Ueda, N., Saito, K.: Single-shot detection of multi-category text using parametric mixture models. In: SIGKDD 2002, pp. 626–631 (2002)
Vapnik, V.N.: Statistical learning theory. John Wiley & Sons, Inc., Chichester (1998)
Yang, Y., Liu, X.: A re-examination of text categorization methods. In: SIGIR 1999, pp. 42–49 (1999)

Author information

Authors and Affiliations

NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan
Yuji Kaneda, Naonori Ueda & Kazumi Saito

Editor information

Editors and Affiliations

KES International, 2nd Floor, 145-157 St John Street, EC1V 4PY, London, United Kingdom
Mircea Gh. Negoita
Centre for SMART Systems, Engineering Research Centre, University of Brighton, BN2 4GJ, Moulsecoomb, Brighton, UK
Robert J. Howlett
School of Electrical and Information Engineering, Knowledge Based Intelligent Engineering Systems Centre, University of South Australia, Mawson Lakes, SA 5095, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Kaneda, Y., Ueda, N., Saito, K. (2004). Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol. 3214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30133-2_81

DOI: https://doi.org/10.1007/978-3-540-30133-2_81
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23206-3
Online ISBN: 978-3-540-30133-2
eBook Packages: Springer Book Archive