
Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3214)

Abstract

In this paper, we propose an extended variant of the parametric mixture model (PMM), which we recently proposed for multi-class, multi-labeled text categorization. The extended model (EPMM) incorporates latent categories into the PMM so that the model’s flexibility adapts to the data while the parametric mixture assumption of the original PMM remains valid. In the multi-label setting, we experimentally compare a Naive Bayes classifier (NB), Support Vector Machines (SVM), PMM, and EPMM in terms of both classification performance and robustness against classification noise. The results show that EPMM achieves higher classification performance than PMM while retaining the advantage of greater robustness against noise than NB and SVM.
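The abstract does not spell out the parametric mixture assumption. As a minimal sketch, assuming the simplest form of PMM, a document's word distribution is taken to be the equal-weight average of the word distributions of its assigned categories. The function names, the toy parameters `theta`, and the three-word vocabulary below are hypothetical and chosen only for illustration; the latent categories of EPMM are not modeled here.

```python
import numpy as np

def pmm_word_distribution(theta, y):
    """Mix per-category word distributions with equal weight over active labels.

    theta: (L, V) array, row l is the word distribution of category l (assumed).
    y:     length-L binary label vector for one document.
    """
    theta = np.asarray(theta, dtype=float)
    y = np.asarray(y, dtype=float)
    if y.sum() == 0:
        raise ValueError("at least one label must be active")
    h = y / y.sum()          # equal mixing weights over the active labels
    return h @ theta         # (V,) mixed word distribution

def log_likelihood(x, theta, y):
    """Multinomial log-likelihood of word counts x, up to a constant in x.

    Assumes every entry of theta is strictly positive.
    """
    phi = pmm_word_distribution(theta, y)
    return float(np.sum(np.asarray(x, dtype=float) * np.log(phi)))

# Toy example: two categories, three vocabulary words (illustrative values).
theta = [[0.7, 0.2, 0.1],
         [0.1, 0.2, 0.7]]
x = [3, 1, 3]                # word counts of one document

single = log_likelihood(x, theta, [1, 0])
both = log_likelihood(x, theta, [1, 1])
print(single, both)
```

Under this assumption, a document that uses words from both categories scores higher with both labels active than with either label alone, which is the intuition behind detecting multiple labels with a single model.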




© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kaneda, Y., Ueda, N., Saito, K. (2004). Extended Parametric Mixture Model for Robust Multi-labeled Text Categorization. In: Negoita, M.G., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2004. Lecture Notes in Computer Science, vol 3214. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30133-2_81


  • DOI: https://doi.org/10.1007/978-3-540-30133-2_81

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23206-3

  • Online ISBN: 978-3-540-30133-2

  • eBook Packages: Springer Book Archive
