Skip to main content

Using Maximum Entropy Model for Chinese Text Categorization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3007))

Abstract

Maximum Entropy Model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and accommodable frame to combine diverse pieces of contextual information to estimate the probability of a certain linguistics phenomena. This approach for many tasks of NLP perform near state-of-the-art level, or outperform other competing probability methods when trained and tested under similar conditions. In this paper, we use maximum entropy model for text categorization. We compare and analyze its categorization performance using different approaches for text feature generation, different number of features and smoothing technique. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than Bayes and comparable with KNN and SVM. We think it is a promising technique for text categorization.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Yang, Y.: An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval 1(1), 76–88 (1999)

    Article  Google Scholar 

  2. Adwait, R.: Maximum Entropy Models for Natural Language Ambiguity Resolution. PhD thesis, University of Pennsylvania (1998)

    Google Scholar 

  3. Nigam, K., Lafferty, J., McCallum, A.: Using Maximum Entropy for Text Classification. In: IJCAI 1999 Workshop on Machine Learning for Information Filtering (1999)

    Google Scholar 

  4. Adwait, R.: A Simple Introduction to Maximum Entropy Models for Natural Language Processing. Technical Report 97-08, Institute for Research in Cognitive Science, University of Pennsylvania (1997)

    Google Scholar 

  5. Martin, S., Ney, H., Zaplo, J.: Smoothing Methods in Maximum Entropy Language Modeling. In: IEEE International Conference on Acoustics, Speech and Signal Processing, Phoenix, AR, pp. 545–548 (1999)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, R., Tao, X., Tang, L., Hu, Y. (2004). Using Maximum Entropy Model for Chinese Text Categorization. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds) Advanced Web Technologies and Applications. APWeb 2004. Lecture Notes in Computer Science, vol 3007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24655-8_63

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-24655-8_63

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21371-0

  • Online ISBN: 978-3-540-24655-8

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics