Using Maximum Entropy Model for Chinese Text Categorization

Li, Ronglu; Tao, Xiaopeng; Tang, Lei; Hu, Yunfa

doi:10.1007/978-3-540-24655-8_63

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3007)

Abstract

Maximum entropy modeling is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and flexible framework for combining diverse pieces of contextual information to estimate the probability of a given linguistic phenomenon. On many NLP tasks this approach performs at a near state-of-the-art level, or outperforms competing probabilistic methods when trained and tested under similar conditions. In this paper, we use the maximum entropy model for text categorization. We compare and analyze its categorization performance under different approaches to text feature generation, different numbers of features, and a smoothing technique. Moreover, in experiments we compare it to Bayes, KNN and SVM, and show that its performance is higher than that of Bayes and comparable to that of KNN and SVM. We think it is a promising technique for text categorization.
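
As an illustration only, a conditional maximum entropy classifier is commonly written as p(c | d) = exp(Σ_i λ_i f_i(d, c)) / Z(d), where Z(d) normalizes over the categories. The sketch below computes these probabilities for one document. It is a minimal sketch, not the authors' implementation: the function name, the two categories, the feature words, and the weights are all hypothetical, and the weights are hand-set rather than trained.

```python
import math

def maxent_probabilities(features, weights, classes):
    """Return p(c | d) for each class under a conditional maximum entropy model.

    features: dict mapping feature name -> value for one document (hypothetical).
    weights:  dict mapping (feature name, class) -> lambda (hypothetical, untrained).
    """
    # Unnormalized log-score for each class: sum_i lambda_i * f_i(d, c)
    scores = {
        c: sum(weights.get((name, c), 0.0) * value
               for name, value in features.items())
        for c in classes
    }
    # Subtract the largest score before exponentiating, for numerical stability
    top = max(scores.values())
    exp_scores = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(d), up to the shared shift
    return {c: v / z for c, v in exp_scores.items()}

# Hypothetical toy setup: two categories and two word features.
classes = ["sports", "finance"]
weights = {("足球", "sports"): 1.2, ("股票", "finance"): 1.5}
doc = {"足球": 2.0, "股票": 0.0}  # word -> count in one document
probs = maxent_probabilities(doc, weights, classes)
```

With these made-up weights the document scores higher for "sports" than for "finance". In practice the weights would be estimated from labeled training data, and the feature generation and smoothing choices the abstract mentions would affect the result.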

Author information

Authors and Affiliations

Computer Technology and Information Department, Fudan University, 200433, Shanghai, China
Ronglu Li, Xiaopeng Tao, Lei Tang & Yunfa Hu

Editor information

Editors and Affiliations

Chinese University of Hong Kong, Hong Kong, China
Jeffrey Xu Yu
The University of New South Wales, NSW 2052, Australia
Xuemin Lin
Department of Computer Science, Tsinghua University, 100084, Beijing, P.R. China
Hongjun Lu
Victoria University, Australia
Yanchun Zhang

About this paper

Cite this paper

Li, R., Tao, X., Tang, L., Hu, Y. (2004). Using Maximum Entropy Model for Chinese Text Categorization. In: Yu, J.X., Lin, X., Lu, H., Zhang, Y. (eds) Advanced Web Technologies and Applications. APWeb 2004. Lecture Notes in Computer Science, vol 3007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24655-8_63

DOI: https://doi.org/10.1007/978-3-540-24655-8_63
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21371-0
Online ISBN: 978-3-540-24655-8
eBook Packages: Springer Book Archive