A High Performance Prototype System for Chinese Text Categorization

Fan, Xinghua

doi:10.1007/11925231_97

Xinghua Fan²⁰

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 4293))

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

742 Accesses
2 Citations

Abstract

How to improve the accuracy of categorization is a big challenge in text categorization. This paper proposes a high performance prototype system for Chinese text categorization, which mainly includes feature extraction subsystem, feature selection subsystem, and reliability evaluation subsystem for classification results. The proposed prototype system employs a two-step classifying strategy. First, the features that are effective for all testing texts are used to classify texts. Then, the reliability evaluation subsystem evaluates the classification results directly according to the outputs of the classifier, and divides them into two parts: texts classified reliable or not. Only for the texts classified unreliable at the first step, go to the second step. Second, a classifier uses the features that are more subtle and powerful for those texts classified unreliable to classify the texts. The proposed prototype system is successfully implemented in a case that exploits a Naive Bayesian classifier as the classifier in the first and second steps. Experiments show that the proposed prototype system achieves a high performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 189.00; Price excludes VAT (USA)

Softcover Book: USD 239.00; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Sebastiani, F.: Machine Learning in Automated Text Categorization. ACM Computing Surveys 34(1), 1–47 (2002)
Article Google Scholar
Lewis, D.: Naive Bayes at Forty: The Independence Assumption in Information Retrieval. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 4–15. Springer, Heidelberg (1998)
Chapter Google Scholar
Salton, G.: Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, Reading (1989)
Google Scholar
Mitchell, T.M.: Machine Learning. McCraw Hill, New York (1996)
MATH Google Scholar
Yang, Y., Liu, X.: A Re-examination of Text Categorization Methods. In: Proceedings of SIGIR 1999, pp. 42–49 (1999)
Google Scholar
Fan, X.: Causality Reasoning and Text Categorization, Postdoctoral Research Report of Tsinghua University, P.R. China (April 2004)
Google Scholar
Fan, X., Sun, M., Choi, K.-s., Zhang, Q.: Classifying Chinese texts in two steps. In: Dale, R., Wong, K.-F., Su, J., Kwong, O.Y. (eds.) IJCNLP 2005. LNCS (LNAI), vol. 3651, pp. 302–313. Springer, Heidelberg (2005)
Chapter Google Scholar
Fan, X., Sun, M.: A high performance two-class Chinese text categorization method. Chinese Journal of Computers 29(1), 124–131 (2006)
MathSciNet Google Scholar
Dumais, S.T., Platt, J., Hecherman, D., Sahami, M.: Inductive Learning Algorithms and Representation for Text Categorization. In: Proceedings of CIKM 1998, Bethesda, MD, pp. 148–155 (1998)
Google Scholar
Sahami, M., Dumais, S., Hecherman, D., Horvitz, E.A.: Bayesian Approach to Filtering Junk E-Mail. In: Learning for Text Categorization: Papers from the AAAI Workshop, pp. 55–62, Madison Wisconsin. AAAI Technical Report WS-98-05 (1998)
Google Scholar

Download references

Author information

Authors and Affiliations

College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P.R. China
Xinghua Fan

Authors

Xinghua Fan
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, México
Alexander Gelbukh
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Luis Enrique Erro No. 1, Sta. Ma. Tonanzintla, 72840, Puebla, México
Carlos Alberto Reyes-Garcia

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Fan, X. (2006). A High Performance Prototype System for Chinese Text Categorization. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science(), vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_97

Download citation

DOI: https://doi.org/10.1007/11925231_97
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-49026-5
Online ISBN: 978-3-540-49058-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics