
A High Performance Prototype System for Chinese Text Categorization

  • Conference paper
MICAI 2006: Advances in Artificial Intelligence (MICAI 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4293)


Abstract

Improving categorization accuracy is a major challenge in text categorization. This paper proposes a high-performance prototype system for Chinese text categorization. The system consists mainly of a feature extraction subsystem, a feature selection subsystem, and a reliability evaluation subsystem for classification results. It uses a two-step classification strategy. In the first step, features that are effective for all test texts are used to classify them. The reliability evaluation subsystem then judges each classification result directly from the classifier's outputs and splits the texts into two groups: those classified reliably and those classified unreliably. Only texts classified unreliably in the first step proceed to the second step. There, a classifier reclassifies them using features that are more subtle and more powerful for those texts. The prototype has been successfully implemented with a Naive Bayesian classifier in both the first and second steps. Experiments show that it achieves high performance.
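The abstract does not specify the feature sets, the exact Naive Bayes variant, or the reliability criterion. The following Python sketch therefore illustrates only the general shape of the two-step strategy, under assumptions that are not taken from the paper:

- texts are already segmented into words;
- step 1 uses single words as features, and step 2 uses adjacent word pairs;
- both steps use multinomial Naive Bayes with add-one smoothing;
- a first-step result counts as reliable when the log-score margin between the top two classes reaches a hypothetical `threshold`.

All function names (`unigrams`, `bigrams`, `classify_with_reliability`, `two_step_classify`) are hypothetical as well.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Sequence[str]


def unigrams(tokens: Tokens) -> List[str]:
    """Step-1 features (assumption): the segmented words themselves."""
    return list(tokens)


def bigrams(tokens: Tokens) -> List[str]:
    """Step-2 features (assumption): pairs of adjacent words."""
    return [a + "|" + b for a, b in zip(tokens, tokens[1:])]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, features: Callable[[Tokens], List[str]]):
        self.features = features

    def fit(self, docs: List[Tokens], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            self.counts[c].update(self.features(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, doc: Tokens) -> Dict[str, float]:
        """Unnormalised log posterior per class; features unseen in training are ignored."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for f in self.features(doc):
                if f in self.vocab:
                    s += math.log((self.counts[c][f] + 1) / (self.totals[c] + v))
            scores[c] = s
        return scores


def classify_with_reliability(model: NaiveBayes, doc: Tokens,
                              threshold: float) -> Tuple[str, bool]:
    """Return the top class and whether its log-score margin over the runner-up
    reaches `threshold` (hypothetical reliability test; needs >= 2 classes)."""
    ranked = sorted(model.log_scores(doc).items(), key=lambda kv: kv[1], reverse=True)
    (best, s1), (_, s2) = ranked[0], ranked[1]
    return best, (s1 - s2) >= threshold


def two_step_classify(step1: NaiveBayes, step2: NaiveBayes,
                      docs: List[Tokens], threshold: float) -> List[Tuple[str, int]]:
    """Classify each doc with step1; only docs judged unreliable go to step2.
    Returns (label, number of the step that produced the label)."""
    results = []
    for doc in docs:
        label, reliable = classify_with_reliability(step1, doc, threshold)
        if reliable:
            results.append((label, 1))
        else:
            scores = step2.log_scores(doc)
            results.append((max(scores, key=scores.get), 2))
    return results


if __name__ == "__main__":
    # Toy pre-segmented training data (invented for illustration only).
    train = [["球队", "赢得", "比赛"], ["球队", "成绩", "上升"],
             ["股票", "价格", "上升"], ["公司", "成绩", "优秀"]]
    labels = ["sports", "sports", "finance", "finance"]
    step1 = NaiveBayes(unigrams).fit(train, labels)
    step2 = NaiveBayes(bigrams).fit(train, labels)
    print(two_step_classify(step1, step2, [["球队", "赢得", "比赛"], ["成绩", "上升"]], 1.0))
    # [('sports', 1), ('sports', 2)]
```

In the toy run, the second text ties at the single-word level: each of its words appears once in each class. Step 1 therefore flags it as unreliable, and the word pair "成绩|上升" resolves it in step 2. This mirrors, in miniature, the abstract's idea that finer features are reserved for texts the first pass cannot decide confidently.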

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fan, X. (2006). A High Performance Prototype System for Chinese Text Categorization. In: Gelbukh, A., Reyes-Garcia, C.A. (eds) MICAI 2006: Advances in Artificial Intelligence. MICAI 2006. Lecture Notes in Computer Science, vol 4293. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11925231_97

  • DOI: https://doi.org/10.1007/11925231_97

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-49026-5

  • Online ISBN: 978-3-540-49058-6

  • eBook Packages: Computer Science, Computer Science (R0)
