
Chinese Documents Classification Based on N-Grams

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2276)

Abstract

Traditional Chinese document classifiers are based on keywords in the documents, which requires dictionary support and efficient segmentation procedures. This paper explores techniques for using N-gram information to categorize Chinese documents, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and consequently becomes domain and time independent. A Chinese document classification system following these techniques is implemented with Naive Bayes, kNN, and hierarchical classification methods. Experimental results show that the system achieves satisfactory performance, comparable with that of traditional classifiers.
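To make the idea of segmentation-free classification concrete, the following is a minimal sketch, not the authors' system. It assumes overlapping character unigrams and bigrams as features and a multinomial Naive Bayes model with add-one smoothing; the names `char_ngrams` and `NgramNaiveBayes`, the smoothing choice, and the toy training sentences are all illustrative assumptions. The kNN and hierarchical classifiers mentioned in the abstract, and any feature selection the paper may use, are not shown.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n_max=2):
    """Return all overlapping character n-grams of length 1..n_max.

    No dictionary or word segmentation is involved; whitespace is dropped.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(chars) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams (add-one smoothing)."""

    def __init__(self, n_max=2):
        self.n_max = n_max
        self.class_docs = Counter()               # documents seen per class
        self.gram_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()                        # all n-grams seen in training

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            grams = char_ngrams(text, self.n_max)
            self.class_docs[label] += 1
            self.gram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        grams = char_ngrams(text, self.n_max)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.gram_counts[label]
            total = sum(counts.values())
            # log prior + sum of smoothed log likelihoods
            score = math.log(n_docs / total_docs)
            for g in grams:
                score += math.log((counts[g] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy corpus, for illustration only.
train_docs = ["足球比赛今晚开始", "球队赢得了比赛",
              "股票市场今天上涨", "银行下调贷款利率"]
train_labels = ["sports", "sports", "finance", "finance"]

model = NgramNaiveBayes(n_max=2).fit(train_docs, train_labels)
print(model.predict("比赛结果"))
print(model.predict("股票上涨"))
```

Because the features are raw character sequences, the sketch needs no lexicon and no segmenter, which is the property the abstract attributes to the N-gram approach; new vocabulary in a domain simply shows up as new n-grams in the training counts.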

This work was supported by the China Post-doctoral Science Foundation and the Natural Science Foundation of China (NSFC) under grant number 60173027.




Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Zhou, S., Guan, J. (2002). Chinese Documents Classification Based on N-Grams. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2002. Lecture Notes in Computer Science, vol 2276. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45715-1_43


  • DOI: https://doi.org/10.1007/3-540-45715-1_43


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43219-7

  • Online ISBN: 978-3-540-45715-2

  • eBook Packages: Springer Book Archive
