Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2911)

Abstract

This paper explores techniques that use N-gram information to categorize Chinese text documents hierarchically, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and is consequently domain and time independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
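To make the general idea concrete, the following is a minimal sketch of segmentation-free hierarchical classification with character N-grams. It is not the authors' implementation: the abstract does not specify the feature weighting, the classifier, or the category hierarchy, so the cosine-similarity profile matching, the function names (`char_ngrams`, `build_profiles`, `classify`), the two-level category tree, and the toy training strings are all assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(1, 2)):
    """Count overlapping character N-grams; no dictionary or word segmentation is used."""
    chars = [c for c in text if not c.isspace()]
    counts = Counter()
    for n in n_values:
        for i in range(len(chars) - n + 1):
            counts["".join(chars[i:i + n])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(tree, training, node="root", profiles=None):
    """Give every node an N-gram profile: leaves from their documents,
    internal nodes from the sum of their children's profiles."""
    profiles = {} if profiles is None else profiles
    total = Counter()
    if node in tree:
        for child in tree[node]:
            build_profiles(tree, training, child, profiles)
            total += profiles[child]
    else:
        for doc in training[node]:
            total += char_ngrams(doc)
    profiles[node] = total
    return profiles


def classify(text, tree, profiles, root="root"):
    """Descend the hierarchy, picking the most similar child at each level."""
    query = char_ngrams(text)
    path, node = [], root
    while node in tree:
        node = max(tree[node], key=lambda child: cosine(query, profiles[child]))
        path.append(node)
    return path


# Hypothetical hierarchy and toy training data (assumptions, not from the paper).
tree = {
    "root": ["sports", "finance"],
    "sports": ["football", "basketball"],
    "finance": ["stocks", "banking"],
}
training = {
    "football": ["足球比赛进球", "足球联赛球队"],
    "basketball": ["篮球比赛投篮", "篮球联赛球员"],
    "stocks": ["股票市场上涨", "股票交易下跌"],
    "banking": ["银行贷款利率", "银行存款利息"],
}
profiles = build_profiles(tree, training)
```

Calling `classify("足球联赛开始", tree, profiles)` walks from the root to a leaf and returns the chosen category at each level. Because the features are raw character sequences, new vocabulary needs no dictionary update, which is the property the abstract describes as domain and time independence.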



Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guan, J., Zhou, S. (2003). Hierarchical Classification of Chinese Documents Based on N-grams. In: Sembok, T.M.T., Zaman, H.B., Chen, H., Urs, S.R., Myaeng, SH. (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. ICADL 2003. Lecture Notes in Computer Science, vol 2911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24594-0_66

  • DOI: https://doi.org/10.1007/978-3-540-24594-0_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20608-8

  • Online ISBN: 978-3-540-24594-0

  • eBook Packages: Springer Book Archive
