Abstract
This paper explores techniques for using N-gram information to categorize Chinese text documents hierarchically, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and thereby becomes domain and time independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchical classification of Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
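To illustrate the general idea, the following is a minimal sketch of dictionary-free N-gram extraction followed by top-down classification over a category tree. It is not the authors' implementation: the function names, the overlap-based score, the `tree` and `profiles` structures, and the sample categories are all assumptions made for illustration, and the abstract does not specify the feature selection or classification algorithm actually used.

```python
from collections import Counter

def char_ngrams(text, max_n=2):
    """Count overlapping character N-grams (lengths 1..max_n) without a dictionary."""
    # Assumption: keep only CJK unified ideographs; anything else
    # (punctuation, whitespace, Latin text) acts as a boundary.
    runs, current = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    counts = Counter()
    for run in runs:
        for n in range(1, max_n + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts

def score(doc_counts, profile):
    # Hypothetical overlap score: total count of document N-grams
    # that also occur in the class profile (a set of N-grams).
    return sum(c for gram, c in doc_counts.items() if gram in profile)

def classify(doc, tree, profiles, max_n=2):
    """Descend the category tree, choosing the best-scoring child at each level."""
    counts = char_ngrams(doc, max_n)
    node, path = "root", []
    while tree.get(node):
        node = max(tree[node], key=lambda child: score(counts, profiles[child]))
        path.append(node)
    return path

# Hypothetical two-level hierarchy with hand-written N-gram profiles.
tree = {"root": ["sports", "finance"], "sports": ["football", "basketball"]}
profiles = {
    "sports": {"比赛", "足球", "篮球"},
    "finance": {"股票", "银行"},
    "football": {"足球"},
    "basketball": {"篮球"},
}
result = classify("足球比赛", tree, profiles)
```

Because the features are raw character sequences, no word segmentation step or lexicon is needed, which is the property the abstract credits for domain and time independence. In practice the class profiles would be learned from labeled training documents rather than written by hand.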
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guan, J., Zhou, S. (2003). Hierarchical Classification of Chinese Documents Based on N-grams. In: Sembok, T.M.T., Zaman, H.B., Chen, H., Urs, S.R., Myaeng, SH. (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. ICADL 2003. Lecture Notes in Computer Science, vol 2911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24594-0_66
DOI: https://doi.org/10.1007/978-3-540-24594-0_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20608-8
Online ISBN: 978-3-540-24594-0
eBook Packages: Springer Book Archive