Hierarchical Classification of Chinese Documents Based on N-grams

Guan, Jihong; Zhou, Shuigeng

doi:10.1007/978-3-540-24594-0_66

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2911)

Included in the following conference series:

International Conference on Asian Digital Libraries

Abstract

This paper explores techniques that use N-gram information to categorize Chinese text documents hierarchically, so that the classifier is freed from the burden of large dictionaries and complex segmentation processing and is consequently domain- and time-independent. A hierarchical Chinese text classifier is implemented. Experimental results show that hierarchically classifying Chinese text documents based on N-grams achieves satisfactory performance and outperforms other traditional Chinese text classifiers.
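
To illustrate the general idea of dictionary-free features, the following minimal sketch extracts overlapping character N-grams from a Chinese string without any word segmentation. It is not the authors' implementation: the function names `char_ngrams` and `ngram_profile`, the sample string, and the choice of N up to 2 are assumptions made purely for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character N-grams of `text`, ignoring whitespace.

    Hypothetical helper: no dictionary or word segmenter is consulted.
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_profile(text, max_n=2):
    """Count all N-grams for N = 1..max_n (max_n=2 is an assumed default)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    return counts


sample = "中文文本分类"  # illustrative input, not taken from the paper
bigrams = char_ngrams(sample, 2)
profile = ngram_profile(sample)
```

Counts such as those in `profile` could then serve as features for a classifier at each level of a category hierarchy; how the paper selects and weights N-grams is not stated in the abstract.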

Author information

Authors and Affiliations

School of Computer, Wuhan University, Wuhan, 420079, China
Jihong Guan
Department of Computer Science and Engineering, Fudan University, 200433, China
Shuigeng Zhou

Editor information

Editors and Affiliations

Department of Information Science, Faculty of Information Science & Technology, Universiti Kebangsaan Malaysia, Bangi, 43600, UKM Selangor, Malaysia
Tengku Mohd Tengku Sembok
Universiti Kebangsaan Malaysia, Bangi, Malaysia
Halimah Badioze Zaman
Department of Management Information Systems, Eller College of Management, The University of Arizona, AZ 85721, USA
Hsinchun Chen
International School of Information Management, University of Mysore, Mysore, India
Shalini R. Urs
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung-Hyon Myaeng

About this paper

Cite this paper

Guan, J., Zhou, S. (2003). Hierarchical Classification of Chinese Documents Based on N-grams. In: Sembok, T.M.T., Zaman, H.B., Chen, H., Urs, S.R., Myaeng, SH. (eds) Digital Libraries: Technology and Management of Indigenous Knowledge for Global Access. ICADL 2003. Lecture Notes in Computer Science, vol 2911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24594-0_66

DOI: https://doi.org/10.1007/978-3-540-24594-0_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20608-8
Online ISBN: 978-3-540-24594-0
eBook Packages: Springer Book Archive
