Abstract
Feature dimension reduction is an important step in text categorization, but traditional feature dimension reduction methods ignore the semantic information of features. To address this problem, this paper proposes a new feature dimension reduction method based on a semantic dictionary. A word-semantic knowledge base is constructed on the basis of HowNet and The Semantic Knowledge-base of Contemporary Chinese. Using this knowledge base together with a feature extraction method, text features are mapped to semantic features, which reduces the dimensionality of the feature space. A Naïve Bayes classifier is used to verify categorization performance. The experimental results indicate that the proposed approach performs well in both dimension reduction and categorization.
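The mapping from text features to semantic features can be illustrated with a minimal sketch. It is not the authors' implementation: the `WORD_TO_CONCEPT` dictionary and the `to_semantic_features` helper are hypothetical stand-ins for the knowledge base built from HowNet and The Semantic Knowledge-base of Contemporary Chinese, and the English tokens are used only for readability.

```python
from collections import Counter

# Hypothetical toy word-to-concept dictionary (an assumption for illustration);
# the paper derives its knowledge base from HowNet and related resources.
WORD_TO_CONCEPT = {
    "apple": "fruit",
    "banana": "fruit",
    "pear": "fruit",
    "car": "vehicle",
    "truck": "vehicle",
    "bus": "vehicle",
}


def to_semantic_features(tokens, mapping=WORD_TO_CONCEPT):
    """Count semantic-class features for a tokenized document.

    Words found in the mapping are replaced by their semantic class;
    words missing from the mapping keep their surface form.
    """
    return Counter(mapping.get(tok, tok) for tok in tokens)


doc = ["apple", "banana", "car", "truck", "bus", "road"]
word_features = Counter(doc)                   # one dimension per distinct word
semantic_features = to_semantic_features(doc)  # one dimension per semantic class

print(len(word_features), len(semantic_features))
```

In this toy case several words collapse into one semantic class, so the semantic feature vector has fewer dimensions than the word-level vector. The resulting counts could then be passed to any classifier, such as Naïve Bayes.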
This work was funded by the Natural Science Foundation of China (NSFC, Grant No. 61070119), the Project of Construction of Innovative Teams and Teacher Career Development for Universities and Colleges Under Beijing Municipality (Grant No. IDHT20130519), and the Beijing Municipal Education Commission Special Fund (Grant No. PXM2012-014224-000020).
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, Z., Zhang, Y., Zheng, R., Jiang, L. (2013). Chinese Text Feature Dimension Reduction Based on Semantics. In: Liu, P., Su, Q. (eds) Chinese Lexical Semantics. CLSW 2013. Lecture Notes in Computer Science, vol 8229. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45185-0_42
DOI: https://doi.org/10.1007/978-3-642-45185-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45184-3
Online ISBN: 978-3-642-45185-0
eBook Packages: Computer Science, Computer Science (R0)