Abstract
Automatic text classification is a research focus and a core technology in natural language processing and information retrieval. The class-center vector method is an important text classification method with the advantages of low computational cost and high efficiency. However, the traditional class-center vector method has three drawbacks: the class vectors are large and sparse, classification accuracy is limited, and the vectors lack semantic information. To overcome these problems, this paper proposes an improved class-center method for text classification that uses dependencies and the WordNet dictionary. Experiments show that, compared with traditional text classification algorithms, the improved class-center vector method has lower time complexity and higher accuracy on a large corpus.
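For orientation, the traditional class-center idea that the abstract takes as its baseline can be sketched as follows. This is a minimal illustration under stated assumptions, not the improved method proposed in the paper: the smoothed TF-IDF weighting, the cosine similarity, the toy corpus, and all function names are choices made here for the example, and the dependency-based and WordNet-based steps are not shown.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF dicts (term -> weight).

    The smoothed IDF formula is an assumption made for this sketch.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def class_centers(vectors, labels):
    """Average the document vectors of each class into one center vector."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for t, w in vec.items():
            sums[label][t] += w
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def classify(tokens, centers, idf):
    """Assign the class whose center is most similar to the document."""
    # Terms unseen during training are ignored.
    vec = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}
    return max(centers, key=lambda c: cosine(vec, centers[c]))


# Hypothetical toy corpus, used only to exercise the sketch.
train_docs = [
    ["stock", "market", "shares", "trading"],
    ["bank", "interest", "market", "rates"],
    ["match", "goal", "team", "league"],
    ["team", "coach", "season", "match"],
]
train_labels = ["finance", "finance", "sports", "sports"]

vectors, idf = tfidf_vectors(train_docs)
centers = class_centers(vectors, train_labels)
prediction = classify(["team", "goal", "season"], centers, idf)
```

Because each class is reduced to a single center, classifying a document needs only one similarity computation per class, which is where the low computational cost comes from. Each center is also indexed by every term seen in its class, which illustrates why such vectors grow large and sparse on a real vocabulary.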
Acknowledgements
This work has been supported by the Natural Science Foundation of Guangxi of China under the contract number 2018GXNSFAA138087, the National Natural Science Foundation of China under the contract numbers 61462010 and 61363036, and Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, X., Xu, Q., Chen, Y., Wu, T. (2019). An Improved Class-Center Method for Text Classification Using Dependencies and WordNet. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_1
DOI: https://doi.org/10.1007/978-3-030-32236-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)