An Improved Class-Center Method for Text Classification Using Dependencies and WordNet

Zhu, Xinhua; Xu, Qingting; Chen, Yishan; Wu, Tianjun

doi:10.1007/978-3-030-32236-6_1

Conference paper
First Online: 30 September 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11839)

Abstract

Automatic text classification is a research focus and a core technology in natural language processing and information retrieval. The class-center vector method is an important text classification method that requires little computation and runs efficiently. However, the traditional class-center vector method has three drawbacks: its class vectors are large and sparse, its classification accuracy is limited, and it lacks semantic information. To overcome these problems, this paper proposes an improved class-center method for text classification that uses dependencies and the WordNet dictionary. Experiments show that, compared with traditional text classification algorithms, the improved class-center vector method has lower time complexity and higher accuracy on a large corpus.
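
For orientation, the traditional class-center baseline that the abstract refers to can be sketched as follows. This is a minimal illustration, not the authors' improved method: it uses neither dependencies nor WordNet. The TF-IDF weighting, the cosine similarity measure, the toy corpus, and all function names are assumptions introduced here for illustration.

```python
import math
from collections import Counter, defaultdict


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF dict per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Add 1 so that terms occurring in every document keep a non-zero weight.
    idf = {term: math.log(n / df[term]) + 1.0 for term in df}
    vectors = [
        {term: count / (len(doc) or 1) * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def class_centers(vectors, labels):
    """Average the document vectors of each class into a single center vector."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            sums[label][term] += weight
    return {
        label: {term: weight / counts[label] for term, weight in terms.items()}
        for label, terms in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, centers, idf):
    """Assign the label whose class center is most similar to the document."""
    counts = Counter(t for t in tokens if t in idf)  # ignore unseen terms
    total = len(tokens) or 1
    vec = {term: count / total * idf[term] for term, count in counts.items()}
    return max(centers, key=lambda label: cosine(vec, centers[label]))


# Toy corpus (hypothetical data, for illustration only).
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "bank"],
    ["match", "goal", "team"],
    ["team", "player", "goal"],
]
train_labels = ["finance", "finance", "sport", "sport"]

vectors, idf = tf_idf_vectors(train_docs)
centers = class_centers(vectors, train_labels)
print(classify(["goal", "player"], centers, idf))
print(classify(["bank", "price"], centers, idf))
```

Each class center here spans every term seen in its class, which illustrates the large, sparse class vectors the abstract identifies as a drawback; classification itself needs only one similarity computation per class.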

Acknowledgements

This work has been supported by the Natural Science Foundation of Guangxi of China under contract number 2018GXNSFAA138087, the National Natural Science Foundation of China under contract numbers 61462010 and 61363036, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Authors and Affiliations

Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, China
Xinhua Zhu, Qingting Xu, Yishan Chen & Tianjun Wu
International Business School, Guilin Tourism University, Guilin, China
Yishan Chen

Corresponding author

Correspondence to Yishan Chen.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Jie Tang
National University of Singapore, Singapore, Singapore
Min-Yen Kan
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Sujian Li
Zhengzhou University, Zhengzhou, China
Hongying Zan

About this paper

Cite this paper

Zhu, X., Xu, Q., Chen, Y., Wu, T. (2019). An Improved Class-Center Method for Text Classification Using Dependencies and WordNet. In: Tang, J., Kan, MY., Zhao, D., Li, S., Zan, H. (eds) Natural Language Processing and Chinese Computing. NLPCC 2019. Lecture Notes in Computer Science, vol 11839. Springer, Cham. https://doi.org/10.1007/978-3-030-32236-6_1

DOI: https://doi.org/10.1007/978-3-030-32236-6_1
Published: 30 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-32235-9
Online ISBN: 978-3-030-32236-6
eBook Packages: Computer Science, Computer Science (R0)
