Abstract
Document (text) classification is widely used in e-business to help users collect, analyze, categorize, and store documents. However, few previous methods approach classification from the perspective of semantic analysis. This paper proposes two novel semantic document classification strategies that address two semantic problems: (1) the polysemy problem, through a novel semantic similarity computing strategy (SSC), and (2) the synonym problem, through a novel strong correlation analysis method (SCM). Experiments show that the proposed strategies improve document classification performance compared with traditional approaches.
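The abstract does not describe how SSC or SCM work internally. To make the two problems concrete, the following is a minimal, generic sketch that is not the authors' SSC or SCM. It assumes that a polysemous word is resolved to the sense whose gloss is most cosine-similar to the word's context, and that synonyms are collapsed onto one shared feature before classification. The sense inventory `SENSES`, the synonym table `SYNONYMS`, and the helper names are hypothetical placeholders; a real system would draw them from a lexical resource.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse bag-of-words vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical sense inventory: each sense of a polysemous word is described
# by a small bag of gloss words (illustrative data only).
SENSES = {
    "apple": {
        "apple#fruit": Counter(["fruit", "eat", "tree", "juice"]),
        "apple#company": Counter(["company", "phone", "computer", "stock"]),
    }
}

# Hypothetical synonym table mapping surface words to one shared concept id.
SYNONYMS = {"buy": "purchase", "acquire": "purchase", "purchase": "purchase"}

def disambiguate(word, context):
    """Polysemy: pick the sense whose gloss is most similar to the context."""
    senses = SENSES.get(word)
    if not senses:
        return word
    ctx = Counter(context)
    # max() keeps the first sense listed when scores tie.
    return max(senses, key=lambda s: cosine(senses[s], ctx))

def to_semantic_features(tokens):
    """Map a token list to features: senses for polysemous words,
    shared concept ids for synonyms, raw tokens otherwise."""
    features = Counter()
    for i, tok in enumerate(tokens):
        context = tokens[:i] + tokens[i + 1:]
        sense = disambiguate(tok, context)
        features[SYNONYMS.get(sense, sense)] += 1
    return features
```

For example, `to_semantic_features(["investors", "buy", "apple", "stock"])` maps "apple" to `apple#company` (its context shares "stock" with that gloss) and "buy" to the shared concept `purchase`, so a document that says "acquire" instead of "buy" yields the same feature. Any downstream classifier would then operate on these semantic features rather than on raw words.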
Notes
1. PaddlePaddle: http://www.paddlepaddle.org/.
2. Partial source code of the experiment can be found at https://github.com/yangshuodelove/DocEng19/.
Acknowledgment
This research is supported by the National Natural Science Foundation of China (grant no. 61802079) and by Guangzhou University (grant no. 2900603143).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, S., Wei, R., Guo, J. (2020). Semantic Document Classification Based on Semantic Similarity Computation and Correlation Analysis. In: Chao, KM., Jiang, L., Hussain, O., Ma, SP., Fei, X. (eds) Advances in E-Business Engineering for Ubiquitous Computing. ICEBE 2019. Lecture Notes on Data Engineering and Communications Technologies, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-030-34986-8_1
DOI: https://doi.org/10.1007/978-3-030-34986-8_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34985-1
Online ISBN: 978-3-030-34986-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)