Abstract
Word similarity computation is one of the most fundamental research areas in semantic information processing. Prior studies on Chinese word similarity computation have mostly adopted rule-based methods, while some studies on English word similarity computation have used the well-known knowledge base WordNet. Methods developed for English cannot be applied directly to Chinese word similarity computation. We therefore adopt an ontology knowledge base whose hierarchical structure is similar to that of WordNet and use it to develop an improved Chinese word similarity computation method that incorporates the common depth, depth parameter, depth adjustment parameter, concept relation parameter, density parameter and differential value. First, we analyze in depth the strengths and weaknesses of existing word semantic similarity computation approaches; then, we investigate the effect of several factors on word semantic similarity computation. Finally, we use the hierarchical tree structure of the ontology knowledge base to improve the accuracy of word similarity computation. The experimental results show that our proposed method outperforms state-of-the-art methods. Network public opinion is the mapping of social public opinion onto the Internet. Using similarity calculation, an online public opinion platform with prediction and early warning can be built to quickly locate the pivotal points of public opinion, providing rich data support for management.
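To make the role of "common depth" in a hierarchical tree concrete, the following is a minimal sketch of a depth-based similarity in the style of the classic Wu-Palmer measure. It is not the method proposed in this article: the toy hierarchy, the names `PARENT`, `path_to_root`, `common_ancestor` and `depth_similarity`, and the formula itself are illustrative assumptions, and the depth adjustment, concept relation, density and differential parameters named in the abstract are not modeled.

```python
# Toy is-a hierarchy (hypothetical data, not taken from any real knowledge base).
# Each key is a concept; its value is the parent concept (None for the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(node):
    """Return the list of concepts from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(node))


def common_ancestor(a, b):
    """Deepest concept that lies on both root paths."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first shared concept is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def depth_similarity(a, b):
    """Wu-Palmer-style score: 2 * common depth / (depth(a) + depth(b))."""
    common_depth = depth(common_ancestor(a, b))
    return 2 * common_depth / (depth(a) + depth(b))


if __name__ == "__main__":
    print(depth_similarity("dog", "cat"))   # siblings under "animal"
    print(depth_similarity("dog", "tree"))  # related only through the root
```

In this sketch, two concepts that share a deeper common ancestor score higher than two that meet only at the root, which is the intuition behind using common depth as one factor of a tree-based similarity.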
Data availability
Data is available from the authors upon reasonable request.
Acknowledgments
This work is partially supported by the National Natural Science Foundation of China under Grant 41930429; CNPC Major Science and Technology Project (ZD2019-183-006); special fund for basic scientific research operations of central universities (20CX05017A); joint funding.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Sun, Q., Xu, J., Duan, Y. et al. Improving word similarity computation accuracy by multiple parameter optimization based on ontology knowledge. Multimed Tools Appl 83, 17469–17489 (2024). https://doi.org/10.1007/s11042-023-16122-1
DOI: https://doi.org/10.1007/s11042-023-16122-1