Improving word similarity computation accuracy by multiple parameter optimization based on ontology knowledge

Multimedia Tools and Applications

Abstract

Word similarity computation is one of the most fundamental research areas in semantic information processing. Prior studies on Chinese word similarity computation have mostly adopted rule-based methods, while several studies on English word similarity computation have relied on the well-known knowledge base WordNet. English word similarity computation methods cannot be applied directly to Chinese word similarity computation. We therefore identify an ontology knowledge base whose hierarchical structure is similar to that of WordNet. With its help, we develop an improved Chinese word similarity computation method that incorporates the common depth, depth parameter, depth adjustment parameter, concept relation parameter, density parameter and differential value into the computation process. First, we perform an in-depth analysis of the strengths and weaknesses of existing word semantic similarity computation approaches. Then, we investigate the effect of several factors on word semantic similarity computation. Finally, we use the hierarchical tree structure of the ontology knowledge base to improve word similarity computation accuracy. The experimental results show that our proposed method outperforms state-of-the-art methods. Network public opinion is the mapping of social public opinion onto the Internet. Similarity computation can support an online public opinion platform with prediction and early warning that quickly identifies pivotal points in public opinion and provides rich data support for management.
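The role of common depth in a hierarchical tree can be illustrated with a minimal sketch. The abstract does not give the authors' formula, so the code below substitutes the classic Wu-Palmer depth-based measure as a stand-in; it omits the depth adjustment, concept relation, density and differential terms named above. The toy hierarchy `PARENT` and all function names are hypothetical and are not taken from the paper or from any real knowledge base.

```python
# Hypothetical toy hierarchy (child -> parent); not from the paper's ontology.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(node):
    """Return the chain of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth of a node, counting the root as depth 1."""
    return len(path_to_root(node))


def common_depth(a, b):
    """Depth of the deepest ancestor shared by `a` and `b`."""
    ancestors_a = set(path_to_root(a))
    # Walking upward from b, the first shared node is the deepest one.
    for node in path_to_root(b):
        if node in ancestors_a:
            return depth(node)
    return 0


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * common depth / (depth(a) + depth(b))."""
    return 2 * common_depth(a, b) / (depth(a) + depth(b))
```

In this toy tree, "dog" and "cat" share the ancestor "animal" at depth 2, so they score higher than "dog" and "tree", which share only the root. A multi-parameter method of the kind the abstract describes would presumably refine such a baseline score with additional weights.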

Data availability

Data is available from the authors upon reasonable request.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China under Grant 41930429; CNPC Major Science and Technology Project (ZD2019-183-006); special fund for basic scientific research operations of central universities (20CX05017A); joint funding.

Author information

Corresponding author

Correspondence to Laith Abualigah.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Sun, Q., Xu, J., Duan, Y. et al. Improving word similarity computation accuracy by multiple parameter optimization based on ontology knowledge. Multimed Tools Appl 83, 17469–17489 (2024). https://doi.org/10.1007/s11042-023-16122-1

  • DOI: https://doi.org/10.1007/s11042-023-16122-1
