
Measuring similarity and relatedness using multiple semantic relations in WordNet

  • Regular Paper

Knowledge and Information Systems

Abstract

Semantic similarity and relatedness computation has attracted increasing attention from researchers. Most previous studies, including edge-based and information content-based methods, rely on a single semantic relation in WordNet, such as the “is-a” relation. However, measures based solely on the “is-a” relation may have reached a performance ceiling because of this semantic unicity and inadequate calculation; that is, the scores computed for some word pairs are too low and deviate significantly from human judgments. To address this problem, we propose the following solutions: (1) Drawing on genetics theory, we introduce the notion of the nearest common descendant to supplement the commonalities between concepts. (2) We design targeted methods for the different incomplete semantic relations, so that each semantic relation can participate in similarity and relatedness computation in its most appropriate manner. (3) We use the incomplete semantic relations similar-to and antonymy in combination to address the challenge of measuring adjective and adverb similarity/relatedness in WordNet. (4) We propose a method of targeted independent computation with largest-contribution aggregation to break through the performance ceiling of similarity/relatedness measures based on the single “is-a” relation. We evaluate the proposed model on seven widely used datasets. The evaluations indicate that our method significantly improves the performance of existing methods based on the single “is-a” relation, raising their best Pearson correlation coefficient with human judgments to 0.9 on both MC30 and RG65. As the semantic relations in WordNet are further developed and enriched, the proposed model can be expected to play a more prominent role.
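Two ideas from the abstract can be made concrete: scoring a word pair independently with each relation-specific measure and keeping the largest contribution, and evaluating by Pearson correlation against human ratings. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The `Scorer` type, the relation-specific scorer functions, and the rule of taking the plain maximum over relations are hypothetical. The paper's actual per-relation formulas (including the nearest-common-descendant component) are not reproduced here.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type: a relation-specific scorer maps a word pair to a score.
Scorer = Callable[[str, str], float]


def largest_contribution(w1: str, w2: str, scorers: Dict[str, Scorer]) -> float:
    """Score the pair independently with each relation-specific scorer and
    keep the largest value (assumed aggregation rule); 0.0 if no scorers."""
    return max((score(w1, w2) for score in scorers.values()), default=0.0)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def evaluate(pairs: List[Tuple[str, str, float]], scorers: Dict[str, Scorer]) -> float:
    """Correlate aggregated scores with human ratings given as (w1, w2, rating)."""
    predicted = [largest_contribution(w1, w2, scorers) for w1, w2, _ in pairs]
    human = [rating for _, _, rating in pairs]
    return pearson(predicted, human)
```

With the maximum as the aggregation rule, a pair that the "is-a" measure scores too low can still receive a high score when some other relation supplies strong evidence. This mirrors the motivation given in the abstract, but whether the paper uses an unweighted maximum is an assumption here.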




Notes

  1. http://wordnet.princeton.edu/wordnet/download/current-version/.

  2. http://projects.csail.mit.edu/jwi/.


Acknowledgements

This work was supported by the Natural Science Foundation of Guangxi of China under contract number 2018GXNSFAA138087, the National Natural Science Foundation of China under contract numbers 61462010 and 61363036, the Innovation Project of Guangxi Graduate Education under contract number XYCSZ2019064, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author

Correspondence to Xinhua Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhu, X., Yang, X., Huang, Y. et al. Measuring similarity and relatedness using multiple semantic relations in WordNet. Knowl Inf Syst 62, 1539–1569 (2020). https://doi.org/10.1007/s10115-019-01387-6

  • DOI: https://doi.org/10.1007/s10115-019-01387-6