Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content

Artificial Intelligence Review

Abstract

Semantic similarity measures based on the estimation of the information content (IC) of concepts are currently regarded as the state of the art. Calculating the IC in an intrinsic (i.e., ontology-based) way is particularly convenient due to its accuracy and lack of dependency on annotated corpora. Intrinsic IC calculation models estimate concept probabilities from the taxonomic knowledge (i.e., the number of hyponyms and/or hypernyms of the concepts) modelled in an ontology. In this paper, we aim to improve the intrinsic calculation of the IC by leveraging not only the hyponyms and hypernyms of concepts, but also the explicit evidence of synonymy and polysemy that ontologies such as WordNet model. Specifically, we propose a more accurate intrinsic estimation of the concept probabilities on which the IC calculation relies. We evaluate the accuracy of our proposal through a set of comprehensive experiments in which our IC calculation model is tested on a variety of IC-based similarity measures and benchmarks. Experimental results show that our proposal obtains consistently good accuracies, which vary less across measures and benchmarks than those of the most prominent intrinsic IC calculation models available in the literature.
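To make the notion of intrinsic IC concrete, the following is a minimal sketch of a hyponym-based baseline in the style of Seco et al. (2004), where IC(c) = 1 - log(hypo(c) + 1) / log(N) and N is the number of concepts in the taxonomy. It is not the model proposed in this paper, which additionally exploits synonymy and polysemy and whose formulation is not given in the abstract. The toy taxonomy and all function names are assumptions made for illustration only.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "poodle": "dog",
}


def ancestors(concept):
    """Return the concept itself followed by all of its hypernyms up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def hyponym_count(concept):
    """Count all direct and indirect hyponyms (descendants) of a concept."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))


def intrinsic_ic(concept):
    """Seco-style intrinsic IC: 1 - log(hypo(c) + 1) / log(total concepts)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))


def resnik_similarity(a, b):
    """Resnik-style similarity: IC of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(intrinsic_ic(c) for c in common)


if __name__ == "__main__":
    for name in PARENT:
        print(f"{name:8s} IC = {intrinsic_ic(name):.3f}")
    print("sim(poodle, cat) =", round(resnik_similarity("poodle", "cat"), 3))
```

In this sketch the root receives an IC of 0 and every leaf an IC of 1, so the score depends only on how many hyponyms a concept has. Models of this kind ignore the synonymy and polysemy evidence that the paper proposes to add.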

Acknowledgements

This work was partly supported by the European Commission (H2020-700540 project “CANVAS”) and by the Spanish Government (projects TIN2014-57364-C2-2-R “SmartGlacis”, RTI2018-095094-B-C22 “CONSENT” and TIN2016-80250-R “Sec-MCloud”). The opinions expressed in this paper are those of the authors and do not necessarily reflect the views of UNESCO.

Author information

Corresponding author

Correspondence to Montserrat Batet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Batet, M., Sánchez, D. Leveraging synonymy and polysemy to improve semantic similarity assessments based on intrinsic information content. Artif Intell Rev 53, 2023–2041 (2020). https://doi.org/10.1007/s10462-019-09725-4

  • DOI: https://doi.org/10.1007/s10462-019-09725-4
