
SISR: System for integrating semantic relatedness and similarity measures

  • Methodologies and Application
  • Soft Computing

Abstract

Semantic similarity and relatedness measures have increasingly become core elements of recent research in the semantic technology community. Nowadays, the search for efficient meaning-centered applications that exploit computational semantics has become a necessity. Researchers have therefore become increasingly interested in developing a model that can simulate the human thinking process and is capable of measuring semantic similarity or relatedness between lexical terms, including concepts and words. Knowledge resources are fundamental for quantifying semantic similarity or relatedness and for best expressing semantic content. No fully developed system able to centralize these approaches is currently available to the research and industrial communities. In this paper, we propose a System for Integrating Semantic Relatedness and similarity measures, SISR, which aims to provide a variety of tools for computing semantic similarity and relatedness. This system is the first to treat the topic of computing semantic relatedness with a view to integrating different key stakeholders in a parameterized way. As an instance of the proposed architecture, we propose WNetSS, a Java API that makes available a wide range of WordNet-based semantic similarity measures belonging to different categories, including taxonomy-based, feature-based and information content (IC)-based measures. It is the first API that allows the extraction of the topological parameters of the WordNet “is a” taxonomy that are used to express the semantics of concepts. Moreover, an evaluation module is proposed to assess the reproducibility of the measures' accuracy, which can be evaluated on 10 widely used benchmarks through correlation coefficients.
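The abstract states that measures are evaluated against benchmarks through correlation coefficients, that is, by comparing a measure's scores with human ratings for the same word pairs. The following is a minimal Python sketch of that comparison, assuming the two coefficients most commonly used for this purpose (Pearson's r and Spearman's ρ); it is not the WNetSS evaluation module, the function names are hypothetical, and the numbers are made up for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank lists."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    # Made-up numbers for illustration only (not taken from any benchmark):
    # human ratings for four word pairs and the scores a measure gave them.
    human = [3.9, 3.5, 1.2, 0.4]
    measure = [0.95, 0.80, 0.30, 0.35]
    print("Pearson r:", round(pearson(human, measure), 3))
    print("Spearman rho:", round(spearman(human, measure), 3))  # 0.8
```

Spearman's ρ compares only the orderings, so it ignores the different scales of human ratings and measure outputs; Pearson's r additionally rewards a linear relationship between them.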


Figs. 1 to 10 (figure images not reproduced)


Notes

  1. http://wnetss-api.smr-team.org/.

  2. http://wiki.dbpedia.org/About.

  3. Resource Description Framework (RDF) is a graph-based data model for formally describing Web resources and their metadata so that such descriptions can be processed automatically.

  4. The information content (IC)-based approach quantifies the similarity between two concepts as a function of the IC that both concepts share in a given ontology. The basic idea is that general, abstract entities found in a discourse carry less IC than more concrete, specialized ones.

  5. Brown Corpus: The Brown University Standard Corpus of Present-Day American English (or simply the Brown Corpus) was compiled in the 1960s by Henry Kučera and W. Nelson Francis at Brown University, Providence, Rhode Island, as a general corpus (text collection) for corpus linguistics. It contains 500 samples of English-language text, totaling roughly one million words, compiled from works published in the USA in 1961.

  6. In simpler terms, a hyponym is linked to its hypernym by a type-of (“is a”) relationship. For example, pigeon, crow, eagle and seagull are all hyponyms of bird (their hypernym), which in turn is a hyponym of animal.

  7. POS: Part Of Speech

  8. http://clic.cimec.unitn.it/~elia.bruni/MEN.html.

  9. http://wacky.sslmit.unibo.it/doku.php?id=corpora.

  10. http://www.cl.cam.ac.uk/~fh295/simlex.html.

  11. http://extjwnl.sourceforge.net/.
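Notes 4 and 6 describe the IC idea and the hyponym/hypernym relation, and the abstract says WNetSS extracts topological parameters from the WordNet “is a” taxonomy. The following is a minimal Python sketch on a toy taxonomy built from note 6's bird example; it is not WNetSS (a Java API over WordNet), and all function names are hypothetical. It assumes depth and hyponym count as representative topological parameters, and swaps in three well-known measures as examples of the categories named in the abstract: Wu and Palmer (1994) for taxonomy-based, and Resnik's measure computed with the intrinsic IC of Seco et al. (2004) for IC-based.

```python
import math

# Toy "is a" taxonomy built from the example in note 6 (child -> hypernym).
# Single inheritance with one root, unlike WordNet, which allows several hypernyms.
PARENT = {
    "bird": "animal",
    "pigeon": "bird",
    "crow": "bird",
    "eagle": "bird",
    "seagull": "bird",
}


def concepts(parent):
    """Every concept that appears in the taxonomy."""
    return set(parent) | set(parent.values())


def ancestors(c, parent):
    """c itself followed by its hypernyms, up to the root."""
    chain = [c]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def depth(c, parent):
    """Topological parameter: depth of c, counting the root as depth 1."""
    return len(ancestors(c, parent))


def hyponym_count(c, parent):
    """Topological parameter: number of direct and indirect hyponyms of c."""
    return sum(1 for x in concepts(parent) if x != c and c in ancestors(x, parent))


def lcs(c1, c2, parent):
    """Least common subsumer: the deepest concept subsuming both c1 and c2."""
    above_c2 = set(ancestors(c2, parent))
    for a in ancestors(c1, parent):  # walks upward, so the first hit is deepest
        if a in above_c2:
            return a
    raise ValueError("concepts share no subsumer")


def wu_palmer(c1, c2, parent):
    """Taxonomy-based measure: 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    d = depth(lcs(c1, c2, parent), parent)
    return 2 * d / (depth(c1, parent) + depth(c2, parent))


def intrinsic_ic(c, parent):
    """Intrinsic IC (Seco et al. 2004): 1 - log(hypo(c) + 1) / log(N)."""
    n = len(concepts(parent))
    return 1.0 - math.log(hyponym_count(c, parent) + 1) / math.log(n)


def resnik(c1, c2, parent):
    """IC-based measure: IC of the most informative common subsumer."""
    common = set(ancestors(c1, parent)) & set(ancestors(c2, parent))
    return max(intrinsic_ic(a, parent) for a in common)


if __name__ == "__main__":
    for c in ("animal", "bird", "pigeon"):
        print(c, depth(c, PARENT), hyponym_count(c, PARENT), round(intrinsic_ic(c, PARENT), 3))
    print("wu_palmer(pigeon, crow) =", round(wu_palmer("pigeon", "crow", PARENT), 3))
    print("resnik(pigeon, crow) =", round(resnik("pigeon", "crow", PARENT), 3))
```

The intrinsic IC matches the intuition of note 4: the root "animal" subsumes every other concept and gets an IC of 0, while leaves such as "pigeon" have no hyponyms and get the maximum IC of 1.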


Acknowledgements

The authors would like to express their sincere gratitude to Mr. Anouar Smaoui from the English Language Unit at the Faculty of Science of Sfax, Tunisia, for his valuable proofreading and language polishing services.

Author information

Corresponding author

Correspondence to Mohamed Ali Hadj Taieb.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by V. Loia.

About this article

Cite this article

Ben Aouicha, M., Hadj Taieb, M.A. & Ben Hamadou, A. SISR: System for integrating semantic relatedness and similarity measures. Soft Comput 22, 1855–1879 (2018). https://doi.org/10.1007/s00500-016-2438-x

  • DOI: https://doi.org/10.1007/s00500-016-2438-x