A survey of semantic relatedness evaluation datasets and procedures

Artificial Intelligence Review

Abstract

Semantic relatedness between words is a core concept in natural language processing. While countless approaches have been proposed, determining which one works best remains challenging. In this article, we therefore give a comprehensive overview of the evaluation protocols and datasets for semantic relatedness, covering both intrinsic and extrinsic approaches. On the intrinsic side, we give an overview of evaluation datasets covering more than 100 datasets in 20 different languages from a wide range of domains. To give researchers better guidance for selecting a suitable dataset, or even building new and better ones, we also describe the construction and annotation process of the datasets. We also briefly describe the evaluation metrics most frequently used for intrinsic evaluation. On the extrinsic side, we detail several applications involving semantic relatedness measures through recent research and explain the benefit the measures bring.
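As a rough illustration of intrinsic evaluation, the sketch below correlates a measure's scores with human ratings over the same word pairs. The abstract does not name the metrics, so the choice of Spearman rank correlation is an assumption here, and the word pairs, ratings, and scores are hypothetical values made up for the example, not taken from any dataset in the survey.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical human ratings (0-10 scale) for four word pairs.
gold = {
    ("car", "automobile"): 9.5,
    ("coast", "shore"): 9.0,
    ("bird", "crane"): 7.0,
    ("noon", "string"): 0.5,
}
# Hypothetical scores produced by some relatedness measure for the same pairs.
model_scores = {
    ("car", "automobile"): 0.90,
    ("coast", "shore"): 0.70,
    ("bird", "crane"): 0.75,
    ("noon", "string"): 0.10,
}

pairs = list(gold)
rho = spearman([gold[p] for p in pairs], [model_scores[p] for p in pairs])
print(f"Spearman correlation with human ratings: {rho:.2f}")
```

A rank-based coefficient only asks the measure to order the pairs as the annotators did, so it does not depend on the scale of the scores. Calling `pearson` on the raw values instead would also reward a linear match with the ratings.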

Figs. 1–6 (images not shown). Fig. 5 is adapted from Bär et al. (2015).

Notes

  1. Surveys of semantic relatedness approaches include Feng et al. (2017), Harispe et al. (2015), and Zhang et al. (2012).

  2. https://github.com/MohamedAliHadjTaieb/Semantic-measure-assessment-review-study.

  3. https://github.com/Lambda-3/Gold-Standards/tree/master/SemR-11.

  4. Semantic transparency is the degree to which the meaning of a compound word or an idiom can be inferred from its parts (or morphemes) (Bell and Schäfer 2016). The word blueberry is semantically transparent; the word strawberry is not.

  5. The term preferred-relation (such as hyponym-hypernym pairs) is used to denote the relation which the model should prefer, and unpreferred-relation to denote any other relation.

  6. http://odp.org/.

  7. https://aclweb.org/aclwiki/Google_analogy_test_set_(State_of_the_art).

  8. https://github.com/dkpro/dkpro-similarity/releases.

  9. https://dkpro.github.io/dkpro-core/.

  10. http://www.semanticsimilarity.org/.

  11. http://www.linguatools.de/disco/disco-download_en.html.

  12. http://www.nltk.org/.

  13. https://radimrehurek.com/gensim.

  14. https://shilad.github.io/wikibrain/.

  15. http://takelab.fer.hr/sts.

  16. http://www.marekrei.com/projects/semsim/.

  17. http://mechaglot.sourceforge.net.

  18. https://github.com/fozziethebeat/S-Space.

  19. https://deeplearning4j.org/docs/latest/deeplearning4j-nlp-word2vec.

  20. https://radimrehurek.com/gensim/.

  21. https://github.com/composes-toolkit/dissect.

  22. http://ltmaggie.informatik.uni-hamburg.de/jobimviz/.

  23. https://github.com/dscarvalho/easyesa.

  24. https://github.com/Lambda-3/Indra.

  25. https://data.mendeley.com/datasets/t87s78dg78/4.

  26. https://dkpro.github.io/dkpro-tc/.

  27. http://www.semantic-measures-library.org.

  28. A comparison with other tools is provided at: https://github.com/sharispe/sm-tools-evaluation.

  29. https://pypi.org/project/fastsemsim/.

  30. https://files.ifi.uzh.ch/ddis/oldweb/ddis/research/simpack/.

  31. http://semmf.ag-nbi.de/doc/index.html.

  32. http://ontosim.gforge.inria.fr/.

  33. https://code.google.com/p/ytex/wiki/SemanticSim_V06.

  34. https://simlibrary.wordpress.com/.

  35. http://wn-similarity.sourceforge.net/.

  36. http://code.google.com/p/ws4j/.

  37. http://umls-similarity.sourceforge.net/.

  38. https://github.com/monarch-initiative/owlsim-v3.

  39. http://210.46.85.150/platform/dosim/.

  40. http://www.bioconductor.org/packages/release/bioc/html/DOSE.html.

  41. http://serelex.cental.be/.

  42. http://sematch.cluster.gsi.dit.upm.es/.

  43. https://github.com/gsi-upm/sematch.

  44. https://github.com/jjlastra/HESML.

  45. http://wnetss-api.smr-team.org/.

  46. https://dkpro.github.io/.

  47. http://semanticsimilarity.org/.

  48. https://www.linguatools.de/disco/.

  49. http://swoogle.umbc.edu/SimService/.

  50. https://omictools.com/intego2-tool.

  51. http://xldb.di.fc.ul.pt/tools/cessm/.

  52. http://alt.qcri.org/semeval2019/.

References

  • Abdi A, Idris N, Alguliyev RM, Aliguliyev RM (2015) Pdlk: plagiarism detection using linguistic knowledge. Expert Syst Appl 42(22):8936–8946

  • Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A (2009) A study on similarity and relatedness using distributional and wordnet-based approaches. In: Proceedings of human language technologies: the 2009 annual conference of the North American chapter of the association for computational linguistics, NAACL’09. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 19–27

  • Agirre E, Diab M, Cer D, Gonzalez-Agirre A (2012) Semeval-2012 task 6: a pilot on semantic textual similarity. In: Proceedings of the first joint conference on lexical and computational semantics—volume 1: proceedings of the main conference and the shared task, and volume 2: proceedings of the sixth international workshop on semantic evaluation, SemEval’12. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 385–393

  • Akhtar SS, Gupta A, Vajpayee A, Srivastava A, Shrivastava M (2017) Word similarity datasets for Indian languages: annotation and baseline systems. In: Proceedings of the 11th linguistic annotation workshop at ACL, pp 91–94

  • Akmal S, Shih LH, Batres R (2014) Ontology-based similarity for product information retrieval. Comput Ind 65(1):91–107

  • Alkhatlan A, Kalita J, Alhaddad A (2018) Word sense disambiguation for arabic exploiting arabic wordnet and word embedding. Proc Comput Sci 142:50–60

  • Almarsoomi FA, O’Shea J, Bandar Z, Crockett KA (2013) AWSS: an algorithm for measuring arabic word semantic similarity. In: IEEE international conference on systems, man, and cybernetics, Manchester, SMC 2013, United Kingdom, October 13–16, 2013, pp 504–509

  • Almuhareb A (2006) Attributes in lexical acquisition. Ph.D. thesis, University of Essex, England, Essex

  • Angelos H, Giannis V, Epimeneidis V, Euripides GMP, Evangelos M (2006) Information retrieval by semantic similarity. J Semant Web Inf Syst (IJSWIS) 3(3):55–73

  • Araque O, Zhu G, García-Amado M, Iglesias CA (2016) Mining the opinionated web: classification and detection of aspect contexts for aspect based sentiment analysis. In: 2016 IEEE 16th international conference on data mining workshops (ICDMW), pp 900–907

  • Araque O, Zhu G, Iglesias CA (2019) A semantic similarity-based perspective of affect lexicons for sentiment analysis. Knowl Based Syst 165:346–359

  • Artstein R (2017) Inter-annotator agreement. Handbook of linguistic annotation. Springer, Dordrecht, pp 297–313

  • Avraham O, Goldberg Y (2016) Improving reliability of word similarity evaluation by redesigning annotation task and performance measure. In: RepEval@ACL. Association for Computational Linguistics, pp 106–110

  • Baker S, Reichart R, Korhonen A (2014) An unsupervised model for instance level subcategorization acquisition. In: Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, a meeting of SIGDAT, a special interest group of the ACL, pp 278–289

  • Ballatore A, Bertolotto M, Wilson DC (2014) An evaluative baseline for geo-semantic relatedness and similarity. Geoinformatica 18(4):747–767

  • Banerjee S, Pedersen T (2003) Extended gloss overlaps as a measure of semantic relatedness. In: Proceedings of the eighteenth international joint conference on artificial intelligence, pp 805–810

  • Bangalore S, Haffner P, Kanthak S (2007) Statistical machine translation through global lexical selection and sentence reconstruction. In: ACL 2007, proceedings of the 45th annual meeting of the Association for Computational Linguistics, June 23–30, 2007, Prague, Czech Republic

  • Bär D, Zesch T, Gurevych I (2011) A reflective view on text similarity. In: Angelova G, Bontcheva K, Mitkov R, Nicolov N (eds) RANLP. RANLP 2011 organising committee, pp 515–520

  • Bär D, Biemann C, Gurevych I, Zesch T (2012a) UKP: computing semantic textual similarity by combining multiple content similarity measures. In: Proceedings of the 6th international workshop on semantic evaluation, held in conjunction with the 1st joint conference on lexical and computational semantics, pp 435–440

  • Bär D, Zesch T, Gurevych I (2012b) Text reuse detection using a composition of text similarity measures. In: Proceedings of the 24th international conference on computational linguistics (COLING 2012). Mumbai, India, pp 167–184. http://www.aclweb.org/anthology/C12-1011

  • Bär D, Zesch T, Gurevych I (2013) Dkpro similarity: an open source framework for text similarity. In: Proceedings of the 51st annual meeting of the Association for Computational Linguistics: system demonstrations. Association for Computational Linguistics, pp 121–126

  • Bär D, Zesch T, Gurevych I (2015) Composing measures for computing text similarity. Technical report

  • Baroni M, Lenci A (2011) How we BLESSed distributional semantic evaluation. In: Proceedings of the GEMS 2011 workshop on geometrical models of natural language semantics. Association for Computational Linguistics, Edinburgh, UK, pp 1–10

  • Baroni M, Murphy B, Barbu E, Poesio M (2010) Strudel: a corpus-based semantic model based on properties and types. Cognit Sci 34(2):222–254

  • Barzegar S, Sales JE, Freitas A, Handschuh S, Davis B (2015) Dinfra: a one stop shop for computing multilingual semantic relatedness. In: Proceedings of the 38th international ACM SIGIR conference on research and development in information retrieval, SIGIR’15. New York, NY, USA, pp 1027–1028

  • Barzegar S, Davis B, Zarrouk M, Handschuh S, Freitas A (2018) Semr-11: a multi-lingual gold-standard for semantic similarity and relatedness for eleven languages. In: Proceedings of the eleventh international conference on language resources and evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018

  • Bell MJ, Schäfer M (2016) Modelling semantic transparency. Morphology 26(2):157–199

  • Ben Aouicha M, Hadj Taieb MA, Ibn Marai H (2016a) WSD-TIC: word sense disambiguation using taxonomic information content. In: Computational collective intelligence—8th international conference, ICCCI 2016, Halkidiki, Greece, September 28–30, 2016, proceedings, part I, pp 131–142

  • Ben Aouicha M, Hadj Taieb MA, Ben Hamadou A (2016b) Taxonomy-based information content and wordnet-wiktionary-wikipedia glosses for semantic relatedness. Appl Intell 45(2):475–511

  • Ben Aouicha M, Hadj Taieb MA, Ben Hamadou A (2018a) SISR: system for integrating semantic relatedness and similarity measures. Soft Comput 22(6):1855–1879

  • Ben Aouicha M, Hadj Taieb M, Ibn Marai H (2018b) Wordnet and wiktionary-based approach for word sense disambiguation. Trans Comput Collective Intell 29:123–143

  • Bernstein A, Kaufmann E, Kiefer C, Bürki C (2005) Simpack: a generic java library for similarity measures in ontologies. Technical report

  • Biemann C, Riedl M (2013) Text: now in 2D! A framework for lexical expansion with contextual similarity. J Lang Model 1(1):55–95

  • Bird S (2006) Nltk: the natural language toolkit. In: Proceedings of the COLING/ACL on Interactive presentation sessions, COLING-ACL’06. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 69–72

  • Bjerva J, Östling R (2017) Cross-lingual learning of semantic textual similarity with multilingual word representations. In: Proceedings of the 21st nordic conference on computational linguistics. Association for Computational Linguistics, pp 211–215

  • Blair P, Merhav Y, Barry J (2017) Automated generation of multilingual clusters for the evaluation of distributed representations. In: 5th international conference on learning representations, ICLR 2017, Toulon, France, April 24–26, 2017, workshop track proceedings

  • Bollegala D, Matsuo Y, Ishizuka M (2007) Measuring semantic similarity between words using web search engines. In: WWW’07: proceedings of the 16th international conference on world wide web. ACM, pp 757–766

  • Bruni E, Tran NK, Baroni M (2014) Multimodal distributional semantics. J Artif Int Res 49(1):1–47

  • Budanitsky A, Hirst G (2006) Evaluating wordnet-based measures of semantic distance. Comput Linguist 32(1):13–47

  • Camacho-Collados J, Navigli R (2016) Find the word that does not belong: a framework for an intrinsic evaluation of word vector representations. In: Proceedings of the 1st workshop on evaluating vector-space representations for NLP. Association for Computational Linguistics, Berlin, Germany, pp 43–50

  • Camacho-Collados J, Pilehvar MT, Navigli R (2015) A framework for the construction of monolingual and cross-lingual word similarity datasets. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the asian federation of natural language processing, ACL 2015, July 26–31, 2015, Beijing, China, vol 2, pp 1–7

  • Camacho-Collados J, Pilehvar MT, Collier N, Navigli R (2017) Semeval-2017 task 2: multilingual and cross-lingual semantic word similarity. Vancouver, Canada

  • Carvalho D, Çalli C, Freitas A, Curry E (2014) Easyesa: a low-effort infrastructure for explicit semantic analysis. In: Proceedings of the 2014 international conference on posters & demonstrations track, ISWC-PD’14, vol 1272. Aachen, Germany, pp 177–180

  • Cer DM, Diab MT, Agirre E, Lopez-Gazpio I, Specia L (2017) Semeval-2017 task 1: semantic textual similarity multilingual and crosslingual focused evaluation. In: Proceedings of the 11th international workshop on semantic evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3–4, 2017, pp 1–14

  • Chen F, Lu C, Wu H, Li M (2017) A semantic similarity measure integrating multiple conceptual relationships for web service discovery. Expert Syst Appl 67:19–31

  • Chen Z, Song J, Yang Y (2018) An approach to measuring semantic relatedness of geographic terminologies using a thesaurus and lexical database sources. ISPRS Int J Geo-Inf 7(3):98

  • Cilibrasi RL, Vitanyi PMB (2007) The google similarity distance. IEEE Trans Knowl Data Eng 19(3):370–383

  • Cinková S (2016) WordSim353 for czech. Springer, Cham, pp 190–197

  • Cohen KB, Xia J, Zweigenbaum P, Callahan T, Hargraves O, Goss F, Ide N, Névéol A, Grouin C, Hunter LE (2018) Three dimensions of reproducibility in natural language processing. In: Proceedings of the eleventh international conference on language resources and evaluation (LREC 2018). European Language Resources Association (ELRA), Miyazaki, Japan

  • Curran JR (2002) Ensemble methods for automatic thesaurus extraction. In: Proceedings of conference on empirical methods in natural language processing, pp 222–229

  • David J, Euzenat J (2008) Comparison between ontology distances (preliminary results). In: Sheth A, Staab S, Dean M, Paolucci M, Maynard D, Finin T, Thirunarayan K (eds) The semantic web-ISWC 2008. Springer, Berlin, pp 245–260

  • de Saussure F (1983) Course in general linguistics. Duckworth, London ([1916] 1983). (trans. Roy Harris)

  • Dinu G, Pham NT, Baroni M (2013) DISSECT—DIStributional SEmantics composition toolkit. In: Proceedings of the 51st annual meeting of the association for computational linguistics: system demonstrations. Association for Computational Linguistics, Sofia, Bulgaria, pp 31–36

  • Egozi O, Gabrilovich E, Markovitch S (2008) Concept-based feature generation and selection for information retrieval. In: Proceedings of the twenty-third AAAI conference on artificial intelligence

  • Ensan F, Du W (2018) Ad hoc retrieval via entity linking and semantic similarity. Knowl Inf Syst 58:551–583

  • Ercan G, Yildiz OT (2018) Anlamver: semantic model evaluation dataset for turkish—word similarity and relatedness. In: Proceedings of the 27th international conference on computational linguistics, COLING 2018, Santa Fe, New Mexico, USA, August 20–26, 2018, pp 3819–3836

  • Fellbaum C (ed) (1998) WordNet an electronic lexical database. The MIT Press, Cambridge

  • Feng Y, Bagheri E, Ensan F, Jovanovic J (2017) The state of the art in semantic relatedness: a framework for comparison. Knowl Eng Rev 32:1–30

  • Finkelstein L, Gabrilovich E, Matias Y, Rivlin E, Solan Z, Wolfman G, Ruppin E (2002) Placing search in context: the concept revisited. ACM Trans Inf Syst 20(1):116–131

  • Franco-Salvador M, Rosso P, Montes-y-Gómez M (2016) A systematic study of knowledge graph analysis for cross-language plagiarism detection. Inf Process Manag 52(4):550–570

  • Freitas A, Barzegar S, Sales JE, Handschuh S, Davis B (2016) Semantic relatedness for all (languages): a comparative analysis of multilingual semantic relatedness using machine translation. In: Blomqvist E, Ciancarini P, Poggi F, Vitali F (eds) Knowledge engineering and knowledge management: 20th international conference, EKAW 2016, Bologna, Italy, November 19–23, 2016, Proceedings. Springer International Publishing, Cham, pp 212–222

  • Gabsi I, Kammoun H, Brahmi S, Amous I (2017) Mesh-based disambiguation method using an intrinsic information content measure of semantic similarity. Proc Comput Sci 112:564–573

  • Garla VN, Brandt C (2012) Semantic similarity in the biomedical domain: an evaluation across knowledge sources. BMC Bioinform 13:261

  • Gerz D, Vulic I, Hill F, Reichart R, Korhonen A (2016) Simverb-3500: a large-scale evaluation set of verb similarity. In: Proceedings of the 2016 conference on empirical methods in natural language processing, EMNLP 2016, Austin, Texas, USA, November 1–4, 2016, pp 2173–2182

  • Gil JM, Montes JFA (2013) Semantic similarity measurement using historical google search patterns. Inf Syst Front 15(3):399–410

  • Glavas G, Nanni F, Ponzetto SP (2016) Unsupervised text segmentation using semantic relatedness graphs. In: Proceedings of the fifth joint conference on lexical and computational semantics, *SEM@ACL 2016, Berlin, Germany, 11–12 August 2016

  • Gliozzo A, Strapparava C (2006) Exploiting comparable corpora and bilingual dictionaries for cross-language text categorization. In: Proceedings of the 21st international conference on computational linguistics and the 44th annual meeting of the association for computational linguistics, ACL-44. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 553–560

  • Gracia J, Mena E (2008) Web-based measure of semantic relatedness. In: Proceedings of 9th international conference on web information systems engineering (WISE 2008), Auckland, New Zealand. Springer, pp 136–150

  • Granada R, Trojahn C, Vieira R (2014) Comparing semantic relatedness between word pairs in portuguese using wikipedia. Springer, Cham, pp 170–175

  • Guessoum D, Miraoui M, Tadj C (2015) Survey of semantic similarity measures in pervasive computing. Int J Smart Sens Intell Syst 8(1):125–158

  • Gurevych I (2005) Using the structure of a conceptual network in computing semantic relatedness. In: Natural language processing—IJCNLP 2005, second international joint conference, Jeju Island, Korea, October 11–13, 2005, proceedings, pp 767–778

  • Gurevych I (2006) Computing semantic relatedness across parts of speech. Darmstadt University of Technology, Germany, Department of Computer Science, Telecooperation, technical report

  • Gurevych I, Strube M (2004) Semantic similarity applied to spoken dialogue summarization. In: Proceedings of the 20th international conference on computational linguistics, COLING’04

  • Gurevych I, Müller C, Zesch T (2007) What to be?—Electronic career guidance based on semantic relatedness. In: Proceedings of ACL. Association for Computational Linguistics, pp 1032–1039

  • Guzzi PH, Mina M, Guerra C, Cannataro M (2012) Semantic similarity analysis of protein data: assessment with biological features and issues. Brief Bioinf 13(5):569–585

  • Hadj Taieb MA, Ben Aouicha M, Ben Hamadou A (2013) Computing semantic relatedness using wikipedia features. Knowl Based Syst 50:260–278

  • Hadj Taieb MA, Ben Aouicha M, Ben Hamadou A (2014) Ontology-based approach for measuring semantic similarity. Eng Appl AI 36:238–261

  • Hadj Taieb MA, Ben Aouicha M, Bourouis Y (2015) FM3S: features-based measure of sentences semantic similarity. In: Hybrid artificial intelligent systems—10th international conference, HAIS 2015, Bilbao, Spain, June 22–24, 2015, proceedings, pp 515–529

  • Halawi G, Dror G, Gabrilovich E, Koren Y (2012) Large-scale learning of word relatedness with constraints. In: Proceedings of the 18th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 1406–1414

  • Han L, Kashyap AL, Finin T, Mayfield J, Weese J (2013) Umbc_ebiquity-core: semantic textual similarity systems. In: *SEM@NAACL-HLT. Association for Computational Linguistics, pp 44–52

  • Harispe S, Ranwez S, Janaqi S, Montmain J (2014) The semantic measures library: assessing semantic similarity from knowledge representation analysis. In: Métais E, Roche M, Teisseire M (eds) Natural language processing and information systems. Springer, Cham, pp 254–257

  • Harispe S, Ranwez S, Janaqi S, Montmain J (2015) Semantic similarity from natural language and ontology analysis. Morgan & Claypool Publishers, San Rafael

  • Hassan S, Mihalcea R (2009) Cross-lingual semantic relatedness using encyclopedic knowledge. In: Proceedings of the 2009 conference on empirical methods in natural language processing. Association for Computational Linguistics, Singapore, pp 1192–1201. http://www.aclweb.org/anthology/D/D09/D09-1124

  • Hassan S, Banea C, Mihalcea R (2012) Measuring semantic relatedness using multilingual representations. In: Proceedings of the first joint conference on lexical and computational semantics—volume 1: proceedings of the main conference and the shared task, and volume 2: proceedings of the sixth international workshop on semantic evaluation, Semeval’12. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 20–29

  • Hecht B, Carton SH, Quaderi M, Schöning J, Raubal M, Gergle D, Downey D (2012) Explanatory semantic relatedness and explicit spatialization for exploratory search. In: Proceedings of the 35th international ACM SIGIR conference on research and development in information retrieval, SIGIR’12. ACM, New York, NY, USA, pp 415–424

  • Hill F, Reichart R, Korhonen A (2015) Simlex-999: evaluating semantic models with (genuine) similarity estimation. Comput Linguist 41(4):665–695

  • Hirst G, Budanitsky A (2005) Correcting real-word spelling errors by restoring lexical cohesion. Nat Lang Eng 11(1):87–111

  • Hliaoutakis A (2005) Semantic similarity measures in the mesh ontology and their application to information retrieval on medline. Technical report, Technical University of Crete (TUC), Department of Electronic and Computer Engineering

  • Horsmann T, Zesch T (2018) DeepTC—an extension of DKPro text classification for fostering reproducibility of deep learning experiments. In: Proceedings of the eleventh international conference on language resources and evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018

  • Huang E, Socher R, Manning C, Ng A (2012) Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th annual meeting of the association for computational linguistics (volume 1: long papers). Association for Computational Linguistics, Jeju Island, Korea, pp 873–882

  • Jarmasz M, Szpakowicz S (2003) Roget’s thesaurus and semantic similarity. In: Proceedings of conference on recent advances in natural language processing (RANLP 2003), pp 212–219

  • Jiang Y, Wang X, Zheng HT (2014) A semantic similarity measure based on information distance for ontology alignment. Inf Sci 278(Supplement C):76–87. https://doi.org/10.1016/j.ins.2014.03.021

  • Jin P, Wu Y (2012) SemEval-2012 task 4: evaluating chinese word similarity. In: Proceedings of the first joint conference on lexical and computational semantics, pp 374–377

  • Joubarne C, Inkpen D (2011) Comparison of semantic similarity for different languages using the google n-gram corpus and second-order co-occurrence measures. In: Advances in artificial intelligence—24th Canadian conference on artificial intelligence, Canadian AI 2011, St. John’s, Canada, May 25–27, 2011. Proceedings, pp 216–221

  • Jurgens D, Stevens K (2010) The s-space package: an open source package for word space models. In: Proceedings of the ACL 2010 system demonstrations. Association for Computational Linguistics, Uppsala, Sweden, pp 30–35

  • Kennedy A, Hirst G (2012) Measuring semantic relatedness across languages. In: xLiTe: cross-lingual technologies workshop collocated with NIPS 2012

  • Kiela D, Hill F, Korhonen A, Clark S (2014) Improving multi-modal representations using image dispersion: why less is sometimes more. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 2: short papers). Association for Computational Linguistics, Baltimore, Maryland, pp 835–841

  • Kipper K, Korhonen A, Ryant N, Palmer M (2007) A large-scale classification of english verbs. Lang Resour Eval 42(1):21–40

  • Kiritchenko S, Mohammad S (2017) Best-worst scaling more reliable than rating scales: a case study on sentiment intensity annotation. In: Proceedings of the 55th annual meeting of the association for computational linguistics (volume 2: short papers). Association for Computational Linguistics, Vancouver, Canada, pp 465–470

  • Kolb P (2008) DISCO: a multilingual database of distributionally similar words. In: Storrer A, Geyken A, Siebert A, Würzner KM (eds) KONVENS 2008—Ergänzungsband: Textressourcen und lexikalisches Wissen, pp 37–44

  • Konopik M, Pražák O, Steinberger D (2017) Czech dataset for semantic similarity and relatedness. In: Proceedings of the international conference recent advances in natural language processing, RANLP 2017. INCOMA Ltd., Varna, Bulgaria, pp 401–406

  • Kozima H (1993) Computing lexical cohesion as a tool for text analysis. Technical report

  • Lastra-Díaz JJ, García-Serrano A, Batet M, Fernández M, Chirigati F (2017) Hesml: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Inf Syst 66:97–118

  • Lastra-Díaz JJ, Goikoetxea J, Hadj Taieb MA, García-Serrano A, Ben Aouicha M, Agirre E (2019a) Word similarity benchmarks of recent word embedding models and ontology-based semantic similarity measures. e-cienciaDatos, v1. http://dx.doi.org/10.21950/AQ1CVX

  • Lastra-Díaz JJ, Goikoetxea J, Hadj Taieb M, García-Serrano A, Ben Aouicha M, Agirre E (2019b) A reproducible survey on word embeddings and ontology-based methods for word similarity: linear combinations outperform the state of the art. Eng Appl Artif Intell 85:645–665

  • Lastra-Díaz JJ, Goikoetxea J, Hadj Taieb M, García-Serrano A, Ben Aouicha M, Agirre E (2019c) Reproducibility dataset for a large experimental survey on word embeddings and ontology-based methods for word similarity. Data Brief 26:104432

  • Lee JH, Kim MH, Lee YJ (1993) Information retrieval based on conceptual distance in is-a hierarchies. J Doc 49(2):188–207

  • Leviant I, Reichart R (2015) Judgment language matters: multilingual vector space models for judgment language aware lexical semantics. CoRR. arXiv:abs/1508.00106

  • Li YM, Chen CW (2009) A synthetical approach for blog recommendation: combining trust, social relation, and semantic analysis. Expert Syst Appl 36(3):6536–6547

  • Li J, Gong B, Chen X, Liu T, Wu C, Zhang F, Li C, Li X, Rao S, Li X (2011) Dosim: an R package for similarity between diseases based on disease ontology. BMC Bioinf 12(1):266

  • Li P, Wang H, Zhu KQ, Wang Z, Wu X (2013) Computing term similarity by large probabilistic isA knowledge. In: Proceedings of the 22nd ACM international conference on information & knowledge management, CIKM’13. ACM, New York, NY, USA, pp 1401–1410

  • Lin F, Sandkuhl K (2008) A survey of exploiting wordnet in ontology matching. In: Bramer M (ed) IFIP AI, IFIP, vol 276. Springer, pp 341–350

  • Liu Q, Liu B, Zhang Y, Kim DS, Gao Z (2016) Improving opinion aspect extraction using semantic similarity and aspect associations. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12–17, 2016, Phoenix, Arizona, USA, pp 2986–2992

  • Liu XY, Zhou YM, Zheng RS (2007) Measuring semantic similarity in wordnet. In: 2007 international conference on machine learning and cybernetics, vol 6, pp 3431–3435

  • Lopez-Gazpio I, Maritxalar M, Gonzalez-Agirre A, Rigau G, Uria L, Agirre E (2017) Interpretable semantic textual similarity: finding and explaining differences between sentences. Knowl Based Syst 119:186–199

  • Lord P, Stevens R, Brass A, Goble C (2003) Semantic similarity measures as tools for exploring the gene ontology. In: Proceedings of pacific symposium on biocomputing, pp 601–612

  • Louviere JJ (1991) Best-worst scaling: a model for the largest difference judgments. Working paper

  • Luong T, Socher R, Manning C (2013) Better word representations with recursive neural networks for morphology. In: Proceedings of the seventeenth conference on computational natural language learning. Association for Computational Linguistics, Sofia, Bulgaria, pp 104–113

  • Madani Y, Erritali M, Bengourram J (2019) Sentiment analysis using semantic similarity and hadoop mapreduce. Knowl Inf Syst 59(2):413–436

  • Mandera P, Keuleers E, Brysbaert M (2017) Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: a review and empirical validation. J Mem Lang 92:57–78

  • Marie-Francine M (2013) Similarity measures for semantic relation extraction. Université catholique de Louvain, These

  • McInnes BT, Pedersen T, Pakhomov SVS (2009) UMLS-interface and UMLS-similarity: open source software for measuring paths and semantic similarity. In: AMIA. AMIA

  • Meo PD, Nocera A, Terracina G, Ursino D (2011) Recommendation of similar users, resources and social networks in a social internetworking scenario. Inf Sci 181(7):1285–1305

  • Meyer CM, Mieskes M, Stab C, Gurevych I (2014) Dkpro agreement: an open-source java library for measuring inter-rater agreement. In: COLING (Demos). ACL, pp 105–109

  • Mihalcea R, Tarau P (2004) TextRank: bringing order into text. In: Proceedings of the 2004 conference on empirical methods in natural language processing

  • Mihalcea R, Corley C, Strapparava C (2006) Corpus-based and knowledge-based measures of text semantic similarity. In: Proceedings of the 21st national conference on artificial intelligence—volume 1, AAAI’06. AAAI Press, pp 775–780

  • Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. In: 1st international conference on learning representations, ICLR 2013, Scottsdale, Arizona, USA, May 2–4, 2013, workshop track proceedings

  • Mikolov T, Yih WT, Zweig G (2013b) Linguistic regularities in continuous space word representations. In: HLT-NAACL, pp 746–751

  • Miller GA, Charles WG (1991) Contextual correlates of semantic similarity. Lang Cognit Process 6(1):1–28

  • Monz C, Dorr BJ (2005) Iterative translation disambiguation for cross-language information retrieval. In: Proceedings of the 28th annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 520–527

  • Narducci F, Palmonari M, Semeraro G (2017) Cross-lingual link discovery with TR-ESA. Inf Sci 394–395:68–87

  • Navigli R (2009) Word sense disambiguation: a survey. ACM Comput Surv 41(2):10:1–10:69

  • Nelson DL, McEvoy CL, Schreiber TA (2004) The University of South Florida free association, rhyme, and word fragment norms. Behav Res Methods Instrum Comput 36(3):402–407

  • Netisopakul P, Wohlgenannt G, Pulich A (2019) Word similarity datasets for Thai: construction and evaluation. CoRR. arXiv:1904.04307

  • Nguyen KA, Schulte im Walde S, Vu NT (2018) Introducing two Vietnamese datasets for evaluating semantic models of (dis-)similarity and relatedness, pp 199–205

  • Nguyen HT, Duong PH, Cambria E (2019) Learning short-text semantic similarity with word embeddings and external knowledge sources. Knowl Based Syst 182:104842

  • Nie JY, Simard M, Isabelle P, Durand R (1999) Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In: Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval. ACM, New York, NY, USA, pp 74–81

  • Och FJ, Ney H (2000) A comparison of alignment models for statistical machine translation. In: Proceedings of the 18th conference on computational linguistics—volume 2. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 1086–1090

  • Oldakowski R, Bizer C (2005) SemMF: a framework for calculating semantic similarity of objects represented as RDF graphs. In: Poster at the 4th international semantic web conference (ISWC 2005)

  • Pakhomov S, McInnes B, Adam T, Liu Y, Pedersen T, Melton GB (2010) Semantic similarity and relatedness between clinical terms: an experimental study. Annual symposium proceedings/AMIA symposium. AMIA symposium 2010:572–576

  • Pakhomov SVS, Pedersen T, McInnes BT, Melton GB, Ruggieri A, Chute CG (2011) Towards a framework for developing semantic relatedness reference standards. J Biomed Inform 44(2):251–265

  • Panchenko A, Morozova O (2012) A study of hybrid similarity measures for semantic relation extraction. In: Proceedings of the workshop on innovative hybrid approaches to the processing of textual data, HYBRID’12. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 10–18

  • Panchenko A, Romanov P, Morozova O, Naets H, Philippovich A, Romanov A, Fairon C (2013) Serelex: search and visualization of semantically related words. In: European conference on information retrieval. Springer, pp 837–840

  • Panchenko A, Ustalov D, Arefyev N, Paperno D, Konstantinova N, Loukachevitch N, Biemann C (2016) Human and machine judgements for Russian semantic relatedness. In: Analysis of images, social networks and texts (AIST’2016)

  • Panchenko A, Ustalov D, Arefyev N, Paperno D, Konstantinova N, Loukachevitch NV, Biemann C (2017) Human and machine judgements for Russian semantic relatedness. CoRR. arXiv:1708.09702

  • Patwardhan S, Banerjee S, Pedersen T (2003) Using measures of semantic relatedness for word sense disambiguation. In: Proceedings of the 4th international conference on computational linguistics and intelligent text processing, Cicling’03. Springer, Berlin, pp 241–257

  • Patwardhan S, Pedersen T (2006) Using WordNet-based context vectors to estimate the semantic relatedness of concepts. EACL 2006 workshop making sense of sense–bringing computational linguistics and psycholinguistics together. Trento, Italy, pp 1–8

  • Pedersen T, Patwardhan S, Michelizzi J (2004) WordNet::Similarity: measuring the relatedness of concepts. In: Demonstration papers at HLT-NAACL 2004, HLT-NAACL-Demonstrations’04. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 38–41

  • Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG (2007) Measures of semantic similarity and relatedness in the biomedical domain. J Biomed Inf 40(3):288–299

  • Peng J, Li H, Liu Y, Juan L, Jiang Q, Wang Y, Chen J (2016) InteGO2: a web tool for measuring and visualizing gene semantic similarities using gene ontology. BMC Genomics 17(5):553–560

  • Pennington J, Socher R, Manning CD (2014) GloVe: global vectors for word representation. In: Empirical methods in natural language processing (EMNLP), pp 1532–1543

  • Pesquita C, Pessoa D, Faria D, Couto F (2009) CESSM: collaborative evaluation of semantic similarity measures. JB2009: challenges in bioinformatics

  • Pilehvar MT, Camacho-Collados J (2019) WIC: the word-in-context dataset for evaluating context-sensitive meaning representations. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2–7, 2019, vol 1 (long and short papers), pp 1267–1273

  • Pirrò G (2012) Reword: semantic relatedness in the web of data. In: Proceedings of the twenty-sixth AAAI conference on artificial intelligence, July 22–26, 2012, Toronto, Ontario, Canada

  • Pirrò G, Euzenat J (2010) A feature and information theoretic framework for semantic similarity and relatedness. In: Patel-Schneider PF, Pan Y, Hitzler P, Mika P, Pan JZ, Horrocks I, Glimm B (eds) Proceedings of the 9th international semantic web conference (ISWC2010), Lecture notes in computer science, vol 6496. Springer, pp 615–630

  • Ponzetto SP, Strube M (2006) Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution. In: Proceedings of the main conference on human language technology conference of the North American chapter of the Association of Computational Linguistics, HLT-NAACL’06. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 192–199

  • Postma M, Vossen P (2014) What implementation and translation teach us: the case of semantic similarity measures in wordnets. In: Proceedings of the seventh global wordnet conference, pp 133–141

  • Radinsky K, Agichtein E, Gabrilovich E, Markovitch S (2011) A word at a time: computing word relatedness using temporal semantic analysis. In: Proceedings of the 20th international conference on World Wide Web, WWW’11. ACM, New York, NY, USA, pp 337–346

  • Řehůřek R, Sojka P (2010) Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 workshop on new challenges for NLP frameworks. ELRA, Valletta, Malta, pp 45–50

  • Resnik P, Diab M (2000) Measuring verb similarity. In: Proceedings of the twenty-second annual conference of the cognitive science society, August 13–15, 2000, Institute for Research in Cognitive Science, University of Pennsylvania, Philadelphia, PA

  • Resnik P, Lin J (2010) Evaluation of NLP systems. Wiley, Hoboken, pp 271–295. https://doi.org/10.1002/9781444324044.ch11

  • Riloff E, Schafer C, Yarowsky D (2002) Inducing information extraction systems for new languages via cross-language projection. In: Proceedings of the 19th international conference on computational linguistics—volume 1, COLING’02. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 1–7

  • Rubenstein H, Goodenough JB (1965) Contextual correlates of synonymy. Commun ACM 8(10):627–633

  • Ruiz-Casado M, Alfonseca E, Castells P (2005) Using context-window overlapping in synonym discovery and ontology extension. In: International conference on recent advances in natural language processing (RANLP 2005), Borovets, Bulgaria

  • Rus V, Lintean MC, Banjade R, Niraula NB, Stefanescu D (2013) Semilar: the semantic similarity toolkit. In: ACL (conference system demonstrations). The Association for Computational Linguistics, pp 163–168

  • Saad M, Langlois D, Smaïli K (2014) Cross-lingual semantic similarity measure for comparable articles. In: Przepiórkowski A, Ogrodniczuk M (eds) Adv Nat Lang Process. Springer, Cham, pp 105–115

  • Sahami M, Heilman TD (2006) A web-based kernel function for measuring the similarity of short text snippets. In: Proceedings of the 15th international conference on World Wide Web, WWW’06. ACM, New York, NY, USA, pp 377–386

  • Sahlgren M (2006) The word-space model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces. Ph.D. thesis, Stockholm University, Stockholm, Sweden

  • Saif A, Aziz M, Omar N (2014) Evaluating knowledge-based semantic measures on Arabic. Int J Commun Antenna Propag 4(5):180–194

  • Sakaizawa Y, Komachi M (2017) Construction of a Japanese word similarity dataset. CoRR. arXiv:1703.05916

  • Salem A, Ben-Abdallah H (2015) The design of valid multidimensional star schemas assisted by repair solutions. Vietnam J Comput Sci 2(3):169–179

  • Sales JE, Souza L, Barzegar S, Davis B, Freitas A, Handschuh S (2018) Indra: a word embedding and semantic relatedness server. In: Proceedings of the eleventh international conference on language resources and evaluation, LREC 2018, Miyazaki, Japan, May 7–12, 2018

  • Sánchez D, Moreno A (2008) Learning non-taxonomic relationships from web documents for domain ontology construction. Data Knowl Eng 64(3):600–623

  • Sánchez D, Isern D, Millan M (2011) Content annotation for the semantic web: an automatic web-based approach. Knowl Inf Syst 27(3):393–418

  • Santus E, Wang H, Chersoni E, Zhang Y (2018) A rank-based similarity metric for word embeddings. In: Proceedings of the 56th annual meeting of the association for computational linguistics vol 2 (short papers). Association for Computational Linguistics, Melbourne, Australia, pp 552–557

  • Šarić F, Glavaš G, Karan M, Šnajder J, Bašić BD (2012) Takelab: systems for measuring semantic text similarity. In: Proceedings of the first joint conference on lexical and computational semantics—volume 1: proceedings of the main conference and the shared task, and volume 2: proceedings of the sixth international workshop on semantic evaluation, SemEval’12. Association for Computational Linguistics, Stroudsburg, PA, USA, pp 441–448

  • Schickel-Zuber V, Faltings B (2007) OSS: a semantic similarity function based on hierarchical ontologies. In: Proceedings of the 20th international joint conference on artificial intelligence, IJCAI’07. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA

  • Schuler KK (2005) VerbNet: a broad-coverage, comprehensive verb lexicon. Ph.D. thesis, Philadelphia, PA, USA

  • Sen S, Li TJJ, Team W, Hecht B (2014) WikiBrain: democratizing computation on Wikipedia. In: Proceedings of the international symposium on open collaboration, OpenSym’14. ACM, New York, NY, USA, pp 27:1–27:10

  • Silberer C, Lapata M (2014) Learning grounded meaning representations with autoencoders. In: Proceedings of the 52nd annual meeting of the association for computational linguistics vol 1 (long papers). Association for Computational Linguistics, Baltimore, Maryland, pp 721–732

  • Sopaoglu U, Ercan G (2016) Evaluation of semantic relatedness measures for Turkish language. In: CICLing (1), lecture notes in computer science, vol 9623. Springer, pp 600–611

  • Srihari RK, Zhang Z, Rao A (2000) Intelligent indexing and semantic retrieval of multimodal documents. Inf Retr 2(2):245–275

  • Szumlanski SR, Gomez F, Sims VK (2013) A new set of norms for semantic relatedness measures. In: ACL (2). The Association for Computational Linguistics, pp 890–895

  • Tan BV, Thai NP, Lam PV (2017) Construction of a word similarity dataset and evaluation of word similarity techniques for Vietnamese. In: 9th international conference on knowledge and systems engineering (KSE), pp 65–70

  • Zesch T, Gurevych I (2006) Automatically creating datasets for measures of semantic relatedness. Coling/ACL 2006 workshop on linguistic distances. Sydney, Australia, pp 16–24

  • Tóth Á (2013) How similar: word similarity judgments in English and Hungarian. Technical report

  • Tsatsaronis G, Varlamis I, Vazirgiannis M (2010a) Text relatedness based on a word thesaurus. J Artif Int Res 37(1):1–40

  • Tsatsaronis G, Giannakoulopoulos A, Varlamis I, Kanellopoulos N (2010b) Identifying free text plagiarism based on semantic similarity. In: Proceedings of the 4th international plagiarism conference. Newcastle upon Tyne, UK

  • Uddin MN, Duong TH, Nguyen NT, Qi XM, Jo GS (2013) Semantic similarity measures for enhancing information retrieval in folksonomies. Expert Syst Appl 40(5):1645–1653

  • Vulic I, Moens M (2013) Cross-lingual semantic similarity of words as the similarity of their semantic word responses. Human language technologies: conference of the north American chapter of the association of computational linguistics, proceedings, June 9–14, 2013. Westin Peachtree Plaza Hotel, Atlanta, Georgia, USA, pp 106–116

  • Vulic I, Moens M (2014) Probabilistic models of cross-lingual semantic similarity in context based on latent cross-lingual concepts induced from comparable data. In: Proceedings of the 2014 conference on empirical methods in natural language processing, EMNLP 2014, October 25–29, 2014, Doha, Qatar, a meeting of SIGDAT, a Special Interest Group of the ACL, pp 349–362

  • Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF (2007) A new method to measure the semantic similarity of GO terms. Bioinform 23(10):1274–1281. https://doi.org/10.1093/bioinformatics/btm087

  • Wang X, Jia Y, Zhou B, Ding ZY, Liang Z (2011) Computing semantic relatedness using Chinese Wikipedia links and taxonomy. J Chin Comput Syst 32(11):2237–2242

  • Wang S, Huang C, Yao Y, Chan A (2015) Mechanical Turk-based experiment vs laboratory-based experiment: a case study on the comparison of semantic transparency rating data. In: Proceedings of the 29th Pacific Asia conference on language, information and computation, PACLIC 29, Shanghai, China, October 30–November 1, 2015

  • Wang B, Wang A, Chen F, Wang Y, Kuo CCJ (2019a) Evaluating word embedding models: methods and experimental results. APSIPA Trans Signal Inf Process. https://doi.org/10.1017/ATSIP.2019.12

  • Wang Y, Wang M, Fujita H (2019b) Word sense disambiguation: a comprehensive knowledge exploitation framework. Knowl Based Syst. https://doi.org/10.1016/j.knosys.2019.105030

  • Washington NL, Haendel MA, Mungall CJ, Ashburner M, Westerfield M, Lewis SE (2009) Linking human diseases to animal models using ontology-based phenotype annotation. PLoS Biol 7(11):e1000247

  • Weeds J (2003) Measures and applications of lexical distributional similarity. Ph.D. thesis, Department of Informatics, University of Sussex

  • Wieling M, Rawee J, van Noord G (2018) Reproducibility in computational linguistics: are we willing to share? Comput Linguist 44(4):641–649

  • Wu Y, Li W (2016) Overview of the NLPCC-ICCPOL 2016 shared task: Chinese word similarity measurement. In: Natural language understanding and intelligent applications - 5th CCF conference on natural language processing and Chinese computing, NLPCC 2016, and 24th international conference on computer processing of oriental languages, ICCPOL 2016, Kunming, China, December 2–6, 2016, proceedings, pp 828–839

  • Xie S, Liu Y (2008) Using corpus and knowledge-based similarity measure in maximum marginal relevance for meeting summarization. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, ICASSP 2008, March 30–April 4, 2008, Caesars Palace, Las Vegas, Nevada, USA, pp 4985–4988

  • Xie F, Wu X, Hu X (2010) Keyphrase extraction based on semantic relatedness. In: Proceedings of the 9th IEEE international conference on cognitive informatics, ICCI 2010, July 7–9, 2010, Beijing, China, pp 308–312

  • Yang D, Powers DMW (2006) Verb similarity on the taxonomy of WordNet. In: The 3rd international WordNet conference (GWC-06), Jeju Island, Korea

  • Yang X, Su J (2007) Coreference resolution using semantic relatedness information from automatically discovered patterns. In: ACL. The Association for Computational Linguistics

  • Zesch T (2010) Study of semantic relatedness of words using collaboratively constructed semantic resources. Ph.D. thesis, Darmstadt University of Technology

  • Zesch T (2012) Measuring contextual fitness using error contexts extracted from the Wikipedia revision history. In: Proceedings of the 13th conference of the European chapter of the Association for Computational Linguistics (EACL 2012). Avignon, France, pp 529–538

  • Zesch T, Gurevych I (2010) Wisdom of crowds versus wisdom of linguists–measuring the semantic relatedness of words. Nat Lang Eng 16(1):25–59

  • Zhang SB, Tang QR (2016) Protein-protein interaction inference based on semantic similarity of gene ontology terms. J Theor Biol 401:30–37

  • Zhang Z, Gentile A, Ciravegna F (2012) Recent advances in methods of lexical semantic relatedness–a survey. Nat Lang Eng 1(1):1–69

  • Zhu G, Iglesias CA (2017) Sematch: semantic similarity framework for knowledge graphs. Knowl Based Syst 130:30–32

  • Zhu G, Iglesias CA (2018) Exploiting semantic similarity for named entity disambiguation in knowledge graphs. Expert Syst Appl 101:8–24

  • Ziegler CN, Simon K, Lausen G (2006) Automatic computation of semantic proximity using taxonomic knowledge. In: Proceedings of the 15th ACM international conference on information and knowledge management, CIKM’06. ACM, New York, NY, USA, pp 465–474

Author information

Corresponding author

Correspondence to Mohamed Ali Hadj Taieb.

About this article

Cite this article

Hadj Taieb, M.A., Zesch, T. & Ben Aouicha, M. A survey of semantic relatedness evaluation datasets and procedures. Artif Intell Rev 53, 4407–4448 (2020). https://doi.org/10.1007/s10462-019-09796-3

  • DOI: https://doi.org/10.1007/s10462-019-09796-3
