Abstract
In the past decade, various semantic relatedness, similarity, and distance measures have been proposed, and they play a crucial role in many NLP applications. Researchers compete for better algorithms (and for better resources to base the algorithms on), and often only a few percentage points seem to suffice to show that a new measure (or resource) is more accurate than an older one. However, it is still unclear which of them performs best under what conditions. In this work we therefore present a study comparing various relatedness measures. We evaluate them on the basis of a human judgment experiment and also examine several practical issues, such as run time and coverage. We show that the performance of all measures, as compared to human estimates, is still mediocre, and we argue that the definition of a shared task might bring us considerably closer to results of high quality.
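As a rough illustration of what comparing a measure with human estimates can look like, the sketch below computes a rank correlation between human ratings and a measure's scores, plus the measure's coverage. The abstract does not say which statistic the study uses, so Spearman's rank correlation is an assumption here, and all function names and example values are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(human, measure):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(human) != len(measure) or len(human) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(human), average_ranks(measure))


def coverage(scores):
    """Fraction of word pairs for which the measure returned a score.

    Assumption: a pair the measure cannot handle is represented by None.
    """
    if not scores:
        raise ValueError("no word pairs given")
    return sum(s is not None for s in scores) / len(scores)


# Hypothetical data: human ratings and measure scores for four word pairs.
human_ratings = [4.0, 3.5, 1.0, 0.5]
measure_scores = [0.9, None, 0.3, 0.1]

# Correlate only over the pairs the measure actually covers.
covered = [(h, m) for h, m in zip(human_ratings, measure_scores) if m is not None]
print("coverage:", coverage(measure_scores))
print("spearman:", spearman([h for h, _ in covered], [m for _, m in covered]))
```

Reporting coverage next to the correlation matters because a measure scored only on the pairs it can handle is not directly comparable to one scored on the full set.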
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Cramer, I., Wandmacher, T., Waltinger, U. (2011). Exploring Resources for Lexical Chaining: A Comparison of Automated Semantic Relatedness Measures and Human Judgments. In: Mehler, A., Kühnberger, KU., Lobin, H., Lüngen, H., Storrer, A., Witt, A. (eds) Modeling, Learning, and Processing of Text Technological Data Structures. Studies in Computational Intelligence, vol 370. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22613-7_18
DOI: https://doi.org/10.1007/978-3-642-22613-7_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22612-0
Online ISBN: 978-3-642-22613-7
eBook Packages: Engineering, Engineering (R0)