Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content

  • Conference paper
On the Move to Meaningful Internet Systems: OTM 2008 (OTM 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5332)

Abstract

Computing semantic similarity between words is an important problem in many research fields, including Psychology, Linguistics, Cognitive Science, Biomedicine, and Artificial Intelligence. In this paper we present a new semantic similarity metric that takes notions from early work on the feature-based theory of similarity and translates them into the information-theoretic domain, which leverages the notion of Information Content (IC). In particular, the proposed metric exploits the notion of intrinsic IC, which quantifies IC values by scrutinizing how concepts are arranged in an ontological structure. To evaluate this metric, we conducted an online experiment asking the community of researchers to rank a list of 65 word pairs. The experiment’s web setup allowed us to collect 101 similarity ratings and to differentiate between native and non-native English speakers. Such a large and diverse dataset makes it possible to confidently evaluate similarity metrics by correlating them with human assessments. Experimental evaluations using WordNet indicate that our metric, coupled with the notion of intrinsic IC, yields results above the state of the art. Moreover, the intrinsic IC formulation also improves the accuracy of other IC-based metrics. We implemented our metric and several others in the Java WordNet Similarity Library.
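As a rough illustration of what "intrinsic IC" can look like in practice, the sketch below computes IC from taxonomy structure alone, using the commonly cited form IC(c) = 1 - log(hypo(c) + 1) / log(N), where hypo(c) is the number of concepts below c and N is the total number of concepts. It then plugs those values into two well-known IC-based measures (Resnik's and Lin's). The toy taxonomy and all function names are hypothetical, and the sketch does not reproduce the paper's own metric, which the abstract does not define.

```python
import math

# Hypothetical toy is-a taxonomy (child -> parent); not taken from WordNet.
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

def ancestors(concept):
    """Return the concept itself followed by its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain

def hyponym_count(concept):
    """Count the concepts strictly below `concept` in the taxonomy."""
    return sum(1 for c in PARENT if c != concept and concept in ancestors(c))

def intrinsic_ic(concept):
    """Intrinsic IC (assumed form): 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(PARENT))

def most_specific_subsumer(c1, c2):
    """Return the deepest concept that subsumes both c1 and c2."""
    ancestors_of_c2 = set(ancestors(c2))
    for candidate in ancestors(c1):  # ordered from most to least specific
        if candidate in ancestors_of_c2:
            return candidate
    raise ValueError("concepts share no common ancestor")

def resnik(c1, c2):
    """Resnik-style similarity: IC of the most specific common subsumer."""
    return intrinsic_ic(most_specific_subsumer(c1, c2))

def lin(c1, c2):
    """Lin-style similarity: 2 * IC(subsumer) / (IC(c1) + IC(c2))."""
    denominator = intrinsic_ic(c1) + intrinsic_ic(c2)
    if denominator == 0.0:
        # Both concepts carry no information (e.g. the root compared to itself).
        return 1.0 if c1 == c2 else 0.0
    return 2.0 * resnik(c1, c2) / denominator

if __name__ == "__main__":
    for pair in [("dog", "cat"), ("dog", "car")]:
        print(pair, "resnik =", resnik(*pair), "lin =", lin(*pair))
```

In this toy hierarchy, leaves receive the maximum IC and the root receives zero, so pairs that meet only at the root score zero under both measures. No corpus frequencies are needed, which is the point of an intrinsic formulation.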

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pirró, G., Seco, N. (2008). Design, Implementation and Evaluation of a New Semantic Similarity Metric Combining Features and Intrinsic Information Content. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems: OTM 2008. OTM 2008. Lecture Notes in Computer Science, vol 5332. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88873-4_25

  • DOI: https://doi.org/10.1007/978-3-540-88873-4_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88872-7

  • Online ISBN: 978-3-540-88873-4

  • eBook Packages: Computer Science, Computer Science (R0)
