DOI: 10.1145/2611040.2611055
research-article

Creating a Similarity Graph from WordNet

Published: 02 June 2014

ABSTRACT

This paper addresses the problem of modeling the relationships between words in the English language using a similarity graph. The mathematical model stores the strength of the relationship between words as a decimal number. The graph is built from two kinds of WordNet data: structured data, such as the fact that the word "canine" is a hypernym of the word "dog" (i.e., a dog is a kind of canine), and textual descriptions, such as the definition of the word "dog": "a member of the genus Canis that has been domesticated by man since prehistoric times". The quality of the graph data is validated by using our graph-based software to compute the similarity of pairs of words and comparing the results with those of studies performed with human subjects. To the best of our knowledge, our software correlates better with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.
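As a rough illustration of the idea in the abstract, the following Python sketch stores relationship strengths as decimal edge weights in a directed graph and compares computed scores against human ratings using the Pearson correlation. It is not the algorithm from the paper: the edge weights, the path-product scoring rule, the `max_hops` limit, and all function names are assumptions made for this example only.

```python
from math import sqrt


def add_edge(graph, source, target, weight):
    """Store a directed edge whose weight is a decimal relationship strength."""
    graph.setdefault(source, {})[target] = weight
    graph.setdefault(target, {})


def directed_strength(graph, source, target, max_hops=3):
    """Best product of edge weights over simple paths of at most max_hops edges.

    This scoring rule is an assumption for illustration, not the paper's method.
    """
    if source == target:
        return 1.0
    best = 0.0
    stack = [(source, 1.0, {source})]
    while stack:
        node, strength, seen = stack.pop()
        if len(seen) > max_hops:  # the next edge would exceed the hop limit
            continue
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor in seen:
                continue
            candidate = strength * weight
            if neighbor == target:
                best = max(best, candidate)
            else:
                stack.append((neighbor, candidate, seen | {neighbor}))
    return best


def similarity(graph, a, b, max_hops=3):
    """Symmetric score: the mean of the two directed strengths."""
    forward = directed_strength(graph, a, b, max_hops)
    backward = directed_strength(graph, b, a, max_hops)
    return (forward + backward) / 2


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical weights: a structured (hypernym) link and links derived
# from words that appear in a definition. None of these come from the paper.
graph = {}
add_edge(graph, "dog", "canine", 0.9)
add_edge(graph, "canine", "dog", 0.3)
add_edge(graph, "dog", "domesticated", 0.2)
add_edge(graph, "domesticated", "dog", 0.1)
add_edge(graph, "cat", "domesticated", 0.2)
add_edge(graph, "domesticated", "cat", 0.1)

pairs = [("dog", "canine"), ("dog", "cat"), ("canine", "cat")]
computed = [similarity(graph, a, b) for a, b in pairs]

# Made-up human ratings for the same pairs, NOT taken from either study.
human = [3.5, 2.0, 1.0]
print(pearson(computed, human))
```

In an actual evaluation, the `human` list would hold the published ratings from a benchmark such as the Miller and Charles study, and `computed` would hold the scores the software produces for the same word pairs.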

References

  1. OWL Web Ontology Language Guide. http://www.w3.org/TR/owl-guide/.
  2. D. Bollegala, Y. Matsuo, and M. Ishizuka. A Relational Model of Semantic Similarity Between Words Using Automatically Extracted Lexical Pattern Clusters from Web. Conference on Empirical Methods in Natural Language Processing, 2009.
  3. L. Burnard. Reference Guide for the British National Corpus (XML Edition). http://www.natcorp.ox.ac.uk, 2007.
  4. R. L. Cilibrasi and P. M. Vitanyi. The Google Similarity Distance. IEEE ITSOC Information Theory Workshop, 2005.
  5. L. Finkelstein, E. Gabrilovich, Y. Matias, E. Rivlin, Z. Solan, G. Wolfman, and E. Ruppin. Placing Search in Context: The Concept Revisited. ACM Transactions on Information Systems, 20(1):116--131, January 2002.
  6. C. Fox. Lexical Analysis and Stoplists. Information Retrieval: Data Structures and Algorithms, pages 102--130, 1992.
  7. W. Frakes. Stemming Algorithms. Information Retrieval: Data Structures and Algorithms, pages 131--160, 1992.
  8. G. Hirst and D. St-Onge. Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms. Fellbaum, pages 305--332, 1998.
  9. M. Jarmasz. Roget's Thesaurus as a Lexical Resource for Natural Language Processing. Master's thesis, University of Ottawa, 1993.
  10. G. Jeh and J. Widom. SimRank: A Measure of Structural-context Similarity. Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 538--543, 2002.
  11. J. Jiang and D. Conrath. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. Proceedings of the International Conference on Research in Computational Linguistics, pages 19--33, 1997.
  12. K. Jones. A Statistical Interpretation of Term Specificity and Its Application in Retrieval. Journal of Documentation, 28(1):11--21, 1972.
  13. R. Knappe, H. Bulskov, and T. Andreasen. Similarity Graphs. Fourteenth International Symposium on Foundations of Intelligent Systems, 2003.
  14. S. Kulkarni and D. Caragea. Computation of the Semantic Relatedness Between Words Using Concept Clouds. International Conference of Knowledge Discovery and Information Retrieval, 2009.
  15. C. Leacock and M. Chodorow. Combining Local Context and WordNet Similarity for Word Sense Identification. WordNet: An Electronic Lexical Database, pages 265--283, 1998.
  16. D. Lin. An Information-theoretic Definition of Similarity. Proceedings of the Fifteenth International Conference on Machine Learning, pages 296--304, 1998.
  17. J. B. MacQueen. Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability, pages 281--297, 1967.
  18. M. F. Porter. An Algorithm for Suffix Stripping. Readings in Information Retrieval, pages 313--316, 1997.
  19. G. Miller and W. Charles. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1):1--28, 1991.
  20. G. A. Miller. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39--41, 1995.
  21. Oracle. Berkeley DB. http://www.oracle.com.
  22. R. Pan, Z. Ding, Y. Yu, and Y. Peng. A Bayesian Network Approach to Ontology Mapping. Proceedings of the Fourth International Semantic Web Conference, 2005.
  23. J. Pearl. Bayesian Networks: A Model of Self-Activated Memory for Evidential Reasoning. Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, pages 329--334, 1985.
  24. P. Resnik. Using Information Content to Evaluate Semantic Similarity in a Taxonomy. International Joint Conference on Artificial Intelligence, pages 448--453, 1995.
  25. R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and Application of a Metric on Semantic Nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17--30, 1989.
  26. Q. Rajput and S. Haider. Use of Bayesian Networks in Information Extraction from Unstructured Data Sources. Proceedings of International Conference on Ontological and Semantic Engineering, pages 325--331, 2009.
  27. S. P. Ponzetto and M. Strube. Deriving a Large Scale Taxonomy from Wikipedia. 22nd International Conference on Artificial Intelligence, 2007.
  28. E. Sirin and B. Parsia. SPARQL-DL: SPARQL Query for OWL-DL. 3rd OWL: Experiences and Directions Workshop (OWLED), 2007.
  29. B. Spell. Java API for WordNet Searching (JAWS). http://lyle.smu.edu/tspell/jaws/index.html, 2009.
  30. L. Stanchev. Building Semantic Corpus from WordNet. The First International Workshop on the Role of Semantic Web in Literature-Based Discovery, 2012.
  31. L. Stanchev. Similarity Software. http://softbase.ipfw.edu:8080/Similarity, 2012.
  32. M. Steyvers and J. Tenenbaum. The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth. Cognitive Science, 29(1):41--78, 2005.
  33. M. Strube and S. P. Ponzetto. WikiRelate! Computing Semantic Relatedness Using Wikipedia. Association for the Advancement of Artificial Intelligence Conference, 2006.
  34. J. Webber and I. Robinson. Graph Databases. O'Reilly, 2013.
  35. Z. Wu and M. Palmer. Verb Semantics and Lexical Selection. Annual Meeting of the Association for Computational Linguistics, pages 133--138, 1994.
  36. D. Yang and D. M. Powers. Measuring Semantic Similarity in the Taxonomy of WordNet. Australian Computer Science Conference, pages 315--322, 2005.

    • Published in

      WIMS '14: Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14)
      June 2014
      506 pages
      ISBN:9781450325387
      DOI:10.1145/2611040

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited

      Acceptance Rates

      WIMS '14 Paper Acceptance Rate: 41 of 90 submissions, 46%
      Overall Acceptance Rate: 140 of 278 submissions, 50%
