DOI: 10.1145/2505515.2505567
research-article

Computing term similarity by large probabilistic isA knowledge

Published: 27 October 2013

ABSTRACT

Computing semantic similarity between two terms is essential for a variety of text analytics and understanding applications. However, existing approaches are more suitable for semantic similarity between words rather than the more general multi-word expressions (MWEs), and they do not scale very well. Therefore, we propose a lightweight and effective approach for semantic similarity using a large-scale semantic network automatically acquired from billions of web documents. Given two terms, we map them into the concept space, and compare their similarity there. Furthermore, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Extensive studies demonstrate that our approach can accurately compute the semantic similarity between terms with MWEs and ambiguity, and significantly outperforms 12 competing methods.
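
The idea of mapping two terms into a concept space and comparing them there can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not state the similarity function, so cosine similarity is assumed here, and the `ISA_COUNTS` table, the function names, and all counts are hypothetical values invented for illustration rather than data from the actual semantic network. The clustering step that orthogonalizes the concept space is not modeled.

```python
import math

# Hypothetical isA evidence: term -> {concept: observation count}.
# All counts are invented for illustration only.
ISA_COUNTS = {
    "apple": {"fruit": 60, "company": 40},
    "pear": {"fruit": 95, "tree": 5},
    "microsoft": {"company": 90, "software vendor": 10},
}


def concept_vector(term):
    """Map a term to a normalized distribution over its concepts."""
    counts = ISA_COUNTS.get(term, {})
    total = sum(counts.values())
    if total == 0:
        return {}  # unknown term: empty concept vector
    return {concept: n / total for concept, n in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(concept, 0.0) for concept, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_similarity(a, b):
    """Compare two terms in concept space (assumed measure: cosine)."""
    return cosine(concept_vector(a), concept_vector(b))


if __name__ == "__main__":
    for pair in [("apple", "pear"), ("apple", "microsoft"), ("pear", "microsoft")]:
        print(pair, round(term_similarity(*pair), 3))
```

In this toy data the ambiguous term "apple" shares the concept "fruit" with "pear" and the concept "company" with "microsoft", so it scores above zero against both, while "pear" and "microsoft" share no concept and score zero. Because the lookup key is an arbitrary string, a multi-word expression is handled the same way as a single word.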

Published in

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515

Copyright © 2013 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

CIKM '13 paper acceptance rate: 143 of 848 submissions, 17%
Overall acceptance rate: 1,861 of 8,427 submissions, 22%
