research-article

Computing term similarity by large probabilistic isA knowledge

Authors:
Peipei Li, Hefei University of Technology, Hefei, China
Haixun Wang, Microsoft Research Asia, Beijing, China
Kenny Q. Zhu, Shanghai Jiao Tong University, Shanghai, China
Zhongyuan Wang, Renmin University of China; Microsoft Research Asia, Beijing, China
Xindong Wu, University of Vermont, Vermont, USA

CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management, October 2013, Pages 1401–1410. https://doi.org/10.1145/2505515.2505567

Published: 27 October 2013

ABSTRACT

Computing semantic similarity between two terms is essential for a variety of text analytics and understanding applications. However, existing approaches are better suited to measuring similarity between single words than between the more general multi-word expressions (MWEs), and they do not scale well. Therefore, we propose a lightweight and effective approach to semantic similarity that uses a large-scale semantic network automatically acquired from billions of web documents. Given two terms, we map them into the concept space and compare their similarity there. Furthermore, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Extensive studies demonstrate that our approach can accurately compute the semantic similarity between terms with MWEs and ambiguity, and significantly outperforms 12 competing methods.
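
The two-step idea in the abstract, mapping each term into a concept space and comparing the terms there, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy isA probabilities, the names `ISA`, `concept_vector` and `term_similarity`, and the choice of cosine similarity as the comparison function. The abstract does not state which measure the paper uses, and the clustering step that orthogonalizes the concept space is not shown.

```python
import math

# Hypothetical isA knowledge: P(concept | term).
# Toy values for illustration only; they are not taken from the paper.
ISA = {
    "apple": {"fruit": 0.6, "company": 0.4},
    "microsoft": {"company": 0.9, "software vendor": 0.1},
    "pear": {"fruit": 1.0},
}


def concept_vector(term):
    """Map a term to a sparse concept vector (empty if the term is unknown)."""
    return ISA.get(term.lower(), {})


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def term_similarity(a, b):
    """Compare two terms in the concept space (assumed measure: cosine)."""
    return cosine(concept_vector(a), concept_vector(b))


if __name__ == "__main__":
    for pair in [("apple", "pear"), ("apple", "microsoft"), ("pear", "microsoft")]:
        print(pair, round(term_similarity(*pair), 3))
```

In this toy data the ambiguous term "apple" shares concepts with both "pear" and "microsoft", while "pear" and "microsoft" share none, so their similarity is zero.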

Published in
CIKM '13: Proceedings of the 22nd ACM international conference on Information & Knowledge Management
October 2013
2612 pages
ISBN: 9781450322638
DOI: 10.1145/2505515
General Chairs: Qi He (LinkedIn, USA), Arun Iyengar (IBM T.J. Watson Research Center, USA)
Program Chairs: Wolfgang Nejdl (L3S Research Center, Germany), Jian Pei (Simon Fraser University, Canada), Rajeev Rastogi (Amazon, India)
Copyright © 2013 ACM
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
clustering
multi-word expression
semantic network
term similarity
Qualifiers
- research-article
Acceptance Rates
CIKM '13 Paper Acceptance Rate: 143 of 848 submissions, 17%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%