Abstract
Web-based association measures aim to evaluate the semantic similarity between two queries (i.e., words or entities) by leveraging the results returned by search engines. Existing web-relevance similarity measures usually treat all search results for a query as a single coarse-grained topic: they concatenate the results into one document per query and measure the similarity between the resulting term vectors. This paper proposes a novel association measure, WSRCM, based on web search results clustering and matching, which evaluates the semantic similarity between two queries at a fine-grained level. WSRCM first discovers the subtopics in the search results for each query and then measures the consistency between the two queries' sets of subtopics. Each subtopic is expected to describe a unique facet of its query, and two queries that share more subtopics are deemed more semantically related. Experimental results show encouraging performance for the proposed measure.
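The contrast between the coarse-grained baseline and a subtopic-matching measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the subtopic representation, or the matching and normalisation scheme. The sketch assumes the subtopics have already been discovered and are given as bags of terms, scores subtopic pairs by cosine similarity, and picks the best one-to-one matching by brute force. The function names `cosine`, `coarse_similarity`, and `matched_similarity` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def coarse_similarity(subtopics_a, subtopics_b) -> float:
    """Baseline: merge everything into one term vector per query."""
    merged_a = sum(subtopics_a, Counter())
    merged_b = sum(subtopics_b, Counter())
    return cosine(merged_a, merged_b)


def matched_similarity(subtopics_a, subtopics_b) -> float:
    """Assumed fine-grained score: best one-to-one subtopic matching,
    normalised by the size of the larger subtopic set."""
    small, large = sorted((subtopics_a, subtopics_b), key=len)
    if not small:
        return 0.0
    # Brute force over assignments; only feasible for a handful of subtopics.
    best = max(
        sum(cosine(s, large[j]) for s, j in zip(small, perm))
        for perm in permutations(range(len(large)), len(small))
    )
    return best / len(large)


if __name__ == "__main__":
    # Hypothetical subtopics, standing in for clustered search results.
    jaguar = [
        Counter({"car": 3, "engine": 2, "luxury": 1}),
        Counter({"cat": 3, "jungle": 2, "predator": 1}),
    ]
    leopard = [
        Counter({"cat": 2, "predator": 2, "spots": 1}),
        Counter({"mac": 3, "os": 2}),
    ]
    print("coarse :", round(coarse_similarity(jaguar, leopard), 3))
    print("matched:", round(matched_similarity(jaguar, leopard), 3))
```

For more than a few subtopics per query, the brute-force search would need to be replaced by a polynomial-time assignment solver; the normalisation by the larger set size is likewise only one plausible choice.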
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, X., Xiao, J. (2009). Towards a Novel Association Measure via Web Search Results Mining. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science, vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_83
DOI: https://doi.org/10.1007/978-3-642-01307-2_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer Science, Computer Science (R0)