Skip to main content

Exploiting User Queries for Search Result Clustering

  • Conference paper
Web Information Systems Engineering – WISE 2013 (WISE 2013)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8180))

Included in the following conference series:

Abstract

Search Result Clustering (SRC) groups the results of a user query in such a way that each cluster represents a set of related results. To be useful to the user, the different cluster should contain the results corresponding to different possible meanings of the user query and the cluster labels should reflect these meanings. However, existing SRC algorithms often ignore the user query and group the results based just on the similarity of search results. This can lead to two problems: low quality cluster, where the results within a single cluster are related to different meanings of the query; and poor cluster labels, where the label of the cluster does not reflect the query meaning associated with the results in the cluster.

This paper presents a new SRC algorithm called QSC that exploits the user query and uses both syntactic and semantic features of the search results to construct clusters and labels. Experiments show that the query senses are good candidates for the cluster labels and the algorithm can lead to high quality cluster and more semantically meaningful labels than other state-of-the-art algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bernardini, A., Carpineto, C., D’Amico, M.: Full-subtopic retrieval with keyphrase-based search results clustering. In: IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies, WI-IAT 2009, vol. 1, pp. 206–213. IET (2009)

    Google Scholar 

  2. Biemann, C.: Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems. In: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing, pp. 73–80. Association for Computational Linguistics (2006)

    Google Scholar 

  3. Blei, D., Ng, A., Jordan, M.: Latent dirichlet allocation. The Journal of Machine Learning Research 3, 993–1022 (2003)

    MATH  Google Scholar 

  4. Carpineto, C., Osiński, S., Romano, G., Weiss, D.: A survey of web clustering engines. ACM Computing Surveys (CSUR) 41(3), 17 (2009)

    Article  Google Scholar 

  5. Carpineto, C., Romano, G.: Ambient dataset (2008)

    Google Scholar 

  6. Crabtree, D., Gao, X., Andreae, P.: Improving web clustering by cluster selection. In: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, pp. 172–178. IEEE (2005)

    Google Scholar 

  7. Di Marco, A., Navigli, R.: Clustering web search results with maximum spanning trees. In: Pirrone, R., Sorbello, F. (eds.) AI*IA 2011. LNCS, vol. 6934, pp. 201–212. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  8. Di Marco, A., Navigli, R.: Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 1–76 (just accepted, 2013)

    Google Scholar 

  9. Dorow, B., Widdows, D., Ling, K., Eckmann, J.-P., Sergi, D., Moses, E.: Using curvature and markov clustering in graphs for lexical acquisition and word sense discrimination. arXiv preprint cond-mat/0403693 (2004)

    Google Scholar 

  10. Hearst, M., Pedersen, J.: Reexamining the cluster hypothesis: scatter/gather on retrieval results. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 76–84. ACM (1996)

    Google Scholar 

  11. Hubert, L., Arabie, P.: Comparing partitions. Journal of Classification 2(1), 193–218 (1985)

    Article  Google Scholar 

  12. Jabeen, S., Gao, X., Andreae, P.: Harnessing wikipedia semantics for computing contextual relatedness. In: Anthony, P., Ishizuka, M., Lukose, D. (eds.) PRICAI 2012. LNCS, vol. 7458, pp. 861–865. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  13. Meilă, M.: Comparing clusterings–an information based distance. Journal of Multivariate Analysis 98(5), 873–895 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  14. Meiyappan, Y., Iyengar, N.C.S.N., Kannan, A., Suyoto, Y.D., Suselo, T., Prasetyaningrum, T., Tlili, R., Slimani, Y., Dufreche, S., Zappi, M., et al.: Srcluster: Web clustering engine based on wikipedia. International Journal of Advanced Science and Technology 39(1), 1–18 (2012)

    Google Scholar 

  15. Milne, D., Witten, I.H.: An open-source toolkit for mining wikipedia. Artificial Intelligence (2012)

    Google Scholar 

  16. Navigli, R., Crisafulli, G.: Inducing word senses to improve web search result clustering. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 116–126. Association for Computational Linguistics (2010)

    Google Scholar 

  17. Osiriski, S., Stefanowski, J., Weiss, D.: Lingo: Search results clustering algorithm based on singular value decomposition. In: Intelligent Information Processing and Web Mining: Proceedings of the International IIS: IIPWM 2004 Conference held in Zakopane, Poland, p. 359 (2004)

    Google Scholar 

  18. Pang-Ning, T., Steinbach, M., Kumar, V.: Introduction to data mining. WP Co. (2006)

    Google Scholar 

  19. Pirolli, P., Schank, P., Hearst, M., Diehl, C.: Scatter/gather browsing communicates the topic structure of a very large text collection. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 213–220. ACM (1996)

    Google Scholar 

  20. Rosenberg, A., Hirschberg, J.: V-measure: A conditional entropy-based external cluster evaluation measure. In: Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), vol. 410, p. 420 (2007)

    Google Scholar 

  21. Salton, G., McGill, M.J.: Introduction to modern information retrieval (1986)

    Google Scholar 

  22. Véronis, J.: Hyperlex: lexical cartography for information retrieval. Computer Speech & Language 18(3), 223–252 (2004)

    Article  Google Scholar 

  23. Zamir, O., Etzioni, O., Madani, O., Karp, R.: Fast and intuitive clustering of web documents. In: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, pp. 287–290. MIT Press (1997)

    Google Scholar 

  24. Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Informaion Retrieval, pp. 10–17. ACM (2003)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wahid, A., Gao, X., Andreae, P. (2013). Exploiting User Queries for Search Result Clustering. In: Lin, X., Manolopoulos, Y., Srivastava, D., Huang, G. (eds) Web Information Systems Engineering – WISE 2013. WISE 2013. Lecture Notes in Computer Science, vol 8180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41230-1_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-41230-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41229-5

  • Online ISBN: 978-3-642-41230-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics