DOI: 10.1145/1871437.1871586
research-article

Selectively diversifying web search results

Published: 26 October 2010

ABSTRACT

Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could benefit from different diversification strategies. A more lenient or more aggressive diversification strategy is typically encoded by existing approaches as a trade-off between promoting relevance or diversity in the search results. In this paper, we propose to learn such a trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query - given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform a uniform diversification for both classical and state-of-the-art diversification approaches.
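The abstract describes the core idea only at a high level: a relevance/diversity trade-off is predicted for each unseen query from similar previously seen queries, rather than fixed globally. As an illustrative sketch only (not the authors' implementation), the following Python assumes an MMR-style greedy re-ranker as the underlying diversification approach and plain k-nearest-neighbour regression over hypothetical query features; all function names, features, and values are assumptions introduced for illustration.

```python
import numpy as np

def predict_tradeoff(query_feats, seen_feats, seen_lambdas, k=3):
    """Hypothetical per-query trade-off prediction: average the
    best-performing trade-off (lambda) of the k most similar
    previously seen queries, measured in a query feature space."""
    dists = np.linalg.norm(np.asarray(seen_feats) - np.asarray(query_feats), axis=1)
    nearest = np.argsort(dists)[:k]
    return float(np.mean(np.asarray(seen_lambdas)[nearest]))

def diversify(doc_ids, relevance, pairwise_sim, lam, depth=10):
    """MMR-style greedy re-ranking: lam = 0 keeps the relevance-only
    ranking, lam = 1 ranks purely by novelty w.r.t. selected documents."""
    selected, remaining = [], list(range(len(doc_ids)))
    while remaining and len(selected) < depth:
        def score(i):
            # Penalise similarity to documents already selected.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return (1.0 - lam) * relevance[i] - lam * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [doc_ids[i] for i in selected]

# Hypothetical usage: features of previously seen queries, the trade-off
# that performed best for each, and the features of a new, unseen query.
seen_feats = np.array([[0.2, 1.3], [0.8, 0.4], [0.5, 0.9]])
seen_lambdas = np.array([0.1, 0.7, 0.4])
lam = predict_tradeoff([0.6, 0.7], seen_feats, seen_lambdas, k=2)
ranking = diversify(["d1", "d2", "d3"],
                    relevance=[0.9, 0.8, 0.3],
                    pairwise_sim=[[1.0, 0.9, 0.1],
                                  [0.9, 1.0, 0.2],
                                  [0.1, 0.2, 1.0]],
                    lam=lam)
```

The selective element is that lam is produced per query by predict_tradeoff rather than being a fixed system-wide constant; per the abstract, the paper applies this idea to both classical and state-of-the-art diversification approaches, which this MMR-style sketch only approximates.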

    • Published in

      CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
      October 2010
      2036 pages
      ISBN:9781450300995
      DOI:10.1145/1871437

      Copyright © 2010 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 26 October 2010


      Qualifiers

      • research-article

      Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%

