ABSTRACT
Search result diversification is a natural approach for tackling ambiguous queries. However, not all queries are equally ambiguous, so different queries can benefit from different diversification strategies. Existing approaches typically encode a more lenient or a more aggressive diversification strategy as a trade-off between promoting relevance and promoting diversity in the search results. In this paper, we propose to learn this trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query: given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform uniform diversification for both classical and state-of-the-art diversification approaches.
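The idea of predicting a per-query trade-off from similar previously seen queries can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a plain k-nearest-neighbour average over hypothetical query feature vectors, and a generic greedy re-ranker in the style of MMR in which a single `tradeoff` value in [0, 1] balances relevance against redundancy. All names, feature values, and scores below are invented for illustration.

```python
import math


def predict_tradeoff(query_features, training_queries, k=3):
    """Average the trade-off values of the k nearest previously seen queries.

    training_queries is a list of (feature_vector, best_tradeoff) pairs;
    both the features and the Euclidean distance are assumptions.
    """
    if not training_queries:
        raise ValueError("need at least one previously seen query")
    nearest = sorted(
        training_queries,
        key=lambda item: math.dist(query_features, item[0]),
    )[:k]
    return sum(tradeoff for _, tradeoff in nearest) / len(nearest)


def diversify(relevance, similarity, tradeoff, depth):
    """Greedy MMR-style re-ranking.

    tradeoff = 0 reproduces the relevance ordering; larger values
    penalise documents similar to those already selected.
    """
    selected = []
    remaining = set(relevance)
    while remaining and len(selected) < depth:

        def score(doc):
            redundancy = max(
                (similarity(doc, chosen) for chosen in selected), default=0.0
            )
            return (1 - tradeoff) * relevance[doc] - tradeoff * redundancy

        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical training data: (query features, trade-off that worked best).
seen_queries = [
    ((0.90, 0.80), 0.8),  # ambiguous-looking queries
    ((0.85, 0.70), 0.6),
    ((0.10, 0.20), 0.1),  # unambiguous-looking queries
    ((0.20, 0.10), 0.0),
]

# Hypothetical initial ranking: d1 and d2 cover the same aspect, d3 another.
relevance = {"d1": 0.9, "d2": 0.85, "d3": 0.5}
aspect = {"d1": "a", "d2": "a", "d3": "b"}


def same_aspect(x, y):
    return 1.0 if aspect[x] == aspect[y] else 0.0


predicted = predict_tradeoff((0.80, 0.75), seen_queries, k=2)
selective_ranking = diversify(relevance, same_aspect, predicted, depth=3)
uniform_ranking = diversify(relevance, same_aspect, 0.0, depth=3)
```

In this toy setting the unseen query resembles the two ambiguous-looking training queries, so the predicted trade-off is high and the document covering the second aspect moves up the ranking. With a trade-off of zero the ranking stays in relevance order.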
Index Terms
- Selectively diversifying web search results