research-article

Selectively diversifying web search results

Authors:
Rodrygo L.T. Santos

University of Glasgow, Glasgow, United Kingdom

University of Glasgow, Glasgow, United Kingdom
View Profile

,
Craig Macdonald

University of Glasgow, Glasgow, United Kingdom

University of Glasgow, Glasgow, United Kingdom
View Profile

,
Iadh Ounis

University of Glasgow, Glasgow, United Kingdom

University of Glasgow, Glasgow, United Kingdom
View Profile

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge managementOctober 2010Pages 1179–1188https://doi.org/10.1145/1871437.1871586

Published:26 October 2010Publication History

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

Pages 1179–1188

ABSTRACT

Search result diversification is a natural approach for tackling ambiguous queries. Nevertheless, not all queries are equally ambiguous, and hence different queries could benefit from different diversification strategies. A more lenient or more aggressive diversification strategy is typically encoded by existing approaches as a trade-off between promoting relevance or diversity in the search results. In this paper, we propose to learn such a trade-off on a per-query basis. In particular, we examine how the need for diversification can be learnt for each query - given a diversification approach and an unseen query, we predict an effective trade-off between relevance and diversity based on similar previously seen queries. Thorough experiments using the TREC ClueWeb09 collection show that our selective approach can significantly outperform a uniform diversification for both classical and state-of-the-art diversification approaches.

References

R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In WSDM, pages 5--14, 2009. Google ScholarDigital Library
D. W. Aha, D. F. Kibler, and M. K. Albert. Instance-based learning algorithms. Machine Learning, 6:37--66, 1991. Google ScholarDigital Library
G. Amati, E. Ambrosi, M. Bianchi, C. Gaibisso, and G. Gambosi. FUB, IASI-CNR and University of Tor Vergata at TREC 2007 Blog track. In TREC, 2007.Google Scholar
A. Broder. A taxonomy of Web search. SIGIR Forum, 36(2):3--10, 2002. Google ScholarDigital Library
J. Carbonell and J. Goldstein. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In SIGIR, pages 335--336, 1998. Google ScholarDigital Library
D. Carmel and E. Yom-Tov. Estimating the query difficulty for information retrieval. Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1):1--89, 2010. Google ScholarDigital Library
B. Carterette and P. Chandar. Probabilistic models of ranking novel documents for faceted topic retrieval. In CIKM, pages 1287--1296, 2009. Google ScholarDigital Library
H. Chen and D. R. Karger. Less is more: probabilistic models for retrieving fewer relevant documents. In SIGIR, pages 429--436, 2006. Google ScholarDigital Library
C. L. A. Clarke, N. Craswell, and I. Soboroff. Overview of the TREC 2009 Web track. In TREC, 2009.Google Scholar
C. L. A. Clarke, M. Kolla, G. V. Cormack, O. Vechtomova, A. Ashkan, S. Büttcher, and I. MacKinnon. Novelty and diversity in information retrieval evaluation. In SIGIR, pages 659--666, 2008. Google ScholarDigital Library
P. Clough, M. Sanderson, M. Abouammoh, S. Navarro, and M. Paramita. Multiple approaches to analysing query diversity. In SIGIR, pages 734--735, 2009. Google ScholarDigital Library
N. Craswell and D. Hawking. Overview of the TREC 2004 Web track. In TREC, 2004.Google Scholar
S. Cronen-Townsend, Y. Zhou, and W. B. Croft. Predicting query performance. In SIGIR, pages 299--306, 2002. Google ScholarDigital Library
E. Gabrilovich, A. Smola, and N. Tishby, editors. SIGIR Workshop on Feature Generation and Selection for IR, 2010.Google Scholar
X. Geng, T.-Y. Liu, T. Qin, A. Arnold, H. Li, and H.-Y. Shum. Query dependent ranking using k-nearest neighbor. In SIGIR, pages 115--122, 2008. Google ScholarDigital Library
S. Gollapudi and A. Sharma. An axiomatic approach for result diversification. In WWW, pages 381--390, 2009. Google ScholarDigital Library
B. He and I. Ounis. Inferring query performance using pre-retrieval predictors. In SPIRE, pages 43--54, 2004.Google Scholar
B. He and I. Ounis. Query performance prediction. Inf. Syst., 31(7):585--594, 2006. Google ScholarDigital Library
I.-H. Kang and G. Kim. Query type classification for Web document retrieval. In SIGIR, pages 64--71, 2003. Google ScholarDigital Library
S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671--680, 1983.Google ScholarCross Ref
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artif. Intell., 97(1-2):273--324, 1997. Google ScholarDigital Library
S. M. Omohundro. Five balltree construction algorithms. Technical Report TR-89-063, International Computer Science Institute, 1989.Google Scholar
I. Ounis, G. Amati, V. Plachouras, B. He, C. Macdonald, and C. Lioma. Terrier: a high performance and scalable information retrieval platform. In OSIR, 2006.Google Scholar
J. Peng, C. Macdonald, and I. Ounis. Learning to select a ranking function. In ECIR, pages 114--126, 2010. Google ScholarDigital Library
S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu, and M. Gatford. Okapi at TREC-3. In TREC, 1994.Google Scholar
M. Sanderson. Ambiguous queries: Test collections need more sense. In SIGIR, pages 499--506, 2008. Google ScholarDigital Library
R. L. T. Santos, C. Macdonald, and I. Ounis. Exploiting query reformulations for Web search result diversification. In WWW, pages 881--890, 2010. Google ScholarDigital Library
R. L. T. Santos, C. Macdonald, and I. Ounis. Voting for related entities. In RIAO, 2010. Google ScholarDigital Library
R. L. T. Santos, J. Peng, C. Macdonald, and I. Ounis. Explicit search result diversification through sub-queries. In ECIR, pages 87--99, 2010. Google ScholarDigital Library
F. Silvestri. Mining query logs: turning search usage data into knowledge. Found. Trends Inf. Retr., 4(1-2):1--174, 2010. Google ScholarDigital Library
R. Song, Z. Luo, J.-Y. Nie, Y. Yu, and H.-W. Hon. Identification of ambiguous queries in Web search. Inf. Process. Manage., 45(2):216--229, 2009. Google ScholarDigital Library
R. Song, J.-R. Wen, S. Shi, G. Xin, T.-Y. Liu, T. Qin, X. Zheng, J. Zhang, G.-R. Xue, and W.-Y. Ma. Microsoft Research Asia at Web track and Terabyte track of TREC 2004. In TREC, 2004.Google Scholar
K. Spärck-Jones, S. E. Robertson, and M. Sanderson. Ambiguous requests: implications for retrieval tests, systems and theories. SIGIR Forum, 41(2):8--17, 2007. Google ScholarDigital Library
J. Wang and J. Zhu. Portfolio theory of information retrieval. In SIGIR, pages 115--122, 2009. Google ScholarDigital Library
Y. Wang and E. Agichtein. Query ambiguity revisited: clickthrough measures for distinguishing informational and ambiguous queries. In NAACL-HLT 2010, pages 361--364, 2010. Google ScholarDigital Library
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools. Morgan Kaufmann, 2005. Google ScholarDigital Library
E. Yom-Tov, S. Fine, D. Carmel, and A. Darlow. Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval. In SIGIR, pages 512--519, 2005. Google ScholarDigital Library
C. Zhai, W. W. Cohen, and J. Lafferty. Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In SIGIR, pages 10--17, 2003. Google ScholarDigital Library
Y. Zhou and W. B. Croft. Query performance prediction in Web search environments. In SIGIR, pages 543--550, 2007. Google ScholarDigital Library

Index Terms

Selectively diversifying web search results
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking

Recommendations

Exploiting query reformulations for web search result diversification
WWW '10: Proceedings of the 19th international conference on World wide web

When a Web user's underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result ...
Read More
Diversifying search results
WSDM '09: Proceedings of the Second ACM International Conference on Web Search and Data Mining

We study the problem of answering ambiguous web queries in a setting where there exists a taxonomy of information, and that both queries and documents may belong to more than one category according to this taxonomy. We present a systematic approach to ...
Read More
Intent-aware search result diversification
SIGIR '11: Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval

Search result diversification has gained momentum as a way to tackle ambiguous queries. An effective approach to this problem is to explicitly model the possible aspects underlying a query, in order to maximise the estimated relevance of the retrieved ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management
October 2010
2036 pages
ISBN:9781450300995
DOI:10.1145/1871437
General Chair:
Jimmy Huang
York University, Canada
,
Program Chairs:
Nick Koudas
University of Toronto, Canada
,
Gareth Jones
Dublin City University, Ireland
,
Xindong Wu
University of Vermont, USA
,
Kevyn Collins-Thompson
Microsoft Research, USA
,
Aijun An
York University, Canada
Copyright © 2010 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 26 October 2010
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
diversity
feature selection
machine learning
relevance
selective retrieval
web search
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate1,861of8,427submissions,22%
Upcoming Conference
CIKM '24

Sponsor:

sigir

sigir

The 33rd ACM International Conference on Information and Knowledge Management

October 21 - 25, 2024

Boise , ID , USA
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 56
  Total Citations
  View Citations
- 558
  Total Downloads
- Downloads (Last 12 months)0
- Downloads (Last 6 weeks)0
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Selectively diversifying web search results

CIKM '10: Proceedings of the 19th ACM international conference on Information and knowledge management

ABSTRACT

References

Cited By

Index Terms

Recommendations

Exploiting query reformulations for web search result diversification

Diversifying search results

Intent-aware search result diversification