short-paper

Effectiveness/efficiency tradeoffs for candidate generation in multi-stage retrieval architectures

Authors:

Jimmy Lin

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Pages 997 - 1000

https://doi.org/10.1145/2484028.2484132

Published: 28 July 2013

Abstract

This paper examines a multi-stage retrieval architecture consisting of a candidate generation stage, a feature extraction stage, and a reranking stage using machine-learned models. Given a fixed set of features and a learning-to-rank model, we explore effectiveness/efficiency tradeoffs with three candidate generation approaches: postings intersection with SvS, conjunctive query evaluation with WAND, and disjunctive query evaluation with WAND. We find no significant differences in end-to-end effectiveness as measured by NDCG between conjunctive and disjunctive WAND, but conjunctive query evaluation is substantially faster. Postings intersection with SvS, while fast, yields substantially lower end-to-end effectiveness, suggesting that document and term frequencies remain important in the initial ranking stage. These findings show that conjunctive WAND is the best overall candidate generation strategy of those we examined.
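To make the first of the three candidate generation approaches concrete, the following is a minimal sketch of SvS ("small versus small") postings intersection as it is commonly described, not the implementation evaluated in the paper. It assumes each postings list is a sorted sequence of integer docids held in memory; the function name `svs_intersect` and the use of binary search for probing are illustrative assumptions.

```python
from bisect import bisect_left
from typing import List, Sequence


def svs_intersect(postings: Sequence[Sequence[int]]) -> List[int]:
    """Intersect sorted docid lists, starting from the shortest (SvS).

    Assumption: every list is sorted ascending and has no duplicates.
    """
    if not postings:
        return []
    # Process lists from shortest to longest so the candidate set stays small.
    ordered = sorted(postings, key=len)
    candidates = list(ordered[0])
    for plist in ordered[1:]:
        survivors = []
        lo = 0  # candidates are sorted, so the search window only moves forward
        for docid in candidates:
            lo = bisect_left(plist, docid, lo)
            if lo == len(plist):
                break  # all remaining candidates exceed the end of plist
            if plist[lo] == docid:
                survivors.append(docid)
        candidates = survivors
        if not candidates:
            break  # empty intersection, no need to probe longer lists
    return candidates


if __name__ == "__main__":
    lists = [[1, 3, 5, 7, 9, 11], [3, 7, 11], [2, 3, 4, 7, 8, 11, 12]]
    print(svs_intersect(lists))
```

Note that the result is an unscored set of docids: the sketch never consults term or document frequencies, which is the property the abstract links to the lower end-to-end effectiveness of this approach.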

Index Terms

1. Information systems
  1. Information retrieval

Published In

SIGIR '13: Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

July 2013

1188 pages

ISBN:9781450320344

DOI:10.1145/2484028

General Chairs:
Gareth J.F. Jones, Dublin City University, Ireland
Páraic Sheridan, Dublin City University, Ireland

Program Chairs:
Diane Kelly, University of North Carolina, Chapel Hill, USA
Maarten de Rijke, University of Amsterdam, The Netherlands
Tetsuya Sakai, Microsoft Research Asia, China

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper

Conference

SIGIR '13

Sponsor:

SIGIR

SIGIR '13: The 36th International ACM SIGIR conference on research and development in Information Retrieval

July 28 - August 1, 2013

Dublin, Ireland

Acceptance Rates

SIGIR '13 Paper Acceptance Rate 73 of 366 submissions, 20%;

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics

Article Metrics

Total Citations: 60
Total Downloads: 466
Downloads (Last 12 months): 20
Downloads (Last 6 weeks): 2

Reflects downloads up to 16 Feb 2025
