research-article

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation

Authors:
Matt Crane, University of Waterloo, Waterloo, ON, Canada
J. Shane Culpepper, RMIT University, Melbourne, Australia
Jimmy Lin, University of Waterloo, Waterloo, ON, Canada
Joel Mackenzie, RMIT University, Melbourne, Australia
Andrew Trotman, University of Otago, Dunedin, New Zealand

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, February 2017, pages 201–210. https://doi.org/10.1145/3018661.3018726

Published: 2 February 2017

ABSTRACT

We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison: such a study has been difficult due to vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of other details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DaaT), BMW (DaaT), and JASS (SaaT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.
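To make the two traversal orders concrete, here is a minimal Python sketch, not the authors' implementation. It contrasts a document-at-a-time pass with WAND pivoting (the technique introduced by Broder et al., CIKM 2003) against a score-at-a-time pass over impact-ordered segments with an optional postings budget, in the spirit of JASS-style anytime ranking. Everything here is an illustrative assumption: the toy index `POSTINGS`, the helper names `wand_topk`, `impact_order`, and `saat_topk`, the integer impacts standing in for quantized scores, and the `budget` semantics (checked only at segment boundaries). BMW's block-max skipping, compression, and SIMD decoding are all omitted.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> docid-sorted postings of (docid, impact).
# Impacts are assumed to be positive integers, as produced by score quantization.
POSTINGS = {
    "a": [(1, 3), (4, 2), (7, 5)],
    "b": [(1, 1), (2, 4), (7, 2), (9, 3)],
    "c": [(4, 6), (9, 1)],
}


def wand_topk(postings, k):
    """DaaT top-k with WAND pivoting over docid-sorted lists (exact, no approximation)."""
    if k <= 0:
        return []
    lists = [p for p in postings if p]
    ubs = [max(s for _, s in p) for p in lists]  # per-term score upper bounds
    pos = [0] * len(lists)                         # one cursor per postings list
    heap = []                                      # min-heap of (score, docid)

    def doc(i):
        return lists[i][pos[i]][0]

    while True:
        active = [i for i in range(len(lists)) if pos[i] < len(lists[i])]
        if not active:
            break
        active.sort(key=doc)  # order cursors by their current docid
        # With positive impacts, a threshold of 0 admits any document until the heap is full.
        threshold = heap[0][0] if len(heap) == k else 0
        # Pivot: first cursor at which the summed upper bounds exceed the threshold.
        acc, pivot_doc = 0, None
        for i in active:
            acc += ubs[i]
            if acc > threshold:
                pivot_doc = doc(i)
                break
        if pivot_doc is None:
            break  # no remaining document can enter the top-k
        if doc(active[0]) == pivot_doc:
            # All cursors up to the pivot sit on pivot_doc, so score it fully.
            score = 0
            for i in active:
                if doc(i) == pivot_doc:
                    score += lists[i][pos[i]][1]
                    pos[i] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
        else:
            # Skip the lagging cursor to its first posting >= pivot_doc.
            i = active[0]
            while pos[i] < len(lists[i]) and doc(i) < pivot_doc:
                pos[i] += 1
    return [(d, s) for s, d in sorted(heap, key=lambda t: (-t[0], t[1]))]


def impact_order(plist):
    """Regroup one docid-sorted list into (impact, docids) segments, highest impact first."""
    segs = defaultdict(list)
    for d, s in plist:
        segs[s].append(d)
    return sorted(segs.items(), key=lambda seg: -seg[0])


def saat_topk(segments_by_term, k, budget=None):
    """SaaT top-k: add impacts into accumulators segment by segment, in decreasing
    impact order across all query terms; optionally stop once `budget` postings
    have been processed (checked only between segments)."""
    segments = [seg for segs in segments_by_term for seg in segs]
    segments.sort(key=lambda seg: -seg[0])  # stable sort: ties keep query-term order
    acc = defaultdict(int)
    processed = 0
    for impact, docs in segments:
        if budget is not None and processed >= budget:
            break
        for d in docs:
            acc[d] += impact
        processed += len(docs)
    return sorted(acc.items(), key=lambda x: (-x[1], x[0]))[:k]


query = ["a", "b", "c"]
print(wand_topk([POSTINGS[t] for t in query], 2))  # [(4, 8), (7, 7)]
impacts = [impact_order(POSTINGS[t]) for t in query]
print(saat_topk(impacts, 2))                       # [(4, 8), (7, 7)]
print(saat_topk(impacts, 2, budget=3))             # [(4, 6), (7, 5)]
```

Without a budget, both passes return the exact top-k (documents 4 and 7 here); WAND reaches it while skipping some postings it can prove irrelevant. Capping the budget at three postings makes the score-at-a-time pass stop after its three highest-impact segments and return approximate scores. In this sketch the SaaT work is bounded by the number of postings touched rather than by `k`, which loosely mirrors the observation above that JASS latency is not particularly sensitive to retrieval depth.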

Index Terms

1. Information systems
  1. Information retrieval
    1. Search engine architectures and scalability

Published in
WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017
868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661
General Chairs: Maarten de Rijke (University of Amsterdam), Milad Shokouhi (Microsoft)
Program Chairs: Andrew Tomkins (Google), Min Zhang (Tsinghua University)
Copyright © 2017 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
block-max wand
document-ordered indexes
impact-ordered indexes
jass
wand
Acceptance Rates
WSDM '17 paper acceptance rate: 80 of 505 submissions, 16%. Overall acceptance rate: 498 of 2,863 submissions, 17%.