ABSTRACT
We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison; such a study has been difficult because the two strategies rely on vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of other details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DaaT), BMW (DaaT), and JASS (SaaT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but it exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.
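To make the two evaluation orders concrete, the following is a minimal Python sketch under stated assumptions; it is not the implementation evaluated in the paper. It uses a hypothetical in-memory toy index (`INDEX`) whose impacts are small integers standing in for quantized scores. `wand_top_k` walks docid-ordered postings document-at-a-time with WAND-style pivoting, and an illustrative `theta_factor` parameter inflates the threshold to mimic approximate (unsafe) evaluation. `build_impact_ordered` and `saat_top_k` reorganize the postings into impact-ordered segments and accumulate scores segment by segment, in the spirit of a SaaT engine such as JASS. The `budget` parameter is an illustrative early-termination knob measured in postings. All names and parameters are hypothetical, BMW's block-max refinement is not shown, and linear scans stand in for real skipping and compression.

```python
import heapq
from collections import defaultdict

# Hypothetical toy index: term -> [(doc_id, impact)] in doc_id order.
# Impacts are small integers, standing in for pre-quantized scores.
INDEX = {
    "fast": [(1, 3), (4, 5), (7, 1), (9, 6)],
    "query": [(1, 4), (2, 1), (4, 2), (9, 4)],
    "evaluation": [(2, 7), (4, 2), (7, 5)],
}


def wand_top_k(index, terms, k, theta_factor=1.0):
    """DaaT WAND sketch; theta_factor > 1 inflates the threshold (approximate, unsafe)."""
    lists = [index[t] for t in terms if t in index]
    upper = [max(s for _, s in pl) for pl in lists]  # per-term score upper bounds
    pos = [0] * len(lists)                            # one cursor per postings list
    heap, threshold = [], 0.0                         # min-heap of (score, doc_id)
    while True:
        # Live cursors, ordered by the doc_id each currently points at.
        live = sorted((lists[i][pos[i]][0], i)
                      for i in range(len(lists)) if pos[i] < len(lists[i]))
        # Pivot: first cursor at which the summed upper bounds exceed the threshold.
        bound, pivot = 0, None
        for j, (_, i) in enumerate(live):
            bound += upper[i]
            if bound > threshold:
                pivot = j
                break
        if pivot is None:
            break  # no remaining document can beat the threshold (or no live cursors)
        pivot_doc = live[pivot][0]
        if live[0][0] == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it fully.
            score = 0
            for doc, i in live:
                if doc == pivot_doc:
                    score += lists[i][pos[i]][1]
                    pos[i] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pivot_doc))
            if len(heap) == k:
                threshold = heap[0][0] * theta_factor
        else:
            # Advance the lagging cursor to pivot_doc (a linear scan stands in
            # for the skipping a real index would perform).
            i = live[0][1]
            while pos[i] < len(lists[i]) and lists[i][pos[i]][0] < pivot_doc:
                pos[i] += 1
    return sorted(heap, key=lambda e: (-e[0], e[1]))


def build_impact_ordered(index):
    """Reorganize postings as term -> [(impact, [doc_ids])], highest impact first."""
    out = {}
    for term, postings in index.items():
        by_impact = defaultdict(list)
        for doc, impact in postings:
            by_impact[impact].append(doc)
        out[term] = sorted(by_impact.items(), key=lambda seg: -seg[0])
    return out


def saat_top_k(impact_index, terms, k, budget=None):
    """SaaT sketch: add segments to accumulators in decreasing impact order.

    If budget is set, stop before starting a new segment once at least
    `budget` postings have been processed (segments are never split here).
    """
    segments = [seg for t in terms for seg in impact_index.get(t, [])]
    segments.sort(key=lambda seg: -seg[0])
    acc = defaultdict(int)
    processed = 0
    for impact, docs in segments:
        if budget is not None and processed >= budget:
            break
        for doc in docs:
            acc[doc] += impact
        processed += len(docs)
    # Only this final extraction depends on k; the accumulation loop does not.
    return heapq.nlargest(k, ((score, doc) for doc, score in acc.items()))
```

Run to completion on the toy query `["fast", "query", "evaluation"]` with `k=3`, both functions return the same ranked list. The design choice the sketch is meant to highlight is where the work goes. WAND's cost depends on how quickly its threshold rises, and that threshold depends on `k` and on the score distribution. In the SaaT loop, the work is bounded by the postings processed (and by `budget`, if one is set), with `k` affecting only the final extraction. Inflating the threshold with `theta_factor=1.5` on this toy data causes WAND to miss the true top document, which shows why such approximation is called unsafe. It says nothing about latency on real collections.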
Index Terms
- A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation