DOI: 10.1145/3018661.3018726
research-article

A Comparison of Document-at-a-Time and Score-at-a-Time Query Evaluation

Published: 02 February 2017

ABSTRACT

We present an empirical comparison between document-at-a-time (DaaT) and score-at-a-time (SaaT) document ranking strategies within a common framework. Although both strategies have been extensively explored, the literature lacks a fair, direct comparison: such a study has been difficult due to vastly different query evaluation mechanics and index organizations. Our work controls for score quantization, document processing, compression, implementation language, implementation effort, and a number of details, arriving at an empirical evaluation that fairly characterizes the performance of three specific techniques: WAND (DaaT), BMW (DaaT), and JASS (SaaT). Experiments reveal a number of interesting findings. The performance gap between WAND and BMW is not as clear as the literature suggests, and both methods are susceptible to tail queries that may take orders of magnitude longer than the median query to execute. Surprisingly, approximate query evaluation in WAND and BMW does not significantly reduce the risk of these tail queries. Overall, JASS is slightly slower than either WAND or BMW, but exhibits much lower variance in query latencies and is much less susceptible to tail query effects. Furthermore, JASS query latency is not particularly sensitive to the retrieval depth, making it an appealing solution for performance-sensitive applications where bounds on query latencies are desirable.
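To make the two traversal orders concrete, here is a minimal sketch in Python. It is not WAND, BMW, or JASS: there is no pruning, compression, or block-max structure, and the toy index, the function names, and the `budget` parameter are all assumptions introduced for illustration. It only shows that document-at-a-time scoring completes one document before moving to the next, while score-at-a-time scoring consumes postings in decreasing impact order and can be stopped after a fixed amount of work.

```python
import heapq
from collections import defaultdict
from itertools import groupby

# Toy index (hypothetical data): term -> {doc_id: quantized impact score}.
INDEX = {
    "query": {1: 3, 2: 1, 4: 2},
    "evaluation": {2: 5, 3: 1, 4: 2},
}

def rank(pairs, k):
    """Return the k best (doc_id, score) pairs; ties go to the smaller doc id."""
    return sorted(pairs, key=lambda p: (-p[1], p[0]))[:k]

def daat(index, terms, k):
    """Document-at-a-time: walk all postings lists in doc-id order and
    finish scoring each document before moving to the next one."""
    lists = [sorted(index[t].items()) for t in terms if t in index]
    merged = heapq.merge(*lists)  # postings from every list, in doc-id order
    results = []
    for doc_id, group in groupby(merged, key=lambda p: p[0]):
        results.append((doc_id, sum(impact for _, impact in group)))
    return rank(results, k)

def saat(index, terms, k, budget=None):
    """Score-at-a-time: process postings in decreasing impact order, adding
    each impact to a per-document accumulator. The optional `budget`
    (an assumption of this sketch) caps the number of postings processed,
    which makes the result approximate but bounds the work per query."""
    postings = [
        (impact, doc_id)
        for t in terms if t in index
        for doc_id, impact in index[t].items()
    ]
    postings.sort(key=lambda p: (-p[0], p[1]))  # highest impact first
    if budget is not None:
        postings = postings[:budget]
    accumulators = defaultdict(int)
    for impact, doc_id in postings:
        accumulators[doc_id] += impact
    return rank(accumulators.items(), k)

if __name__ == "__main__":
    terms = ["query", "evaluation"]
    print(daat(INDEX, terms, 2))
    print(saat(INDEX, terms, 2))
    print(saat(INDEX, terms, 2, budget=2))
```

Without a budget, both functions return the same ranking on this toy index. With `budget=2`, the score-at-a-time version processes only the two highest-impact postings and returns a different, approximate top 2. A fixed cap of this kind is one plausible reason a score-at-a-time system could keep query latency within bounds, though the sketch makes no claim about how the systems evaluated in the paper are implemented.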

Published in

WSDM '17: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining
February 2017, 868 pages
ISBN: 9781450346757
DOI: 10.1145/3018661

Copyright © 2017 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

WSDM '17 Paper Acceptance Rate: 80 of 505 submissions, 16%
Overall Acceptance Rate: 498 of 2,863 submissions, 17%
