skip to main content
10.1145/3539618.3591806acmconferencesArticle/Chapter ViewAbstractPublication PagesirConference Proceedingsconference-collections
short-paper

Profiling and Visualizing Dynamic Pruning Algorithms

Published:18 July 2023Publication History

ABSTRACT

Efficiently retrieving the top-k documents for a given query is a fundamental operation in many search applications. Dynamic pruning algorithms accelerate top-k retrieval over inverted indexes by skipping documents that are not able to enter the current set of results. However, the performance of these algorithms depends on a number of variables such as the ranking function, the order of documents within the index, and the number of documents to be retrieved. In this paper, we propose a diagnostic framework, Dyno, for profiling and visualizing the performance of dynamic pruning algorithms. Our framework captures processing traces during retrieval, allowing the operations of the index traversal algorithm to be visualized. These visualizations support both query-level and system-to-system comparisons, enabling performance characteristics to be readily understood for different systems. Dyno benefits both academics and practitioners by furthering our understanding of the behavior of dynamic pruning algorithms, allowing better design choices to be made during experimentation and deployment.

References

  1. I. S. Altingovde, E. Demir, F. Can, and O. Ulusoy. Incremental cluster-based retrieval using compressed cluster-skipping inverted files. ACM Trans. Inf. Sys., 26(3), 2008.Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35--42, 2001.Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Bajaj, D. Campos, N. Craswell, L. Deng, J. Gao, X. Liu, R. Majumder, A. Mc- Namara, B. Mitra, T. Nguyen, M. Rosenberg, X. Song, A. Stoica, S. Tiwary, and T. Wang. MS MARCO: A human generated MAchine Reading COmprehension dataset. arXiv:1611.09268v3, 2018.Google ScholarGoogle Scholar
  4. D. Blandford and G. Blelloch. Index compression through document reordering. In Proc. DCC, pages 342--352, 2002.Google ScholarGoogle ScholarCross RefCross Ref
  5. E. Bortnikov, D. Carmel, and G. Golan-Gueta. Top-k query processing with conditional skips. In Proc. WWW, pages 653--661, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426--434, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. K. Chakrabarti, S. Chaudhuri, and V. Ganti. Interval-based pruning for top-?? processing over compressed lists. In Proc. ICDE, pages 709--720, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. M. Crane, A. Trotman, and R. O'Keefe. Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221--1224, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman. A comparison of document-at-a-time and score-at-a-time query evaluation. In Proc. WSDM, pages 201--210, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. N. Craswell, B. Mitra, E. Yilmaz, D. Campos, and E. M. Voorhees. Overview of the TREC 2019 deep learning track. In Proc. TREC, 2021.Google ScholarGoogle Scholar
  11. C. M. Daoud, E. S. de Moura, D. Fernandes, A. S. da Silva, C. Rossi, and A. Carvalho. Waves: A fast multi-tier top-k query processing algorithm. Inf. Retr., 20(3):292--316, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. L. S. de Carvalho, E. S. de Moura, C. M. Daoud, and A. S. da Silva. Heuristics to improve the BMW method and its variants. Journal of Information & Data Management, 6(3):178--191, 2015.Google ScholarGoogle Scholar
  13. L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In Proc. KDD, pages 1535--1544, 2016.Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. C. Dimopoulos, S. Nepomnyachiy, and T. Suel. A candidate filtering mechanism for fast top-k query processing on modern CPUs. In Proc. SIGIR, pages 723--732, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. S. Ding and T. Suel. Faster top-k document retrieval using block-max indexes. In Proc. SIGIR, pages 993--1002, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. A. Grand, R. Muir, J. Ferenczi, and J. Lin. From MaxScore to Block-Max Wand: The story of how Lucene significantly improved query evaluation performance. In Proc. ECIR, pages 20--27, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. F. Hafizoglu, E. C. Kucukoglu, and I. S. Altingovde. On the efficiency of selective search. In Proc. ECIR, pages 705--712, 2017.Google ScholarGoogle ScholarCross RefCross Ref
  18. A. Kane and F. W. Tompa. Split-lists and initial thresholds for WAND-based search. In Proc. SIGIR, pages 877--880, 2018.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. O. Khattab, M. Hammoud, and T. Elsayed. Finding the best of both worlds: Faster and more robust top-k document retrieval. In Proc. SIGIR, pages 1031--1040, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Does selective search benefit from WAND optimization? In Proc. ECIR, pages 145--158, 2016.Google ScholarGoogle ScholarCross RefCross Ref
  21. D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Soft. Prac. & Exp., 45(1):1--29, 2015.Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. X. Ma, R. Pradeep, R. Nogueira, and J. Lin. Document expansions and learned sparse lexical representations for MSMARCO V1 and V2. In Proc. SIGIR, 2022.Google ScholarGoogle Scholar
  23. C. Macdonald, R. McCreadie, R. L. T. Santos, and I. Ounis. From puppy to maturity: Experiences in developing Terrier. In Proc. OSIR at SIGIR 2012, 2012.Google ScholarGoogle Scholar
  24. C. Macdonald, N. Tonellotto, and I. Ounis. Learning to predict response times for online query scheduling. In Proc. SIGIR, pages 621--630, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Mackenzie and A. Moffat. Examining the additivity of top-k query processing innovations. In Proc. CIKM, pages 1085--1094, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. J. Mackenzie, A. Mallia, M. Petri, J. S. Culpepper, and T. Suel. Compressing inverted indexes with recursive graph bisection: A reproducibility study. In Proc. ECIR, pages 339--352, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. J. Mackenzie, Z. Dai, L. Gallagher, and J. Callan. Efficiency implications of term weighting for passage retrieval. In Proc. SIGIR, pages 1821--1824, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. J. Mackenzie, A. Mallia, A. Moffat, and M. Petri. Accelerating learned sparse indexes via term impact decomposition. In Findings of the ACL: EMNLP 2022, pages 2830--2842, 2022.Google ScholarGoogle ScholarCross RefCross Ref
  29. J. Mackenzie, M. Petri, and A. Moffat. Tradeoff options for bipartite graph partitioning. IEEE Trans. Know. & Data Eng., 2022. To appear.Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. J. Mackenzie, M. Petri, and A. Moffat. Anytime ranking on document-ordered indexes. ACM Trans. Inf. Sys., 40(1):13.1--13.32, 2022.Google ScholarGoogle Scholar
  31. J. Mackenzie, A. Trotman, and J. Lin. Efficient document-at-a-time and score-at- a-time query evaluation for learned sparse representations. ACM Trans. Inf. Sys., 41(4), 2023.Google ScholarGoogle Scholar
  32. A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto, and R. Venturini. Faster BlockMax WAND with variable-sized blocks. In Proc. SIGIR, pages 625--634, 2017.Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. Mallia, M. Siedlaczek, J. Mackenzie, and T. Suel. PISA: Performant indexes and search for academia. In Proc. OSIRRC at SIGIR 2019, pages 50--56, 2019.Google ScholarGoogle Scholar
  34. A. Mallia, M. Siedlaczek, and T. Suel. An experimental study of index compression and DAAT query processing methods. In Proc. ECIR, pages 353--368, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. A. Mallia, M. Siedlaczek, M. Sun, and T. Suel. A comparison of top-?? threshold estimation techniques for disjunctive query processing. In Proc. CIKM, pages 2141--2144, 2020.Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. A. Mallia, O. Khattab, N. Tonellotto, and T. Suel. Learning passage impacts for inverted indexes. In Proc. SIGIR, pages 1723--1727, 2021.Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. R. Nogueira and J. Lin. From doc2query to docTTTTTquery, 2019. Unpublished technical report.Google ScholarGoogle Scholar
  38. M. Petri, J. S. Culpepper, and A. Moffat. Exploring the magic of WAND. In Proc. Aust. Doc. Comp. Symp., pages 58--65, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Petri, A. Moffat, J. Mackenzie, J. S. Culpepper, and D. Beck. Accelerated query processing via similarity score prediction. In Proc. SIGIR, pages 485--494, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. G. E. Pibiri and R. Venturini. Techniques for inverted index compression. ACM Comp. Surv., 53(6):125.1--125.36, 2021.Google ScholarGoogle Scholar
  41. S. E. Robertson and H. Zaragoza. The probabilistic relevance framework: BM25 and beyond. Found. Trnd. Inf. Retr., 3:333--389, 2009.Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. D. Shan, S. Ding, J. He, H. Yan, and X. Li. Optimized top-k processing with global page scores on block-max indexes. In Proc. WSDM, pages 423--432, 2012.Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. W.-Y. Shieh, T.-F. Chen, J. J.-J. Shann, and C.-P. Chung. Inverted file compression through document identifier reassignment. Inf. Proc. & Man., 39(1):117--131, 2003.Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. F. Silvestri. Sorting out the document identifier assignment problem. In Proc. ECIR, pages 101--112, 2007.Google ScholarGoogle ScholarCross RefCross Ref
  45. N. Tonellotto, C. Macdonald, and I. Ounis. Effect of different docid orderings on dynamic pruning retrieval strategies. In Proc. SIGIR, pages 1179--1180, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. N. Tonellotto, C. Macdonald, and I. Ounis. Efficient and effective retrieval using selective pruning. In Proc. WSDM, pages 63--72, 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. N. Tonellotto, C. Macdonald, and I. Ounis. Efficient query processing for scalable web search. Found. Trnd. Inf. Retr., 12(4--5):319--500, 2018.Google ScholarGoogle ScholarCross RefCross Ref
  48. H. R. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Inf. Proc. & Man., 31(6):831--850, 1995.Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. L. Wang, J. Lin, and D. Metzler. A cascade ranking model for efficient ranked retrieval. In Proc. SIGIR, pages 105--114, 2011.Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. E. Yafay and I. S. Altingovde. Caching scores for faster query processing with dynamic pruning in search engines. In Proc. CIKM, pages 2457--2460, 2019.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using lucene. J. Data Inf. Qual., 10(4):16.1--17.20, 2018.Google ScholarGoogle Scholar
  52. J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comp. Surv., 38(2):6.1--6.56, 2006Google ScholarGoogle Scholar

Index Terms

  1. Profiling and Visualizing Dynamic Pruning Algorithms

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
          July 2023
          3567 pages
          ISBN:9781450394086
          DOI:10.1145/3539618

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 18 July 2023

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • short-paper

          Acceptance Rates

          Overall Acceptance Rate792of3,983submissions,20%
        • Article Metrics

          • Downloads (Last 12 months)53
          • Downloads (Last 6 weeks)2

          Other Metrics

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader