short-paper

Profiling and Visualizing Dynamic Pruning Algorithms

Authors:

Joel MackenzieAuthors Info & Claims

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Pages 3125 - 3129

https://doi.org/10.1145/3539618.3591806

Published: 18 July 2023 Publication History

Abstract

Efficiently retrieving the top-k documents for a given query is a fundamental operation in many search applications. Dynamic pruning algorithms accelerate top-k retrieval over inverted indexes by skipping documents that are not able to enter the current set of results. However, the performance of these algorithms depends on a number of variables such as the ranking function, the order of documents within the index, and the number of documents to be retrieved. In this paper, we propose a diagnostic framework, Dyno, for profiling and visualizing the performance of dynamic pruning algorithms. Our framework captures processing traces during retrieval, allowing the operations of the index traversal algorithm to be visualized. These visualizations support both query-level and system-to-system comparisons, enabling performance characteristics to be readily understood for different systems. Dyno benefits both academics and practitioners by furthering our understanding of the behavior of dynamic pruning algorithms, allowing better design choices to be made during experimentation and deployment.

References

[1]

I. S. Altingovde, E. Demir, F. Can, and O. Ulusoy. Incremental cluster-based retrieval using compressed cluster-skipping inverted files. ACM Trans. Inf. Sys., 26(3), 2008.

Digital Library

[2]

V. N. Anh, O. de Kretser, and A. Moffat. Vector-space ranking with effective early termination. In Proc. SIGIR, pages 35--42, 2001.

Digital Library

[3]

P. Bajaj, D. Campos, N. Craswell, L. Deng, J. Gao, X. Liu, R. Majumder, A. Mc- Namara, B. Mitra, T. Nguyen, M. Rosenberg, X. Song, A. Stoica, S. Tiwary, and T. Wang. MS MARCO: A human generated MAchine Reading COmprehension dataset. arXiv:1611.09268v3, 2018.

[4]

D. Blandford and G. Blelloch. Index compression through document reordering. In Proc. DCC, pages 342--352, 2002.

[5]

E. Bortnikov, D. Carmel, and G. Golan-Gueta. Top-k query processing with conditional skips. In Proc. WWW, pages 653--661, 2017.

Digital Library

[6]

A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien. Efficient query evaluation using a two-level retrieval process. In Proc. CIKM, pages 426--434, 2003.

Digital Library

[7]

K. Chakrabarti, S. Chaudhuri, and V. Ganti. Interval-based pruning for top-?? processing over compressed lists. In Proc. ICDE, pages 709--720, 2011.

Digital Library

[8]

M. Crane, A. Trotman, and R. O'Keefe. Maintaining discriminatory power in quantized indexes. In Proc. CIKM, pages 1221--1224, 2013.

Digital Library

[9]

M. Crane, J. S. Culpepper, J. Lin, J. Mackenzie, and A. Trotman. A comparison of document-at-a-time and score-at-a-time query evaluation. In Proc. WSDM, pages 201--210, 2017.

Digital Library

[10]

N. Craswell, B. Mitra, E. Yilmaz, D. Campos, and E. M. Voorhees. Overview of the TREC 2019 deep learning track. In Proc. TREC, 2021.

[11]

C. M. Daoud, E. S. de Moura, D. Fernandes, A. S. da Silva, C. Rossi, and A. Carvalho. Waves: A fast multi-tier top-k query processing algorithm. Inf. Retr., 20(3):292--316, 2017.

Digital Library

[12]

L. L. S. de Carvalho, E. S. de Moura, C. M. Daoud, and A. S. da Silva. Heuristics to improve the BMW method and its variants. Journal of Information & Data Management, 6(3):178--191, 2015.

[13]

L. Dhulipala, I. Kabiljo, B. Karrer, G. Ottaviano, S. Pupyrev, and A. Shalita. Compressing graphs and indexes with recursive graph bisection. In Proc. KDD, pages 1535--1544, 2016.

Digital Library

[14]

C. Dimopoulos, S. Nepomnyachiy, and T. Suel. A candidate filtering mechanism for fast top-k query processing on modern CPUs. In Proc. SIGIR, pages 723--732, 2013.

Digital Library

[15]

S. Ding and T. Suel. Faster top-k document retrieval using block-max indexes. In Proc. SIGIR, pages 993--1002, 2011.

Digital Library

[16]

A. Grand, R. Muir, J. Ferenczi, and J. Lin. From MaxScore to Block-Max Wand: The story of how Lucene significantly improved query evaluation performance. In Proc. ECIR, pages 20--27, 2020.

Digital Library

[17]

F. Hafizoglu, E. C. Kucukoglu, and I. S. Altingovde. On the efficiency of selective search. In Proc. ECIR, pages 705--712, 2017.

[18]

A. Kane and F. W. Tompa. Split-lists and initial thresholds for WAND-based search. In Proc. SIGIR, pages 877--880, 2018.

Digital Library

[19]

O. Khattab, M. Hammoud, and T. Elsayed. Finding the best of both worlds: Faster and more robust top-k document retrieval. In Proc. SIGIR, pages 1031--1040, 2020.

Digital Library

[20]

Y. Kim, J. Callan, J. S. Culpepper, and A. Moffat. Does selective search benefit from WAND optimization? In Proc. ECIR, pages 145--158, 2016.

[21]

D. Lemire and L. Boytsov. Decoding billions of integers per second through vectorization. Soft. Prac. & Exp., 45(1):1--29, 2015.

Digital Library

[22]

X. Ma, R. Pradeep, R. Nogueira, and J. Lin. Document expansions and learned sparse lexical representations for MSMARCO V1 and V2. In Proc. SIGIR, 2022.

[23]

C. Macdonald, R. McCreadie, R. L. T. Santos, and I. Ounis. From puppy to maturity: Experiences in developing Terrier. In Proc. OSIR at SIGIR 2012, 2012.

[24]

C. Macdonald, N. Tonellotto, and I. Ounis. Learning to predict response times for online query scheduling. In Proc. SIGIR, pages 621--630, 2012.

Digital Library

[25]

J. Mackenzie and A. Moffat. Examining the additivity of top-k query processing innovations. In Proc. CIKM, pages 1085--1094, 2020.

Digital Library

[26]

J. Mackenzie, A. Mallia, M. Petri, J. S. Culpepper, and T. Suel. Compressing inverted indexes with recursive graph bisection: A reproducibility study. In Proc. ECIR, pages 339--352, 2019.

Digital Library

[27]

J. Mackenzie, Z. Dai, L. Gallagher, and J. Callan. Efficiency implications of term weighting for passage retrieval. In Proc. SIGIR, pages 1821--1824, 2020.

Digital Library

[28]

J. Mackenzie, A. Mallia, A. Moffat, and M. Petri. Accelerating learned sparse indexes via term impact decomposition. In Findings of the ACL: EMNLP 2022, pages 2830--2842, 2022.

[29]

J. Mackenzie, M. Petri, and A. Moffat. Tradeoff options for bipartite graph partitioning. IEEE Trans. Know. & Data Eng., 2022. To appear.

Digital Library

[30]

J. Mackenzie, M. Petri, and A. Moffat. Anytime ranking on document-ordered indexes. ACM Trans. Inf. Sys., 40(1):13.1--13.32, 2022.

[31]

J. Mackenzie, A. Trotman, and J. Lin. Efficient document-at-a-time and score-at- a-time query evaluation for learned sparse representations. ACM Trans. Inf. Sys., 41(4), 2023.

[32]

A. Mallia, G. Ottaviano, E. Porciani, N. Tonellotto, and R. Venturini. Faster BlockMax WAND with variable-sized blocks. In Proc. SIGIR, pages 625--634, 2017.

Digital Library

[33]

A. Mallia, M. Siedlaczek, J. Mackenzie, and T. Suel. PISA: Performant indexes and search for academia. In Proc. OSIRRC at SIGIR 2019, pages 50--56, 2019.

[34]

A. Mallia, M. Siedlaczek, and T. Suel. An experimental study of index compression and DAAT query processing methods. In Proc. ECIR, pages 353--368, 2019.

Digital Library

[35]

A. Mallia, M. Siedlaczek, M. Sun, and T. Suel. A comparison of top-?? threshold estimation techniques for disjunctive query processing. In Proc. CIKM, pages 2141--2144, 2020.

Digital Library

[36]

A. Mallia, O. Khattab, N. Tonellotto, and T. Suel. Learning passage impacts for inverted indexes. In Proc. SIGIR, pages 1723--1727, 2021.

Digital Library

[37]

R. Nogueira and J. Lin. From doc2query to docTTTTTquery, 2019. Unpublished technical report.

[38]

M. Petri, J. S. Culpepper, and A. Moffat. Exploring the magic of WAND. In Proc. Aust. Doc. Comp. Symp., pages 58--65, 2013.

Digital Library

[39]

M. Petri, A. Moffat, J. Mackenzie, J. S. Culpepper, and D. Beck. Accelerated query processing via similarity score prediction. In Proc. SIGIR, pages 485--494, 2019.

Digital Library

[40]

G. E. Pibiri and R. Venturini. Techniques for inverted index compression. ACM Comp. Surv., 53(6):125.1--125.36, 2021.

[41]

S. E. Robertson and H. Zaragoza. The probabilistic relevance framework: BM25 and beyond. Found. Trnd. Inf. Retr., 3:333--389, 2009.

Digital Library

[42]

D. Shan, S. Ding, J. He, H. Yan, and X. Li. Optimized top-k processing with global page scores on block-max indexes. In Proc. WSDM, pages 423--432, 2012.

Digital Library

[43]

W.-Y. Shieh, T.-F. Chen, J. J.-J. Shann, and C.-P. Chung. Inverted file compression through document identifier reassignment. Inf. Proc. & Man., 39(1):117--131, 2003.

Digital Library

[44]

F. Silvestri. Sorting out the document identifier assignment problem. In Proc. ECIR, pages 101--112, 2007.

[45]

N. Tonellotto, C. Macdonald, and I. Ounis. Effect of different docid orderings on dynamic pruning retrieval strategies. In Proc. SIGIR, pages 1179--1180, 2011.

Digital Library

[46]

N. Tonellotto, C. Macdonald, and I. Ounis. Efficient and effective retrieval using selective pruning. In Proc. WSDM, pages 63--72, 2013.

Digital Library

[47]

N. Tonellotto, C. Macdonald, and I. Ounis. Efficient query processing for scalable web search. Found. Trnd. Inf. Retr., 12(4--5):319--500, 2018.

[48]

H. R. Turtle and J. Flood. Query evaluation: Strategies and optimizations. Inf. Proc. & Man., 31(6):831--850, 1995.

Digital Library

[49]

L. Wang, J. Lin, and D. Metzler. A cascade ranking model for efficient ranked retrieval. In Proc. SIGIR, pages 105--114, 2011.

Digital Library

[50]

E. Yafay and I. S. Altingovde. Caching scores for faster query processing with dynamic pruning in search engines. In Proc. CIKM, pages 2457--2460, 2019.

Digital Library

[51]

P. Yang, H. Fang, and J. Lin. Anserini: Reproducible ranking baselines using lucene. J. Data Inf. Qual., 10(4):16.1--17.20, 2018.

[52]

J. Zobel and A. Moffat. Inverted files for text search engines. ACM Comp. Surv., 38(2):6.1--6.56, 2006

Index Terms

Profiling and Visualizing Dynamic Pruning Algorithms
1. Human-centered computing
  1. Visualization
    1. Visualization systems and tools
2. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Retrieval efficiency
    2. Information retrieval query processing

Recommendations

U-Skyline: A New Skyline Query for Uncertain Databases

The skyline query, aiming at identifying a set of skyline tuples that are not dominated by any other tuple, is particularly useful for multicriteria data analysis and decision making. For uncertain databases, a probabilistic skyline query, called P-...
Hybrid Dynamic Pruning for Efficient and Effective Query Processing
CIKM '20: Proceedings of the 29th ACM International Conference on Information & Knowledge Management

The performance of query processing has always been a concern in the field of information retrieval. Dynamic pruning algorithms have been proposed to improve query processing performance in terms of efficiency and effectiveness. However, a single ...
Combining static and dynamic data in code visualization
PASTE '02: Proceedings of the 2002 ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering

The task of developing, tuning, and debugging compiler optimizations is a difficult one which can be facilitated by software visualization. There are many characteristics of the code which must be considered when studying the kinds of optimizations ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen
National Taiwan University
,
Wei-Jou (Edward) Duh
National Taiwan University
,
Hen-Hsen Huang
Academia Sinica
,
Program Chairs:
Makoto P. Kato
Spotify
,
Josiane Mothe
Universite de Toulouse
,
Barbara Poblete
University of Chile and Amazon Visiting Academic

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 18 July 2023

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Short-paper

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

0
Total Citations
74
Total Downloads

Downloads (Last 12 months)23
Downloads (Last 6 weeks)1

Reflects downloads up to 05 Mar 2025

Other Metrics

View Author Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten