ABSTRACT
Efficiently retrieving the top-k documents for a given query is a fundamental operation in many search applications. Dynamic pruning algorithms accelerate top-k retrieval over inverted indexes by skipping documents that cannot enter the current top-k result set. However, the performance of these algorithms depends on several variables, including the ranking function, the order of documents within the index, and the number of documents to be retrieved. In this paper, we propose a diagnostic framework, Dyno, for profiling and visualizing the performance of dynamic pruning algorithms. Our framework captures processing traces during retrieval, allowing the operations of the index traversal algorithm to be visualized. These visualizations support both query-level and system-to-system comparisons, making the performance characteristics of different systems easier to understand. Dyno benefits both academics and practitioners by furthering our understanding of the behavior of dynamic pruning algorithms, allowing better design choices to be made during experimentation and deployment.
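As a rough illustration of the idea described above, the following sketch shows a threshold-based skip together with a simple processing trace. It is not Dyno's implementation and not a faithful version of MaxScore, WAND, or Block-Max WAND, which also skip within postings lists. The function name `pruned_top_k`, the in-memory `postings` layout, and the trace event labels are all hypothetical and chosen only for this example.

```python
import heapq


def pruned_top_k(postings, k):
    """Toy document-at-a-time top-k with an upper-bound skip and a trace.

    postings: dict mapping term -> {docid: score contribution}
              (hypothetical layout; every term is assumed to have at
              least one posting).
    Returns (results, trace), where results is a list of (score, docid)
    sorted by descending score, and trace is a list of
    (docid, event, threshold) tuples.
    """
    # Per-term upper bound: the largest contribution of any posting.
    upper = {t: max(p.values()) for t, p in postings.items()}
    docids = sorted(set().union(*(p.keys() for p in postings.values())))

    heap = []   # min-heap of (score, docid) holding the current top-k
    trace = []  # what happened to each candidate document

    for d in docids:
        # The entry threshold is the k-th best score seen so far.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        matching = [t for t, p in postings.items() if d in p]
        bound = sum(upper[t] for t in matching)

        # The bound is never below the true score, so a document whose
        # bound does not exceed the threshold cannot enter the results.
        if bound <= threshold:
            trace.append((d, "skipped", threshold))
            continue

        score = sum(postings[t][d] for t in matching)
        if len(heap) < k:
            heapq.heappush(heap, (score, d))
            trace.append((d, "inserted", threshold))
        elif score > threshold:
            heapq.heapreplace(heap, (score, d))
            trace.append((d, "inserted", threshold))
        else:
            trace.append((d, "scored", threshold))

    results = sorted(heap, key=lambda x: (-x[0], x[1]))
    return results, trace
```

Counting the `skipped`, `scored`, and `inserted` events per query is one simple way such a trace could be summarized. In this toy setting the counts change with `k` and with the order of document identifiers, which are among the variables the abstract names.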
Index Terms
- Profiling and Visualizing Dynamic Pruning Algorithms