DOI: 10.1145/3539618.3591987
short-paper
Open access

Faster Dynamic Pruning via Reordering of Documents in Inverted Indexes

Published: 18 July 2023

Abstract

Widely used dynamic pruning algorithms (such as MaxScore, WAND, and BMW) keep track of the k-th highest score (i.e., the heap threshold) among the documents scored so far, so that they can avoid scoring documents that cannot enter the top-k result list. The faster the heap threshold converges to its final value, the more documents are skipped and hence the larger the efficiency gains of the pruning algorithms. In this paper, we tailor approaches that reorder the documents in the inverted index based on their access counts and ranks for previous queries. By storing such frequently retrieved documents at the front of the postings lists, we aim to raise the heap threshold toward its final value earlier during query processing. Our approach yields substantial speedups (up to 1.33x) for all three dynamic pruning algorithms and outperforms two strong baselines that have been employed for document reordering in the literature.
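
The effect described in the abstract can be illustrated with a small sketch. It is not the paper's implementation and not MaxScore, WAND, or BMW themselves: it is a toy document-at-a-time scan that skips a document whenever the sum of per-term score upper bounds cannot beat the current heap threshold. The function names (access_count_order, top_k_scan), the toy index, and the query log are all hypothetical, and the reordering uses access counts only, ignoring the rank information the paper also exploits.

```python
import heapq
from collections import Counter


def access_count_order(num_docs, past_results):
    """Map old doc ids to new ids so that documents retrieved most often
    in past top-k lists get the smallest ids (assumed heuristic)."""
    counts = Counter(doc for result in past_results for doc in result)
    ranked = sorted(range(num_docs), key=lambda d: (-counts[d], d))
    return {old: new for new, old in enumerate(ranked)}


def top_k_scan(postings, k):
    """Toy document-at-a-time scan over postings = {term: {doc_id: score}}.
    Returns (top-k as [(score, doc_id)], number of fully scored documents)."""
    upper = {t: max(plist.values()) for t, plist in postings.items()}
    all_docs = sorted(set().union(*[p.keys() for p in postings.values()]))
    heap, scored = [], 0
    for doc in all_docs:
        terms = [t for t, plist in postings.items() if doc in plist]
        bound = sum(upper[t] for t in terms)
        # Skip: even the optimistic bound cannot exceed the heap threshold.
        if len(heap) == k and bound <= heap[0][0]:
            continue
        scored += 1
        score = sum(postings[t][doc] for t in terms)
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
    return sorted(heap, reverse=True), scored


# Hypothetical two-term index: the best documents (6 and 7) sit at the end.
postings = {
    "a": {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 6: 5.0, 7: 4.0},
    "b": {0: 1.0, 2: 1.0, 4: 1.0, 5: 1.0, 6: 5.0, 7: 4.0},
}
# Hypothetical log of past top-k result lists (old doc ids).
past_results = [[6, 7], [6, 7], [6, 2]]

top_before, scored_before = top_k_scan(postings, k=2)

new_id = access_count_order(8, past_results)
old_id = {new: old for old, new in new_id.items()}
reordered = {t: {new_id[d]: s for d, s in plist.items()}
             for t, plist in postings.items()}
top_after, scored_after = top_k_scan(reordered, k=2)
top_after_old_ids = [old_id[d] for _, d in top_after]

print(scored_before, scored_after)  # 8 4
```

In this toy case both scans return the same top-2 documents, but after reordering the threshold reaches its final value after two documents, so only four of the eight documents are fully scored.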

Supplemental Material

MP4 File
Presentation video.


    Published In

    SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
    July 2023
    3567 pages
    ISBN: 9781450394086
    DOI: 10.1145/3539618
    This work is licensed under a Creative Commons Attribution International 4.0 License.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. dynamic pruning
    2. inverted index
    3. search engines

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%
