Article

Pruned query evaluation using pre-computed impacts

Authors:

Alistair Moffat

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

Pages 372 - 379

https://doi.org/10.1145/1148170.1148235

Published: 06 August 2006

Abstract

Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries.
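As a rough illustration of the general idea behind pruning over an impact-sorted index, the sketch below processes postings blocks in decreasing impact order and stops after a fixed budget of postings. It is not the method proposed in the paper: the index layout, the function name `pruned_topk`, and the `max_postings` budget are all assumptions made for this example, and it offers none of the effectiveness guarantees the abstract describes.

```python
import heapq
from collections import defaultdict


def pruned_topk(index, query_terms, k, max_postings):
    """Score-at-a-time evaluation over a toy impact-sorted index.

    index: dict mapping term -> list of (impact, [doc_ids]) blocks,
           each list sorted by decreasing integer impact (assumed layout).
    max_postings: hypothetical pruning budget; no further blocks are
           started once this many postings have been consumed.
    Returns (top-k list of (doc_id, score), postings processed).
    """
    # Gather the blocks of every query term, then order them so that the
    # highest-impact blocks are handled first, regardless of term.
    blocks = []
    for term in query_terms:
        blocks.extend(index.get(term, []))
    blocks.sort(key=lambda block: -block[0])

    accumulators = defaultdict(int)  # doc_id -> partial score
    processed = 0
    for impact, docs in blocks:
        if processed >= max_postings:
            break  # prune: the remaining low-impact blocks are skipped
        for doc in docs:
            accumulators[doc] += impact
        processed += len(docs)

    # Highest score first; ties are broken in favour of the smaller doc id.
    top = heapq.nlargest(
        k, accumulators.items(), key=lambda item: (item[1], -item[0])
    )
    return top, processed
```

On a small index, a budget that covers only the high-impact blocks can return the same top-ranked documents as exhaustive evaluation while touching fewer postings and creating fewer accumulators, which is the kind of saving the abstract refers to.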

Index Terms

Pruned query evaluation using pre-computed impacts
1. Information systems
  1. Information retrieval
  2. Information storage systems
    1. Record storage systems
      1. Record storage alternatives
        Hashed file organization
        Indexed file organization

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

August 2006

768 pages

ISBN:1595933697

DOI:10.1145/1148170

General Chair: Efthimis N. Efthimiadis (University of Washington)

Program Chairs: Susan Dumais (Microsoft Research, Redmond), David Hawking (CSIRO ICT Centre, Canberra, Australia), Kalervo Järvelin (University of Tampere, Finland)

Copyright © 2006 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGIR06

SIGIR06: The 29th Annual International SIGIR Conference

August 6 - 11, 2006

Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
