DOI:10.1145/1148170.1148235

Pruned query evaluation using pre-computed impacts

Published: 06 August 2006

Abstract

Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries.
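
The abstract does not spell out the new pruning methods, so the following is only a minimal sketch of the general idea behind evaluation over an impact-sorted index, not the authors' algorithm. It assumes a hypothetical in-memory layout in which each term maps to blocks of `(impact, doc_ids)` holding small integer impacts, and it uses a simple postings budget (`max_postings`, an invented parameter) as the stopping rule. Blocks are processed in decreasing impact order, so stopping early discards only the lowest-impact contributions.

```python
import heapq
from collections import defaultdict


def impact_ordered_topk(index, query_terms, k, max_postings=None):
    """Rank documents by summed impact, highest-impact blocks first.

    index: dict mapping term -> list of (impact, [doc_ids]) blocks.
           This layout is an assumption made for illustration.
    max_postings: optional budget; once at least this many postings have
           been processed, the remaining (lower-impact) blocks are skipped.
    """
    # Collect every block belonging to a query term.
    blocks = []
    for term in query_terms:
        blocks.extend(index.get(term, []))

    # Process blocks in decreasing order of impact.
    blocks.sort(key=lambda block: block[0], reverse=True)

    accumulators = defaultdict(int)  # doc_id -> partial score
    processed = 0
    for impact, doc_ids in blocks:
        if max_postings is not None and processed >= max_postings:
            break  # pruned: everything left has impact <= this block
        for doc_id in doc_ids:
            accumulators[doc_id] += impact
        processed += len(doc_ids)

    # Highest score first; ties broken by smaller document id.
    return heapq.nsmallest(
        k, accumulators.items(), key=lambda item: (-item[1], item[0])
    )


# Toy index with made-up terms, impacts, and document ids.
toy_index = {
    "pruned": [(3, [1, 4]), (1, [2, 7])],
    "query": [(2, [1, 2]), (1, [4, 5, 7])],
}

exhaustive = impact_ordered_topk(toy_index, ["pruned", "query"], k=2)
pruned = impact_ordered_topk(toy_index, ["pruned", "query"], k=2, max_postings=4)
```

On this toy data the pruned run touches 4 of the 9 postings and returns the same two documents in the same order as the exhaustive run, although the partial score of the second document is lower. A fixed budget gives no effectiveness guarantee in general; it only illustrates where the savings in computation, accumulators, and data read come from.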

Information & Contributors

Information

Published In

SIGIR '06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
August 2006
768 pages
ISBN:1595933697
DOI:10.1145/1148170
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGIR06: The 29th Annual International SIGIR Conference
August 6 - 11, 2006
Seattle, Washington, USA

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 2
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 17 Feb 2025

Citations

Cited By

  • (2022) Information Retrieval and Search Engines. Machine Learning for Text (257-302). DOI:10.1007/978-3-030-96623-2_9. Online publication date: 10-Feb-2022
  • (2020) Evaluation strategies for top-k queries over memory-resident inverted indexes. Proceedings of the VLDB Endowment 4:12 (1213-1224). DOI:10.14778/3402755.3402756. Online publication date: 3-Jun-2020
  • (2020) Incremental Information Retrieval. Proceedings of the 10th International Conference on Web Intelligence, Mining and Semantics (169-177). DOI:10.1145/3405962.3405969. Online publication date: 30-Jun-2020
  • (2020) Gemini: Learning to Manage CPU Power for Latency-Critical Search Engines. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (637-349). DOI:10.1109/MICRO50266.2020.00059. Online publication date: Oct-2020
  • (2020) A Data Indexing Technique to Improve the Search Latency of AND Queries for Large Scale Textual Documents. 2020 IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT) (37-46). DOI:10.1109/BDCAT50828.2020.00019. Online publication date: Dec-2020
  • (2020) Accelerating Substructure Similarity Search for Formula Retrieval. Advances in Information Retrieval (714-727). DOI:10.1007/978-3-030-45439-5_47. Online publication date: 8-Apr-2020
  • (2019) Micro- and macro-optimizations of SaaT search. Software: Practice and Experience 49:5 (942-950). DOI:10.1002/spe.2683. Online publication date: 20-Feb-2019
  • (2018) Efficient Query Processing Infrastructures. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (1403-1406). DOI:10.1145/3209978.3210191. Online publication date: 27-Jun-2018
  • (2018) Fast Bag-Of-Words Candidate Selection in Content-Based Instance Retrieval Systems. 2018 IEEE International Conference on Big Data (Big Data) (821-830). DOI:10.1109/BigData.2018.8621935. Online publication date: Dec-2018
  • (2018) Information Retrieval and Search Engines. Machine Learning for Text (259-304). DOI:10.1007/978-3-319-73531-3_9. Online publication date: 20-Mar-2018