Can phrase indexing help to process non-phrase queries?

Research article
DOI: 10.1145/1458082.1458174
Published: 26 October 2008

Abstract

Modern web search engines index billions of web pages, yet they are expected to process queries and return results within a very short time. Many approaches have been proposed for efficiently computing top-k query results, but most of them ignore one key factor in the ranking functions of commercial search engines: term proximity, a measure of the distance between query terms in a document. When term proximity is included in the ranking function, most existing top-k algorithms become inefficient. To address this problem, we propose building a compact phrase index that speeds up the search process when the term-proximity factor is incorporated. The compact phrase index allows the score upper bounds of unknown documents to be estimated more accurately. Its size is controlled by including only a small portion of phrases, namely those that are potentially helpful for improving search performance. Phrase indexes have been used to process phrase queries in existing work; however, to the best of our knowledge, this is the first time a phrase index has been used to improve the performance of generic queries. Experimental results show that, compared with state-of-the-art top-k computation approaches, our approach can reduce average query processing time to 1/5 for typical settings.
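The abstract's central idea, that knowing which documents contain a phrase tightens the score upper bound of documents that have not yet been scored, can be illustrated with a small Python sketch. Everything in it is an illustrative assumption rather than the authors' method: the proximity score (a hypothetical weight `PROX_WEIGHT` divided by the minimum distance between each pair of query terms), the phrase-index layout (for a few selected term pairs only, the set of documents in which the two terms occur at adjacent positions), and the helper names `min_distance`, `proximity_score`, `build_phrase_index`, `proximity_upper_bound` and `can_skip`.

```python
from itertools import combinations

# Hypothetical per-pair weight; not a value taken from the paper.
PROX_WEIGHT = 1.0


def min_distance(positions_a, positions_b):
    """Smallest gap between a position of term a and a position of term b.

    Both lists must be sorted ascending. Returns None if either term is absent.
    """
    if not positions_a or not positions_b:
        return None
    i = j = 0
    best = None
    while i < len(positions_a) and j < len(positions_b):
        gap = abs(positions_a[i] - positions_b[j])
        if best is None or gap < best:
            best = gap
        # Advance whichever pointer sits at the smaller position.
        if positions_a[i] < positions_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_score(doc_positions, query_terms, prox_weight=PROX_WEIGHT):
    """Assumed proximity score: sum over query-term pairs of prox_weight / min distance."""
    total = 0.0
    for a, b in combinations(query_terms, 2):
        d = min_distance(doc_positions.get(a, []), doc_positions.get(b, []))
        if d:  # pairs that do not co-occur (None) contribute nothing
            total += prox_weight / d
    return total


def build_phrase_index(docs, selected_pairs):
    """Hypothetical compact phrase index covering only the selected pairs.

    Maps frozenset({a, b}) to the set of doc ids in which a and b occur at
    adjacent positions, in either order.
    """
    index = {}
    for a, b in selected_pairs:
        index[frozenset((a, b))] = {
            doc_id for doc_id, pos in docs.items()
            if min_distance(pos.get(a, []), pos.get(b, [])) == 1
        }
    return index


def proximity_upper_bound(doc_id, query_terms, phrase_index, prox_weight=PROX_WEIGHT):
    """Upper bound on proximity_score for a document whose positions are unknown.

    Without information every pair might be adjacent (distance 1), adding up to
    prox_weight. If the pair is indexed and doc_id is absent from its set, the
    distance is at least 2, so the pair adds at most prox_weight / 2.
    """
    bound = 0.0
    for a, b in combinations(query_terms, 2):
        adjacent_docs = phrase_index.get(frozenset((a, b)))
        if adjacent_docs is not None and doc_id not in adjacent_docs:
            bound += prox_weight / 2
        else:
            bound += prox_weight
    return bound


def can_skip(doc_id, term_score_bound, query_terms, phrase_index, kth_best):
    """True if the document's total upper bound cannot exceed the current k-th best score."""
    upper = term_score_bound + proximity_upper_bound(doc_id, query_terms, phrase_index)
    return upper <= kth_best


# Toy data: for each document, term -> sorted token positions.
docs = {
    "d1": {"phrase": [0], "index": [1], "query": [5]},
    "d2": {"phrase": [0], "index": [4], "query": [9]},
}
query = ["phrase", "index", "query"]
pidx = build_phrase_index(docs, [("phrase", "index")])

print(proximity_upper_bound("d2", query, {}))    # 3.0: no phrase information
print(proximity_upper_bound("d2", query, pidx))  # 2.5: pair known not to be adjacent in d2
print(can_skip("d2", 1.0, query, {}, 3.6))       # False
print(can_skip("d2", 1.0, query, pidx, 3.6))     # True
```

Without phrase information the bound has to assume every pair could be adjacent; with a single indexed pair, the bound for `d2` drops from 3.0 to 2.5, which in this toy setting is enough to skip `d2` against a k-th best score of 3.6. Indexing only a few selected pairs keeps the index small, in the spirit of the abstract's point about including only a small portion of phrases.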

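Cited By

  • (2017) Top-k Term-Proximity in Succinct Space. Algorithmica 78(2):379-393. DOI: 10.1007/s00453-016-0167-2. Online publication date: 1-Jun-2017
  • (2016) Fast First-Phase Candidate Generation for Cascading Rankers. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 295-304. DOI: 10.1145/2911451.2911515. Online publication date: 7-Jul-2016
  • (2015) Fast Image Retrieval: Query Pruning and Early Termination. IEEE Transactions on Multimedia 17(5):648-659. DOI: 10.1109/TMM.2015.2408563. Online publication date: May-2015
  • (2014) Incremental Text Indexing for Fast Disk-Based Search. ACM Transactions on the Web 8(3):1-31. DOI: 10.1145/2560800. Online publication date: 8-Jul-2014
  • (2014) Entity linking at the tail. Proceedings of the 7th ACM international conference on Web search and data mining, pp. 453-462. DOI: 10.1145/2556195.2556230. Online publication date: 24-Feb-2014
  • (2014) Efficient instant-fuzzy search with proximity ranking. 2014 IEEE 30th International Conference on Data Engineering, pp. 328-339. DOI: 10.1109/ICDE.2014.6816662. Online publication date: Mar-2014
  • (2014) Top-k Term-Proximity in Succinct Space. Algorithms and Computation, pp. 169-180. DOI: 10.1007/978-3-319-13075-0_14. Online publication date: 8-Nov-2014
  • (2012) Optimized top-k processing with global page scores on block-max indexes. Proceedings of the fifth ACM international conference on Web search and data mining, pp. 423-432. DOI: 10.1145/2124295.2124346. Online publication date: 8-Feb-2012
  • (2012) High-performance processing of text queries with tunable pruned term and term pair indexes. ACM Transactions on Information Systems 30(1):1-32. DOI: 10.1145/2094072.2094077. Online publication date: 6-Mar-2012
  • (2011) Query efficiency prediction for dynamic pruning. Proceedings of the 9th workshop on Large-scale and distributed informational retrieval, pp. 3-8. DOI: 10.1145/2064730.2064734. Online publication date: 28-Oct-2011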

Published In

CIKM '08: Proceedings of the 17th ACM conference on Information and knowledge management
October 2008
1562 pages
ISBN:9781595939913
DOI:10.1145/1458082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. compact phrase indexing
  2. dynamic index pruning
  3. phrase index
  4. term proximity
  5. top-k

Qualifiers

  • Research-article

Conference

CIKM '08: Conference on Information and Knowledge Management
October 26-30, 2008
Napa Valley, California, USA

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Article Metrics

  • Downloads (last 12 months): 4
  • Downloads (last 6 weeks): 0
Reflects downloads up to 15 Feb 2025.

