Poster

When close enough is good enough: approximate positional indexes for efficient ranked retrieval

Authors:

Tamer Elsayed,

Jimmy Lin,

Donald Metzler

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

Pages 1993 - 1996

https://doi.org/10.1145/2063576.2063873

Published: 24 October 2011

Abstract

Previous research has shown that features based on term proximity are important for effective retrieval. However, they incur substantial costs in terms of larger inverted indexes and slower query execution times as compared to term-based features. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes based on dividing documents into coarse-grained buckets and recording term positions with respect to those buckets. We propose different approaches to defining the buckets and compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions are able to achieve effectiveness comparable to exact term positions, but with smaller indexes and faster query evaluation.
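The abstract does not say how buckets are defined or how bucket ids are encoded, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes fixed-width buckets (the token at position p goes to bucket p // bucket_width), stores each term's sorted, de-duplicated bucket ids as delta gaps compressed with variable-byte coding, and uses a "both terms in the same bucket" count as a stand-in approximate proximity feature. All function names (build_bucketed_index, vbyte_encode_gaps, vbyte_decode_gaps, same_bucket_count) are hypothetical.

```python
from collections import defaultdict


def build_bucketed_index(doc_tokens, bucket_width):
    """Map each term to the sorted, de-duplicated ids of the buckets it occurs in.

    Assumed scheme (hypothetical): fixed-width buckets, so the token at
    position p falls into bucket p // bucket_width.
    """
    buckets = defaultdict(set)
    for pos, term in enumerate(doc_tokens):
        buckets[term].add(pos // bucket_width)
    return {term: sorted(ids) for term, ids in buckets.items()}


def vbyte_encode_gaps(bucket_ids):
    """Delta-encode sorted bucket ids, then variable-byte code each gap.

    Each gap is split into 7-bit groups, low bits first; the high bit
    is set only on the final byte of a gap.
    """
    out = bytearray()
    prev = 0
    for b in bucket_ids:
        gap = b - prev
        prev = b
        while gap >= 0x80:
            out.append(gap & 0x7F)
            gap >>= 7
        out.append(gap | 0x80)
    return bytes(out)


def vbyte_decode_gaps(data):
    """Invert vbyte_encode_gaps: decode gaps and prefix-sum them back to ids."""
    ids = []
    value, shift, prev = 0, 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:  # high bit set: last byte of this gap
            prev += value
            ids.append(prev)
            value, shift = 0, 0
        else:
            shift += 7
    return ids


def same_bucket_count(index, term_a, term_b):
    """Approximate proximity feature (hypothetical): number of buckets that
    contain both terms, standing in for an exact-position window count."""
    return len(set(index.get(term_a, ())) & set(index.get(term_b, ())))


if __name__ == "__main__":
    tokens = "the quick brown fox jumps over the lazy dog near the quick river".split()
    index = build_bucketed_index(tokens, bucket_width=4)
    print(index["the"])                              # [0, 1, 2]
    print(vbyte_encode_gaps(index["the"]).hex())     # 808181
    print(same_bucket_count(index, "the", "quick"))  # 2
```

In this toy document the exact positions of "the" are [0, 6, 10], while the bucketed list is [0, 1, 2]. Gaps shrink and repeated occurrences inside one bucket collapse to a single entry, which is where an index-size saving would come from. The cost is resolution: the feature can no longer tell adjacent terms apart from terms at opposite ends of the same bucket.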

Index Terms

1. Information systems
  1. Information retrieval
    1. Information retrieval query processing

Information

Published In

CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management

October 2011

2712 pages

ISBN: 9781450307178

DOI: 10.1145/2063576

Editors: Bettina Berendt, Arjen de Vries, Wenfei Fan, Craig Macdonald (University of Glasgow, UK), Iadh Ounis (University of Glasgow, UK), Ian Ruthven (University of Strathclyde, UK)

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Poster

Conference

CIKM '11

Sponsor:

CIKM '11: International Conference on Information and Knowledge Management

October 24 - 28, 2011

Glasgow, Scotland, UK

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 197
Downloads (last 12 months): 2
Downloads (last 6 weeks): 0

Reflects downloads up to 07 Mar 2025

Citations

Cited By

Gao S, Liu J, Liu X, Wang G (2019). A Lossy Compression Method on Positional Index for Efficient and Effective Retrieval. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 2317-2320. Online publication date: 3-Nov-2019. https://dl.acm.org/doi/10.1145/3357384.3358125

Lu X, Moffat A, Culpepper J (2016). Efficient and Effective Higher Order Proximity Modeling. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 21-30. Online publication date: 12-Sep-2016. https://dl.acm.org/doi/10.1145/2970398.2970404

Lu X, Moffat A, Culpepper J (2015). On the Cost of Extracting Proximity Features for Term-Dependency Models. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 293-302. Online publication date: 17-Oct-2015. https://dl.acm.org/doi/10.1145/2806416.2806467

Chakrabarti A, Satuluri V, Srivathsan A, Parthasarathy S (2015). A Bayesian Perspective on Locality Sensitive Hashing with Extensions for Kernel Methods. ACM Transactions on Knowledge Discovery from Data 10(2):1-32. Online publication date: 12-Oct-2015. https://dl.acm.org/doi/10.1145/2778990

Satuluri V, Parthasarathy S (2012). Bayesian locality sensitive hashing for fast similarity search. Proceedings of the VLDB Endowment 5(5):430-441. Online publication date: 1-Jan-2012. https://dl.acm.org/doi/10.14778/2140436.2140440

Potthast M, Hagen M, Stein B, Graßegger J, Michel M, Tippmann M, Welsch C (2012). ChatNoir. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, p. 1004. Online publication date: 12-Aug-2012. https://dl.acm.org/doi/10.1145/2348283.2348429