ABSTRACT
Previous research has shown that features based on term proximity are important for effective retrieval. However, they are costly: compared to term-based features, they require larger inverted indexes and lead to slower query execution. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes, which divide documents into coarse-grained buckets and record term positions with respect to those buckets. We propose different approaches to defining the buckets and compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions achieve effectiveness comparable to that of exact term positions, but with smaller indexes and faster query evaluation.
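The bucketing idea can be illustrated with a minimal sketch. The abstract does not specify how the buckets are defined or encoded, so everything below is an assumption made for illustration: the two bucketing schemes (fixed width and fixed count), the function names, and the co-occurrence count used as a stand-in for a proximity feature are hypothetical and are not the paper's method.

```python
from collections import defaultdict


def bucket_id_fixed_width(position, width):
    """Assumed scheme 1: every bucket spans `width` consecutive positions."""
    return position // width


def bucket_id_fixed_count(position, doc_length, num_buckets):
    """Assumed scheme 2: every document is cut into `num_buckets` equal slices."""
    return min(position * num_buckets // doc_length, num_buckets - 1)


def build_approx_postings(tokens, width):
    """Return {term: sorted distinct bucket ids} for a single document.

    Exact positions are discarded; only the bucket each occurrence
    falls into is kept, which is what makes the index smaller.
    """
    postings = defaultdict(set)
    for position, term in enumerate(tokens):
        postings[term].add(bucket_id_fixed_width(position, width))
    return {term: sorted(ids) for term, ids in postings.items()}


def shared_buckets(postings, term_a, term_b):
    """Count buckets that contain both terms (a coarse proximity signal)."""
    return len(set(postings.get(term_a, [])) & set(postings.get(term_b, [])))


doc = "the quick brown fox jumps over the lazy dog near the quick river".split()
postings = build_approx_postings(doc, width=4)

print(postings["the"])                             # [0, 1, 2]
print(postings["quick"])                           # [0, 2]
print(shared_buckets(postings, "quick", "the"))    # 2
print(shared_buckets(postings, "quick", "river"))  # 0
```

The last call shows the kind of error an approximation like this introduces: "quick" and "river" are adjacent in the text, but they fall on opposite sides of a bucket boundary, so this sketch reports no shared bucket for them.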
Index Terms
- When close enough is good enough: approximate positional indexes for efficient ranked retrieval