DOI: 10.1145/2063576.2063873
poster

When close enough is good enough: approximate positional indexes for efficient ranked retrieval

Published: 24 October 2011

ABSTRACT

Previous research has shown that features based on term proximity are important for effective retrieval. However, they incur substantial costs in terms of larger inverted indexes and slower query execution times as compared to term-based features. This paper explores whether term proximity features based on approximate term positions are as effective as those based on exact term positions. We introduce the novel notion of approximate positional indexes based on dividing documents into coarse-grained buckets and recording term positions with respect to those buckets. We propose different approaches to defining the buckets and compactly encoding bucket ids. In the context of linear ranking functions, experimental results show that features based on approximate term positions are able to achieve effectiveness comparable to exact term positions, but with smaller indexes and faster query evaluation.
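The bucketing idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes fixed-width buckets (the abstract says only that several bucket definitions are proposed), whitespace tokenization, and a simple "same bucket" co-occurrence count as a stand-in proximity feature. The names `build_approx_index`, `same_bucket_count`, and `bucket_width` are hypothetical, and the compact encoding of bucket ids is not covered.

```python
from collections import defaultdict


def build_exact_index(docs):
    """Exact positional index: term -> {doc_id: [token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def build_approx_index(docs, bucket_width=4):
    """Approximate positional index: term -> {doc_id: [bucket ids]}.

    Assumption: buckets are fixed-width runs of `bucket_width` tokens.
    Each bucket id is stored at most once per term and document, so the
    list is never longer than the exact position list.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            bucket = pos // bucket_width
            buckets = index[term].setdefault(doc_id, [])
            # Positions arrive in increasing order, so comparing with the
            # last stored bucket id is enough to skip duplicates.
            if not buckets or buckets[-1] != bucket:
                buckets.append(bucket)
    return index


def same_bucket_count(index, t1, t2, doc_id):
    """Illustrative proximity feature: buckets containing both terms."""
    b1 = set(index.get(t1, {}).get(doc_id, []))
    b2 = set(index.get(t2, {}).get(doc_id, []))
    return len(b1 & b2)
```

With `bucket_width=4`, a term at positions 1 and 6 is recorded as buckets 0 and 1. Two terms that share a bucket are known to lie within four tokens of each other, although their exact distance is no longer recoverable; terms that are close but straddle a bucket boundary are missed by this simple feature.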


    • Published in

      CIKM '11: Proceedings of the 20th ACM international conference on Information and knowledge management
      October 2011
      2712 pages
      ISBN:9781450307178
      DOI:10.1145/2063576

      Copyright © 2011 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
