DOI: 10.1145/2396761.2398673
poster

Information preservation in static index pruning

Published: 29 October 2012

ABSTRACT

We develop a new static index pruning criterion based on the notion of information preservation. The idea is motivated by the fact that model degeneration, like static index pruning, inevitably reduces the predictive power of the resulting model. We model this loss in predictive power using conditional entropy and show that pruning decisions can therefore be optimized to preserve as much information as possible. We evaluated the proposed approach on three test corpora, and the results show that it is comparable in retrieval performance to state-of-the-art methods. When efficiency is a concern, our method has some advantages over the reference methods, and we therefore suggest it for Web retrieval settings.
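
To make the role of conditional entropy concrete, the following is a minimal sketch and not the paper's formulation, which the abstract does not spell out. It assumes a toy inverted index mapping each term to its postings with term frequencies, estimates p(t) and p(d|t) from raw counts, and computes H(D|T) = -Σ_t p(t) Σ_d p(d|t) log2 p(d|t). The function names (`posting_contributions`, `conditional_entropy`, `prune`) and the rule of keeping the postings with the largest entropy contribution are hypothetical choices made for illustration only.

```python
import math

def posting_contributions(index):
    """Return {(term, doc): contribution of that posting to H(D|T), in bits}.

    `index` maps term -> {doc_id: term frequency}. Probabilities are
    maximum-likelihood estimates from raw counts (an assumption).
    """
    total = sum(sum(postings.values()) for postings in index.values())
    contrib = {}
    for term, postings in index.items():
        term_total = sum(postings.values())
        for doc, tf in postings.items():
            # p(t) * p(d|t) = tf / total, and -log2 p(d|t) = log2(term_total / tf)
            contrib[(term, doc)] = (tf / total) * math.log2(term_total / tf)
    return contrib

def conditional_entropy(index):
    """H(D|T): the sum of all per-posting contributions."""
    return sum(posting_contributions(index).values())

def prune(index, n_keep):
    """Keep the n_keep postings with the largest contribution.

    This ranking rule is an illustrative assumption, not the criterion
    derived in the paper.
    """
    contrib = posting_contributions(index)
    kept = sorted(contrib, key=contrib.get, reverse=True)[:n_keep]
    pruned = {}
    for term, doc in kept:
        pruned.setdefault(term, {})[doc] = index[term][doc]
    return pruned

# Toy index: term "a" occurs once in each of two documents, term "b" twice in one.
index = {"a": {1: 1, 2: 1}, "b": {1: 2}}
h_before = conditional_entropy(index)
pruned = prune(index, 2)
```

In this toy index the single posting of term "b" contributes nothing to H(D|T), because the term determines its document completely, so it is the first to go under the assumed rule. A real criterion would also have to account for how pruning changes the estimated distributions, which this sketch ignores.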


Published in

CIKM '12: Proceedings of the 21st ACM international conference on Information and knowledge management
October 2012
2840 pages
ISBN: 9781450311564
DOI: 10.1145/2396761

Copyright © 2012 ACM

Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
