ABSTRACT
We develop a new static index pruning criterion based on the notion of information preservation. The idea is motivated by the observation that model degeneration, as occurs in static index pruning, inevitably reduces the predictive power of the resulting model. We model this loss in predictive power using conditional entropy and show that pruning decisions can therefore be optimized to preserve as much information as possible. We evaluated the proposed approach on three test corpora, and the results show that it is comparable in retrieval performance to state-of-the-art methods. When efficiency is a concern, our method has some advantages over the reference methods and is therefore suggested for Web retrieval settings.
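The abstract does not state the pruning criterion itself, so the following is only an illustrative sketch of one way an entropy-based pruning decision could look, not the paper's actual method. It assumes a toy index layout (term mapped to document mapped to term frequency), estimates p(t) and p(d|t) from raw term frequencies, scores each posting by its contribution to the conditional entropy H(D|T), and keeps the highest-scoring fraction. The names `posting_scores`, `prune`, and `keep_ratio` are hypothetical.

```python
import math
from collections import defaultdict


def posting_scores(index):
    """Score each posting by its contribution to H(D|T).

    `index` is an assumed toy layout: {term: {doc_id: term_frequency}}.
    p(t) and p(d|t) are estimated from raw frequencies (an assumption).
    """
    total = sum(sum(postings.values()) for postings in index.values())
    scores = {}
    for term, postings in index.items():
        term_total = sum(postings.values())
        p_t = term_total / total
        for doc, tf in postings.items():
            p_d_given_t = tf / term_total
            # Contribution of (term, doc): -p(t) * p(d|t) * log2 p(d|t)
            scores[(term, doc)] = -p_t * p_d_given_t * math.log2(p_d_given_t)
    return scores


def prune(index, keep_ratio):
    """Keep the `keep_ratio` fraction of postings with the largest scores."""
    scores = posting_scores(index)
    n_keep = round(keep_ratio * len(scores))
    kept = sorted(scores, key=scores.get, reverse=True)[:n_keep]
    pruned = defaultdict(dict)
    for term, doc in kept:
        pruned[term][doc] = index[term][doc]
    return dict(pruned)


if __name__ == "__main__":
    toy_index = {
        "pruning": {"d1": 3, "d2": 1},
        "entropy": {"d1": 1, "d2": 1, "d3": 2},
        "rare": {"d3": 1},
    }
    print(prune(toy_index, keep_ratio=0.5))
```

Under this particular scoring, a term that occurs in a single document contributes nothing to H(D|T) and is pruned first. Whether that behaviour is desirable depends on the retrieval model, which is one reason the sketch should not be read as the criterion the paper proposes.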
Index Terms
- Information preservation in static index pruning