Abstract
In this paper we look at a combination of bulk-compression, partial query processing and skipping for document-ordered inverted indexes. We propose a new inverted index organization, and provide an updated version of the MaxScore method by Turtle and Flood and a skipping-adapted version of the space-limited adaptive pruning method by Lester et al. Both our methods significantly reduce the number of processed elements and reduce the average query latency by more than three times. Our experiments with a real implementation and a large document collection are valuable for a further research within inverted index skipping and query processing optimizations.
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Boldi, P., Vigna, S.: Compressed perfect embedded skip lists for quick inverted-index lookups. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 25–28. Springer, Heidelberg (2005)
Broder, A., Carmel, D., Herscovici, M., Soffer, A., Zien, J.: Efficient query evaluation using a two-level retrieval process. In: Proc. CIKM. ACM, New York (2003)
Buckley, C., Lewit, A.: Optimization of inverted vector searches. In: Proc. SIGIR, pp. 97–110. ACM, New York (1985)
Büttcher, S., Clarke, C.: Index compression is good, especially for random access. In: Proc. CIKM, pp. 761–770. ACM, New York (2007)
Büttcher, S., Clarke, C., Soboroff, I.: The trec 2006 terabyte track. In: Proc. TREC (2006)
Chierichetti, F., Lattanzi, S., Mari, F., Panconesi, A.: On placing skips optimally in expectation. In: Proc. WSDM, pp. 15–24. ACM, New York (2008)
Clarke, C., Soboroff, I.: The trec 2005 terabyte track. In: Proc. TREC (2005)
Ding, S., He, J., Yan, H., Suel, T.: Using graphics processors for high-performance ir query processing. In: Proc. WWW, pp. 1213–1214. ACM, New York (2008)
Lacour, P., Macdonald, C., Ounis, I.: Efficiency comparison of document matching techniques. In: Proc. ECIR (2008)
Lester, N., Moffat, A., Webber, W., Zobel, J.: Space-limited ranked query evaluation using adaptive pruning. In: Ngu, A.H.H., Kitsuregawa, M., Neuhold, E.J., Chung, J.-Y., Sheng, Q.Z. (eds.) WISE 2005. LNCS, vol. 3806, pp. 470–477. Springer, Heidelberg (2005)
Moffat, A., Webber, W., Zobel, J., Baeza-Yates, R.: A pipelined architecture for distributed text query evaluation. Inf. Retr. 10(3), 205–231 (2007)
Moffat, A., Zobel, J.: Self-indexing inverted files for fast text retrieval. ACM Trans. Inf. Syst. 14(4), 349–379 (1996)
Strohman, T., Croft, W.: Efficient document retrieval in main memory. In: Proc. SIGIR, pp. 175–182. ACM, New York (2007)
Strohman, T., Turtle, H., Croft, W.: Optimization strategies for complex queries. In: Proc. SIGIR, pp. 219–225. ACM, New York (2005)
Turtle, H., Flood, J.: Query evaluation: strategies and optimizations. Inf. Process. Manage. 31(6), 831–850 (1995)
Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proc. WWW, pp. 401–410. ACM, New York (2009)
Zhang, J., Long, X., Suel, T.: Performance of compressed inverted list caching in search engines. In: Proc. WWW, pp. 387–396. ACM, New York (2008)
Zobel, J., Moffat, A.: Inverted files for text search engines. ACM Comput. Surv. 38(2) (2006)
Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar ram-cpu cache compression. In: Proc. ICDE, p. 59. IEEE Computer Society, Los Alamitos (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jonassen, S., Bratsberg, S.E. (2011). Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries. In: Clough, P., et al. Advances in Information Retrieval. ECIR 2011. Lecture Notes in Computer Science, vol 6611. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20161-5_53
Download citation
DOI: https://doi.org/10.1007/978-3-642-20161-5_53
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20160-8
Online ISBN: 978-3-642-20161-5
eBook Packages: Computer ScienceComputer Science (R0)