Skip to main content

Positional Data Organization and Compression in Web Inverted Indexes

  • Conference paper
Database and Expert Systems Applications (DEXA 2012)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 7446))

Included in the following conference series:

  • 856 Accesses

Abstract

To sustain the tremendous workloads they suffer on a daily basis, Web search engines employ highly compressed data structures known as inverted indexes. Previous works demonstrated that organizing the inverted lists of the index in individual blocks of postings leads to significant efficiency improvements. Moreover, the recent literature has shown that the current state-of-the-art compression strategies such as PForDelta and VSEncoding perform well when used to encode the lists docIDs. In this paper we examine their performance when used to compress the positional values. We expose their drawbacks and we introduce PFBC, a simple yet efficient encoding scheme, which encodes the positional data of an inverted list block by using a fixed number of bits. PFBC allows direct access to the required data by avoiding costly look-ups and unnecessary information decoding, achieving several times faster positions decompression than the state-of-the-art approaches.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Anh, V.N., Moffat, A.: Structured Index Organizations for High-Throughput Text Querying. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 304–315. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  2. Anh, V., Moffat, A.: Index compression using 64-bit words. Software: Practice and Experience 40(2), 131–147 (2010)

    Google Scholar 

  3. Boldi, P., Vigna, S.: Compressed Perfect Embedded Skip Lists for Quick Inverted-Index Lookups. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 25–28. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  4. Buttcher, S., Clarke, C., Lushman, B.: Term proximity scoring for ad-hoc retrieval on very large text collections. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 621–622 (2006)

    Google Scholar 

  5. Chierichetti, F., Kumar, R., Raghavan, P.: Compressed web indexes. In: Proceedings of the 18th International Conference on World Wide Web, pp. 451–460 (2009)

    Google Scholar 

  6. Dean, J.: Challenges in building large-scale information retrieval systems: invited talk. In: Proceedings of the Second ACM International Conference on Web Search and Data Mining, p. 1 (2009)

    Google Scholar 

  7. Heman, S.: Super-Scalar Database Compression between RAM and CPU Cache. Master’s Thesis. University of Amsterdam. Amsterdam, The Netherlands (2005)

    Google Scholar 

  8. Moffat, A., Zobel, J.: Self-indexing inverted files for fast text retrieval. ACM Transactions on Information Systems (TOIS) 14(4), 349–379 (1996)

    Article  Google Scholar 

  9. Silvestri, F., Venturini, R.: Vsencoding: efficient coding and fast decoding of integer lists via dynamic programming. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1219–1228 (2010)

    Google Scholar 

  10. Transier, F., Sanders, P.: Engineering basic algorithms of an in-memory text search engine. ACM Transactions on Information Systems (TOIS) 29(1), 2 (2010)

    Article  Google Scholar 

  11. Witten, I., Moffat, A., Bell, T.: Managing Gigabytes: Compressing and Indexing Documents and Images (1999)

    Google Scholar 

  12. Yan, H., Ding, S., Suel, T.: Compressing term positions in web indexes. In: Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 147–154 (2009)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Akritidis, L., Bozanis, P. (2012). Positional Data Organization and Compression in Web Inverted Indexes. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-32600-4_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32599-1

  • Online ISBN: 978-3-642-32600-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics