Positional Data Organization and Compression in Web Inverted Indexes

Akritidis, Leonidas; Bozanis, Panayiotis

doi:10.1007/978-3-642-32600-4_31

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7446)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

To sustain the tremendous workloads they face on a daily basis, Web search engines employ highly compressed data structures known as inverted indexes. Previous work has demonstrated that organizing the inverted lists of the index into individual blocks of postings leads to significant efficiency improvements. Moreover, the recent literature has shown that current state-of-the-art compression strategies such as PForDelta and VSEncoding perform well when used to encode the docIDs of the lists. In this paper we examine their performance when used to compress the positional values. We expose their drawbacks and introduce PFBC, a simple yet efficient encoding scheme that encodes the positional data of an inverted list block using a fixed number of bits. PFBC allows direct access to the required data by avoiding costly look-ups and unnecessary decoding, and it decompresses positions several times faster than the state-of-the-art approaches.
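The general idea of storing every value of a block with the same number of bits, so that any single value can be read without decoding its neighbours, can be illustrated with a minimal sketch. This is not the authors' PFBC implementation, whose details are not given in the abstract. The function names `pack_fixed` and `get_at`, and the choice of deriving the bit width from the largest value in the block, are assumptions made purely for illustration.

```python
def pack_fixed(values):
    """Pack non-negative integers using one bit width for the whole block.

    Assumption: the width is the bit length of the largest value in the block.
    """
    width = max(1, max(values).bit_length()) if values else 1
    packed = 0
    for i, v in enumerate(values):
        # Value i occupies bits [i * width, (i + 1) * width).
        packed |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return width, packed.to_bytes(nbytes, "little")


def get_at(width, data, index):
    """Read the value at `index` directly, touching only the bytes it spans."""
    bit = index * width
    first = bit // 8
    last = (bit + width - 1) // 8
    chunk = int.from_bytes(data[first:last + 1], "little")
    return (chunk >> (bit % 8)) & ((1 << width) - 1)


# Hypothetical block of positional values.
positions = [3, 17, 4, 9, 30]
width, data = pack_fixed(positions)
fourth = get_at(width, data, 3)  # reads only the bytes that hold the 4th value
```

Because every value has the same width, the bit offset of the value at `index` is simply `index * width`. That constant-time offset computation is what makes direct access possible, in contrast to variable-length codes, where earlier values usually have to be decoded or an auxiliary look-up structure consulted.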

Author information

Authors and Affiliations

Department of Computer & Communication Engineering, University of Thessaly, Volos, Greece
Leonidas Akritidis & Panayiotis Bozanis

Editor information

Editors and Affiliations

Marriott School, Brigham Young University, 784 TNRB, 84602, Provo, UT, USA
Stephen W. Liddle
Software Competence Center Hagenberg, Softwarepark 21, 4232, Hagenberg, Austria
Klaus-Dieter Schewe
Institute of Software Technology & Interactive Systems, Vienna University of Technology, Favoritenstr. 9-11/188, 1040, Vienna, Austria
A Min Tjoa
School of Information Technology and Electrical Engineering, University of Queensland, 4072, Brisbane, QLD, Australia
Xiaofang Zhou

About this paper

Cite this paper

Akritidis, L., Bozanis, P. (2012). Positional Data Organization and Compression in Web Inverted Indexes. In: Liddle, S.W., Schewe, KD., Tjoa, A.M., Zhou, X. (eds) Database and Expert Systems Applications. DEXA 2012. Lecture Notes in Computer Science, vol 7446. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32600-4_31

DOI: https://doi.org/10.1007/978-3-642-32600-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32599-1
Online ISBN: 978-3-642-32600-4
eBook Packages: Computer Science, Computer Science (R0)
