Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios

Damme, Patrick; Habich, Dirk; Lehner, Wolfgang

doi:10.1007/978-3-319-23135-8_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Included in the following conference series:

East European Conference on Advances in Databases and Information Systems

Abstract

Lightweight data compression techniques such as dictionary or run-length compression play an important role in main-memory database systems. Once a compression scheme has been chosen for a dataset, transforming the data to another scheme is very inefficient today. The common approach works in two steps. First, the compressed data is decompressed using the source decompression algorithm, which materializes the raw data in main memory. Second, the compression algorithm of the destination scheme is applied. This indirect approach relies on existing algorithms but is very inefficient, since the entire uncompressed data has to be materialized as an intermediate step. To overcome this drawback, we propose a novel approach called direct transformation, which avoids materializing the entire uncompressed data. Our techniques are cache-optimized to reduce the necessary data accesses. Moreover, we present application scenarios in which such direct transformations can be applied efficiently.
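
The contrast between the indirect and the direct way can be illustrated with a minimal Python sketch. It is not the authors' cache-optimized implementation: the choice of run-length encoding as source scheme, dictionary encoding as destination scheme, and all function names are assumptions made purely for illustration.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # (value, run length); hypothetical RLE representation


def rle_decompress(runs: List[Run]) -> List[int]:
    """Expand run-length encoded data into the raw value sequence."""
    raw: List[int] = []
    for value, length in runs:
        raw.extend([value] * length)
    return raw


def dict_compress(values: List[int]) -> Tuple[List[int], List[int]]:
    """Dictionary-encode values: returns (sorted dictionary, list of codes)."""
    dictionary = sorted(set(values))
    code_of = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def indirect_rle_to_dict(runs: List[Run]) -> Tuple[List[int], List[int]]:
    """Indirect way: decompress fully, then recompress."""
    raw = rle_decompress(runs)  # whole uncompressed data is materialized here
    return dict_compress(raw)


def direct_rle_to_dict(runs: List[Run]) -> Tuple[List[int], List[int]]:
    """Direct way (sketch): work run by run, never building the raw sequence."""
    # The dictionary only needs the distinct run values, not the expanded data.
    dictionary = sorted({value for value, _ in runs})
    code_of = {v: i for i, v in enumerate(dictionary)}
    codes: List[int] = []
    for value, length in runs:
        # One dictionary lookup per run instead of one per raw value.
        codes.extend([code_of[value]] * length)
    return dictionary, codes


runs = [(7, 3), (2, 2), (7, 1)]
assert direct_rle_to_dict(runs) == indirect_rle_to_dict(runs)
```

Both functions produce the same dictionary-encoded output; the direct variant simply skips the intermediate raw sequence and touches each run once.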

Notes

1. Our source code is downloadable at https://wwwdb.inf.tu-dresden.de/team/staff/patrick-damme-msc/.
2. We call a block homogeneous if it contains just one distinct value; otherwise, we call it heterogeneous.
3. Following Schlegel et al. [7], we use the term effective bits to denote all but the leading zero bits of a value. The same holds analogously for the term effective bytes. By definition, the value zero also has one effective bit and one effective byte, respectively.
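
The definitions in notes 2 and 3 can be written down as a short Python sketch. The function names are hypothetical, and the sketch assumes non-negative integer values; it only restates the definitions and is not taken from the authors' source code.

```python
from typing import Sequence


def effective_bits(value: int) -> int:
    """Number of bits of a non-negative value without its leading zero bits.

    By definition, the value zero has one effective bit.
    """
    if value < 0:
        raise ValueError("only non-negative values are considered here")
    return max(1, value.bit_length())


def effective_bytes(value: int) -> int:
    """Number of bytes needed to hold the effective bits of a value.

    By definition, the value zero has one effective byte.
    """
    return (effective_bits(value) + 7) // 8


def is_homogeneous(block: Sequence[int]) -> bool:
    """A block is homogeneous if it contains just one distinct value."""
    return len(set(block)) == 1


assert effective_bits(0) == 1 and effective_bytes(0) == 1
assert effective_bits(255) == 8 and effective_bytes(256) == 2
assert is_homogeneous([5, 5, 5]) and not is_homogeneous([5, 6, 5])
```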

References

1. Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: SIGMOD, pp. 671–682 (2006)
2. Chen, Z., Gehrke, J., Korn, F.: Query optimization in compressed database systems. SIGMOD Rec. 30(2), 271–282 (2001)
3. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: ICDE, pp. 370–379 (1998)
4. Huffman, D.: A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)
5. Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. CoRR abs/1209.2137 (2012)
6. Roth, M.A., Van Horn, S.J.: Database compression. SIGMOD Rec. 22(3), 31–39 (1993)
7. Schlegel, B., Gemulla, R., Lehner, W.: Fast integer compression using SIMD instructions. In: DaMoN Workshop, pp. 34–40 (2010)
8. Stepanov, A.A., Gangolli, A.R., Rose, D.E., Ernst, R.J., Oberoi, P.S.: SIMD-based decoding of posting lists. In: CIKM, pp. 317–326 (2011)
9. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: ICDE, pp. 59–70 (2006)

Acknowledgments

This work was funded by the German Research Foundation (DFG) in the context of the project “Lightweight Compression Techniques for the Optimization of Complex Database Queries” (LE-1416/26-1).

Author information

Authors and Affiliations

Database Systems Group, Technische Universität Dresden, 01062, Dresden, Germany
Patrick Damme, Dirk Habich & Wolfgang Lehner

Corresponding author

Correspondence to Patrick Damme.

Editor information

Editors and Affiliations

Poznan University of Technology, Poznań, Poland
Tadeusz Morzy
INRIA, Montpellier, France
Patrick Valduriez
Teleport 2, LIAS/ISAE-ENSMA, Poitiers, France
Ladjel Bellatreche

Cite this paper

Damme, P., Habich, D., Lehner, W. (2015). Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios. In: Morzy, T., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_11

DOI: https://doi.org/10.1007/978-3-319-23135-8_11
Published: 15 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23134-1
Online ISBN: 978-3-319-23135-8
eBook Packages: Computer Science, Computer Science (R0)
