Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9282)

Abstract

Lightweight data compression techniques such as dictionary or run-length compression play an important role in main-memory database systems. Once a compression scheme has been chosen for a dataset, transforming the data to another scheme is very inefficient today. The common approach works in two steps: first, the compressed data is decompressed using the source scheme's decompression algorithm, which materializes the raw data in main memory; second, the compression algorithm of the destination scheme is applied. This indirect approach relies on existing algorithms but is very inefficient, because the entire uncompressed data has to be materialized as an intermediate step. To overcome this drawback, we propose a novel approach called direct transformation, which avoids materializing the entire uncompressed data. Our techniques are cache-optimized to reduce the number of necessary data accesses. Moreover, we present application scenarios in which such direct transformations can be applied efficiently.
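
The contrast between the indirect and the direct route can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of run-length encoding as the source scheme and dictionary encoding as the destination scheme, the `(value, run_length)` pair layout, and all function names are assumptions made for illustration, and the cache optimizations mentioned in the abstract are not modeled.

```python
def rle_decode(runs):
    """Expand (value, run_length) pairs into the raw value sequence."""
    raw = []
    for value, length in runs:
        raw.extend([value] * length)
    return raw


def dict_encode(values):
    """Dictionary-encode values: return (sorted dictionary, list of keys)."""
    dictionary = sorted(set(values))
    key_of = {value: key for key, value in enumerate(dictionary)}
    return dictionary, [key_of[value] for value in values]


def indirect_rle_to_dict(runs):
    """Indirect route: decompress everything, then recompress."""
    raw = rle_decode(runs)  # the whole uncompressed data is materialized here
    return dict_encode(raw)


def direct_rle_to_dict(runs):
    """Direct route (hypothetical sketch): work on the runs themselves."""
    # The distinct values can be read off the runs without expanding them.
    dictionary = sorted({value for value, _ in runs})
    key_of = {value: key for key, value in enumerate(dictionary)}
    keys = []
    for value, length in runs:
        # Look up the key once per run and emit it run_length times;
        # the raw values are never written to an intermediate buffer.
        keys.extend([key_of[value]] * length)
    return dictionary, keys
```

Both functions produce the same dictionary and key sequence for the same input; the direct variant performs one dictionary lookup per run instead of one per value and skips the intermediate raw buffer.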

Notes

  1. Our source code is downloadable at https://wwwdb.inf.tu-dresden.de/team/staff/patrick-damme-msc/.

  2. We call a block homogeneous if it contains just one distinct value; otherwise, we call it heterogeneous.

  3. Following Schlegel et al. [7], we use the term effective bits to denote all but the leading zero bits of a value. The term effective bytes is defined analogously. By definition, the value zero also has one effective bit and one effective byte, respectively.
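
The definitions in notes 2 and 3 can be written down as a short Python sketch. The function names are hypothetical, values are assumed to be non-negative integers, and treating an empty block as not homogeneous is an assumption, since the notes do not cover that case.

```python
def effective_bits(value):
    """Number of bits left after dropping leading zero bits (at least 1)."""
    # int.bit_length() returns 0 for the value zero, so clamp to 1 to
    # match the convention that zero has one effective bit.
    return max(1, value.bit_length())


def effective_bytes(value):
    """Number of bytes needed to hold the effective bits (at least 1)."""
    return (effective_bits(value) + 7) // 8


def is_homogeneous(block):
    """True if the block contains exactly one distinct value."""
    # Assumption: an empty block has no distinct value and is therefore
    # reported as not homogeneous.
    return len(set(block)) == 1
```

For example, the value 5 (binary 101) has three effective bits and one effective byte, while 256 needs nine effective bits and therefore two effective bytes.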

References

  1. Abadi, D., Madden, S., Ferreira, M.: Integrating compression and execution in column-oriented database systems. In: SIGMOD, pp. 671–682 (2006)

  2. Chen, Z., Gehrke, J., Korn, F.: Query optimization in compressed database systems. SIGMOD Rec. 30(2), 271–282 (2001)

  3. Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: ICDE, pp. 370–379 (1998)

  4. Huffman, D.: A method for the construction of minimum-redundancy codes. Proc. IRE 40(9), 1098–1101 (1952)

  5. Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. In: CoRR abs/1209.2137 (2012)

  6. Roth, M.A., Van Horn, S.J.: Database compression. SIGMOD Rec. 22(3), 31–39 (1993)

  7. Schlegel, B., Gemulla, R., Lehner, W.: Fast integer compression using SIMD instructions. In: DaMoN Workshop, pp. 34–40 (2010)

  8. Stepanov, A.A., Gangolli, A.R., Rose, D.E., Ernst, R.J., Oberoi, P.S.: SIMD-based decoding of posting lists. In: CIKM, pp. 317–326 (2011)

  9. Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: ICDE, pp. 59–70 (2006)

Acknowledgments

This work was funded by the German Research Foundation (DFG) in the context of the project “Lightweight Compression Techniques for the Optimization of Complex Database Queries” (LE-1416/26-1).

Corresponding author

Correspondence to Patrick Damme.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Damme, P., Habich, D., Lehner, W. (2015). Direct Transformation Techniques for Compressed Data: General Approach and Application Scenarios. In: Tadeusz, M., Valduriez, P., Bellatreche, L. (eds) Advances in Databases and Information Systems. ADBIS 2015. Lecture Notes in Computer Science, vol 9282. Springer, Cham. https://doi.org/10.1007/978-3-319-23135-8_11

  • DOI: https://doi.org/10.1007/978-3-319-23135-8_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-23134-1

  • Online ISBN: 978-3-319-23135-8

  • eBook Packages: Computer Science, Computer Science (R0)
