Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database

  • Conference paper
Web-Age Information Management (WAIM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Abstract

Operating directly on compressed data can decrease CPU costs. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily. Heavy-weight Lempel-Ziv (LZ) compression, by contrast, has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data during decoding. R is treated as the probe table and S as the build table, and R is encoded with LZ. While R probes S, LZ join decreases the join cost by reusing cached results: when the decoder finds that an ID sequence of R already appears in R's LZ dictionary window, it reuses the previous join results for those IDs. LZ join combines the decoding and join phases into one, which reduces the memory needed to decode the whole of R and the CPU overhead of probing for IDs whose results are already cached. Our analysis and experiments show that LZ join performs better in some cases, and that the benefit grows with the compression ratio.
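
The idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token format (`("lit", id)` for a literal and `("ref", offset, length)` for a back-reference into the window), the function name `lz_join`, the `window` parameter, and the use of an in-memory hash table for S are all assumptions made for illustration. The sketch decodes an LZ77-style stream for R and joins it against S in one pass, probing S only for literals and copying cached join results for back-references.

```python
from collections import defaultdict, deque

def lz_join(tokens, s_rows, window=4096):
    """Decode an LZ77-style stream of R IDs and join it with S in one pass.

    tokens: ("lit", rid) or ("ref", offset, length); hypothetical format.
    s_rows: iterable of (key, payload) pairs forming the build table S.
    Returns (join results, number of hash-table probes performed).
    """
    # Build phase: hash S by join key.
    build = defaultdict(list)
    for key, payload in s_rows:
        build[key].append(payload)

    # The cache mirrors the LZ window: one (rid, matches) entry per decoded ID.
    cache = deque(maxlen=window)
    results = []
    probes = 0
    for tok in tokens:
        if tok[0] == "lit":
            rid = tok[1]
            probes += 1  # only literals probe the build table
            entry = (rid, build.get(rid, ()))
            cache.append(entry)
            results.extend((rid, p) for p in entry[1])
        else:
            _, offset, length = tok  # assumes 1 <= offset <= len(cache)
            for _ in range(length):
                # After each append, cache[-offset] is the next ID to copy,
                # so overlapping references (length > offset) also work.
                entry = cache[-offset]
                cache.append(entry)  # reuse the cached join result, no probe
                results.extend((entry[0], p) for p in entry[1])
    return results, probes

# R decodes to 1, 2, 1, 2, 1, 2, 3 but only three IDs are probed.
tokens = [("lit", 1), ("lit", 2), ("ref", 2, 4), ("lit", 3)]
s_rows = [(1, "a"), (2, "b"), (2, "c")]
results, probes = lz_join(tokens, s_rows)
```

In this toy setting the number of probes equals the number of literals in the stream, so a more compressible R leads to fewer probes, which matches the abstract's claim that a higher compression ratio helps. Only a window-sized cache is kept for R, so the decoded column is never materialized in full.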

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liang, G., RunHeng, L., Yan, J., Xin, J. (2010). Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_35

  • DOI: https://doi.org/10.1007/978-3-642-14246-8_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14245-1

  • Online ISBN: 978-3-642-14246-8

  • eBook Packages: Computer Science, Computer Science (R0)
