Abstract
Operating directly on compressed data can reduce CPU cost. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily, but heavy-weight Lempel-Ziv (LZ) compression has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data during decoding. R is the probe table and is encoded with LZ; S is the build table. While R probes S, LZ join reduces the join cost by reusing cached results: when the decoder finds that an ID sequence of R repeats a sequence already in the LZ dictionary window, it reuses the earlier join results for those IDs instead of probing S again. LZ join combines the decoding and join phases into one, which reduces the memory needed to decode the whole of R and the CPU overhead of probing for IDs whose results are already cached. Our analysis and experiments show that LZ join performs better in some cases, and that its advantage grows with the compression ratio.
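The idea of reusing join results during decoding can be illustrated with a minimal sketch. It is not the authors' implementation: the token format (`("lit", id)` and `("copy", offset, length)`), the function names, and the unbounded result cache are assumptions made for illustration. A real LZ77 decoder uses a bounded sliding window, so the cache would be bounded in the same way.

```python
from collections import defaultdict


def build_hash_table(s_rows):
    """Build a hash table on S: join key -> list of payloads."""
    table = defaultdict(list)
    for key, payload in s_rows:
        table[key].append(payload)
    return table


def lz_join(tokens, s_rows):
    """Join LZ-encoded R with S while decoding (illustrative sketch).

    tokens: hypothetical LZ77-style stream, each item either
        ("lit", r_id)            a literal ID of R
        ("copy", offset, length) repeat `length` IDs starting `offset` back
    Returns (join output, number of hash probes performed).
    """
    table = build_hash_table(s_rows)
    cached = []   # cached[i] = (ID at position i of R, its matches in S)
    output = []
    probes = 0

    for token in tokens:
        if token[0] == "lit":
            r_id = token[1]
            matches = table.get(r_id, [])   # a real probe of S
            probes += 1
            cached.append((r_id, matches))
            output.extend((r_id, m) for m in matches)
        else:
            _, offset, length = token
            start = len(cached) - offset
            # Copy one position at a time so overlapping copies
            # (length > offset) also work, as in LZ77 decoding.
            for k in range(length):
                r_id, matches = cached[start + k]   # reuse, no probe
                cached.append((r_id, matches))
                output.extend((r_id, m) for m in matches)

    return output, probes


if __name__ == "__main__":
    # R decodes to the ID sequence 1, 2, 1, 2, 1, 2, 3.
    tokens = [("lit", 1), ("lit", 2), ("copy", 2, 4), ("lit", 3)]
    s_rows = [(1, "a"), (2, "b"), (2, "c")]
    result, probes = lz_join(tokens, s_rows)
    print(len(result), "join rows,", probes, "probes")
```

In this sketch only literal tokens probe the hash table, so the number of probes falls as more of R is expressed through copy tokens, which is consistent with the claim that a higher compression ratio favors the approach.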
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liang, G., RunHeng, L., Yan, J., Xin, J. (2010). Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_35
DOI: https://doi.org/10.1007/978-3-642-14246-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science (R0)