Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database

Liang, Gan; RunHeng, Li; Yan, Jia; Xin, Jin

doi:10.1007/978-3-642-14246-8_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Operating directly on compressed data can decrease CPU costs. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily. Heavy-weight Lempel-Ziv (LZ) compression has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data during decoding. R is the probe table and is encoded with LZ; S is the build table. While R probes S, LZ join reduces the join cost by reusing cached results: when the decoder finds that an ID sequence in R repeats a sequence already in R's LZ dictionary window, it reuses the previous join results for those IDs. LZ join combines the decoding and join phases into one, which reduces both the memory needed to decode the whole of R and the CPU overhead of probing for results that are already cached. Our analysis and experiments show that LZ join performs better in some cases, and that the higher the compression ratio, the greater the benefit.
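The idea of reusing join results for repeated ID sequences can be illustrated with a minimal sketch. It is not the authors' implementation: the token format (`("lit", id)` and `("copy", offset, length)`), the function names `build_hash` and `lz_join`, and the unbounded window are all assumptions made for illustration. The sketch decodes an LZ77-style token stream of R's IDs, probes a hash table built on S only for literal IDs, and copies cached matches for back-references.

```python
from collections import defaultdict


def build_hash(s_rows):
    """Hash the build table S on its join key: key -> list of payloads."""
    table = defaultdict(list)
    for key, payload in s_rows:
        table[key].append(payload)
    return table


def lz_join(tokens, s_rows):
    """Join an LZ77-style token stream of R IDs against S while decoding.

    tokens: ("lit", id) or ("copy", offset, length), where offset counts
    back from the current end of the decoded output (hypothetical format).
    Returns (join_results, number_of_hash_probes).
    """
    s_hash = build_hash(s_rows)
    ids = []      # decoded R IDs (window kept unbounded for simplicity)
    matches = []  # matches[i] = S payloads joined with ids[i] (the cache)
    probes = 0
    for tok in tokens:
        if tok[0] == "lit":
            rid = tok[1]
            probes += 1  # a literal ID needs a real probe into S
            ids.append(rid)
            matches.append(s_hash.get(rid, []))
        else:
            _, offset, length = tok
            start = len(ids) - offset
            for i in range(length):
                # Repeated sequence: reuse the cached result, no probe.
                ids.append(ids[start + i])
                matches.append(matches[start + i])
    results = [(rid, p) for rid, ms in zip(ids, matches) for p in ms]
    return results, probes


# R decodes to 1, 2, 3, 1, 2, 3, 4; the second "1, 2, 3" is a back-reference.
tokens = [("lit", 1), ("lit", 2), ("lit", 3), ("copy", 3, 3), ("lit", 4)]
s_rows = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
results, probes = lz_join(tokens, s_rows)
print(probes, len(results))  # 4 probes for 7 decoded IDs, 7 result rows
```

In this toy input, seven IDs are decoded but only four probes reach the hash table; the longer and more frequent the back-references, the fewer probes are needed, which is consistent with the abstract's remark that a higher compression ratio helps.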

Author information

Authors and Affiliations

School of Computer Science, National University of Defense Technology, 410073, ChangSha, HuNan, China
Gan Liang, Li RunHeng & Jia Yan
School of Software, ChangSha Social Work College, 410004, ChangSha, HuNan, China
Jin Xin

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Lei Chen
Computer Department, Sichuan University, 610064, Chengdu, China
Changjie Tang
Department of Computer Science, Duke University, Box 90129, NC 27708-0129, Durham, USA
Jun Yang
College of Computer Science, Zhejiang University, 388 Yuhangtang Road, 310058, Hangzhou, China
Yunjun Gao

About this paper

Cite this paper

Liang, G., RunHeng, L., Yan, J., Xin, J. (2010). Join Directly on Heavy-Weight Compressed Data in Column-Oriented Database. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_35

DOI: https://doi.org/10.1007/978-3-642-14246-8_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)
