CDDTA-JOIN: One-Pass OLAP Algorithm for Column-Oriented Databases

Jiao, Min; Zhang, Yansong; Sun, Yan; Wang, Shan; Zhou, Xuan

doi:10.1007/978-3-642-29253-8_38

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7235)

Abstract

Row-stores commonly use a volcano-style “once-a-tuple” pipeline processor for processing efficiency, but they lose I/O efficiency when a query accesses only a small subset of the columns in a wide table. Academic column-stores usually use “once-a-column” style processing for I/O and cache efficiency, but they must scan columns in multiple passes for complex queries. This paper focuses on how to obtain the maximal gains from both storage models, combining pipeline processing efficiency with column processing efficiency. Based on the “address-value” mapping for surrogate keys in dimension tables, we map incremental primary keys to offset addresses, so that the foreign keys in the fact table can serve as a native join index for dimension tuples. We use predicate vectors as bitmap filters for dimensions to turn the star-join into a pipeline operator, and we pre-generate hash aggregators for column-based aggregation. With these approaches, star-join and pre-grouping are completed in a one-pass scan over the dimensional attributes in the fact table, and the subsequent scan of the aggregate columns performs the aggregation with sparse access. We gain both I/O efficiency from vector processing and CPU efficiency from pipelined aggregation. We run experiments on both a simulated column-based implementation of the algorithm and a commercial column-store database.
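
The idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy schema (a date and a customer dimension, a revenue measure), and the use of plain Python lists as columns are all assumptions made for illustration. It shows surrogate keys used as array offsets, predicate vectors acting as bitmap filters, a single pass over the foreign key columns that performs the star-join and pre-grouping, and a second step that touches only the surviving positions of the measure column.

```python
from collections import defaultdict


def build_predicate_vector(dim_column, predicate):
    """Bitmap filter for one dimension (hypothetical helper).

    dim_column[i] holds the attribute of the dimension row whose
    surrogate key is i + 1, so the key doubles as an offset address.
    """
    return [bool(predicate(value)) for value in dim_column]


def one_pass_star_join(fk_columns, predicate_vectors, group_columns, measure):
    """Sketch of a one-pass star-join followed by sparse aggregation.

    fk_columns:        dim name -> list of 1-based surrogate keys per fact row
    predicate_vectors: dim name -> list of bools indexed by key - 1
    group_columns:     dim name -> list of group values indexed by key - 1
    measure:           list of numeric values, one per fact row
    """
    # Pass over the foreign key columns only: filter and pre-group.
    survivors = []  # (fact row position, group key)
    for pos in range(len(measure)):
        group_key = []
        passed = True
        for name, fks in fk_columns.items():
            offset = fks[pos] - 1  # address-value mapping: key -> offset
            if not predicate_vectors[name][offset]:
                passed = False
                break
            if name in group_columns:
                group_key.append(group_columns[name][offset])
        if passed:
            survivors.append((pos, tuple(group_key)))

    # Sparse access to the measure column: only surviving positions are read.
    aggregators = defaultdict(int)
    for pos, key in survivors:
        aggregators[key] += measure[pos]
    return dict(aggregators)


# Toy data (assumed): dimension rows are stored in surrogate key order.
date_year = [1997, 1998, 1998]        # keys 1..3
customer_region = ["ASIA", "EUROPE"]  # keys 1..2

predicate_vectors = {
    "date": build_predicate_vector(date_year, lambda y: y == 1998),
    "customer": build_predicate_vector(customer_region, lambda r: True),
}
fk_columns = {
    "date": [1, 2, 3, 2, 1],
    "customer": [1, 1, 2, 2, 2],
}
revenue = [10, 20, 30, 40, 50]

result = one_pass_star_join(
    fk_columns, predicate_vectors, {"customer": customer_region}, revenue
)
print(result)
```

In this sketch the join never probes a hash table: each foreign key is turned into an offset and checked against a bitmap, which is the role the abstract assigns to the "address-value" mapping and the predicate vectors. The measure column is read only at positions that passed every filter.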

Author information

Authors and Affiliations

DEKE Lab, Renmin University of China, Beijing, 100872, China
Min Jiao, Yan Sun, Shan Wang & Xuan Zhou
National Survey Research Center, Renmin University of China, Beijing, 100872, China
Yansong Zhang
School of Information, Renmin University of China, Beijing, 100872, China
Min Jiao, Yan Sun, Shan Wang & Xuan Zhou

Editor information

Editors and Affiliations

School of Computer Science, The University of Adelaide, Australia
Quan Z. Sheng
College of Information Science and Engineering, Northeastern University, 110819, Shenyang, China
Guoren Wang
Aarhus University, Denmark
Christian S. Jensen
Center for Applied Informatics, Victoria University, PO Box 14428, 8001, VIC, Australia
Guandong Xu

Cite this paper

Jiao, M., Zhang, Y., Sun, Y., Wang, S., Zhou, X. (2012). CDDTA-JOIN: One-Pass OLAP Algorithm for Column-Oriented Databases. In: Sheng, Q.Z., Wang, G., Jensen, C.S., Xu, G. (eds) Web Technologies and Applications. APWeb 2012. Lecture Notes in Computer Science, vol 7235. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29253-8_38


DOI: https://doi.org/10.1007/978-3-642-29253-8_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29252-1
Online ISBN: 978-3-642-29253-8
eBook Packages: Computer Science, Computer Science (R0)
