Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors

Chen, Ying; Dehne, Frank; Eavis, Todd; Rau-Chaplin, Andrew

doi:10.1023/B:DAPD.0000018572.20283.e0


Published: May 2004

Volume 15, pages 219–236 (2004)

Distributed and Parallel Databases

Abstract

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. To meet the demand for better performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. This paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube in which views are stored as relational tables. This allows for tight integration with current relational database technology.
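To make the ROLAP view structure concrete, here is a minimal, sequential Python sketch. It is not the authors' algorithm (their method builds views along schedule trees over disk-resident data); it only illustrates two ideas named above under simplifying assumptions. First, a full cube over d dimensions consists of 2^d group-by views, each materialized as a relational table. Second, on a shared-nothing machine each processor can aggregate its own partition before the partial views are merged. The toy fact table, the SUM measure, the round-robin partitioning, and the function names (`local_cube`, `merge_cubes`, `parallel_cube`, `as_relational_tables`) are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def all_views(num_dims):
    """Yield every subset of dimension indices: the 2**d group-bys of a full cube."""
    for k in range(num_dims, -1, -1):
        yield from combinations(range(num_dims), k)


def local_cube(rows, num_dims):
    """SUM the measure (last column) for every view over one processor's rows."""
    cube = {}
    for view in all_views(num_dims):
        table = defaultdict(int)
        for row in rows:
            key = tuple(row[i] for i in view)  # project onto the view's dimensions
            table[key] += row[num_dims]
        cube[view] = table
    return cube


def merge_cubes(partials):
    """Combine per-processor partial views; valid because SUM is distributive."""
    merged = {}
    for cube in partials:
        for view, table in cube.items():
            target = merged.setdefault(view, defaultdict(int))
            for key, total in table.items():
                target[key] += total
    return merged


def parallel_cube(rows, num_dims, p):
    """Simulate p shared-nothing processors, each holding a round-robin slice."""
    partitions = [rows[i::p] for i in range(p)]
    # The local computations are independent; here they simply run one after another.
    partials = [local_cube(part, num_dims) for part in partitions]
    return merge_cubes(partials)


def as_relational_tables(cube):
    """ROLAP form: each view becomes a sorted list of (dims..., sum) tuples."""
    return {view: sorted(key + (total,) for key, total in table.items())
            for view, table in cube.items()}


# Hypothetical fact table: three dimensions followed by one numeric measure.
facts = [("A", "x", 1, 10), ("A", "y", 1, 5), ("B", "x", 2, 7), ("B", "x", 1, 3)]
tables = as_relational_tables(parallel_cube(facts, num_dims=3, p=2))
print(len(tables))   # 8: one table per subset of the 3 dimensions
print(tables[(0,)])  # [('A', 15), ('B', 10)]
```

Merging partial results key by key is only correct for distributive aggregates such as SUM, COUNT, MIN, and MAX; a holistic measure such as a median would need a different merge step. A real shared-nothing system would also redistribute the partial views across processors (for example by sorting or hashing on the group-by key) rather than gathering them in one place as this toy does.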

We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close-to-optimal speedup, generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, creates a data cube of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.
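The figures above imply a few derived quantities that help put the results in perspective. The short Python calculation below is back-of-the-envelope arithmetic, not figures reported by the authors. It assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes), treats the approximate cube sizes as exact, and treats the quoted run times as upper bounds, so the throughput it computes is a lower bound. The names `runs` and `derived` are illustrative only.

```python
# Figures quoted in the abstract; sizes assume decimal units (1 MB = 10**6 bytes).
runs = {
    "2M-row input": dict(input_rows=2_000_000, input_mb=72,
                         cube_rows=227_000_000, cube_gb=5.6, minutes=6),
    "10M-row input": dict(input_rows=10_000_000, input_mb=360,
                          cube_rows=846_000_000, cube_gb=21.7, minutes=47),
}


def derived(run):
    """Ratios implied by one reported run (all values approximate)."""
    return {
        "bytes_per_input_row": run["input_mb"] * 10**6 / run["input_rows"],
        "cube_rows_per_input_row": run["cube_rows"] / run["input_rows"],
        "bytes_per_cube_row": run["cube_gb"] * 10**9 / run["cube_rows"],
        # Times are upper bounds ("under N minutes"), so this is a lower bound.
        "min_cube_rows_per_second": run["cube_rows"] / (run["minutes"] * 60),
    }


for name, run in runs.items():
    stats = derived(run)
    print(name, {k: round(v, 1) for k, v in stats.items()})
```

Both inputs work out to 36 bytes per input row, while the generated cube has roughly 85 to 114 times as many rows as the input. This suggests that writing the views, rather than reading the input, dominates the data volume handled by the cluster.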

Author information

Authors and Affiliations

Faculty of Computer Science, Dalhousie University, Halifax, Canada
Ying Chen, Todd Eavis & Andrew Rau-Chaplin
School of Computer Science, Carleton University, Ottawa, Canada
Frank Dehne

About this article

Cite this article

Chen, Y., Dehne, F., Eavis, T. et al. Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors. Distributed and Parallel Databases 15, 219–236 (2004). https://doi.org/10.1023/B:DAPD.0000018572.20283.e0

Issue Date: May 2004
DOI: https://doi.org/10.1023/B:DAPD.0000018572.20283.e0