
Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors

Distributed and Parallel Databases

Abstract

The pre-computation of data cubes is critical to improving the response time of On-Line Analytical Processing (OLAP) systems and can be instrumental in accelerating data mining tasks in large data warehouses. In order to meet the need for improved performance created by growing data sizes, parallel solutions for generating the data cube are becoming increasingly important. This paper presents a parallel method for generating data cubes on a shared-nothing multiprocessor. Since no (expensive) shared disk is required, our method can be used on low-cost Beowulf-style clusters consisting of standard PCs with local disks connected via a data switch. Our approach uses a ROLAP representation of the data cube, in which views are stored as relational tables. This allows for tight integration with current relational database technology.
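To make the ROLAP representation concrete, the following is a minimal sequential sketch of a full data cube in which every view (one group-by per subset of the dimensions) is stored as its own table of rows. It is not the parallel method of this paper: it recomputes each view naively from the raw data on a single machine. The function name `full_cube`, the `sales` fact table, and its column names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dims, measure):
    """Return {view: table} for all 2^d group-bys over `dims`, summing `measure`.

    Each view is a tuple of dimension names; each table is a sorted list of
    tuples (dimension values..., aggregated measure), i.e. a relational table.
    Naive illustration only: every view is computed from the raw rows.
    """
    cube = {}
    for k in range(len(dims), -1, -1):
        for view in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in view)
                totals[key] += row[measure]
            cube[view] = sorted(key + (total,) for key, total in totals.items())
    return cube


# Hypothetical fact table with d = 3 dimensions and one measure.
sales = [
    {"store": "A", "product": "x", "year": 2003, "amount": 10},
    {"store": "A", "product": "y", "year": 2003, "amount": 20},
    {"store": "B", "product": "x", "year": 2004, "amount": 5},
    {"store": "A", "product": "x", "year": 2004, "amount": 7},
]

cube = full_cube(sales, ("store", "product", "year"), "amount")

for view, table in cube.items():
    print(view, table)
```

With three dimensions the sketch produces 2^3 = 8 views, from the finest group-by over all dimensions down to the empty view holding the grand total. The number of views doubles with each added dimension, which is why the full cube can be far larger than its input.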

We have implemented our parallel shared-nothing data cube generation method and evaluated it on a PC cluster, exploring relative speedup, local vs. global schedule trees, data skew, cardinality of dimensions, data dimensionality, and balance tradeoffs. For an input data set of 2,000,000 rows (72 Megabytes), our parallel data cube generation method achieves close to optimal speedup, generating a full data cube of ≈227 million rows (5.6 Gigabytes) on a 16-processor cluster in under 6 minutes. For an input data set of 10,000,000 rows (360 Megabytes), our parallel method, running on a 16-processor PC cluster, created a data cube consisting of ≈846 million rows (21.7 Gigabytes) in under 47 minutes.




Cite this article

Chen, Y., Dehne, F., Eavis, T. et al. Parallel ROLAP Data Cube Construction on Shared-Nothing Multiprocessors. Distributed and Parallel Databases 15, 219–236 (2004). https://doi.org/10.1023/B:DAPD.0000018572.20283.e0


  • DOI: https://doi.org/10.1023/B:DAPD.0000018572.20283.e0
