ABSTRACT
In recent years, OLAP technologies have become one of the important applications in the database industry. In particular, the datacube operation proposed in [5] has received strong attention from researchers as a fundamental research topic in OLAP. The datacube operation computes aggregations over all possible combinations of the dimension attributes. As the number of dimensions increases, computing a datacube becomes very expensive, because the computation cost grows exponentially with the number of dimensions. Parallelization is therefore a very important factor in fast datacube computation. However, in the presence of data skew, parallelization alone does not yield a sufficient performance gain. In this paper, we present a dynamic load balancing strategy that allows the benefit of parallelizing datacube computation to be realized fully. We perform simulation-based experiments and show that our strategy performs well.
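To make the exponential growth concrete, the following is a minimal sequential sketch of the datacube operation with a SUM aggregate. It is not the parallel algorithm or the load balancing strategy of this paper; the function name `datacube`, the sample rows, and the attribute names are hypothetical and chosen only for illustration. With d dimension attributes, the loop visits 2^d group-bys.

```python
from collections import defaultdict
from itertools import combinations


def datacube(rows, dims, measure):
    """Naively compute SUM(measure) for every subset of dims.

    Hypothetical helper for illustration: one full scan of `rows`
    per group-by, so 2**len(dims) scans in total.
    """
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                # The grouping key is the tuple of values of the chosen dimensions.
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Illustrative data (assumed, not taken from the paper).
rows = [
    {"product": "pen", "region": "east", "year": 1999, "sales": 10},
    {"product": "pen", "region": "west", "year": 1999, "sales": 5},
    {"product": "ink", "region": "east", "year": 2000, "sales": 7},
]
cube = datacube(rows, ("product", "region", "year"), "sales")
print(len(cube))  # 8 group-bys for 3 dimensions
```

Each added dimension doubles the number of group-bys, which is why the work is typically spread over many processors, and why uneven group sizes caused by skewed data can leave some processors with far more work than others.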
- 1. S. Agarwal, R. Agrawal, P. M. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan and S. Sarawagi, "On the Computation of Multidimensional Aggregates", In Proceedings of the International Conference on Very Large Databases, pages 506-521, 1996.
- 2. K. S. Beyer and R. Ramakrishnan, "Bottom-Up Computation of Sparse and Iceberg CUBES", In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 359-370, 1999.
- 3. P. M. Deshpande, S. Agarwal, J. F. Naughton and R. Ramakrishnan, "Computation of Multidimensional Aggregates", Technical Report 1314, University of Wisconsin, Madison, 1996.
- 4. D. J. Dewitt, J. F. Naughton, D. A. Schneider and S. Seshadri, "Practical Skew Handling in Parallel Joins", In Proceedings of the International Conference on Very Large Databases, pages 27-40, 1992.
- 5. J. Gray, A. Bosworth, A. Layman and H. Pirahesh, "A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals", In Proceedings of the IEEE International Conference on Data Engineering, pages 152-159, 1996.
- 6. S. Goil and A. Choudhary, "High Performance OLAP and Data Mining on parallel computers", Journal of Data Mining and Knowledge Discovery, 1(4):391-417, 1997.
- 7. K. A. Hua and C. Lee, "Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning", In Proceedings of the International Conference on Very Large Databases, pages 525-535, 1991.
- 8. V. Harinarayan, A. Rajaraman and J. D. Ullman, "Implementing Data Cubes Efficiently", In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 205-216, 1996.
- 9. M. Kitsuregawa and Y. Ogawa, "Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC)", In Proceedings of the International Conference on Very Large Databases, pages 210-221, 1990.
- 10. K. A. Ross and D. Srivastava, "Fast Computation of Sparse Datacubes", In Proceedings of the International Conference on Very Large Databases, pages 116-125, 1997.
- 11. S. Sarawagi, R. Agrawal and A. Gupta, "On Computing the Data Cube", Research Report RJ10026, IBM Almaden Research Center, San Jose, CA, 1996.
- 12. A. Shatdal and J. F. Naughton, "Adaptive Parallel Aggregation Algorithms", In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 104-114, 1995.
- 13. A. Shukla, P. Deshpande, J. F. Naughton and K. Ramasamy, "Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies", In Proceedings of the International Conference on Very Large Databases, pages 522-531, 1996.
- 14. C. B. Walton, A. G. Dale and R. M. Jenevein, "A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins", In Proceedings of the International Conference on Very Large Databases, pages 537-548, 1991.
- 15. Y. Zhao, P. M. Deshpande and J. F. Naughton, "An Array-Based Algorithm for Simultaneous Multidimensional Aggregates", In Proceedings of the ACM SIGMOD Conference on Management of Data, pages 159-170, 1997.