Abstract
A data cube is an essential part of OLAP (On-Line Analytical Processing), supporting efficient multidimensional analysis of large volumes of data. Computing a data cube takes much time because a data cube with d dimensions consists of 2^d cuboids (i.e., exponential in d). To build ROLAP (Relational OLAP) data cubes efficiently, many algorithms (e.g., GBLP, PipeSort, PipeHash, and BUC) have been developed that share sort cost and input data scans and/or reduce computation time. Several parallel processing algorithms have also been proposed. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed/parallel manner using a large number of computers (e.g., several hundred or thousands). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate strategies such as sort-sharing and computation reduction, which existing ROLAP algorithms use. In this paper, we propose two distributed parallel processing algorithms. The first, MRLevel, takes advantage of the MapReduce framework. The second, MRPipeLevel, is based on the existing PipeSort algorithm, one of the most efficient algorithms for top-down cube computation. (The top-down approach handles big data more effectively than alternatives such as bottom-up computation and special data structures, which depend on main-memory size.) MRLevel parallelizes cube computation and, at the same time, reduces the number of data scans by computing cuboids level by level. MRPipeLevel builds on the advantages of MRLevel and further reduces the number of data scans by pipelining. We implemented the algorithms and evaluated their performance under the MapReduce framework. Through the experiments, we also identify the factors that improve MapReduce performance when processing very large data.
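To make the 2^d cuboid count and the idea of processing cuboids by level concrete, the following is a minimal Python sketch. It is not the chapter's MRLevel implementation: it simulates the map and reduce steps in memory, assumes a SUM aggregate, and recomputes each level from the raw rows. The names `cuboids_by_level`, `map_level`, `compute_level`, and the sample columns are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cuboids_by_level(dims):
    """Group all 2^d cuboids (subsets of dims) by their number of grouping dimensions."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


def map_level(row, cuboids, measure):
    """Map step (simulated): one input row emits one (key, value) pair per cuboid."""
    for cuboid in cuboids:
        key = (cuboid, tuple(row[d] for d in cuboid))
        yield key, row[measure]


def compute_level(rows, cuboids, measure):
    """Shuffle + reduce (simulated): sum the measure for each (cuboid, group) key."""
    totals = defaultdict(int)
    for row in rows:  # a single scan of the input covers every cuboid of the level
        for key, value in map_level(row, cuboids, measure):
            totals[key] += value
    return dict(totals)


# Assumed toy input: three dimensions (a, b, c) and one measure (sales).
dims = ("a", "b", "c")
rows = [
    {"a": 1, "b": 1, "c": 1, "sales": 10},
    {"a": 1, "b": 2, "c": 1, "sales": 5},
    {"a": 2, "b": 1, "c": 1, "sales": 7},
]

levels = cuboids_by_level(dims)
total_cuboids = sum(len(c) for c in levels.values())  # equals 2 ** len(dims)
level2 = compute_level(rows, levels[2], "sales")      # cuboids ab, ac, bc in one scan
```

With d = 3 the lattice has four levels holding 1, 3, 3, and 1 cuboids. Computing all cuboids of a level in one pass means the number of input scans grows with the number of levels (d + 1) rather than with the number of cuboids (2^d), which is the kind of scan reduction the abstract describes.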
Acknowledgement
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science, and Technology (2011-0011824).
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lee, S., Kim, J., Moon, YS., Lee, W. (2015). Efficient Level-Based Top-Down Data Cube Computation Using MapReduce. In: Hameurlain, A., Küng, J., Wagner, R., Cuzzocrea, A., Dayal, U. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems XXI. Lecture Notes in Computer Science, vol 9260. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-47804-2_1
DOI: https://doi.org/10.1007/978-3-662-47804-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-47803-5
Online ISBN: 978-3-662-47804-2
eBook Packages: Computer Science, Computer Science (R0)