HaCube: Extending MapReduce for Efficient OLAP Cube Materialization and View Maintenance

Wang, Zhengkui; Chu, Yan; Tan, Kian-Lee; Agrawal, Divyakant; El Abbadi, Amr

doi:10.1007/978-3-319-32049-6_8

Conference paper
First Online: 25 March 2016

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9643)

Abstract

Data cubes are widely used as a powerful tool to provide multi-dimensional views in data warehousing and On-Line Analytical Processing (OLAP). However, as data sizes grow, data cube analysis is becoming computationally expensive. In this paper, we introduce HaCube, an extension of MapReduce designed for efficient parallel data cube computation on large-scale data. We also provide a general data cube materialization solution that exploits the features of MapReduce-like systems for efficient data cube computation. Furthermore, we demonstrate how HaCube supports view maintenance through either incremental computation (e.g., for SUM or COUNT) or recomputation (e.g., for MEDIAN or CORRELATION). We implement HaCube by extending Hadoop and evaluate it with the TPC-D benchmark over billions of tuples on a cluster with over 320 cores. The experimental results demonstrate the efficiency, scalability and practicality of HaCube for cube computation over large amounts of data in a distributed environment.
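
The distinction the abstract draws between incremental computation and recomputation can be illustrated with a small, single-process sketch. This is not HaCube's implementation and involves no MapReduce or Hadoop code; the function names (`cuboids`, `materialize`, `sum_view`, `refresh_sum`, `median_view`) and the sample rows are hypothetical, chosen only for illustration. It assumes a cube is the set of aggregates over every subset of the dimensions, and shows that a SUM view can be refreshed by merging the aggregates of newly arrived tuples into the old view, whereas a MEDIAN view is rebuilt from the full data.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


def cuboids(dims):
    """Yield every subset of the dimension names (all 2^n group-bys)."""
    for r in range(len(dims) + 1):
        yield from combinations(dims, r)


def materialize(rows, dims, measure):
    """Collect the measure values of each cell, keyed by (cuboid, group key)."""
    groups = defaultdict(list)
    for row in rows:
        for cuboid in cuboids(dims):
            key = tuple(row[d] for d in cuboid)
            groups[(cuboid, key)].append(row[measure])
    return groups


def sum_view(rows, dims, measure):
    """Materialize a SUM cube over all cuboids."""
    return {cell: sum(vals) for cell, vals in materialize(rows, dims, measure).items()}


def refresh_sum(view, delta_rows, dims, measure):
    """Incremental maintenance: aggregate only the delta, then merge it in."""
    updated = dict(view)
    for cell, total in sum_view(delta_rows, dims, measure).items():
        updated[cell] = updated.get(cell, 0) + total
    return updated


def median_view(rows, dims, measure):
    """Recomputation: MEDIAN cannot be merged from partial medians,
    so this sketch rebuilds the view from the full data set."""
    return {cell: median(vals) for cell, vals in materialize(rows, dims, measure).items()}


# Hypothetical sample data: a base table and a later batch of new tuples.
DIMS = ("city", "year")
base = [
    {"city": "A", "year": 2015, "sales": 10},
    {"city": "B", "year": 2015, "sales": 20},
]
delta = [{"city": "A", "year": 2016, "sales": 5}]

old_view = sum_view(base, DIMS, "sales")
new_view = refresh_sum(old_view, delta, DIMS, "sales")   # touches only the delta
new_median = median_view(base + delta, DIMS, "sales")    # rescans everything
```

In this sketch the refreshed SUM view equals the view computed from scratch over `base + delta`, while the MEDIAN view needs the raw values of every cell to be available again.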

Acknowledgements

Kian-Lee Tan is partially supported by the MOE/NUS grant R-252-000-500-112. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number OCI-1053575.

Author information

Authors and Affiliations

Singapore Institute of Technology, Singapore, Singapore
Zhengkui Wang
Harbin Engineering University, Harbin, China
Yan Chu
National University of Singapore, Singapore, Singapore
Kian-Lee Tan
University of California, Santa Barbara, USA
Divyakant Agrawal & Amr El Abbadi

Corresponding authors

Correspondence to Zhengkui Wang or Yan Chu.

Editor information

Editors and Affiliations

Georgia Institute of Technology, Atlanta, Georgia, USA
Shamkant B. Navathe
University of Texas at Dallas, Richardson, Texas, USA
Weili Wu
University of Minnesota, Minneapolis, Minnesota, USA
Shashi Shekhar
Renmin University, Beijing, China
Xiaoyong Du
Fudan University, Shanghai, China
Sean X. Wang
Rutgers, The State University of New Jer, New Brunswick, New Jersey, USA
Hui Xiong

About this paper

Cite this paper

Wang, Z., Chu, Y., Tan, KL., Agrawal, D., El Abbadi, A. (2016). HaCube: Extending MapReduce for Efficient OLAP Cube Materialization and View Maintenance. In: Navathe, S., Wu, W., Shekhar, S., Du, X., Wang, S., Xiong, H. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9643. Springer, Cham. https://doi.org/10.1007/978-3-319-32049-6_8

DOI: https://doi.org/10.1007/978-3-319-32049-6_8
Published: 25 March 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32048-9
Online ISBN: 978-3-319-32049-6
eBook Packages: Computer Science, Computer Science (R0)
