Efficient OLAP algorithms on GPU-accelerated Hadoop clusters

Wang, Hongzhi; Wang, Zheng; Li, Ning; Kong, Xinxin

doi:10.1007/s10619-018-7239-z

Published: 31 July 2018

Volume 37, pages 507–542, (2019)

Distributed and Parallel Databases

Abstract

In the era of big data, on-line analytical processing (OLAP) is an important method for processing massive data. To build a system with both large storage capacity and high computing power, Hadoop and GPUs are applied together in OLAP. In general, three core operations determine the efficiency of OLAP analysis: aggregation of multi-dimensional data, pre-computation of the multi-dimensional data set (Cube), and the join of dimension tables with the fact table. To boost efficiency, this paper presents an optimized algorithm for each of the three. Starting with aggregation on a single machine, the paper first designs a GPU-based aggregation algorithm. It then introduces a GPU-based Cube algorithm that accelerates pre-computation, using an inverted index to reduce the amount of computation. Finally, it presents a GPU-based OLAP analysis algorithm built on a newly designed dimension table join algorithm and query algorithm. The corresponding experiments show that each algorithm improves the efficiency of GPU-based OLAP analysis on Hadoop.
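
As a rough illustration of why an inverted index can reduce the work of Cube pre-computation, the sketch below aggregates one cube cell by intersecting posting lists instead of scanning every fact row. It is a minimal CPU-only sketch in Python, not the GPU algorithm of the paper: the function names `build_inverted_index` and `cell_sum`, the row layout, and the sample data are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(rows, dims):
    """Map each (dimension, value) pair to the set of row ids that hold it."""
    index = defaultdict(set)
    for rid, row in enumerate(rows):
        for d in dims:
            index[(d, row[d])].add(rid)
    return index


def cell_sum(rows, index, cell, measure):
    """Sum `measure` over the rows matching every (dimension, value) in `cell`."""
    postings = [index.get(key, set()) for key in cell.items()]
    if not postings:
        # The empty cell is the "all" aggregate over the whole fact table.
        return sum(row[measure] for row in rows)
    # Intersect starting from the shortest posting list to keep the sets small.
    matching = set.intersection(*sorted(postings, key=len))
    return sum(rows[rid][measure] for rid in matching)


# Hypothetical fact rows with two dimensions and one measure.
rows = [
    {"region": "north", "year": 2017, "sales": 10},
    {"region": "north", "year": 2018, "sales": 20},
    {"region": "south", "year": 2018, "sales": 5},
]
index = build_inverted_index(rows, ["region", "year"])
north_2018 = cell_sum(rows, index, {"region": "north", "year": 2018}, "sales")
```

Only rows that appear in every relevant posting list are touched, so a selective cell reads a small fraction of the fact table. How the paper maps this idea onto GPU threads is not described in the abstract and is not modeled here.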

Acknowledgements

Funding was provided by the NSFC (Grant Nos. U1509216 and 61472099) and the National Sci-Tech Support Plan (Grant No. 2015BAH10F01).

Author information

Authors and Affiliations

Harbin Institute of Technology, Harbin, China
Hongzhi Wang, Zheng Wang, Ning Li & Xinxin Kong

Corresponding author

Correspondence to Hongzhi Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, H., Wang, Z., Li, N. et al. Efficient OLAP algorithms on GPU-accelerated Hadoop clusters. Distrib Parallel Databases 37, 507–542 (2019). https://doi.org/10.1007/s10619-018-7239-z

Issue Date: December 2019
