Efficient OLAP algorithms on GPU-accelerated Hadoop clusters

Distributed and Parallel Databases

Abstract

In the era of big data, on-line analytical processing (OLAP) is an important method for processing massive data sets. To build a system that offers both large storage capacity and high computing power, Hadoop and GPUs are applied together to OLAP. In general, three core operations determine the efficiency of OLAP analysis: aggregation of multi-dimensional data, pre-computation of the multi-dimensional data set (the Cube), and joining of the dimension tables with the fact table. To improve efficiency, this paper presents an optimized algorithm for each of these operations. Starting with aggregation on a single machine, it first designs a GPU-based aggregation algorithm. It then introduces a GPU-based Cube algorithm that accelerates pre-computation, using an inverted index to reduce the amount of computation. Finally, it presents a GPU-based OLAP analysis algorithm built on a newly designed dimension-table join algorithm and query algorithm. Experiments on each algorithm show that it improves efficiency, optimizing GPU-based OLAP analysis on Hadoop.
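
As a rough illustration of how an inverted index can reduce the work of Cube pre-computation, the following CPU-only Python sketch builds one inverted index per dimension (value to row IDs) and computes a cuboid cell by intersecting row-ID sets instead of rescanning the fact table. It is not the paper's GPU algorithm: the toy fact table, the function names `build_inverted_index`, `aggregate_cell` and `compute_cuboid`, and the choice of SUM as the measure are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy fact table: (region, product, year, sales).
FACT_ROWS = [
    ("east", "tv", 2017, 10),
    ("east", "tv", 2018, 20),
    ("east", "radio", 2017, 5),
    ("west", "tv", 2017, 7),
    ("west", "radio", 2018, 3),
]
DIMENSIONS = ("region", "product", "year")  # assumed dimension names
MEASURE_COL = 3  # column holding the measure (sales)


def build_inverted_index(rows, num_dims):
    """Return one dict per dimension mapping value -> set of row IDs."""
    index = [defaultdict(set) for _ in range(num_dims)]
    for row_id, row in enumerate(rows):
        for dim in range(num_dims):
            index[dim][row[dim]].add(row_id)
    return index


def aggregate_cell(rows, index, cell):
    """SUM the measure for a cell such as {"region": "east", "year": 2017}.

    Dimensions missing from `cell` are treated as '*' (all values).
    """
    row_ids = set(range(len(rows)))
    for name, value in cell.items():
        dim = DIMENSIONS.index(name)
        # Intersect with the posting set; unknown values match no rows.
        row_ids &= index[dim].get(value, set())
    return sum(rows[r][MEASURE_COL] for r in row_ids)


def compute_cuboid(rows, index, group_by):
    """Materialise one cuboid: every existing value combination of `group_by`."""
    dims = [DIMENSIONS.index(name) for name in group_by]
    keys = {tuple(row[d] for d in dims) for row in rows}
    return {
        key: aggregate_cell(rows, index, dict(zip(group_by, key)))
        for key in keys
    }


index = build_inverted_index(FACT_ROWS, len(DIMENSIONS))
east_2017 = aggregate_cell(FACT_ROWS, index, {"region": "east", "year": 2017})
by_region = compute_cuboid(FACT_ROWS, index, ("region",))
```

Here `east_2017` sums only the rows that appear in both the `region = east` and `year = 2017` posting sets, and `by_region` materialises the one-dimensional cuboid grouped by region. A GPU version would likely replace the Python sets with sorted row-ID arrays or bitmaps so that intersections can run in parallel; that mapping is outside the scope of this sketch.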

Acknowledgements

Funding was provided by NSFC grant (Grant Nos. U1509216 and 61472099) and National Sci-Tech Support Plan (Grant No. 2015BAH10F01).

Author information

Corresponding author

Correspondence to Hongzhi Wang.

About this article

Cite this article

Wang, H., Wang, Z., Li, N. et al. Efficient OLAP algorithms on GPU-accelerated Hadoop clusters. Distrib Parallel Databases 37, 507–542 (2019). https://doi.org/10.1007/s10619-018-7239-z

  • DOI: https://doi.org/10.1007/s10619-018-7239-z
