Parallel cube computation on modern CPUs and GPUs

The Journal of Supercomputing

Abstract

With the growing popularity of column-store databases, modern multi-core CPUs, and general-purpose computing on graphics processing units (GPGPU), processing in the online analytical processing (OLAP) and data warehousing fields is set to change radically. Cube computation is a core, time-consuming problem that has been researched extensively. However, most existing algorithms were proposed without considering the now-prevalent multi-core architectures and column storage. This paper presents a new parallel cube algorithm that takes advantage of multi-core architectures. We first propose a cache-conscious bottom-up computation (BUC) algorithm, called CC-BUC, that adopts an integrated bottom-up and breadth-first partitioning order. Each dimension is stored and processed separately. When processing a dimension, breadth-first data scanning and result output reduce memory I/O and enhance cache locality. Cache misses are confined to the scope of a single dimension, and translation lookaside buffer (TLB) misses are reduced. Based on CC-BUC, we give a cube algorithm for multi-core architectures called MC-Cubing. Multiple partitions are processed simultaneously, and multiple threads execute in parallel inside each partition. MC-Cubing is well matched to multi-core architectures and offers high parallelism. The layout and associated algorithms take advantage of single instruction, multiple data (SIMD) instructions and thread-level parallelism (TLP). We implement MC-Cubing and demonstrate its effectiveness on two multi-core architectures: multi-core CPUs and GPUs. Experimental results show that MC-Cubing runs nearly six times faster than BUC on real datasets.

Author information

Corresponding author

Correspondence to Guoliang Zhou.

About this article

Cite this article

Zhou, G., Chen, H. Parallel cube computation on modern CPUs and GPUs. J Supercomput 61, 394–417 (2012). https://doi.org/10.1007/s11227-011-0575-7

  • DOI: https://doi.org/10.1007/s11227-011-0575-7
