Abstract
With the growing popularity of column-store databases, modern multi-core CPUs, and general-purpose computing on graphics processing units (GPGPUs), processing in the online analytical processing (OLAP) and data warehousing fields is set to change radically. Cube computation is a core, time-consuming problem that has been researched extensively. However, most existing algorithms were proposed without considering the now-prevalent multi-core architectures and column storage. This paper presents a new parallel cube algorithm that takes advantage of multi-core architectures. We first propose a cache-conscious bottom-up computation (BUC) algorithm called CC-BUC, which adopts an integrated bottom-up and breadth-first partitioning order. Each dimension is stored and processed separately. Within each dimension, breadth-first data scanning and result output reduce memory I/O and enhance cache locality. Cache misses are confined to the scope of a single dimension, and translation lookaside buffer (TLB) misses are reduced. Based on CC-BUC, we give a cube algorithm for multi-core architectures called MC-Cubing. Multiple partitions are processed simultaneously, and multiple threads execute in parallel inside each partition. MC-Cubing is well matched to multi-core architectures and achieves high parallelism. The data layout and associated algorithms take advantage of single instruction, multiple data (SIMD) instructions and thread-level parallelism (TLP). We implement MC-Cubing and demonstrate its effectiveness on two multi-core architectures: multi-core CPUs and GPUs. Experimental results show that MC-Cubing achieves a speedup of nearly six times over BUC on real datasets.
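For readers unfamiliar with the baseline, the following is a minimal Python sketch of the classic bottom-up (BUC-style) iceberg cube over column-stored dimensions, using COUNT as the aggregate. It is an illustration only, not the CC-BUC or MC-Cubing algorithm of the paper: it partitions depth-first, is single-threaded, and ignores cache behaviour. The function name `buc`, the `min_support` parameter, and the `'*'` placeholder for "all values" are assumptions made for this example.

```python
from collections import defaultdict


def buc(columns, min_support=1):
    """Return {cell: count} for every cube cell with count >= min_support.

    columns: list of equal-length lists, one list per dimension (column layout).
    A cell is a tuple with one entry per dimension; '*' means "all values".
    """
    num_dims = len(columns)
    num_rows = len(columns[0]) if columns else 0
    result = {}

    def recurse(row_ids, start_dim, cell):
        # Emit the aggregate of the current cell before refining it further.
        result[tuple(cell)] = len(row_ids)
        for d in range(start_dim, num_dims):
            # Partition the current rows on dimension d, touching one column only.
            parts = defaultdict(list)
            for r in row_ids:
                parts[columns[d][r]].append(r)
            for value, ids in parts.items():
                # Iceberg pruning: a partition below the threshold cannot
                # contain any qualifying, more specific cell.
                if len(ids) >= min_support:
                    cell[d] = value
                    recurse(ids, d + 1, cell)
            cell[d] = '*'  # restore the placeholder before the next dimension

    if num_rows >= min_support:
        recurse(list(range(num_rows)), 0, ['*'] * num_dims)
    return result


# Two dimensions, three rows, stored column by column.
example = [['a', 'a', 'b'], ['x', 'y', 'x']]
full_cube = buc(example, min_support=1)
iceberg = buc(example, min_support=2)
```

With `min_support=2`, only the apex cell and the cells `('a', '*')` and `('*', 'x')` survive pruning. Because each partitioning step reads a single dimension's column, a column layout keeps the scanned data contiguous, which is the property the abstract's per-dimension processing builds on.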
Cite this article
Zhou, G., Chen, H. Parallel cube computation on modern CPUs and GPUs. J Supercomput 61, 394–417 (2012). https://doi.org/10.1007/s11227-011-0575-7
DOI: https://doi.org/10.1007/s11227-011-0575-7