Parallel cube computation on modern CPUs and GPUs

Zhou, Guoliang; Chen, Hong

doi:10.1007/s11227-011-0575-7

Published: 24 February 2011

Volume 61, pages 394–417 (2012)

The Journal of Supercomputing

Guoliang Zhou^1,2,3 & Hong Chen^1,2

Abstract

With the spread of column-store databases, multi-core CPUs, and general-purpose computing on graphics processing units (GPGPU), the way processing is done in online analytical processing (OLAP) and data warehousing is set to change radically. Cube computation is a core, time-consuming operation that has been studied extensively, but most existing algorithms were designed without considering the now-prevalent multi-core architectures and column storage. This paper presents a new parallel cube algorithm that exploits multi-core architectures. We first propose a cache-conscious bottom-up computation (BUC) algorithm, CC-BUC, that adopts an integrated bottom-up and breadth-first partitioning order. Each dimension is stored and processed separately. Within each dimension, breadth-first data scanning and result output reduce memory I/O and improve cache locality; cache misses are confined to the scope of a single dimension, and translation lookaside buffer (TLB) misses are reduced. Building on CC-BUC, we present MC-Cubing, a cube algorithm designed for multi-core architectures, in which multiple partitions are processed simultaneously and multiple threads execute in parallel within each partition. MC-Cubing therefore matches the high degree of parallelism that multi-core hardware offers. The data layout and associated algorithms exploit single instruction, multiple data (SIMD) instructions and thread-level parallelism (TLP). We implement MC-Cubing and demonstrate its effectiveness on two kinds of multi-core architecture: multi-core CPUs and GPUs. Experimental results on real datasets show that MC-Cubing runs nearly six times faster than BUC.
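To make the partitioning idea concrete, here is a minimal sketch of the classic BUC recursion over a column-oriented layout (one array per dimension), plus a coarse task-parallel driver in the spirit of "multiple partitions are processed simultaneously". It is not the authors' CC-BUC or MC-Cubing, whose details are in the full paper. It assumes COUNT as the only aggregate, an iceberg threshold `min_sup`, and Python lists as column arrays. All names (`partition`, `buc`, `mc_cube`, `ALL`) are hypothetical. The sketch omits the cache, TLB, and SIMD optimizations described above. Because of CPython's global interpreter lock, the thread pool only illustrates the task decomposition and will not deliver a real speedup.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

ALL = "*"  # hypothetical marker for a dimension aggregated away ("all")


def partition(col, rows):
    """Scan one dimension column once and group row ids by their value."""
    groups = defaultdict(list)
    for r in rows:
        groups[col[r]].append(r)
    return groups


def buc(columns, rows, start, cell, min_sup, out):
    """Classic BUC: record COUNT for `cell`, then refine on dimensions >= start.

    `columns` is column-oriented: columns[d][r] is row r's value in dimension d.
    """
    out[tuple(cell)] = len(rows)
    for d in range(start, len(columns)):
        for value, part in sorted(partition(columns[d], rows).items()):
            if len(part) >= min_sup:  # iceberg pruning: COUNT is anti-monotone
                cell[d] = value
                buc(columns, part, d + 1, cell, min_sup, out)
                cell[d] = ALL


def mc_cube(columns, min_sup, workers=4):
    """Hypothetical task-parallel driver: each top-level partition is one task."""
    dims, n = len(columns), len(columns[0])
    rows = list(range(n))
    out = {tuple([ALL] * dims): n}  # the apex cell (*, ..., *)
    tasks = [
        (d, value, part)
        for d in range(dims)
        for value, part in partition(columns[d], rows).items()
        if len(part) >= min_sup
    ]

    def run(task):
        d, value, part = task
        cell = [ALL] * dims
        cell[d] = value
        local = {}  # private result buffer: workers never write shared state
        buc(columns, part, d + 1, cell, min_sup, local)
        return local

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for local in pool.map(run, tasks):
            out.update(local)
    return out


if __name__ == "__main__":
    cols = [["a1", "a1", "a2", "a2"], ["b1", "b2", "b1", "b1"]]
    for cell, count in sorted(mc_cube(cols, min_sup=2).items()):
        print(cell, count)
```

Each top-level task owns a disjoint slice of the cube: the cells whose first non-`*` dimension is `d` with value `value`. Workers can therefore fill private dictionaries that are merged at the end, and `mc_cube(columns, 1)` yields the same cells as a sequential `buc` call started from the apex.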

Author information

Authors and Affiliations

Key Laboratory of the Ministry of Education for Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, 100872, China
Guoliang Zhou & Hong Chen
School of Information, Renmin University of China, Beijing, 100872, China
Guoliang Zhou & Hong Chen
Department of Information, Baoding Electric Power Vocation & Technology College, Baoding, Hebei, 071051, China
Guoliang Zhou

Corresponding author

Correspondence to Guoliang Zhou.

About this article

Cite this article

Zhou, G., Chen, H. Parallel cube computation on modern CPUs and GPUs. J Supercomput 61, 394–417 (2012). https://doi.org/10.1007/s11227-011-0575-7

Issue Date: September 2012
