DOI: 10.1145/2771937.2771941
research-article

Toward GPUs being mainstream in analytic processing: An initial argument using simple scan-aggregate queries

Published: 31 May 2015

Abstract

There have been a number of research proposals to use discrete graphics processing units (GPUs) to accelerate database operations. Although many of these works show up to an order of magnitude performance improvement, discrete GPUs are not commonly used in modern database systems. However, there is now a proliferation of integrated GPUs which are on the same silicon die as the conventional CPU. With the advent of new programming models like heterogeneous system architecture, these integrated GPUs are considered first-class compute units, with transparent access to CPU virtual addresses and very low overhead for computation offloading. We show that integrated GPUs significantly reduce the overheads of using GPUs in a database environment. Specifically, an integrated GPU is 3x faster than a discrete GPU even though the discrete GPU has 4x the computational capability. Therefore, we develop high performance scan and aggregate algorithms for the integrated GPU. We show that the integrated GPU can outperform a four-core CPU with SIMD extensions by an average of 30% (up to 3.2x) and provides an average of 45% reduction in energy on 16 TPC-H queries.
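
To make the query shape concrete, here is a minimal sketch of a scan-aggregate query: a predicate scan that marks the qualifying rows, followed by a sum over those rows. It is an illustration only and not the paper's GPU algorithm. The column names, the values, the predicate, and the helpers `scan_less_than` and `aggregate_sum` are all hypothetical, and the code runs sequentially on the CPU.

```python
from typing import List


def scan_less_than(column: List[int], bound: int) -> List[bool]:
    """Scan phase: evaluate the predicate `value < bound` on every row."""
    return [value < bound for value in column]


def aggregate_sum(column: List[int], selected: List[bool]) -> int:
    """Aggregate phase: sum the values of the rows the scan selected."""
    return sum(value for value, keep in zip(column, selected) if keep)


# Hypothetical columnar table (values are made up for illustration).
quantity = [5, 30, 12, 48, 7]
price = [100, 250, 80, 400, 60]

# Roughly: SELECT SUM(price) FROM t WHERE quantity < 24
selected = scan_less_than(quantity, 24)
total = aggregate_sum(price, selected)
print(total)
```

Both phases touch each row independently (apart from the final reduction), which is the kind of data parallelism that a GPU or SIMD implementation would exploit.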

      Published In

      DaMoN'15: Proceedings of the 11th International Workshop on Data Management on New Hardware
      May 2015
      100 pages
      ISBN:9781450336383
      DOI:10.1145/2771937
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Conference

      SIGMOD/PODS'15
      SIGMOD/PODS'15: International Conference on Management of Data
      May 31 - June 4, 2015
      Melbourne, VIC, Australia

      Acceptance Rates

      DaMoN'15 paper acceptance rate: 12 of 16 submissions (75%)
      Overall acceptance rate: 94 of 127 submissions (74%)

      Cited By

      • (2023) Query Processing on Gaming Consoles. Proceedings of the 19th International Workshop on Data Management on New Hardware, pp. 86-88. DOI: 10.1145/3592980.3595313. Online publication date: 18-Jun-2023
      • (2022) Accelerating Group-By and Aggregation on Heterogeneous CPU-GPU Platforms. Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery, pp. 980-990. DOI: 10.1007/978-3-030-89698-0_100. Online publication date: 4-Jan-2022
      • (2021) Supporting Autonomous Vehicle Applications on the Heterogeneous System Architecture. 7th Conference on the Engineering of Computer Based Systems, pp. 1-8. DOI: 10.1145/3459960.3459970. Online publication date: 26-May-2021
      • (2020) XeFlow: Streamlining Inter-Processor Pipeline Execution for the Discrete CPU-GPU Platform. IEEE Transactions on Computers, 69:6, pp. 819-831. DOI: 10.1109/TC.2020.2968302. Online publication date: 1-Jun-2020
      • (2019) A morsel-driven query execution engine for heterogeneous multi-cores. Proceedings of the VLDB Endowment, 12:12, pp. 2218-2229. DOI: 10.14778/3352063.3352137. Online publication date: 1-Aug-2019
      • (2019) In-Memory Join Algorithms on GPUs for Large-Data. 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), pp. 1060-1067. DOI: 10.1109/HPCC/SmartCity/DSS.2019.00151. Online publication date: Aug-2019
      • (2018) Filtering Translation Bandwidth with Virtual Caching. ACM SIGPLAN Notices, 53:2, pp. 113-127. DOI: 10.1145/3296957.3173195. Online publication date: 19-Mar-2018
      • (2018) Pipelined Query Processing in Coprocessor Environments. Proceedings of the 2018 International Conference on Management of Data, pp. 1603-1618. DOI: 10.1145/3183713.3183734. Online publication date: 27-May-2018
      • (2018) Filtering Translation Bandwidth with Virtual Caching. Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 113-127. DOI: 10.1145/3173162.3173195. Online publication date: 19-Mar-2018
      • (2017) SiliconDB. Proceedings of the 13th International Workshop on Data Management on New Hardware, pp. 1-4. DOI: 10.1145/3076113.3076124. Online publication date: 14-May-2017
