research-article

Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries

Authors:
Emily Furst (University of Washington), Mark Oskin (University of Washington), Bill Howe (University of Washington)

DAMON '17: Proceedings of the 13th International Workshop on Data Management on New Hardware, May 2017, Article No. 3, Pages 1–6. https://doi.org/10.1145/3076113.3076119

Published: 14 May 2017

ABSTRACT

General-purpose computing on graphics processing units (GPGPU) has become an increasingly popular option for accelerating database queries. However, GPUs are not well suited to all types of queries, as data transfer costs can often dominate query execution. We develop a methodology for quantifying how well databases utilize GPU architectures using proprietary profiling tools. By aggregating various profiling metrics, we break down the different aspects that comprise occupancy on the GPU across the runtime of query execution. We show that for the Alenka GPU database, only a small minority of execution time, roughly 5%, is spent on the GPU. We further show that even on queries with seemingly good performance, a large portion of the achieved occupancy can actually be attributed to stalls and scalar instructions.

References

Nvprof, command line profiling tool. http://docs.nvidia.com/cuda/profiler-users-guide/.
TPC-H, Benchmark Specification. https://tpc.org/tpch/.
Alenka - A GPU Database Engine. https://github.com/antonmks/Alenka/, 2012--2017.
Bress, S., Heimel, M., Siegmund, N., Bellatreche, L., and Saake, G. GPU-accelerated database systems: Survey and open challenges. In Transactions on Large-Scale Data- and Knowledge-Centered Systems XV. Springer, 2014, pp. 1--35.
Coutinho, B. R., Teodoro, G. L. M., Oliveira, R. S., Neto, D. O. G., and Ferreira, R. A. C. Profiling general purpose GPU applications. In Computer Architecture and High Performance Computing, 2009. SBAC-PAD'09. 21st International Symposium on (2009), IEEE, pp. 11--18.
Gregg, C., and Hazelwood, K. Where is the data? Why you cannot debate CPU vs. GPU performance without the answer. In Performance Analysis of Systems and Software (ISPASS), 2011 IEEE International Symposium on (2011), IEEE, pp. 134--144.
He, B., Lu, M., Yang, K., Fang, R., Govindaraju, N. K., Luo, Q., and Sander, P. V. Relational query coprocessing on graphics processors. ACM Transactions on Database Systems (TODS) 34, 4 (2009), 21.
He, B., Yang, K., Fang, R., Lu, M., Govindaraju, N., Luo, Q., and Sander, P. Relational joins on graphics processors. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data (2008), ACM, pp. 511--524.
Hong, S., and Kim, H. An integrated GPU power and performance model. In ACM SIGARCH Computer Architecture News (2010), vol. 38, ACM, pp. 280--289.
Mostak, T. An overview of MapD (massively parallel database). White paper. Massachusetts Institute of Technology (2013).
Sim, J., Dasgupta, A., Kim, H., and Vuduc, R. A performance analysis framework for identifying potential benefits in GPGPU applications. In ACM SIGPLAN Notices (2012), vol. 47, ACM, pp. 11--22.
Team, A. D. T. CodeXL quick start guide. https://github.com/GPUOpen-Tools/CodeXL/releases/download/v2.0/CodeXL_Quick_Start_Guide.pdf.
Vesely, J., Basu, A., Oskin, M., Loh, G. H., and Bhattacharjee, A. Observations and opportunities in architecting shared virtual memory for heterogeneous systems. In Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International Symposium on (2016), IEEE, pp. 161--171.
Yuan, Y., Lee, R., and Zhang, X. The yin and yang of processing data warehousing queries on GPU devices. Proceedings of the VLDB Endowment 6, 10 (2013), 817--828.
Zhang, S., He, J., He, B., and Lu, M. OmniDB: Towards portable and efficient query processing on parallel CPU/GPU architectures. Proceedings of the VLDB Endowment 6, 12 (2013), 1374--1377.
Zhang, Y., and Owens, J. D. A quantitative performance analysis model for GPU architectures. In High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on (2011), IEEE, pp. 382--393.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Single instruction, multiple data
2. Information systems
  1. Data management systems
    1. Database management system engines
      1. Parallel and distributed DBMSs
        Relational parallel and distributed DBMSs

Published in

DAMON '17: Proceedings of the 13th International Workshop on Data Management on New Hardware
May 2017
70 pages
ISBN: 9781450350259
DOI: 10.1145/3076113

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
GPGPU
GPU database
profiling
Acceptance Rates
Overall Acceptance Rate: 80 of 102 submissions, 78%
Article Metrics
- Total Citations: 4
- Total Downloads: 482
- Downloads (Last 12 months): 33
- Downloads (Last 6 weeks): 2