ABSTRACT
General-purpose computing on graphics processing units (GPGPU) has become an increasingly popular option for accelerating database queries. However, GPUs are not well suited to all types of queries, as data transfer costs can often dominate query execution. We develop a methodology that uses proprietary profiling tools to quantify how well databases utilize GPU architectures. By aggregating various profiling metrics, we break down the different aspects that make up occupancy on the GPU across the runtime of query execution. We show that for the Alenka GPU database, only a small minority of execution time, roughly 5%, is spent on the GPU. We further show that even on queries with seemingly good performance, a large portion of the achieved occupancy can actually be attributed to stalls and scalar instructions.
Index Terms
- Profiling a GPU database implementation: a holistic view of GPU resource utilization on TPC-H queries