
Parameter based tuning model for optimizing performance on GPU

Cluster Computing

Abstract

Recently, graphics processing units (GPUs) have become increasingly popular for high performance computing applications. Although GPUs provide high peak performance, exploiting the full performance potential for application programs remains a challenging task for programmers. When launching a parallel kernel of an application on the GPU, the programmer needs to carefully select the number of blocks (grid size) and the number of threads per block (block size). These values determine the degree of SIMD parallelism and multithreading, and they greatly influence performance. With a huge range of possible combinations of these values, choosing the right grid size and block size is not straightforward. In this paper, we propose a mathematical model for tuning the grid size and the block size based on the GPU architecture parameters. Using our model, we first calculate a small set of candidate grid size and block size values, then search for the optimal values among the candidates through experiments. Our approach significantly reduces the potential search space compared with the exhaustive search approaches of previous research, and thus it can be practically applied to real applications.
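The paper's model itself is not reproduced on this page, but as a rough illustration of the idea, the CUDA sketch below (our own simplification under assumed constraints, not the authors' published model) enumerates candidate block sizes from the device's architecture parameters (warp size, maximum threads per block and per SM) and derives a matching grid size for a hypothetical saxpy kernel; each surviving (grid, block) pair would then be timed experimentally and the fastest configuration kept, as the abstract describes.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel, used only so occupancy can be queried per kernel.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int n = 1 << 24;                 // assumed problem size
    std::vector<int> candidateBlocks;      // candidate block sizes
    std::vector<int> candidateGrids;       // matching grid sizes

    // Candidate block sizes: multiples of the warp size, bounded by
    // maxThreadsPerBlock, that evenly divide the SM's thread capacity.
    for (int b = prop.warpSize; b <= prop.maxThreadsPerBlock; b += prop.warpSize) {
        if (prop.maxThreadsPerMultiProcessor % b != 0) continue;

        // Occupancy check: how many such blocks fit on one SM for this kernel?
        int blocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy, b, 0);
        if (blocksPerSM == 0) continue;

        candidateBlocks.push_back(b);
        // Grid size: enough blocks to cover n elements, rounded up.
        candidateGrids.push_back((n + b - 1) / b);
    }

    printf("SMs: %d, warp: %d, max threads/block: %d\n",
           prop.multiProcessorCount, prop.warpSize, prop.maxThreadsPerBlock);
    for (size_t i = 0; i < candidateBlocks.size(); ++i)
        printf("candidate: block=%d grid=%d\n", candidateBlocks[i], candidateGrids[i]);
    // Each candidate pair would then be benchmarked and the fastest kept.
    return 0;
}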




Acknowledgements

This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (NRF-2015M3C4A7065662).

Author information

Corresponding author

Correspondence to Myungho Lee.

About this article

Cite this article

Tran, NP., Lee, M. & Choi, J. Parameter based tuning model for optimizing performance on GPU. Cluster Comput 20, 2133–2142 (2017). https://doi.org/10.1007/s10586-017-1003-4

