
Parameter based tuning model for optimizing performance on GPU

Cluster Computing

Abstract

Recently, graphics processing units (GPUs) have become increasingly popular for high performance computing applications. Although GPUs provide high peak performance, exploiting the full performance potential for application programs remains a challenging task for programmers. When launching a parallel kernel of an application on the GPU, the programmer needs to carefully select the number of blocks (grid size) and the number of threads per block (block size). These values determine the degree of SIMD parallelism and multithreading, and they greatly influence performance. With a huge range of possible combinations of these values, choosing the right grid size and block size is not straightforward. In this paper, we propose a mathematical model for tuning the grid size and the block size based on the GPU architecture parameters. Using our model, we first calculate a small set of candidate grid size and block size values, then search for the optimal values among the candidates through experiments. Our approach significantly reduces the potential search space compared with the exhaustive search approaches of previous research, and thus it can be practically applied to real applications.
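The paper's model itself is not reproduced on this page, but as a rough illustration of the idea, the CUDA sketch below (our own simplification under assumed constraints, not the authors' published model) enumerates candidate block sizes from the device's architecture parameters (warp size, maximum threads per block and per SM) and derives a matching grid size for a hypothetical saxpy kernel; each surviving (grid, block) pair would then be timed experimentally and the fastest configuration kept, as the abstract describes.

#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel, used only so occupancy can be queried per kernel.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    const int n = 1 << 24;                 // assumed problem size
    std::vector<int> candidateBlocks;      // candidate block sizes
    std::vector<int> candidateGrids;       // matching grid sizes

    // Candidate block sizes: multiples of the warp size, bounded by
    // maxThreadsPerBlock, that evenly divide the SM's thread capacity.
    for (int b = prop.warpSize; b <= prop.maxThreadsPerBlock; b += prop.warpSize) {
        if (prop.maxThreadsPerMultiProcessor % b != 0) continue;

        // Occupancy check: how many such blocks fit on one SM for this kernel?
        int blocksPerSM = 0;
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, saxpy, b, 0);
        if (blocksPerSM == 0) continue;

        candidateBlocks.push_back(b);
        // Grid size: enough blocks to cover n elements, rounded up.
        candidateGrids.push_back((n + b - 1) / b);
    }

    printf("SMs: %d, warp: %d, max threads/block: %d\n",
           prop.multiProcessorCount, prop.warpSize, prop.maxThreadsPerBlock);
    for (size_t i = 0; i < candidateBlocks.size(); ++i)
        printf("candidate: block=%d grid=%d\n", candidateBlocks[i], candidateGrids[i]);
    // Each candidate pair would then be benchmarked and the fastest kept.
    return 0;
}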




Acknowledgements

This research was supported by Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science, and Technology (NRF-2015M3C4A7065662).

Author information

Corresponding author

Correspondence to Myungho Lee.

About this article

Cite this article

Tran, NP., Lee, M. & Choi, J. Parameter based tuning model for optimizing performance on GPU. Cluster Comput 20, 2133–2142 (2017). https://doi.org/10.1007/s10586-017-1003-4

