Automatic Tuning of CUDA Execution Parameters for Stencil Processing

Abstract

Compute Unified Device Architecture (CUDA) has enabled Graphics Processing Units (GPUs) to accelerate a wide range of applications. To exploit a GPU's computing power fully, however, a programmer has to adjust several CUDA execution parameters carefully, even for simple stencil processing kernels. This paper therefore develops an automatic parameter tuning mechanism that uses profiling to predict the optimal execution parameters. It first discusses the scope of the parameter exploration space, which is determined by the GPU's architectural restrictions. To find the optimal execution parameters, performance models are created by profiling the execution time of the kernel under each promising parameter configuration, and the execution parameters are then determined from those models. The performance improvement achieved by the proposed mechanism is evaluated using two benchmark programs. The evaluation results show that the proposed mechanism can appropriately select a near-optimal Cooperative Thread Array (CTA) configuration whose performance is comparable to that of the optimal one.
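The two steps described in the abstract, bounding the search space by architectural restrictions and then choosing among the remaining configurations by profiling, can be illustrated with a minimal sketch. It is not the chapter's implementation: the limits (32 threads per warp, at most 512 threads per CTA, 16 KB of shared memory) are assumed values typical of early CUDA GPUs, the function names `candidate_ctas` and `pick_fastest` are hypothetical, and an exhaustive minimum over profiled times stands in for the performance models, whose form the abstract does not give.

```python
from typing import Callable, Iterable, Iterator, Tuple

CTA = Tuple[int, int]  # (width, height) of a CTA in threads

# Assumed architectural limits (typical of early CUDA GPUs, not from the chapter).
WARP_SIZE = 32
MAX_THREADS_PER_CTA = 512
MAX_SHARED_BYTES = 16 * 1024


def candidate_ctas(halo: int = 1, elem_bytes: int = 4) -> Iterator[CTA]:
    """Yield power-of-two CTA shapes that satisfy the assumed limits.

    The shared-memory estimate assumes one tile with a halo of `halo`
    cells on every side, as a simple 2D stencil kernel might use.
    """
    sizes = [2 ** i for i in range(10)]  # 1, 2, ..., 512
    for width in sizes:
        for height in sizes:
            threads = width * height
            # Keep only whole-warp CTAs within the thread limit.
            if threads > MAX_THREADS_PER_CTA or threads % WARP_SIZE != 0:
                continue
            tile_bytes = (width + 2 * halo) * (height + 2 * halo) * elem_bytes
            if tile_bytes <= MAX_SHARED_BYTES:
                yield (width, height)


def pick_fastest(profile: Callable[[CTA], float], candidates: Iterable[CTA]) -> CTA:
    """Return the candidate with the smallest profiled execution time."""
    return min(candidates, key=profile)


if __name__ == "__main__":
    # Toy cost function standing in for real kernel timings (not measured data).
    def toy_profile(cta: CTA) -> float:
        return abs(cta[0] - 64) + abs(cta[1] - 4)

    print(pick_fastest(toy_profile, candidate_ctas()))
```

In a real tuner, `profile` would launch the kernel with the given CTA shape and return its measured time; restricting the candidates first keeps the number of profiling runs small.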

Notes

  1. The current SPRAT compiler does not generate code that dynamically allocates shared memory, so the size of dynamically allocated shared memory is not considered here.

Acknowledgement

The authors would like to acknowledge support from the Tohoku University Global COE Program on World Center of Education and Research for Trans-disciplinary Flow Dynamics. This work was partially supported by Grants-in-Aid for Young Scientists (B) #21700049 and Scientific Research (B) #21300007, by the NAKAYAMA HAYAO Foundation for Science & Technology and Culture, and by JST, CREST.

Author information

Corresponding author

Correspondence to Hiroyuki Takizawa.

Copyright information

© 2011 Springer New York

About this chapter

Cite this chapter

Sato, K., Takizawa, H., Komatsu, K., Kobayashi, H. (2011). Automatic Tuning of CUDA Execution Parameters for Stencil Processing. In: Naono, K., Teranishi, K., Cavazos, J., Suda, R. (eds) Software Automatic Tuning. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6935-4_13

  • DOI: https://doi.org/10.1007/978-1-4419-6935-4_13

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-6934-7

  • Online ISBN: 978-1-4419-6935-4

  • eBook Packages: Engineering, Engineering (R0)
