
A Throughput-Aware Analytical Performance Model for GPU Applications

  • Conference paper
Advanced Computer Architecture

Part of the book series: Communications in Computer and Information Science (CCIS, volume 451)

Abstract

Graphics processing units (GPUs) have become increasingly popular for general-purpose parallel processing. Their massively parallel architecture allows GPUs to execute tens of thousands of threads concurrently and thus solve heavily data-parallel problems efficiently. Despite this tremendous computing power, however, optimizing GPU kernels for high performance remains a challenge, owing to the architectural shift from CPUs to GPUs and the lack of tools for programming and performance analysis.

In this paper, we propose a throughput-aware analytical model that estimates the performance of GPU kernels and their optimizations. We model the servicing of global memory accesses as a pipeline and redefine compute throughput and memory throughput as the rates at which memory requests enter and leave this pipeline. Based on the identified throughput-limiting factor, GPU kernels are classified as compute-bound or memory-bound, and performance is predicted separately for each category. In addition, the model indicates promising directions for optimization and predicts their potential performance benefits. We evaluate the model on a manually written benchmark as well as a matrix-multiplication kernel and show that the geometric mean of its absolute error is below 6.5%.
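
The classification step the abstract describes can be illustrated with a minimal Python sketch: treat global memory servicing as a pipeline, compare the rate at which a kernel can issue memory requests (compute throughput) with the rate at which the memory system can retire them (memory throughput), and estimate runtime from whichever is slower. This is an illustration of the general idea only, not the paper's model; all function names, rates, and request counts below are assumed for the example.

```python
def estimate_kernel_time(total_requests, issue_rate, service_rate):
    """Classify a kernel and estimate its runtime from the bottleneck rate.

    total_requests : global memory requests the kernel generates (assumed)
    issue_rate     : requests/s entering the pipeline (compute throughput)
    service_rate   : requests/s leaving the pipeline (memory throughput)
    """
    if issue_rate <= service_rate:
        # The cores cannot generate requests fast enough to saturate memory.
        category, bottleneck = "compute-bound", issue_rate
    else:
        # The memory system cannot retire requests as fast as they arrive.
        category, bottleneck = "memory-bound", service_rate
    return category, total_requests / bottleneck


if __name__ == "__main__":
    # Hypothetical kernel: 1e8 requests, issued at 2e9 req/s, served at 1.5e9 req/s.
    category, seconds = estimate_kernel_time(1e8, 2.0e9, 1.5e9)
    print(category, f"{seconds * 1e3:.2f} ms")  # -> memory-bound 66.67 ms
```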

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, Z., Liu, G., Dong, W. (2014). A Throughput-Aware Analytical Performance Model for GPU Applications. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_8

  • DOI: https://doi.org/10.1007/978-3-662-44491-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44490-0

  • Online ISBN: 978-3-662-44491-7

  • eBook Packages: Computer Science, Computer Science (R0)
