
A Throughput-Aware Analytical Performance Model for GPU Applications

  • Conference paper
Advanced Computer Architecture

Part of the book series: Communications in Computer and Information Science (CCIS, volume 451)

Abstract

Graphics processing units (GPUs) have become increasingly popular for general-purpose parallel processing. Their massively parallel architecture allows GPUs to execute tens of thousands of threads concurrently and thus solve heavily data-parallel problems efficiently. Despite this tremendous computing power, however, optimizing GPU kernels for high performance remains a challenge, owing to the architectural shift from CPUs to GPUs and the lack of tools for programming and performance analysis.

In this paper, we propose a throughput-aware analytical model that estimates the performance of GPU kernels and their optimizations. We model the servicing of global memory accesses as a pipeline and redefine compute throughput and memory throughput as the rates at which memory requests enter and leave this pipeline. Based on the identified throughput-limiting factor, GPU kernels are classified as compute-bound or memory-bound, and performance is predicted separately for each category. In addition, the model indicates promising directions for optimization and predicts their potential performance benefits. We evaluate the model on a manually written benchmark as well as a matrix-multiplication kernel and show that the geometric mean of its absolute error is below 6.5%.
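
The classification step the abstract describes can be illustrated with a minimal Python sketch: treat global memory servicing as a pipeline, compare the rate at which a kernel can issue memory requests (compute throughput) with the rate at which the memory system can retire them (memory throughput), and estimate runtime from whichever is slower. This is an illustration of the general idea only, not the paper's model; all function names, rates, and request counts below are assumed for the example.

```python
def estimate_kernel_time(total_requests, issue_rate, service_rate):
    """Classify a kernel and estimate its runtime from the bottleneck rate.

    total_requests : global memory requests the kernel generates (assumed)
    issue_rate     : requests/s entering the pipeline (compute throughput)
    service_rate   : requests/s leaving the pipeline (memory throughput)
    """
    if issue_rate <= service_rate:
        # The cores cannot generate requests fast enough to saturate memory.
        category, bottleneck = "compute-bound", issue_rate
    else:
        # The memory system cannot retire requests as fast as they arrive.
        category, bottleneck = "memory-bound", service_rate
    return category, total_requests / bottleneck


if __name__ == "__main__":
    # Hypothetical kernel: 1e8 requests, issued at 2e9 req/s, served at 1.5e9 req/s.
    category, seconds = estimate_kernel_time(1e8, 2.0e9, 1.5e9)
    print(category, f"{seconds * 1e3:.2f} ms")  # -> memory-bound 66.67 ms
```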

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, Z., Liu, G., Dong, W. (2014). A Throughput-Aware Analytical Performance Model for GPU Applications. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_8

  • DOI: https://doi.org/10.1007/978-3-662-44491-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-44490-0

  • Online ISBN: 978-3-662-44491-7

  • eBook Packages: Computer Science, Computer Science (R0)
