Predicting GPU Kernel’s Performance on Upcoming Architectures

  • Conference paper
Euro-Par 2024: Parallel Processing (Euro-Par 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14801))

Abstract

With the advent of heterogeneous systems that combine CPUs and GPUs, designing a supercomputer is becoming increasingly complex. The hardware characteristics of GPUs significantly impact application performance. Choosing the GPU that maximizes performance within a limited budget is difficult because it requires predicting performance on a hardware platform that does not yet exist.

In this paper, we propose a new methodology for predicting the performance of kernels running on GPUs. This method analyzes the behavior of an application running on an existing platform, and projects its performance on another GPU based on the target hardware characteristics. The performance projection relies on a hierarchical roofline model as well as on a comparison of the kernel’s assembly instructions of both GPUs to estimate the operational intensity of the target GPU.
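
To make the idea of a roofline-based projection concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Gpu` class, the function names, the memory-level labels, and every number are hypothetical, and it assumes the kernel executes the same number of floating-point operations on both GPUs. For each memory level, the attainable performance is capped by operational intensity times bandwidth, and the whole kernel is capped by the peak compute rate; the runtime is then scaled by the ratio of the two attainable rates.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    """Hypothetical hardware description used only for this sketch."""
    peak_gflops: float    # peak compute throughput, GFLOP/s
    bandwidth_gbs: dict   # memory level name -> bandwidth in GB/s


def attainable_gflops(gpu, intensity):
    """Hierarchical roofline bound.

    `intensity` maps each memory level to the kernel's operational
    intensity at that level (FLOP per byte moved).
    """
    memory_bound = min(intensity[level] * gpu.bandwidth_gbs[level]
                       for level in intensity)
    return min(gpu.peak_gflops, memory_bound)


def project_time(time_source_s, source, target, oi_source, oi_target):
    """Scale the measured runtime by the ratio of the roofline bounds.

    Assumption: identical FLOP count on both GPUs, so runtime is
    inversely proportional to the attainable GFLOP/s.
    """
    return (time_source_s
            * attainable_gflops(source, oi_source)
            / attainable_gflops(target, oi_target))


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
source_gpu = Gpu(7000.0, {"L1": 14000.0, "L2": 2000.0, "HBM": 900.0})
target_gpu = Gpu(9700.0, {"L1": 19000.0, "L2": 4000.0, "HBM": 1500.0})
intensity = {"L1": 0.5, "L2": 1.0, "HBM": 2.0}

# The same intensities are reused for the target here; the paper instead
# estimates the target's operational intensity from assembly comparison.
projected_s = project_time(1.0, source_gpu, target_gpu, intensity, intensity)
print(f"projected runtime: {projected_s:.3f} s")
```

With these made-up inputs the kernel is bound by HBM bandwidth on both GPUs (1800 GFLOP/s on the source, 3000 GFLOP/s on the target), so a 1.0 s run is projected to take 0.6 s.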

We demonstrate the validity of our methodology on modern NVIDIA GPUs on several mini-applications. The experiments show that the performance is predicted with a mean absolute percentage error of 20.3 % for LULESH, 10.2 % for MiniMDock, and 5.9 % for Quicksilver.
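
For reference, the mean absolute percentage error quoted above is the average of |predicted - measured| / |measured| over all measurements, expressed as a percentage. The short Python sketch below shows that standard definition; the function name and the sample timings are hypothetical and are not data from the paper.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent."""
    if not measured or len(measured) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    relative_errors = [abs(p - m) / abs(m)
                       for m, p in zip(measured, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Illustrative kernel runtimes in seconds (invented for this example).
measured_s = [1.0, 2.0, 4.0]
predicted_s = [1.1, 1.8, 4.0]
error_pct = mape(measured_s, predicted_s)
print(f"MAPE: {error_pct:.1f} %")
```

The three relative errors here are 10 %, 10 % and 0 %, so the result is about 6.7 %.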

Acknowledgments

We thank the University of Oregon and the OACISS team for the use of their machines.

Author information

Corresponding author

Correspondence to Lucas Van Lanker.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Van Lanker, L., Taboada, H., Brunet, E., Trahay, F. (2024). Predicting GPU Kernel’s Performance on Upcoming Architectures. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14801. Springer, Cham. https://doi.org/10.1007/978-3-031-69577-3_6

  • DOI: https://doi.org/10.1007/978-3-031-69577-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69576-6

  • Online ISBN: 978-3-031-69577-3

  • eBook Packages: Computer Science, Computer Science (R0)
