Abstract
The expanding hardware diversity in high-performance computing (HPC) adds enormous complexity to scientific software development. Developers who aim to write maintainable software have two options: (1) use a so-called data locality abstraction that handles portability internally, so that performance and productivity become a trade-off; such abstractions usually come in the form of libraries, domain-specific languages, and run-time systems; or (2) use generic programming, where performance, productivity, and portability are subject to software design. Following the second option, this work describes a design approach that allows low-level, verbose programming tools to be integrated into high-level generic algorithms based on template metaprogramming in C++. This enables the development of performance-portable applications targeting host-device computer architectures, such as CPUs and GPUs. With a suitable design in place, extending generic algorithms to new hardware becomes a well-defined procedure that can be carried out in isolation from other parts of the code. This allows scientific software to remain maintainable and efficient in a period of diversifying HPC hardware. As a proof of concept, a finite-difference modelling algorithm for the acoustic wave equation is developed and benchmarked using roofline model analysis on an Intel Xeon Gold 6248 CPU, an Nvidia Tesla V100 GPU, and an AMD MI100 GPU.
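To make the idea of a generic algorithm that is isolated from hardware details more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical backend type `HostSerial` that owns one low-level kernel, and a generic driver `propagate` that is written once against the backend's interface. All names, the 1D three-point stencil, and the fixed-end boundary treatment are illustrative assumptions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical backend: encapsulates the low-level details of one target.
// A GPU backend would expose the same interface with device memory and kernels.
struct HostSerial {
    using Field = std::vector<double>;

    // One time step of a second-order scheme with a 3-point Laplacian (1D).
    // coeff stands for c^2 * dt^2 / dx^2. End points are never written.
    static void step(const Field& prev, const Field& curr, Field& next,
                     double coeff) {
        for (std::size_t i = 1; i + 1 < curr.size(); ++i) {
            const double lap = curr[i - 1] - 2.0 * curr[i] + curr[i + 1];
            next[i] = 2.0 * curr[i] - prev[i] + coeff * lap;
        }
    }
};

// Generic algorithm: written once, with no knowledge of the target hardware.
// Supporting new hardware means supplying another Backend type.
template <class Backend>
typename Backend::Field propagate(typename Backend::Field prev,
                                  typename Backend::Field curr,
                                  int nsteps, double coeff) {
    typename Backend::Field next(curr.size(), 0.0);
    for (int t = 0; t < nsteps; ++t) {
        Backend::step(prev, curr, next, coeff);
        prev.swap(curr);  // prev now holds the old current field
        curr.swap(next);  // curr now holds the new field; next is scratch
    }
    return curr;
}
```

A call such as `propagate<HostSerial>(prev, curr, 100, 0.25)` selects the backend at compile time. In this sketch the end points of all input fields are assumed to be zero, because the buffers are rotated and the stencil never updates them.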
Notes
- 1.
- 2. In C++ this is called explicit (full) template specialization.
- 3. Preprocessor directives such as #ifdef, #else, and #endif.
- 4. https://github.com/ahadji05/pp-template/tree/main/include/algorithms.
- 5. The add_source kernel has no parallelism to exploit and is therefore neglected.
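The two mechanisms named in notes 2 and 3 can be combined as in the following minimal sketch, which is an illustration under stated assumptions and not code from the paper's repository. The tag types `TagSerial` and `TagOpenMP`, the kernel struct `ScaleField`, and the wrapper `scale_field` are hypothetical names. The only real identifier relied on is the `_OPENMP` macro, which compilers define when OpenMP support is enabled.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical execution-space tags.
struct TagSerial {};
struct TagOpenMP {};

// Primary template: declared but not defined, so using a tag without a
// specialization fails at compile time.
template <class Tag>
struct ScaleField;

// Explicit (full) specialization for the serial host backend.
template <>
struct ScaleField<TagSerial> {
    static void apply(std::vector<double>& x, double a) {
        for (std::size_t i = 0; i < x.size(); ++i) x[i] *= a;
    }
};

// This specialization is compiled only when the build enables OpenMP.
#ifdef _OPENMP
template <>
struct ScaleField<TagOpenMP> {
    static void apply(std::vector<double>& x, double a) {
        const long n = static_cast<long>(x.size());
        #pragma omp parallel for
        for (long i = 0; i < n; ++i) x[static_cast<std::size_t>(i)] *= a;
    }
};
using DefaultTag = TagOpenMP;
#else
using DefaultTag = TagSerial;
#endif

// Generic entry point: the backend is chosen by the tag at compile time.
template <class Tag = DefaultTag>
void scale_field(std::vector<double>& x, double a) {
    ScaleField<Tag>::apply(x, a);
}
```

Calling `scale_field(v, 0.5)` uses whichever backend the build enabled, while `scale_field<TagSerial>(v, 0.5)` forces the serial one. Leaving the primary template undefined is a design choice in this sketch: a request for a backend that was not compiled in is reported by the compiler rather than at run time.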
Acknowledgment
This research is funded by the Delphi Consortium at Delft University of Technology and the EPSRC project ASiMoV (EP/S005072/1). The experiments were carried out on the Cyclone HPC system at the Cyprus Institute and on the Isambard 2 UK National Tier-2 HPC Service (http://gw4.ac.uk/isambard), operated by GW4 and the UK Met Office and funded by EPSRC (EP/T022078/1).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hadjigeorgiou, A., Stylianou, C., Weiland, M., Verschuur, D.J., Finkenrath, J. (2024). An Approach to Performance Portability Through Generic Programming. In: Zeinalipour, D., et al. Euro-Par 2023: Parallel Processing Workshops. Euro-Par 2023. Lecture Notes in Computer Science, vol 14351. Springer, Cham. https://doi.org/10.1007/978-3-031-50684-0_22
DOI: https://doi.org/10.1007/978-3-031-50684-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50683-3
Online ISBN: 978-3-031-50684-0
eBook Packages: Computer Science, Computer Science (R0)