
Performance Porting the ExaStar Multi-Physics App Thornado On Heterogeneous Systems - A Fortran-OpenMP Code-Base Evaluation

  • Conference paper
Advancing OpenMP for Future Accelerators (IWOMP 2024)

Abstract

The heterogeneity of HPC systems requires efficient host-to-device porting of compute kernels and high-bandwidth data communication. This capability varies from one system to another depending on system architecture and environment. New vendors such as AMD and Intel are entering the GPU field, creating a software portability challenge. Major scientific simulation code bases rely on Fortran and require portable programming models for performance porting to HPC systems with high software productivity. Even though OpenMP target offloading features support portability, most Fortran-OpenMP code bases face significant challenges. Hence, in this work, we evaluate a) the computing capability of heterogeneous systems for Fortran-OpenMP-based multi-physics code bases, and b) the performance portability of the astrophysical supernova simulation code Flash-X on heterogeneous systems. For this study, three HPC systems were chosen: Sunspot, a test-bed for the Intel PVC GPU-based supercomputer Aurora, and Polaris, an NVIDIA A100 GPU-accelerated system, both located at the Argonne Leadership Computing Facility (ALCF), and the AMD MI250-based Frontier at the Oak Ridge Leadership Computing Facility (OLCF). We discuss challenges and solutions for performance porting the compute-intensive module Thornado, which can be incorporated as an external library in Flash-X to model neutrino transport. We show that the performance of test apps improved by approximately 24× using the relevant optimization strategies together with compiler and system updates. Further, this study helped improve the Intel oneAPI OpenMP compiler by providing bug reports and reproducers internally.
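
The abstract refers to OpenMP target offloading of Fortran compute kernels. As a point of reference, the sketch below shows the general Fortran-OpenMP offloading idiom (device data mapping plus a collapsed teams/parallel loop nest). It is a minimal hypothetical example, not code from Thornado or Flash-X; the array names, sizes, and kernel body are placeholders chosen only to illustrate the directives involved.

```fortran
! Minimal sketch (not from the paper): offloading a Fortran loop nest to a GPU
! with OpenMP target directives. Array names, sizes, and the kernel body are
! hypothetical placeholders.
program offload_sketch
  implicit none
  integer, parameter :: nE = 16, nX = 1024   ! placeholder extents
  real(8) :: u(nE, nX), f(nE, nX)
  integer :: iE, iX

  u = 1.0d0

  ! Map inputs to the device and results back, then run the collapsed
  ! loop nest across teams and threads on the accelerator.
  !$omp target data map(to: u) map(from: f)
  !$omp target teams distribute parallel do collapse(2)
  do iX = 1, nX
    do iE = 1, nE
      f(iE, iX) = 0.5d0 * u(iE, iX)**2   ! placeholder kernel body
    end do
  end do
  !$omp end target teams distribute parallel do
  !$omp end target data

  print *, 'f(1,1) =', f(1, 1)
end program offload_sketch
```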

Notes

  1. https://github.com/endeve/thornado/tree/develop
  2. https://code.ornl.gov/astro/weaklib-tables.git

Acknowledgments

This work was supported by the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357, and by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of two U.S. Department of Energy organizations (Office of Science and the National Nuclear Security Administration). This research also used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We also gratefully acknowledge the computing resources provided and operated by the Joint Laboratory for System Evaluation (JLSE) at Argonne National Laboratory.

We extend our gratitude to Shaoping Quan, Dahai Guo, and William Dieter from Intel for their invaluable help and guidance in successfully completing this work. We would also like to thank Colleen Bertoni and Thomas Applencourt from Argonne for their fruitful discussions and timely guidance.

Author information

Corresponding author

Correspondence to Mathialakan Thavappiragasam.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Thavappiragasam, M., Harris, J.A., Endeve, E., Videau, B. (2024). Performance Porting the ExaStar Multi-Physics App Thornado On Heterogeneous Systems - A Fortran-OpenMP Code-Base Evaluation. In: Espinosa, A., Klemm, M., de Supinski, B.R., Cytowski, M., Klinkenberg, J. (eds) Advancing OpenMP for Future Accelerators. IWOMP 2024. Lecture Notes in Computer Science, vol 15195. Springer, Cham. https://doi.org/10.1007/978-3-031-72567-8_2

  • DOI: https://doi.org/10.1007/978-3-031-72567-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72566-1

  • Online ISBN: 978-3-031-72567-8

  • eBook Packages: Computer Science, Computer Science (R0)
