Abstract
To accelerate multiphysics applications, the use of FPGAs in addition to GPUs has been emerging. Multiphysics applications are simulations involving multiple physical models and multiple simultaneous physical phenomena. Such simulations contain operations with different performance characteristics, which makes it difficult to accelerate them using GPUs alone. Therefore, we aim to improve the overall performance of the application by using FPGAs to accelerate the operations whose characteristics lower GPU efficiency. However, the application is currently implemented through multilingual programming: the computation kernel running on the GPU is written in CUDA, and the computation kernel running on the FPGA is written in OpenCL. This approach imposes a heavy burden on programmers. We are therefore developing a programming environment that enables the use of both accelerators in a GPU–FPGA equipped high-performance computing (HPC) cluster system with OpenACC. To this end, we port the entire code from the CUDA–OpenCL mixture to OpenACC alone. On this basis, this study quantitatively compares the performance of the OpenACC GPU implementation with that of the CUDA implementation for ARGOT, a multiphysics radiative transfer simulation code for fundamental astrophysics. We observe that the OpenACC implementation achieves performance and scalability comparable to those of the CUDA implementation on the Cygnus supercomputer equipped with NVIDIA V100 GPUs.
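To illustrate what porting a CUDA kernel to directive-based OpenACC involves, the following is a minimal sketch on a hypothetical SAXPY loop. It is not a kernel from ARGOT. The CUDA form is kept in a comment for comparison, and the function name `saxpy_acc` is illustrative. When compiled with an OpenACC-enabled compiler (for example, `nvc -acc` from the NVIDIA HPC SDK, or GCC with `-fopenacc`), the loop is offloaded to the GPU. Compilers without OpenACC enabled typically ignore the directive and run the loop serially on the host.

```c
#include <assert.h>

/* CUDA version, shown only for comparison (not compiled here):
 *
 *   __global__ void saxpy(int n, float a, const float *x, float *y) {
 *       int i = blockIdx.x * blockDim.x + threadIdx.x;  // explicit thread index
 *       if (i < n) y[i] = a * x[i] + y[i];
 *   }
 *   // launched as: saxpy<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);
 */

/* Hypothetical OpenACC port: the loop body is unchanged, and the directive
 * asks the compiler to parallelize it and manage host/device transfers. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0);
    /* copyin: x is only read on the device; copy: y is read and written back */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The design point this sketch shows is that the OpenACC version keeps a single sequential-looking loop, with thread indexing, kernel launch configuration, and data movement expressed through the directive and its data clauses instead of explicit CUDA code.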
Acknowledgements
This work used the computational resources of TSUBAME3.0 provided by the Tokyo Institute of Technology through the HPCI System Research Project (Project ID: hp190099). This work was supported by JSPS KAKENHI (Grant Number 21H04869). Use of Cygnus was supported by the MCRP 2022 Program of the Center for Computational Sciences, University of Tsukuba. We also thank Dr. Naruhiko Tan of NVIDIA for his advice on OpenACC optimization.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kobayashi, R. et al. (2023). Accelerating Radiative Transfer Simulation on NVIDIA GPUs with OpenACC. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_27
DOI: https://doi.org/10.1007/978-3-031-29927-8_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29926-1
Online ISBN: 978-3-031-29927-8
eBook Packages: Computer Science, Computer Science (R0)