Abstract
Atmospheric radiation is one of the most important physical processes in the atmosphere, and its high computational cost severely restricts numerical simulation with atmospheric general circulation models. An efficient radiation parameterization scheme is therefore needed. Because of the computing power of GPUs, more and more numerical models are being ported to them. The CUDA C version (CC-RRTMG_SW) of the shortwave radiation scheme (RRTMG_SW) of the rapid radiative transfer model for general circulation models (RRTMG) already runs on GPUs, but its computing efficiency remains modest, and the performance potential of the GPU has not been fully exploited. This paper is dedicated to optimizing CC-RRTMG_SW and exploring its maximum computing performance on GPUs. First, a three-dimensional acceleration algorithm for CC-RRTMG_SW is proposed. Then, several optimization methods are studied, including decoupling data dependencies, optimizing memory access, and optimizing I/O. Finally, an optimized version of CC-RRTMG_SW, named CC-RRTMG_SW++, is developed. The experimental results demonstrate that the proposed acceleration algorithm and performance optimization methods are effective. CC-RRTMG_SW++ achieved good speedups on different GPU architectures, including the NVIDIA Tesla K20, K40, and V100. Compared with RRTMG_SW running on a single Intel Xeon E5-2680 v2 CPU core, CC-RRTMG_SW++ obtained a speedup of 99.09\(\times\) on a single V100 GPU, excluding I/O transfer. Compared with CC-RRTMG_SW, the computing efficiency of CC-RRTMG_SW++ increased by 174.46%.










Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 41931183, in part by the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (EarthLab), and in part by the GHFUND A under Grant ghfund202107013661.
Code availability
The code generated and analyzed during this study is available in the GitHub repository: https://github.com/guirenbenxin/Heterogeneous-RRTMG_SW.
Cite this article
Li, F., Wang, Y., Wang, Z. et al. CC-RRTMG_SW++: Further optimizing a shortwave radiative transfer scheme on GPU. J Supercomput 78, 17378–17402 (2022). https://doi.org/10.1007/s11227-022-04566-5
DOI: https://doi.org/10.1007/s11227-022-04566-5