CC-RRTMG_SW++: Further optimizing a shortwave radiative transfer scheme on GPU

The Journal of Supercomputing

Abstract

Atmospheric radiation is one of the most important atmospheric physics processes, and its high computational cost severely restricts numerical simulation with atmospheric general circulation models. An efficient radiation parameterization scheme is therefore needed. Because of the computing power of GPUs, more and more numerical models are being ported to them. The CUDA C version (CC-RRTMG_SW) of the shortwave radiation scheme (RRTMG_SW) of the rapid radiative transfer model for general circulation models (RRTMG) already runs on GPUs, but its computing efficiency is still limited, and the performance potential of the GPU has not been fully realized. This paper optimizes CC-RRTMG_SW and explores its maximum computing performance on GPUs. First, a three-dimensional acceleration algorithm for CC-RRTMG_SW is proposed. Then, several optimization methods are studied, such as decoupling data dependencies, optimizing memory access, and optimizing I/O. Finally, the optimized version of CC-RRTMG_SW, named CC-RRTMG_SW++, is developed. The experimental results demonstrate that the proposed acceleration algorithm and performance optimization methods are effective. CC-RRTMG_SW++ achieves good speedups on different GPU architectures, such as NVIDIA Tesla K20, K40, and V100. Compared with RRTMG_SW running on a single Intel Xeon E5-2680 v2 CPU core, CC-RRTMG_SW++ obtains a speedup of 99.09× on a single V100 GPU when I/O transfer is excluded. Compared with CC-RRTMG_SW, the computing efficiency of CC-RRTMG_SW++ increases by 174.46%.
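To make the idea of a three-dimensional decomposition concrete, the following is a minimal sketch of the index mapping such a scheme could rely on. It is not the authors' implementation. It assumes that the three dimensions are horizontal column, vertical layer, and g-point, which the abstract does not spell out, and all names (`Domain`, `Index3`, `flatten`, `unflatten`, `apply_per_element`) are hypothetical. It is written as plain C++ so that it runs without a GPU: the serial loop stands in for a kernel launch in which each thread would handle one `tid`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical problem sizes; the abstract does not state the real ones.
struct Domain {
    std::size_t ncol;  // horizontal columns (assumed dimension 1)
    std::size_t nlay;  // vertical layers    (assumed dimension 2)
    std::size_t ngpt;  // g-points           (assumed dimension 3)
};

struct Index3 {
    std::size_t col, lay, gpt;
};

// Map a 3D index to a flat offset with the column index varying fastest.
// On a GPU, consecutive threads would then touch consecutive addresses,
// which is the access pattern that allows coalesced global-memory loads.
inline std::size_t flatten(const Domain& d, Index3 i) {
    assert(i.col < d.ncol && i.lay < d.nlay && i.gpt < d.ngpt);
    return i.col + d.ncol * (i.lay + d.nlay * i.gpt);
}

// Inverse mapping: recover (col, lay, gpt) from a flat thread id.
inline Index3 unflatten(const Domain& d, std::size_t tid) {
    assert(tid < d.ncol * d.nlay * d.ngpt);
    Index3 i;
    i.col = tid % d.ncol;
    i.lay = (tid / d.ncol) % d.nlay;
    i.gpt = tid / (d.ncol * d.nlay);
    return i;
}

// Serial stand-in for a kernel: every "thread" tid processes one
// (col, lay, gpt) element independently. The scaling is a placeholder
// for per-element work; it is not a radiative transfer computation.
inline std::vector<double> apply_per_element(const Domain& d,
                                             const std::vector<double>& in,
                                             double factor) {
    assert(in.size() == d.ncol * d.nlay * d.ngpt);
    std::vector<double> out(in.size());
    for (std::size_t tid = 0; tid < in.size(); ++tid) {
        const Index3 i = unflatten(d, tid);
        out[flatten(d, i)] = factor * in[tid];
    }
    return out;
}
```

The sketch only covers work that is independent per element. Computations that carry a dependency along the vertical dimension cannot be split this way directly, which is presumably why the abstract lists decoupling data dependencies as a separate optimization step.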

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 41931183, in part by the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (EarthLab), and in part by the GHFUND A under Grant ghfund202107013661.

Author information

Corresponding authors

Correspondence to Yuzhu Wang or He Zhang.

Ethics declarations

Code availability

The code generated and analyzed during this study is available in the GitHub repository: https://github.com/guirenbenxin/Heterogeneous-RRTMG_SW.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, F., Wang, Y., Wang, Z. et al. CC-RRTMG_SW++: Further optimizing a shortwave radiative transfer scheme on GPU. J Supercomput 78, 17378–17402 (2022). https://doi.org/10.1007/s11227-022-04566-5

  • DOI: https://doi.org/10.1007/s11227-022-04566-5
