Abstract
Atmospheric radiation is a physical process that plays an important role in climate simulations. The rapid radiative transfer model for general circulation models (RRTMG) is a radiative transfer scheme widely used in weather forecasting and climate simulation systems, but its high computational cost severely limits system performance. Improving the computational performance of radiative transfer models therefore has significant scientific and practical value. Many radiative transfer models have been accelerated on GPUs, yet few of them exploit the CPU/GPU cluster resources of heterogeneous high-performance computing platforms. In this paper, we demonstrate an approach for running a large-scale, computationally intensive longwave radiative transfer model on a GPU cluster. First, a CUDA-based acceleration algorithm for the RRTMG longwave radiation scheme (RRTMG_LW) on multiple GPUs is proposed. Then, a heterogeneous hybrid programming paradigm (MPI+CUDA) is presented and applied to the RRTMG_LW on a GPU cluster. The algorithm was implemented in CUDA Fortran, yielding a multi-GPU version of the RRTMG_LW, namely GPUs-RRTMG_LW. The experimental results demonstrate that the multi-GPU acceleration algorithm is valid, scalable, and highly efficient compared with a single GPU or CPU. Running the GPUs-RRTMG_LW on a K20 cluster achieved a \(77.78 \times\) speedup over a single Intel Xeon E5-2680 CPU core.
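As a rough illustration of the MPI+CUDA idea, the sketch below shows one common way work could be divided in such a hybrid scheme: atmospheric columns are split into near-even contiguous blocks, one per MPI rank, and each rank is bound to a GPU on its node. This is an assumption for illustration only; the abstract does not state the decomposition actually used, and the names `split_columns` and `device_for_rank` are hypothetical. The sketch is plain Python and calls neither MPI nor CUDA.

```python
def split_columns(ncol, nranks):
    """Return a (start, count) pair per rank for a near-even block split.

    Assumption: columns are independent, so any contiguous split is valid.
    """
    base, extra = divmod(ncol, nranks)
    parts = []
    start = 0
    for rank in range(nranks):
        # The first `extra` ranks take one additional column each.
        count = base + (1 if rank < extra else 0)
        parts.append((start, count))
        start += count
    return parts


def device_for_rank(rank, gpus_per_node):
    """Map a rank to a local GPU index (round-robin; hypothetical policy)."""
    return rank % gpus_per_node


if __name__ == "__main__":
    # Example: 10 columns over 4 ranks, 2 GPUs per node.
    for rank, (start, count) in enumerate(split_columns(10, 4)):
        print(rank, start, count, device_for_rank(rank, 2))
```

In a real hybrid program, each rank would take its own `(start, count)` block, copy those columns to its assigned device, and launch the radiation kernels there.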
Acknowledgements
We thank Prof. Minghua Zhang for insightful suggestions on algorithm design. This work was supported in part by the National Key Research and Development Program of China under Grant 2016YFB0200800, in part by the National Natural Science Foundation of China under Grants 61602477 and 41931183, and in part by the National Key Scientific and Technological Infrastructure project “Earth System Science Numerical Simulator Facility” (EarthLab).
About this article
Cite this article
Wang, Y., Guo, M., Zhao, Y. et al. GPUs-RRTMG_LW: high-efficient and scalable computing for a longwave radiative transfer model on multiple GPUs. J Supercomput 77, 4698–4717 (2021). https://doi.org/10.1007/s11227-020-03451-3
DOI: https://doi.org/10.1007/s11227-020-03451-3