Development of a hybrid parallel MCV-based high-order global shallow-water model

The Journal of Supercomputing

Abstract

The use of high-order spatial discretizations is an important trend in the development of global atmospheric models. As a competitive choice, the multi-moment constrained finite-volume (MCV) method can achieve high accuracy while maintaining parallel scalability similar to that of classical finite-volume methods. In this work, we present the development of a hybrid parallel MCV-based global shallow-water model on the cubed-sphere grid. Starting from a sequential code, we parallelize the model at both the process level and the thread level. To enable process-level parallelism, we first decompose each of the six patches of the cubed sphere with the same 2-D partition and then employ a conflict-free pipe-flow communication scheme to overlap the halo exchange with computation. To further exploit the heterogeneous computing capacity of an Intel Xeon Phi accelerated supercomputer, we propose a guided panel-based inner–outer partition that distributes the workload among the CPUs and the coprocessors. In addition, thread-level parallelism, together with various optimizations, is applied on both the multi-core CPU and the many-core accelerator. Numerical experiments are carried out to validate the correctness of the optimized parallel code and to examine its parallel performance. Test results show that both the CPU-only and the hybrid codes scale well to hundreds of processes in terms of both strong and weak scaling. In particular, the hybrid code achieves a speedup of \(2.56\times \) over the CPU-only version. In the largest run, on a \(9216\,\times \,9216\,\times \,6\) mesh (1.5 billion unknowns), the hybrid code sustains an aggregate performance of 26.5 Tflops with 486 processes (33,534 cores).
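The process-level decomposition described in the abstract can be illustrated with a small sketch. It applies one and the same 2-D block partition to each of the six cubed-sphere patches and checks the problem-size arithmetic quoted for the largest run. Everything beyond the figures in the abstract is an assumption made for illustration: the names `Block`, `split_1d`, and `partition_cubed_sphere` are hypothetical, the 9 × 9 layout per patch is only one layout that is consistent with 486 processes, and the count of three unknowns per mesh point is a guess that happens to reproduce the quoted 1.5 billion. It is not the authors' implementation and it does not model the halo exchange or the CPU/coprocessor split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Index range [i0, i1) x [j0, j1) owned by one process on one patch."""
    patch: int
    i0: int
    i1: int
    j0: int
    j1: int

    def cells(self) -> int:
        return (self.i1 - self.i0) * (self.j1 - self.j0)


def split_1d(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous pieces whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds = []
    start = 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def partition_cubed_sphere(n: int, px: int, py: int) -> list[Block]:
    """Apply the same px x py block partition to all six n x n patches."""
    xs = split_1d(n, px)
    ys = split_1d(n, py)
    return [
        Block(patch, i0, i1, j0, j1)
        for patch in range(6)
        for (i0, i1) in xs
        for (j0, j1) in ys
    ]


# Assumed layout: 9 x 9 processes per patch, giving 6 * 81 = 486 processes.
blocks = partition_cubed_sphere(9216, 9, 9)
total_cells = sum(b.cells() for b in blocks)

# Assumption: three prognostic unknowns per mesh point.
FIELDS_PER_CELL = 3
total_unknowns = total_cells * FIELDS_PER_CELL
```

Under these assumptions every process owns a 1024 × 1024 block, and the total comes to roughly 1.53 billion unknowns, in line with the figure given in the abstract.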


Acknowledgements

This work was supported in part by the Natural Science Foundation of China (Grant# 91530323), the National Key R&D Plan of China (Grant# 2016YFB0200600), the National Key Technology R&D Program of China (Grant# 2012BAC22B01), and the Chinese Academy of Sciences (Grant# QYZDB-SSWSYS006).

Author information

Corresponding author

Correspondence to Chao Yang.

About this article

Cite this article

Zhang, P., Yang, C., Chen, C. et al. Development of a hybrid parallel MCV-based high-order global shallow-water model. J Supercomput 73, 2823–2842 (2017). https://doi.org/10.1007/s11227-017-1958-1

  • DOI: https://doi.org/10.1007/s11227-017-1958-1
