
Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming

The Journal of Supercomputing

Abstract

To address the shortcomings of the MPI/OpenMP hybrid parallel model that is widely employed in massively parallel CFD solvers, this paper establishes a set of MPI/OpenMP coarse-grained hybrid communication mapping rules for structured meshes, defining the mapping relationships among the geometric topology, the boundary communication topology, the process and thread group topology, and the communication buffers. Building on nonblocking asynchronous message communication and fine-grained mutex synchronization with a double-buffer mechanism for shared-memory communication, an MPI/OpenMP coarse-grained hybrid parallel CFD solver framework for structured meshes is designed. Experimental results show that the framework delivers high parallel performance and excellent scalability.
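
As a concrete illustration of the pattern the abstract describes, the following is a minimal C sketch, not the authors' implementation: it assumes a periodic 1-D process decomposition with a few mesh blocks per process and one OpenMP thread per block, posts nonblocking MPI halo messages, performs the intra-process exchange through a double-buffered shared array (halo_buf) guarded by OpenMP locks, and only then completes the MPI requests. All identifiers and sizes are hypothetical; the paper's actual mapping rules and buffer layout are not reproduced here.

```c
/* Minimal sketch (not the authors' framework): each MPI process owns a strip
 * of a 1-D structured mesh split into NBLK blocks (one OpenMP thread each).
 * Inter-process halos use nonblocking MPI; intra-process halos go through a
 * double-buffered shared array guarded by OpenMP locks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NBLK  4        /* mesh blocks (= OpenMP threads) per MPI process */
#define NCELL 64       /* interior cells per block                       */
#define HALO  1        /* halo width                                     */
#define NBUF  2        /* double-buffer depth                            */

static double     halo_buf[NBUF][NBLK][2];  /* shared intra-node halo slots */
static omp_lock_t halo_lock[NBLK];          /* one lock per block           */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int b = 0; b < NBLK; ++b) omp_init_lock(&halo_lock[b]);

    double u[NBLK][NCELL + 2 * HALO];            /* block field + halo cells */
    for (int b = 0; b < NBLK; ++b)
        for (int i = 0; i < NCELL + 2 * HALO; ++i)
            u[b][i] = rank * NBLK + b;           /* dummy initial data       */

    int left  = (rank - 1 + size) % size;        /* periodic neighbours      */
    int right = (rank + 1) % size;

    for (int step = 0; step < 4; ++step) {
        int cur = step % NBUF;                   /* double-buffer slot       */

        /* Inter-process halo exchange: post nonblocking messages first so
         * they progress while the intra-node exchange runs (FUNNELED: all
         * MPI calls stay on the master thread). */
        MPI_Request req[4];
        double send_l = u[0][HALO], send_r = u[NBLK - 1][NCELL];
        double recv_l, recv_r;
        MPI_Irecv(&recv_l, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&recv_r, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&send_l, 1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&send_r, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* Intra-process halo exchange through shared memory. */
        #pragma omp parallel num_threads(NBLK)
        {
            int b = omp_get_thread_num();

            /* Publish this block's boundary faces into the current buffer. */
            omp_set_lock(&halo_lock[b]);
            halo_buf[cur][b][0] = u[b][HALO];    /* left face  */
            halo_buf[cur][b][1] = u[b][NCELL];   /* right face */
            omp_unset_lock(&halo_lock[b]);

            #pragma omp barrier                  /* all faces published      */

            /* Read the neighbouring blocks' published faces. */
            if (b > 0) {
                omp_set_lock(&halo_lock[b - 1]);
                u[b][0] = halo_buf[cur][b - 1][1];
                omp_unset_lock(&halo_lock[b - 1]);
            }
            if (b < NBLK - 1) {
                omp_set_lock(&halo_lock[b + 1]);
                u[b][NCELL + HALO] = halo_buf[cur][b + 1][0];
                omp_unset_lock(&halo_lock[b + 1]);
            }
        }

        /* Complete the inter-process traffic and fill the process halos. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        u[0][0]                   = recv_l;
        u[NBLK - 1][NCELL + HALO] = recv_r;
    }

    if (rank == 0) printf("halo exchange sketch finished on %d ranks\n", size);
    for (int b = 0; b < NBLK; ++b) omp_destroy_lock(&halo_lock[b]);
    MPI_Finalize();
    return 0;
}
```

Compile with, for example, mpicc -fopenmp halo_sketch.c and launch with mpirun. The design point the sketch tries to capture is the ordering: inter-process messages are posted first so they can progress asynchronously while the shared-memory exchange between threads runs, and the double buffer lets successive steps publish boundary faces into alternating slots.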





Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200902 and the NSFC project under Grant No. 61572394.

Author information


Corresponding author

Correspondence to Xingjun Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

He, F., Dong, X., Zou, N. et al. Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming. J Supercomput 76, 2815–2841 (2020). https://doi.org/10.1007/s11227-019-03063-6

