
Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming

The Journal of Supercomputing

Abstract

To address the shortcomings of the MPI/OpenMP hybrid parallel model that is widely employed in massively parallel CFD solvers, this paper establishes a set of MPI/OpenMP coarse-grained hybrid communication mapping rules for structured meshes, defining the mapping relationships among the geometric topology, the boundary communication topology, the process and thread group topology, and the communication buffers. Building on nonblocking asynchronous message communication and fine-grained mutex synchronization with a double-buffer mechanism for shared-memory communication, an MPI/OpenMP coarse-grained hybrid parallel CFD solver framework for structured meshes is designed. Experimental results show that the framework delivers high parallel performance and excellent scalability.
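
As a concrete illustration of the pattern the abstract describes, the following is a minimal C sketch, not the authors' implementation: it assumes a periodic 1-D process decomposition with a few mesh blocks per process and one OpenMP thread per block, posts nonblocking MPI halo messages, performs the intra-process exchange through a double-buffered shared array (halo_buf) guarded by OpenMP locks, and only then completes the MPI requests. All identifiers and sizes are hypothetical; the paper's actual mapping rules and buffer layout are not reproduced here.

```c
/* Minimal sketch (not the authors' framework): each MPI process owns a strip
 * of a 1-D structured mesh split into NBLK blocks (one OpenMP thread each).
 * Inter-process halos use nonblocking MPI; intra-process halos go through a
 * double-buffered shared array guarded by OpenMP locks. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define NBLK  4        /* mesh blocks (= OpenMP threads) per MPI process */
#define NCELL 64       /* interior cells per block                       */
#define HALO  1        /* halo width                                     */
#define NBUF  2        /* double-buffer depth                            */

static double     halo_buf[NBUF][NBLK][2];  /* shared intra-node halo slots */
static omp_lock_t halo_lock[NBLK];          /* one lock per block           */

int main(int argc, char **argv)
{
    int provided, rank, size;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    for (int b = 0; b < NBLK; ++b) omp_init_lock(&halo_lock[b]);

    double u[NBLK][NCELL + 2 * HALO];            /* block field + halo cells */
    for (int b = 0; b < NBLK; ++b)
        for (int i = 0; i < NCELL + 2 * HALO; ++i)
            u[b][i] = rank * NBLK + b;           /* dummy initial data       */

    int left  = (rank - 1 + size) % size;        /* periodic neighbours      */
    int right = (rank + 1) % size;

    for (int step = 0; step < 4; ++step) {
        int cur = step % NBUF;                   /* double-buffer slot       */

        /* Inter-process halo exchange: post nonblocking messages first so
         * they progress while the intra-node exchange runs (FUNNELED: all
         * MPI calls stay on the master thread). */
        MPI_Request req[4];
        double send_l = u[0][HALO], send_r = u[NBLK - 1][NCELL];
        double recv_l, recv_r;
        MPI_Irecv(&recv_l, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Irecv(&recv_r, 1, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &req[1]);
        MPI_Isend(&send_l, 1, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &req[2]);
        MPI_Isend(&send_r, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[3]);

        /* Intra-process halo exchange through shared memory. */
        #pragma omp parallel num_threads(NBLK)
        {
            int b = omp_get_thread_num();

            /* Publish this block's boundary faces into the current buffer. */
            omp_set_lock(&halo_lock[b]);
            halo_buf[cur][b][0] = u[b][HALO];    /* left face  */
            halo_buf[cur][b][1] = u[b][NCELL];   /* right face */
            omp_unset_lock(&halo_lock[b]);

            #pragma omp barrier                  /* all faces published      */

            /* Read the neighbouring blocks' published faces. */
            if (b > 0) {
                omp_set_lock(&halo_lock[b - 1]);
                u[b][0] = halo_buf[cur][b - 1][1];
                omp_unset_lock(&halo_lock[b - 1]);
            }
            if (b < NBLK - 1) {
                omp_set_lock(&halo_lock[b + 1]);
                u[b][NCELL + HALO] = halo_buf[cur][b + 1][0];
                omp_unset_lock(&halo_lock[b + 1]);
            }
        }

        /* Complete the inter-process traffic and fill the process halos. */
        MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
        u[0][0]                   = recv_l;
        u[NBLK - 1][NCELL + HALO] = recv_r;
    }

    if (rank == 0) printf("halo exchange sketch finished on %d ranks\n", size);
    for (int b = 0; b < NBLK; ++b) omp_destroy_lock(&halo_lock[b]);
    MPI_Finalize();
    return 0;
}
```

Compile with, for example, mpicc -fopenmp halo_sketch.c and launch with mpirun. The design point the sketch tries to capture is the ordering: inter-process messages are posted first so they can progress asynchronously while the shared-memory exchange between threads runs, and the double buffer lets successive steps publish boundary faces into alternating slots.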





Acknowledgements

This work is supported by the National Key Research and Development Program of China under Grant No. 2016YFB0200902 and the NSFC project under Grant No. 61572394.

Author information


Corresponding author

Correspondence to Xingjun Zhang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

He, F., Dong, X., Zou, N. et al. Structured mesh-oriented framework design and optimization for a coarse-grained parallel CFD solver based on hybrid MPI/OpenMP programming. J Supercomput 76, 2815–2841 (2020). https://doi.org/10.1007/s11227-019-03063-6

