Abstract
This paper reports on the open multi-processing (OpenMP) parallel implementation of a fully unstructured high-order discontinuous Galerkin (DG) solver for computational fluid dynamics and computational aeroacoustics applications. Although the OpenMP paradigm is confined to shared-memory systems, it has some advantages over the message passing interface (MPI) library, and exploiting it fully can improve the parallel efficiency of codes running on clusters of multi-core nodes. While a domain decomposition algorithm is almost unavoidable with MPI, the OpenMP shared-memory context offers several alternatives. Three strategies, here optimised for a DG solver, are presented and compared: the first is a customization of a colouring approach, the second mimics an MPI implementation in the OpenMP context, and the third lies roughly halfway between the previous two. Numerical tests on both inviscid and viscous test cases indicate that, thanks to the compactness of the DG discretization, all code versions perform quite satisfactorily. In particular, the domain decomposition algorithm reaches the highest parallel efficiency at low computational loads, while the colouring approach excels at larger computational loads and can be easily implemented within an existing MPI code. Moreover, colouring is well suited to hardware accelerators, support for which was introduced by the OpenMP 4.0 standard. Finally, the performance gain obtained with a hybrid MPI/OpenMP version of the DG code on high performance computing facilities is demonstrated.
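As an illustration of the general idea behind a colouring approach, the following is a minimal sketch and not the solver's actual implementation. It assumes a mesh described only by its interior faces, uses a simple greedy colouring, and scatters a scalar face flux to the two neighbouring elements. The names `Face`, `colour_faces` and `accumulate_face_flux` are hypothetical. Faces of the same colour never share an element, so the loop over one colour can be run in parallel without write conflicts.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical interior-face record: the two elements sharing the face. */
typedef struct { int left, right; } Face;

/* Greedy colouring: each face takes the smallest colour not yet used by a
   face touching either of its two elements. Returns the number of colours.
   Assumption: fewer than 64 colours suffice (one bit per colour). */
int colour_faces(const Face *faces, int n_faces, int n_elems, int *colour)
{
    /* used[e] is a bitmask of the colours already taken around element e */
    unsigned long long *used = calloc((size_t)n_elems, sizeof *used);
    int n_colours = 0;
    assert(used != NULL);
    for (int f = 0; f < n_faces; ++f) {
        unsigned long long taken = used[faces[f].left] | used[faces[f].right];
        int c = 0;
        while (c < 64 && ((taken >> c) & 1ULL)) ++c;
        assert(c < 64);
        colour[f] = c;
        used[faces[f].left]  |= 1ULL << c;
        used[faces[f].right] |= 1ULL << c;
        if (c + 1 > n_colours) n_colours = c + 1;
    }
    free(used);
    return n_colours;
}

/* Scatter face fluxes to element residuals, one colour at a time.
   Within a colour no two faces touch the same element, so the parallel
   loop needs no atomics or critical sections. For brevity every colour
   scans all faces; a production code would store faces sorted by colour. */
void accumulate_face_flux(const Face *faces, const int *colour, int n_faces,
                          int n_colours, const double *flux, double *residual)
{
    for (int c = 0; c < n_colours; ++c) {
        #pragma omp parallel for schedule(static)
        for (int f = 0; f < n_faces; ++f) {
            if (colour[f] != c) continue;
            residual[faces[f].left]  -= flux[f];
            residual[faces[f].right] += flux[f];
        }
    }
}
```

The sketch can be compiled with an OpenMP flag such as `-fopenmp` in GCC; without the flag the pragma is ignored and the loops run serially with the same result.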
Notes
The auto option was introduced in the OpenMP standard only with release 3.0; it delegates the scheduling decision to the compiler and/or the runtime system.
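For reference, a minimal sketch of how the auto option is requested in the schedule clause of a work-sharing loop. The function `total_element_cost` and its per-element cost array are hypothetical and serve only to give the loop a body.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: sums a per-element cost array. */
double total_element_cost(const double *cost, int n_elems)
{
    double total = 0.0;
    assert(cost != NULL || n_elems == 0);
    /* schedule(auto): the mapping of iterations to threads is left to the
       compiler and/or runtime system (available since OpenMP 3.0). */
    #pragma omp parallel for schedule(auto) reduction(+:total)
    for (int e = 0; e < n_elems; ++e) {
        total += cost[e];
    }
    return total;
}
```

Replacing `auto` with `static`, `dynamic` or `guided` selects an explicit policy instead, which makes it straightforward to compare scheduling choices on the same loop.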
Acknowledgements
We acknowledge Aki Mäkivirta from Genelec, Timo Lähivaara and Simo-Pekka Simonaho from the University of Eastern Finland, and Tomi Huttunen from Kuava Oy for providing us with the Genelec speaker geometry and details of their numerical and experimental tests. We also acknowledge the “Centro per le Tecnologie Didattiche e la Comunicazione”, University of Bergamo, for the resources provided by CINECA within the “Convenzione di Ateneo Università degli Studi di Bergamo”. Moreover, the CINECA award under the ISCRA initiative (grant numbers HP10CE90VW and HP10BMA1AP) is acknowledged for the availability of high performance computing resources and support.
Cite this article
Crivellini, A., Franciolini, M., Colombo, A. et al. OpenMP Parallelization Strategies for a Discontinuous Galerkin Solver. Int J Parallel Prog 47, 838–873 (2019). https://doi.org/10.1007/s10766-018-0589-3
DOI: https://doi.org/10.1007/s10766-018-0589-3