Parallel Computing

Volume 30, Issue 12, December 2004, Pages 1329-1343

Scalability of hybrid programming for a CFD code on the Earth Simulator

https://doi.org/10.1016/j.parco.2004.09.006

Abstract

The Earth Simulator (ES) is an SMP cluster system. There are two types of parallel programming models available on the ES. One is a flat programming model, in which a parallel program is implemented by MPI interfaces only, both within an SMP node and among nodes. The other is a hybrid programming model, in which a parallel program is written by using thread programming within an SMP node and MPI programming among nodes simultaneously. It is generally known that it is difficult to obtain the same high level of performance using the hybrid programming model as can be achieved with the flat programming model.

In this paper, we have evaluated the scalability of a code for direct numerical simulation (DNS) of the Navier–Stokes equations on the ES. For a DNS problem of size 256³ on 16 PNs of the ES, the hybrid programming model achieves a sustained performance of 346.9 Gflop/s, while the flat programming model achieves 296.4 Gflop/s. For small problems, however, the hybrid programming model is less efficient because of microtasking overhead. We show that the hybrid programming model has an advantage on the ES for larger problem sizes.

Introduction

There are various types of parallel computers currently available for high-performance computing. Among them, an SMP cluster system, i.e., a multiple-node system in which each node is a symmetric multi-processor (SMP), is a leading candidate owing to its broad applicability and relative ease of programming. Both thread programming and MPI (Message Passing Interface) [1], [2] programming can be used to parallelize a program within an SMP node. Therefore, two types of parallel programming models are generally available on SMP cluster systems. One is a flat programming model, in which a parallel program is implemented with MPI interfaces only, both within an SMP node and among nodes. The other is a hybrid programming model, in which a parallel program uses thread programming within an SMP node and MPI programming among nodes simultaneously.
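Concretely, the two models can be contrasted with a short sketch. The fragment below is a minimal MPI + OpenMP example, with OpenMP standing in for the ES microtasking layer; all names and sizes are illustrative and are not taken from the Trans7 code.

```c
/* A minimal sketch of the hybrid model: one MPI process per SMP node,
 * threads inside each node. Compile with, e.g., mpicc -fopenmp. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define N 1024  /* illustrative local problem size */

int main(int argc, char **argv)
{
    int provided;
    /* Only the main thread calls MPI, so FUNNELED support suffices. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    static double u[N];
    double local_sum = 0.0, global_sum = 0.0;

    /* Intra-node parallelism: threads share the node's memory. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < N; i++) {
        u[i] = (double)(rank * N + i);
        local_sum += u[i];
    }

    /* Inter-node parallelism: MPI communication among nodes. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE,
                  MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}
```

In the flat model, the OpenMP loop above would instead be split across eight MPI ranks placed on the same node. On the ES, the intra-node threading of the hybrid model is generated by the compiler's automatic parallelization (microtasking) rather than written by hand, as noted in the evaluation section below.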

Some studies have shown that it is difficult to obtain high performance with the hybrid programming model as compared to the flat programming model [3]. This is because threads are mapped dynamically onto the scalar processors of an SMP node at execution time, which makes it difficult to use cache memory efficiently and thereby degrades the performance of hybrid programming. In the flat programming model, on the other hand, each MPI process can be statically assigned to a scalar processor, which suits a cache-based memory architecture.

Development of the Earth Simulator (ES) began in 1997 with the aim of understanding global phenomena. The system was completed in February 2002 and is the most powerful massively parallel computer system in the world for large-scale scientific simulations such as high-resolution global atmospheric simulation, eddy-resolving simulation of the global ocean, solid earth science simulation, and so forth.

The ES is a distributed-memory parallel computer system consisting of 640 processor nodes [4], [5]. Each node is a shared-memory system, so the ES can be regarded as an SMP cluster system. Since each processor of the Earth Simulator is a vector processor with no cache memory, threads can be mapped dynamically without concern for cache effects, and the data throughput between processors and the memory system is much higher than in a cache-based scalar processor system. Therefore, each thread in a hybrid programming model is expected to run efficiently, and hybrid programming should be comparable to the flat model.

We have implemented a code for direct numerical simulation (DNS) of incompressible turbulence under both programming models and have evaluated the scalability of the two implementations on the ES. In this paper, the performance of the two programming models is described.

Section snippets

Architecture of the Earth Simulator

The ES consists of 640 processor nodes (referred to as either "PNs" or "nodes") connected by 640 × 640 single-stage crossbar switches (Fig. 1). Each PN has 8 vector-type arithmetic processors (APs), a 16-GB main memory unit (MMU), a remote access control unit (RCU), and an I/O processor (Fig. 2). The whole ES system has 5120 APs with 10 TB of main memory and a peak performance of 40 Tflop/s.
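As a quick sanity check, the aggregate figures above follow from the per-node numbers. The sketch below assumes the 8 Gflop/s per-AP peak given in published ES hardware descriptions; that figure is not stated in this snippet.

```c
/* Back-of-the-envelope check of the aggregate ES figures quoted above.
 * The 8 Gflop/s per-AP peak is an assumption taken from published ES
 * hardware descriptions, not from this text. */
#include <stdio.h>

int main(void)
{
    const int nodes = 640;                  /* processor nodes (PNs)   */
    const int aps_per_node = 8;             /* vector APs per node     */
    const double mem_per_node_gb = 16.0;    /* MMU size per node       */
    const double peak_per_ap_gflops = 8.0;  /* assumed per-AP peak     */

    int total_aps = nodes * aps_per_node;
    double total_mem_tb = nodes * mem_per_node_gb / 1024.0;
    double peak_tflops = total_aps * peak_per_ap_gflops / 1000.0;

    /* Prints: APs = 5120, memory = 10.0 TB, peak = 40.96 Tflop/s,
     * consistent with the ~40 Tflop/s quoted in the text. */
    printf("APs = %d, memory = %.1f TB, peak = %.2f Tflop/s\n",
           total_aps, total_mem_tb, peak_tflops);
    return 0;
}
```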

Each AP consists of a 4-way super-scalar unit (SU), a vector unit (VU), and a main memory access control unit on a single chip.

Numerical methods

We developed an incompressible turbulence simulation code named "Trans7" [8]. Trans7 uses the Fourier spectral method for the Navier–Stokes (NS) equations. We consider the flow of an incompressible fluid described by

∂u/∂t + (u · ∇)u = −∇p + ν∇²u + f

under a periodic boundary condition with period 2π, where u = (u1, u2, u3) is the velocity field, p is the pressure, and f is the external force, which satisfies ∇ · f = 0. The fluid density is assumed to be unity. The pressure term p can be eliminated by the
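The snippet above breaks off where the pressure elimination would be described. The derivation below is the standard one for Fourier pseudospectral DNS, sketched here for completeness rather than quoted from the paper.

```latex
% Standard pressure elimination in a Fourier pseudospectral method;
% this is the usual textbook derivation, not a quotation from the paper.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Taking the divergence of the momentum equation and using
$\nabla\cdot\mathbf{u}=0$ and $\nabla\cdot\mathbf{f}=0$ gives a Poisson
equation for the pressure,
\[
  \nabla^{2}p = -\nabla\cdot\bigl[(\mathbf{u}\cdot\nabla)\mathbf{u}\bigr].
\]
In Fourier space ($\nabla \to i\mathbf{k}$) this is solved algebraically,
so the equation for each Fourier mode $\hat{\mathbf{u}}(\mathbf{k},t)$
reads
\[
  \frac{\partial \hat{u}_{j}}{\partial t}
  = -\Bigl(\delta_{jm}-\frac{k_{j}k_{m}}{k^{2}}\Bigr)
    \widehat{\bigl[(\mathbf{u}\cdot\nabla)u_{m}\bigr]}
    - \nu k^{2}\hat{u}_{j} + \hat{f}_{j},
\]
where $P_{jm}=\delta_{jm}-k_{j}k_{m}/k^{2}$ projects onto
divergence-free fields and thereby removes the pressure term.
\end{document}
```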

Performance evaluation and scalability

Two implementations of Trans7, one using the flat programming model and one using the hybrid model, were evaluated on the ES while varying the problem size, the number of nodes, and the number of APs per node. Hereafter, Flat denotes the implementation under the flat programming model and Hybrid the implementation under the hybrid programming model. Flat was compiled without any compiler options for automatic parallelization, so that no executable module with microtasking was generated.
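For context, the sustained figures quoted in the abstract can be converted into rough parallel efficiencies. The estimate below assumes the 8 Gflop/s per-AP peak from published ES hardware descriptions, which is not stated in this snippet: 16 PNs × 8 APs/PN = 128 APs, so the peak is 128 × 8 = 1024 Gflop/s.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Rough efficiency estimate from the abstract's sustained figures,
% assuming a 1024 Gflop/s peak for 16 PNs (see lead-in above).
\[
  \eta_{\mathrm{hybrid}} = \frac{346.9}{1024} \approx 33.9\,\%,
  \qquad
  \eta_{\mathrm{flat}}   = \frac{296.4}{1024} \approx 28.9\,\%
\]
\end{document}
```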

Table 1 shows

Related work

There are many studies on the hybrid programming model with MPI and OpenMP. The relationship between available programming models and hardware architectures is discussed in [14]. Parallelization strategies for the hybrid model on SMP systems are studied in [15] and [16]. In [17], HPF programming based on hybrid MPI + OpenMP is compared with the flat programming model using MPI.

These studies argued that it is important to provide high communication bandwidth between processes on different nodes. The

Conclusion

We have evaluated the scalability of a direct numerical simulation (DNS) code for incompressible turbulence, implemented under two programming models on the ES.

Since MPI data-transfer performance differs between intra-node and inter-node communication on an SMP cluster, performance optimization under the flat programming model is generally difficult on such a system. On the other hand, the hybrid programming model on the ES is easy to apply by automatic parallelization of the

Acknowledgements

The authors would like to thank Dr. Tetsuya Sato, director-general of the Earth Simulator Center, for his warm encouragement throughout this study.

References (17)

  • R.B. Pelz, The parallel Fourier pseudospectral method, Journal of Computational Physics (1991)
  • MPI Forum, MPI: A message-passing interface standard, Proceedings of ACM International Conference on Supercomputing...
  • MPI Forum, MPI-2: Extensions to the message-passing interface, July 1997,...
  • F. Cappello, O. Richard, D. Etiemble, Investigating the performance of two programming models for clusters of SMP PCs,...
  • S. Habata et al., The Earth Simulator system, NEC Res. Dev. (2003)
  • J. Inasaki et al., Hardware technology of the Earth Simulator, NEC Res. Dev. (2003)
  • H. Uehara, M. Tamura, M. Yokokawa, An MPI benchmark program library and its application to the Earth Simulator,...
  • OpenMP Group,...
There are more references available in the full text version of this article.

Cited by (9)

  • DNS of hydrodynamically interacting droplets in turbulent clouds: Parallel implementation and scalability analysis using 2D domain decomposition

    2014, Computer Physics Communications
Citation Excerpt:

    Depending on whether the boundaries of the domain are moving or not, the decomposition could be dynamic or static, i.e. the domain covered by each processor could change with time or be fixed. Alfonsi [5] surveyed a compendium of parallel implementations of direct numerical simulations of turbulent flows, all based on dd using a variety of flow solvers including the pseudo spectral method [6–12], lattice Boltzmann method [13,14], finite element method [15], and finite difference method [16–19]. Due to different machine architectures in terms of memory (distributed memory, shared memory, or combination), the techniques for parallel implementation vary from OpenMP, MPI, or a hybrid approach.

  • GPGPU implementation of mixed spectral-finite difference computational code for the numerical integration of the three-dimensional time-dependent incompressible Navier-Stokes equations

    2014, Computers and Fluids
Citation Excerpt:

Dong and Karniadakis [23] presented a hybrid two-level parallel paradigm with MPI/OpenMP, in the context of high-order methods as implemented in the spectral/hp element framework, to take advantage of the hierarchical structures arising from Navier–Stokes problems. The scalability of hybrid programming in a Navier–Stokes solver on the Earth Simulator computer has been explored by Itakura et al. [24]. They evaluated the scalability of a computational code for the solution of the Navier–Stokes equations, finding that a hybrid programming model achieved a sustained performance of 346.9 GFlop/s, while a flat programming model achieved 296.4 GFlop/s.

  • Hybrid MPI+OpenMP parallelization of an FFT-based 3D Poisson solver with one periodic direction

    2011, Computers and Fluids
Citation Excerpt:

This hybrid two-level model of parallelism is a rather well-known approach for Computational Fluid Dynamics (CFD) applications. Algorithms with MPI + OpenMP and their comparison with the "flat" MPI-only approach can be found in Refs. [4–6], for instance. In general, the hybrid approach improves the performance, but not in a significant manner.

  • Parallel implementation of 3D global MHD simulations for Earth's magnetosphere

    2008, Computers and Mathematics with Applications
Citation Excerpt:

    Numerical simulation becomes more and more popular in experimenting with physical models, and computing plays an important role in investigating the interaction with experiments and analytical theory. Advances in high performance computing make detailed and accurate calculations of hydrodynamics and magnetohydrodynamics (MHD) feasible [1–8]. Specifically, 3D global MHD simulations can obtain the magnetospheric configuration and examine the response of the magnetosphere–ionosphere system to changing solar wind conditions.

  • DNS of Canonical Turbulence with up to 4096³ Grid Points

    2005, Parallel Computational Fluid Dynamics 2004: Multidisciplinary Applications