Computers & Geosciences

Volume 64, March 2014, Pages 96-103
Parallelisation study of a three-dimensional environmental flow model

https://doi.org/10.1016/j.cageo.2013.12.006

Highlights

  • Describes the porting of legacy Fortran 77 geosciences code to MPI parallel.

  • Novel load balancing scheme for incorporating highly irregular coastlines.

  • Discusses a number of methods for solving the semi-implicit algorithm in the parallel model.

  • Presents a simple but effective approach to port legacy code to modern machines.

Abstract

There are many simulation codes in the geosciences that are serial and cannot take advantage of the parallel computational resources commonly available today. One model important for our work in coastal ocean current modelling is EFDC, a Fortran 77 code configured for optimal deployment on vector computers. In order to take advantage of our cache-based, blade computing system we restructured EFDC from serial to parallel, thereby allowing us to run existing models more quickly, and to simulate larger and more detailed models that were previously impractical. Since the source code for EFDC is extensive and involves detailed computation, it is important to do such a port in a manner that limits changes to the files, while achieving the desired speedup. We describe a parallelisation strategy involving surgical changes to the source files to minimise error-prone alteration of the underlying computations, while allowing load-balanced domain decomposition for efficient execution on a commodity cluster. The conjugate gradient solver posed particular challenges, since its implicit, non-local communication hinders standard domain partitioning schemes; a number of techniques to address this in a feasible, computationally efficient manner are discussed. The parallel implementation demonstrates good scalability in combination with a novel domain partitioning scheme that specifically handles the mixed water/land regions commonly found in coastal simulations. The approach presented here represents a practical methodology to rejuvenate legacy code on a commodity blade cluster with reasonable effort; our solution has direct application to other similar codes in the geosciences.

Introduction

Numerical modelling has several advantages in the study of coastal ocean flow processes and events. Chief among these is the reduced cost and ease of deployment of a numerical model compared to field work or other methods of investigation. In addition, it is easier to configure a numerical model to investigate different flow conditions and scenarios. However, with the drive to model more realistic and detailed simulations, the computational demands of numerical solutions increase, due primarily to finer grid resolution and the simulation of a greater number of passive and active tracers. As a result, the practical ability of numerical models to solve real-world problems is constrained. Parallel computing allows faster execution and the ability to perform larger, more detailed simulations than is possible with serial code. The research reported here presents details on the porting of an existing coastal ocean model from serial code to parallel. This work is driven partly by a desire to model larger simulations in greater detail, but also to allow experimentation in more computationally demanding methods of data assimilation to improve the performance of real-time predictive modelling.

The model used for the study, Environmental Fluid Dynamics Code (EFDC), is a widely used, three-dimensional, finite difference, hydrodynamic model (Hamrick, 1992). The parallelisation adopts an efficient domain decomposition approach that theoretically permits deployment on a large cluster of machines; however, the fundamental objective of our work centres on real-time simulation capabilities of a given model on a commodity blade system, not optimal scalability on an arbitrarily large system. We were therefore guided by the following requirements considered key to the success of the parallelisation effort and subsequent operation on similar cluster systems:

  • 1.

    Limited changes to the large number of source files (approximately 50 000 lines of code), to avoid introducing computational errors.

  • 2.

    Binary regression of the parallel model versus serial simulations, to ensure the simulation runs in parallel exactly as it ran serially. Even a small deviation could mask the presence of an error in the port.

  • 3.

    Automation of the setup process for a parallel run, so that models originally set up for serial execution run properly on the parallel code. This involves automatic generation of source code specific to each parallel run of a model, avoiding manual effort and the introduction of errors.
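Requirement 2 amounts to a bit-for-bit comparison of serial and parallel output. A minimal sketch of such a binary regression check (ours, not code from the paper; the function names and file paths are hypothetical) hashes each output file and compares digests:

```python
import hashlib

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            digest.update(block)
    return digest.hexdigest()

def binary_regression(serial_output, parallel_output):
    """True iff the parallel run reproduced the serial output bit-for-bit."""
    return file_digest(serial_output) == file_digest(parallel_output)
```

Under this discipline, any mismatch, however small, is treated as evidence of an error in the port rather than as acceptable numerical drift.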

Several parallel versions of numerical ocean models have already been described in the literature, and they share computational methods with other codes in the geosciences. Wang et al. (1997) present elements of the widely used Parallel Ocean Program (POP), while Beare and Stevens (1997) build on the parallelisation of the Modular Ocean Model (MOM). However, the fundamental structure of these models makes them more suitable for global, ocean-scale problems, and they are not as well suited to the finer-scale resolution of coastal water phenomena. Parallelisation studies on the Princeton Ocean Model (POM) and the Regional Ocean Modelling System (ROMS) are discussed by Sannino et al. (2001) and Wang et al. (2005), respectively. A common feature of these models is the adoption of a split-explicit formulation of the equations governing vertically averaged transport. This representation permits easier parallelisation since global communication in the horizontal is eliminated. However, the maximum computational timestep is constrained by the Courant–Friedrichs–Lewy restriction (Ezer et al., 2002), as opposed to the greater numerical flexibility provided by implicit approaches (Jin et al., 2000). De Marchis et al. (2012) present details on a parallel code that adopts finite volume methods for the solution of the fundamental governing equations.
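The timestep restriction referred to above takes a standard textbook form (not reproduced from the paper): for an explicit free-surface scheme the timestep is limited by the fastest signal, the external gravity wave, so in one dimension, for grid spacing \(\Delta x\), water depth \(H\) and gravitational acceleration \(g\),

```latex
\Delta t \;\le\; C \,\frac{\Delta x}{\sqrt{gH}},
```

where \(C\) is a scheme- and dimension-dependent constant of order one. Semi-implicit treatment of the free surface removes this constraint, at the cost of a global elliptic solve each timestep.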

Among all branches of the geosciences, atmospheric modelling was one of the first to use parallel computers due to the intrinsic needs of both weather models that run in real-time, and climate models that operate in time scales of centuries. Coupling with ocean models similarly creates computational demands that benefit from parallel computation. Drake et al. (1993) present details on the parallel version of the NCAR Community Climate Model, CCM2. The parallelisation strategy decomposes the model domain into geographical patches with a message passing library conducting communication between segregated domains. Wolters and Cats (1993) describe the parallelisation strategy included in the HIRLAM model, a state-of-the-art system for weather forecasts up to 48 h, while Fournier et al. (2004) discuss aspects of deploying a spectral element atmospheric model in parallel. Michalakes et al. (1998) describe the parallelisation approach adopted for the widely used Weather Research and Forecast model.

In the following sections, the model is introduced along with a description of the computational schemes used to solve the governing equations. Section 3 discusses the parallelisation strategy adopted, with particular emphasis on load balancing of the computation within an irregular coastal waterbody. Section 4 presents the parallel speedup and performance of the amended model; a case study analysis focuses on Galway Bay, on the west coast of Ireland, to enable a realistic assessment of practical gain. The conclusions and a discussion are found in Section 5.

Section snippets

Model description

EFDC is a public domain, open source, modelling package for simulating three-dimensional flow, transport and biogeochemical processes in surface water systems. The model is specifically designed to simulate estuaries and subestuarine components (tributaries, marshes, wet and dry littoral margins), and has been applied to a wide range of environmental studies in the Chesapeake Bay region (Shen et al., 1999). It is presently being used by universities, research organisations, governmental

Parallelisation

EFDC is a Fortran 77 code originally designed for deployment on vector computers as opposed to distributed systems. The code was configured to achieve a degree of parallelisation on shared-memory processors through directives, specific to vectorised architectures, inserted in the source. However, the existing vectorisation code is of no benefit for parallelisation on distributed memory systems. For performance comparable to vector systems, scalable cache-based processors achieve speedup through
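The load-balancing idea for mixed water/land domains can be illustrated with a simple sketch (ours, not the paper's scheme or code): given a water/land mask, the grid is split into contiguous column strips so that each rank receives roughly the same number of active water cells, rather than the same number of columns:

```python
def partition_columns(mask, nproc):
    """Split grid columns into nproc contiguous strips holding roughly
    equal numbers of active (water) cells.

    mask  -- 2D list of 0/1 flags, mask[j][i] == 1 for a water cell
    nproc -- number of subdomains (e.g. MPI ranks)
    Returns a list of (first_col, last_col) tuples, one per rank.
    """
    ncols = len(mask[0])
    water_per_col = [sum(row[i] for row in mask) for i in range(ncols)]
    total = sum(water_per_col)

    bounds, start, acc = [], 0, 0
    for i, w in enumerate(water_per_col):
        acc += w
        # close the current strip once it holds its cumulative share,
        # keeping at least one column for each remaining rank
        if acc * nproc >= total * (len(bounds) + 1) and len(bounds) < nproc - 1:
            bounds.append((start, i))
            start = i + 1
    bounds.append((start, ncols - 1))
    return bounds
```

A strip that is mostly land carries little computation, so balancing on water cells rather than on grid columns avoids leaving ranks idle in simulations dominated by an irregular coastline.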

Performance

All performance tests were conducted on a local commodity blade cluster of five nodes. Each compute node had an X5690 hex-core processor, with a clock speed of 3.47 GHz and 12 MB of cache; the nodes are connected by a 1 Gbit/s Ethernet network. Parallel simulations were configured to deploy on the smallest number of blades possible, to minimise unnecessary network communication.
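Speedup and parallel efficiency on such a cluster are conventionally reported as S = T1/Tp and E = S/p. A minimal helper for these quantities (illustrative only, not from the paper), together with Amdahl's law as a rough upper bound given the non-parallelisable fraction of the runtime:

```python
def speedup_efficiency(t_serial, t_parallel, nproc):
    """Measured speedup S = T1/Tp and parallel efficiency E = S/p."""
    s = t_serial / t_parallel
    return s, s / nproc

def amdahl_speedup(serial_fraction, nproc):
    """Amdahl's-law upper bound on speedup for a code in which
    serial_fraction of the runtime cannot be parallelised."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / nproc)
```

For example, even a 10% serial fraction caps the achievable speedup below 10 regardless of processor count, which is one reason the port keeps the serial residue of the restructured code as small as possible.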

Experiments have been performed on a typical coastal region application, Galway Bay, to investigate the performance of

Discussion and conclusions

This study presents details on the parallelisation of a widely used environmental flow model. Preliminary results demonstrate that considerable speedup can be achieved on a distributed cluster by adopting a pragmatic approach to the parallelisation effort, with a load-balanced domain decomposition based on the underlying numerical algorithms. Note that this study presents details only on the hydrodynamic simulation itself, and not more computationally demanding aspects of a simulation.

References (29)

  • T. Ezer et al.

    Developments in terrain-following ocean models: intercomparisons of numerical aspects

    Ocean Model

    (2002)
  • S. Griffies et al.

    Developments in ocean climate modelling

    Ocean Model

    (2000)
  • Y. Wu et al.

    Parallelization of a hydrological model using the message passing interface

    Env. Model. Softw.

    (2013)
  • M. Beare et al.

    Optimisation of a parallel ocean general circulation model

  • A.F. Blumberg et al.

    A description of a three-dimensional coastal ocean circulation model

    Coast. Estuar. Sci.

    (1987)
  • M. De Marchis et al.

    Wind- and tide-induced currents in the Stagnone Lagoon (Sicily)

    Env. Fluid Mech.

    (2012)
  • A. Deane et al.

    Parallel Computational Fluid Dynamics 2005: Theory and Applications

    (2006)
  • Drake, J., Flanery, R., Walker, D., Worley, P., Foster, I., Michalakes, J., Stevens, R., Hack, J., Williamson, D.,...
  • Fiduccia, C.M., Mattheyses, R.M., 1982. A linear-time heuristic for improving network partitions. In: 19th Conference...
  • A. Fournier et al.

    The spectral element atmosphere model (SEAM): high-resolution parallel computation and localized resolution of regional dynamics

    Mon. Weather Rev.

    (2004)
  • S. Griffies et al.

    Tracer conservation with an explicit free surface method for z-coordinate ocean models

    Mon. Weather Rev.

    (2001)
  • Hageman, L.A., Young, D.M., 2012. Applied Iterative Methods. Dover...
  • Hamrick, J., 1992. A Three-Dimensional Environmental Fluid Dynamics Computer Code: Theoretical and Computational...
  • Hamrick, J.M., 1996. User's Manual for the Environmental Fluid Dynamics Computer Code. Technical Report. Virginia...