
Parallel Computing

Volume 28, Issue 1, January 2002, Pages 35-52

Applications
Distributed-memory concepts in the wave model WAVEWATCH III

https://doi.org/10.1016/S0167-8191(01)00130-2

Abstract

Parallel concepts for spectral wind-wave models are discussed, with a focus on the WAVEWATCH III model which runs in a routine operational mode at NOAA/NCEP. After a brief description of relevant aspects of wave models, basic parallelization concepts are discussed. It is argued that a method including data transposes is more suitable for this model than conventional domain decomposition techniques. Details of the implementation, including specific buffering techniques for the data to be communicated between processors, are discussed. Extensive timing results are presented for up to 450 processors on an IBM RS6000 SP. The resulting model is shown to exhibit excellent parallel behavior for a large range of numbers of processors.

Introduction

For several decades, numerical wind-wave models have been an integral part of weather prediction at weather forecast centers around the world. Major meteorological centers now rely on so-called third-generation wave models such as WAM [3], [9] or WAVEWATCH III [7], [8]. In such models, all physical processes describing wave growth and decay are parameterized explicitly. Compared to previous first- and second-generation models, which parameterized integral effects of the physics rather than the physics itself, this is computationally expensive due to the explicit calculation of nonlinear wave–wave interactions, and due to the relatively small time-steps required by third-generation models. Although such models are still less computationally intensive than atmospheric models, they nevertheless require state-of-the-art supercomputer facilities to produce forecasts at acceptable resolutions and in a timely fashion.

The first supercomputers utilized the concept of vectorization. Such vector computers achieved increased computational performance by efficiently performing identical calculations on large sets of data. Converting models to these computers generally required a systematic reorganization of the programs to generate long loop structures. Early vector computers also required many hardware-dependent calls for basic operations. In later vector computers, the additional programming was essentially limited to the inclusion of compiler directives in the source code.

The second supercomputing paradigm is that of parallelization. In this case the work is spread over multiple processors. In the simplest form (from a user's perspective), the processors share memory. As with vectorization, the success of such parallelization depends on the general structure of the program. If this structure is conducive to parallelization, modifications to the program for shared-memory parallel computers are generally small, and are usually limited to adding compiler directives to the program. Sharing memory between processors, however, requires additional logistics in the computer, which generally limits the number of processors in shared-memory parallel computers to about 16. Far more massively parallel computers, with up to O(10³) processors, can be constructed if the processors do not share their memory. Such distributed-memory parallel computers represent the latest development in supercomputing. Efficient application of models to such computers requires that the communication between processors becomes an integral part of the source code. Application to distributed-memory computers therefore requires major code conversions, even for programs that have already been adapted to shared-memory parallel computers.
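As a purely illustrative sketch (not taken from the NWW3 source code; program and variable names are hypothetical), the following minimal Fortran program shows the sense in which MPI communication becomes an explicit part of the code: initialization, rank and size queries, and a point-to-point exchange are programmed directly rather than generated from compiler directives.

    program mpi_skeleton
      use mpi
      implicit none
      integer :: ierr, myrank, nproc, istat(MPI_STATUS_SIZE)
      real    :: buf(100)

      call MPI_INIT ( ierr )
      call MPI_COMM_RANK ( MPI_COMM_WORLD, myrank, ierr )
      call MPI_COMM_SIZE ( MPI_COMM_WORLD, nproc, ierr )

      ! Fill a local work array and pass it to the next processor in a
      ! ring; the communication call is an explicit part of the code.
      buf = real(myrank)
      call MPI_SENDRECV_REPLACE ( buf, 100, MPI_REAL,                  &
             mod(myrank+1,nproc), 0, mod(myrank-1+nproc,nproc), 0,     &
             MPI_COMM_WORLD, istat, ierr )

      call MPI_FINALIZE ( ierr )
    end program mpi_skeleton

In such a program the number of MPI tasks, rather than compiler directives, determines the degree of parallelism.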

The present paper describes the conversion of the operational NOAA implementation of the third-generation wind-wave model WAVEWATCH III (henceforth denoted as NWW3) to a distributed-memory computer architecture at the National Centers for Environmental Prediction (NCEP). The program is written in FORTRAN. For message passing between processors the message passing interface (MPI) standard has been used (e.g., Gropp [2]). In Section 2, a brief description of the model is given, together with previously used vectorization and parallelization approaches. In Sections 3 (Parallel concepts) and 4 (Implementation and optimization), the basic distributed-memory design and model modifications are discussed, as well as optimization considerations. In Section 5 the performance of the parallel code at NCEP is discussed. Sections 6 and 7 present a discussion and conclusions.

Section snippets

The wave model

In contrast to numerical models for the atmosphere and ocean, which provide a deterministic description of both media, wind-wave models provide a statistical description of the sea state. The spatial and temporal scales of individual waves make it impossible to deterministically predict each individual wave for an entire ocean or sea. The random character of wind-waves makes it undesirable to deterministically model individual waves. Most statistical properties of wind-waves are captured in the

Parallel concepts

Effective parallel distributed-memory computing requires work and data to be distributed efficiently between the available processors. Two major parallel paradigms can be distinguished.

The first is domain decomposition. Here the domains considered in the model are divided into contiguous blocks of data points to be stored and processed at individual processors. Continuity of the calculation across the decomposed domains then requires communication of boundary data between processors.
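As a generic illustration of the boundary communication this requires (this is not the approach adopted in NWW3, and the block sizes and array names below are hypothetical), a one-dimensional decomposition of a spatial grid might exchange one halo column with each neighbor as follows:

    program halo_exchange
      use mpi
      implicit none
      integer, parameter :: nx = 100, ny = 50      ! local block size (hypothetical)
      real    :: a(nx,0:ny+1)                      ! local block plus two halo columns
      integer :: ierr, myrank, nproc, left, right, istat(MPI_STATUS_SIZE)

      call MPI_INIT ( ierr )
      call MPI_COMM_RANK ( MPI_COMM_WORLD, myrank, ierr )
      call MPI_COMM_SIZE ( MPI_COMM_WORLD, nproc, ierr )
      left  = myrank - 1 ; if ( left  <  0     ) left  = MPI_PROC_NULL
      right = myrank + 1 ; if ( right == nproc ) right = MPI_PROC_NULL

      a = real(myrank)

      ! Send the last owned column to the right neighbor and receive the
      ! left neighbor's column into the left halo; then the reverse.
      call MPI_SENDRECV ( a(1,ny  ), nx, MPI_REAL, right, 1,           &
                          a(1,0   ), nx, MPI_REAL, left , 1,           &
                          MPI_COMM_WORLD, istat, ierr )
      call MPI_SENDRECV ( a(1,1   ), nx, MPI_REAL, left , 2,           &
                          a(1,ny+1), nx, MPI_REAL, right, 2,           &
                          MPI_COMM_WORLD, istat, ierr )

      call MPI_FINALIZE ( ierr )
    end program halo_exchange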

The second is

Implementation and optimization

In discussing the implementation and optimization of the distributed memory version of NWW3, we will first concentrate on the calculations (Section 4.1). Input and output (I/O) will be discussed in Section 4.2. Load balancing has already been discussed in the previous section and will therefore not be considered again here.
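The details of the data transpose named in the abstract are not included in the snippets here. As a generic, hypothetical illustration only (the actual NWW3 implementation relies on its own buffering of the communicated data, as indicated in the abstract and Section 4.1, rather than the single collective call shown below), a transpose between a layout in which each processor holds the full grid for a subset of spectral components and one in which it holds full spectra for a subset of grid points could be expressed with MPI_ALLTOALL:

    program transpose_sketch
      use mpi
      implicit none
      ! Hypothetical sizes: nsea grid points and nspec spectral components,
      ! both assumed divisible by the number of processors for simplicity.
      integer, parameter :: nsea = 1200, nspec = 600
      integer :: ierr, myrank, nproc, nsloc, nkloc, ip, is, ik, i
      real, allocatable :: prop(:,:), src(:,:), sbuf(:), rbuf(:)

      call MPI_INIT ( ierr )
      call MPI_COMM_RANK ( MPI_COMM_WORLD, myrank, ierr )
      call MPI_COMM_SIZE ( MPI_COMM_WORLD, nproc, ierr )

      nkloc = nspec / nproc          ! spectral components per processor
      nsloc = nsea  / nproc          ! grid points per processor

      allocate ( prop(nsea,nkloc) )  ! layout 1: full grid, few components
      allocate ( src (nspec,nsloc) ) ! layout 2: full spectra, few grid points
      allocate ( sbuf(nsea*nkloc), rbuf(nsea*nkloc) )

      prop = real(myrank)

      ! Pack the send buffer so that the block destined for processor ip
      ! holds its nsloc grid points for all locally stored components.
      i = 0
      do ip = 0, nproc-1
         do ik = 1, nkloc
            do is = 1, nsloc
               i = i + 1
               sbuf(i) = prop ( ip*nsloc+is, ik )
            end do
         end do
      end do

      ! The transpose itself: every processor exchanges equal-sized
      ! blocks with every other processor.
      call MPI_ALLTOALL ( sbuf, nsloc*nkloc, MPI_REAL,                 &
                          rbuf, nsloc*nkloc, MPI_REAL,                 &
                          MPI_COMM_WORLD, ierr )

      ! Unpack into the second layout: component ik received from
      ! processor ip corresponds to global component ip*nkloc+ik.
      i = 0
      do ip = 0, nproc-1
         do ik = 1, nkloc
            do is = 1, nsloc
               i = i + 1
               src ( ip*nkloc+ik, is ) = rbuf(i)
            end do
         end do
      end do

      call MPI_FINALIZE ( ierr )
    end program transpose_sketch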

Performance

The performance of the distributed memory version of NWW3 as described in the previous two sections has been tested on the phase II IBM RS6000 SP of NCEP. This machine consists of 256 Winterhawk II nodes (1024 processors); each node contains four processors that share 2 GB of memory. Half of the machine is available for a single job under the present queuing configuration. Although this machine represents a hybrid shared-distributed memory architecture, all

Discussion

Fig. 3 shows a systematic decrease of the run time tN as a function of the number of processors N, up to N≈240. For larger numbers of processors there appears to be little or no gain in run time, or perhaps even an increase. The latter behavior is difficult to assess due to the variability in the timing results. The three run times considered (symbols, see legend) are similar, particularly for smaller N. For larger N their differences become noticeable, but never dominant. For larger N timing
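For reference, and using standard definitions that are not spelled out in the excerpt above (the paper's own figures may normalize differently): the speedup for N processors is commonly defined as SN = t1/tN and the corresponding parallel efficiency as EN = t1/(N tN). Since a single-processor run is generally not feasible for a problem of this size, t1 is typically replaced by N0 tN0 for the smallest processor count N0 considered.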

Conclusions

Parallel concepts for spectral wind-wave models have been discussed in the context of the WAVEWATCH III model as implemented at NOAA/NCEP. Conventionally, domain decomposition methods have been used for the parallelization of such models. For this model in particular, however, a less conventional data transpose method appeared more suitable on theoretical grounds. Having implemented such a data transpose method, extensive timing and scaling tests have been performed. It was shown that the

Acknowledgements

The author would like to thank Jim Tuccillo and George Vandenberghe for their support during the design and testing of NWW3 on the IBM phase I and II systems at NCEP, and D.B. Rao, Joe Sela and Mark Iredell for their comments on early drafts of this paper. The present study was made possible by funding from the NOAA High Performance Computing and Communication (HPCC) office.



OMB contribution No. 201.
