A general parallelization strategy for random path based geostatistical simulation methods

https://doi.org/10.1016/j.cageo.2009.11.001

Abstract

The size of simulation grids used for numerical models has increased by many orders of magnitude in recent years, and this trend is likely to continue. Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. As cluster computers become widely available, using parallel strategies is a natural step for increasing the usable grid size and the complexity of the models. These strategies must take advantage of the possibilities offered by machines with a large number of processors. On such machines, the bottleneck is often the communication time between processors. We present a strategy that distributes grid nodes among all available processors while minimizing communication and latency times. It consists of centralizing the simulation on a master processor that calls the other, slave processors as if they were functions, each call simulating one node. The key is to decouple the sending and receiving operations to avoid synchronization. Centralization allows a conflict management system to ensure that nodes being simulated simultaneously do not interfere in terms of neighborhood. The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods.

Introduction

The size of the simulation grids used for geological models (and more generally for spatial statistics) has increased by many orders of magnitude in recent years. This trend is likely to continue because the only way of modeling different scales together is to use high-resolution models. This is of utmost importance in applications such as hydrogeology, petroleum and mining, due to the critical influence of small-scale heterogeneity on large-scale processes (e.g. Mariethoz et al., 2009a).

Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. Furthermore, with increasingly sophisticated simulation techniques including complex spatial constraints, the computational cost of simulating one grid node has also risen. As multicore processors and clusters of computers become more widely available, using parallel strategies is necessary for increasing the usable grid size and hence allowing for models of higher complexity.

Parallel computers can be divided into two main categories: shared memory machines and distributed memory architectures. Shared memory machines have the advantage of easy and fast communication between the different computing units. Nevertheless, their price is extremely high, and both the total amount of memory and the total number of processors are limited. Therefore, distributed memory machines (or computer clusters) are the ones most often used in industry and in academia. As such machines do not have a common shared memory space, the processors have to communicate by sending and receiving messages. The communication time between processors can be significant and is often the bottleneck in program execution.

In this paper, we propose a parallelization strategy applicable in the context of sequential simulation methods and based on the distribution of the grid nodes among all available processors. The method minimizes communication and latency times and can be applied using shared or distributed memory architectures, or a combination of both. It consists of centralizing the simulation on a master processor that calls the other, slave processors as if they were functions, each call simulating one node. The key is to decouple the sending and receiving operations to avoid waiting for synchronization. Centralization allows a conflict management system to make sure that nodes being simulated simultaneously do not interfere in terms of neighborhood.

The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods. It is illustrated with an example using the Direct Sampling approach (Mariethoz, 2009; Mariethoz and Renard, 2010), which is a simulation algorithm using multiple-point (MP) statistics.

Parallelizing sequential simulations

Sequential simulation is a class of methods used to generate realizations of a random field (Deutsch and Journel, 1992; Caers, 2005; Remy et al., 2009). The general principle of the method is to discretize the random field on a grid and, for each node x of the grid in turn (sequentially), to draw an outcome of the random variable Z from a local conditional cumulative distribution function (ccdf). This local ccdf is conditional to the previously simulated nodes and to local data if those are
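The sequential principle can be illustrated with a minimal Python sketch. It is not the algorithm of any cited package: the binary variable, the square search neighborhood of half-width `radius`, and the toy local ccdf (the mean of the already simulated neighbors, shrunk toward 0.5) are all assumptions chosen only to keep the example short. What it shows is the structure common to this class of methods: a random path, and a draw at each node that is conditional on the nodes simulated before it.

```python
import random


def sequential_simulation(nx, ny, radius=2, seed=0):
    """Toy sequential simulation of a binary field on an nx-by-ny grid."""
    rng = random.Random(seed)

    # Random path: every grid node is visited exactly once, in random order.
    path = [(i, j) for i in range(nx) for j in range(ny)]
    rng.shuffle(path)

    field = {}  # nodes simulated so far: (i, j) -> 0 or 1
    for (i, j) in path:
        # Neighborhood of the node: previously simulated nodes within `radius`.
        neighbors = [
            field[(i + di, j + dj)]
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)
            if (di, dj) != (0, 0) and (i + di, j + dj) in field
        ]
        # Hypothetical local ccdf (assumption): P(Z = 1) is the neighbor mean,
        # shrunk toward 0.5 so that it is defined when there are no neighbors.
        p_one = (sum(neighbors) + 0.5) / (len(neighbors) + 1.0)
        # Draw the outcome; it becomes conditioning data for later nodes.
        field[(i, j)] = 1 if rng.random() < p_one else 0
    return field
```

Calling `sequential_simulation(50, 50)` returns a dictionary mapping each of the 2500 nodes to 0 or 1. Because every draw depends on earlier draws, the loop cannot be split naively across processors, which is the difficulty a parallelization strategy has to address.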

Nodes distribution

The solution proposed in this paper is to have one processor, the master, manage the path, the search for neighbors and the conflicts, while all other processors, the slaves, devote their computing power to the simulation itself. If nCPU processors are available, processor 0 is the master and processors 1 to nCPU-1 are the slaves. The most obvious strategy would be to group nCPU-1 nodes and distribute them among slave processors. Unfortunately, this strategy is not efficient with a
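The idea of a master that hands out one node at a time and collects results as they arrive can be sketched as follows. This is only an illustration under stated assumptions: a thread pool from Python's standard `concurrent.futures` module stands in for the slave processors (a cluster implementation would typically use message passing instead), and `simulate_node`, `master` and `neighbors_of` are hypothetical names. The toy local distribution is also an assumption, and conflicts between simultaneously simulated nodes are deliberately ignored here.

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def simulate_node(node, neighbor_values, seed):
    """Hypothetical worker task: draw one binary value given neighbor values."""
    rng = random.Random(seed)
    # Toy local distribution (assumption): neighbor mean shrunk toward 0.5.
    p_one = (sum(neighbor_values) + 0.5) / (len(neighbor_values) + 1.0)
    return node, (1 if rng.random() < p_one else 0)


def master(path, n_workers, neighbors_of):
    """Master loop: each worker simulates at most one node at a time."""
    field = {}       # results received so far
    pending = set()  # nodes sent to workers, results not yet received
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for k, node in enumerate(path):
            if len(pending) == n_workers:
                # Receiving is decoupled from sending: return as soon as any
                # worker has finished instead of waiting for the whole group.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    finished, value = fut.result()
                    field[finished] = value
            # The master searches the neighbors and sends only their values.
            values = [field[n] for n in neighbors_of(node) if n in field]
            pending.add(pool.submit(simulate_node, node, values, k))
        # Collect the results still in flight at the end of the path.
        for fut in pending:
            finished, value = fut.result()
            field[finished] = value
    return field
```

Here `neighbors_of` is any function returning the candidate neighbor coordinates of a node. Since results are collected in completion order, the conditioning data available to a given node may vary from run to run in this sketch.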

Conflicts management

Let Usim be the set of all nodes currently being simulated by the slave processors (a set containing nCPU-1 elements). When a new node x has to be simulated, a conflict arises when N(x) ∩ Usim ≠ ∅, i.e. when at least one node currently being simulated by any slave processor belongs to the neighborhood of node x.
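The conflict condition is a set intersection test, as the following Python sketch shows. The square neighborhood of half-width `radius` and the names `neighborhood`, `has_conflict` and `in_flight` are assumptions made for illustration; any neighborhood definition N(x) could be substituted.

```python
def neighborhood(node, radius):
    """N(x): all grid offsets within `radius` of the node, excluding the node."""
    i, j = node
    return {
        (i + di, j + dj)
        for di in range(-radius, radius + 1)
        for dj in range(-radius, radius + 1)
        if (di, dj) != (0, 0)
    }


def has_conflict(node, in_flight, radius):
    """True when N(node) shares at least one node with the in-flight set Usim."""
    return not neighborhood(node, radius).isdisjoint(in_flight)
```

For example, with `radius=1`, node `(5, 5)` conflicts with an in-flight set `{(5, 6)}` but not with `{(9, 9)}`.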

When a conflict arises, one can consider three ways of dealing with it. The first one is to ignore the conflict. This option is worth considering if the simulation grid is large and the number of

Performance tests

The parallelization strategy described above has been tested on the Direct Sampling algorithm (Mariethoz, 2009; Mariethoz and Renard, 2010), a recent implementation of multiple-point simulation (Guardiano and Srivastava, 1993; Strebelle, 2002; Hu and Chugunova, 2008). For each node x in the simulation grid, the algorithm scans a training image (TI) representing what the simulated field should look like independently of the data. As soon as a pattern matching the neighborhood of x in the
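A simplified Python sketch of the training image scan for a single node is given below. It is not the published implementation: the categorical mismatch distance, the acceptance `threshold`, the scanned fraction `max_scan`, the fallback to the best pattern found, and the function name `direct_sample` are all assumptions introduced for illustration. The only element taken from the description above is that the scan stops as soon as a matching pattern is found.

```python
import random


def direct_sample(ti, data_event, threshold=0.0, max_scan=1.0, rng=None):
    """Scan a 2D training image for a pattern matching the data event.

    ti         : list of rows of categorical values (the training image)
    data_event : list of ((di, dj), value) pairs, the neighbors of the node
                 expressed as lags relative to the node
    Returns the TI value at the center of the accepted pattern.
    """
    rng = rng or random.Random(0)
    rows, cols = len(ti), len(ti[0])

    # Visit TI locations in random order, up to a fraction of the image.
    locations = [(i, j) for i in range(rows) for j in range(cols)]
    rng.shuffle(locations)
    limit = max(1, int(max_scan * len(locations)))

    best_value, best_dist = None, float("inf")
    for (i, j) in locations[:limit]:
        mismatches, usable = 0, True
        for (di, dj), value in data_event:
            a, b = i + di, j + dj
            if not (0 <= a < rows and 0 <= b < cols):
                usable = False  # pattern falls outside the training image
                break
            mismatches += ti[a][b] != value
        if not usable:
            continue
        # Distance (assumption): fraction of mismatching neighbors.
        dist = mismatches / len(data_event) if data_event else 0.0
        if dist < best_dist:
            best_value, best_dist = ti[i][j], dist
        if dist <= threshold:
            break  # stop as soon as a matching pattern is found
    return best_value
```

In a training image made of alternating columns of 0 and 1, a node whose left neighbor is 0 receives the value 1, because every exactly matching pattern has a 1 at its center. In the master and slave setting, this scan is the work that a slave would perform for each node it receives.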

Conclusion

Among the three simulation parallelization levels (realization, path and node), the path level allows using a large number of processors without losing efficiency. It is applicable to all random path based geostatistical simulation methods, even if only a small number of simulations have to be generated (contrary to the realization level). In this paper, we propose a strategy that takes full advantage of this parallelization level.

The overall performance of the strategy is good according to the

Acknowledgments

We acknowledge the Swiss National Science Foundation (Grant PP002-1065557) and the Swiss Confederation’s Innovation Promotion Agency (CTI Project 8836.1 PFES-ES) for funding this work. We also thank Philippe Renard and Julien Straubhaar for their constructive comments.

References (31)

  • M. Armstrong et al. Massively parallel strategies for local spatial interpolation. Computers & Geosciences (1997)
  • C. Deutsch et al. FLUVSIM: a program for object-based stochastic modeling of fluvial depositional systems. Computers & Geosciences (2002)
  • G. Amdahl. Validity of the single processor approach to achieving large-scale computing capabilities. AFIPS Conference Proceedings (1967)
  • M. Armstrong et al. Plurigaussian Simulations in Geosciences (2003)
  • B. Arpat et al. Conditional simulations with patterns. Mathematical Geology (2007)
  • Caers, J., 2005. Petroleum Geostatistics, SPE Interdisciplinary Primer Series, Society of Petroleum Engineers,...
  • S.F. Carle et al. Modeling spatial variability with one and multi-dimensional continuous Markov chains. Mathematical Geology (1997)
  • C. Daly. Higher order models using entropy, Markov random fields and sequential simulation
  • C. Deutsch et al. GSLIB: Geostatistical Software Library (1992)
  • R. Dimitrakopoulos et al. Generalized sequential Gaussian simulation on group size ν and screen-effect approximations for large field simulations. Mathematical Geology (2004)
  • S. Geman et al. Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence (1984)
  • F. Guardiano et al. Multivariate geostatistics: beyond bivariate moments. Geostatistics-Troia (1993)
  • L. Hu et al. Multiple-point geostatistics for modeling subsurface heterogeneity: a comprehensive review. Water Resources Research (2008)
  • Ingram, B., Cornford, D., 2008. Parallel geostatistics for sparse and dense datasets. In: Proceedings of geoENV VII...
  • E. Isaaks. Indicator simulation: application to the simulation of a high grade uranium mineralization