A general parallelization strategy for random path based geostatistical simulation methods
Introduction
The size of the simulation grids used for geological models (and more generally for spatial statistics) has increased by many orders of magnitude in recent years. This trend is likely to continue because the only way of modeling different scales together is to use high-resolution models. This is of utmost importance in applications such as hydrogeology, petroleum and mining, due to the critical influence of small-scale heterogeneity on large-scale processes (e.g. Mariethoz et al., 2009a).
Efficient pixel-based geostatistical simulation algorithms have been developed, but for very large grids and complex spatial models, the computational burden remains heavy. Furthermore, with increasingly sophisticated simulation techniques that include complex spatial constraints, the computational cost of simulating one grid node has also risen. As multicore processors and computer clusters become increasingly available, parallel strategies are necessary to increase the usable grid size and hence allow for models of higher complexity.
Parallel computers can be divided into two main categories: shared memory machines and distributed memory architectures. Shared memory machines offer easy and fast communication between the different computing units. However, they are extremely expensive, and both their total amount of memory and their total number of processors are limited. Therefore, distributed memory machines (or computer clusters) are the ones most often used in industry and in academia. As such machines do not have a common shared memory space, the processors have to communicate by sending and receiving messages. The communication time between processors can be significant and is often the bottleneck in program execution.
In this paper, we propose a parallelization strategy applicable in the context of sequential simulation methods and based on the distribution of the grid nodes among all available processors. The method minimizes communication and latency times and can be applied using shared or distributed memory architectures, or a combination of both. It consists of centralizing the simulation on a master processor that calls the slave processors as if they were functions, each simulating one node at a time. The key is to decouple the sending and the receiving operations to avoid waiting for synchronization. Centralization makes it possible to use a conflict management system ensuring that nodes simulated simultaneously do not interfere with each other through their neighborhoods.
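The master/slave organization described above can be illustrated with a minimal sketch. It is not the authors' implementation: threads from Python's standard `concurrent.futures` module stand in for slave processors, the grid is one-dimensional, and `simulate_node`, `collect`, `run_master` and the two-cell neighborhood are hypothetical names and choices made for illustration. The sketch only shows how sending (submitting a node) can be decoupled from receiving (collecting whichever results are ready); it performs no conflict management.

```python
import random
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def simulate_node(node, conditioning):
    """Hypothetical worker task: return a value for one node.

    A real worker would draw from the local ccdf; this placeholder returns the
    mean of the conditioning values plus Gaussian noise.
    """
    rng = random.Random(node)  # per-node seed, for illustration only
    base = sum(conditioning) / len(conditioning) if conditioning else 0.0
    return node, base + rng.gauss(0.0, 1.0)


def collect(done, simulated):
    """Store finished results in the master's copy of the grid."""
    for future in done:
        node, value = future.result()
        simulated[node] = value


def run_master(n_nodes, n_workers, seed=0):
    """Master loop: send nodes without waiting, receive results as they arrive."""
    path = list(range(n_nodes))
    random.Random(seed).shuffle(path)  # random simulation path on a 1D grid
    simulated = {}   # node index -> simulated value, owned by the master
    pending = set()  # tasks sent to workers and not yet received
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for node in path:
            if len(pending) >= n_workers:
                # Block only when every worker is busy.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                collect(done, simulated)
            # Assumed neighborhood: informed nodes within two cells.
            offsets = (-2, -1, 1, 2)
            conditioning = [simulated[node + d] for d in offsets if node + d in simulated]
            pending.add(pool.submit(simulate_node, node, conditioning))  # non-blocking send
        collect(wait(pending).done, simulated)  # receive the last results
    return simulated


field = run_master(n_nodes=50, n_workers=4)
print(len(field))  # 50 nodes simulated
```

Because the conditioning values available to a node depend on which workers have finished, the simulated values can differ from run to run; only the number of simulated nodes is fixed. On a distributed memory machine the same pattern would rely on message passing rather than threads.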
The strategy is computationally efficient and is versatile enough to be applicable to all random path based simulation methods. It is illustrated with an example using the Direct Sampling approach (Mariethoz, 2009; Mariethoz and Renard, 2010), which is a simulation algorithm using multiple-points (MP) statistics.
Parallelizing sequential simulations
Sequential simulation is a class of methods used to generate realizations of a random field (Deutsch and Journel, 1992; Caers, 2005; Remy et al., 2009). The general principle is to discretize the random field on a grid and, for each node x of the grid in turn (sequentially), to draw an outcome of the random variable Z from a local conditional cumulative distribution function (ccdf). This local ccdf is conditional on the previously simulated nodes and on local data if those are
Nodes distribution
The solution proposed in this paper is to have one processor, the master, manage the path, the search for neighbors and the conflicts, while all other processors, the slaves, devote their computing power to the simulation itself. If nCPU processors are available, processor 0 is the master and processors 1 to nCPU-1 are the slaves. The most obvious strategy would be to group nCPU-1 nodes and distribute them among the slave processors. Unfortunately, this strategy is not efficient with a
Conflicts management
Let Usim be the set of all nodes currently being simulated by the slave processors (a set containing nCPU-1 elements). When a new node x has to be simulated, a conflict arises when N(x)∩Usim≠Ø, i.e. when at least one node currently being simulated by a slave processor belongs to the neighborhood of node x.
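The conflict condition above amounts to a set intersection test. The following sketch assumes a 2D grid and a square search window of a given radius as the neighborhood N(x); the paper's actual neighborhood definition may differ, and the function names `neighborhood` and `has_conflict` are hypothetical.

```python
def neighborhood(node, radius, shape):
    """Assumed N(x): all grid nodes in a square window around `node`, excluding it."""
    i, j = node
    n_rows, n_cols = shape
    return {
        (a, b)
        for a in range(max(0, i - radius), min(n_rows, i + radius + 1))
        for b in range(max(0, j - radius), min(n_cols, j + radius + 1))
        if (a, b) != node
    }


def has_conflict(node, u_sim, radius, shape):
    """True when N(x) and Usim share at least one node."""
    return not neighborhood(node, radius, shape).isdisjoint(u_sim)


# Usim: nodes currently being simulated by the slave processors.
u_sim = {(2, 2), (7, 7)}
print(has_conflict((3, 3), u_sim, radius=1, shape=(10, 10)))  # True: (2, 2) is a neighbor
print(has_conflict((5, 5), u_sim, radius=1, shape=(10, 10)))  # False: no overlap
```

Because the test is performed by the master, which knows Usim at all times, it requires no communication with the slave processors.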
When a conflict arises, one can consider three ways of dealing with it. The first one is to ignore the conflict. This option is worth considering if the simulation grid is large and the number of
Performance tests
The parallelization strategy described above has been tested on the Direct Sampling algorithm (Mariethoz, 2009; Mariethoz and Renard, 2010), a recent implementation of multiple-points simulation (Guardiano and Srivastava, 1993; Strebelle, 2002; Hu and Chugunova, 2008). For each node x in the simulation grid, the algorithm scans a training image (TI) representing what the simulated field should look like independently of the data. As soon as a pattern matching the neighborhood of x in the
Conclusion
Among the three simulation parallelization levels (realization, path and node), the path level allows a large number of processors to be used without losing efficiency. It is applicable to all random path based geostatistical simulation methods, even if only a small number of simulations have to be generated (contrary to the realization level). In this paper, we propose a strategy that takes full advantage of this parallelization level.
The overall performance of the strategy is good according to the
Acknowledgments
We acknowledge the Swiss National Science Foundation (Grant PP002-1065557) and the Swiss Confederation’s Innovation Promotion Agency (CTI Project 8836.1 PFES-ES) for funding this work. We also thank Philippe Renard and Julien Straubhaar for their constructive comments.
References (31)
- et al. Massively parallel strategies for local spatial interpolation. Computers & Geosciences (1997).
- et al. FLUVSIM: a program for object-based stochastic modeling of fluvial depositional systems. Computers & Geosciences (2002).
- Validity of the single processor approach to achieving large-scale computing capabilities. AFIPS Conference Proceedings (1967).
- et al. Plurigaussian Simulations in Geosciences (2003).
- et al. Conditional simulations with patterns. Mathematical Geology (2007).
- Caers, J., 2005. Petroleum Geostatistics, SPE Interdisciplinary Primer Series, Society of Petroleum Engineers,...
- et al. Modeling spatial variability with one and multi-dimensional continuous Markov chains. Mathematical Geology (1997).
- Higher order models using entropy, Markov random fields and sequential simulation.
- et al. GSLIB: Geostatistical Software Library (1992).
- et al. Generalized sequential Gaussian simulation on group size ν and screen-effect approximations for large field simulations. Mathematical Geology (2004).