Distributed evolutionary Monte Carlo for Bayesian computing

https://doi.org/10.1016/j.csda.2008.10.025

Abstract

Sampling from a multimodal and high-dimensional target distribution poses a great challenge in Bayesian analysis. A new Markov chain Monte Carlo algorithm, Distributed Evolutionary Monte Carlo (DGMC), is proposed for real-valued problems; it combines attractive features of the distributed genetic algorithm and Markov chain Monte Carlo. The DGMC algorithm evolves a population of Markov chains through a set of genetic operators to simulate the target function. We prove that the DGMC algorithm has the target function as its stationary distribution. The effectiveness of the DGMC algorithm is illustrated by simulating two multimodal distributions and by an application to a real data example.

Introduction

In recent decades Markov chain Monte Carlo (MCMC) methods have become indispensable tools for simulating complex and intractable functions in a variety of scientific fields. The fundamental algorithm, proposed by Metropolis et al. (1953) and generalized by Hastings (1970), applies a transition kernel to drive a Markov chain that explores the target function. However, the Metropolis–Hastings (MH) algorithm may perform poorly when the target function is high dimensional or has many modes separated by high barriers, and it can easily become trapped at a local mode. Many alternative algorithms have been proposed to overcome these problems, including parallel tempering (Geyer, 1991), exchange Monte Carlo (Hukushima and Nemoto, 1996), dynamic weighting (Wong and Liang, 1997), and others.

More recently, investigators have turned to population-based MCMC methods (Laskey and Myers, 2003), which evolve multiple Markov chains simultaneously and update each chain by borrowing information from the others. The adaptive direction sampling (ADS) algorithm (Gilks et al., 1994) is an early example of the population-based approach. The ADS algorithm updates one member of the current population by line sampling along a direction toward another member. Liu et al. (2000) proposed a conjugate gradient Monte Carlo algorithm that modifies the ADS algorithm by forcing the sampling direction toward a local mode. To exchange information more efficiently, some modern optimization techniques have been introduced into the statistical literature. Liang and Wong (2000, 2001) proposed an Evolutionary Monte Carlo (EMC) algorithm that is based on the powerful genetic algorithm (Holland, 1975; Goldberg, 1989). The Differential-Evolution Monte Carlo (DE-MC) algorithm proposed by Ter Braak (2006) relies on its counterpart, the differential evolution algorithm (Storn and Price, 1997). In these two algorithms, a population of Markov chains is updated in parallel by evolution operators such as mutation and crossover. In the EMC algorithm, the Markov chains can also embed a temperature ladder to incorporate the features of parallel tempering.
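As a concrete illustration of how such a population update borrows information between chains, the following is a minimal sketch of a DE-MC-style sweep in the spirit of Ter Braak (2006); the function name `demc_step` and the parameter defaults are our own illustrative choices, not the paper's specification.

```python
import numpy as np

def demc_step(pop, log_target, gamma=None, eps=1e-4, rng=None):
    """One DE-MC-style sweep: chain i proposes x_i + gamma*(x_r1 - x_r2)
    + small noise, with r1 != r2 != i drawn from the population, and
    accepts by the Metropolis rule (the proposal is symmetric in x_i)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = pop.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * d)  # scaling suggested by Ter Braak (2006)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        r1, r2 = rng.choice(others, size=2, replace=False)
        prop = pop[i] + gamma * (pop[r1] - pop[r2]) + eps * rng.standard_normal(d)
        if np.log(rng.uniform()) < log_target(prop) - log_target(pop[i]):
            pop[i] = prop  # accept the proposal
    return pop
```

Because the difference vector `pop[r1] - pop[r2]` adapts to the current spread of the population, the proposal scale tunes itself to the target's local geometry.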

In this paper, we propose a new evolutionary Monte Carlo algorithm for real-valued problems, called Distributed Evolutionary Monte Carlo (DGMC), which originates from the distributed genetic algorithm (Tanese, 1989). The distributed genetic algorithm (DGA) divides the whole population into a set of subpopulations and runs the genetic algorithm on each subpopulation. A major advantage of the DGA algorithm is that it allows information exchange between subpopulations through migration. Periodically some members are selected from a subpopulation and migrate to a different subpopulation. The DGA algorithm improves the performance of the single-population genetic algorithm by preventing premature convergence that often occurs in practice (Ryan, 1996). Our DGMC algorithm incorporates the DGA algorithm into the MCMC framework, which drives the Markov chains through three genetic operators: mutation, crossover and migration. Table 1 compares the features of several popular population MCMC algorithms. We prove that the DGMC algorithm has the target function as its stationary distribution. The effectiveness of the DGMC algorithm is demonstrated by applications to two multimodal distributions and a real data example.

The remainder of the paper is organized as follows. Section 2 reviews the distributed genetic algorithm. In Section 3, we describe the DGMC algorithm and prove its stationary properties. The proof is deferred to the Appendix. Section 4 illustrates the use of the DGMC algorithm through three numerical examples. Section 5 concludes the paper with a brief discussion.

Section snippets

Distributed genetic algorithm

The genetic algorithm emulates the natural evolution process and serves as a powerful optimization technique by driving a population of potential solutions toward a global optimum. Suppose that π(x): R^d → R is the target function, referred to as the fitness function. A potential solution x = (β_1, …, β_d) is called a chromosome, where β_i is the gene at locus i and the value of β_i is its genotype. To seek the optima, the genetic algorithm updates a population of chromosomes iteratively by
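The selection–crossover–mutation cycle described above can be sketched as follows. This is a generic real-coded genetic algorithm, not the paper's DGA (which additionally splits the population and migrates members between subpopulations); the name `ga_generation` and all parameter defaults are illustrative.

```python
import numpy as np

def ga_generation(pop, fitness, sigma=0.1, rng=None):
    """One generation of a simple real-coded genetic algorithm:
    fitness-proportional selection, one-point crossover on the gene
    vector, and Gaussian mutation of every gene."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = pop.shape
    fit = np.array([fitness(x) for x in pop], dtype=float)
    probs = fit - fit.min() + 1e-12        # shift so all selection weights are positive
    probs = probs / probs.sum()
    new = np.empty_like(pop)
    for i in range(0, n, 2):
        pa = pop[rng.choice(n, p=probs)]   # fitness-proportional parents
        pb = pop[rng.choice(n, p=probs)]
        cut = int(rng.integers(1, d)) if d > 1 else 0
        new[i] = np.concatenate([pa[:cut], pb[cut:]])      # one-point crossover
        if i + 1 < n:
            new[i + 1] = np.concatenate([pb[:cut], pa[cut:]])
    return new + sigma * rng.standard_normal(new.shape)    # Gaussian mutation
```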

Distributed evolutionary Monte Carlo

Suppose that we have a population of N = mk Markov chains that are divided equally into k subpopulations. Let x^i = {x^i_1, …, x^i_m} denote the current samples (chromosomes) of subpopulation i, where x^i_j is a d-dimensional vector for j = 1, …, m. Denote by X = {x^1, …, x^k} the samples of the whole population. In the following three sections we define, respectively, three genetic operators (mutation, crossover and migration) for the DGMC algorithm. Furthermore, we show that π_N(·) = π(·) × ⋯ × π(·) is invariant with
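To make the population update concrete, here is a minimal sketch combining a mutation sweep with a migration swap over a population of k subpopulations of m chains each. The paper's operator definitions (including crossover) differ in detail; `dgmc_sweep`, `p_mig`, and `step` are hypothetical names and defaults. Because every chain targets the same π, a symmetric swap of two states is always accepted, so the product π_N = π × ⋯ × π is preserved.

```python
import numpy as np

def dgmc_sweep(X, log_target, p_mig=0.1, step=0.5, rng=None):
    """One illustrative sweep over X of shape (k, m, d): k subpopulations,
    m chains each, states in R^d.  Mutation = random-walk Metropolis
    update of every chain; migration = swap of one random member
    between two randomly chosen subpopulations."""
    rng = np.random.default_rng() if rng is None else rng
    k, m, d = X.shape
    # Mutation operator: leaves pi invariant chain by chain.
    for i in range(k):
        for j in range(m):
            prop = X[i, j] + step * rng.standard_normal(d)
            if np.log(rng.uniform()) < log_target(prop) - log_target(X[i, j]):
                X[i, j] = prop
    # Migration operator: a symmetric state swap, always accepted.
    if k > 1 and rng.uniform() < p_mig:
        a, b = rng.choice(k, size=2, replace=False)
        ja, jb = int(rng.integers(m)), int(rng.integers(m))
        X[a, ja], X[b, jb] = X[b, jb].copy(), X[a, ja].copy()
    return X
```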

A bimodal example

We first consider simulating from a mixture of two five-dimensional normal distributions, π(x) = (1/3) N(−5, I_5) + (2/3) N(5, I_5), where 5 = (5, 5, 5, 5, 5)^T and I_5 is the 5 × 5 identity matrix. This example was also considered in Ter Braak (2006). The distance between the two modes of the target distribution is 10√5 ≈ 22.36, which poses a great challenge to the traditional MH algorithm.

To apply the DGMC algorithm, we used two subpopulations and each subpopulation had five chains. The migration rate pm was
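For reference, the bimodal target above can be coded up to the normalizing constant (2π)^(−5/2) shared by both components; `log_mixture` is our illustrative name for it, and working on the log scale with a log-sum-exp keeps the evaluation numerically stable far from either mode.

```python
import numpy as np

def log_mixture(x):
    """Log-density, up to the shared (2*pi)^(-5/2) constant, of the
    target (1/3) N(-5, I_5) + (2/3) N(5, I_5) with mean vectors
    (-5, ..., -5) and (5, ..., 5)."""
    x = np.asarray(x, dtype=float)
    la = np.log(1.0 / 3.0) - 0.5 * np.sum((x + 5.0) ** 2)
    lb = np.log(2.0 / 3.0) - 0.5 * np.sum((x - 5.0) ** 2)
    return np.logaddexp(la, lb)  # stable log-sum-exp of the two components

# Distance between the two modes: ||(5,...,5) - (-5,...,-5)|| = 10*sqrt(5)
mode_gap = np.linalg.norm(np.full(5, 5.0) - np.full(5, -5.0))
```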

Conclusion and discussion

The DGMC algorithm augments the Markov state to a population instead of a single chain, and thus falls within the generic framework of population Markov chain Monte Carlo. The DGMC algorithm consists of three genetic operators: mutation, crossover and migration. The mutation operator is usually efficient at exploring local modes. The crossover operator helps the chains travel through regions of very low probability and is thus better suited to exploring the sample space globally.

Acknowledgments

The authors thank the associate editor and the two reviewers for their helpful comments on improving the manuscript. This research of Kam-Wah Tsui was supported in part by NSF grant DMS-0604931.

References

  • L.J. Eshelman et al.
  • S. Gelman et al.
  • C.J. Geyer. Markov chain Monte Carlo maximum likelihood.
  • W.R. Gilks et al. Adaptive direction sampling. The Statistician (1994).
  • D.E. Goldberg, P. Segrest, 1987. Genetic algorithms with sharing for multimodal function optimization. In: Proc. 2nd...
  • D.E. Goldberg. Genetic Algorithms in Search, Optimization, & Machine Learning (1989).
  • W.K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika (1970).
  • S. Huet et al. Statistical Tools for Nonlinear Regression (2004).
  • K. Hukushima et al. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Japan (1996).
  • J.H. Holland. Adaptation in Natural and Artificial Systems (1975).
  • K.B. Laskey et al. Population Markov Chain Monte Carlo. Machine Learning (2003).