On parallelizable Markov chain Monte Carlo algorithms with waste-recycling

Abstract

Parallelizable Markov chain Monte Carlo (MCMC) generates multiple proposals and parallelizes the evaluations of the likelihood function on different cores at each MCMC iteration. Inspired by Calderhead (Proc Natl Acad Sci 111(49):17408–17413, 2014), we introduce a general ‘waste-recycling’ framework for parallelizable MCMC, under which we show that using weighted samples from waste-recycling is preferable to resampling in terms of both statistical and computational efficiency. We also provide a simple-to-use criterion, the generalized effective sample size, for evaluating the efficiency of parallelizable MCMC algorithms; it applies to both the waste-recycling and the vanilla versions. A moment estimator of the generalized effective sample size is provided and shown by simulations to be reasonably accurate.

References

  • Andrews, D.W.K.: Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3), 817–858 (1991)

  • Calderhead, B.: A general construction for parallelizing Metropolis-Hastings algorithms. Proc. Natl. Acad. Sci. 111(49), 17408–17413 (2014)

  • Chen, L.Y., Qin, Z., Liu, J.S.: Exploring hybrid Monte Carlo in Bayesian computation. Bayesian methods: with applications to science, policy and official statistics. In: Selected Papers from ISBA 2000, pp. 71–80 (2001)

  • Delmas, J.F., Jourdain, B.: Does waste recycling really improve the multi-proposal Metropolis–Hastings algorithm? An analysis based on control variates. J. Appl. Probab. 46(4), 938–959 (2009)

  • Douc, R., Robert, C.P.: A vanilla Rao–Blackwellization of Metropolis–Hastings algorithms. Ann. Stat. 39(1), 261–277 (2011)

  • Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)

  • Frenkel, D.: Speed-up of Monte Carlo simulations by sampling of rejected states. Proc. Natl. Acad. Sci. 101(51), 17571–17575 (2004)

  • Frenkel, D.: Waste-recycling Monte Carlo. In: Ferrario, M., Ciccotti, G., Binder, K. (eds.) Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology, vol. 1. Lecture Notes in Physics, vol. 703, pp. 127–137. Springer, Berlin. doi:10.1007/3-540-35273-2_4 (2006)

  • Gelman, A., Meng, X.: A note on bivariate distributions that are conditionally normal. Am. Stat. 45(2), 125–126 (1991)

  • Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., Rubin, D.: Bayesian Data Analysis, 3rd edn. Chapman and Hall, Boca Raton (2013)

  • Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

  • Kass, R.E., Carlin, B.P., Gelman, A., Neal, R.M.: Markov chain Monte Carlo in practice: a roundtable discussion. Am. Stat. 52(2), 93–100 (1998)

  • Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Berlin (2001)

  • Liu, J.S., Liang, F., Wong, W.H.: The multiple-try method and local optimization in Metropolis sampling. J. Am. Stat. Assoc. 95(449), 121–134 (2000)

  • Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)

  • Müller, U.K.: HAC corrections for strongly autocorrelated time series. J. Bus. Econ. Stat. 32, 311–322 (2014)

  • Neal, R.M.: An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111, 194–203 (1994)

  • Qin, Z.S., Liu, J.S.: Multipoint metropolis method with application to hybrid Monte Carlo. J. Comput. Phys. 172, 827–840 (2001)

  • Ritter, C., Tanner, M.A.: Facilitating the Gibbs sampler: the gibbs stopper and the Griddy–Gibbs sampler. J. Am. Stat. Assoc. 87(419), 861–868 (1992)

  • Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)

  • Tjelmeland, H.: Using all Metropolis–Hastings proposals to estimate mean values. Technical Reports, Norwegian University of Science and Technology. Trondheim, Norway (2004)

Acknowledgements

JSL’s research was supported in part by the National Science Foundation grant DMS-1613035.

Corresponding author

Correspondence to Jun S. Liu.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 2183 KB)

Appendices

A Proofs and derivations

A.1 Proof of Theorem 1

Proof of Theorem 1

For notational simplicity, in this proof we use \(\varvec{x}\) and \(d\varvec{x}\) to represent \((x_0,\ldots ,x_M)\) and \(dx_0\ldots dx_M\), respectively.

Suppose \(x_0 \sim \pi \) and we attach to \(x_0\) a randomly sampled vector \((x_1,\ldots ,x_M)\) drawn from an arbitrary Markov transition kernel \(K(x_0, \cdot )\). With our proposed weights used as the pMCMC acceptance probability, \(r(x_0, x_i) = w(x_i)\) as defined in Eq. (1) or (2), the detailed balance condition

$$\begin{aligned} r(x_0, x_i) K(x_0,x_{-0}) \pi (x_0) \equiv r(x_i, x_0) K(x_i,x_{-i}) \pi (x_i)\nonumber \\ \end{aligned}$$
(13)

is satisfied with both Eqs. (1) and (2). Then, for any integrable \(h(\cdot )\), we can express the mean of the weighted average estimate as:

$$\begin{aligned}&\int _{x_0}\ldots \int _{x_M}\left\{ \sum _{i=1}^M h(x_i)r(x_0,x_i) + h(x_0) w(x_0) \right\} \\&\qquad \cdot K(x_0,x_{-0}) \pi (x_0) dx_M\ldots \hbox {d}x_0\\&\quad =\int _{\varvec{x}}\left\{ \sum _{i=1}^M h(x_i)r(x_0,x_i) \right\} \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\qquad +\int _{\varvec{x}} h(x_0) w(x_0) \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\quad =\sum _{i=1}^M\int _{\varvec{x}}\left\{ h(x_i)r(x_0,x_i) \right\} \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\qquad +\int _{\varvec{x}} h(x_0) w(x_0) \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\overset{(*)}{=}\sum _{i=1}^M\int _{\varvec{x}} h(x_i) r(x_i, x_0) K(x_i,x_{-i}) \pi (x_i) \hbox {d}\varvec{x}\\&\qquad +\int _{\varvec{x}} h(x_0) w(x_0) \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\overset{(**)}{=}\sum _{i=1}^M\int _{\varvec{x}} h(x_0) r(x_0, x_i) K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\qquad +\int _{\varvec{x}} h(x_0) w(x_0) \cdot K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\quad =\int _{\varvec{x}} h(x_0) \left\{ \sum _{i=1}^Mr(x_0, x_i)+w(x_0)\right\} K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}\\&\quad =\int _{\varvec{x}} h(x_0) K(x_0,x_{-0}) \pi (x_0) \hbox {d}\varvec{x}= E_\pi h , \end{aligned}$$

where (*) follows from Eq. (13), (**) follows from swapping the dummy integration variables \(x_0\) and \(x_i\) in the first term, and the last equality follows from \(\sum _{i=1}^Mr(x_0, x_i)+w(x_0) \equiv 1\) under both Eqs. (1) and (2). This proves that the proposed weights give unbiased estimates of posterior expectations when the Markov chain is in equilibrium. \(\square \)
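
To make Theorem 1 concrete, the following is a minimal sketch (not the paper's implementation) of the waste-recycled weighted estimator. Eqs. (1) and (2) appear in the main text and are not reproduced here; as a stand-in we assume an independence proposal kernel \(q\) with weights \(w(x_i) \propto \pi (x_i)/q(x_i)\), normalized over the current state and the \(M\) proposals, for which the detailed balance condition (13) can be verified directly. All names and parameter values below are illustrative.

```python
# Hedged sketch: waste-recycled (weighted) estimator for a standard normal
# target, using an independence proposal and importance-type weights
# normalized over the current state and the M proposals.
import numpy as np

rng = np.random.default_rng(0)
M, n_iter = 8, 20_000                       # proposals per iteration, iterations

log_pi = lambda x: -0.5 * x**2              # target: standard normal (unnormalized)
q_scale = 2.0                               # independence proposal: N(0, q_scale^2)
log_q = lambda x: -0.5 * (x / q_scale)**2
h = lambda x: x**2                          # test function, E_pi[h] = 1

x0 = rng.standard_normal()                  # start in equilibrium (x0 ~ pi)
weighted_sums = np.empty(n_iter)
for j in range(n_iter):
    props = q_scale * rng.standard_normal(M)
    xs = np.concatenate(([x0], props))      # current state plus M proposals
    logw = log_pi(xs) - log_q(xs)           # importance-type weights
    w = np.exp(logw - logw.max())
    w /= w.sum()
    weighted_sums[j] = np.dot(w, h(xs))     # waste-recycled estimate, one iteration
    x0 = xs[rng.choice(M + 1, p=w)]         # propagate the chain (N = 1 in Algorithm 1)

print("weighted waste-recycling estimate:", weighted_sums.mean())
```

Run long enough, the printed weighted average should be close to \(E_\pi h = 1\), consistent with the unbiasedness shown above.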

A.2 Proof of Theorem 2

Proof of Theorem 2

We first explain the Rao–Blackwellization underlying LWPMCMC, as opposed to pMCMC, and then prove the variance reduction.

The Rao–Blackwellization is with respect to the following hypothetical one-step MCMC move, which uses the pMCMC rule of Calderhead (2014) in Algorithm 1, from \(x_0^{(n)}\) to a hypothetical new point \(x_*^{(n)}\):

  • Generate \(x_1^{(n)}, \ldots , x_M^{(n)}\) from proposing kernel \(K\left( x_0^{(n)}, \cdot \right) \),

  • Calculate acceptance probability \(r\left( x_0^{(n)}, x_i^{(n)} \right) \) for the i-th proposal from \(x_0^{(n)}\) as \(r\left( x_0^{(n)}, x_i^{(n)}\right) =w\left( x_i^{(n)}\right) \) using Eqs. (1) or (2),

  • Generate a uniform random variable \(U^{(n)} \sim \text {Unif}[0,1]\),

  • Set \(x_*^{(n)} = x_i^{(n)}\) if and only if \(\sum \nolimits _{k< i} w\left( x_k^{(n)}\right)< U^{(n)} < \sum \nolimits _{k \le i} w\left( x_k^{(n)}\right) \).

Here, the hypothetical new point \(x_*^{(n)}\) is in fact \(y^{(n)}_1\) in Algorithm 1 when \(N=1\). Equation (13) shows that the detailed balance condition of this one-step Markov chain is satisfied for the invariant distribution \(\pi (\cdot )\).

Due to the construction of transition kernel T, \(x_0^{(n)}\sim \pi (\cdot )\) while in equilibrium, and therefore \(x_*^{(n)}\) also follows \(\pi (\cdot )\). We can also write

$$\begin{aligned} x^{(n)}_* = \sum _{i=0}^M x_i^{(n)}\mathbb {I}_{\sum _{k< i} w\left( x_k^{(n)}\right)< U^{(n)} < \sum _{k \le i} w\left( x_k^{(n)}\right) } . \end{aligned}$$

Therefore, for any integrable function \(h(\cdot )\),

$$\begin{aligned}&E_\pi h = E\left[ h\left( x^{(n)}_*\right) \right] = E\left[ E\left[ h\left( x^{(n)}_*\right) |\varvec{x}^{(n)}\right] \right] \\&E\left[ h\left( x^{(n)}_*\right) \big |\varvec{x}^{(n)}\right] \\&\quad =\int _{u^{(n)}} \left\{ \sum _{i=0}^M h\left( x_i^{(n)}\right) \mathbb {I}_{\sum _{k< i} w\left( x_k^{(n)}\right)< u^{(n)} < \sum _{k \le i} w\left( x_k^{(n)}\right) }\right\} \hbox {d}u^{(n)} \\&\quad = \sum _{i=0}^M h\left( x_i^{(n)}\right) w\left( x_i^{(n)}\right) . \end{aligned}$$

The last equation demonstrates that the Rao–Blackwellization of \(U^{(n)}\) is with respect to the hypothetical one-step Markov move from \(x_0^{(n)}\) to \(x_*^{(n)}\), not with respect to the original MCMC propagation kernel T that governs the move from \(x_0^{(n)}\) to \(x_0^{(n+1)}\). Therefore, the propagation (transition) kernel does not need to be the same as that used for the weighting scheme for the Rao–Blackwellization to hold.
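
The following hedged sketch illustrates the two objects just discussed: the one-step selection of \(x_*^{(n)}\) from a uniform \(U^{(n)}\) via the rule in the bullet list above, and its Rao–Blackwellization, the conditional expectation \(\sum _i w(x_i) h(x_i)\). The points and weights below are illustrative placeholders rather than output of Eqs. (1) or (2).

```python
# Hedged sketch: the uniform-based selection rule and its Rao-Blackwellized
# (conditional-expectation) counterpart for one set of points and weights.
import numpy as np

rng = np.random.default_rng(1)
xs = np.array([0.3, -1.2, 0.8, 2.1])        # x_0 and M = 3 proposals (illustrative)
w = np.array([0.4, 0.1, 0.3, 0.2])          # weights summing to one (illustrative)
h = lambda x: x**2

def one_step_move(xs, w, u):
    """Select x_* = x_i iff sum_{k<i} w_k < u < sum_{k<=i} w_k."""
    idx = np.searchsorted(np.cumsum(w), u)
    return xs[min(idx, len(xs) - 1)]        # guard against float round-off at the top

# Averaging over many uniforms U reproduces the Rao-Blackwellized value:
us = rng.uniform(size=100_000)
mc = np.mean([h(one_step_move(xs, w, u)) for u in us])
rb = np.dot(w, h(xs))                       # E[h(x_*) | x] = sum_i w_i h(x_i)
print(mc, rb)                               # the two numbers should agree closely
```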

We next prove the variance reduction. Algorithm 1 can be viewed as repeatedly sampling \(U^{(j)}_i\) in each iteration \(j\) and collecting a point \(y^{(j)}_i\) for each sampled \(U^{(j)}_i\), \(i=1,\ldots ,N\). Let \(x^{(j)} = \left\{ x^{(j)}_0,\dots ,x^{(j)}_M \right\} \) and \(y^{(j)} = \left\{ y^{(j)}_1,\dots ,y^{(j)}_N \right\} \). Then

$$\begin{aligned}&\mathrm {var}\left\{ \frac{1}{nN} \sum _{j=1}^n \sum _{i=1}^N h\left( y^{(j)}_i \right) \right\} = \frac{1}{n^2N^2} \sum _{j=1}^n \mathrm {var}\left\{ \sum _{i=1}^N h\left( y_i^{(j)}\right) \right\} \\&\qquad +\, \frac{2}{n^2N^2}\sum _{\begin{array}{c} j=1\\ k=1\\ j < k \end{array}}^n \mathrm {cov}\left\{ \sum _{i=1}^N h\left( y_i^{(j)}\right) ,\sum _{i=1}^N h\left( y_i^{(k)}\right) \right\} . \end{aligned}$$

By the law of total variance, for the first of these terms we have

$$\begin{aligned}&\frac{1}{n^2N^2}\sum _{j=1}^n \mathrm {var}\left\{ \sum _{i=1}^N h\left( y_i^{(j)}\right) \right\} \\&\quad \ge \frac{1}{n^2N^2}\sum _{j=1}^n \mathrm {var}\left[ E\left\{ \sum _{i=1}^N h\left( y_i^{(j)}\right) \mid x^{(j)} \right\} \right] \\&\quad = \frac{1}{n^2} \sum _{j=1}^n \mathrm {var}\left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} . \end{aligned}$$

For the second, we will show that

$$\begin{aligned}&\frac{2}{n^2N^2} \mathrm {cov}\left\{ \sum _{i=1}^N h\left( y_i^{(j)} \right) ,\sum _{i=1}^N h\left( y_i^{(k)}\right) \right\} \\&\quad = \frac{2}{n^2} \mathrm {cov}\left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) ,\sum _{i=0}^M w\left( x_i^{(k)}\right) h\left( x_i^{(k)}\right) \right\} ,\quad \\&\qquad (j \ne k). \end{aligned}$$

Assume \(j<k\) without loss of generality. By the law of total covariance, we then have

$$\begin{aligned}&\mathrm {cov}\left\{ \frac{1}{N}\sum _{i=1}^N h\left( y_i^{(j)}\right) ,\frac{1}{N}\sum _{i=1}^N h\left( y_i^{(k)}\right) \right\} \\&\quad = \mathrm {cov}\left[ \frac{1}{N}E\left\{ \sum _{i=1}^N h\left( y_i^{(j)} \right) \mid x^{(j)}, x^{(k)}\right\} ,\right. \\&\quad \quad \left. \frac{1}{N}E\left\{ \sum _{i=1}^N h\left( y_i^{(k)} \right) \mid x^{(j)}, x^{(k)}\right\} \right] \\&\quad \quad + E\left[ \mathrm {cov}\left\{ \frac{1}{N}\sum _{i=1}^N h\left( y_i^{(j)}\right) ,\frac{1}{N}\sum _{i=1}^N h\left( y_i^{(k)}\right) \mid x^{(j)}, x^{(k)}\right\} \right] \\&\quad = \mathrm {cov}\left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) ,\sum _{i=0}^M w\left( x_i^{(k)}\right) h\left( x_i^{(k)}\right) \right\} , \end{aligned}$$

where the last equality follows from the conditional independence of \(y^{(j)}\) from all other samples and proposals given \(x^{(j)}\). Combining these results, we conclude that

$$\begin{aligned}&\mathrm {var}\left\{ \frac{1}{nN} \sum _{j=1}^n \sum _{i=1}^N h\left( y^{(j)}_i\right) \right\} \\&\quad \ge \mathrm {var}\left\{ \frac{1}{n} \sum _{j=1}^n \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x^{(j)}_i\right) \right\} . \end{aligned}$$

\(\square \)
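
As an informal check of Theorem 2, the sketch below compares, over repeated runs, the variance of the resampling-based pMCMC estimator with that of the weighted LWPMCMC estimator. It reuses the toy setup of the sketch following Theorem 1 (standard normal target, independence proposal, importance-type weights), all of which are our own illustrative assumptions rather than the paper's examples.

```python
# Hedged simulation: variance of the resampled (pMCMC) estimator versus the
# weighted (LWPMCMC) estimator over repeated short runs.
import numpy as np

rng = np.random.default_rng(2)
M, N, n_iter, n_rep = 8, 8, 500, 200
log_pi = lambda x: -0.5 * x**2              # target: standard normal (unnormalized)
q_scale = 2.0                               # independence proposal: N(0, q_scale^2)
log_q = lambda x: -0.5 * (x / q_scale)**2
h = lambda x: x**2

est_pmcmc, est_lwp = np.empty(n_rep), np.empty(n_rep)
for r in range(n_rep):
    x0, resampled, weighted = rng.standard_normal(), [], []
    for _ in range(n_iter):
        xs = np.concatenate(([x0], q_scale * rng.standard_normal(M)))
        logw = log_pi(xs) - log_q(xs)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        ys = xs[rng.choice(M + 1, size=N, p=w)]   # pMCMC: resample N points
        resampled.append(h(ys).mean())
        weighted.append(np.dot(w, h(xs)))         # LWPMCMC: weighted average
        x0 = ys[0]                                # propagate the chain
    est_pmcmc[r], est_lwp[r] = np.mean(resampled), np.mean(weighted)

print("var(pMCMC)  :", est_pmcmc.var())           # should not be smaller, per Theorem 2
print("var(LWPMCMC):", est_lwp.var())
```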

A.3 Derivation of generalized ESS

We derive formula (9) here. Recall that \(\mathrm{{ESS}}_{lwp}\) is defined in the main text as \(\mathrm{{ESS}}_{lwp}= \frac{\sigma ^2}{\mathrm {var}(\hat{\mu }^{lwp})}\). Therefore,

$$\begin{aligned} \frac{\sigma ^2}{\mathrm{{ESS}}_{lwp}}&= \mathrm {var}\left\{ \frac{1}{n}\sum _{j=1}^n \bar{x}^{(j)}\right\} \\&= \frac{1}{n^2}\sum _{j=1}^n \mathrm {var}(\bar{x}^{(j)}) +\frac{2}{n^2}\sum _{j<k}^n\mathrm {cov}(\bar{x}^{(j)},\bar{x}^{(k)})\\&= \frac{1}{n} \mathrm {var}(\bar{x}) +\frac{2}{n}\sum _{k=1}^{n-1}\left( 1-\frac{k}{n}\right) \gamma _k \mathrm {var}(\bar{x}) \\&= \frac{\mathrm {var}(\bar{x})}{n}\left\{ 1 + 2\sum _{k=1}^{n-1}\left( 1-\frac{k}{n}\right) \gamma _k \right\} , \end{aligned}$$

where the second equality follows from stationarity. Using the Cesàro summability result

$$\begin{aligned} \lim _{n\rightarrow \infty } \sum _{k=1}^{n-1}\left( 1-\frac{k}{n}\right) \gamma _k = \sum _k\gamma _k \end{aligned}$$

and rearranging the terms, we obtain the desired result:

$$\begin{aligned} \frac{\sigma ^2}{\mathrm {var}(\hat{\mu }^{lwp})} = \frac{n\sigma ^2}{{\mathrm {var}(\bar{x})}\left( 1+2\sum _{k}\gamma _k \right) } \end{aligned}$$

for sufficiently large n.
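
A minimal sketch of a moment estimator for the generalized ESS in (9) is given below. It takes the per-iteration weighted averages \(\bar{x}^{(j)}\) and an estimate of \(\sigma ^2\) as inputs; the finite lag truncation and the plug-in autocorrelation estimates are simplifying assumptions of ours, not the paper's exact estimator.

```python
# Hedged sketch: moment estimator of the generalized ESS in formula (9),
# ESS = n * sigma^2 / { var(xbar) * (1 + 2 * sum_k gamma_k) }.
import numpy as np

def generalized_ess(xbar, sigma2, max_lag=200):
    """xbar: within-iteration weighted averages; sigma2: variance of h(x) under pi."""
    xbar = np.asarray(xbar, dtype=float)
    n = len(xbar)
    xc = xbar - xbar.mean()
    var_xbar = xc.var()
    # empirical lag-k autocorrelations gamma_k of the xbar series, k = 1, ..., max_lag
    lags = range(1, min(max_lag, n - 1) + 1)
    gammas = np.array([np.dot(xc[:-k], xc[k:]) / (n * var_xbar) for k in lags])
    return n * sigma2 / (var_xbar * (1.0 + 2.0 * gammas.sum()))
```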

A.4 Derivation of efficiency gain

From the derivations in the proof of Theorem 2 in Appendix A.2, we have

$$\begin{aligned}&\mathrm {var}\left\{ \frac{1}{nN} \sum _{j=1}^n \sum _{i=1}^N h\left( y^{(j)}_i\right) \right\} \\&\qquad - \mathrm {var}\left\{ \frac{1}{n} \sum _{j=1}^n \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x^{(j)}_i\right) \right\} \\&\quad = \frac{1}{n^2 N^2} \sum _{j=1}^{n} E\left[ \mathrm {var}\left\{ \sum _{i=1}^{N} h\left( y_i^{(j)}\right) \mid x^{(j)}\right\} \right] \\&\quad = \frac{1}{n^2 N} \sum _{j=1}^{n} E\left[ \mathrm {var}\left\{ h\left( y_1^{(j)}\right) \mid x^{(j)}\right\} \right] \\&\quad = \frac{1}{n^2 N} \sum _{j=1}^{n} E\left[ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) ^2 \right. \\&\left. \qquad -\, \left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} ^2 \right] . \end{aligned}$$

Therefore, the relative efficiency gain from using locally weighted parallel Markov chain Monte Carlo (LWPMCMC) instead of parallel Markov chain Monte Carlo (pMCMC) is

$$\begin{aligned}&\frac{\mathrm {var}\left\{ \frac{1}{nN} \sum \nolimits _{j=1}^n \sum \nolimits _{i=1}^N h\left( y^{(j)}_i\right) \right\} - \mathrm {var}\left\{ \frac{1}{n} \sum \nolimits _{j=1}^n \sum \nolimits _{i=0}^M w\left( x_i^{(j)}\right) h\left( x^{(j)}_i\right) \right\} }{\mathrm {var}\left\{ \frac{1}{n} \sum \nolimits _{j=1}^n \sum \nolimits _{i=0}^M w\left( x_i^{(j)}\right) h\left( x^{(j)}_i\right) \right\} }\\&= \frac{\sum _{j=1}^{n} E\left[ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) ^2 - \left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} ^2 \right] }{N\sum _{j=1}^n \mathrm {var}\left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} } \\&= \frac{ E({h^2}) - E\{(\bar{h})^2\}}{N\ {\mathrm {var}}(\bar{h})}. \end{aligned}$$

where the last equality holds because, at equilibrium, for \(j=1,\ldots ,n\),

$$\begin{aligned}&{\mathrm {var}}(\bar{h}) = E \left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) - E({h})\right\} ^2, \\&E({h}) = E \left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} , \\&E({h^2}) = E \left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) ^2\right\} , \\&E\{(\bar{h})^2\} = E\left\{ \sum _{i=0}^M w\left( x_i^{(j)}\right) h\left( x_i^{(j)}\right) \right\} ^2. \end{aligned}$$

Thus, we can obtain moment estimators of \(E({h^2}), E\{(\bar{h})^2\}, \mathrm {var}(\bar{h})\) using all the proposals and weights.
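
The sketch below illustrates one such moment estimator of the relative efficiency gain \(\{E(h^2)-E((\bar{h})^2)\}/\{N\,\mathrm {var}(\bar{h})\}\). The array names and shapes (one row of \(M+1\) weights and function values per iteration) are assumptions made for illustration.

```python
# Hedged sketch: moment estimators of E(h^2), E[(hbar)^2] and var(hbar) from
# the stored weights and function values of all proposals, and the resulting
# relative efficiency gain of LWPMCMC over pMCMC.
import numpy as np

def efficiency_gain(W, H, N):
    """W, H: (n, M+1) arrays of weights w(x_i^(j)) and values h(x_i^(j));
    N is the number of resampled points per iteration in pMCMC."""
    W, H = np.asarray(W, dtype=float), np.asarray(H, dtype=float)
    hbar = (W * H).sum(axis=1)                # per-iteration weighted averages
    E_h2 = (W * H**2).sum(axis=1).mean()      # moment estimate of E(h^2)
    E_hbar2 = np.mean(hbar**2)                # moment estimate of E[(hbar)^2]
    var_hbar = hbar.var()                     # moment estimate of var(hbar)
    return (E_h2 - E_hbar2) / (N * var_hbar)
```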

B Supplementary materials

Please refer to the online supplementary materials for more examples of LWPMCMC algorithms and numerical results.

About this article

Cite this article

Yang, S., Chen, Y., Bernton, E. et al. On parallelizable Markov chain Monte Carlo algorithms with waste-recycling. Stat Comput 28, 1073–1081 (2018). https://doi.org/10.1007/s11222-017-9780-4
