Abstract
Parallelizable Markov chain Monte Carlo (MCMC) generates multiple proposals at each MCMC iteration and parallelizes the evaluations of the likelihood function across different cores. Inspired by Calderhead (Proc Natl Acad Sci 111(49):17408–17413, 2014), we introduce a general ‘waste-recycling’ framework for parallelizable MCMC, under which we show that using weighted samples from waste-recycling is preferable to resampling in terms of both statistical and computational efficiency. We also provide a simple-to-use criterion, the generalized effective sample size, for evaluating the efficiency of parallelizable MCMC algorithms; it applies to both the waste-recycling and the vanilla versions. A moment estimator of the generalized effective sample size is provided and shown by simulations to be reasonably accurate.

References
Andrews, D.W.K.: Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica 59(3), 817–858 (1991)
Calderhead, B.: A general construction for parallelizing Metropolis-Hastings algorithms. Proc. Natl. Acad. Sci. 111(49), 17408–17413 (2014)
Chen, L.Y., Qin, Z., Liu, J.S.: Exploring hybrid Monte Carlo in Bayesian computation. In: Bayesian Methods with Applications to Science, Policy and Official Statistics: Selected Papers from ISBA 2000, pp. 71–80 (2001)
Delmas, J.F., Jourdain, B.: Does waste recycling really improve the multi-proposal Metropolis–Hastings algorithm? An analysis based on control variates. J. Appl. Probab. 46(4), 938–959 (2009)
Douc, R., Robert, C.P.: A vanilla Rao–Blackwellization of Metropolis–Hastings algorithms. Ann. Stat. 39(1), 261–277 (2011)
Duane, S., Kennedy, A.D., Pendleton, B.J., Roweth, D.: Hybrid Monte Carlo. Phys. Lett. B 195(2), 216–222 (1987)
Frenkel, D.: Speed-up of Monte Carlo simulations by sampling of rejected states. Proc. Natl. Acad. Sci. 101(51), 17571–17575 (2004)
Frenkel, D.: Waste-recycling Monte Carlo. In: Ferrario, M., Ciccotti, G., Binder, K. (eds.) Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology, vol. 1. Lecture Notes in Physics, vol. 703, pp. 127–137. Springer, Berlin (2006). doi:10.1007/3-540-35273-2_4
Gelman, A., Meng, X.: A note on bivariate distributions that are conditionally normal. Am. Stat. 45(2), 125–126 (1991)
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., Rubin, D.: Bayesian Data Analysis, 3rd edn. Chapman and Hall, Boca Raton (2013)
Hastings, W.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Kass, R.E., Carlin, B.P., Gelman, A., Neal, R.M.: Markov chain Monte Carlo in practice: a roundtable discussion. Am. Stat. 52(2), 93–100 (1998)
Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Berlin (2001)
Liu, J.S., Liang, F., Wong, W.H.: The multiple-try method and local optimization in Metropolis sampling. J. Am. Stat. Assoc. 95(449), 121–134 (2000)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
Müller, U.K.: HAC corrections for strongly autocorrelated time series. J. Bus. Econ. Stat. 32, 311–322 (2014)
Neal, R.M.: An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111, 194–203 (1994)
Qin, Z.S., Liu, J.S.: Multipoint metropolis method with application to hybrid Monte Carlo. J. Comput. Phys. 172, 827–840 (2001)
Ritter, C., Tanner, M.A.: Facilitating the Gibbs sampler: the gibbs stopper and the Griddy–Gibbs sampler. J. Am. Stat. Assoc. 87(419), 861–868 (1992)
Roberts, G.O., Tweedie, R.L.: Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2(4), 341–363 (1996)
Tjelmeland, H.: Using all Metropolis–Hastings proposals to estimate mean values. Technical Reports, Norwegian University of Science and Technology. Trondheim, Norway (2004)
Acknowledgements
JSL’s research was supported in part by the National Science Foundation grant DMS-1613035.
Appendices
A Proofs and derivations
1.1 A.1 Proof of Theorem 1
Proof of Theorem 1
For notational simplicity, in this proof we use \(\varvec{x}\) and \(d\varvec{x}\) to represent \((x_0,\ldots ,x_M)\) and \(dx_0\ldots dx_M\), respectively.
Suppose \(x_0 \sim \pi \) and we attach to \(x_0\) a randomly sampled vector \((x_1,\ldots ,x_M)\) drawn from an arbitrary Markov transition kernel \(K(x_0, \cdot )\). With our proposed weights, obtained from the pMCMC acceptance probability \(r(x_0, x_i) = w(x_i)\) defined in Eq. (1) or (2), the detailed balance condition
is satisfied under both Eq. (1) and Eq. (2). Then, for any integrable \(h(\cdot )\), we can express the mean of the weighted-average estimate as:
where (*) follows from Eq. (13), (**) follows from swapping the dummy integration variables \(x_0\) and \(x_i\) in the first part of the equation, and the last equality follows from noting that \(\sum _{i=1}^M r(x_0, x_i)+w(x_0) \equiv 1\) under both Eq. (1) and Eq. (2). This proves that the proposed weights give unbiased posterior estimates when the Markov chain is in equilibrium. \(\square \)
1.2 A.2 Proof of Theorem 2
Proof of Theorem 2
We first explain the Rao–Blackwellization of the LWPMCMC as opposed to pMCMC and then prove the variance reduction.
The Rao–Blackwellization is with respect to the following hypothetical one-step MCMC move using the pMCMC rule of Calderhead (2014) in Algorithm 1, from \(x_0^{(n)}\) to a hypothetical new point \(x_*^{(n)}\):
- Generate \(x_1^{(n)}, \ldots , x_M^{(n)}\) from the proposing kernel \(K\left( x_0^{(n)}, \cdot \right) \),
- Calculate the acceptance probability of the i-th proposal from \(x_0^{(n)}\) as \(r\left( x_0^{(n)}, x_i^{(n)}\right) =w\left( x_i^{(n)}\right) \) using Eq. (1) or (2),
- Generate a uniform random variable \(U^{(n)} \sim \text {Unif}[0,1]\),
- Set \(x_*^{(n)} = x_i^{(n)}\) if and only if \(\sum \nolimits _{k< i} w\left( x_k^{(n)}\right)< U^{(n)} < \sum \nolimits _{k \le i} w\left( x_k^{(n)}\right) \).
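The selection step above is an inverse-CDF lookup over the weights. A minimal sketch in Python (illustrative only; the function name `select_proposal` and the weight values are ours, and we assume the weights over the current state and the M proposals sum to 1, as in Eqs. (1) and (2)):

```python
import numpy as np

def select_proposal(weights, u):
    """Return the index i with sum_{k<i} w_k < u < sum_{k<=i} w_k.

    `weights` covers the current state and the M proposals and is
    assumed to sum to 1; `u` is a Unif[0,1] draw.
    """
    cdf = np.cumsum(weights)
    return int(np.searchsorted(cdf, u))

# With weights (0.2, 0.5, 0.3), u = 0.6 falls in the second bin
# (0.2 < 0.6 < 0.7), so index 1 is selected.
idx = select_proposal(np.array([0.2, 0.5, 0.3]), 0.6)
```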
Here, the hypothetical new point \(x_*^{(n)}\) is in fact \(y^{(n)}_1\) in Algorithm 1 when \(N=1\). Equation (13) shows that this one-step Markov chain satisfies detailed balance with respect to the invariant distribution \(\pi (\cdot )\).
Due to the construction of the transition kernel T, \(x_0^{(n)}\sim \pi (\cdot )\) at equilibrium, and therefore \(x_*^{(n)}\) also follows \(\pi (\cdot )\). We can also write
Therefore, for any integrable function \(h(\cdot )\),
The last equation demonstrates that the Rao–Blackwellization of \(U^{(n)}\) is with respect to the hypothetical one-step Markov move from \(x_0^{(n)}\) to \(x_*^{(n)}\), not with respect to the original MCMC propagation kernel T that governs the move from \(x_0^{(n)}\) to \(x_0^{(n+1)}\). Consequently, the propagation (transition) kernel need not be the same as the kernel used in the weighting scheme for the Rao–Blackwellization to hold.
We next prove the variance reduction. Algorithm 1 can be thought of as repeatedly sampling \(U^{(j)}_i\) in each iteration j and collecting the point \(y^{(j)}_i\) associated with each sampled \(U^{(j)}_i\), for \(i=1,\ldots ,N\). Let \(x^{(j)} = \left\{ x^{(j)}_0,\dots ,x^{(j)}_M \right\} \) and \(y^{(j)} = \left\{ y^{(j)}_1,\dots ,y^{(j)}_N \right\} \).
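The contrast between the two estimators can be sketched numerically for a single iteration (illustrative only; the proposal values, weights, and test function h below are hypothetical): the pMCMC resampling estimator averages h over N multinomial draws, while the locally weighted estimator replaces each draw by its conditional expectation \(\sum _i w_i h(x_i)\), which conditional on \(x^{(j)}\) has zero variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical proposals (including the current state) and their
# weights for one iteration j; h is an arbitrary test function.
x = np.array([-1.0, 0.3, 1.2, 2.5])
w = np.array([0.1, 0.4, 0.3, 0.2])   # sums to 1, as in Eqs. (1)-(2)
h = lambda t: t ** 2

N = 5
# pMCMC: resample N points y_1,...,y_N according to the weights,
# then average h over the resampled points.
draws = rng.choice(x, size=N, p=w)
resampled_avg = h(draws).mean()

# LWPMCMC: Rao-Blackwellize the uniforms away, i.e. take the
# conditional expectation E[h(y) | x] = sum_i w_i h(x_i).
weighted_avg = np.sum(w * h(x))

# Conditional on x, weighted_avg is deterministic, while
# resampled_avg fluctuates around it across repetitions.
```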
By the law of total variance, for the first of these terms we have
For the second, we will show that
Assume \(j<k\) without loss of generality. By the law of total covariance, we then have
where the last equality follows from the conditional independence structure of \(y^{(j)}\) from all the other samples and proposals given \(x^{(j)}\). Summarizing these results, we can conclude that
\(\square \)
1.3 A.3 Derivation of generalized ESS
We shall derive formula (9) here. Recall that \(\mathrm{{ESS}}_{lwp}\) is defined in the main text as \(\mathrm{{ESS}}_{lwp}= \frac{\sigma ^2}{\mathrm {var}(\hat{\mu }^{lwp})}\). Therefore,
where the second equality follows from stationarity. Using the Cesàro summability theorem
and rearranging the terms, we obtain the desired result:
for sufficiently large n.
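As a companion to this large-n derivation, here is a generic moment estimator of effective sample size based on truncated autocovariances (a sketch only: the function `ess`, the truncation rule, and the lag cap are our choices, and this is not the authors' formula (9) itself):

```python
import numpy as np

def ess(chain, max_lag=50):
    """Moment estimator of effective sample size,
    n / (1 + 2 * sum of lag-k autocorrelations),
    the familiar large-n form that a Cesaro-type argument yields.
    """
    chain = np.asarray(chain, dtype=float)
    n = len(chain)
    c = chain - chain.mean()
    var0 = np.dot(c, c) / n
    rho_sum = 0.0
    for k in range(1, min(max_lag, n - 1) + 1):
        rho = np.dot(c[:-k], c[k:]) / n / var0
        if rho < 0:          # simple positivity truncation
            break
        rho_sum += rho
    return n / (1.0 + 2.0 * rho_sum)

# For an AR(1) chain with autocorrelation phi, the estimate should be
# close to n * (1 - phi) / (1 + phi) when n is large.
```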
1.4 A.4 Derivation of efficiency gain
From the derivations in the proof of Theorem 2 in Appendix A.2, we have
Therefore, the relative efficiency gain from using locally weighted parallel Markov chain Monte Carlo instead of parallel Markov chain Monte Carlo is
where the last equality holds since, at equilibrium, for \(j=1,\ldots ,n\), and
Thus, we can obtain moment estimators of \(E({h^2})\), \(E\{(\bar{h})^2\}\), and \(\mathrm {var}(\bar{h})\) using all the proposals and weights.
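These moment estimators can be sketched as follows (illustrative only; the function `moment_estimates` and its array layout are our conventions, with row j holding the proposals \(x_0^{(j)},\ldots ,x_M^{(j)}\) and their weights):

```python
import numpy as np

def moment_estimates(X, W, h):
    """Estimate E(h^2), E(hbar^2), and var(hbar) from all proposals
    and weights.

    X, W are (n_iter, M+1) arrays: row j holds the proposals of
    iteration j and their weights, each row of W summing to 1.
    """
    H = h(X)
    hbar = np.sum(W * H, axis=1)             # per-iteration weighted mean
    Eh2 = np.mean(np.sum(W * H**2, axis=1))  # estimate of E(h^2)
    Ehbar2 = np.mean(hbar**2)                # estimate of E{(hbar)^2}
    var_hbar = Ehbar2 - np.mean(hbar)**2     # estimate of var(hbar)
    return Eh2, Ehbar2, var_hbar
```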
B Supplementary materials
Please refer to online supplementary materials for more examples of LWPMCMC algorithms and numerical results.
Yang, S., Chen, Y., Bernton, E. et al. On parallelizable Markov chain Monte Carlo algorithms with waste-recycling. Stat Comput 28, 1073–1081 (2018). https://doi.org/10.1007/s11222-017-9780-4