Abstract
The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm. In this paper, an alternative to the DA algorithm is proposed. It is shown that the modified Markov chain is always more efficient than DA, in the sense that the asymptotic variance in the central limit theorem under the alternative chain is no larger than that under DA. The modification is based on Peskun’s (Biometrika 60:607–612, 1973) result, which shows that the asymptotic variance of time-average estimators based on a finite state space, reversible Markov chain does not increase if the Markov chain is altered by increasing all of its off-diagonal transition probabilities. In the special case where the state space or the augmentation space of the DA chain is finite, it is shown that Liu’s (Biometrika 83:681–682, 1996) modified sampler can be used to improve upon the DA algorithm. Two illustrative examples, namely the beta-binomial distribution and a model for analyzing rank data, are used to demonstrate the efficiency gains of the proposed algorithms.
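As a rough illustration of the mechanism behind Liu’s (1996) modification (an editor’s sketch, not code from the paper): on a finite state space, the plain Gibbs step draws the next state directly from a conditional distribution \(f\), while Liu’s Metropolized version proposes a state different from the current one and accepts it so that every off-diagonal transition probability is at least as large as under plain Gibbs, which is exactly the condition in Peskun’s theorem.

```python
import numpy as np

def gibbs_kernel(f):
    """Plain Gibbs kernel on a finite space: from any state i the next
    state is drawn directly from f, so k(i, j) = f(j) for all i, j."""
    n = len(f)
    return np.tile(f, (n, 1))

def liu_modified_kernel(f):
    """Liu's (1996) Metropolized Gibbs kernel: from state i, propose
    j != i with probability f(j)/(1 - f(i)) and accept with probability
    min(1, (1 - f(i))/(1 - f(j))).  The resulting off-diagonal entry is
    f(j) * min(1/(1 - f(i)), 1/(1 - f(j))) >= f(j)."""
    n = len(f)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:
                K[i, j] = f[j] * min(1 / (1 - f[i]), 1 / (1 - f[j]))
        K[i, i] = 1.0 - K[i].sum()  # remaining mass stays on the diagonal
    return K

f = np.array([0.5, 0.3, 0.2])          # a toy conditional distribution
K, Kt = gibbs_kernel(f), liu_modified_kernel(f)

off = ~np.eye(3, dtype=bool)
assert np.allclose(f @ K, f) and np.allclose(f @ Kt, f)  # f invariant for both
assert np.all(Kt[off] >= K[off])       # Peskun ordering: Kt dominates K
```

Both kernels are reversible with respect to \(f\); since the modified kernel has uniformly larger off-diagonal entries, Peskun’s theorem gives it an asymptotic variance no larger than that of the plain Gibbs step.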
References
Brémaud P (1999) Markov chains Gibbs fields, Monte Carlo simulation, and queues. Springer, New York
Brook D, Upton GJG (1974) Biases in local government elections due to position on the ballot paper. Appl Stat 23:414–419
Casella G, George E (1992) Explaining the Gibbs sampler. Am Stat 46:167–174
Hobert JP (2011) The data augmentation algorithm: theory and methodology. In: Brooks S, Gelman A, Jones GL, Meng X-L (eds) Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton, pp 253–293
Hobert JP, Marchev D (2008) A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann Stat 36:532–554
Hobert JP, Roy V, Robert CP (2011) Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modelling. Stat Sci 26:332–351
Jones GL (2004) On the Markov chain central limit theorem. Probab Surv 1:299–320
Jones GL, Roberts GO, Rosenthal J (2014) Convergence of conditional Metropolis–Hastings samplers. Adv Appl Probab 46:422–445
Khare K, Hobert JP (2011) A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. Ann Stat 39:2585–2606
Kipnis C, Varadhan SRS (1986) Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Commun Math Phys 104:1–19
Laha A, Dutta S, Roy V (2013) A novel sandwich algorithm for empirical Bayes analysis of rank data. Tech. rep. Indian Institute of Management, Ahmedabad
Laha A, Dongaonkar S (2009) Bayesian analysis of rank data using SIR. In: Sengupta A (ed) Advances in multivariate statistical methods. World Scientific Publishers, Singapore, pp 327–335
Liu JS (1996) Peskun’s theorem and a modified discrete-state Gibbs sampler. Biometrika 83:681–682
Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika 81:27–40
Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94:1264–1274
Meng X-L, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86:301–320
Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, London
Mira A, Geyer CJ (1999) Ordering Monte Carlo Markov chains. Tech. rep. no. 632. School of Statistics, University of Minnesota
Peskun PH (1973) Optimum Monte Carlo sampling using Markov chains. Biometrika 60:607–612
R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0, http://www.R-project.org
Roberts GO, Rosenthal JS (2004) General state space Markov chains and MCMC algorithms. Probab Surv 1:20–71
Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98:169–177
Roy V (2014) Efficient estimation of the link function parameter in a robust Bayesian binary regression model. Comput Stat Data Anal 73:87–102
Rudin W (1991) Functional analysis, 2nd edn. McGraw-Hill, New York
Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82:528–550
Tierney L (1998) A note on Metropolis–Hastings kernels for general state spaces. Ann Appl Probab 8:1–9
van Dyk DA, Meng X-L (2001) The art of data augmentation (with discussion). J Comput Graph Stat 10:1–50
Acknowledgments
The author thanks Radford Neal for suggesting the algorithm presented in Sect. 3.2.1 for the case \(f_{X|Y}(x |y) > (\sqrt{5}-1)/2\). The author also thanks two reviewers for their suggestions that have improved the paper.
Appendices
1.1 Appendix A: Proof of Proposition 1
Proof
To show \(f_X (x)\) is invariant for \(\tilde{K}\), note that
where the third equality follows from (4).
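The display elided above presumably runs along the following standard lines, assuming (as in the usual DA setup) that \(\tilde{k}(x,x') = \int_{\mathsf {Y}} k_y(x,x') f_{Y|X}(y|x)\, dy\) and that (4) states the invariance of \(f_{X|Y}(\cdot |y)\) for \(k_y\); this is an editor’s sketch, not the paper’s display:

```latex
\begin{align*}
\int_{\mathsf{X}} f_X(x)\,\tilde{k}(x,x')\,dx
  &= \int_{\mathsf{X}}\int_{\mathsf{Y}} f_X(x)\, f_{Y|X}(y|x)\, k_y(x,x')\, dy\, dx\\
  &= \int_{\mathsf{Y}} f_Y(y) \int_{\mathsf{X}} f_{X|Y}(x|y)\, k_y(x,x')\, dx\, dy\\
  &= \int_{\mathsf{Y}} f_Y(y)\, f_{X|Y}(x'|y)\, dy\\
  &= f_X(x'),
\end{align*}
```

where the second equality uses \(f_X(x) f_{Y|X}(y|x) = f_Y(y) f_{X|Y}(x|y)\) and the third applies the invariance of \(f_{X|Y}(\cdot |y)\) for \(k_y\).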
Next assume that \(k_y\) is reversible with respect to \(f_{X|Y} (x|y)\), that is, (5) holds. Then
that is, \(\tilde{k}\) is reversible with respect to \(f_{X}\).
Since (6) is in force, for all \(A \in \mathbb {B}({\mathsf {X}})\) and for \(f_X\) almost all \(x \in {\mathsf {X}}\) we have
that is, \(\tilde{K} \succeq _P K\). \(\square \)
1.2 Appendix B: The Mtm \(K_{MDA}\) when \(p=2, c=2\)
We order the points in the state space as follows: \((\zeta _1,\zeta _1)\), \((\zeta _1,\zeta _2)\), \((\zeta _2,\zeta _1)\), and \((\zeta _2,\zeta _2)\). We denote the entries of \(K_{MDA}\) by \(\tilde{k}_{ij}\). So, for example, the element \(\tilde{k}_{23}\) is the probability of moving from \((\zeta _1,\zeta _2)\) to \((\zeta _2,\zeta _1)\). In order to write down the expressions for \(\tilde{k}_{ij}\) we need to introduce some notation. Recall that \(m_{ij}\) denotes the number of observations in the jth category with rank \(\zeta _i\) for \(i,j=1,2\). Let \(m_{i.}= m_{i1} + m_{i2}\) for \(i=1,2\), \(m_{d}= m_{11} + m_{22}\), and \(m_{od}= m_{12} + m_{21}\). Finally, for fixed \(w \in (0,1)\), let
and \(c=1/\hbox {B}(m_{1.} + a_1, m_{2.} + a_2)\). Recall that \(a_1, a_2\) are the hyperparameters of the prior of \(\theta \). Below, we provide the expressions for \(\tilde{k}_{1j}\), for \(j=1,\dots ,4\). The other rows of \(K_{MDA}\) can be found similarly. From Sect. 3.2 we know that
Straightforward calculations show that if \(m_{12} \ge m_{22}\) then
On the other hand, if \(m_{12} < m_{22}\) then
Simple calculations show that if \(m_{12} \ge m_{22}\), then
In the case of \(m_{12} < m_{22}\), the ranges of integration in the above two terms are interchanged. Similarly, we find that the expression for \(\tilde{k}_{13}\) depends on whether \(m_{11} \ge m_{21}\) or \(m_{11} < m_{21}\). If \(m_{11} \ge m_{21}\),
and the ranges of integration in the above two terms are interchanged when \(m_{11} < m_{21}\). Lastly, if \(m_{1.} \ge m_{2.}\),
where \(m = m_{1.} + m_{2.}\) is the number of observations and as before the ranges of integration are interchanged when \(m_{1.} < m_{2.}\). Finally, \(\tilde{k}_{11}\) is set to \(1-\sum _{j=2}^4 \tilde{k}_{1j}\).
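Whatever the case-by-case expressions evaluate to, the assembled \(K_{MDA}\) must be a stochastic matrix that is reversible with respect to the target. A small numerical sanity check of that kind can be sketched as follows (the matrix and target below are hypothetical placeholders, not the Appendix B entries):

```python
import numpy as np

def is_valid_reversible(K, f, tol=1e-10):
    """Check that K is a stochastic matrix (rows sum to one) satisfying
    detailed balance f(i) K[i, j] == f(j) K[j, i] with respect to f."""
    rows_ok = np.allclose(K.sum(axis=1), 1.0, atol=tol)
    flux = f[:, None] * K          # flux[i, j] = f(i) K[i, j]
    return bool(rows_ok and np.allclose(flux, flux.T, atol=tol))

# A hypothetical 2-state example (not the K_MDA entries of Appendix B):
f = np.array([0.6, 0.4])
K = np.array([[0.4, 0.6],
              [0.9, 0.1]])
assert is_valid_reversible(K, f)
```

Running such a check on the computed \(K_{MDA}\) (for given values of \(m_{ij}\), \(a_1\), \(a_2\), and \(w\)) guards against sign or integration-range errors in the case analysis above.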
Roy, V. Improving efficiency of data augmentation algorithms using Peskun’s theorem. Comput Stat 31, 709–728 (2016). https://doi.org/10.1007/s00180-015-0632-4