Improving efficiency of data augmentation algorithms using Peskun’s theorem

  • Original Paper
  • Computational Statistics

Abstract

The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm. In this paper, an alternative to the DA algorithm is proposed. It is shown that the modified Markov chain is always more efficient than DA in the sense that the asymptotic variance in the central limit theorem under the alternative chain is no larger than that under DA. The modification is based on Peskun’s (Biometrika 60:607–612, 1973) result, which shows that the asymptotic variance of time-average estimators based on a finite state space reversible Markov chain does not increase if the chain is altered by increasing all of its off-diagonal transition probabilities. In the special case when the state space or the augmentation space of the DA chain is finite, it is shown that Liu’s (Biometrika 83:681–682, 1996) modified sampler can be used to improve upon the DA algorithm. Two illustrative examples, namely the beta-binomial distribution and a model for analyzing rank data, are used to show the gains in efficiency obtained by the proposed algorithms.
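As a quick numerical illustration of the Peskun ordering that drives the result, the following minimal sketch (ours, not from the paper; the chain \(P\), the target \(\pi \), the scaling factor, and the test function \(g\) are arbitrary illustrative choices) compares the CLT asymptotic variances of a small reversible chain and a modification that moves diagonal mass to the off-diagonal entries while preserving reversibility.

```python
import numpy as np

def asym_var(P, pi, g):
    """CLT asymptotic variance of time averages of g(X_n) for a chain with
    transition matrix P and stationary distribution pi, computed via the
    fundamental matrix Z = (I - P + 1 pi^T)^{-1}."""
    gc = g - pi @ g                                   # center g under pi
    Z = np.linalg.inv(np.eye(len(pi)) - P + np.outer(np.ones(len(pi)), pi))
    return gc @ np.diag(pi) @ (2 * Z - np.eye(len(pi))) @ gc

# A toy chain, reversible with respect to pi (detailed balance holds).
pi = np.array([0.2, 0.5, 0.3])
P = np.array([[0.70, 0.20, 0.10],
              [0.08, 0.80, 0.12],
              [1/15, 0.20, 11/15]])

# Peskun modification: scale every off-diagonal entry up (here by 1.5, which
# each diagonal entry can absorb); uniform scaling preserves detailed
# balance, so pi remains the stationary distribution.
Pt = 1.5 * P
np.fill_diagonal(Pt, 0.0)
np.fill_diagonal(Pt, 1.0 - Pt.sum(axis=1))

g = np.array([1.0, -2.0, 3.0])                        # arbitrary test function
print(asym_var(P, pi, g))    # asymptotic variance under the original chain
print(asym_var(Pt, pi, g))   # never larger than the value above
```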


References

  • Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New York
  • Brook D, Upton GJG (1974) Biases in local government elections due to position on the ballot paper. Appl Stat 23:414–419
  • Casella G, George E (1992) Explaining the Gibbs sampler. Am Stat 46:167–174
  • Hobert JP (2011) The data augmentation algorithm: theory and methodology. In: Brooks S, Gelman A, Jones GL, Meng X-L (eds) Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton, pp 253–293
  • Hobert JP, Marchev D (2008) A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann Stat 36:532–554
  • Hobert JP, Roy V, Robert CP (2011) Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modelling. Stat Sci 26:332–351
  • Jones GL (2004) On the Markov chain central limit theorem. Probab Surv 1:299–320
  • Jones GL, Roberts GO, Rosenthal J (2014) Convergence of conditional Metropolis–Hastings samplers. Adv Appl Probab 46:422–445
  • Khare K, Hobert JP (2011) A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. Ann Stat 39:2585–2606
  • Kipnis C, Varadhan SRS (1986) Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Commun Math Phys 104:1–19
  • Laha A, Dutta S, Roy V (2013) A novel sandwich algorithm for empirical Bayes analysis of rank data. Tech. rep., Indian Institute of Management, Ahmedabad
  • Laha A, Dongaonkar S (2009) Bayesian analysis of rank data using SIR. In: Sengupta A (ed) Advances in multivariate statistical methods. World Scientific Publishers, Singapore, pp 327–335
  • Liu JS (1996) Peskun’s theorem and a modified discrete-state Gibbs sampler. Biometrika 83:681–682
  • Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika 81:27–40
  • Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94:1264–1274
  • Meng X-L, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86:301–320
  • Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, London
  • Mira A, Geyer CJ (1999) Ordering Monte Carlo Markov chains. Tech. rep. no. 632, School of Statistics, University of Minnesota
  • Peskun PH (1973) Optimum Monte Carlo sampling using Markov chains. Biometrika 60:607–612
  • R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0, http://www.R-project.org
  • Roberts GO, Rosenthal JS (2004) General state space Markov chains and MCMC algorithms. Probab Surv 1:20–71
  • Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98:169–177
  • Roy V (2014) Efficient estimation of the link function parameter in a robust Bayesian binary regression model. Comput Stat Data Anal 73:87–102
  • Rudin W (1991) Functional analysis, 2nd edn. McGraw-Hill, New York
  • Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82:528–550
  • Tierney L (1998) A note on Metropolis–Hastings kernels for general state spaces. Ann Appl Probab 8:1–9
  • van Dyk DA, Meng X-L (2001) The art of data augmentation (with discussion). J Comput Graph Stat 10:1–50


Acknowledgments

The author thanks Radford Neal for suggesting the algorithm presented in Sect. 3.2.1 for the case \(f_{X|Y}(x |y) > (\sqrt{5}-1)/2\). The author also thanks two reviewers for their suggestions that have improved the paper.

Author information

Corresponding author

Correspondence to Vivekananda Roy.

Appendices

1.1 Appendix A: Proof of Proposition 1

Proof

To show \(f_X (x)\) is invariant for \(\tilde{K}\), note that

$$\begin{aligned} \int _{{\mathsf {X}}} \tilde{k}(x' | x) f_X (x) \mu (dx) &= \int _{{\mathsf {X}}} \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) f_X (x) \mu (dx)\\ &= \int _{{\mathsf {Y}}} \int _{{\mathsf {X}}} k_y (x'|x) f_{X | Y}(x | y) \mu (dx) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} f_{X | Y}(x' | y) f_Y (y) \nu (dy) = f_X(x'), \end{aligned}$$

where the third equality follows from (4).

Next assume that \(k_y\) is reversible with respect to \(f_{X|Y} (x|y)\), that is, (5) holds. Then

$$\begin{aligned} \tilde{k}(x' | x) f_X (x) &= \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) f_X (x) \\ &= \int _{{\mathsf {Y}}} k_y (x'|x) f_{X | Y}(x | y) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} k_y (x|x') f_{X | Y}(x' | y) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} k_y (x|x') f_{Y | X}(y | x') \nu (dy) f_X (x') = \tilde{k}(x | x') f_X(x'), \end{aligned}$$

that is, \(\tilde{k}\) is reversible with respect to \(f_{X}\).

Since (6) is in force, for all \(A \in \mathbb {B}({\mathsf {X}})\) and for \(f_X\) almost all \(x \in {\mathsf {X}}\) we have

$$\begin{aligned} \tilde{K} (x, A {\setminus } \{x\}) = \int _{A{\setminus } \{x\}} \tilde{k}(x' | x) \mu (dx') &= \int _{A{\setminus } \{x\}} \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) \mu (dx') \\ &\ge \int _{{\mathsf {Y}}} \int _{A{\setminus } \{x\}} f_{X | Y} (x' | y) \mu (dx') f_{Y | X}(y | x) \nu (dy)\\ &= K (x, A {\setminus } \{x\}), \end{aligned}$$

that is, \(\tilde{K} \succeq _P K\). \(\square \)
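When both \({\mathsf {X}}\) and \({\mathsf {Y}}\) are finite, the kernels in the proof become matrices and the conclusion \(\tilde{K} \succeq _P K\) can be checked directly. The sketch below is our illustration, not the paper's code: the joint pmf F and all variable names are assumed. It builds the DA kernel \(K\), forms \(\tilde{K}\) by Metropolizing each conditional draw in the spirit of Liu (1996), and verifies off-diagonal domination together with invariance of \(f_X\).

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary joint pmf f(x, y) on a 4 x 3 finite space (illustrative).
F = rng.random((4, 3))
F /= F.sum()
f_x, f_y = F.sum(axis=1), F.sum(axis=0)
f_xgy = F / f_y                # column y holds f_{X|Y}(. | y)
f_ygx = (F.T / f_x).T          # row x holds f_{Y|X}(. | x)
n = len(f_x)

# DA kernel: k(x'|x) = sum_y f_{X|Y}(x'|y) f_{Y|X}(y|x).
K = f_ygx @ f_xgy.T

# Modified kernel: for each fixed y, replace the iid draw from f_{X|Y}(.|y)
# by a Metropolized step that proposes x' != x with probability
# f_{X|Y}(x'|y)/(1 - f_{X|Y}(x|y)) and accepts with probability
# min(1, (1 - f_{X|Y}(x|y))/(1 - f_{X|Y}(x'|y))); the product simplifies
# to p[x'] / max(1 - p[x], 1 - p[x']).
K_tilde = np.zeros_like(K)
for y in range(len(f_y)):
    p = f_xgy[:, y]
    Ky = np.zeros((n, n))
    for x in range(n):
        for xp in range(n):
            if xp != x:
                Ky[x, xp] = p[xp] / max(1 - p[x], 1 - p[xp])
        Ky[x, x] = 1.0 - Ky[x].sum()
    K_tilde += f_ygx[:, y][:, None] * Ky

off = ~np.eye(n, dtype=bool)
assert np.all(K_tilde[off] >= K[off] - 1e-12)   # tilde{K} >=_P K
assert np.allclose(f_x @ K_tilde, f_x)          # f_X is still invariant
```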

1.2 Appendix B: The Mtm \(K_{MDA}\) when \(p=2, c=2\)

We order the points in the state space as follows: \((\zeta _1,\zeta _1)\), \((\zeta _1,\zeta _2)\), \((\zeta _2,\zeta _1)\), and \((\zeta _2,\zeta _2)\). We denote the entries of \(K_{MDA}\) by \(\tilde{k}_{ij}\). So, for example, the element \(\tilde{k}_{23}\) is the probability of moving from \((\zeta _1,\zeta _2)\) to \((\zeta _2,\zeta _1)\). In order to write down the expressions for \(\tilde{k}_{ij}\) we need to introduce some notation. Recall that \(m_{ij}\) denotes the number of observations in the jth category with rank \(\zeta _i\) for \(i,j=1,2\). Let \(m_{i.}= m_{i1} + m_{i2}\) for \(i=1,2\), \(m_{d}= m_{11} + m_{22}\), and \(m_{od}= m_{12} + m_{21}\). Finally, for fixed \(w \in (0,1)\), let

$$\begin{aligned} A(w) = [w^{m_{1.}}(1-w)^{m_{2.}}+w^{m_{2.}}(1-w)^{m_{1.}}+w^{m_{d}}(1-w)^{m_{od}}+w^{m_{od}}(1-w)^{m_{d}}], \end{aligned}$$

and \(c=1/\hbox {B}(m_{1.} + a_1, m_{2.} + a_2)\). Recall that \(a_1, a_2\) are the hyperparameters of the prior of \(\theta \). Below, we provide the expressions for \(\tilde{k}_{1j}\), for \(j=1,\dots ,4\). The other rows of \(K_{MDA}\) can be found similarly. From Sect. 3.2 we know that

$$\begin{aligned} \tilde{k}_{12} = \int _{S_{2}} \frac{p(\pi =(\zeta _1,\zeta _2) | \theta , y)}{1-p(\pi =(\zeta _1,\zeta _1) | \theta , y)} \, \min \bigg (1, \frac{1-p(\pi =(\zeta _1,\zeta _1) | \theta , y)}{1-p(\pi =(\zeta _1,\zeta _2) | \theta , y)} \bigg ) \, p(\theta | \pi =(\zeta _1,\zeta _1), y) \, d\theta . \end{aligned}$$

Straightforward calculations show that if \(m_{12} \ge m_{22}\) then

$$\begin{aligned} p(\pi =(\zeta _1,\zeta _1) | \theta , y) > p(\pi =(\zeta _1,\zeta _2) | \theta , y) \; \Leftrightarrow \; \theta _1 > 1/2. \end{aligned}$$

On the other hand, if \(m_{12} < m_{22}\) then

$$\begin{aligned} p(\pi =(\zeta _1,\zeta _1) | \theta , y) > p(\pi =(\zeta _1,\zeta _2) | \theta , y) \; \Leftrightarrow \; \theta _1 < 1/2. \end{aligned}$$

Simple calculations show that if \(m_{12} \ge m_{22}\), then

$$\begin{aligned} \tilde{k}_{12} &= c\bigg [\int _0^{1/2} \frac{w^{m_d + m_{1.} +a_1 -1} (1-w)^{m_{od} + m_{2.} +a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw\\ &\quad + \int _{1/2}^1 \frac{w^{m_d + m_{1.} +a_1-1} (1-w)^{m_{od} + m_{2.} + a_2 -1}}{A(w) - w^{m_{d}} (1-w)^{m_{od}}}\,dw \bigg ]. \end{aligned}$$

In the case of \(m_{12} < m_{22}\), the ranges of integration in the above two terms are interchanged. Similarly, we find that the expression for \(\tilde{k}_{13}\) depends on whether \(m_{11} \ge m_{21}\) or \(m_{11} < m_{21}\). If \(m_{11} \ge m_{21}\),

$$\begin{aligned} \tilde{k}_{13} &= c\bigg [\int _0^{1/2} \frac{w^{m_{od} + m_{1.} + a_1 -1} (1-w)^{m_{d} + m_{2.} + a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw\\ &\quad + \int _{1/2}^1 \frac{w^{m_{od} + m_{1.} + a_1 -1} (1-w)^{m_{d} + m_{2.} + a_2 -1}}{A(w) - w^{m_{od}} (1-w)^{m_{d}}}\,dw \bigg ], \end{aligned}$$

and the ranges of integration in the above two terms are interchanged when \(m_{11} < m_{21}\). Lastly, if \(m_{1.} \ge m_{2.}\),

$$\begin{aligned} \tilde{k}_{14} &= c\bigg [\int _0^{1/2} \frac{w^{m + a_1 -1} (1-w)^{m + a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw \\ &\quad + \int _{1/2}^1 \frac{w^{m +a_1 -1} (1-w)^{m +a_2 -1}}{A(w) - w^{m_{2.}} (1-w)^{m_{1.}}}\,dw \bigg ], \end{aligned}$$

where \(m = m_{1.} + m_{2.}\) is the number of observations and, as before, the ranges of integration are interchanged when \(m_{1.} < m_{2.}\). Finally, \(\tilde{k}_{11}\) is set to \(1-\sum _{j=2}^4 \tilde{k}_{1j}\).
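As a check on the formulas above, \(\tilde{k}_{12}\) can be evaluated by one-dimensional numerical quadrature. The sketch below uses made-up counts \(m_{ij}\) and hyperparameters \(a_1, a_2\) (illustrative values, not the paper's data); the function and variable names are ours.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as beta_fn

# Illustrative cell counts m_{ij} and Beta prior hyperparameters.
m11, m12, m21, m22 = 5, 3, 2, 4
a1, a2 = 1.0, 1.0

m1, m2 = m11 + m12, m21 + m22          # m_{1.}, m_{2.}
md, mo = m11 + m22, m12 + m21          # m_d, m_od
c = 1.0 / beta_fn(m1 + a1, m2 + a2)    # c = 1 / B(m_{1.}+a_1, m_{2.}+a_2)

def A(w):
    return (w**m1 * (1 - w)**m2 + w**m2 * (1 - w)**m1
            + w**md * (1 - w)**mo + w**mo * (1 - w)**md)

def num(w):                             # common numerator of tilde{k}_{12}
    return w**(md + m1 + a1 - 1) * (1 - w)**(mo + m2 + a2 - 1)

lo = lambda w: num(w) / (A(w) - w**m1 * (1 - w)**m2)
hi = lambda w: num(w) / (A(w) - w**md * (1 - w)**mo)

if m12 >= m22:
    k12 = c * (quad(lo, 0, 0.5)[0] + quad(hi, 0.5, 1)[0])
else:                                   # ranges of integration swap
    k12 = c * (quad(hi, 0, 0.5)[0] + quad(lo, 0.5, 1)[0])
print(k12)
```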


About this article


Cite this article

Roy, V. Improving efficiency of data augmentation algorithms using Peskun’s theorem. Comput Stat 31, 709–728 (2016). https://doi.org/10.1007/s00180-015-0632-4
