Improving efficiency of data augmentation algorithms using Peskun’s theorem

  • Original Paper
  • Computational Statistics

Abstract

The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm. In this paper, an alternative to the DA algorithm is proposed. It is shown that the modified Markov chain is always more efficient than DA in the sense that the asymptotic variance in the central limit theorem under the alternative chain is no larger than that under DA. The modification is based on Peskun’s (Biometrika 60:607–612, 1973) result, which shows that the asymptotic variance of time-average estimators based on a finite state space reversible Markov chain does not increase if the chain is altered by increasing all of its off-diagonal transition probabilities. In the special case when the state space or the augmentation space of the DA chain is finite, it is shown that Liu’s (Biometrika 83:681–682, 1996) modified sampler can be used to improve upon the DA algorithm. Two illustrative examples, namely the beta-binomial distribution and a model for analyzing rank data, are used to show the gains in efficiency obtained by the proposed algorithms.
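As a quick numerical illustration of the Peskun ordering that drives the result, the following minimal sketch (ours, not from the paper; the chain \(P\), the target \(\pi \), the scaling factor, and the test function \(g\) are arbitrary illustrative choices) compares the CLT asymptotic variances of a small reversible chain and a modification that moves diagonal mass to the off-diagonal entries while preserving reversibility.

```python
import numpy as np

def asym_var(P, pi, g):
    """CLT asymptotic variance of time averages of g(X_n) for a chain with
    transition matrix P and stationary distribution pi, computed via the
    fundamental matrix Z = (I - P + 1 pi^T)^{-1}."""
    gc = g - pi @ g                                   # center g under pi
    Z = np.linalg.inv(np.eye(len(pi)) - P + np.outer(np.ones(len(pi)), pi))
    return gc @ np.diag(pi) @ (2 * Z - np.eye(len(pi))) @ gc

# A toy chain, reversible with respect to pi (detailed balance holds).
pi = np.array([0.2, 0.5, 0.3])
P = np.array([[0.70, 0.20, 0.10],
              [0.08, 0.80, 0.12],
              [1/15, 0.20, 11/15]])

# Peskun modification: scale every off-diagonal entry up (here by 1.5, which
# each diagonal entry can absorb); uniform scaling preserves detailed
# balance, so pi remains the stationary distribution.
Pt = 1.5 * P
np.fill_diagonal(Pt, 0.0)
np.fill_diagonal(Pt, 1.0 - Pt.sum(axis=1))

g = np.array([1.0, -2.0, 3.0])                        # arbitrary test function
print(asym_var(P, pi, g))    # asymptotic variance under the original chain
print(asym_var(Pt, pi, g))   # never larger than the value above
```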


References

  • Brémaud P (1999) Markov chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, New York
  • Brook D, Upton GJG (1974) Biases in local government elections due to position on the ballot paper. Appl Stat 23:414–419
  • Casella G, George E (1992) Explaining the Gibbs sampler. Am Stat 46:167–174
  • Hobert JP (2011) The data augmentation algorithm: theory and methodology. In: Brooks S, Gelman A, Jones GL, Meng X-L (eds) Handbook of Markov chain Monte Carlo. CRC Press, Boca Raton, pp 253–293
  • Hobert JP, Marchev D (2008) A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann Stat 36:532–554
  • Hobert JP, Roy V, Robert CP (2011) Improving the convergence properties of the data augmentation algorithm with an application to Bayesian mixture modelling. Stat Sci 26:332–351
  • Jones GL (2004) On the Markov chain central limit theorem. Probab Surv 1:299–320
  • Jones GL, Roberts GO, Rosenthal J (2014) Convergence of conditional Metropolis–Hastings samplers. Adv Appl Probab 46:422–445
  • Khare K, Hobert JP (2011) A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants. Ann Stat 39:2585–2606
  • Kipnis C, Varadhan SRS (1986) Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Commun Math Phys 104:1–19
  • Laha A, Dutta S, Roy V (2013) A novel sandwich algorithm for empirical Bayes analysis of rank data. Tech. rep., Indian Institute of Management, Ahmedabad
  • Laha A, Dongaonkar S (2009) Bayesian analysis of rank data using SIR. In: Sengupta A (ed) Advances in multivariate statistical methods. World Scientific Publishers, Singapore, pp 327–335
  • Liu JS (1996) Peskun’s theorem and a modified discrete-state Gibbs sampler. Biometrika 83:681–682
  • Liu JS, Wong WH, Kong A (1994) Covariance structure of the Gibbs sampler with applications to comparisons of estimators and augmentation schemes. Biometrika 81:27–40
  • Liu JS, Wu YN (1999) Parameter expansion for data augmentation. J Am Stat Assoc 94:1264–1274
  • Meng X-L, van Dyk DA (1999) Seeking efficient data augmentation schemes via conditional and marginal augmentation. Biometrika 86:301–320
  • Meyn SP, Tweedie RL (1993) Markov chains and stochastic stability. Springer, London
  • Mira A, Geyer CJ (1999) Ordering Monte Carlo Markov chains. Tech. rep. no. 632, School of Statistics, University of Minnesota
  • Peskun PH (1973) Optimum Monte Carlo sampling using Markov chains. Biometrika 60:607–612
  • R Development Core Team (2011) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna. ISBN 3-900051-07-0, http://www.R-project.org
  • Roberts GO, Rosenthal JS (2004) General state space Markov chains and MCMC algorithms. Probab Surv 1:20–71
  • Rosenthal JS (2003) Asymptotic variance and convergence rates of nearly-periodic Markov chain Monte Carlo algorithms. J Am Stat Assoc 98:169–177
  • Roy V (2014) Efficient estimation of the link function parameter in a robust Bayesian binary regression model. Comput Stat Data Anal 73:87–102
  • Rudin W (1991) Functional analysis, 2nd edn. McGraw-Hill, New York
  • Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82:528–550
  • Tierney L (1998) A note on Metropolis–Hastings kernels for general state spaces. Ann Appl Probab 8:1–9
  • van Dyk DA, Meng X-L (2001) The art of data augmentation (with discussion). J Comput Graph Stat 10:1–50


Acknowledgments

The author thanks Radford Neal for suggesting the algorithm presented in Sect. 3.2.1 for the case \(f_{X|Y}(x |y) > (\sqrt{5}-1)/2\). The author also thanks two reviewers for their suggestions that have improved the paper.

Author information

Corresponding author

Correspondence to Vivekananda Roy.

Appendices

1.1 Appendix A: Proof of Proposition 1

Proof

To show \(f_X (x)\) is invariant for \(\tilde{K}\), note that

$$\begin{aligned} \int _{{\mathsf {X}}} \tilde{k}(x' | x) f_X (x) \mu (dx) &= \int _{{\mathsf {X}}} \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) f_X (x) \mu (dx)\\ &= \int _{{\mathsf {Y}}} \int _{{\mathsf {X}}} k_y (x'|x) f_{X | Y}(x | y) \mu (dx) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} f_{X | Y}(x' | y) f_Y (y) \nu (dy) = f_X(x'), \end{aligned}$$

where the third equality follows from (4).

Next assume that \(k_y\) is reversible with respect to \(f_{X|Y} (x|y)\), that is, (5) holds. Then

$$\begin{aligned} \tilde{k}(x' | x) f_X (x) &= \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) f_X (x) \\ &= \int _{{\mathsf {Y}}} k_y (x'|x) f_{X | Y}(x | y) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} k_y (x|x') f_{X | Y}(x' | y) f_Y (y) \nu (dy)\\ &= \int _{{\mathsf {Y}}} k_y (x|x') f_{Y | X}(y | x') \nu (dy) f_X (x') = \tilde{k}(x | x') f_X(x'), \end{aligned}$$

that is, \(\tilde{k}\) is reversible with respect to \(f_{X}\).

Since (6) is in force, for all \(A \in \mathbb {B}({\mathsf {X}})\) and for \(f_X\) almost all \(x \in {\mathsf {X}}\) we have

$$\begin{aligned} \tilde{K} (x, A {\setminus } \{x\}) = \int _{A{\setminus } \{x\}} \tilde{k}(x' | x) \mu (dx') &= \int _{A{\setminus } \{x\}} \int _{{\mathsf {Y}}} k_y (x'|x) f_{Y | X}(y | x) \nu (dy) \mu (dx') \\ &\ge \int _{{\mathsf {Y}}} \int _{A{\setminus } \{x\}} f_{X | Y} (x' | y) \mu (dx') f_{Y | X}(y | x) \nu (dy)\\ &= K (x, A {\setminus } \{x\}), \end{aligned}$$

that is, \(\tilde{K} \succeq _P K\). \(\square \)
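When both \({\mathsf {X}}\) and \({\mathsf {Y}}\) are finite, the kernels in the proof become matrices and the conclusion \(\tilde{K} \succeq _P K\) can be checked directly. The sketch below is our illustration, not the paper's code: the joint pmf F and all variable names are assumed. It builds the DA kernel \(K\), forms \(\tilde{K}\) by Metropolizing each conditional draw in the spirit of Liu (1996), and verifies off-diagonal domination together with invariance of \(f_X\).

```python
import numpy as np

rng = np.random.default_rng(1)

# An arbitrary joint pmf f(x, y) on a 4 x 3 finite space (illustrative).
F = rng.random((4, 3))
F /= F.sum()
f_x, f_y = F.sum(axis=1), F.sum(axis=0)
f_xgy = F / f_y                # column y holds f_{X|Y}(. | y)
f_ygx = (F.T / f_x).T          # row x holds f_{Y|X}(. | x)
n = len(f_x)

# DA kernel: k(x'|x) = sum_y f_{X|Y}(x'|y) f_{Y|X}(y|x).
K = f_ygx @ f_xgy.T

# Modified kernel: for each fixed y, replace the iid draw from f_{X|Y}(.|y)
# by a Metropolized step that proposes x' != x with probability
# f_{X|Y}(x'|y)/(1 - f_{X|Y}(x|y)) and accepts with probability
# min(1, (1 - f_{X|Y}(x|y))/(1 - f_{X|Y}(x'|y))); the product simplifies
# to p[x'] / max(1 - p[x], 1 - p[x']).
K_tilde = np.zeros_like(K)
for y in range(len(f_y)):
    p = f_xgy[:, y]
    Ky = np.zeros((n, n))
    for x in range(n):
        for xp in range(n):
            if xp != x:
                Ky[x, xp] = p[xp] / max(1 - p[x], 1 - p[xp])
        Ky[x, x] = 1.0 - Ky[x].sum()
    K_tilde += f_ygx[:, y][:, None] * Ky

off = ~np.eye(n, dtype=bool)
assert np.all(K_tilde[off] >= K[off] - 1e-12)   # tilde{K} >=_P K
assert np.allclose(f_x @ K_tilde, f_x)          # f_X is still invariant
```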

1.2 Appendix B: The Mtm \(K_{MDA}\) when \(p=2, c=2\)

We order the points in the state space as follows: \((\zeta _1,\zeta _1)\), \((\zeta _1,\zeta _2)\), \((\zeta _2,\zeta _1)\), and \((\zeta _2,\zeta _2)\). We denote the entries of \(K_{MDA}\) by \(\tilde{k}_{ij}\). So, for example, the element \(\tilde{k}_{23}\) is the probability of moving from \((\zeta _1,\zeta _2)\) to \((\zeta _2,\zeta _1)\). In order to write down the expressions for \(\tilde{k}_{ij}\) we need to introduce some notation. Recall that \(m_{ij}\) denotes the number of observations in the jth category with rank \(\zeta _i\) for \(i,j=1,2\). Let \(m_{i.}= m_{i1} + m_{i2}\) for \(i=1,2\), \(m_{d}= m_{11} + m_{22}\), and \(m_{od}= m_{12} + m_{21}\). Finally, for fixed \(w \in (0,1)\), let

$$\begin{aligned} A(w) = [w^{m_{1.}}(1-w)^{m_{2.}}+w^{m_{2.}}(1-w)^{m_{1.}}+w^{m_{d}}(1-w)^{m_{od}}+w^{m_{od}}(1-w)^{m_{d}}], \end{aligned}$$

and \(c=1/\hbox {B}(m_{1.} + a_1, m_{2.} + a_2)\). Recall that \(a_1, a_2\) are the hyperparameters of the prior of \(\theta \). Below, we provide the expressions for \(\tilde{k}_{1j}\), for \(j=1,\dots ,4\). The other rows of \(K_{MDA}\) can be found similarly. From Sect. 3.2 we know that

$$\begin{aligned} \tilde{k}_{12} = \int _{S_{2}} \frac{p(\pi =(\zeta _1,\zeta _2) | \theta , y)}{1-p(\pi =(\zeta _1,\zeta _1) | \theta , y)} \, \min \bigg (1, \frac{1-p(\pi =(\zeta _1,\zeta _1) | \theta , y)}{1-p(\pi =(\zeta _1,\zeta _2) | \theta , y)} \bigg ) \, p(\theta | \pi =(\zeta _1,\zeta _1), y) \, d\theta . \end{aligned}$$

Straightforward calculations show that if \(m_{12} \ge m_{22}\) then

$$\begin{aligned} p(\pi =(\zeta _1,\zeta _1) | \theta , y) > p(\pi =(\zeta _1,\zeta _2) | \theta , y) \; \Leftrightarrow \; \theta _1 > 1/2. \end{aligned}$$

On the other hand, if \(m_{12} < m_{22}\) then

$$\begin{aligned} p(\pi =(\zeta _1,\zeta _1) | \theta , y) > p(\pi =(\zeta _1,\zeta _2) | \theta , y) \; \Leftrightarrow \; \theta _1 < 1/2. \end{aligned}$$

Simple calculations show that if \(m_{12} \ge m_{22}\), then

$$\begin{aligned} \tilde{k}_{12} &= c\bigg [\int _0^{1/2} \frac{w^{m_d + m_{1.} +a_1 -1} (1-w)^{m_{od} + m_{2.} +a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw\\ &\quad + \int _{1/2}^1 \frac{w^{m_d + m_{1.} +a_1-1} (1-w)^{m_{od} + m_{2.} + a_2 -1}}{A(w) - w^{m_{d}} (1-w)^{m_{od}}}\,dw \bigg ]. \end{aligned}$$

In the case of \(m_{12} < m_{22}\), the ranges of integration in the above two terms are interchanged. Similarly, we find that the expression for \(\tilde{k}_{13}\) depends on whether \(m_{11} \ge m_{21}\) or \(m_{11} < m_{21}\). If \(m_{11} \ge m_{21}\),

$$\begin{aligned} \tilde{k}_{13} &= c\bigg [\int _0^{1/2} \frac{w^{m_{od} + m_{1.} + a_1 -1} (1-w)^{m_{d} + m_{2.} + a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw\\ &\quad + \int _{1/2}^1 \frac{w^{m_{od} + m_{1.} + a_1 -1} (1-w)^{m_{d} + m_{2.} + a_2 -1}}{A(w) - w^{m_{od}} (1-w)^{m_{d}}}\,dw \bigg ], \end{aligned}$$

and the ranges of integration in the above two terms are interchanged when \(m_{11} < m_{21}\). Lastly, if \(m_{1.} \ge m_{2.}\),

$$\begin{aligned} \tilde{k}_{14} &= c\bigg [\int _0^{1/2} \frac{w^{m + a_1 -1} (1-w)^{m + a_2 -1}}{A(w) - w^{m_{1.}} (1-w)^{m_{2.}}}\,dw \\ &\quad + \int _{1/2}^1 \frac{w^{m +a_1 -1} (1-w)^{m +a_2 -1}}{A(w) - w^{m_{2.}} (1-w)^{m_{1.}}}\,dw \bigg ], \end{aligned}$$

where \(m = m_{1.} + m_{2.}\) is the number of observations and, as before, the ranges of integration are interchanged when \(m_{1.} < m_{2.}\). Finally, \(\tilde{k}_{11}\) is set to \(1-\sum _{j=2}^4 \tilde{k}_{1j}\).
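As a check on the formulas above, \(\tilde{k}_{12}\) can be evaluated by one-dimensional numerical quadrature. The sketch below uses made-up counts \(m_{ij}\) and hyperparameters \(a_1, a_2\) (illustrative values, not the paper's data); the function and variable names are ours.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as beta_fn

# Illustrative cell counts m_{ij} and Beta prior hyperparameters.
m11, m12, m21, m22 = 5, 3, 2, 4
a1, a2 = 1.0, 1.0

m1, m2 = m11 + m12, m21 + m22          # m_{1.}, m_{2.}
md, mo = m11 + m22, m12 + m21          # m_d, m_od
c = 1.0 / beta_fn(m1 + a1, m2 + a2)    # c = 1 / B(m_{1.}+a_1, m_{2.}+a_2)

def A(w):
    return (w**m1 * (1 - w)**m2 + w**m2 * (1 - w)**m1
            + w**md * (1 - w)**mo + w**mo * (1 - w)**md)

def num(w):                             # common numerator of tilde{k}_{12}
    return w**(md + m1 + a1 - 1) * (1 - w)**(mo + m2 + a2 - 1)

lo = lambda w: num(w) / (A(w) - w**m1 * (1 - w)**m2)
hi = lambda w: num(w) / (A(w) - w**md * (1 - w)**mo)

if m12 >= m22:
    k12 = c * (quad(lo, 0, 0.5)[0] + quad(hi, 0.5, 1)[0])
else:                                   # ranges of integration swap
    k12 = c * (quad(hi, 0, 0.5)[0] + quad(lo, 0.5, 1)[0])
print(k12)
```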


About this article


Cite this article

Roy, V. Improving efficiency of data augmentation algorithms using Peskun’s theorem. Comput Stat 31, 709–728 (2016). https://doi.org/10.1007/s00180-015-0632-4
