Abstract
Many situations, especially in Bayesian statistical inference, call for the use of a Markov chain Monte Carlo (MCMC) method as a way to draw approximate samples from an intractable probability distribution. With the use of any MCMC algorithm comes the question of how long the algorithm must run before it can be used to draw an approximate sample from the target distribution. A common method of answering this question involves verifying that the Markov chain satisfies a drift condition and an associated minorization condition (Rosenthal, J Am Stat Assoc 90:558–566, 1995; Jones and Hobert, Stat Sci 16:312–334, 2001). This is often difficult to do analytically, so as an alternative, it is typical to rely on output-based methods of assessing convergence. The work presented here gives a computational method of approximately verifying a drift condition and a minorization condition specifically for the symmetric random-scan Metropolis algorithm. Two examples of the use of the method described in this article are provided, and output-based methods of convergence assessment are presented in each example for comparison with the upper bound on the convergence rate obtained via the simulation-based approach.
References
Chen, R.: Convergence analysis and comparisons of Markov chain Monte Carlo algorithms in digital communications. IEEE Trans. Signal Process. 50, 255–270 (2002)
Chib, S., Nardari, F., Shephard, N.: Markov chain Monte Carlo methods for stochastic volatility models. J. Econom. 108, 281–316 (2002)
Cowles, M.K., Rosenthal, J.S.: A simulation-based approach to convergence rates for Markov chain Monte Carlo algorithms. Stat. Comput. 8, 115–124 (1998)
Deller, S.C., Amiel, L., Deller, M.: Model uncertainty in ecological criminology: an application of Bayesian model averaging with rural crime data. Int. J. Crim. Soc. Theory 4(2), 683–717 (2011)
Eraker, B.: MCMC analysis of diffusion models with applications to finance. J. Bus. Econ. Stat. 19(2), 177–191 (2001)
Fort, G., Moulines, E., Roberts, G.O., Rosenthal, J.S.: On the geometric ergodicity of hybrid samplers. J. Appl. Probab. 40, 123–146 (2003)
Gelman, A., Gilks, W.R., Roberts, G.O.: Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7(1), 110–120 (1997)
Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7, 457–511 (1992)
Geweke, J.: Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In: Bayesian Statistics 4. Oxford University Press, Oxford (1992)
Heidelberger, P., Welch, P.D.: Simulation run length control in the presence of an initial transient. Oper. Res. 31, 1109–1144 (1983)
Hobert, J.P., Geyer, C.J.: Geometric ergodicity of Gibbs and block Gibbs samplers for a hierarchical random effects model. J. Multivar. Anal. 67, 414–430 (1998)
Ishwaran, H., James, L.F., Sun, J.: Bayesian model selection in finite mixtures by marginal density decompositions. J. Am. Stat. Assoc. 96, 1316–1332 (2001)
Jarner, S.F., Hansen, E.: Geometric ergodicity of Metropolis algorithms. Stoch. Process. Appl. 85, 341–361 (2000)
Jones, G., Hobert, J.P.: Honest exploration of intractable probability distributions via Markov chain Monte Carlo. Stat. Sci. 16(4), 312–334 (2001)
Jones, G., Hobert, J.P.: Sufficient burn-in for Gibbs samplers for a hierarchical random effects model. Ann. Stat. 32(2), 784–817 (2004)
Li, S., Pearl, D.K., Doss, H.: Phylogenetic tree construction using Markov chain Monte Carlo. J. Am. Stat. Assoc. 95, 493–508 (2000)
Liang, F.: Continuous contour Monte Carlo for marginal density estimation with an application to a spatial statistical model. J. Comput. Gr. Stat. 16(3), 608–632 (2007)
Madras, N., Sezer, D.: Quantitative bounds for Markov chain convergence: Wasserstein and total variation distances. Bernoulli 16(3), 882–908 (2010)
Neal, R.M.: Annealed importance sampling. Technical Report, Department of Statistics, University of Toronto (1998)
Oh, M., Berger, J.O.: Adaptive importance sampling in Monte Carlo integration. Technical Report, Department of Statistics, Purdue University (1989)
Robert, C.P., Casella, G.: Monte Carlo Statistical Methods, 2nd edn. Springer, New York (2004)
Roberts, G.O., Rosenthal, J.S.: Two convergence properties of hybrid samplers. Ann. Appl. Probab. 8, 397–407 (1998)
Roberts, G.O., Rosenthal, J.S.: Geometric ergodicity and hybrid Markov chains. Electron. Commun. Probab. 2(2), 13–25 (1997)
Roberts, G.O., Tweedie, R.L.: Geometric convergence and central limit theorems for multidimensional Hastings and Metropolis algorithms. Biometrika 83(1), 95–110 (1996)
Roberts, G.O., Rosenthal, J.S.: On convergence rates of Gibbs samplers for uniform distributions. Ann. Appl. Probab. 8, 1291–1302 (1998)
Rosenthal, J.S.: Minorization conditions and convergence rates for Markov chain Monte Carlo. J. Am. Stat. Assoc. 90, 558–566 (1995)
Sonksen, M.D., Wang, X., Umland, K.: Bayesian partially ordered multinomial probit and logit models with an application to course redesign (2013)
Tierney, L.: Markov chains for exploring posterior distributions (with discussion). Ann. Stat. 22, 1701–1762 (1994)
Acknowledgments
The author would like to thank Radu Herbei and Laura Kubatko, as well as two anonymous reviewers, for their helpful insights and commentary during the completion of this project.
Appendices
Appendix 1: Geometric ergodicity of \((\mathsf {X}_t)_{t\ge 0}\)
Let \((\mathsf {X}_t)_{t\ge 0}\) be the Markov chain described in Sect. 4.1.
Theorem 3
\((\mathsf {X}_t)_{t\ge 0}\) is geometrically ergodic.
Proof
The approach here is to verify that the three conditions given by Fort et al. (2003) hold for this chain.
In order to verify that Condition 1 holds, it is necessary to ensure that the target density \(p(\cdot )\) is positive and continuous over all of \(\mathbb {R}^3\). This will ensure that the stationary measure is absolutely continuous with respect to Lebesgue measure. The target density is given by
$$ p(\mathbf {x}) = \prod _{i=1}^{3}\text {exp}\left\{ -\pi (x^{(i)} - 5)^2\right\} \mathbb {I}_{\mathbb {R}}(x^{(i)}). $$
Note that \(\mathbb {I}_{\mathbb {R}}(x^{(i)}) = 1\) for all \(x^{(i)} \in \mathbb {R}\) and for all \(i = 1,2,3\). The quantity \(\text {exp}\left\{ -\pi (x^{(i)} - 5)^2\right\} \) is also positive for all \(x^{(i)} \in \mathbb {R}\) and for all \(i \in \left\{ 1,\,2,\,3 \right\} \). Therefore, \(p(\mathbf {x})\) is positive for all \(\mathbf {x} \in \mathbb {R}^3\). Furthermore, \((x^{(i)} - 5)^2\) is continuous in \(x^{(i)}\) for all real-valued \(x^{(i)}\) and for all \(i = 1,2,3\), and \(\text {exp}\left\{ -\pi (x^{(i)} - 5)^2\right\} \) is a continuous function of \((x^{(i)} - 5)^2\), so \(p(\mathbf {x})\) is positive and continuous over \(\mathbb {R}^3\). Thus, \({\pi }\) is absolutely continuous with respect to Lebesgue measure.
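For concreteness, the chain under study can be sketched in code: a symmetric random-scan Metropolis sampler with this three-dimensional target. This is a minimal illustration under assumed details, not the paper's exact implementation; the seed, starting point, and step count are arbitrary, and the \(N(0,(95/99)^2)\) increments anticipate the proposal density discussed below.

```python
import math
import random

random.seed(1)
SIGMA = 95 / 99  # increment standard deviation, as in the text

def log_p(x):
    # log of p(x) = prod_i exp{-pi * (x_i - 5)^2}
    return -math.pi * sum((xi - 5.0) ** 2 for xi in x)

def random_scan_metropolis(x0, n_steps):
    x = list(x0)
    for _ in range(n_steps):
        i = random.randrange(3)            # random scan: update one coordinate
        y = list(x)
        y[i] += random.gauss(0.0, SIGMA)   # symmetric normal increment
        # Symmetric proposal, so the acceptance probability is min(1, p(y)/p(x)).
        if math.log(random.random()) < log_p(y) - log_p(x):
            x = y
    return x

draw = random_scan_metropolis([0.0, 0.0, 0.0], 5000)
print([round(v, 2) for v in draw])  # a draw concentrated near the mode (5, 5, 5)
```

Since each coordinate of the target has standard deviation \(1/\sqrt{2\pi }\approx 0.4\), a long enough run produces draws tightly clustered around \((5,5,5)\).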
In order to establish that \((\mathsf {X}_t)_{t\ge 0}\) satisfies Condition 2, it is necessary to examine the increment densities. Recall that for \(i = 1,2,3\), \(q_i(y)\) is the normal density with mean 0 and variance \((95/99)^2\). Therefore, if \(\delta _i\) is chosen to be \(95/99\) for \(i = 1,2,3\), then for all y such that \(|y| \le \delta _i\), \(q_i(y) \ge 0.2521\). Thus, choosing \(\eta _i = 0.252\) for each \(i = 1,2,3\) ensures that \((\mathsf {X}_t)_{t\ge 0}\) satisfies Condition 2.
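The bound \(q_i(y) \ge 0.2521\) on \([-95/99,\, 95/99]\) is easy to confirm numerically; the sketch below uses only the constants from the text.

```python
import math

# Condition 2 check: the N(0, (95/99)^2) increment density is bounded
# below on [-delta, delta] with delta = 95/99.
sigma = 95 / 99

def q(y):
    """N(0, sigma^2) density."""
    return math.exp(-y * y / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

delta = 95 / 99
# q is symmetric and decreasing in |y|, so its minimum over [-delta, delta]
# is attained at the endpoints.
eta = q(delta)
print(eta)
assert eta >= 0.2521
```

The minimum works out to roughly 0.2522, consistent with the choice \(\eta _i = 0.252\).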
Verification that \((\mathsf {X}_t)_{t\ge 0}\) satisfies the third of the Fort et al. (2003) conditions consists of three parts. First, it is necessary to show that there exist constants \(\delta \) and \(\Delta \) with \(0 \le \delta \le \Delta \le +\infty \) such that
Since the increment densities are the same for each \(i=1,2,3\), it suffices to show that
Let \(\delta = 95/99\), and let \(\Delta = 2\left( 95/99\right) \). Then \(\xi = \varPhi (2) - \varPhi (1)\), where \(\varPhi (\cdot )\) denotes the cumulative distribution function (CDF) of the standard normal distribution. Choosing these values of \(\delta \) and \(\Delta \) yields \(\xi \approx 0.1359 > 0\). This establishes that \((\mathsf {X}_t)_{t\ge 0}\) satisfies the first requirement of Condition 3.
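The value of \(\xi \) can be checked with the standard normal CDF written in terms of the error function (a quick sketch):

```python
import math

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# With delta = 95/99, Delta = 2*(95/99), and increment sd 95/99, the mass of
# q_i between delta and Delta is Phi(2) - Phi(1).
xi = Phi(2.0) - Phi(1.0)
print(round(xi, 4))  # approximately 0.1359
```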
The second part of verifying that \((\mathsf {X}_t)_{t\ge 0}\) satisfies Condition 3 requires ensuring that for any sequence \(\left\{ \mathbf {x}_n \right\} \) such that \(\lim \limits _{n\rightarrow \infty }\Vert \mathbf {x}_n\Vert = \infty \), it is possible to extract a subsequence \(\left\{ \tilde{\mathbf {x}}_n\right\} \) such that for some \(i \in \left\{ 1,2,3\right\} \),
In order to establish that this holds, note that if \(\lim \limits _{n\rightarrow \infty }\Vert \mathbf {x}_n\Vert = \infty \), then for at least one \(i \in \left\{ 1,2,3\right\} \), \(\lim \limits _{n\rightarrow \infty }|x_n^{(i)}| = +\infty \). Without loss of generality, assume \(\lim \limits _{n\rightarrow \infty }|x_n^{(1)}| = +\infty \). Two cases must be handled here. Let \(\left\{ \tilde{\mathbf {x}}_n\right\} \) be a subsequence of \(\left\{ \mathbf {x}_n\right\} \) such that \(\lim \limits _{n\rightarrow \infty }\tilde{x}_n^{(1)} = +\infty \), and let \(\left\{ \hat{\mathbf {x}}_n\right\} \) be a subsequence of \(\left\{ \mathbf {x}_n\right\} \) such that \(\lim \limits _{n\rightarrow \infty }\hat{x}_n^{(1)} = -\infty \). It must be shown that the above limit is 0 for both subsequences. Beginning with \(\left\{ \tilde{\mathbf {x}}_n\right\} \),
since \(\tilde{x}_n^{(1)}\) is positive in the tail of \(\left\{ \tilde{\mathbf {x}}_n\right\} \). This yields
Now examine the limit for \(\left\{ \hat{\mathbf {x}}_n\right\} \). Since \(\hat{x}_n^{(1)}\) is negative in the tail of \(\left\{ \hat{\mathbf {x}}_n\right\} \), it follows that
Examination of the limit of this ratio gives
This establishes that \((\mathsf {X}_t)_{t\ge 0}\) satisfies the second part of Condition 3. In order to establish that \((\mathsf {X}_t)_{t\ge 0}\) satisfies the final part of Condition 3, it is necessary to show that the limit of the ratio
is 0 for both \(\left\{ \tilde{\mathbf {x}}_n \right\} \) and \(\left\{ \hat{\mathbf {x}}_n \right\} \) as n tends to \(\infty \). Beginning with \(\left\{ \tilde{\mathbf {x}}_n \right\} \), since \(\tilde{x}_n^{(1)}\) is positive in the tail of \(\left\{ \tilde{\mathbf {x}}_n\right\} \), it follows that
As n tends to \(\infty \), the limit of this ratio is
Since \(\hat{x}_n^{(1)}\) is negative in the tail of \(\left\{ \hat{\mathbf {x}}_n\right\} \), it follows that
Thus, as n tends to \(\infty \), the limit of this ratio is
This establishes that \((\mathsf {X}_t)_{t\ge 0}\) satisfies the three conditions given by Fort et al. (2003). Thus, \((\mathsf {X}_t)_{t\ge 0}\) is geometrically ergodic, and \(V(\mathbf {x}) = \left[ p(\mathbf {x})\right] ^{-s}\) is a useful drift function, where s is chosen so that \(s(1-s)^{\frac{1}{s} - 1} < 0.01555\), in accordance with the hypothesis of Theorem 2. \(\square \)
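A suitable s can be located numerically. The sketch below bisects for the root of \(s(1-s)^{1/s-1} = 0.01555\) and takes a value safely below it; the bracket and iteration count are assumptions of this sketch, not part of the original argument.

```python
# Find an s in (0, 1) with s*(1-s)**(1/s - 1) < 0.01555, as required for the
# drift function V(x) = p(x)**(-s).

def g(s):
    """The quantity s*(1-s)**(1/s - 1) appearing in the drift condition."""
    return s * (1.0 - s) ** (1.0 / s - 1.0)

TARGET = 0.01555

# g(s) ~ s/e for small s and g(0.5) = 0.25, so bisect for the crossing of
# g(s) = TARGET on (0, 0.5) and take any s below it.
lo, hi = 1e-9, 0.5
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if g(mid) < TARGET:
        lo = mid
    else:
        hi = mid
s_star = lo
print(round(s_star, 4))           # root of g(s) = TARGET, roughly 0.0414
assert g(0.9 * s_star) < TARGET   # any s safely below the root satisfies the bound
```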
Appendix 2: Geometric ergodicity of \(({\tau }_t)_{t\ge 0}\)
Let \(({\tau }_t)_{t\ge 0}\) be the Markov chain described in Sect. 4.2.
Theorem 4
The Markov chain \(({\tau }_t)_{t\ge 0}\) is geometrically ergodic.
Proof
The approach here is again to verify that \(({\tau }_t)_{t\ge 0}\) satisfies the three conditions of Fort et al. (2003). Recall that the target density for this example is given by
1.1 Verification of Condition 1
In order to verify that \(({\tau }_t)_{t\ge 0}\) satisfies Condition 1, it is necessary to show that \(p({\tau }|\mathbf {S})\) is positive and continuous over \(\mathbb {R}^9\), and that the stationary measure is absolutely continuous with respect to Lebesgue measure. Note first that the function \(g(x) = e^x\) is positive for all \(x \in \mathbb {R}\), and the exponent in \(p({\tau }|\mathbf {S})\) is real for every \({\tau } \in \mathbb {R}^9\); it follows that \(p({\tau }|\mathbf {S})\) is positive over \(\mathbb {R}^9\). The exponent is also a linear combination of functions that are continuous in \({\tau }\) over \(\mathbb {R}^9\), so it has no points of discontinuity, and \(p({\tau }|\mathbf {S})\) is continuous over \(\mathbb {R}^9\). Therefore, if \(\pi (\cdot )\) is the stationary measure, then \(\pi (B) = 0\) if and only if B is a set of Lebesgue measure 0. This implies that the stationary measure is absolutely continuous with respect to Lebesgue measure, and Condition 1 is satisfied.
1.2 Verification of Condition 2
Condition 2 is easily verified, since each increment density is uniform on the interval \([-2.25, 2.25]\). Hence, for \(i=1,2,\ldots , 9\),
$$ q_i(y) = \frac{1}{4.5}\,\mathbb {I}_{[-2.25,\,2.25]}(y). $$
Let \(\delta _i = 2.25\) for \(i=1,2,\ldots , 9\). Then for y such that \(|y| \le \delta _i\), \(q_i(y) = 1/4.5\). Therefore, \(\eta _i = 1/4.5\) for \(i=1,2,\ldots , 9\), and Condition 2 is satisfied.
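These constants admit a trivial numerical sanity check (the grid below is an arbitrary choice of this sketch):

```python
# Condition 2 for the uniform increments: on [-2.25, 2.25] the
# U[-2.25, 2.25] density is constant at 1/4.5, so eta_i = 1/4.5.

def q(y):
    """U[-2.25, 2.25] density."""
    return 1.0 / 4.5 if -2.25 <= y <= 2.25 else 0.0

delta = 2.25
grid = [-delta + 2.0 * delta * k / 1000 for k in range(1001)]
eta = min(q(y) for y in grid)
print(eta == 1.0 / 4.5)  # True: the density is flat on [-delta, delta]
```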
1.3 Verification of Condition 3
Using the notation from the statement of Condition 3, let \(\delta = 0.1\) and \(\Delta = 2.25\). Since for each \(i=1,2,\ldots , 9\) the increment density is the \(U[-2.25, 2.25]\) density, it follows that
$$ \xi = \int _{0.1}^{2.25}\frac{1}{4.5}\,dy = \frac{2.15}{4.5} \approx 0.478 > 0. $$
This establishes that \(({\tau }_t)_{t\ge 0}\) satisfies the first part of Condition 3. The next step is to investigate the ratios given in Theorem 2. To begin, let \(\left\{ {\tau }_j \right\} \) be a sequence for which \(\lim _{j}\Vert {\tau }_j\Vert = +\infty \). Then for at least one \(i \in \left\{ 1,2,\ldots , 9\right\} \), \(\lim _j |\tau _j^{(i)}| = +\infty .\) This leaves three possibilities:
-
\(\lim _j \tau _j^{(i)} = +\infty \)
-
\(\lim _j \tau _j^{(i)} = -\infty \)
-
\(\tau _j^{(i)}\) alternates, and it is possible to extract one subsequence of \(\left\{ {\tau }_j\right\} \) along which \(\tau _j^{(i)} \rightarrow +\infty \) and one along which \(\tau _j^{(i)} \rightarrow -\infty \).
In order to verify that \(({\tau }_t)_{t\ge 0}\) satisfies Condition 3, it suffices to show that
for each of the two subsequences of \(\left\{ {\tau }_j\right\} \) described in the third situation.
To begin, let \(\left\{ {\tau }_j^+\right\} \) be a subsequence of \(\left\{ {\tau }_j\right\} \) for which \(\lim _j \tau _j^{+(i)} = +\infty \), and assume without loss of generality that \(i=1\). Since \(\tau _j^{+(1)} > 0\) in the tail of \(\left\{ {\tau }_j^+\right\} \),
For the first ratio,
Recall that \(Z_{ij}=1\) if student i had instructor j and \(Z_{ij} = 0\) otherwise. Let \(G=\left\{ i: \text {Student } i \text { had instructor 1}\right\} \). Then, after cancellation, the ratio can be written as
Examining the second ratio for this subsequence and making use of a similar mathematical argument to the one that was used to establish the limit of the first ratio gives
which tends to 0 as j tends to \(+\infty \). This proves that the two limits in Condition 3 hold for the subsequence \(\left\{ {\tau }_j^+\right\} \).
Now examine the same two ratios for the subsequence \(\left\{ {\tau }_j^- \right\} \), where \(\lim _j \tau _j^{-(1)} = -\infty \). This implies that \(\tau _j^{-(1)}\) is negative in the tail of \(\left\{ {\tau }_j^-\right\} \). Therefore,
After cancellation similar to that used to establish that the limit holds for \(\left\{ {\tau }_j^+\right\} \), the limit of the first ratio can be written as
which tends to 0 as j approaches \(\infty \). After similar cancellation, the second limit can be written as
which tends to 0 as j tends to \(\infty \). This establishes that \(({\tau }_t)_{t\ge 0}\) satisfies the conditions provided by Fort et al. (2003). As a result, the chain satisfies the drift condition of Rosenthal (1995) for an appropriate choice of s, with the associated small set determined by an appropriate choice of d. \(\square \)
Spade, D.A. A computational procedure for estimation of the mixing time of the random-scan Metropolis algorithm. Stat Comput 26, 761–781 (2016). https://doi.org/10.1007/s11222-015-9568-3