MCMC methods to approximate conditional predictive distributions

https://doi.org/10.1016/j.csda.2006.01.018

Abstract

Sampling from conditional distributions is a problem often encountered in statistics when inferences are based on conditional distributions that are not available in closed form. Several Markov chain Monte Carlo (MCMC) algorithms for simulating from them are proposed. Potential problems are pointed out, and suitable modifications are suggested. Approximations based on conditioning sets are also explored. The issues are illustrated with a specific statistical tool for Bayesian model checking and compared in an example. An example in frequentist conditional testing is also given.

Introduction

Many situations in statistics require use of conditional distributions of a random vector X given that X lies in some subset of the sample space. The conditioning subset is most often defined by a particular value of another random vector U; usually X is a vector of observations and U is a statistic, a function of X. Perhaps the best-known scenario is that of conditioning on a sufficient statistic to eliminate nuisance parameters (as in Fisher's exact test for contingency tables), but there are many others: Kiefer (1977) advocates routine use of conditional measures of statistical evaluation for carefully selected U's; other examples are provided by Fisher's relevant subsets (Fisher, 1959). Discussions of these and other conditional approaches to statistics can be found in Barnett (1982), Berger and Wolpert (1984), Strickland et al. (2005), Berger (1985), Barndorff-Nielsen (1988), Reid (1995) and Lehmann (1997), along with many references. Some recent particular applications requiring evaluation of this type of conditional distribution include Diaconis and Sturmfels (1998), Browne (2006) and Caffo and Booth (2001), and the conditional predictive distributions of Bayarri and Berger (2000). We next give an example taken from Reid (1995).

Gamma example: Suppose that we want to test some hypothesis about the shape parameter α of a gamma distribution with density f(x | α, β) = [β^α / Γ(α)] x^(α−1) exp(−βx). In this case, for a random sample x = (x_1, …, x_n), the minimal sufficient statistic for (α, β) is (s1, s2) = (∑ log x_i, ∑ x_i). It can be shown that the conditional distribution of s1 given s2, f(s1 | s2, α, β), does not depend on the nuisance parameter β, which is thus eliminated. Also, in this case, tests of hypotheses about α based on f(s1 | s2, α, β) = f(s1 | s2, α) are (unconditionally) uniformly most powerful among the class of unbiased tests. It is thus natural to base tests about α on this conditional distribution. However, explicitly deriving and working with f(s1 | s2, α) is challenging, to say the least.
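The sufficiency reduction behind this example can be checked numerically: the gamma log-likelihood depends on the data only through (s1, s2). The following sketch (not from the paper; parameter values are arbitrary) evaluates the log-likelihood pointwise and via the sufficient statistics and confirms they agree.

```python
import math
import numpy as np

# Illustrative sketch: the gamma log-likelihood depends on the sample only
# through (s1, s2) = (sum log x_i, sum x_i). Parameter values are arbitrary.
rng = np.random.default_rng(0)
alpha, beta, n = 2.5, 1.3, 50
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)

s1, s2 = np.log(x).sum(), x.sum()

# Pointwise log-likelihood: sum_i log f(x_i | alpha, beta)
loglik_pointwise = np.sum(
    alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1) * np.log(x) - beta * x
)

# The same quantity written using the sufficient statistics alone
loglik_sufficient = (
    n * (alpha * math.log(beta) - math.lgamma(alpha)) + (alpha - 1) * s1 - beta * s2
)
```

The two expressions are algebraically identical, which is exactly why tests about α can be based on the conditional law of s1 given s2.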

Conditional distributions, except in trivial, standard situations, cannot be obtained in closed form. In these cases, inferences have traditionally relied on asymptotic approximations. These approximations, however, may not be adequate for small, or even moderate, sample sizes, and nowadays it has become standard to base inferences on samples generated from the target distributions.

When the target conditional distributions are easy to generate from, either directly or indirectly, samples are conveniently obtained with usual Monte Carlo (MC) methods. However, most distributions for X and functional forms for U result in very complicated conditional distributions, and simple simulation methods do not suffice. Markov chain Monte Carlo (MCMC) methods (see, for example, Robert and Casella, 1999) have proved to be an excellent tool for simulating from these challenging distributions.

In addition, conditional distributions have the peculiarity that their regions of high probability can vary wildly with the conditioning values, which makes implementation of MC and MCMC methods tricky. In particular, the algorithms have to be suitably adapted to guarantee moves into these high-probability regions.

In this paper, we propose different versions of the most usual MCMC algorithms, the Metropolis–Hastings algorithm and the Gibbs sampler, to simulate from conditional distributions. Our particular application is model checking from a Bayesian point of view, but the general ideas are applicable to simulating from conditional distributions in many other settings. As an illustration, in Section 6, we address the conditional frequentist testing of the gamma example above.

In the next section, we briefly review both MCMC algorithms. Our specific statistical problem is presented in Section 3, and in Section 4 the algorithms are applied and suitably modified for our problem. In Section 5, we develop a particular application. After the conditional frequentist example in Section 6, we conclude with a comparison of the different algorithms and some comments.

Section snippets

A quick reminder of MCMC algorithms

The generic Metropolis–Hastings (M–H) algorithm (Metropolis et al., 1953; Hastings, 1970) aims at obtaining simulations from a target distribution f(·) that is difficult to sample from directly. Candidates are simulated from a conditional proposal density q(y|x) and accepted with a suitable probability. The result is a Markov chain converging to simulations from the target density. It is important not only that q(·|x) is fast to simulate from, but also that it results in adequate
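The generic M–H recipe above can be sketched in a few lines. This is a minimal random-walk version with a symmetric Gaussian proposal (so the proposal ratio cancels in the acceptance probability); the standard normal target and tuning constants are placeholders, not the paper's conditional samplers.

```python
import numpy as np

def metropolis_hastings(log_f, x0, n_iter, prop_sd, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + rng.normal(scale=prop_sd)        # candidate from q(y|x)
        # symmetric q, so accept with probability min(1, f(y)/f(x))
        if np.log(rng.uniform()) < log_f(y) - log_f(x):
            x = y
        chain[t] = x
    return chain

# Placeholder target: standard normal, known only up to a constant.
rng = np.random.default_rng(1)
chain = metropolis_hastings(lambda z: -0.5 * z**2, 0.0, 20_000, 2.4, rng)
```

In practice the proposal scale (here 2.4) is tuned so that the chain both accepts often enough and moves far enough, which is precisely the issue the truncated sentence above is about.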

Conditional predictive distributions

To explore MCMC algorithms for sampling from conditional distributions, we choose a particular statistical problem in which such conditional distributions are needed. Our specific scenario is model checking from a Bayesian perspective. More specifically, the approach taken by Bayarri and Berger (2000) in which the main tools for model checking are conditional distributions defined on the prior predictive distributions through carefully chosen conditioning statistics (conditioning sets are also

MCMC algorithms to approximate CP and PPP distributions

First, we discuss how to simulate from CP distributions using algorithms based on M–H and Gibbs sampling. Then we produce M–H algorithms to simulate from PPP distributions.

CP and PPP in the normal model with T=maximum

For simplicity of exposition, we exemplify all of the computations in the previous sections in the familiar normal scenario. More complex situations (outlier detection, checking hierarchical models) have been successfully handled with the proposed algorithms in Bayarri and Morales (2003) and Bayarri and Castellanos (2004), although essentially no computational details were given there. Consider the case in which X_1, …, X_n are i.i.d. X_i ~ N(μ, σ²), with θ = (μ, σ²) unknown, and T = max(X_1, …, X_n).
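For orientation, the ordinary posterior predictive simulation of T = max(X_1, …, X_n) in this normal model can be sketched as follows. This is only the baseline that CP and PPP refine (they avoid the double use of the data that motivates the paper's MCMC machinery); the reference prior π(μ, σ²) ∝ 1/σ² is an assumption made here for the sketch.

```python
import numpy as np

# Hedged sketch: ordinary posterior predictive distribution of T = max under
# the reference prior pi(mu, sigma^2) ∝ 1/sigma^2 (an assumption of this
# sketch). The paper's CP and PPP distributions are refinements of this idea.
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=25)      # toy observed data
n, xbar, s2 = x.size, x.mean(), x.var(ddof=1)

n_rep = 5_000
# sigma^2 | x ~ (n-1) s^2 / chi^2_{n-1};  mu | sigma^2, x ~ N(xbar, sigma^2/n)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_rep)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# Replicate data sets under each posterior draw and record T = max of each.
t_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_rep, n)).max(axis=1)

p_value = np.mean(t_rep >= x.max())              # posterior predictive p-value
```

Comparing the observed maximum against `t_rep` gives the usual posterior predictive check for T; the CP and PPP constructions modify the distribution being sampled, not this overall simulation pattern.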

Prior and

An example on conditional frequentist testing

To give an application in a different statistical scenario, we consider the gamma conditional testing introduced in Section 1. Specifically, we assume that x = (x_1, x_2, …, x_n) is a random sample from a Ga(α, β) distribution, with both parameters unknown, and that we wish to test whether the data can be assumed to come from an exponential distribution with parameter β, that is, we wish to test whether α = 1.

According to the argument in Section 1, we use the test statistic s1 = ∑ log x_i (most powerful unbiased
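For this particular gamma example, f(s1 | s2, α) happens to admit a direct Monte Carlo construction, via a standard scaling fact not used in the paper (which develops MCMC precisely because such shortcuts are rarely available): given s2 = ∑ x_i, the normalized vector (x_1/s2, …, x_n/s2) is Dirichlet(α, …, α), so under H0: α = 1 one can draw from f(s1 | s2, α = 1) exactly.

```python
import numpy as np

# Hedged sketch of the conditional test of alpha = 1 (exponentiality).
# Given s2 = sum x_i, (x_1/s2, ..., x_n/s2) ~ Dirichlet(alpha, ..., alpha),
# so s1 = n*log(s2) + sum log w_i with w Dirichlet. Toy data under H0.
rng = np.random.default_rng(3)
x = rng.gamma(shape=1.0, scale=1.0, size=30)
n, s1_obs, s2_obs = x.size, np.log(x).sum(), x.sum()

w = rng.dirichlet(np.ones(n), size=20_000)            # weights under alpha = 1
s1_null = n * np.log(s2_obs) + np.log(w).sum(axis=1)  # draws from f(s1 | s2, 1)

# Two-sided conditional p-value for H0: alpha = 1
p_lower = np.mean(s1_null <= s1_obs)
p_value = 2 * min(p_lower, 1 - p_lower)
```

Note that s1 is bounded above by n log(s2/n) (by the AM–GM inequality), which illustrates how sharply the conditioning constrains the support, the difficulty emphasized throughout the paper.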

Conclusions

In this paper, we present different algorithms to simulate from conditional distributions. The main emphasis is on simulating from the (Bayesian) conditional predictive (CP) and partial posterior predictive (PPP) distributions, but an example is also given in a conditional frequentist testing scenario.

Analytical derivations of conditional distributions are, for the most part, unfeasible; even straight numerical calculus can be severely limited by the difficulty in handling the constraints introduced by the conditioning

Acknowledgements

This research was supported by the Spanish Ministry of Science and Technology, under Grant MTM2004-03290, CGL2004-01335/BOS and TSI2004-06801-C04-01. Part of the work was done while the first author was visiting the Statistical and Applied Mathematical Sciences Institute and ISDS, Duke University. We thank the referees and the editor for their useful comments.

References (28)

  • M.J. Bayarri et al., Bayesian measures of surprise for outlier detection, J. Statist. Plann. Inference (2003)
  • W.J. Browne, MCMC algorithms for constrained variance matrices, Comput. Statist. Data Anal. (2006)
  • O.E. Barndorff-Nielsen, Parametric Statistical Models and Likelihood (1988)
  • V. Barnett, Comparative Statistical Inference (1982)
  • M.J. Bayarri, J.O. Berger, 1997. Measures of surprise in Bayesian analysis. Technical Report 97-46, Duke University,...
  • M.J. Bayarri et al., Quantifying surprise in the data and model verification
  • M.J. Bayarri et al., p-Values for composite null models, J. Amer. Statist. Assoc. (2000)
  • M.J. Bayarri et al., A comparison between p-values for goodness-of-fit checking
  • M.J. Bayarri, M.E. Castellanos, 2004. Bayesian checking of hierarchical models. ISDS Discussion Paper...
  • J.O. Berger, Statistical Decision Theory and Bayesian Analysis (1985)
  • J.O. Berger et al., The Likelihood Principle (1984)
  • J. Berger, 2006. The case for objective Bayesian analysis. Bayesian Analysis, in press. Available at...
  • G.E.P. Box, Sampling and Bayes inference in scientific modelling and robustness, J. Roy. Statist. Soc. Ser. A (1980)
  • B.S. Caffo et al., A Markov chain Monte Carlo algorithm for approximating exact conditional probabilities, J. Comput. Graphical Statist. (2001)