MCMC methods to approximate conditional predictive distributions

https://doi.org/10.1016/j.csda.2006.01.018

Abstract

Sampling from conditional distributions is a problem often encountered in statistics when inferences are based on conditional distributions that are not available in closed form. Several Markov chain Monte Carlo (MCMC) algorithms for simulating from them are proposed. Potential problems are pointed out, and suitable modifications are suggested. Approximations based on conditioning sets are also explored. The issues are illustrated with a specific statistical tool for Bayesian model checking and compared in an example. An example in frequentist conditional testing is also given.

Introduction

Many situations in statistics require use of conditional distributions of a random vector X given that X lies in some subset of the sample space. The conditioning subset is most often defined by a particular value of another random vector U; usually X is a vector of observations and U is a statistic, a function of X. Perhaps the best-known scenario is that of conditioning on a sufficient statistic to eliminate nuisance parameters (as in Fisher's exact test for contingency tables), but there are many others: Kiefer (1977) advocates routine use of conditional measures of statistical evaluation for carefully selected U's; other examples are provided by Fisher's relevant subsets (Fisher, 1959). Discussions of these and other conditional approaches to statistics can be found in Barnett (1982), Berger and Wolpert (1984), Strickland et al. (2005), Berger (1985), Barndorff-Nielsen (1988), Reid (1995) and Lehmann (1997), along with many references. Some recent particular applications requiring evaluation of this type of conditional distribution include Diaconis and Sturmfels (1998), Browne (2006) and Caffo and Booth (2001), and the conditional predictive distributions of Bayarri and Berger (2000). We next give an example taken from Reid (1995).

Gamma example: Suppose that we want to test some hypothesis about the shape parameter α of a gamma distribution with density f(x | α, β) = [β^α / Γ(α)] x^(α−1) exp(−βx). In this case, for a random sample x = (x_1, …, x_n), the minimal sufficient statistic for (α, β) is (s1, s2) = (∑ log x_i, ∑ x_i). It can be shown that the conditional distribution of s1 given s2, f(s1 | s2, α, β), does not depend on the nuisance parameter β, which is thus eliminated. Also, in this case, tests of hypotheses about α based on f(s1 | s2, α, β) = f(s1 | s2, α) are (unconditionally) uniformly most powerful among the class of unbiased tests. It is thus natural to base tests about α on this conditional distribution. However, explicitly deriving and working with f(s1 | s2, α) is challenging, to say the least.
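The sufficiency reduction behind this example can be checked numerically: the gamma log-likelihood depends on the data only through (s1, s2). The following sketch (not from the paper; parameter values are arbitrary) evaluates the log-likelihood pointwise and via the sufficient statistics and confirms they agree.

```python
import math
import numpy as np

# Illustrative sketch: the gamma log-likelihood depends on the sample only
# through (s1, s2) = (sum log x_i, sum x_i). Parameter values are arbitrary.
rng = np.random.default_rng(0)
alpha, beta, n = 2.5, 1.3, 50
x = rng.gamma(shape=alpha, scale=1.0 / beta, size=n)

s1, s2 = np.log(x).sum(), x.sum()

# Pointwise log-likelihood: sum_i log f(x_i | alpha, beta)
loglik_pointwise = np.sum(
    alpha * math.log(beta) - math.lgamma(alpha) + (alpha - 1) * np.log(x) - beta * x
)

# The same quantity written using the sufficient statistics alone
loglik_sufficient = (
    n * (alpha * math.log(beta) - math.lgamma(alpha)) + (alpha - 1) * s1 - beta * s2
)
```

The two expressions are algebraically identical, which is exactly why tests about α can be based on the conditional law of s1 given s2.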

Conditional distributions, except in trivial, standard situations, cannot be obtained in closed form. In these cases, inferences have traditionally relied on asymptotic approximations. These approximations, however, may not be adequate for small, or even moderate, sample sizes, and nowadays it has become standard to base inferences on samples generated from the target distributions.

When the target conditional distributions are easy to generate from, either directly or indirectly, samples are conveniently obtained with usual Monte Carlo (MC) methods. However, most distributions for X and functional forms for U result in very complicated conditional distributions, and simple simulation methods do not suffice. Markov chain Monte Carlo (MCMC) methods (see, for example, Robert and Casella, 1999) have proved to be an excellent tool for simulating from these challenging distributions.

In addition, conditional distributions have the peculiarity that their regions of high probability can vary wildly with the conditioning values, which makes implementation of MC and MCMC methods tricky. In particular, the algorithms have to be suitably adapted to guarantee moves into these high-probability regions.

In this paper, we propose different versions of the most usual MCMC algorithms, the Metropolis–Hastings algorithm and the Gibbs sampler, to simulate from conditional distributions. Our particular application is model checking from a Bayesian point of view, but the general ideas are applicable to simulating from conditional distributions in many other settings. As an illustration, in Section 6, we address the conditional frequentist testing of the gamma example above.

In the next section, we briefly review both MCMC algorithms. Our specific statistical problem is presented in Section 3, and in Section 4 the algorithms are applied and suitably modified for our problem. In Section 5, we develop a particular application. After the conditional frequentist example in Section 6, we conclude with a comparison of the different algorithms and some comments.

Section snippets

A quick reminder of MCMC algorithms

The generic Metropolis–Hastings (M–H) algorithm (Metropolis et al., 1953; Hastings, 1970) aims at obtaining simulations from a target distribution f(·) that is difficult to sample from directly. Candidates are simulated from a conditional proposal density q(y|x) and accepted with a suitable probability. The result is a Markov chain converging to simulations from the target density. It is important not only that q(·|x) is fast to simulate from, but also that it results in adequate
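The generic M–H recipe above can be sketched in a few lines. This is a minimal random-walk version with a symmetric Gaussian proposal (so the proposal ratio cancels in the acceptance probability); the standard normal target and tuning constants are placeholders, not the paper's conditional samplers.

```python
import numpy as np

def metropolis_hastings(log_f, x0, n_iter, prop_sd, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = x0
    chain = np.empty(n_iter)
    for t in range(n_iter):
        y = x + rng.normal(scale=prop_sd)        # candidate from q(y|x)
        # symmetric q, so accept with probability min(1, f(y)/f(x))
        if np.log(rng.uniform()) < log_f(y) - log_f(x):
            x = y
        chain[t] = x
    return chain

# Placeholder target: standard normal, known only up to a constant.
rng = np.random.default_rng(1)
chain = metropolis_hastings(lambda z: -0.5 * z**2, 0.0, 20_000, 2.4, rng)
```

In practice the proposal scale (here 2.4) is tuned so that the chain both accepts often enough and moves far enough, which is precisely the issue the truncated sentence above is about.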

Conditional predictive distributions

To explore MCMC algorithms for sampling from conditional distributions, we choose a particular statistical problem in which such conditional distributions are needed. Our specific scenario is model checking from a Bayesian perspective. More specifically, the approach taken by Bayarri and Berger (2000) in which the main tools for model checking are conditional distributions defined on the prior predictive distributions through carefully chosen conditioning statistics (conditioning sets are also

MCMC algorithms to approximate CP and PPP distributions

First, we discuss how to simulate from CP distributions using algorithms based on M–H and Gibbs sampling. Then we produce M–H algorithms to simulate from PPP distributions.

CP and PPP in the normal model with T=maximum

For simplicity of exposition, we exemplify all of the computations in the previous sections in the familiar normal scenario. More complex situations (outlier detection, checking hierarchical models) have been successfully handled with the proposed algorithms in Bayarri and Morales (2003) and Bayarri and Castellanos (2004), although essentially no computational details were given there. Consider the case in which X_1, …, X_n are i.i.d. X_i ~ N(μ, σ²), with θ = (μ, σ²) unknown, and T = max(X_1, …, X_n).
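For orientation, the ordinary posterior predictive simulation of T = max(X_1, …, X_n) in this normal model can be sketched as follows. This is only the baseline that CP and PPP refine (they avoid the double use of the data that motivates the paper's MCMC machinery); the reference prior π(μ, σ²) ∝ 1/σ² is an assumption made here for the sketch.

```python
import numpy as np

# Hedged sketch: ordinary posterior predictive distribution of T = max under
# the reference prior pi(mu, sigma^2) ∝ 1/sigma^2 (an assumption of this
# sketch). The paper's CP and PPP distributions are refinements of this idea.
rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=2.0, size=25)      # toy observed data
n, xbar, s2 = x.size, x.mean(), x.var(ddof=1)

n_rep = 5_000
# sigma^2 | x ~ (n-1) s^2 / chi^2_{n-1};  mu | sigma^2, x ~ N(xbar, sigma^2/n)
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=n_rep)
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# Replicate data sets under each posterior draw and record T = max of each.
t_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(n_rep, n)).max(axis=1)

p_value = np.mean(t_rep >= x.max())              # posterior predictive p-value
```

Comparing the observed maximum against `t_rep` gives the usual posterior predictive check for T; the CP and PPP constructions modify the distribution being sampled, not this overall simulation pattern.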

Prior and

An example on conditional frequentist testing

To give an application in a different statistical scenario, we consider the gamma conditional testing introduced in Section 1. Specifically, we assume that x = (x_1, x_2, …, x_n) is a random sample from a Ga(α, β) distribution, with both parameters unknown, and that we wish to test whether the data can be assumed to come from an exponential distribution with parameter β, that is, we wish to test whether α = 1.

According to the argument in Section 1, we use the test statistic s1 = ∑ log x_i (most powerful unbiased
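For this particular gamma example, f(s1 | s2, α) happens to admit a direct Monte Carlo construction, via a standard scaling fact not used in the paper (which develops MCMC precisely because such shortcuts are rarely available): given s2 = ∑ x_i, the normalized vector (x_1/s2, …, x_n/s2) is Dirichlet(α, …, α), so under H0: α = 1 one can draw from f(s1 | s2, α = 1) exactly.

```python
import numpy as np

# Hedged sketch of the conditional test of alpha = 1 (exponentiality).
# Given s2 = sum x_i, (x_1/s2, ..., x_n/s2) ~ Dirichlet(alpha, ..., alpha),
# so s1 = n*log(s2) + sum log w_i with w Dirichlet. Toy data under H0.
rng = np.random.default_rng(3)
x = rng.gamma(shape=1.0, scale=1.0, size=30)
n, s1_obs, s2_obs = x.size, np.log(x).sum(), x.sum()

w = rng.dirichlet(np.ones(n), size=20_000)            # weights under alpha = 1
s1_null = n * np.log(s2_obs) + np.log(w).sum(axis=1)  # draws from f(s1 | s2, 1)

# Two-sided conditional p-value for H0: alpha = 1
p_lower = np.mean(s1_null <= s1_obs)
p_value = 2 * min(p_lower, 1 - p_lower)
```

Note that s1 is bounded above by n log(s2/n) (by the AM–GM inequality), which illustrates how sharply the conditioning constrains the support, the difficulty emphasized throughout the paper.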

Conclusions

In this paper, we present different algorithms to simulate from conditional distributions. The main emphasis is on simulating from the (Bayesian) conditional predictive (CP) and partial posterior predictive (PPP) distributions, but an example is also given in a conditional frequentist testing scenario.

Analytical derivations of conditional distributions are, for the most part, unfeasible; even straight numerical calculus can be severely limited by the difficulty in handling the constraints introduced by the conditioning

Acknowledgements

This research was supported by the Spanish Ministry of Science and Technology, under Grant MTM2004-03290, CGL2004-01335/BOS and TSI2004-06801-C04-01. Part of the work was done while the first author was visiting the Statistical and Applied Mathematical Sciences Institute and ISDS, Duke University. We thank the referees and the editor for their useful comments.

References (28)

  • M.J. Bayarri et al., Bayesian measures of surprise for outlier detection, J. Statist. Plann. Inference (2003)
  • W.J. Browne, MCMC algorithms for constrained variance matrices, Comput. Statist. Data Anal. (2006)
  • O.E. Barndorff-Nielsen, Parametric Statistical Models and Likelihood (1988)
  • V. Barnett, Comparative Statistical Inference (1982)
  • M.J. Bayarri, J.O. Berger, 1997. Measures of surprise in Bayesian analysis. Technical Report 97-46, Duke University,...
  • M.J. Bayarri et al., Quantifying surprise in the data and model verification
  • M.J. Bayarri et al., p-Values for composite null models, J. Amer. Statist. Assoc. (2000)
  • M.J. Bayarri et al., A comparison between p-values for goodness-of-fit checking
  • M.J. Bayarri, M.E. Castellanos, 2004. Bayesian checking of hierarchical models. ISDS Discussion Paper...
  • J.O. Berger, Statistical Decision Theory and Bayesian Analysis (1985)
  • J.O. Berger et al., The Likelihood Principle (1984)
  • J. Berger, 2006. The case for objective Bayesian analysis. Bayesian Analysis, in press. Available at...
  • G.E.P. Box, Sampling and Bayes inference in scientific modelling and robustness, J. Roy. Statist. Soc. Ser. A (1980)
  • B.S. Caffo et al., A Markov chain Monte Carlo algorithm for approximating exact conditional probabilities, J. Comput. Graphical Statist. (2001)