Bayesian computation for logistic regression

https://doi.org/10.1016/j.csda.2004.04.009Get rights and content

Abstract

A method for the simulation of samples from the exact posterior distributions of the parameters in logistic regression is proposed. It is based on the principle of data augmentation and a latent variable is introduced, similar to the approach of Albert and Chib (J. Am. Stat. Assoc. 88 (1993) 669), who applied it to the probit model. In general, the full conditional distributions are intractable, but with the introductions of the latent variable all conditional distributions are uniform, and the Gibbs sampler is easily applicable. Marginal likelihoods for model selection can be obtained at the expense of additional Gibbs cycles. The technique is extended and can be applied with nominal or ordinal polychotomous data.

Introduction

When modelling binary data, the outcome variable Y has a Bernoulli distribution with probability of success π. If the probability of success depends on a set of covariates, then we have a distinct probability πi, specific to the ith observation, Yi. The probability πi is regressed on the covariates through a link function that preserves the properties of probability. So πi=H(βxi) where xi is the vector of covariates associated with the ith observation, 0⩽H(.)⩽1, and H(.) is a continuous non-decreasing function. Usually the link function is taken as the cumulative distribution function (CDF) of some continuous random variable, defined on the whole real line. The two link functions in common use are the CDF of the standard normal distribution, the probit model, and the CDF of the logistic distribution, the logit model. These kinds of models are described in detail in a number of books. See, for example, Cox (1971) or Maddala (1983). For a sample of n observations, the likelihood function is given byL(β|data)∝i=1nH(βxi)yi(1−H(βxi))1−yi.When using maximum likelihood estimation, inferences about the model are usually based on asymptotic theory. Griffiths et al. (1987) found that the MLEs have significant bias for small samples. With the Bayesian approach and prior π(β), the posterior of β is given byπ(β|data)∝π(β)L(β|data),which is intractable in the case of the probit and logit models. In the past, asymptotic normal approximations were used for the posterior of β. Zellner and Rossi (1984) used numerical integration when the number of parameters is small. Albert and Chib (1993) introduced a simulation-based approach for the computation of the exact posterior distribution of β in the case of the probit model. The approach is based on the idea of data augmentation (Tanner and Wong, 1987), where a normally distributed latent variable is introduced into the problem. This approach also enables them to model binary data using a t link function.

In this paper we apply the data augmentation approach of Albert and Chib (1993) to the logit model. This enables us to use Gibbs sampling to obtain samples from the posterior distribution of β, drawing only from uniform distributions. The technique is extended in Section 3 to multiple response categories, and in Section 4 applied to ordinal responses where the thresholds, or cut off points, must also be estimated. Again, only simulation from uniform distributions is required to obtain marginal posterior distributions.

Gibbs sampling is a simplified version of the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970), and applicable when it is possible to sample directly from all conditional distributions. The Metropolis–Hastings algorithm is usually employed in the case of logistic regression. Other Markov chain Monte Carlo techniques in use are adaptive rejection sampling (ARS), which is used in the WinBugs software, and adaptive rejection metropolis sampling (ARMS).

While marginal posterior distributions of parameters in logistic regression can be obtained using WinBugs, it cannot provide marginal likelihoods. In Section 5 the data augmentation technique is applied to model selection via Bayes factors. Based on a method proposed by Chib (1995), the marginal likelihood under a particular model can be calculated by running additional Gibbs cycles, one for each parameter in the model. In Section 6 the technique is illustrated by two applications.

Section snippets

Dichotomous response variable

LetYi=1withprobabilityπi0withprobability1−πii=1,2,…,n,wherelogπi1−πi=βxi,i.e. the log-odds for the ith sampling unit is a linear function of the observed covariates xi=(1,xi1,xi2,…,xip)′, where β=(β0,β1,…,βp) is a row vector of regression coefficients. Thenπi=exp(βxi)1+exp(βxi)=FZ(βxi)where FZ(.) is the CDF of the logistic random variable Z, with probability density functionfZ(z)=exp(z)(1+exp(z))2,−∞<z<∞.Soπi=−∞βxiexp(z)(1+exp(z))2dz,=PU<exp(βxi)1+exp(βxi),where U has a Uniform (0,1)

Polychotomous response variable

Consider now the case where Y has more than two response categories and assume independence among repeated trials. This results in a data set of n observations whose distribution is multinomial with r categories. Letπij=P(Yi=j),and assume the logit link function,logπijπir=βjxi,j=1,2,…,r.Thenπij=exp(βjxi)1+∑s=1r−1exp(βsxi),j=1,2,…,r.=PU<exp(βjxi)1+∑s=1r−1exp(βsxi),where U∼Uniform(0,1). Note that the rth category is the baseline category and βr=0. The joint posterior distribution of β={βjk}((r

Ordinal responses

Suppose Yi can take one of r ordered categories, j=1,2,…,r, so that P(Yi=j)=πij, and the cumulative probabilities are ηij=∑k=1jπik=P(Yi⩽j). Introduce the continuous latent variable U, uniformly distributed over [0,1], such thatηij=PUi<expj+βxi)1+expj+βxi)=expj+βxi)1+expj+βxi),where β=(β1,β2,…,βp) are the regression coefficients and α=(α0,α1,…,αr) are the cut-off points of the intervals, such that −∞=α0<α1<⋯<αr=∞.

The joint posterior distribution of α,β and u, given the response y, is

Model selection

Bayesian model selection, or variable selection, is usually based on the Bayes factor, which is the ratio of the marginal likelihoods of two competing models. Priors in general should be proper, so in the context of the dichotomous model of Section 2, exchangeable logistic priors with mean zero and scale parameter σ are assumed for the elements of βt, the set of regression coefficients under model Mt. Let pt be the number of covariates included under model Mt, thenπ(βj|σ,Mt)=expj/σ)σ(1+expj

Application 1

Piegorsch (1992) analysed data on the analgesic effect of iontophoretic treatment with the chemical vincristine on elderly patients complaining of postherpetic neuralgia. Eighteen patients were interviewed 6 weeks after undergoing treatment to determine if any improvement in the neuralgia was evident. The response variable Y is 1 if an improvement was recorded, and 0 otherwise. The four covariates are X1; treatment (1 or 0), X2; age, X3; sex (1 for male), X4; pre-treatment duration of symptoms.

Conclusion

The main purpose of this paper is to illustrate a relatively simple method of simulating values from the marginal posterior distributions of the parameters in a logit model using the Gibbs sampler. This model is also very suitable for calculating marginal likelihoods and thus Bayes factors when comparing competing models. As the full conditional distributions of the parameters are intractable, Bayesian analyses usually employed the Metropolis–Hastings algorithm to obtain posterior

References (14)

  • A. Zellner et al.

    Bayesian analysis of dichotomous quantal response models

    J. Econometrics

    (1984)
  • J.H. Albert et al.

    Bayesian analysis of binary and polychotomous response data

    J. Amer. Statist. Assoc

    (1993)
  • S. Chib

    Marginal likelihood from the Gibbs output

    J. Amer. Statist. Assoc

    (1995)
  • D.R. Cox

    The Analysis of Binary Data

    (1971)
  • L. Fahrmeir et al.

    Multivariate Statistical Modelling Based on Generalized Linear Models

    (2001)
  • A.E. Gelfand et al.

    Sampling-based approaches to calculating marginal densities

    J. Amer. Statist. Assoc

    (1990)
  • W.E. Griffiths et al.

    Small sample properties of probit model estimators

    J. Amer. Statist. Assoc

    (1987)
There are more references available in the full text version of this article.

Cited by (13)

View all citing articles on Scopus
View full text