Bayesian computation for logistic regression

doi:10.1016/j.csda.2004.04.009

Computational Statistics & Data Analysis

Volume 48, Issue 4, 1 April 2005, Pages 857-868

https://doi.org/10.1016/j.csda.2004.04.009

Abstract

A method is proposed for simulating samples from the exact posterior distributions of the parameters in logistic regression. It is based on the principle of data augmentation: a latent variable is introduced, similar to the approach of Albert and Chib (J. Am. Stat. Assoc. 88 (1993) 669), who applied it to the probit model. In general, the full conditional distributions are intractable, but with the introduction of the latent variable all conditional distributions are uniform, and the Gibbs sampler is easily applied. Marginal likelihoods for model selection can be obtained at the expense of additional Gibbs cycles. The technique is extended to nominal and ordinal polychotomous data.

Introduction

When modelling binary data, the outcome variable Y has a Bernoulli distribution with probability of success π. If the probability of success depends on a set of covariates, then each observation Y_i has its own probability of success π_i. The probability π_i is related to the covariates through a link function that keeps it within [0,1]: π_i=H(βx_i), where x_i is the vector of covariates associated with the ith observation, 0⩽H(.)⩽1, and H(.) is a continuous non-decreasing function. Usually H(.) is taken as the cumulative distribution function (CDF) of some continuous random variable defined on the whole real line. The two link functions in common use are the CDF of the standard normal distribution (the probit model) and the CDF of the logistic distribution (the logit model). These models are described in detail in a number of books; see, for example, Cox (1971) or Maddala (1983). For a sample of n observations, the likelihood function is $L(\beta \mid \text{data}) \propto \prod_{i=1}^{n} H(\beta x_i)^{y_i}\,\{1-H(\beta x_i)\}^{1-y_i}$. When using maximum likelihood estimation, inferences about the model are usually based on asymptotic theory, and Griffiths et al. (1987) found that the MLEs have significant bias in small samples. With the Bayesian approach and prior π(β), the posterior of β is $\pi(\beta \mid \text{data}) \propto \pi(\beta)\,L(\beta \mid \text{data})$, which is analytically intractable for both the probit and the logit model. In the past, asymptotic normal approximations were used for the posterior of β, and Zellner and Rossi (1984) used numerical integration when the number of parameters is small. Albert and Chib (1993) introduced a simulation-based approach for computing the exact posterior distribution of β in the probit model. The approach is based on the idea of data augmentation (Tanner and Wong, 1987), in which a normally distributed latent variable is introduced into the problem. This approach also enables them to model binary data using a t link function.
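To make the likelihood concrete, the following Python sketch evaluates log L(β | data) for a supplied H, with the logistic CDF (logit model) and the standard normal CDF (probit model) as the two choices discussed above. The function names and the list-of-rows data layout are illustrative assumptions, not taken from the paper, and the sketch does not guard against H(βx_i) rounding to exactly 0 or 1 for extreme linear predictors.

```python
import math


def logistic_cdf(t):
    """CDF of the standard logistic distribution (H for the logit model)."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)  # avoids overflow of exp(-t) for large negative t
    return z / (1.0 + z)


def normal_cdf(t):
    """CDF of the standard normal distribution (H for the probit model)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def log_likelihood(beta, X, y, H=logistic_cdf):
    """sum_i [ y_i log H(beta x_i) + (1 - y_i) log(1 - H(beta x_i)) ].

    X is a list of covariate rows (each starting with 1.0 for the intercept),
    y is a list of 0/1 responses. No guard against H(.) returning exactly 0 or 1.
    """
    total = 0.0
    for row, yi in zip(X, y):
        p = H(sum(b * x for b, x in zip(beta, row)))
        total += math.log(p) if yi == 1 else math.log(1.0 - p)
    return total
```

With β = 0 both links give H(0) = 1/2, so the log-likelihood reduces to n log(1/2) whatever the data, which makes a convenient sanity check.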

In this paper we apply the data augmentation approach of Albert and Chib (1993) to the logit model. This enables us to use Gibbs sampling to obtain samples from the posterior distribution of β while drawing only from uniform distributions. In Section 3 the technique is extended to multiple response categories, and in Section 4 it is applied to ordinal responses, where the thresholds, or cut-off points, must also be estimated. Again, only simulation from uniform distributions is required to obtain the marginal posterior distributions.

Gibbs sampling is a special case of the Metropolis–Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) in which each proposal is drawn from a full conditional distribution and is always accepted; it is applicable when it is possible to sample directly from all the full conditional distributions. In logistic regression the Metropolis–Hastings algorithm is usually employed instead. Other Markov chain Monte Carlo techniques in use are adaptive rejection sampling (ARS), which is used in the WinBugs software, and adaptive rejection Metropolis sampling (ARMS).
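For contrast with the Gibbs approach, here is a minimal random-walk Metropolis sketch for the logistic posterior, one common form of the Metropolis–Hastings sampler mentioned above. The Gaussian proposal scale `step`, the independent Uniform(−bound, bound) prior, and all function names are hypothetical choices made for this illustration; in practice the step size must be tuned, a step that a Gibbs sampler with uniform full conditionals does not need.

```python
import math
import random


def softplus(t):
    """log(1 + e^t), computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_posterior(beta, X, y, bound=20.0):
    """Unnormalized log posterior for logistic regression.

    Hypothetical prior: independent Uniform(-bound, bound) on each coefficient.
    """
    if any(abs(b) > bound for b in beta):
        return -math.inf
    total = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        # log F(eta) = -softplus(-eta) and log(1 - F(eta)) = -softplus(eta)
        total -= softplus(-eta) if yi == 1 else softplus(eta)
    return total


def random_walk_metropolis(X, y, n_iter=5000, step=0.5, seed=0):
    """Random-walk Metropolis with an independent Gaussian step on each coefficient."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y)
        # Symmetric proposal: accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            beta, current = proposal, candidate
            accepted += 1
        draws.append(list(beta))
    return draws, accepted / n_iter
```

Because the Gaussian proposal is symmetric, the Hastings correction cancels and only the posterior ratio enters the acceptance test; the returned acceptance rate is the usual guide for adjusting `step`.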

While marginal posterior distributions of the parameters in logistic regression can be obtained using WinBugs, the software cannot provide marginal likelihoods. In Section 5 the data augmentation technique is applied to model selection via Bayes factors. Following a method proposed by Chib (1995), the marginal likelihood under a particular model can be calculated by running additional Gibbs cycles, one for each parameter in the model. In Section 6 the technique is illustrated by two applications.
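As a reference point, Chib's (1995) method rests on the identity below, which holds at any fixed point β* (typically a high-density value). It is written here in generic form for a Gibbs sampler with latent variables u; how the paper specializes it to the augmented logit model is not visible in this snippet. The posterior ordinate is factorized into one term per parameter, and each term is estimated by averaging full conditional densities over G draws from a reduced Gibbs run in which the earlier parameters are held fixed, which is why one additional run per parameter is needed.

```latex
\begin{aligned}
\log m(y) &= \log L(\beta^* \mid y) + \log \pi(\beta^*) - \log \pi(\beta^* \mid y),\\
\pi(\beta^* \mid y) &= \prod_{k=0}^{p} \pi(\beta_k^* \mid y, \beta_0^*, \dots, \beta_{k-1}^*),\\
\hat\pi(\beta_k^* \mid y, \beta_0^*, \dots, \beta_{k-1}^*) &= \frac{1}{G} \sum_{g=1}^{G}
  \pi\bigl(\beta_k^* \mid y, \beta_0^*, \dots, \beta_{k-1}^*, \beta_{k+1}^{(g)}, \dots, \beta_p^{(g)}, u^{(g)}\bigr).
\end{aligned}
```

If a full conditional is uniform on an interval (a, b), its density at β_k* is simply 1/(b − a) when a < β_k* < b and 0 otherwise, so each ordinate in the average is cheap to evaluate.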

Section snippets

Dichotomous response variable

Let $Y_i = 1$ with probability $\pi_i$ and $Y_i = 0$ with probability $1-\pi_i$, $i = 1, 2, \dots, n$, where $\log\{\pi_i/(1-\pi_i)\} = \beta x_i$, i.e. the log-odds for the ith sampling unit is a linear function of the observed covariates x_i=(1,x_i1,x_i2,…,x_ip)′, where β=(β₀,β₁,…,β_p) is a row vector of regression coefficients. Then $\pi_i = \frac{\exp(\beta x_i)}{1+\exp(\beta x_i)} = F_Z(\beta x_i)$, where F_Z(.) is the CDF of the logistic random variable Z, with probability density function $f_Z(z) = \frac{\exp(z)}{\{1+\exp(z)\}^2}$, $-\infty < z < \infty$. So $\pi_i = \int_{-\infty}^{\beta x_i} \frac{\exp(z)}{\{1+\exp(z)\}^2}\,dz = P\left(U < \frac{\exp(\beta x_i)}{1+\exp(\beta x_i)}\right)$, where U has a Uniform(0,1) distribution.
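The uniform representation leads directly to a Gibbs sampler. The Python sketch below is our reading of the idea, not the paper's published algorithm. Given β, each latent u_i is uniform on (0, F_Z(βx_i)) when y_i = 1 and on (F_Z(βx_i), 1) when y_i = 0. Given the latents, the constraint βx_i > logit(u_i) (or βx_i ⩽ logit(u_i)) is linear in each β_k, so the full conditional of β_k is uniform on an interval. To keep every interval bounded we assume independent Uniform(−bound, bound) priors; that prior, `bound`, and all function names are hypothetical, and the latents are stored on the logit scale to avoid evaluating logit(0) or logit(1).

```python
import math
import random


def expit(t):
    """Standard logistic CDF F_Z(t), written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def logit(q):
    return math.log(q) - math.log1p(-q)


def draw_latent(eta, y_i, rng):
    """Return v_i = logit(u_i), with u_i ~ Uniform(0, F(eta)) if y_i = 1,
    and u_i ~ Uniform(F(eta), 1) if y_i = 0."""
    r = rng.random()
    while r == 0.0:  # need r in (0, 1) so the logit below stays finite
        r = rng.random()
    if y_i == 1:
        return logit(r * expit(eta))
    return -logit(r * expit(-eta))  # 1 - u_i ~ Uniform(0, F(-eta)) by symmetry


def gibbs_logit(X, y, n_iter=3000, bound=20.0, seed=0):
    """Data-augmentation Gibbs sampler for logistic regression (illustrative sketch).

    Hypothetical prior: independent Uniform(-bound, bound) on each coefficient,
    so every full conditional is uniform. Each row of X starts with 1.0.
    """
    rng = random.Random(seed)
    p = len(X[0])
    beta = [0.0] * p
    eta = [0.0] * len(X)  # eta_i = beta x_i, kept in sync with beta
    draws = []
    for _ in range(n_iter):
        v = [draw_latent(e, yi, rng) for e, yi in zip(eta, y)]
        for k in range(p):
            lo, hi = -bound, bound
            for i, row in enumerate(X):
                xk = row[k]
                if xk == 0.0:
                    continue
                # y_i = 1 needs beta x_i > v_i; y_i = 0 needs beta x_i <= v_i.
                c = (v[i] - (eta[i] - beta[k] * xk)) / xk
                if (y[i] == 1) == (xk > 0):
                    lo = max(lo, c)  # constraint gives a lower bound on beta_k
                else:
                    hi = min(hi, c)  # constraint gives an upper bound on beta_k
            new = rng.uniform(lo, hi)
            for i, row in enumerate(X):
                eta[i] += (new - beta[k]) * row[k]
            beta[k] = new
        draws.append(list(beta))
    return draws
```

Working on the logit scale costs nothing: v_i = logit(u_i) carries the same information as u_i, and the bounds for β_k come out as simple ratios. The current β_k always lies inside its new interval, because the latents were drawn consistently with it.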

Polychotomous response variable

Consider now the case where Y has more than two response categories, and assume independence among repeated trials. This results in a data set of n observations whose distribution is multinomial with r categories. Let $\pi_{ij} = P(Y_i = j)$, and assume the logit link function, $\log(\pi_{ij}/\pi_{ir}) = \beta_j x_i$, $j = 1, 2, \dots, r$. Then $\pi_{ij} = \frac{\exp(\beta_j x_i)}{1 + \sum_{s=1}^{r-1}\exp(\beta_s x_i)} = P\left(U < \frac{\exp(\beta_j x_i)}{1 + \sum_{s=1}^{r-1}\exp(\beta_s x_i)}\right)$, $j = 1, 2, \dots, r$, where $U \sim \text{Uniform}(0,1)$. Note that the rth category is the baseline category and β_r=0. The joint posterior distribution of β={β_jk}((r
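A short sketch of the category probabilities π_ij for one observation, with the rth category as the baseline (β_r = 0). The function name and argument layout are illustrative assumptions; subtracting the largest score before exponentiating leaves every ratio π_ij/π_ir unchanged but avoids overflow.

```python
import math


def multinomial_logit_probs(betas, x):
    """Category probabilities pi_i1, ..., pi_ir for one covariate vector x.

    betas holds beta_1, ..., beta_{r-1}; the baseline category r has beta_r = 0.
    """
    scores = [sum(b * xk for b, xk in zip(beta_j, x)) for beta_j in betas] + [0.0]
    m = max(scores)  # stabilizer: exp(s - m) cannot overflow
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]
```

With all coefficients set to zero every category gets probability 1/r, and in general log(π_ij/π_ir) recovers β_j x_i, matching the link function above.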

Ordinal responses

Suppose Y_i can take one of r ordered categories, j=1,2,…,r, so that P(Y_i=j)=π_ij, and the cumulative probabilities are $\eta_{ij} = \sum_{k=1}^{j} \pi_{ik} = P(Y_i \le j)$. Introduce the continuous latent variable U_i, uniformly distributed over [0,1], such that $\eta_{ij} = P\left(U_i < \frac{\exp(\alpha_j + \beta x_i)}{1+\exp(\alpha_j + \beta x_i)}\right) = \frac{\exp(\alpha_j + \beta x_i)}{1+\exp(\alpha_j + \beta x_i)}$, where β=(β₁,β₂,…,β_p) are the regression coefficients and α=(α₀,α₁,…,α_r) are the cut-off points of the intervals, such that −∞=α₀<α₁<⋯<α_r=∞.
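The category probabilities follow as differences of consecutive cumulative probabilities, π_ij = η_ij − η_i,j−1, using the implicit end points α₀ = −∞ (η = 0) and α_r = ∞ (η = 1). This is a minimal sketch with hypothetical names; because the cut-off points are strictly increasing and the logistic CDF is increasing, every difference is nonnegative. As in the text, x excludes an intercept, which is absorbed into the cut-off points.

```python
import math


def logistic_cdf(t):
    """Standard logistic CDF, written to avoid overflow."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    z = math.exp(t)
    return z / (1.0 + z)


def ordinal_logit_probs(alphas, beta, x):
    """pi_i1, ..., pi_ir for the cumulative-logit model.

    alphas: finite cut-offs alpha_1 < ... < alpha_{r-1}
            (alpha_0 = -inf and alpha_r = +inf are implicit).
    """
    if any(a >= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("cut-off points must be strictly increasing")
    eta = sum(b * xk for b, xk in zip(beta, x))
    # cumulative probabilities eta_i0 = 0, eta_i1, ..., eta_ir = 1
    cum = [0.0] + [logistic_cdf(a + eta) for a in alphas] + [1.0]
    return [cum[j] - cum[j - 1] for j in range(1, len(cum))]
```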

The joint posterior distribution of α,β and u, given the response y, is

Model selection

Bayesian model selection, or variable selection, is usually based on the Bayes factor, which is the ratio of the marginal likelihoods of two competing models. Priors should in general be proper, since under an improper prior the marginal likelihood is determined only up to an arbitrary constant and the Bayes factor is not well defined. In the context of the dichotomous model of Section 2, therefore, exchangeable logistic priors with mean zero and scale parameter σ are assumed for the elements of β_t, the set of regression coefficients under model M_t. Let p_t be the number of covariates included under model M_t; then $\pi(\beta_j \mid \sigma, M_t) = \frac{\exp(\beta_j/\sigma)}{\sigma\{1+\exp(\beta_j/\sigma)\}^2}$
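For marginal-likelihood computations the log prior ordinate is needed, and under exchangeable logistic priors it is a sum of one term per coefficient. This sketch assumes, as the text states, a common scale σ for every coefficient, and uses a numerically stable form of log(1 + e^t); the function name is hypothetical.

```python
import math


def log_logistic_prior(beta, sigma):
    """Sum over j of log pi(beta_j | sigma) for independent logistic(0, sigma) priors.

    Each term is  t - log(sigma) - 2 log(1 + e^t)  with t = beta_j / sigma.
    """
    total = 0.0
    for b in beta:
        t = b / sigma
        softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))  # log(1 + e^t)
        total += t - math.log(sigma) - 2.0 * softplus
    return total
```

At β_j = 0 each term equals log{1/(4σ)}, the peak of the logistic density with scale σ, and the density is symmetric about zero.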

Application 1

Piegorsch (1992) analysed data on the analgesic effect of iontophoretic treatment with the chemical vincristine on elderly patients complaining of postherpetic neuralgia. Eighteen patients were interviewed 6 weeks after undergoing treatment to determine whether any improvement in the neuralgia was evident. The response variable Y is 1 if an improvement was recorded, and 0 otherwise. The four covariates are X₁, treatment (1 or 0); X₂, age; X₃, sex (1 for male); and X₄, pre-treatment duration of symptoms.

Conclusion

The main purpose of this paper is to illustrate a relatively simple method of simulating values from the marginal posterior distributions of the parameters in a logit model using the Gibbs sampler. This model is also well suited to calculating marginal likelihoods, and hence Bayes factors, when comparing competing models. As the full conditional distributions of the parameters are intractable, Bayesian analyses have usually employed the Metropolis–Hastings algorithm to obtain posterior

References (14)

A. Zellner et al. Bayesian analysis of dichotomous quantal response models. J. Econometrics (1984).
J.H. Albert et al. Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. (1993).
S. Chib. Marginal likelihood from the Gibbs output. J. Amer. Statist. Assoc. (1995).
D.R. Cox. The Analysis of Binary Data (1971).
L. Fahrmeir et al. Multivariate Statistical Modelling Based on Generalized Linear Models (2001).
A.E. Gelfand et al. Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. (1990).
W.E. Griffiths et al. Small sample properties of probit model estimators. J. Amer. Statist. Assoc. (1987).

There are more references available in the full text version of this article.
