Likelihood analysis of the multivariate ordinal probit regression model for repeated ordinal responses

https://doi.org/10.1016/j.csda.2007.10.025

Abstract

We consider the analysis of longitudinal ordinal data, meaning regression-like analysis when the response variable is categorical with ordered categories, and is measured repeatedly over time (or space) on the experimental or sampling units. Particular attention is given to the multivariate ordinal probit regression model, in which the correlation between ordered categorical responses on the same unit at different times (or locations) is modeled with a latent variable that has a multivariate normal distribution. An algorithm for maximum likelihood analysis of this model is proposed and the analysis is demonstrated on an example. Simulations clarify the extent to which maximum likelihood estimators can be more efficient than generalized estimating equations (GEE) estimators of regression coefficients and the extent to which likelihood ratio tests can be more accurate than tests based on standard errors and approximate normality of GEE estimators.

Introduction

This paper is about the analysis of longitudinal ordinal data, meaning regression-like analysis when the response variable is categorical with ordered categories, and is measured repeatedly over time (or space) on the experimental or sampling units. This is an important data structure in medical studies, for example, when patients receiving different treatments and with different covariate values are categorized according to ordered grades of health status or improvement at multiple points in time. The following two examples indicate the types of data problems we have in mind.

Example 1: A Randomized Experiment on Anaesthesia Recovery

In a longitudinal study that compared the effects of varying dosages of an anaesthetic on post-surgical recovery (Davis, 1991), 60 young children undergoing outpatient surgery were randomized to one of four dosages (15, 20, 25 and 30 mg/kg) of the anaesthetic, with 15 children per dose group. Recovery scores on a seven-point scale (0: least favourable; 6: most favourable) were assigned upon admission to the recovery room and at minutes 5, 15 and 30 following admission. In addition to the dosage, other potential covariates were (a) the time when the measurement was taken, (b) the age of the patient (in months), and (c) the duration of the surgery (in minutes).

Example 2: An Observational Study on Marijuana Use

The National Youth Survey (Elliot et al., 1989; Lang et al., 1999) collected five annual waves (1976–80) of data on ‘marijuana use in the past year’ from the 237 respondents who were 13 years old in 1976. The response is on a trichotomous ordinal scale (1, never; 2, not more than once a month; 3, more than once a month). One of the objectives was to model the distribution of marijuana use status over time as a function of gender and time.

Although there are a variety of models and approaches (see Liu and Agresti (2005)), we believe the most useful tool currently available for this type of problem is generalized estimating equations (GEE). GEE uses a generalized linear model to relate the response means to the explanatory variables, and employs a working correlation structure to account for the nonindependence of multiple responses on the same subject. The treatment of correlation in this approach is not thought to be realistic, but it is sufficient for obtaining estimates of the regression parameters of interest. GEE algorithms for regression models for ordered categorical responses are available (e.g. ordgee in package geepack from R, geeDesign and gee.fit in the correlatedData library from S-PLUS, proc genmod with the REPEATED statement for an independence working correlation from SAS). An important consideration is that these are easy to use (at least, relative to the alternatives) because they are similar to familiar methods for independent responses.

While semi-parametric approaches such as GEE are popular for analysing repeated ordinal responses, there are potentially useful full parametric approaches based on the multivariate ordinal probit model (Fu et al., 2000) and the (multivariate) grouped continuous model (Anderson and Pemberton, 1985). The multivariate ordinal probit model extends the multivariate probit model for a response with two categories to an ordered categorical response with more than two categories. The grouped continuous model was originally developed by Ashford and Sowden (1970) and has been further studied by Amemiya (1981), Ochi and Prentice (1984), Chib and Greenberg (1998) and Renard et al. (2004). Maximum likelihood (ML) analysis of the multivariate ordinal probit model has been slow to evolve because of computational difficulties. As we will show, the direct ML approach requires an evaluation of integrals of multivariate normal density functions. Until recently, this integration was impractical, especially if it involved more than two dimensions. Consequently, most previous applications of the multivariate ordinal probit model were limited to bivariate ordinal probit models (e.g. Kim (1995)). McFadden (1989) and Hajivassiliou et al. (1996) used Monte Carlo techniques for the integral evaluation, but many researchers feel this approach is too computer intensive. Fu et al. (2000) proposed a limited information estimator for approximate likelihood analysis.
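To make the computational burden concrete, one subject's likelihood contribution can be written as a rectangle probability. The display below uses the textbook latent-variable form of the ordinal probit model and is only a sketch: the cut points $\alpha_g$, the coefficient vector $\beta$, the category labels $Y_{it}$ and the correlation matrix $R_i$ are notation assumed here for illustration, and the paper's own parameterization may differ.

```latex
\Pr\bigl(Y_{i1}=g_1,\ldots,Y_{iT_i}=g_{T_i}\bigr)
  = \int_{\alpha_{g_1-1}-\tilde{x}_{i1}'\beta}^{\alpha_{g_1}-\tilde{x}_{i1}'\beta}
    \cdots
    \int_{\alpha_{g_{T_i}-1}-\tilde{x}_{iT_i}'\beta}^{\alpha_{g_{T_i}}-\tilde{x}_{iT_i}'\beta}
    \phi_{T_i}(z;\,0,\,R_i)\; dz_{T_i}\cdots dz_1,
\qquad -\infty=\alpha_0<\alpha_1<\cdots<\alpha_G=\infty .
```

Here $\phi_{T_i}(\cdot\,;0,R_i)$ denotes the $T_i$-variate normal density with zero mean and correlation matrix $R_i$. Every subject contributes one such $T_i$-dimensional integral at every evaluation of the likelihood, which is why the dimension of the integral drives the cost.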

In this paper, we propose an algorithm for maximum likelihood computation of the multivariate ordinal probit model, which incorporates a readily available routine for computing multivariate normal probabilities within a numerical maximization procedure. We believe this algorithm avoids some of the practical problems of previous methods. It can make use of existing routines for optimization and for multivariate normal probabilities, available, for example, in the software package R. The availability of this routine will permit studies of efficiency, accuracy of tests and confidence intervals, and robustness, which will help clarify the relative merits of likelihood analysis and GEE.
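The general idea of placing a multivariate normal probability routine inside a likelihood evaluation can be sketched for the simplest case of two occasions per subject. This is not the authors' algorithm: it is a minimal illustration using only the Python standard library, in which the bivariate normal probability is obtained by one-dimensional Simpson integration rather than by a dedicated routine, the model is the textbook latent-variable form, and all names (`bvn_cdf`, `cell_prob`, `log_likelihood`, `CUTS`, `toy`) and all data values are hypothetical.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=1000, bound=8.0):
    """P(Z1 <= a, Z2 <= b) for a standard bivariate normal with |rho| < 1.

    Uses P = int_{-inf}^{a} phi(z) * Phi((b - rho*z) / sqrt(1 - rho^2)) dz,
    approximated by Simpson's rule on [-bound, min(a, bound)] (n must be even).
    """
    a = min(a, bound)
    if a <= -bound:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a + bound) / n
    total = 0.0
    for k in range(n + 1):
        z = -bound + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * norm_pdf(z) * norm_cdf((b - rho * z) / s)
    return total * h / 3.0

def cell_prob(g1, g2, eta1, eta2, cuts, rho):
    """P(Y1 = g1, Y2 = g2): a rectangle probability by inclusion-exclusion.

    cuts = [-inf, alpha_1, ..., alpha_{G-1}, +inf]; eta is the linear predictor.
    """
    lo1, hi1 = cuts[g1 - 1] - eta1, cuts[g1] - eta1
    lo2, hi2 = cuts[g2 - 1] - eta2, cuts[g2] - eta2
    return (bvn_cdf(hi1, hi2, rho) - bvn_cdf(lo1, hi2, rho)
            - bvn_cdf(hi1, lo2, rho) + bvn_cdf(lo1, lo2, rho))

def log_likelihood(data, beta, cuts, rho):
    """data: list of (g1, g2, x1, x2) with one scalar covariate per occasion."""
    return sum(math.log(cell_prob(g1, g2, beta * x1, beta * x2, cuts, rho))
               for g1, g2, x1, x2 in data)

# Hypothetical cut points for G = 3 categories and made-up toy data.
CUTS = [-math.inf, -0.5, 0.7, math.inf]
toy = [(1, 1, 0.0, 1.0), (2, 2, 0.0, 1.0), (3, 3, 1.0, 2.0),
       (1, 2, 0.0, 1.0), (3, 2, 1.0, 2.0)]

# Crude grid search over rho, standing in for a real numerical optimizer.
grid = [r / 10 for r in range(-9, 10)]
best_rho = max(grid, key=lambda r: log_likelihood(toy, 0.4, CUTS, r))
```

In practice `log_likelihood` would be handed to a general-purpose optimizer over all parameters jointly, and the rectangle probability would come from a routine that handles more than two dimensions. The grid search over `rho` alone is only there to show where the probability routine sits inside the maximization.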

The rest of this paper is organized as follows. Section 2 reviews GEE approaches for longitudinal ordinal responses. Section 3 presents the multivariate ordinal probit model and an algorithm for maximum likelihood analysis. Section 4 shows an application to the anaesthesia recovery example. Section 5 documents a simulation study comparing likelihood analysis to generalized estimating equations. Section 6 includes some concluding comments.

Section snippets

GEE for longitudinal ordinal data

The original GEE methodology was proposed by Liang and Zeger (1986) to permit modelling of correlation while retaining marginal generalized linear models, such as binomial logistic and Poisson log-linear regression. In the mid-1990s it was extended to multinomial responses, using the cumulative logit and cumulative probit link functions for longitudinal ordinal responses. Cumulative logit models have been studied by Kenward et al. (1994), Lipsitz et al. (1994), Lumley (1996), Mark and

The multivariate ordinal probit model

Suppose, in a longitudinal study, there are $T_i$ occasions of measurement on observational unit $i$ ($i = 1, \ldots, n$). Let $\tilde{y}_{it}$ be a $(G-1)$-vector representing an ordinal response variable with $G$ categories, and let $\tilde{x}_{it}$ be a $p$-dimensional explanatory variable vector observed on subject $i$ at time $t$ ($t = 1, \ldots, T_i$). Specifically, $\tilde{y}_{it} = (y_{it1}, y_{it2}, \ldots, y_{i,t,G-1})$, where $y_{itg} = 1$ if subject $i$ falls into response category $g$ at time $t$, and $y_{itg} = 0$ otherwise, for $g = 1, \ldots, G$. The responses for different subjects are independent, but
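As an illustration of the indicator coding defined above, the following minimal sketch maps a category label in 1, ..., G to the vector of its first G-1 indicators. The function name `encode_ordinal` is hypothetical, and the shift of the recovery scores 0 to 6 onto categories 1 to 7 is an assumption made only for this example.

```python
def encode_ordinal(category, G):
    """Return the first G-1 category indicators for a response in 1..G.

    The indicator for category G is omitted because the G indicators sum
    to one, so it is determined by the other G-1.
    """
    if not 1 <= category <= G:
        raise ValueError("category must lie in 1..G")
    return [1 if category == g else 0 for g in range(1, G)]

# A recovery score of 2 on the 0-6 scale, treated here as category 3 of G = 7.
print(encode_ordinal(3, 7))   # [0, 0, 1, 0, 0, 0]
# The top category is represented by the all-zero vector.
print(encode_ordinal(7, 7))   # [0, 0, 0, 0, 0, 0]
```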

Anaesthesia recovery example

Our preceding method is demonstrated on the anaesthesia data described in Section 1, which appear in Appendix II in Davis (1991). Figs. 1 and 2 are plots of the data, which give a rough indication of the relationship between the category of recovery (0–6) and the dose of anaesthesia and time spent by the children in the recovery room. Since 15 profiles on one plot in each dose group cause too much clutter, only profiles of repeated measures of the first four children in each dose group are

Simulation study

The goal of this simulation study is to examine the extent to which maximum likelihood can provide more efficient estimation and more accurate tests than GEE. There is, of course, an important additional question about whether likelihood analysis is robust against possible model misspecification. But given that GEE is currently popular and useful, we wish to first see if the operating characteristics of the maximum likelihood estimates offer improvements that are substantial enough to warrant

Discussion

For estimating the various models in the anaesthesia recovery example above, the number of iterations required for convergence was never more than four with a convergence criterion of 0.01% for all parameter estimates and the log likelihood. Each iteration required about 2 min using an R routine on an Intel Pentium 1.60 GHz PC with 480 MB of RAM. We believe the algorithm is practical for maximum likelihood estimation and likelihood ratio inference in data analysis of repeated ordinal responses,

Acknowledgments

The authors are grateful to two anonymous referees whose insightful comments have greatly improved the paper.

References (38)

  • C.S. Davis. Semi-parametric and non-parametric methods for the analysis of repeated measurement with applications to clinical trials. Stat. Med. (1991)
  • J.E. Dennis et al. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. (1983)
  • B. Efron et al. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika (1978)
  • D.S. Elliot et al. Multiple Problem Youth: Delinquency, Substance Use and Mental Health Problems. (1989)
  • T.T. Fu et al. A limited information estimator for the multivariate ordinal probit model. Appl. Econ. (2000)
  • A.G. Gelman et al. Bayesian Data Analysis. (2004)
  • A. Genz. Numerical computation of multivariate normal probabilities. J. Comput. Graph. Statist. (1992)
  • A. Genz. Comparison of methods for the computation of multivariate normal probabilities. Comput. Sci. Statist. (1993)
  • P.J. Heagerty et al. Marginal regression models for clustered ordinal measurements. J. Amer. Statist. Assoc. (1996)