Influence analyses of nonlinear mixed-effects models

https://doi.org/10.1016/S0167-9473(02)00303-1

Abstract

Nonlinear mixed-effects models are very useful in analyzing repeated-measures data and have received considerable attention. In this paper, we propose a method to detect influential observations in such models, on the basis of the maximum likelihood estimates obtained by a stochastic approximation algorithm with a Markov chain Monte Carlo method. The development uses the data augmentation technique, which treats the random effects as missing data, and considers the conditional expectation of the complete-data log-likelihood function associated with an EM algorithm. Diagnostic measures are derived from the case-deletion approach and the local influence approach, and are approximated by a large sample of random effects simulated from the appropriate conditional distributions by a Metropolis–Hastings algorithm. Results obtained from two illustrative examples are reported.

Introduction

There has been a great deal of recent interest in mixed-effects models for repeated-measures data that arise in different areas of investigation, such as economics and pharmacokinetics. Repeated-measures data are generated by observing a number of subjects (individuals) repeatedly under differing experimental conditions. Observations on the same subject are usually made at different times, as in longitudinal studies. Mixed-effects models assume that the intrasubject model that relates the response variable to time is the same for all subjects, but that the model parameters may vary from subject to subject. The linear mixed-effects model is an important statistical tool that is frequently used for evaluating the performance of products, for determining sampling designs and quality-control procedures, and for statistical genetics, particularly in longitudinal studies. However, many kinds of repeated-measures data, such as growth data, dose-response data and pharmacokinetic data, are inherently nonlinear with respect to a given response regression function. Several different nonlinear mixed-effects models have been proposed in recent years (see Sheiner and Beal, 1980; Mallet et al., 1988; Lindstrom and Bates, 1990; Davidian and Gallant, 1993; Vonesh and Carter, 1992; Pinheiro and Bates, 1995; Walker, 1996; Vonesh et al., 2002, among others). Due to the complexity of the models, obtaining the maximum likelihood (ML) estimates is a nontrivial problem. ML estimation was pioneered by Beal and Sheiner (1979), and since then a number of algorithms have been proposed for obtaining an approximate ML solution, including those of Lindstrom and Bates (1990), Beal and Sheiner (1992), Pinheiro and Bates (1995), and Davidian and Gallant (1993). Recently, Walker (1996) introduced an EM algorithm (Dempster et al., 1977) for exact ML estimation.

Detecting outliers and influential observations, and studying sensitivity to departures from basic assumptions, are important issues in statistical analysis.

Following the pioneering work of Cook (1977, 1986), this area of research has received much attention; see Belsley et al. (1980), Banerjee and Frees (1997), Christensen et al. (1992), Chatterjee and Hadi (1988), Crichley et al. (2001), Lesaffre and Verbeke (1998), Zhu and Lee (2001), Zhu et al. (2001), and Lee and Xu (2002), among others. For nonlinear mixed-effects models, very little has been done on obtaining local influence measures and case-deletion measures. The main objective of this paper is to develop methods for obtaining these measures. Given the ML estimates, we obtain the diagnostic measures via the methods proposed by Zhu and Lee (2001) and Zhu et al. (2001). The key idea of the development is to treat the random effects as hypothetical missing data and to work with the conditional expectation of the complete-data log-likelihood function in the EM algorithm (Dempster et al., 1977). Diagnostic measures for local influence are based on the conformal normal curvature (Poon and Poon, 1999), whilst the case-deletion measures are based on the one-step approximation of Cook and Weisberg (1982). These diagnostic measures cannot be obtained in closed form because they involve intractable integrals. A Metropolis–Hastings (MH) algorithm (Metropolis et al., 1953; Hastings, 1970) is implemented to simulate a sufficiently large sample of random effects from the appropriate conditional distribution for approximating these integrals. As this sample can be obtained as a by-product of the estimation, the additional computational burden is light.
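To make the simulation step concrete, the following is a minimal Python sketch of how a random-walk MH sampler can draw a single subject's random effect from its conditional distribution given the data, and how the resulting draws approximate a conditional expectation. It is an illustration only, not the authors' implementation: the logistic growth curve with a random effect on the asymptote, the parameter values, the simulated data, and the names `logistic_mean`, `log_target`, `mh_random_effect`, `burn_in` and `step` are all assumptions introduced here.

```python
import numpy as np

def logistic_mean(b, x, beta):
    """Illustrative f: logistic growth curve whose asymptote is beta[0] + b."""
    return (beta[0] + b) / (1.0 + np.exp(-(x - beta[1]) / beta[2]))

def log_target(b, y, x, beta, sigma, sigma_b):
    """log p(b | y) up to a constant: Gaussian likelihood plus N(0, sigma_b^2) prior."""
    resid = y - logistic_mean(b, x, beta)
    return -0.5 * np.sum(resid**2) / sigma**2 - 0.5 * b**2 / sigma_b**2

def mh_random_effect(y, x, beta, sigma, sigma_b, T=2000, burn_in=500, step=10.0, seed=0):
    """Random-walk Metropolis-Hastings draws of b from p(b | y)."""
    rng = np.random.default_rng(seed)
    b = 0.0
    current = log_target(b, y, x, beta, sigma, sigma_b)
    draws = np.empty(T)
    n_accept = 0
    for t in range(burn_in + T):
        proposal = b + step * rng.normal()
        candidate = log_target(proposal, y, x, beta, sigma, sigma_b)
        # Symmetric proposal: accept with probability min(1, target ratio).
        if np.log1p(-rng.uniform()) < candidate - current:
            b, current = proposal, candidate
            if t >= burn_in:
                n_accept += 1
        if t >= burn_in:
            draws[t - burn_in] = b
    return draws, n_accept / T

# Simulated data for one hypothetical subject (all values are made up).
beta = np.array([190.0, 700.0, 350.0])
sigma, sigma_b = 8.0, 30.0
x = np.linspace(100.0, 1600.0, 7)
data_rng = np.random.default_rng(1)
true_b = 25.0
y = logistic_mean(true_b, x, beta) + sigma * data_rng.normal(size=x.size)

draws, accept_rate = mh_random_effect(y, x, beta, sigma, sigma_b)

# Monte Carlo approximation of a conditional expectation, here E[b | y].
mc_mean = draws.mean()

# This toy model is linear in b, so E[b | y] has a closed form to check against.
g = logistic_mean(0.0, x, beta) / beta[0]
precision = np.sum(g**2) / sigma**2 + 1.0 / sigma_b**2
exact_mean = np.sum(g * (y - beta[0] * g)) / sigma**2 / precision
print(f"MH estimate {mc_mean:.2f}, closed form {exact_mean:.2f}, acceptance {accept_rate:.2f}")
```

The toy model was chosen so that the conditional mean is available in closed form, which gives a simple check on the sampler. In a genuinely nonlinear case no such check exists, and the Monte Carlo average is the only practical approximation.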

The paper is organized as follows. Section 2 introduces the nonlinear mixed-effects models and the ML estimation. The diagnostic measures are derived in Section 3. Two real examples are given in Section 4. Some technical details are given in the appendices.


Nonlinear mixed-effects model and its ML estimation

Consider the following nonlinear mixed-effects model as proposed by Pinheiro and Bates (1995). In the first stage, the jth observation on the ith subject is modeled as

$$y_{ij} = f(\phi_{ij}, x_{ij}) + \varepsilon_{ij}, \quad i = 1, \ldots, I, \quad j = 1, \ldots, n_i,$$

where $f$ is a nonlinear function of a subject-specific parameter vector $\phi_{ij}$ and the predictor $x_{ij}$, $\varepsilon_{ij}$ is a normally distributed noise term, $I$ is the total number of subjects, and $n_i$ is the number of observations on the ith subject. In the second stage the subject-specific parameter vector is

Diagnostic analysis

There are basically two approaches for detecting influential observations that seriously affect the results of a statistical analysis. The first is the case-deletion approach, in which the impact of deleting an observation on the estimation is directly assessed by metrics such as the likelihood distance and Cook's distance (see Cook, 1977). The second is the local influence approach (Cook, 1986), in which the stability of the estimation outputs with respect to the model

Illustrative examples

In the following examples, all quantities needed for the diagnostic measures are computed from the corresponding formulas, with T=2000 observations generated by the MH algorithm from the appropriate conditional distributions. The benchmark is taken to be $1/m + 2\,SM(0)$. Results are obtained by computer programs written in C; listings of these programs can be obtained from the authors upon request.
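As a worked illustration of the benchmark rule, the sketch below flags the entries of a vector of local influence values that exceed $1/m + 2\,SM(0)$. It rests on assumptions not stated in this excerpt: that $m$ is the number of entries being compared, and that $SM(0)$ is the standard error of those $m$ values, computed here as their sample standard deviation. The input vector `m0` and the function name `flag_influential` are hypothetical.

```python
import numpy as np

def flag_influential(m0):
    """Return the benchmark 1/m + 2*SM(0) and the indices exceeding it.

    Assumption: SM(0) is taken to be the sample standard deviation
    of the m diagnostic values.
    """
    m0 = np.asarray(m0, dtype=float)
    m = m0.size
    benchmark = 1.0 / m + 2.0 * m0.std(ddof=1)
    return benchmark, np.flatnonzero(m0 > benchmark)

# Hypothetical diagnostic values for m = 20 cases; one case stands out.
m0 = np.full(20, 0.04)
m0[7] = 0.24

benchmark, flagged = flag_influential(m0)
print(f"benchmark = {benchmark:.4f}, flagged cases = {flagged.tolist()}")
```

With these made-up values, only the case at index 7 lies above the benchmark, so it would be the one examined further.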

Example 1 Orange trees data

The data, which consist of seven measurements of the trunk circumference (in millimeters) on each of five

Discussion

As the observed-data log-likelihood function of the nonlinear mixed-effects models is rather complicated, it is difficult to detect influential observations by direct application of the traditional approaches given in Cook (1977, 1986). In this paper, we propose a procedure for computing case-deletion measures and local influence diagnostics on the basis of the conditional expectation of the complete-data log-likelihood function in relation to the EM algorithm. As observations simulated at

Acknowledgements

The research is fully supported by a grant (CUHK 4356/00H) from the Research Grants Council of the Hong Kong Special Administrative Region.

References (35)

  • M. Banerjee et al. (1997). Influence diagnostics for linear longitudinal models. J. Amer. Statist. Assoc.
  • D.M. Bates et al. (1988). Nonlinear Regression Analysis and Its Applications.
  • S.L. Beal, L.B. Sheiner (1979). NONMEM Users’ Guide, Part I. Division of Clinical Pharmacology, University of...
  • S.L. Beal, L.B. Sheiner (1992). NONMEM Users’ Guide, Part VII, Conditional Estimation Methods, NONMEM Project Group....
  • R.J. Beckman et al. (1983). Outliers. Technometrics.
  • D.A. Belsley et al. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity.
  • S. Chatterjee et al. (1988). Sensitivity Analysis of Linear Regression.
  • R. Christensen et al. (1992). Case-deletion diagnostics for mixed models. Technometrics.
  • R.D. Cook (1977). Detection of influential observations in linear regression. Technometrics.
  • R.D. Cook (1986). Assessment of local influence. J. Roy. Statist. Soc. Ser. B.
  • R.D. Cook et al. (1982). Residuals and Influence in Regression.
  • F. Crichley et al. (2001). Influence analysis based on the case sensitivity function. J. Roy. Statist. Soc. Ser. B.
  • M. Davidian et al. (1993). The nonlinear mixed effects model with a smooth random effects density. Biometrika.
  • A.P. Dempster et al. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B.
  • N.R. Draper et al. (1981). Applied Regression Analysis.
  • M.G. Gu et al. (1998). A stochastic approximation algorithm with Markov chain Monte-Carlo method for incomplete data estimation problems. Proc. Nat. Acad. Sci. USA.
  • M.G. Gu et al. (2001). Maximum likelihood estimation for spatial models by Markov chain Monte Carlo stochastic approximation. J. Roy. Statist. Soc. Ser. B.