
Handbook of Statistics

Volume 23, 2003, Pages 575-602

The Matrix-Valued Counting Process Model with Proportional Hazards for Sequential Survival Data

https://doi.org/10.1016/S0169-7161(03)23033-9

Publisher Summary

This chapter provides an overview of general survival notation and the Cox proportional hazards model and discusses some of the generalizations of this popular model currently in the literature. It introduces the Matrix Valued Counting Process (MVCP) framework. This framework allows for the use of any univariate model with the addition of conditional probabilities that account for the dependence. The framework relies on the underlying model—proportional hazards, parametric, or whatever is most appropriate—to define the main parameters of interest. The chapter presents an overview of Sen and Pedroso de Lima's work. It describes the challenges presented by repeated measures survival data and extends their work by describing a series of possible parameterizations along with example clinical situations for which they would be appropriate. The changes to Pedroso de Lima and Sen's likelihood, score function, and information matrix for the MVCP with a Cox proportional hazards model are also discussed. The chapter applies the model to a dose-escalation study of hydroxyurea in children suffering from sickle cell disease and interprets the results.

Introduction

Biological systems are frequently complex, with many different sources of variability. Separating these processes into distinct components often requires measurements from related units as well as independent units. When the outcome of interest is time to event, the common approach is to use survival methods. Survival analysis is rich with multivariate models specifically designed to handle the complex questions posed by dependent data. These models range from strict parametric distributions to semi-parametric extensions of the proportional hazards model to purely nonparametric methods, all of which approach the problem of correlated data differently. Generally, parametric models parameterize the dependence precisely; conditional models condition each outcome on the other dependent outcomes; marginal models simply ignore the dependence and adjust for its influence; and random effects models estimate the dependence in the form of the covariance matrix. Although different in their techniques, each of these models strives to overcome the same problem: the bias introduced into parameter and variance estimates by correlated data. An overview of these different approaches to multivariate survival data can be found in Clegg et al. (1999).

All of these models, however, operate under certain assumptions about the structure of the data. In repeated measures survival analysis, the number of events observed for an individual is random and can vary widely. In this type of data, the correlated clusters are typically individuals, with the correlation arising because all of the measurements come from one person. Typical examples include the time between repeat hospitalizations for a chronic disease, the time between pediatric infections, and the time between successive study-defined events, such as a laboratory-defined "toxicity". Since the typical multivariate survival dataset consists of clusters with a fixed, predetermined size limit, parameterizing a model in the presence of random cluster sizes poses a challenge to most multivariate survival models, especially when those sizes can range from no events to fifteen events. One common way to handle the random number of event times is to discard valuable information by analyzing only the time to the first event or first few events. Additionally, some of these models control for the dependence without describing it, yielding no inference on or description of the dependence parameters. Others parameterize the dependence, but in a highly specific manner with a non-intuitive clinical interpretation.
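
To make this data structure concrete, the sketch below builds a small, entirely hypothetical long-format dataset of the kind described above: each row is one gap time within a subject, clusters (subjects) contribute different numbers of rows, and the final gap for a subject may be censored. The column names (id, event_num, gap_time, observed) are illustrative choices, not notation from the chapter.

import pandas as pd

# Hypothetical long-format sequential survival data: one row per gap time.
# Cluster sizes vary (subject 101 has three observed events, subject 102 has
# one event and a censored gap, subject 103 only a censored gap), which is
# exactly the variably sized cluster situation described above.
data = pd.DataFrame({
    "id":        [101, 101, 101, 102, 102, 103],
    "event_num": [1,   2,   3,   1,   2,   1],
    "gap_time":  [42.0, 17.0, 88.0, 120.0, 35.0, 200.0],  # days between events
    "observed":  [1,    1,    1,    1,     0,    0],      # 1 = event, 0 = censored
})

# The random cluster sizes: number of observed events per subject.
print(data.groupby("id")["observed"].sum())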

Some novel methods, however, attempt to handle such variably sized clusters. Hoffman et al. (2001) present an approach based on within-cluster resampling. In this method, they randomly sample one observation from each cluster, analyze the resulting dataset using existing methods and then repeat this sample-analyze algorithm a large number of times. The regression parameters are estimated by the average of the resample-based estimates. Their method, although computationally intensive, results in valid estimators even when the risk for the outcome of interest is related to the cluster size. It has, however, only been presented for binary outcomes and could be used in conjunction with the methods presented in this article.
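
A minimal sketch of this sample-and-analyze idea, under illustrative assumptions: the per-resample analysis is an ordinary logistic regression (consistent with the binary-outcome setting noted above), the function name within_cluster_resampling is hypothetical, and the variance estimator of Hoffman et al. (2001) is omitted for brevity.

import numpy as np
import statsmodels.api as sm

def within_cluster_resampling(df, cluster_col, y_col, x_cols, n_resamples=1000, seed=0):
    """Average regression estimates over resamples that keep one row per cluster."""
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(n_resamples):
        # Randomly sample exactly one observation from each cluster.
        sampled = df.groupby(cluster_col).sample(n=1, random_state=int(rng.integers(1_000_000)))
        X = sm.add_constant(sampled[x_cols])
        # Analyze the resampled (now independent) dataset with an existing method;
        # logistic regression is used here because the outcome is binary.
        fit = sm.Logit(sampled[y_col], X).fit(disp=0)
        estimates.append(fit.params.to_numpy())
    # Point estimate: the average of the resample-based estimates.
    return np.mean(estimates, axis=0)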

A new alternative for modeling multivariate survival data, called the Matrix Valued Counting Process (MVCP) framework, was introduced by Pedroso de Lima (1995) and Pedroso de Lima and Sen (1997, 1998). This framework allows for the use of any univariate model with the addition of conditional probabilities that account for the dependence. The framework relies on the underlying model (proportional hazards, parametric, or whatever is most appropriate) to define the main parameters of interest. It then defines a set of conditional probabilities that can be estimated to assess the nature of the dependency between observations. These dependence parameters can be interpreted as the multiplicative increase or decrease in the hazard of an event, conditional on the length of the other event times. Additionally, these parameters can be varied across event times, yielding a more flexible model. Most importantly for this research, the MVCP allows for fixed effect and covariance parameters to be estimated from variably sized clusters.
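
As a stylized illustration of how such a dependence parameter acts (an assumed bivariate form, not the chapter's exact parameterization), let λ_0 denote a baseline hazard, X_1 a covariate vector for the first member of a cluster of size two, and θ the dependence parameter:

    λ_1(t | T_2 > t, X_1) = λ_0(t) exp(β'X_1),
    λ_1(t | T_2 ≤ t, X_1) = θ λ_0(t) exp(β'X_1),

so that θ > 1 corresponds to an increased hazard for the first event once the second has already occurred, and θ < 1 to a decreased hazard.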

Pedroso de Lima and Sen described the general multivariate framework and developed the asymptotic theory using the Cox Proportional Hazards model as an underlying parameterization (Pedroso de Lima, 1995; Pedroso de Lima and Sen, 1997, 1998). They only considered a component-type data setup with equal cluster sizes, however. The MVCP framework is extended here to handle the issue of variable cluster size found in the multivariate sequential type of data. Additionally, this paper provides the first data analysis using this methodology. The first section after the introduction reviews general survival notation and the Cox Proportional Hazards model and discusses some of the generalizations of this popular model currently in the literature. Then, to introduce the MVCP framework, an overview of Sen and Pedroso de Lima's work is presented in Section three. Section four describes the challenges presented by repeated measures survival data and extends their work by describing a series of possible parameterizations along with example clinical situations for which they would be appropriate. Section five overviews the changes to Pedroso de Lima and Sen's likelihood, score function, and information matrix for the MVCP with a Cox proportional hazards model, with specifics in the Appendix. In Section six, we apply the model to a dose-escalation study of hydroxyurea in children suffering from sickle cell disease and interpret the results. The final portion of the paper is dedicated to a discussion of the interpretations, strengths, and weaknesses of the MVCP framework applied to this type of data. Appendix A details the modifications to the likelihood function, score function, information matrix, and asymptotic properties of the model needed to model variably-sized clusters.

Section snippets

Preliminary notation

In survival analysis, the time to some event is used as the outcome; often, however, this failure time is not observed because the observation is censored after some time. Specifically, let T, C, and X=X(·) have a joint distribution, where T and C are the survival and censoring times respectively, and X(t) is a random vector of possibly time-dependent design covariates (also known as explanatory or auxiliary covariates). Set Y=min(T,C) and δ=I{T≤C}. Y is the observed time
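
As a minimal numerical illustration of this notation (the exponential failure times and uniform censoring below are purely hypothetical choices):

import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Hypothetical failure and censoring times; distributions chosen only for illustration.
T = rng.exponential(scale=10.0, size=n)       # true survival times
C = rng.uniform(low=0.0, high=20.0, size=n)   # censoring times

Y = np.minimum(T, C)           # observed time Y = min(T, C)
delta = (T <= C).astype(int)   # event indicator delta = I{T <= C}

print(f"observed events: {delta.sum()} of {n}")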

The matrix valued counting process framework

In this section, we outline the Matrix Valued Counting Process methodology, a framework in which various survival models can be parameterized for multivariate correlated data. It is convenient to start with a simple bivariate case and then proceed to the general multivariate case; in that way, the complexities arising in the general setting can be visualized more clearly.

Matrix valued counting process framework with repeated measures data

We first address the issue of adapting the Matrix Valued Counting Process Model to handle the challenges found in repeated measures data. We start with some notation and then address event times that run in sequence rather than concurrently, as in the traditional setup. We then discuss the impact of variable cluster size on the general MVCP framework.

For each cluster k, we have a variable number of event times, which we can denote with the vector T_k = (T_{k1}, T_{k2}, …, T_{k j_k}) as our
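
One way to formalize this sequential (gap time) structure, sketched here under notational assumptions that need not match the chapter's (δ_{kj} denotes the censoring indicator for the j-th gap in cluster k), is to attach a counting process and an at-risk indicator to each gap:

    N_{kj}(u) = I{T_{kj} ≤ u, δ_{kj} = 1},    Y_{kj}(u) = I{T_{kj} ≥ u},    j = 1, …, j_k,

with the convention that the j-th pair of processes exists only after the (j-1)-th event has been observed. In contrast to the component-type setup, where each cluster carries a fixed number of concurrent processes, the number of processes j_k is itself random.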

Estimation

Using a parameterization appropriate to the repeated measures data structure, we now expand Pedroso de Lima's (1995) likelihood, score function, and description of the asymptotic properties of the model to the variably-sized clusters found in the sequential survival data. Details of these calculations can be found in Appendix A. For brevity's sake, we present only the final results for each of these important functions. Since our focus is on sequential events within an individual, we will
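
For orientation, the proportional hazards piece underlying such a likelihood has the familiar Cox partial likelihood form; the sketch below indexes gap times by cluster k and event number j and, as a simplifying assumption, omits the dependence parameters that the MVCP adds:

    L(β) = ∏_k ∏_{j=1}^{j_k} [ exp(β'X_{kj}) / Σ_{(l,m) ∈ R(t_{kj})} exp(β'X_{lm}) ]^{δ_{kj}},

where R(t_{kj}) is the set of gap times still at risk at the observed time t_{kj}.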

Background

In this section, we present the data analyses for the motivating clinical example. The HUG-KIDS study was designed to determine the short-term toxicity profile, laboratory changes, and clinical efficacy associated with hydroxyurea therapy (HU) in pediatric patients with severe sickle cell anemia (Kinney et al., 1999). Eighty-four children with sickle cell anemia, aged five to fifteen years, were enrolled from ten centers. This study was not an efficacy study, but rather a monitoring and

Discussion

The most predictive covariates in the unadjusted marginal models were ALT, neutrophil count, and reticulocyte count. Higher levels of ALT were associated with an increased risk of toxicity, while lower levels of neutrophils and reticulocytes were associated with an increased risk.

The MVCP was fit with only four between-event hazard dependence parameters because that was the maximum number the data would support. However, event times for all other events were left in the data and contributed to

References (17)

  • T.R. Kinney et al. (1999). Safety of hydroxyurea in children with sickle cell anemia: Results of the HUG-KIDS study, a phase I/II trial. Pediatric Hydroxyurea Group. Blood.
  • P.K. Andersen et al. (1993). Statistical Models Based on Counting Processes.
  • L.X. Clegg et al. (1999). A marginal mixed baseline hazards model for multivariate failure time data. Biometrics.
  • D.R. Cox (1972). Regression models and life-tables (with discussion). JRSS-B.
  • D.R. Cox (1975). Partial likelihood. Biometrika.
  • R.A. DeMasi et al. (1998). A family of bivariate failure time distributions with proportional crude and conditional hazards. Comm. Statist.: Theory Methods.
  • R.A. DeMasi et al. (1997). Statistical models and asymptotic results for multivariate failure time data with generalized competing risks. Sankhya (Ser. A).
  • R.A. DeMasi. Statistical methods for multivariate failure time data and competing risks.
