Elsevier

Handbook of Statistics

Volume 23, 2003, Pages 383-394

Hosmer and Lemeshow type Goodness-of-Fit Statistics for the Cox Proportional Hazards Model

https://doi.org/10.1016/S0169-7161(03)23021-2

Publisher Summary

This chapter discusses goodness-of-fit tests for the Cox proportional hazards model that are based on ideas similar to the Hosmer and Lemeshow goodness-of-fit test for logistic regression. All of these tests can be derived by adding group-indicator variables to the model and testing the hypothesis that the coefficients of the group indicator variables are zero via the score test. The tests that can be derived in this way are called the “added variable tests.” Care needs to be taken when implementing these tests because some of them require the use of time-dependent group indicator variables. Information regarding the time-dependent nature of the tests is also provided along with examples.

Introduction

The Cox (1972) proportional hazards (PH) model has been an extremely popular regression model for the analysis of survival data in recent decades. Even though a number of goodness-of-fit tests have been developed for the PH model, authors who use this model rarely compute these tests (Andersen, 1991; Concato et al., 1993). One reason might be that only a few of the tests can be easily calculated in statistical software packages.

We discuss goodness-of-fit tests for the Cox proportional hazards model that are based on ideas similar to the Hosmer and Lemeshow (1980, 2000) goodness-of-fit test for logistic regression. All of these tests can be derived by adding group indicator variables to the model and using the score test to test the hypothesis that the coefficients of the group indicator variables are zero. We will call the tests that can be derived in this way the added variable tests. The tests that we discuss were proposed by Moreau et al. (1985, 1986) and Grønnesby and Borgan (1996). Care needs to be taken when implementing these tests, since some of them require the use of time-dependent group indicator variables.
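
The data preparation behind an added variable test of the Grønnesby and Borgan type can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes that subjects are grouped by their ranked estimated risk score, that the lowest-risk group serves as the reference, and that the risk scores shown are made-up values. The function names `risk_score_groups` and `indicator_columns` are hypothetical. The score test itself would then be carried out by the survival routine of a statistical package.

```python
def risk_score_groups(risk_scores, n_groups):
    """Assign each subject to one of n_groups groups by ranked risk score.

    Group 0 holds the lowest scores; groups are as equal in size as possible.
    """
    n = len(risk_scores)
    order = sorted(range(n), key=lambda i: risk_scores[i])
    groups = [0] * n
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // n
    return groups


def indicator_columns(groups, n_groups):
    """Build n_groups - 1 indicator columns; group 0 is the reference group."""
    return [[1 if g == k else 0 for k in range(1, n_groups)] for g in groups]


# Made-up risk scores for six subjects, split into three groups.
scores = [0.2, -1.1, 0.7, 1.5, -0.3, 0.0]
groups = risk_score_groups(scores, 3)
design = indicator_columns(groups, 3)

for score, group, row in zip(scores, groups, design):
    print(score, group, row)
```

The indicator columns in `design` are the added variables: they would be appended to the original covariates, and the hypothesis that all their coefficients are zero would be tested.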

In Section 2 we discuss the different tests. Section 3 provides information regarding the time-dependent nature of the tests. In Section 4 we provide examples. Details of proofs as well as SAS and STATA code for the examples can be found in Appendices A, B and C.

The Hosmer and Lemeshow type test statistics

We assume the typical right-censored survival data where we observe, for each of n individuals, the time (denoted by t) from study entry to either event or censoring, an indicator of whether an event occurred or the time was censored (denoted by δ), and a vector of p fixed covariates, x=(x1,…,xp)′. Under the PH model the hazard function takes the following form: λ(t,x)=λ0(t)exp(β′x), where λ0(t) represents an unspecified baseline hazard function, and β′=(β1,…,βp) a vector of p coefficients. The component β
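
The form λ(t,x)=λ0(t)exp(β′x) implies that the ratio of the hazards of two individuals does not depend on t, because the baseline hazard cancels. The short sketch below illustrates this numerically. The coefficient values, covariate vectors and baseline hazard values are arbitrary assumptions chosen for illustration only.

```python
import math


def hazard(baseline_hazard, beta, x):
    """PH hazard: baseline hazard at some time t multiplied by exp(beta'x)."""
    linear_predictor = sum(b * xi for b, xi in zip(beta, x))
    return baseline_hazard * math.exp(linear_predictor)


beta = [0.5, -0.2]   # assumed coefficients
x_a = [1.0, 3.0]     # assumed covariates of individual A
x_b = [0.0, 3.0]     # assumed covariates of individual B

# Three assumed baseline hazard values, standing in for three time points.
ratios = [hazard(h0, beta, x_a) / hazard(h0, beta, x_b) for h0 in (0.01, 0.05, 0.2)]
print(ratios)
```

Every entry of `ratios` equals exp(0.5) up to floating-point error, whatever the baseline hazard value.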

Necessity for time-dependent indicator variables

An important aspect of the added variable versions of the Moreau et al. (1985) and Moreau et al. (1986) tests is that the indicator variables for the time intervals are time-dependent. We will use a small example and the Moreau et al. (1985) test to illustrate the time dependence. Assume we observe four non-censored observations denoted t1<t2<t3<t4 and also observe to which of two groups each observation belongs (denoted x=0,1). Consider two time intervals, with the first two
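
One common way to obtain time-dependent interval indicators in software is to split each subject's follow-up into episodes at the interval boundaries (the counting-process layout). The sketch below is a hedged illustration of that idea and not the chapter's SAS or STATA code. The function name `split_at_cutpoints`, the numeric event times 1, 2, 3, 4 and the single cut point 2.5 are assumptions made for the example.

```python
def split_at_cutpoints(time, event, cutpoints):
    """Split follow-up (0, time] into episodes (start, stop] at the cut points.

    Each episode records the index of the time interval it lies in; the event
    flag is set only on the episode in which the event actually occurs.
    """
    rows = []
    start = 0.0
    bounds = sorted(cutpoints) + [float("inf")]
    for k, upper in enumerate(bounds):
        if time <= start:
            break  # follow-up ended before this interval began
        stop = min(time, upper)
        rows.append({
            "start": start,
            "stop": stop,
            "event": int(bool(event) and time <= upper),
            "interval": k,
        })
        start = upper
    return rows


# Four uncensored subjects with assumed event times and one assumed cut point.
times = [1.0, 2.0, 3.0, 4.0]
episodes = [split_at_cutpoints(t, 1, [2.5]) for t in times]

for t, rows in zip(times, episodes):
    print(t, rows)
```

A subject whose event falls in the second interval contributes two rows: one in the first interval without an event and one in the second interval with the event. The interval indicator therefore changes value over that subject's follow-up, which is what makes it time-dependent.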

Examples

The first example is based on the gastric cancer data presented by Stablein et al. (1981) (see also Moreau et al., 1985). Ninety cancer patients were either treated by chemotherapy or by both chemotherapy and radiotherapy. Like Moreau et al. (1985) we divide the time axis into four intervals such that each interval contains 18, 19, 18 and 19 deaths respectively. The Moreau et al. (1985) test in this case has 3 degrees of freedom with values of 10.21 (p=0.02) for the score statistic, 9.55 (p
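
The reported p-value for the score statistic can be checked by hand. For a chi-square distribution with 3 degrees of freedom the upper tail probability has the closed form erfc(√(x/2)) + √(2x/π)·exp(−x/2). The sketch below evaluates it at 10.21; the function name `chi2_sf_3df` is a hypothetical helper and the formula applies to 3 degrees of freedom only.

```python
import math


def chi2_sf_3df(x):
    """Upper tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


p_score = chi2_sf_3df(10.21)
print(round(p_score, 2))
```

Rounded to two decimals, the result agrees with the p=0.02 quoted for the score statistic.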

Summary

While various goodness-of-fit tests have been developed to test the assumptions of the Cox proportional hazards model, only a few are readily available in existing statistical software packages. We discuss previously proposed goodness-of-fit tests for the Cox model that are of the Hosmer–Lemeshow type. We present results showing that the tests can be calculated easily using existing statistical software packages. Care needs to be taken, though, when implementing some of these tests, since

References (18)

  • D.M. Stablein et al. Analysis of survival data with nonproportional hazard functions. Controlled Clinical Trials (1981)
  • P.K. Andersen. Survival analysis 1982–1991: The second decade of the proportional hazards regression model. Statist. Medicine (1991)
  • J. Concato et al. The risk of determining risk with multivariable models. Ann. Internal Medicine (1993)
  • D.R. Cox. Regression models and life-tables. J. Roy. Statist. Soc. Ser. B (1972)
  • J.K. Grønnesby et al. A method for checking regression models in survival analysis based on the risk score. Lifetime Data Anal. (1996)
  • D.W. Hosmer et al. Goodness-of-fit tests for the multiple logistic regression model. Comm. Statist. Theory Methods A (1980)
  • D.W. Hosmer et al. Applied Logistic Regression (2000)
  • D.W. Hosmer et al. Applied Survival Analysis: Regression Modeling of Time to Event Data (1999)
  • S. May et al. A simplified method of calculating an overall goodness-of-fit test for the Cox proportional hazards model. Lifetime Data Anal. (1998)

Cited by (23)

  • Assessing causes of alarm fatigue in long-term acute care and its impact on identifying clinical changes in patient conditions

    2020, Informatics in Medicine Unlocked
    Citation Excerpt :

    We also relied on Chaudhary et al. [21], who used U.S. Department of Defense TRICARE claims data (2011–2015) queried for trauma patients with risk-adjusted Cox models to determine the influence of a prolonged length of stay in an intensive care unit on 1-year mortality. We also used the Hosmer-Lemeshow test, which is well established in the literature [22]. Wang et al. [23] used the Hosmer-Lemeshow test for a subhealth analysis of software programmers.

  • A Risk Calculator to Predict the Individual Risk of Conversion From Subthreshold Bipolar Symptoms to Bipolar Disorder I or II in Youth

    2018, Journal of the American Academy of Child and Adolescent Psychiatry
    Citation Excerpt :

    The final model was externally validated on the BIOS sample and evaluated by the time-dependent AUC (predicting the 5-year risk of an event) and by the non–time-dependent AUC. Calibration was tested by Hosmer-Lemeshow testing42 and by plotting and comparing observed with predicted probability of conversion to BP-I or BP-II. Sensitivity, specificity, positive predictive value, and negative predictive value were assessed at a range of thresholds.

  • Impact of reclamation on the environment of the lower mekong river basin

    2018, Journal of Hydrology: Regional Studies
    Citation Excerpt :

    The Hosmer Lemeshow test has been used for the evaluation of the goodness of fit of the model. In this method, the values of probability by a regression equation are divided into plural groups to make the number of lines of every group same, and the goodness of fit of model could be discussed by the difference between the observation frequency and the expectation frequency calculated from the estimated probability in each group (Susanne and David, 2003). As a result of the Hosmer Lemeshow test, the significance of the probability was approximately 0.78, which is larger than the significance level of 0.05.

  • Carbonic anhydrase-IX score is a novel biomarker that predicts recurrence and survival for high-risk, nonmetastatic renal cell carcinoma: Data from the phase III ARISER clinical trial

    2015, Urologic Oncology: Seminars and Original Investigations
    Citation Excerpt :

    Treatment weights were included in the Cox model. We confirmed nonviolation of the proportional hazards assumption using “log-log” plots and adequate model fit using Hosmer and Lemeshow analysis [8]. We conducted all analyses with STATA software (College Station, TX).

  • Echocardiographic estimation of pulmonary arterial systolic pressure in acute heart failure. Prognostic implications

    2013, European Journal of Internal Medicine
    Citation Excerpt :

    The model discriminations were assessed by the Harrell's C-statistic. Cox model calibration was tested by the Gronnesby and Borgan test [16]. A 2-sided p-value of < 0.05 was considered statistically significant for all analyses.

  • Prognostic implications of arterial blood gases in acute decompensated heart failure

    2011, European Journal of Internal Medicine
    Citation Excerpt :

    The proportionality assumption for the hazard function over time was tested by means of the Schoenfeld residuals. The discrimination and calibration of the model were assessed using the Harrell's C-statistics and the Gronnesby and Borgan test [10] respectively. A 2-sided p-value of < 0.05 was considered to be statistically significant for all analyses.
