Hosmer and Lemeshow type Goodness-of-Fit Statistics for the Cox Proportional Hazards Model
Introduction
The Cox (1972) proportional hazards (PH) model has been an extremely popular regression model for the analysis of survival data in recent decades. Even though a number of goodness-of-fit tests have been developed for the PH model, authors who use this model rarely compute them (Andersen, 1991; Concato et al., 1993). One reason might be that only a few of these tests can be easily calculated in statistical software packages.
We discuss goodness-of-fit tests for the Cox proportional hazards model that are based on ideas similar to the Hosmer and Lemeshow (1980, 2000) goodness-of-fit test for logistic regression. All of these tests can be derived by adding group indicator variables to the model and using the score test to test the hypothesis that the coefficients of the group indicator variables are zero. We call the tests that can be derived in this way the added variable tests. The tests that we discuss were proposed by Moreau et al. (1985, 1986) and Grønnesby and Borgan (1996). Care needs to be taken when implementing these tests, since some of them require the use of time-dependent group indicator variables.
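As an illustration of the group indicator variables that an added variable test appends to the model, the following minimal sketch ranks subjects by an estimated risk score and cuts them into groups of nearly equal size, in the spirit of the Hosmer and Lemeshow grouping. The function name `risk_group_indicators`, the example scores, and the choice of the lowest-risk group as the reference are assumptions made for illustration only; the score test itself would still be carried out in a statistical package.

```python
def risk_group_indicators(risk_scores, n_groups):
    """Return, for each subject, a list of n_groups - 1 indicator values (0/1).

    Subjects are ranked by risk score and cut into n_groups groups of nearly
    equal size. The lowest-risk group is the reference (all indicators 0).
    This grouping rule is an assumption made for illustration.
    """
    n = len(risk_scores)
    # Indices of subjects ordered from lowest to highest risk score.
    order = sorted(range(n), key=lambda i: risk_scores[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n  # group label in 0 .. n_groups - 1
    # One indicator column per non-reference group.
    return [[1 if group[i] == g else 0 for g in range(1, n_groups)]
            for i in range(n)]


# Hypothetical risk scores for six subjects, split into three groups.
scores = [0.2, -1.1, 0.9, 0.0, 1.7, -0.4]
indicators = risk_group_indicators(scores, 3)
for s, row in zip(scores, indicators):
    print(s, row)
```

The resulting columns would be added to the fitted model as extra covariates, and the joint hypothesis that their coefficients are zero would then be tested.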
In Section 2 we discuss the different tests. Section 3 provides information regarding the time-dependent nature of the tests. In Section 4 we provide examples. Details of proofs as well as SAS and STATA code for the examples can be found in Appendices A, B and C.
Section snippets
The Hosmer and Lemeshow type test statistics
We assume the typical right-censored survival data, where for each of n individuals we observe the time (denoted by t) from study entry to either event or censoring, an indicator of whether an event occurred or the time was censored (denoted by δ), and a vector of p fixed covariates, $\mathbf{x} = (x_1, \ldots, x_p)'$. Under the PH model the hazard function takes the following form: $\lambda(t, \mathbf{x}) = \lambda_0(t)\exp(\boldsymbol{\beta}'\mathbf{x})$, where $\lambda_0(t)$ represents an unspecified baseline hazard function and $\boldsymbol{\beta}$ is a vector of p coefficients. The component
Necessity for time-dependent indicator variables
An important aspect of the added variable versions of the Moreau et al. (1985) and Moreau et al. (1986) tests is that the indicator variables for the time intervals are time-dependent. We use a small example and the Moreau et al. (1985) test to illustrate the time dependence. Assume we observe four non-censored observations denoted t1<t2<t3<t4 and also observe to which of two groups each observation belongs (denoted x=0,1). Consider two time intervals, with the first two
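The time dependence can be made concrete with a small sketch that splits each subject's follow-up at the interval boundary, so that a subject who survives into the second interval contributes one record per interval. The event times 1, 2, 3, 4, the cutpoint 2, the group labels, and all function and field names below are hypothetical values chosen for illustration; they are not taken from the paper.

```python
def split_at_cutpoints(time, event, cutpoints):
    """Split one subject's follow-up (0, time] at the given cutpoints.

    Returns one record per time interval in which the subject is at risk.
    The event indicator is set only on the record that ends at `time`.
    """
    rows = []
    start = 0.0
    bounds = sorted(cutpoints) + [float("inf")]
    for k, upper in enumerate(bounds):
        if time <= start:
            break  # follow-up ended before this interval began
        stop = min(time, upper)
        rows.append({
            "start": start,
            "stop": stop,
            "event": event if stop == time else 0,
            "interval": k + 1,
        })
        start = upper
    return rows


# Hypothetical data: four uncensored subjects, group x in {0, 1}.
subjects = [(1, 1.0, 0), (2, 2.0, 1), (3, 3.0, 0), (4, 4.0, 1)]  # (id, t, x)
rows = []
for sid, t, x in subjects:
    for r in split_at_cutpoints(t, 1, cutpoints=[2.0]):
        r["id"] = sid
        r["x"] = x
        # Time-dependent added variable: group membership within interval 2.
        r["x_in_interval_2"] = x if r["interval"] == 2 else 0
        rows.append(r)

for r in rows:
    print(r)
```

In this sketch, subject 4 has the added variable equal to 0 while at risk in the first interval and equal to 1 in the second, which a single fixed indicator per subject could not represent.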
Examples
The first example is based on the gastric cancer data presented by Stablein et al. (1981) (see also Moreau et al., 1985). Ninety cancer patients were treated either by chemotherapy or by both chemotherapy and radiotherapy. Like Moreau et al. (1985), we divide the time axis into four intervals such that the intervals contain 18, 19, 18 and 19 deaths, respectively. The Moreau et al. (1985) test in this case has 3 degrees of freedom with values of 10.21 (p=0.02) for the score statistic, 9.55 (p
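The reported p-value for the score statistic can be checked against the chi-square distribution with 3 degrees of freedom. The sketch below uses the closed-form upper-tail probability available for 3 degrees of freedom, so only the Python standard library is needed; the function name `chi2_sf_3df` is an assumption for illustration.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom.

    For 3 degrees of freedom the tail has the closed form
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2).
    """
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


# Score statistic reported for the gastric cancer example.
p_score = chi2_sf_3df(10.21)
print(round(p_score, 2))
```

Rounded to two decimals, the result agrees with the p=0.02 reported for the score statistic.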
Summary
While various goodness-of-fit tests have been developed to test the assumptions of the Cox proportional hazards model, only a few are readily available in existing statistical software packages. We discuss previously proposed goodness-of-fit tests for the Cox model, which are of the Hosmer–Lemeshow type. We present results that show that the tests can be calculated easily using existing statistical software packages. Care needs to be taken though when implementing some of these tests, since
References (18)
- Stablein et al. (1981). Analysis of survival data with nonproportional hazard functions. Controlled Clinical Trials.
- Andersen (1991). Survival analysis 1982–1991: The second decade of the proportional hazards regression model. Statist. Medicine.
- Concato et al. (1993). The risk of determining risk with multivariable models. Ann. Internal Medicine.
- Cox (1972). Regression models and life-tables. J. Roy. Statist. Soc. Ser. B.
- Grønnesby and Borgan (1996). A method for checking regression models in survival analysis based on the risk score. Lifetime Data Anal.
- Hosmer and Lemeshow (1980). Goodness-of-fit tests for the multiple logistic regression model. Comm. Statist. Theory Methods A.
- Hosmer and Lemeshow (2000). Applied Logistic Regression.
- Hosmer and Lemeshow (1999). Applied Survival Analysis: Regression Modeling of Time to Event Data.
- May and Hosmer (1998). A simplified method of calculating an overall goodness-of-fit test for the Cox proportional hazards model. Lifetime Data Anal.