Abstract
Hypothesis testing for the regression coefficient associated with a dichotomized continuous covariate in a Cox proportional hazards model is common in clinical research. Most existing testing methods do not allow any covariates other than the dichotomized continuous covariate, yet they are generally applied anyway. Through an analytic bias analysis and a numerical study, we show that this practice is subject to an inflated type I error and a loss of power. To overcome this limitation, we develop a bootstrap-based test that allows additional covariates and dichotomizes two-dimensional covariates into a binary variable. In addition, we develop an efficient algorithm to speed up the calculation of the proposed test statistic. Our numerical study demonstrates that the proposed bootstrap-based test maintains the type I error at the nominal level and exhibits higher power than other methods, and that the proposed efficient algorithm reduces computational costs.
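As a rough illustration of why testing after dichotomization needs dedicated methods, the sketch below simulates data with no covariate effect and compares two naive procedures: a log-rank test at a fixed median cutpoint, and the largest log-rank statistic over a grid of candidate cutpoints, both referred to the usual chi-square critical value. It does not reproduce the paper's bias analysis or its bootstrap-based test. The function names, sample size, cutpoint grid, and censoring distribution are all assumptions made for illustration, and the log-rank statistic stands in for the Cox score test with a single binary covariate.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Log-rank chi-square (1 df) for a binary group; assumes no tied times."""
    order = np.argsort(time)
    event = event[order].astype(float)
    group = group[order].astype(float)
    n = len(time)
    at_risk = n - np.arange(n)                 # subjects at risk at each sorted time
    at_risk_g1 = np.cumsum(group[::-1])[::-1]  # of those, how many are in group 1
    p = at_risk_g1 / at_risk                   # expected share of group 1 per event
    observed_minus_expected = np.sum(event * (group - p))
    variance = np.sum(event * p * (1.0 - p))
    return observed_minus_expected**2 / variance if variance > 0 else 0.0


def max_selected_chi2(time, event, x, quantiles):
    """Largest log-rank statistic over cutpoints placed at sample quantiles of x."""
    cuts = np.quantile(x, quantiles)
    return max(logrank_chi2(time, event, x > c) for c in cuts)


def simulate_null_rejection_rates(n_reps=200, n=100, seed=0):
    """Rejection rates under the null (x unrelated to survival), nominal level 5%."""
    rng = np.random.default_rng(seed)
    quantiles = np.linspace(0.1, 0.9, 17)  # hypothetical search grid
    critical = 3.841                       # 95th percentile of chi-square, 1 df
    naive_rejections = 0
    fixed_rejections = 0
    for _ in range(n_reps):
        x = rng.normal(size=n)
        t = rng.exponential(size=n)             # survival time, independent of x
        c = rng.exponential(scale=2.0, size=n)  # independent censoring time
        time = np.minimum(t, c)
        event = t <= c
        # Naive: search for the best cutpoint, then ignore that a search happened.
        naive_rejections += max_selected_chi2(time, event, x, quantiles) > critical
        # Reference: a single cutpoint fixed in advance at the sample median.
        fixed_rejections += logrank_chi2(time, event, x > np.median(x)) > critical
    return naive_rejections / n_reps, fixed_rejections / n_reps


if __name__ == "__main__":
    naive_rate, fixed_rate = simulate_null_rejection_rates()
    print(f"searched cutpoint: {naive_rate:.3f}, fixed cutpoint: {fixed_rate:.3f}")
```

In this setup the searched-cutpoint rate should sit well above the nominal 5%, while the fixed-cutpoint rate should stay near it. Calibrating the null distribution of the maximal statistic, for example by resampling, is the general kind of correction a bootstrap-based test aims to provide.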
Data availability
The dataset gbsg is available from the R package survival (Therneau 2024).
Code availability
The R-package DTCox is freely available from https://sites.google.com/view/lwj221, and the program code used for real data analysis is provided on the same website.
References
Amin MB, Edge SB, Greene FL et al (eds) (2017) AJCC Cancer Staging Manual, 8th edn. American College of Surgeons, Chicago. https://doi.org/10.1007/978-3-319-40618-3
Beran R (1988) Prepivoting test statistics: a bootstrap view of asymptotic refinements. J Am Stat Assoc 83(403):687–697. https://doi.org/10.2307/2289292
Billingsley P (1968) Convergence of Probability Measures. John Wiley & Sons
Collins GS, Ogundimu EO, Cook JA et al (2016) Quantifying the impact of different approaches for handling continuous predictors on the performance of a prognostic model. Stat Med 35(23):4124–4135. https://doi.org/10.1002/sim.6986
Contal C, O’Quigley J (1999) An application of changepoint methods in studying the effect of age on survival in breast cancer. Comput Stat Data Anal 30(3):253–270. https://doi.org/10.1016/S0167-9473(98)00096-6
Cox DR, Snell EJ (1968) A general definition of residuals. J R Stat Soc Series B Stat Methodol 30(2):248–265. https://doi.org/10.1111/j.2517-6161.1968.tb00724.x
Davies RB (1987) Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 74(1):33–43. https://doi.org/10.2307/2336019
Davison AC, Hinkley DV (1997) Bootstrap Methods and Their Application. Cambridge University Press
Efron B, Tibshirani R (1993) An Introduction to the Bootstrap. Chapman and Hall, New York
Greenland S, Senn SJ, Rothman KJ et al (2016) Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 31(4):337–350. https://doi.org/10.1007/s10654-016-0149-3
Grogan M, Scott CG, Kyle RA et al (2016) Natural history of wild-type transthyretin cardiac amyloidosis and risk stratification using a novel staging system. J Am Coll Cardiol 68(10):1014–1020. https://doi.org/10.1016/j.jacc.2016.06.033
Hall P (1986) On the bootstrap and confidence intervals. Ann Stat 14(4):1431–1452. https://doi.org/10.1214/aos/1176350168
Hall P (2013) The bootstrap and Edgeworth expansion. Springer Science & Business Media, Cham
Halsey LG, Curran-Everett D, Vowler SL et al (2015) The fickle P value generates irreproducible results. Nat Methods 12(3):179–185. https://doi.org/10.1038/nmeth.3288
Horowitz JL (1994) Bootstrap-based critical values for the information matrix test. J Econ 61(2):395–411. https://doi.org/10.1016/0304-4076(94)90092-2
Horowitz JL (2019) Bootstrap methods in econometrics. Ann Rev Econ 11(1):193–224. https://doi.org/10.1146/annurev-economics-080218-025651
Jaroensri R, Wulczyn E, Hegde N et al (2022) Deep learning models for histologic grading of breast cancer and association with disease prognosis. NPJ Breast Cancer 8(1):113. https://doi.org/10.1038/s41523-022-00478-y
Kasangian AA, Gherardi G, Biagioli E et al (2017) The prognostic role of tumor size in early breast cancer in the era of molecular biology. PLoS ONE 12(12):1–12. https://doi.org/10.1371/journal.pone.0189127
Kim E, Jung S, Park WS et al (2019) Upregulation of SLC2A3 gene and prognosis in colorectal carcinoma: analysis of TCGA data. BMC Cancer 19(1):302. https://doi.org/10.1186/s12885-019-5475-x
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York
Klein JP, Wu JT (2003) Discretizing a continuous covariate in survival studies. Handbook Stat 23:27–42. https://doi.org/10.1016/S0169-7161(03)23002-9
Klein JP, Rizzo JD, Zhang MJ et al (2001) Statistical methods for the analysis and presentation of the results of bone marrow transplants Part 2: regression modeling. Bone Marrow Transplant 28(11):1001–1011. https://doi.org/10.1038/sj.bmt.1703271
Lausen B, Schumacher M (1992) Maximally selected rank statistics. Biometrics 48(1):73–85. https://doi.org/10.2307/2532740
Lausen B, Schumacher M (1996) Evaluating the effect of optimized cutoff values in the assessment of prognostic factors. Comput Stat Data Anal 21(3):307–326. https://doi.org/10.1016/0167-9473(95)00016-X
Lee HS, Jang CY, Kim SA et al (2018) Combined use of CEMIP and CA 19-9 enhances diagnostic accuracy for pancreatic cancer. Sci Rep 8(1):3383. https://doi.org/10.1038/s41598-018-21823-x
Lew MJ (2012) Bad statistical practice in pharmacology (and other basic biomedical disciplines): you probably don’t know P. Br J Pharmacol 166(5):1559–1567. https://doi.org/10.1111/j.1476-5381.2012.01931.x
Lin DY, Psaty BM, Kronmal RA (1998) Assessing the sensitivity of regression results to unmeasured confounders in observational studies. Biometrics 54(3):948–96. https://doi.org/10.2307/2533848
Ma SJ, Yu H, Yu B et al (2022) Association of pack-years of cigarette smoking with survival and tumor progression among patients treated with chemoradiation for head and neck cancer. JAMA Netw Open 5(12):e2245818–e224581. https://doi.org/10.1001/jamanetworkopen.2022.45818
Miller R, Siegmund D (1982) Maximally selected chi-square statistics. Biometrics 38(4):1011–1016. https://doi.org/10.2307/2529881
Nuzzo R (2014) Scientific method: statistical errors. Nature 506(7487):150–152. https://doi.org/10.1038/506150a
Singh K (1981) On the asymptotic accuracy of Efron’s bootstrap. Ann Stat 9(6):1187–1195. https://doi.org/10.1214/aos/1176345636
Sung H, Ferlay J, Siegel RL et al (2021) Global cancer statistics 2020: Globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71(3):209–249. https://doi.org/10.3322/caac.21660
Therneau TM (2024) A package for survival analysis in R. R package version 3.5-8. https://CRAN.R-project.org/package=survival
Tolaney SM, Tarantino P, Graham N et al (2023) Adjuvant paclitaxel and trastuzumab for node-negative, HER2-positive breast cancer: final 10-year analysis of the open-label, single-arm, phase 2 APT trial. Lancet Oncol 24(3):273–285. https://doi.org/10.1016/S1470-2045(23)00051-7
van der Vaart AW (2012) Asymptotic Statistics. Cambridge University Press
Funding
This work was supported by the National Research Foundation of Korea (BK21 Center for Integrative Response to Health Disasters, Graduate School of Public Health, Seoul National University) (No. 419 999 0514025).
Ethics declarations
Conflict of interest
The authors have no conflict of interest to disclose.
About this article
Cite this article
Sim, H., Lee, S., Kim, BH. et al. Hypothesis testing in Cox models when continuous covariates are dichotomized: bias analysis and bootstrap-based test. Comput Stat 40, 907–927 (2025). https://doi.org/10.1007/s00180-024-01520-2
DOI: https://doi.org/10.1007/s00180-024-01520-2