Abstract
In clinical studies, survival analysis is a well-known technique for analyzing time-to-event data under the assumption that every subject in the study will eventually experience the event of interest. With recent advances in drug development, however, a fraction of subjects may never experience the event; these subjects are considered immune or cured. Because the study period is finite, it is usually not known which subjects are immune, so their cure status can be treated as missing data. We develop a novel semi-parametric algorithm to address this problem by minimizing a suitable loss function, which incorporates the missing data and generates cure indicators for the censored individuals. We prove the existence of a global minimizer of the loss function and establish some asymptotic properties. Numerical experiments demonstrate that, under appropriate circumstances, our approach performs better than simpler alternatives. Finally, we use the algorithm to estimate lifetime parameters and the overall survivor function.
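To make the idea of generating hard cure indicators concrete, the following is a minimal sketch of a hard EM loop for a deliberately simplified setting. It is not the semi-parametric algorithm of the paper: it assumes no covariates, a constant susceptibility probability p, and exponential lifetimes with rate lam for susceptible subjects. The names hard_em_cure and _m_step are hypothetical, and the loss is simply the complete-data negative log-likelihood of this toy model.

```python
import numpy as np


def _m_step(t, delta, z, eps):
    """Closed-form minimisers of the toy loss given hard labels z."""
    # p: fraction labelled susceptible, clipped away from 0 and 1.
    p = float(np.clip(z.mean(), eps, 1.0 - eps))
    # lam: observed events divided by total time at risk of susceptibles.
    lam = delta.sum() / max(t[z == 1].sum(), eps)
    return p, lam


def hard_em_cure(t, delta, n_iter=100, eps=1e-6):
    """Hard EM for a toy mixture cure model (illustrative assumption only).

    t     : observed times (event or censoring time), all positive
    delta : 1 if the event was observed, 0 if the subject was censored
    Returns (p, lam, z) where z[i] = 1 means susceptible and z[i] = 0
    means predicted cured. Subjects with an observed event always get 1.
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=int)
    censored = delta == 0

    # Initialise by labelling every censored subject as cured.
    z = delta.copy()
    for _ in range(n_iter):
        p, lam = _m_step(t, delta, z, eps)

        # Hard E-step: a censored subject is labelled susceptible when
        # log p - lam * t (susceptible, event-free up to t) exceeds
        # log(1 - p) (cured).
        z_new = np.ones_like(delta)
        z_new[censored] = (
            np.log(p) - lam * t[censored] > np.log1p(-p)
        ).astype(int)

        if np.array_equal(z_new, z):
            break  # labels are stable, so the parameters are too
        z = z_new

    # Recompute parameters so they match the returned labels.
    p, lam = _m_step(t, delta, z, eps)
    return p, lam, z
```

Each pass alternates a parameter update with a relabelling step, and neither step can increase the toy loss, which is why the loop stops once the labels no longer change. The starting labels matter in this sketch: if every censored subject were initially labelled susceptible, p would start near 1 and the relabelling step would rarely predict a cure.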


Abbreviations
- PU: Positive and unlabeled
- EM: Expectation-maximization
- ML: Maximum likelihood
- SCAR: Selected completely at random
- AUC: Area under the curve
- H-score: Null hypothesis (3.1) for the given score
- H-logloss: Null hypothesis (3.1) for the logloss score
- H-Accuracy: Null hypothesis (3.1) for the accuracy score
- H-AUC: Null hypothesis (3.1) for the AUC score
- SLSQP: Sequential least squares programming
Cite this article
Kosovalić, N., Barui, S. A Hard EM algorithm for prediction of the cured fraction in survival data. Comput Stat 37, 817–835 (2022). https://doi.org/10.1007/s00180-021-01140-0
DOI: https://doi.org/10.1007/s00180-021-01140-0