A Hard EM algorithm for prediction of the cured fraction in survival data

Kosovalić, Nemanja; Barui, Sandip

doi:10.1007/s00180-021-01140-0

Original paper
Published: 11 August 2021

Volume 37, pages 817–835, (2022)
Computational Statistics

Nemanja Kosovalić¹ &
Sandip Barui²

Abstract

In clinical studies, survival analysis is a well-known technique for analyzing time-to-event data under the assumption that every subject in the study will eventually experience the event of interest. With recent advances in drug development, however, a fraction of subjects may never experience the event and are considered immune or cured. Because the study period is finite, which subjects are immune is usually not fully known, so the cure status can be treated as missing data. We develop a novel semi-parametric algorithm that addresses this problem by minimizing a suitable loss function, which incorporates the missing data and generates cure indicators for the censored individuals. We prove the existence of a global minimizer of the loss function and establish some asymptotic properties. Numerical experiments demonstrate that, under appropriate circumstances, our approach performs better than simpler alternatives. Finally, we use the algorithm to estimate lifetime parameters and the overall survivor function.
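To make the idea of generating cure indicators for censored subjects concrete, the sketch below shows a generic hard EM loop on a deliberately simplified model. It is not the semi-parametric loss or algorithm of the paper: it assumes no covariates, a constant cure probability `pi`, and exponential lifetimes with rate `lam` for uncured subjects. The function name `hard_em_cure`, its parameters, and the starting values are all hypothetical choices made for illustration.

```python
import numpy as np

def hard_em_cure(times, events, n_iter=100, eps=1e-6):
    """Toy hard EM for a mixture cure model (illustrative assumptions only).

    Assumed model: cure probability `pi` is constant, and uncured subjects
    have exponential lifetimes with rate `lam`. `events` is 1 for an
    observed event and 0 for a censored observation.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    if not events.any():
        raise ValueError("need at least one observed event")

    # Arbitrary starting values (an assumption of this sketch).
    pi = 0.5
    lam = events.sum() / times.sum()
    cured = np.zeros(times.shape, dtype=bool)

    for _ in range(n_iter):
        # Hard E-step: give each censored subject the single label (cured or
        # uncured) with the larger complete-data log-likelihood term.
        # Subjects with an observed event are never labeled cured.
        log_cured = np.log(pi)
        log_uncured = np.log1p(-pi) - lam * times
        new_cured = ~events & (log_cured > log_uncured)
        converged = np.array_equal(new_cured, cured)
        cured = new_cured

        # M-step: closed-form updates given the hard labels.
        pi = float(np.clip(cured.mean(), eps, 1.0 - eps))
        lam = events.sum() / times[~cured].sum()
        if converged:
            break
    return pi, lam, cured

# Small usage example: five observed events and three censored subjects.
times = [0.2, 0.5, 1.0, 1.5, 0.8, 0.1, 5.0, 6.0]
events = [1, 1, 1, 1, 1, 0, 0, 0]
pi_hat, lam_hat, cured_hat = hard_em_cure(times, events)
print(pi_hat, lam_hat, cured_hat)
```

Unlike standard (soft) EM, which weights each censored subject by a posterior cure probability, the hard variant commits to a 0/1 label at every iteration. That makes each step cheap, but the result can depend on the starting values, so in practice one would likely try several initializations.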

Abbreviations

PU: Positive and unlabeled
EM: Expectation-maximization
ML: Maximum likelihood
SCAR: Selected completely at random
AUC: Area under the curve
H-score: Null hypothesis (3.1) for the given score
H-logloss: Null hypothesis (3.1) for the logloss score
H-Accuracy: Null hypothesis (3.1) for the accuracy score
H-AUC: Null hypothesis (3.1) for the AUC score
SLSQP: Sequential least squares programming

Author information

Authors and Affiliations

Data Science Practice, Aimpoint Digital, Dublin, California, USA
Nemanja Kosovalić
Quantitative Methods and Operations Management Area, Indian Institute of Management Kozhikode, Kozhikode, India
Sandip Barui

Corresponding author

Correspondence to Sandip Barui.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kosovalić, N., Barui, S. A Hard EM algorithm for prediction of the cured fraction in survival data. Comput Stat 37, 817–835 (2022). https://doi.org/10.1007/s00180-021-01140-0

Received: 31 January 2021
Accepted: 02 August 2021
Published: 11 August 2021
Issue Date: April 2022
DOI: https://doi.org/10.1007/s00180-021-01140-0
