Abstract
Breast cancer is a life-threatening disease. Because it is heterogeneous in both treatment response and prognosis, no single treatment suits every patient. This study identifies the key covariates that determine the prognosis of breast cancer patients, so that appropriate treatment can be administered and overall survival improved. It uses the clinical and pathological features of the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset. Three models were used for survival prediction: the Cox proportional hazards (CoxPH) model, the random survival forests (RSF) model, and DeepHit. The RSF and DeepHit models each achieved a concordance index (C-index) of 0.86, slightly better than the CoxPH model, which achieved a C-index of 0.85. In the RSF model, the covariate with the largest absolute importance value was relapse-free status, which also had a high positive correlation of 88% with the survival status of the patient. The CoxPH model identified four statistically significant covariates (P < 0.05): age at diagnosis, estrogen receptor (ER) status, progesterone receptor (PR) status, and tumor stage. Of these, ER and PR status have negative regression coefficients, which correspond to a reduced hazard for the patients. The proposed work thus helps identify the important prognostic covariates and can aid clinicians in deciding which treatment to administer.
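The models above are compared by their concordance index. As a rough illustration of what that number measures, the following is a minimal sketch of Harrell's C-index for right-censored data, written in plain Python. It is not the code used in the study: the function name, argument names, and toy values are assumptions made for illustration, and pairs with tied survival times are simply skipped, which is a simplification.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch, not the study's code).

    times:       observed follow-up time for each patient
    events:      1 if death was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter follow-up.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the patient
        # with the shorter time had an observed event (simplification:
        # tied times are skipped).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk for the earlier death: correct order
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, two of them censored.
toy_times = [5, 8, 12, 20]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.4, 0.1, 0.6]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
print(toy_c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so the reported values of 0.85 and 0.86 mean that in roughly 85 to 86 percent of comparable patient pairs the model ranked the patient who died earlier as being at higher risk.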
Data availability
The data used in this study are publicly available and were obtained from the cBioPortal website (https://www.cbioportal.org/datasets).
Author information
Contributions
Keren Evangeline worked on the formulation of ideas, literature survey, dataset pre-processing, algorithm coding, and also wrote the manuscript. Glory Precious assisted with literature surveys and in writing the code in Python. Angeline Kirubha supervised the study, assisted with the development of ideas, and supported the manuscript writing process. The manuscript was read and approved by all authors.
Ethics declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Conflict of interest
The authors declare that they have no conflict of interest related to this work.
About this article
Cite this article
Evangeline I., K., Kirubha, S.P.A. & Precious, J.G. Survival analysis of breast cancer patients using machine learning models. Multimed Tools Appl 82, 30909–30928 (2023). https://doi.org/10.1007/s11042-023-14989-8
DOI: https://doi.org/10.1007/s11042-023-14989-8