Survival analysis of breast cancer patients using machine learning models

Multimedia Tools and Applications

Abstract

Breast cancer is a potentially fatal disease, and no single treatment suits all patients because the disease is heterogeneous in both treatment response and prognosis. This study identifies the key covariates that determine the prognosis of breast cancer patients, so that appropriate treatment can be administered and overall survival improved. The study uses the clinical and pathological features from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) dataset. Three models were used for survival prediction: the Cox proportional hazards (CoxPH) model, the random survival forests (RSF) model, and DeepHit. The RSF and DeepHit models each achieved a Concordance Index (C-Index) of 0.86, slightly higher than the CoxPH model's C-Index of 0.85. The most important covariate in the RSF model, the one with the maximum absolute importance value, was relapse-free status, which had a high positive correlation of 88% with the survival status of the patient. The CoxPH model identified four statistically significant covariates (P < 0.05): age at diagnosis, estrogen receptor (ER) status, progesterone receptor (PR) status, and tumor stage. Of these, ER and PR status have negative regression coefficients, which indicates a reduced hazard for the patients. The proposed work thus helps identify the important prognostic covariates and can aid clinicians in determining the type of treatment to administer.
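To make the C-Index values reported above easier to interpret, the following is a minimal sketch of Harrell's concordance index in plain Python. It is not the authors' code: the function name `concordance_index`, the toy follow-up data, and the risk scores are all hypothetical, and pairs with tied observed times are simply skipped, which is a simplification of the usual definition.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ranked correctly.

    times       -- observed follow-up times
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed event.
        # Tied times are skipped here (a simplifying assumption).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: five patients, two of them censored.
times = [5, 8, 12, 20, 25]
events = [1, 1, 0, 1, 0]
risk_scores = [0.9, 0.4, 0.6, 0.3, 0.1]

c_index = concordance_index(times, events, risk_scores)
print(f"C-index = {c_index:.3f}")  # C-index = 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.86 versus 0.85 means the models order roughly the same share of comparable patient pairs correctly. In practice a tested library implementation would normally be preferred over a hand-written loop like this one.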

Data availability

The data used in this study are publicly available. The dataset was obtained from the cBioPortal website (https://www.cbioportal.org/datasets).

Author information

Contributions

Keren Evangeline worked on the formulation of ideas, literature survey, dataset pre-processing, algorithm coding, and also wrote the manuscript. Glory Precious assisted with literature surveys and in writing the code in Python. Angeline Kirubha supervised the study, assisted with the development of ideas, and supported the manuscript writing process. The manuscript was read and approved by all authors.

Corresponding author

Correspondence to S. P. Angeline Kirubha.

Ethics declarations

Ethics approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Evangeline I., K., Kirubha, S.P.A. & Precious, J.G. Survival analysis of breast cancer patients using machine learning models. Multimed Tools Appl 82, 30909–30928 (2023). https://doi.org/10.1007/s11042-023-14989-8

  • DOI: https://doi.org/10.1007/s11042-023-14989-8
