Abstract
Although kidney cancer is one of the deadliest diseases and the fight against it has advanced enormously, the best methods for predicting it, in particular Renal-Cell Carcinoma (RCC), are not well known. One way to accelerate current knowledge about RCC is to apply Data Mining techniques to patients' personal and clinical data. It is therefore crucial to understand which techniques are best suited to extracting knowledge about this disease. In this paper, we followed the CRISP-DM methodology to test different techniques and determine which have the best predictive performance. For this purpose, we used a dataset of 821 records of RCC patients obtained from The Cancer Genome Atlas. The work tests Data Mining techniques on two tasks: predicting the 5-year life expectancy of patients with renal cancer, and predicting the number of days to death for patients with a life expectancy of less than 5 years. The results show that the best algorithm for estimating vital status at 5 years was Random Forest, with an accuracy of 87.65% and an AUROC of 0.931. For the prediction of days to death, the best performance was obtained with the k-Nearest Neighbors algorithm, with a root mean square error of 354.6 days. The work suggests that Data Mining techniques can help to understand the influence of various risk factors on the life expectancy of patients with RCC.
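For readers unfamiliar with the three metrics reported above, the following is a minimal sketch of how accuracy, AUROC, and root mean square error are commonly computed. It is not the authors' implementation; the function names and the toy values are hypothetical and chosen only for illustration, and AUROC is computed here with the pairwise-ranking definition.

```python
from math import sqrt

def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auroc(y_true, scores):
    """Probability that a random positive case receives a higher score
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (here, days)."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical toy data, NOT taken from the paper's dataset.
vital_status = [1, 0, 1, 1, 0, 0]            # 1 = event within 5 years (assumed coding)
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]      # classifier scores for class 1
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

days_true = [300, 700, 1200]                 # hypothetical days to death
days_pred = [350, 650, 1100]                 # hypothetical regressor output

print(accuracy(vital_status, predicted))
print(auroc(vital_status, scores))
print(rmse(days_true, days_pred))
```

Accuracy depends on the chosen decision threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why the two are usually reported together for a classifier.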
Acknowledgements
This work is funded by “FCT—Fundação para a Ciência e Tecnologia” within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Duarte, A., Peixoto, H., Machado, J. (2022). A Comparative Study of Data Mining Techniques Applied to Renal-Cell Carcinomas. In: Spinsante, S., Silva, B., Goleva, R. (eds) IoT Technologies for Health Care. HealthyIoT 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 432. Springer, Cham. https://doi.org/10.1007/978-3-030-99197-5_5
DOI: https://doi.org/10.1007/978-3-030-99197-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99196-8
Online ISBN: 978-3-030-99197-5
eBook Packages: Computer Science, Computer Science (R0)