Abstract
According to the World Cancer Research Fund, a leading authority on cancer prevention research, lung cancer is the most commonly occurring cancer in men and the third most commonly occurring cancer in women, and its 5-year relative survival rate is low. Smoking is the major risk factor for lung cancer, and its symptoms include cough, fatigue, shortness of breath, chest pain, weight loss, and loss of appetite. This study aims to build a data mining classification model that predicts whether or not a patient has lung cancer based on crucial features such as the above-mentioned symptoms. Following the CRISP-DM methodology and using the RapidMiner software, different models were built across different scenarios, algorithms, sampling methods, and data approaches. The best data mining model, built with the Artificial Neural Network algorithm, achieved an accuracy of 93%, a sensitivity of 96%, a specificity of 90%, and a precision of 91%.
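The four evaluation measures named in the abstract are all derived from a binary confusion matrix. The following is a minimal sketch of how they relate, not the study's RapidMiner workflow. The class name `ConfusionMatrix` and the counts (96 true positives, 10 false positives, 90 true negatives, 4 false negatives) are hypothetical: they assume a balanced test set of 200 cases and were chosen only so that the results land near the reported figures. They are not the study's confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier (positive class = has lung cancer)."""
    tp: int  # true positives: cancer cases predicted as cancer
    fp: int  # false positives: healthy cases predicted as cancer
    tn: int  # true negatives: healthy cases predicted as healthy
    fn: int  # false negatives: cancer cases predicted as healthy

    def accuracy(self) -> float:
        # Share of all cases that were classified correctly.
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def sensitivity(self) -> float:
        # Share of actual positives that were detected (recall).
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Share of actual negatives that were correctly rejected.
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        # Share of positive predictions that were actually positive.
        return self.tp / (self.tp + self.fp)


# Hypothetical counts (assumption, not data from the study).
cm = ConfusionMatrix(tp=96, fp=10, tn=90, fn=4)

metrics = {
    "accuracy": cm.accuracy(),
    "sensitivity": cm.sensitivity(),
    "specificity": cm.specificity(),
    "precision": cm.precision(),
}

for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting such as this one, sensitivity is usually the measure to watch most closely, since a false negative corresponds to a missed cancer case.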
References
World Cancer Research Fund. Lung cancer statistics. https://www.wcrf.org/dietandcancer/cancer-trends/lung-cancer-statistics. Accessed 10 Nov 2020
Hirsch, F.R., Franklin, W.A., Gazdar, A.F., Bunn, P.A.: Early detection of lung cancer: clinical perspectives of recent advances in biology and radiology. Clin. Cancer Res. 7(1), 5–22 (2001)
Morais, A., Peixoto, H., Coimbra, C., Abelha, A., Machado, J.: Predicting the need of neonatal resuscitation using data mining. Procedia Comput. Sci. 113, 571–576 (2017). https://doi.org/10.1016/j.procs.2017.08.287
Hand, D.J., Adams, N.M.: Data mining. In: Wiley StatsRef: Statistics Reference Online, pp. 1–7 (2014). https://doi.org/10.1002/9781118445112.stat06466.pub2
Centers for Disease Control and Prevention. U.S. Cancer Statistics Data Visualizations Tool. https://www.cdc.gov/cancer/uscs/dataviz/index.htm. Accessed 10 Nov 2020
Torre, L.A., Siegel, R.L., Jemal, A.: Lung cancer statistics. In: Lung Cancer and Personalized Medicine, pp. 1–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-24223-1_1
Biesalski, H.K., De Mesquita, B.B., Chesson, A., et al.: European consensus statement on lung cancer: risk factors and prevention. Lung cancer panel. CA Cancer J. Clin. 48(3), 167–176 (1998). https://doi.org/10.3322/canjclin.48.3.167
Bradley, S.H., Kennedy, M.P., Neal, R.D.: Recognising lung cancer in primary care. Adv. Ther. 36(1), 19–30 (2019). https://doi.org/10.1007/s12325-018-0843-5
Martins, B., Ferreira, D., Neto, C., Abelha, A., Machado, J.: Data mining for cardiovascular disease prediction. J. Med. Syst. 45(1), 1–8 (2021)
Krishnaiah, V., Narsimha, G., Chandra, N.S.: Diagnosis of lung cancer prediction system using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol. 4(1), 39–45 (2013)
Nasser, I.M., Abu-Naser, S.S.: Lung cancer detection using artificial neural network. Int. J. Eng. Inf. Syst. (IJEAIS) 3(3), 17–23 (2019)
Murty, N.R., Babu, M.P.: A critical study of classification algorithms for lung cancer disease detection and diagnosis. Int. J. Comput. Intell. Res. 13(5), 1041–1048 (2017)
Wirth, R., Hipp, J.: CRISP-DM: towards a standard process model for data mining. In: Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, pp. 29–39. Springer, London (2000)
Kaggle – Lung Cancer Dataset By Staceyinrobert. https://www.kaggle.com/imkrkannan/lung-cancer-dataset-by-staceyinrobert. Accessed 06 Nov 2020
Ferreira, D., Silva, S., Abelha, A., Machado, J.: Recommendation system using autoencoders. Appl. Sci. 10(16), 5510 (2020). https://doi.org/10.3390/app10165510. MDPI
Acknowledgments
This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Vieira, E., Ferreira, D., Neto, C., Abelha, A., Machado, J. (2021). Data Mining Approach to Classify Cases of Lung Cancer. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_49
DOI: https://doi.org/10.1007/978-3-030-72657-7_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72656-0
Online ISBN: 978-3-030-72657-7
eBook Packages: Intelligent Technologies and Robotics (R0)