Data Mining Approach to Classify Cases of Lung Cancer

  • Conference paper
  • Trends and Applications in Information Systems and Technologies (WorldCIST 2021)

Abstract

According to the World Cancer Research Fund, a leading authority on cancer prevention research, lung cancer is the most commonly occurring cancer in men and the third most commonly occurring cancer in women, and its 5-year relative survival rate is low. Smoking is the major risk factor for lung cancer, and its associated symptoms include cough, fatigue, shortness of breath, chest pain, weight loss, and loss of appetite. This study aims to build a data mining classification model that predicts whether or not a patient has lung cancer from key features such as the symptoms mentioned above. Following the CRISP-DM methodology and using the RapidMiner software, several models were built across different scenarios, algorithms, sampling methods, and data approaches. The best model, based on the Artificial Neural Network algorithm, achieved an accuracy of 93%, a sensitivity of 96%, a specificity of 90%, and a precision of 91%.
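The four figures reported above are standard binary-classification metrics derived from a confusion matrix. The Python sketch below shows how they are computed, assuming "has lung cancer" is the positive class (the abstract does not state this explicitly). The names `ConfusionMatrix`, `classification_metrics` and `confusion_from_labels` are hypothetical helpers for illustration; they are not part of RapidMiner or the authors' workflow, and the toy counts are not the paper's results.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'has lung cancer' is assumed to be the positive class."""
    tp: int  # patients with lung cancer, predicted as having it
    fn: int  # patients with lung cancer, predicted as not having it
    tn: int  # patients without lung cancer, predicted as not having it
    fp: int  # patients without lung cancer, predicted as having it


def classification_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    """Return accuracy, sensitivity (recall), specificity and precision."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    return {
        "accuracy": (cm.tp + cm.tn) / total,       # share of all predictions that are correct
        "sensitivity": cm.tp / (cm.tp + cm.fn),    # share of actual positives detected
        "specificity": cm.tn / (cm.tn + cm.fp),    # share of actual negatives correctly rejected
        "precision": cm.tp / (cm.tp + cm.fp),      # share of positive predictions that are correct
    }


def confusion_from_labels(y_true, y_pred, positive=1) -> ConfusionMatrix:
    """Tally a confusion matrix from paired true and predicted labels."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return ConfusionMatrix(tp=tp, fn=fn, tn=tn, fp=fp)


if __name__ == "__main__":
    # Toy counts for illustration only; these are NOT the paper's results.
    toy = ConfusionMatrix(tp=8, fn=2, tn=7, fp=3)
    for name, value in classification_metrics(toy).items():
        print(f"{name}: {value:.2f}")
```

Sensitivity and specificity are reported separately because, in a screening setting, a missed cancer (false negative) and a false alarm (false positive) carry different costs, and accuracy alone hides that trade-off.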

References

  1. World Cancer Research Fund. Lung cancer statistics. https://www.wcrf.org/dietandcancer/cancer-trends/lung-cancer-statistics. Accessed 10 Nov 2020

  2. Hirsch, F.R., Franklin, W.A., Gazdar, A.F., Bunn, P.A.: Early detection of lung cancer: clinical perspectives of recent advances in biology and radiology. Clin. Cancer Res. 7(1), 5–22 (2001)

  3. Morais, A., Peixoto, H., Coimbra, C., Abelha, A., Machado, J.: Predicting the need of neonatal resuscitation using data mining. Procedia Comput. Sci. 113, 571–576 (2017). https://doi.org/10.1016/j.procs.2017.08.287

  4. Hand, D.J., Adams, N.M.: Data mining. In: Wiley StatsRef: Statistics Reference Online, pp. 1–7 (2014). https://doi.org/10.1002/9781118445112.stat06466.pub2

  5. Centers for Disease Control and Prevention. U.S. Cancer Statistics Data Visualizations Tool. https://www.cdc.gov/cancer/uscs/dataviz/index.htm. Accessed 10 Nov 2020

  6. Torre, L.A., Siegel, R.L., Jemal, A.: Lung cancer statistics. In: Lung Cancer and Personalized Medicine, pp. 1–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-24223-1_1

  7. Biesalski, H.K., De Mesquita, B.B., Chesson, A., et al.: European consensus statement on lung cancer: risk factors and prevention. Lung cancer panel. CA Cancer J. Clin. 48(3), 167–176 (1998). https://doi.org/10.3322/canjclin.48.3.167

  8. Bradley, S.H., Kennedy, M.P., Neal, R.D.: Recognising lung cancer in primary care. Adv. Ther. 36(1), 19–30 (2019). https://doi.org/10.1007/s12325-018-0843-5

  9. Martins, B., Ferreira, D., Neto, C., Abelha, A., Machado, J.: Data mining for cardiovascular disease prediction. J. Med. Syst. 45(1), 1–8 (2021)

  10. Krishnaiah, V., Narsimha, G., Chandra, N.S.: Diagnosis of lung cancer prediction system using data mining classification techniques. Int. J. Comput. Sci. Inf. Technol. 4(1), 39–45 (2013)

  11. Nasser, I.M., Abu-Naser, S.S.: Lung cancer detection using artificial neural network. Int. J. Eng. Inf. Syst. (IJEAIS) 3(3), 17–23 (2019)

  12. Murty, N.R., Babu, M.P.: A critical study of classification algorithms for lung cancer disease detection and diagnosis. Int. J. Comput. Intell. Res. 13(5), 1041–1048 (2017)

  13. Wirth, R., Hipp, J.: CRISP-DM: towards a standard process model for data mining. In: Proceedings of the 4th International Conference on the Practical Applications of Knowledge Discovery and Data Mining, pp. 29–39. Springer, London (2000)

  14. Kaggle – Lung Cancer Dataset By Staceyinrobert. https://www.kaggle.com/imkrkannan/lung-cancer-dataset-by-staceyinrobert. Accessed 06 Nov 2020

  15. Ferreira, D., Silva, S., Abelha, A., Machado, J.: Recommendation system using autoencoders. Appl. Sci. 10(16), 5510 (2020). https://doi.org/10.3390/app10165510. MDPI

Acknowledgments

This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020.

Author information

Correspondence to José Machado.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Vieira, E., Ferreira, D., Neto, C., Abelha, A., Machado, J. (2021). Data Mining Approach to Classify Cases of Lung Cancer. In: Rocha, Á., Adeli, H., Dzemyda, G., Moreira, F., Ramalho Correia, A.M. (eds) Trends and Applications in Information Systems and Technologies. WorldCIST 2021. Advances in Intelligent Systems and Computing, vol 1365. Springer, Cham. https://doi.org/10.1007/978-3-030-72657-7_49
