Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data

Rai, Teena; Shen, Yuan; Kaur, Jaspreet; He, Jun; Mahmud, Mufti; Brown, David J.; Baldwin, David R.; O’Dowd, Emma; Hubbard, Richard

doi:10.1007/978-3-031-34344-5_4

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13897)

Included in the following conference series:

International Conference on Artificial Intelligence in Medicine

Abstract

Lung cancer has the highest cancer mortality rate in the UK. Most patients are diagnosed at an advanced stage because common lung cancer symptoms, such as cough, pain, dyspnoea and anorexia, are also present in other diseases, and this partly accounts for the low survival rate. It is therefore crucial to screen high-risk patients with computed tomography (CT) scans so that lung cancer is detected at an early stage. As shown in a previous study, the estimated survival rate was 88% for screened patients identified with stage I lung cancer, compared with only 5% for patients with stage IV lung cancer. This paper aims to build tree-based machine learning models that predict lung cancer risk by extracting significant factors associated with lung cancer. The study uses Clinical Practice Research Datalink (CPRD) data, which consist of anonymised patient records collected from 945 general practices across the UK. Two tree-based models (a decision tree and a random forest) are developed and implemented. Their performance is compared with that of a logistic regression model in terms of accuracy, Area Under the Receiver Operating Characteristic curve (AUROC), sensitivity and specificity, and both tree-based models achieve better results. As for interpretability, however, it was found that, unlike logistic regression coefficients, the default feature importances in random forests and decision trees are non-negative, so they do not indicate whether a factor raises or lowers risk. This makes tree-based models less interpretable than logistic regression.
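The models are compared on accuracy, AUROC, sensitivity and specificity. As a hedged sketch (not the authors' evaluation code), the pure-Python functions below show how these four metrics are commonly computed from binary labels and model risk scores. The label encoding (1 = lung cancer, 0 = no lung cancer), the 0.5 decision threshold and the toy data are assumptions for illustration only; they are not CPRD data or results from the paper. Libraries such as scikit-learn provide equivalents (for example `sklearn.metrics.roc_auc_score`).

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), assuming labels 1 = lung cancer, 0 = no lung cancer."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels and predictions must be 0 or 1")
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0: a missed cancer
            fn += 1
    return tp, fp, tn, fn


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for hard (thresholded) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate: cancers correctly flagged
        "specificity": tn / (tn + fp),  # true negative rate: non-cancers correctly cleared
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive is scored above a random
    negative (ties count as 1/2); this is threshold-free, unlike the metrics above."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: labels and risk scores from some fitted model.
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.1, 0.6, 0.8, 0.2, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(threshold_metrics(y_true, y_pred))
print(auroc(y_true, scores))
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarises ranking quality over all thresholds, which is why a screening study typically reports both kinds of metric. The pairwise AUROC loop is quadratic in the number of patients; for a dataset the size of CPRD, a sort-based or library implementation would be the practical choice.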

Supported by Nottingham Trent University Medical Technologies and Advanced Materials Strategic Research Theme. Teena Rai is funded by NTU VC PhD studentship.

Author information

Authors and Affiliations

Department of Computer Science, Nottingham Trent University, Clifton Lane, Nottingham, NG11 8NS, UK
Teena Rai, Yuan Shen, Jun He, Mufti Mahmud & David J. Brown
Division of Epidemiology and Public Health, University of Nottingham, Nottingham, NG5 1PB, UK
Jaspreet Kaur, David R. Baldwin, Emma O’Dowd & Richard Hubbard

Corresponding author

Correspondence to Teena Rai.

Editor information

Editors and Affiliations

University of Murcia, Murcia, Spain
Jose M. Juarez
Universitat Jaume I, Castellón de la Plana, Spain
Mar Marcos
University of Maribor, Maribor, Slovenia
Gregor Stiglic
Brunel University London, Uxbridge, UK
Allan Tucker

About this paper

Cite this paper

Rai, T. et al. (2023). Decision Tree Approaches to Select High Risk Patients for Lung Cancer Screening Based on the UK Primary Care Data. In: Juarez, J.M., Marcos, M., Stiglic, G., Tucker, A. (eds) Artificial Intelligence in Medicine. AIME 2023. Lecture Notes in Computer Science, vol 13897. Springer, Cham. https://doi.org/10.1007/978-3-031-34344-5_4

DOI: https://doi.org/10.1007/978-3-031-34344-5_4
Published: 05 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34343-8
Online ISBN: 978-3-031-34344-5
eBook Packages: Computer Science, Computer Science (R0)