Diabetic risk prognosis with tree ensembles integrating feature attribution methods

Hansen, James

doi:10.1007/s12065-021-00663-1

Special Issue
Published: 18 September 2021

Volume 17, pages 419–428 (2024)

Evolutionary Intelligence

James Hansen ORCID: orcid.org/0000-0001-9785-2776¹

Abstract

Tree ensemble machine learning models offer particular promise for medical applications because they handle both continuous and categorical data, model nonlinear relationships, and have hyperparameters that can easily be adapted to improve performance. Modern methods include Random Forests, XGBoost, and LightGBM, which are robust across many areas of diagnosis, prognosis, and medical treatment. Yet a critical limitation of ensembles is that their complex inner workings make them difficult to interpret. In medicine, the ability to explain and interpret a model can be vital for clinical acceptance and trust. Diabetes and cardiovascular disease are two of the main causes of death in the United States, and identifying and predicting these diseases in patients is the first step towards stopping their progression. Using the NHANES diabetes mortality data set, this paper shows that a Random Forests ensemble with optimized hyperparameters yields a strong prognosis model. Importantly, combining Random Forests with SHapley Additive exPlanations (SHAP) yields reliable interpretability of the contributions of, and interactions among, the features. The SHAP results are compared with those of the recently proposed Agnostic Permutation algorithm.
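The abstract pairs a tree ensemble with two feature attribution methods. The following standard-library Python sketch contrasts them on a toy model. Everything in it is an assumption for illustration, not the paper's setup: the feature names `age`, `bmi`, and `glucose`, the two hand-coded trees standing in for a trained Random Forest, the decision threshold, and the synthetic cohort are all hypothetical. `shapley_values` enumerates every feature coalition to compute exact Shapley values, using a background data set to fill in absent features (the interventional quantity that SHAP estimates). `permutation_importance` is a generic shuffle-and-rescore permutation importance and may differ in detail from the Agnostic Permutation algorithm used in the paper.

```python
import itertools
import math
import random

# Hypothetical features; the paper's NHANES variables are not reproduced here.
FEATURES = ["age", "bmi", "glucose"]
THRESHOLD = 0.4  # hypothetical decision threshold on the risk score


def tree_a(x):
    # Hypothetical stump splitting on glucose.
    return 0.8 if x["glucose"] > 125 else 0.1


def tree_b(x):
    # Hypothetical depth-2 tree splitting on age, then bmi.
    if x["age"] > 50:
        return 0.5 if x["bmi"] > 30 else 0.3
    return 0.2 if x["bmi"] > 30 else 0.05


def ensemble(x):
    # Random-Forest-style averaging of the trees' scores.
    return (tree_a(x) + tree_b(x)) / 2


def coalition_value(model, x, coalition, background):
    # v(S): mean model output when features in S take x's values and
    # the remaining features take values from each background row.
    total = 0.0
    for b in background:
        z = {f: (x[f] if f in coalition else b[f]) for f in FEATURES}
        total += model(z)
    return total / len(background)


def shapley_values(model, x, background):
    # Exact Shapley values by enumerating every coalition of the other features.
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        phi[f] = 0.0
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = set(subset)
                gain = (coalition_value(model, x, s | {f}, background)
                        - coalition_value(model, x, s, background))
                phi[f] += weight * gain
    return phi


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    # Mean drop in accuracy after shuffling one feature's column.
    rng = random.Random(seed)

    def accuracy(data):
        hits = sum((model(r) > THRESHOLD) == y for r, y in zip(data, labels))
        return hits / len(data)

    base = accuracy(rows)
    scores = {}
    for f in FEATURES:
        drops = []
        for _ in range(n_repeats):
            column = [r[f] for r in rows]
            rng.shuffle(column)
            shuffled = [dict(r, **{f: v}) for r, v in zip(rows, column)]
            drops.append(base - accuracy(shuffled))
        scores[f] = sum(drops) / n_repeats
    return scores


# Synthetic cohort (hypothetical): the label is "glucose above 125".
rng = random.Random(42)
data = [{"age": rng.uniform(20, 80), "bmi": rng.uniform(18, 40),
         "glucose": rng.uniform(70, 200)} for _ in range(200)]
labels = [r["glucose"] > 125 for r in data]

if __name__ == "__main__":
    patient = {"age": 60, "bmi": 33, "glucose": 150}
    print("Shapley attributions:", shapley_values(ensemble, patient, data[:50]))
    print("Permutation importance:", permutation_importance(ensemble, data, labels))
```

Because the number of coalitions grows exponentially with the number of features, brute-force enumeration only suits a handful of features; tree-specific SHAP algorithms avoid that cost. In this toy setup the predicted class depends only on `glucose`, so permutation importance is zero for `age` and `bmi`, while their Shapley values are positive because they still shift the risk score. This is one way the two attribution methods can disagree.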

Author information

Authors and Affiliations

Marriott School, Brigham Young University, Provo, UT, 84602, USA
James Hansen

Corresponding author

Correspondence to James Hansen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hansen, J. Diabetic risk prognosis with tree ensembles integrating feature attribution methods. Evol. Intel. 17, 419–428 (2024). https://doi.org/10.1007/s12065-021-00663-1

Received: 24 March 2021
Revised: 20 August 2021
Accepted: 21 August 2021
Published: 18 September 2021
Issue Date: February 2024
DOI: https://doi.org/10.1007/s12065-021-00663-1

Part of a collection:

Special Issue: Towards robust explainable and interpretable artificial intelligence
