A hybrid super ensemble learning model for the early-stage prediction of diabetes risk

Doğru, Ayşe; Buyrukoğlu, Selim; Arı, Murat

doi:10.1007/s11517-022-02749-z

Original Article
Published: 05 January 2023

Volume 61, pages 785–797 (2023)

Medical & Biological Engineering & Computing

Abstract

Diabetes mellitus is a rapidly growing chronic health problem worldwide, and the number of cases has risen noticeably over the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus; these methods are both faster and less costly than traditional methods. This study proposes a new super ensemble learning model to enable early diagnosis of diabetes mellitus. Super learner is a cross-validation-based approach that improves predictions by combining the prediction results of more than one machine learning algorithm. The proposed super learner model, arrived at through a case study, consists of four base learners (logistic regression, decision tree, random forest, and gradient boosting) and a meta learner (support vector machine). Three different datasets were used to measure the robustness of the proposed model. Chi-square was selected as the optimal feature selection technique among five candidate techniques, and hyper-parameters were tuned with GridSearch. The proposed super learner model achieved the best accuracy compared with the base learners on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and diabetes 130-US hospitals (98%) datasets. These results indicate that super learner algorithms can be used effectively in the detection of diabetes mellitus, and the high statistical scores obtained across the three datasets support the robustness of the proposed model.
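The architecture described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and uses its `StackingClassifier` as a stand-in for the super learner: four base learners whose cross-validated predictions are passed to a support vector machine. Chi-square feature selection and a grid search are included for completeness. The synthetic data, the helper names (`build_super_learner`, `build_pipeline`, `fit_and_score`), the fold counts, and every hyper-parameter value are assumptions made for illustration; they are not the authors' settings, and the scores this sketch produces say nothing about the reported results.

```python
# Sketch only: synthetic data stands in for the study's datasets, and every
# hyper-parameter value below is an illustrative assumption.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_super_learner(random_state=0):
    """Four base learners whose out-of-fold predictions feed an SVM meta learner."""
    base_learners = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=random_state)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=random_state)),
    ]
    # cv=5: the meta learner is trained on cross-validated base predictions.
    return StackingClassifier(estimators=base_learners,
                              final_estimator=SVC(),
                              cv=5)


def build_pipeline(random_state=0):
    """Scale to [0, 1], keep the k best features by chi-square, then stack."""
    return Pipeline([
        ("scale", MinMaxScaler()),  # chi2 requires non-negative feature values
        ("select", SelectKBest(score_func=chi2)),
        ("model", build_super_learner(random_state)),
    ])


def fit_and_score(random_state=0):
    """Tune the number of selected features and report held-out accuracy."""
    X, y = make_classification(n_samples=300, n_features=12, n_informative=6,
                               random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    search = GridSearchCV(build_pipeline(random_state),
                          param_grid={"select__k": [6, 12]},
                          cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    return search, search.score(X_test, y_test)


if __name__ == "__main__":
    search, test_accuracy = fit_and_score()
    print("best k:", search.best_params_["select__k"])
    print("held-out accuracy:", round(test_accuracy, 3))
```

Placing the scaler and the feature selector inside the pipeline means both are refit on each training fold during the grid search, so the held-out folds do not influence which features are kept.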

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

The authors gratefully acknowledge partial support of the Faculties of Engineering at Çankırı Karatekin University.

Author information

Authors and Affiliations

Department of Electrical and Electronics Engineering, Çankırı Karatekin University, 18100, Çankırı, Turkey
Ayşe Doğru & Murat Arı
Department of Computer Engineering, Çankırı Karatekin University, 18100, Çankırı, Turkey
Selim Buyrukoğlu

Contributions

Publicly available data were used in this study. Conceptualization, formal analysis, methodology, and writing – original draft were performed by Ayşe Doğru, Selim Buyrukoğlu, and Murat Arı. Resources, software, supervision, and writing – review and editing were handled by Ayşe Doğru, Selim Buyrukoğlu, and Murat Arı.

Corresponding author

Correspondence to Selim Buyrukoğlu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Doğru, A., Buyrukoğlu, S. & Arı, M. A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 61, 785–797 (2023). https://doi.org/10.1007/s11517-022-02749-z

Received: 31 July 2022
Accepted: 22 December 2022
Published: 05 January 2023
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11517-022-02749-z
