A hybrid super ensemble learning model for the early-stage prediction of diabetes risk

  • Original Article
Medical & Biological Engineering & Computing

Abstract

Diabetes mellitus has become a rapidly growing chronic health problem worldwide, with a noticeable increase in cases over the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus; these methods are both faster and less costly than traditional methods. This study proposes a new super ensemble learning model to enable early diagnosis of diabetes mellitus. Super learner is a cross-validation-based approach that makes better predictions by combining the prediction results of more than one machine learning algorithm. The proposed super learner model was built, as the result of a case study, with four base learners (logistic regression, decision tree, random forest, gradient boosting) and a meta-learner (support vector machine). Three different datasets were used to measure the robustness of the proposed model. Chi-square was selected as the optimal feature selection technique out of five candidate techniques, and hyper-parameters were tuned with GridSearch. The proposed super learner model achieved the best accuracy compared with the base learners on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and diabetes 130-US hospitals (98%) datasets. This study shows that super learner algorithms can be used effectively in the detection of diabetes mellitus, and the high statistical scores obtained indicate the robustness of the proposed super learner model.
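The architecture described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn as the toolkit, uses `StackingClassifier` as a stand-in for the super learner (cross-validated predictions of the base learners feed the meta-learner), and runs on synthetic data from `make_classification` rather than any of the three study datasets. The helper names `build_super_learner` and `run_demo`, the number of selected features, the tree counts, and the parameter grid are all hypothetical choices made for illustration, since the preview does not report the actual settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_super_learner(k_features=8, cv=3, seed=0):
    """Chi-square feature selection followed by a stacked ensemble.

    Hypothetical helper: the settings below are illustrative assumptions,
    not values taken from the paper.
    """
    # The four base learners named in the abstract.
    base_learners = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=25, random_state=seed)),
        ("gb", GradientBoostingClassifier(n_estimators=25, random_state=seed)),
    ]
    # The SVM meta-learner is trained on cross-validated base-learner outputs.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=SVC(),
        cv=cv,
    )
    return Pipeline([
        # chi2 requires non-negative inputs, so scale features to [0, 1] first.
        ("scale", MinMaxScaler()),
        ("select", SelectKBest(score_func=chi2, k=k_features)),
        ("stack", stack),
    ])


def run_demo(seed=0):
    """Tune the pipeline with a small grid search on synthetic data."""
    # Synthetic stand-in data; the study datasets are not used here.
    X, y = make_classification(
        n_samples=200, n_features=12, n_informative=6, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    # Assumed grid: number of kept features and the SVM regularisation C.
    param_grid = {
        "select__k": [4, 8],
        "stack__final_estimator__C": [1.0, 10.0],
    }
    search = GridSearchCV(
        build_super_learner(seed=seed), param_grid, cv=3, scoring="accuracy"
    )
    search.fit(X_train, y_train)
    # Return the fitted search object and its held-out accuracy.
    return search, search.score(X_test, y_test)


if __name__ == "__main__":
    fitted_search, test_accuracy = run_demo()
    print("best parameters:", fitted_search.best_params_)
    print("held-out accuracy:", round(test_accuracy, 3))
```

Placing the scaler and the chi-square selector inside the pipeline means both are refit on each training fold during the grid search, so the held-out fold does not influence which features are kept. Accuracy on this synthetic data says nothing about the figures reported in the abstract.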

Data availability

The data that support the findings of this study are available on request from the corresponding author.

Acknowledgements

The authors gratefully acknowledge partial support of the Faculties of Engineering at Çankırı Karatekin University.

Author information

Contributions

Publicly available data were used in this study. Conceptualization, formal analysis, methodology, and writing – original draft were performed by Ayşe Doğru, Selim Buyrukoğlu, and Murat Arı. Resources, software, supervision, and writing – review and editing were handled by Ayşe Doğru, Selim Buyrukoğlu, and Murat Arı.

Corresponding author

Correspondence to Selim Buyrukoğlu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Doğru, A., Buyrukoğlu, S. & Arı, M. A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 61, 785–797 (2023). https://doi.org/10.1007/s11517-022-02749-z

  • DOI: https://doi.org/10.1007/s11517-022-02749-z
