Abstract
Applied machine learning has become markedly more influential in day-to-day life over the last few years. Using machine learning to predict various aspects of human life has helped industries discover knowledge, draw inferences, and ultimately improve business outcomes. In the healthcare industry, as machines that monitor various health parameters become increasingly connected, it is important to process the information they produce and draw inferences that help doctors prescribe medicines and advise on lifestyle changes. In this paper, the disease progression of Diabetes Mellitus in 442 patients is analyzed in terms of various health parameters along with six related blood serum measurements. An optimized stacking method is used to perform both regression and classification. In regression, the quantitative measure of disease progression is predicted, whereas in classification, the disease progression is assigned to a high-progression or low-progression category. In both cases, a set of base models is chosen and their accuracy scores are compared with the score of the optimized stacking-based ensemble model. Optimized stacking shows promising results in comparison with the individual methods. The method is also tested on standard datasets. The results are validated on a large dataset with 22 features and 70,692 records, which is used to predict the diabetic information of patients. The technique performed well on all the datasets. This method can serve as the data analysis backbone of healthcare-based IoT systems for predicting diabetic progression, as well as for other related applications.
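The abstract does not name the base models or say how the stacking is optimized, so the following is only a minimal sketch of plain, untuned stacking for both tasks, using scikit-learn's `StackingRegressor` and `StackingClassifier` on the 442-patient dataset. The base models (ridge or logistic regression, k-nearest neighbours, a shallow decision tree), the 80/20 split, and the median threshold that separates high from low progression are all illustrative assumptions, not the authors' choices.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingClassifier, StackingRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# 442 patients, 10 baseline features (including six serum measurements).
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Regression: predict the quantitative progression measure.
# The base models here are assumptions chosen for illustration.
reg_bases = [
    ("ridge", Ridge(alpha=1.0)),
    ("knn", KNeighborsRegressor(n_neighbors=10)),
    ("tree", DecisionTreeRegressor(max_depth=3, random_state=0)),
]
# The meta-learner is trained on out-of-fold predictions of the base models.
stack_reg = StackingRegressor(estimators=reg_bases, final_estimator=Ridge(), cv=5)

reg_scores = {}
for name, model in reg_bases + [("stack", stack_reg)]:
    model.fit(X_train, y_train)
    reg_scores[name] = r2_score(y_test, model.predict(X_test))

# Classification: high vs. low progression.
# Assumption: "high" means above the training-set median of the target.
threshold = np.median(y_train)
c_train = (y_train > threshold).astype(int)
c_test = (y_test > threshold).astype(int)

clf_bases = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier(n_neighbors=10)),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
]
stack_clf = StackingClassifier(
    estimators=clf_bases, final_estimator=LogisticRegression(), cv=5
)

clf_scores = {}
for name, model in clf_bases + [("stack", stack_clf)]:
    model.fit(X_train, c_train)
    clf_scores[name] = accuracy_score(c_test, model.predict(X_test))

# Compare each base model with the stacked ensemble on held-out data.
print("R^2 (regression):", reg_scores)
print("Accuracy (classification):", clf_scores)
```

Whether the stacked model beats every base model depends on the split and on the models chosen; this sketch only shows the comparison the abstract describes, not the reported results.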
Data Availability
The authors hereby declare that the data used in this study is available in the public repository, the link for which is given in the manuscript.
Code Availability
The authors declare that, as the study is part of Ph.D. work, the custom code is not publicly available.
Notes
The dataset is taken from https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html and is part of the scikit-learn dataset library for machine learning.
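As a brief illustration, the dataset can be loaded through scikit-learn's `load_diabetes` to confirm the 442 records and the six serum columns. The variable names below are illustrative. By default, scikit-learn returns the features mean-centered and scaled rather than in raw units.

```python
from sklearn.datasets import load_diabetes

# Load the bundled diabetes dataset as a Bunch object.
data = load_diabetes()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# The serum measurements are the columns named s1..s6.
serum_columns = [
    name for name in feature_names if name.startswith("s") and name != "sex"
]

print(X.shape)         # (n_patients, n_features)
print(feature_names)   # baseline variables
print(serum_columns)   # blood serum measurements
```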
Author information
Contributions
We hereby declare that both the authors have contributed equally towards the work carried out in this paper.
Ethics declarations
Conflict of Interests/Competing Interests
The authors hereby declare that there is no conflict of interest/competing interest with regard to the article submitted.
About this article
Cite this article
V. K., D., Ramesh, T.K. Optimized stacking ensemble models for the prediction of diabetic progression. Multimed Tools Appl 82, 42901–42925 (2023). https://doi.org/10.1007/s11042-023-14858-4
DOI: https://doi.org/10.1007/s11042-023-14858-4