
eDiaPredict: An Ensemble-based Framework for Diabetes Prediction

Published: 14 June 2021

Abstract

Modern medical systems incorporate computational intelligence into healthcare. Machine learning techniques are applied to predict the onset and recurrence of disease and to identify biomarkers for survivability analysis based on a patient's health conditions. Early prediction of diseases such as diabetes is essential, as the number of diabetic patients across all age groups is increasing rapidly. Identifying the underlying causes of diabetes at an early stage has become a challenging task for medical practitioners. The continuously growing volume of diabetic patient data calls for efficient machine learning algorithms that learn from trends in the underlying data and recognize critical conditions in patients. In this article, an ensemble-based framework named eDiaPredict is proposed. It uses ensemble modeling, combining XGBoost, Random Forest, Support Vector Machine, Neural Network, and Decision Tree classifiers to predict diabetes status among patients. The performance of eDiaPredict has been evaluated using accuracy, sensitivity, specificity, Gini Index, precision, area under the curve, area under the convex hull, minimum error rate, and minimum weighted coefficient. The effectiveness of the proposed approach is demonstrated on the PIMA Indian diabetes dataset, where it achieves an accuracy of 95%.
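The abstract names the five base learners and the evaluation metrics but does not state how the learners' outputs are combined. The following is a minimal, hypothetical sketch that assumes soft voting (averaging each model's predicted probability of diabetes) with a 0.5 decision threshold, and assumes "Gini Index" refers to the ROC-based Gini coefficient, 2*AUC - 1. The probabilities are made-up placeholders rather than outputs of models trained on PIMA data, and the function names are illustrative, not the authors' code. Area under the convex hull, minimum error rate, and minimum weighted coefficient are not covered.

```python
# Hypothetical sketch: soft-voting combination of five base classifiers and
# several of the metrics named in the abstract. The probabilities below are
# illustrative placeholders, not outputs of models trained on PIMA data.
from typing import Dict, List, Sequence


def soft_vote_scores(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average each patient's predicted P(diabetic) across all models."""
    n_models = len(model_probs)
    # zip(*model_probs) yields one tuple of model probabilities per patient.
    return [sum(per_patient) / n_models for per_patient in zip(*model_probs)]


def to_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Assumed decision rule: predict diabetic (1) when the mean score >= threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), specificity and precision from counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _safe_div(tp + tn, len(pairs)),
        "sensitivity": _safe_div(tp, tp + fn),
        "specificity": _safe_div(tn, tn + fp),
        "precision": _safe_div(tp, tp + fp),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _safe_div(wins, len(pos) * len(neg))


def gini_coefficient(auc: float) -> float:
    """ROC-based Gini coefficient, assumed here to be what 'Gini Index' means."""
    return 2.0 * auc - 1.0


if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0]
    # One row per base learner (XGBoost, RF, SVM, NN, DT); values are made up.
    model_probs = [
        [0.9, 0.2, 0.7, 0.4, 0.6, 0.1],
        [0.8, 0.3, 0.6, 0.5, 0.7, 0.2],
        [0.7, 0.4, 0.8, 0.3, 0.5, 0.3],
        [0.9, 0.1, 0.5, 0.6, 0.6, 0.2],
        [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],
    ]
    scores = soft_vote_scores(model_probs)
    y_pred = to_labels(scores)
    metrics = confusion_metrics(y_true, y_pred)
    auc = roc_auc(y_true, scores)
    metrics["auc"] = auc
    metrics["gini"] = gini_coefficient(auc)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

On these toy scores the averaged probabilities are roughly 0.86, 0.20, 0.72, 0.36, 0.68 and 0.16, giving one false negative and one false positive. Accuracy, sensitivity, specificity and precision are each 0.667, AUC is 0.889, and the Gini coefficient is 0.778. These numbers only exercise the arithmetic and say nothing about the 95% accuracy reported for eDiaPredict. Soft voting is used here because it yields a continuous ensemble score for AUC; a hard majority vote over the five labels would be an equally plausible reading of "ensemble modeling".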


Cited By

  • (2025) Improving the local diagnostic explanations of diabetes mellitus with the ensemble of label noise filters. Information Fusion 117 (102928). DOI: 10.1016/j.inffus.2025.102928. Online publication date: May 2025.
  • (2024) A systematic review on artificial intelligence approaches for smart health devices. PeerJ Computer Science 10 (e2232). DOI: 10.7717/peerj-cs.2232. Online publication date: 21 Oct 2024.
  • (2024) Predictive Analysis of Diabetes Prediction. In Real-World Applications of AI Innovation, 61–84. DOI: 10.4018/979-8-3693-4252-7.ch004. Online publication date: 22 Nov 2024.

    Published In

    ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 17, Issue 2s
    June 2021
    349 pages
    ISSN:1551-6857
    EISSN:1551-6865
    DOI:10.1145/3465440
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 June 2021
    Accepted: 01 August 2020
    Revised: 01 July 2020
    Received: 01 January 2020
    Published in TOMM Volume 17, Issue 2s

    Author Tags

    • Diabetes prediction
    • ensembled models
    • XGBoost
    • decision tree
    • random forest

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • Vice Deanship of Scientific Research Chairs: Chair of Pervasive and Mobile Computing
