Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus

  • Special Issue Paper
  • Multimedia Systems

Abstract

Diabetes mellitus is a well-known chronic disease that diminishes the insulin-producing capability of the human body. This results in a high blood sugar level, which can lead to complications such as eye damage, nerve damage, cardiovascular damage, kidney damage and stroke. Although diabetes has attracted considerable research attention, the performance of machine learning techniques in classifying the disease remains relatively low, mainly because of class imbalance and missing values in the data. In this paper, we propose a novel Prediction Model using Synthetic Minority Oversampling Technique, Genetic Algorithm and Decision Tree (PMSGD) for classification of diabetes mellitus on the Pima Indians Diabetes Database (PIDD). The framework of the proposed PMSGD prediction model is composed of four layers. The first layer is the pre-processing layer, which handles missing values, detects outliers and oversamples the minority class. In the second layer, the most significant features are selected using correlation and a genetic algorithm. In the third layer, the proposed model is trained, and in the fourth layer its effectiveness is evaluated in terms of classification accuracy (CA), classification error (CE), precision, recall (sensitivity), F-measure (FM) and area under the ROC curve (AUROC). The proposed PMSGD model outperforms its counterparts and achieves an accuracy of 82.1256%. The best results achieved by the proposed system in terms of CA, CE, precision, sensitivity, FM and AUROC are 82.1256%, 17.8744%, 0.8070, 0.8598, 0.8326 and 0.8511, respectively. The simulation results show the effectiveness of the proposed PMSGD model, which thereby reduces the error rate and helps in the decision-making process.
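The evaluation measures named in the abstract follow from the counts in a binary confusion matrix. The sketch below is a minimal illustration using the standard textbook definitions, not the authors' implementation; the names `ConfusionCounts`, `classification_metrics` and `f_measure`, and the example counts, are hypothetical and do not come from the paper. It assumes FM is the usual harmonic mean of precision and recall, which is consistent with the reported values: 0.8070 and 0.8598 give approximately 0.8326.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical helper type)."""
    tp: int  # diabetic cases predicted as diabetic
    fp: int  # non-diabetic cases predicted as diabetic
    tn: int  # non-diabetic cases predicted as non-diabetic
    fn: int  # diabetic cases predicted as non-diabetic


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of FM)."""
    return 2 * precision * recall / (precision + recall)


def classification_metrics(c: ConfusionCounts) -> dict:
    """Compute CA, CE, precision, recall and FM from confusion counts.

    Assumes every denominator is non-zero, i.e. the classifier predicts
    at least one positive and the data contain at least one positive.
    """
    total = c.tp + c.fp + c.tn + c.fn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return {
        "CA": accuracy,
        "CE": 1.0 - accuracy,  # classification error complements accuracy
        "precision": precision,
        "recall": recall,
        "FM": f_measure(precision, recall),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(classification_metrics(ConfusionCounts(tp=8, fp=2, tn=7, fn=3)))
    # Consistency check on the precision and recall reported in the abstract.
    print(round(f_measure(0.8070, 0.8598), 4))
```

The same relation explains why CA and CE in the abstract sum to 100%: 82.1256% + 17.8744% = 100%. AUROC is not computed here because it needs ranked prediction scores rather than a single confusion matrix.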

Author information

Corresponding author

Correspondence to Rohit Sharma.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Azad, C., Bhushan, B., Sharma, R. et al. Prediction model using SMOTE, genetic algorithm and decision tree (PMSGD) for classification of diabetes mellitus. Multimedia Systems 28, 1289–1307 (2022). https://doi.org/10.1007/s00530-021-00817-2

  • DOI: https://doi.org/10.1007/s00530-021-00817-2
