Predicting Type 2 Diabetes Through Machine Learning: Performance Analysis in Balanced and Imbalanced Data

  • Conference paper
Ubiquitous Networking (UNet 2021)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 12845)

Abstract

Type 2 diabetes is a lifelong disease that causes a substantial increase of sugar (glucose) in the blood, and it is now a major worldwide public health challenge. Automating disease prediction is therefore necessary. This study uses the “PIMA Indians Diabetes Data Set”. Because this dataset is imbalanced, the authors randomly selected 268 cases from each class to create a new balanced dataset. The objective is to analyse the impact of imbalanced data on the prediction of type 2 diabetes. Four machine learning methods (Neural Network, k-Nearest Neighbors, Logistic Regression, and AdaBoost) were applied to both the original and the balanced dataset using 10-fold cross-validation. Detailed information on the parameters of the proposed models is presented. The results support the use of Neural Networks for predicting type 2 diabetes: this method achieves an accuracy of 71.4% on the original dataset and 82.3% on the balanced dataset. The proposed method is also compared with other studies available in the state of the art. When applied to the balanced data, Neural Networks achieved 85.9% AUC, 82.2% F1-score, 82.6% precision, 82.3% recall (sensitivity), and 77.6% specificity.
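The balancing step and the reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe their tooling, and the function names `undersample` and `binary_metrics`, the fixed random seed, and the placeholder rows are all assumptions made for illustration. The class counts in the usage lines (500 negative, 268 positive) are those of the commonly distributed PIMA dataset. The abstract also does not say how precision, recall, and F1 were averaged across classes, so the sketch uses the plain positive-class definitions.

```python
import random
from collections import defaultdict


def undersample(rows, labels, per_class, seed=0):
    """Randomly keep `per_class` examples of each class, without replacement."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    kept = []
    for label, indices in by_class.items():
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        kept.extend(rng.sample(indices, per_class))
    kept.sort()  # keep the surviving rows in their original order
    return [rows[i] for i in kept], [labels[i] for i in kept]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Positive-class metrics computed from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # also called sensitivity
    specificity = _ratio(tn, tn + fp)
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Placeholder data with the PIMA class sizes: 500 negative, 268 positive.
rows = list(range(768))            # stand-ins for the feature rows
labels = [0] * 500 + [1] * 268
balanced_rows, balanced_labels = undersample(rows, labels, per_class=268)
print(len(balanced_rows), balanced_labels.count(0), balanced_labels.count(1))
```

Undersampling discards 232 majority-class cases, so each random draw yields a different balanced subset. Repeating the draw with several seeds is one way to check how sensitive the reported metrics are to that choice.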

Acknowledgements

This work was supported by the Polytechnic of Coimbra (ESTGOH). We thank the Polytechnic of Coimbra (ESTGOH) for its continuous support of this work.

Author information

Corresponding author

Correspondence to Gonçalo Marques.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mesquita, F., Marques, G. (2021). Predicting Type 2 Diabetes Through Machine Learning: Performance Analysis in Balanced and Imbalanced Data. In: Elbiaze, H., Sabir, E., Falcone, F., Sadik, M., Lasaulce, S., Ben Othman, J. (eds) Ubiquitous Networking. UNet 2021. Lecture Notes in Computer Science, vol 12845. Springer, Cham. https://doi.org/10.1007/978-3-030-86356-2_22

  • DOI: https://doi.org/10.1007/978-3-030-86356-2_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86355-5

  • Online ISBN: 978-3-030-86356-2

  • eBook Packages: Computer Science, Computer Science (R0)
