Predicting Diabetes Disease in the Female Adult Population, Using Data Mining

  • Conference paper
IoT Technologies for Health Care (HealthyIoT 2021)

Abstract

The aim of this study is to predict, through data mining, the incidence of diabetes in the Pima female adult population. Diabetes is a chronic disease that occurs either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces. It is a major cause of blindness, kidney failure, heart attacks, stroke and lower limb amputation. The information collected from this population, combined with data mining techniques, may help to detect the presence of this disease earlier. To achieve the best possible machine learning (ML) model, this work follows the CRISP-DM methodology and compares the results of five ML models (Logistic Regression, Naive Bayes, Random Forest, Gradient Boosted Trees and k-NN) obtained on two different datasets, each produced by a different data preparation strategy. The study shows that the most promising model is k-NN, which achieved 90% accuracy and a 90% F1 score in the most realistic evaluation scenario.
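As an illustration of the two metrics reported above, the following minimal Python sketch computes accuracy and F1 score for a binary classifier from its confusion-matrix counts. The function names and the toy labels are hypothetical and are not taken from the paper; the study's own data and tooling are not reproduced here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy labels (hypothetical, for illustration only): 1 = diabetic, 0 = not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(tp, fp, fn, tn))
print("F1 score:", f1_score(tp, fp, fn))
```

Accuracy counts every correct prediction equally, while F1 ignores true negatives and so is more sensitive to how well the positive (diabetic) class is detected, which is presumably why both figures are reported together.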

Notes

  1. https://rapidminer.com.

  2. Insulin is a hormone that regulates blood sugar.

Acknowledgements

This work is funded by “FCT-Fundação para a Ciência e Tecnologia” within the R&D Units Project Scope: UIDB/00319/2020.

Corresponding author

Correspondence to Hugo Peixoto.

Copyright information

© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Marques, C., Ramos, V., Peixoto, H., Machado, J. (2022). Predicting Diabetes Disease in the Female Adult Population, Using Data Mining. In: Spinsante, S., Silva, B., Goleva, R. (eds) IoT Technologies for Health Care. HealthyIoT 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 432. Springer, Cham. https://doi.org/10.1007/978-3-030-99197-5_6

  • DOI: https://doi.org/10.1007/978-3-030-99197-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-99196-8

  • Online ISBN: 978-3-030-99197-5

  • eBook Packages: Computer Science (R0)
