Abstract
The aim of this study is to predict, through data mining, the incidence of diabetes in the adult female Pima population. Diabetes is a chronic disease that occurs either when the pancreas does not produce enough insulin or when the body cannot effectively use the insulin it produces. It is a major cause of blindness, kidney failure, heart attacks, stroke and lower limb amputation. The information collected from this population, combined with data mining techniques, may help detect the disease earlier. To achieve the best possible ML model, this work follows the CRISP-DM methodology and compares the results of five ML models (Logistic Regression, Naive Bayes, Random Forest, Gradient Boosted Trees and k-NN) trained on two different datasets, each produced by a different data preparation strategy. The study shows that the most promising model is k-NN, which achieved 90% accuracy and a 90% F1 score in the most realistic evaluation scenario.
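As a rough illustration of the kind of model and metrics named in the abstract, the following minimal sketch implements a plain k-NN classifier and computes accuracy and F1 score from raw counts. It is not the authors' pipeline: the function names, the two features (glucose, BMI), the toy values and the choice of k = 3 are all assumptions made for this example only.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class), computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Hypothetical toy data, NOT taken from the study: (glucose, BMI), label 1 = diabetic.
# Features are left unscaled for brevity; real use would normally standardise them.
train_X = [(85, 22.0), (90, 24.5), (95, 26.0), (150, 33.0), (160, 35.5), (170, 38.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(88, 23.0), (155, 34.0), (100, 27.0), (165, 36.0)]
test_y = [0, 1, 0, 1]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
accuracy, f1 = accuracy_and_f1(test_y, predictions)
print(f"accuracy={accuracy:.2f}, F1={f1:.2f}")
```

Reporting F1 alongside accuracy matters when the classes are imbalanced, because a model can reach high accuracy by favouring the majority class while still missing many positive cases.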
Notes
- 2. Insulin is a hormone that regulates blood sugar.
Acknowledgements
This work is funded by “FCT-Fundação para a Ciência e Tecnologia” within the R&D Units Project Scope: UIDB/00319/2020.
Copyright information
© 2022 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Marques, C., Ramos, V., Peixoto, H., Machado, J. (2022). Predicting Diabetes Disease in the Female Adult Population, Using Data Mining. In: Spinsante, S., Silva, B., Goleva, R. (eds) IoT Technologies for Health Care. HealthyIoT 2021. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 432. Springer, Cham. https://doi.org/10.1007/978-3-030-99197-5_6
DOI: https://doi.org/10.1007/978-3-030-99197-5_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-99196-8
Online ISBN: 978-3-030-99197-5
eBook Packages: Computer Science, Computer Science (R0)