DOI: 10.1145/3542954.3542990
research-article

Early-Stage Diabetes Prediction using Data Mining Algorithms

Published: 11 August 2022

ABSTRACT

Diabetes is now a very common disease and, if not treated early, can pose a serious health threat. Much research has sought the best approach to diabetes detection by applying data mining algorithms to datasets composed of medical attributes. In this study, we examine whether diabetes can be detected at an early stage by applying data mining algorithms to a non-medical dataset, and whether data normalization techniques can improve the classifiers' accuracy.

Naive Bayes, K-Nearest Neighbor (KNN), Support Vector Machines (SVM), Decision Tree, Random Forest, and Gradient Boosting Classifier (GBC) algorithms are applied to the Early Stage Diabetes Risk Prediction Dataset in combination with the following normalization methods: Decimal Point Scaling, Z-Score Normalization, Pareto Scaling, Variable Stability Scaling, Min-Max Normalization, Max Normalization, Maximum Absolute Scaling, Mean Centered Scaling, Soft-max Normalization, Power Transformer, Median and Median Absolute Deviation Normalization, Robust Scaling, and Log Scaling. The experiments indicate that early-stage diabetes detection is possible without any medical diagnosis data. Among the classifiers tested, GBC combined with data normalization performs best, achieving 99.038% prediction accuracy.
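To make a few of the normalization methods named above concrete, the following is a minimal sketch using their common textbook definitions. It is not the authors' implementation: the function names, the use of the population standard deviation, and the sample values are all assumptions made for illustration, and the paper's exact formulas may differ.

```python
import math
import statistics


def min_max(values):
    """Min-Max normalization: (x - min) / (max - min), mapping values to [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(x - lo) / span for x in values]


def z_score(values):
    """Z-Score normalization: (x - mean) / std (population std is an assumption)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


def decimal_scaling(values):
    """Decimal Point Scaling: x / 10**j, with the smallest j such that max|x| / 10**j < 1."""
    peak = max(abs(x) for x in values)
    j = 0
    while peak / (10 ** j) >= 1:
        j += 1
    return [x / (10 ** j) for x in values]


def max_abs(values):
    """Maximum Absolute Scaling: x / max|x|, mapping values to [-1, 1]."""
    peak = max(abs(x) for x in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [x / peak for x in values]


def mean_centered(values):
    """Mean Centered Scaling: x - mean."""
    mean = statistics.fmean(values)
    return [x - mean for x in values]


def pareto(values):
    """Pareto Scaling: (x - mean) / sqrt(std)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(x - mean) / math.sqrt(std) for x in values]


if __name__ == "__main__":
    # Made-up values standing in for one numeric column; not taken from the dataset.
    sample = [16, 30, 48, 90]
    for scaler in (min_max, z_score, decimal_scaling, max_abs, mean_centered, pareto):
        print(scaler.__name__, [round(v, 3) for v in scaler(sample)])
```

In a typical experiment of this kind, each scaler would be fitted on the training split only and then applied to the test split before training a classifier; whether the paper follows that protocol is not stated on this page.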

Published in

ICCA '22: Proceedings of the 2nd International Conference on Computing Advancements
March 2022
543 pages
ISBN: 9781450397346
DOI: 10.1145/3542954

Copyright © 2022 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States


Qualifiers

• research-article
• Research
• Refereed limited