DOI: 10.1145/3639592.3639595
research-article

The effect of Data Augmentation Using SMOTE: Diabetes Prediction by Machine Learning Techniques

Published: 13 April 2024

ABSTRACT

Diabetes mellitus, a severe chronic condition characterized by impaired glucose metabolism, poses a substantial threat to public health. Its incidence continues to rise globally, challenging existing preventive measures and creating a need for new approaches to disease management. Traditional methods of diabetes health monitoring have limitations, which motivates the exploration of more advanced techniques. This study applies machine learning (ML) methods to diabetes diagnosis, with the primary objective of developing a method that diagnoses diabetes with high accuracy. Three algorithms are investigated: Random Forest (RF), K-Nearest Neighbor (KNN), and Logistic Regression. These algorithms are also included with the aim of reducing data processing time. Notably, the study uses the Synthetic Minority Over-sampling Technique (SMOTE) as a data augmentation strategy. SMOTE addresses class imbalance in the dataset, contributing to a more robust and representative sample. The effectiveness and accuracy of diabetes prediction are evaluated for each algorithm both before and after SMOTE is applied, in order to determine the best algorithm for assessing diabetes development. The comparative analysis sheds light on how SMOTE enhances the overall performance of the machine learning models. Beyond refining diabetes diagnostic protocols, this approach underscores the importance of addressing data imbalance in predictive modeling for disease prediction.
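
To make the oversampling step concrete, the following is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is an illustration under stated assumptions, not the implementation used in the paper: the function name `smote`, its parameters, and the toy data are all hypothetical, and a real study would more likely rely on an established library. Each synthetic row is a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority samples.

    minority: list of equal-length feature lists (minority-class rows only).
    This is a hypothetical helper for illustration, not a library API.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Pick a random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy, made-up data: 8 majority rows and 3 minority rows with two features.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5]]

# Generate enough synthetic rows to match the majority class size.
new_rows = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_rows
```

In a before-and-after comparison such as the one described above, oversampling is usually applied to the training split only, so that the test set still reflects the original class distribution.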

    • Published in

      AICCC '23: Proceedings of the 2023 6th Artificial Intelligence and Cloud Computing Conference
      December 2023
      280 pages
      ISBN: 9798400716225
      DOI: 10.1145/3639592

      Copyright © 2023 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • research-article
      • Research
      • Refereed limited