Improving Coronary Heart Disease Prediction Through Machine Learning and an Innovative Data Augmentation Technique

Cognitive Computation

Abstract

Coronary heart disease (CHD) is a leading cause of death globally, with over 382,000 deaths in the USA alone in 2020. Early detection of CHD is critical to reducing mortality rates. Artificial intelligence (AI) is a constantly evolving field of computer science that employs computational models to extract insights from past data and provide rapid and accurate predictions for future cases. This paper presents a novel approach that generates an augmented dataset by selectively duplicating misclassified instances during the leave-one-out cross-validation (CV) process to overfit a model. We evaluated several classifiers by pairing each machine learning model with the augmented dataset. The comprehensive heart disease dataset [1] served as our base dataset. Models trained with our approach achieved higher accuracy than those trained on the base dataset, with the bagged decision tree (DT) algorithm outperforming state-of-the-art models and achieving an accuracy of 97.1% in the 10-fold CV test. Further experiments using the Cleveland dataset and the same 10-fold CV test resulted in an even higher accuracy of 99.2%. Combining an augmented dataset with the bagged-DT algorithm holds great promise for early CHD prediction and could help reduce CHD mortality rates. The use of AI in early CHD prediction could make the difference between life and death for a patient.
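The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which classifier is used inside the leave-one-out loop or how many copies of each misclassified instance are added, so the 1-nearest-neighbour learner, the `copies` parameter, the function names, and the toy data below are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Tuple[float, ...]
# A learner takes training rows and labels and returns a predict function.
Learner = Callable[[Sequence[Row], Sequence[int]], Callable[[Row], int]]


def nearest_neighbor_learner(X: Sequence[Row], y: Sequence[int]) -> Callable[[Row], int]:
    """Hypothetical stand-in classifier: 1-nearest neighbour (squared Euclidean)."""
    def predict(row: Row) -> int:
        dists = [sum((a - b) ** 2 for a, b in zip(row, other)) for other in X]
        return y[dists.index(min(dists))]
    return predict


def augment_with_loo_errors(
    X: Sequence[Row], y: Sequence[int], learner: Learner, copies: int = 1
) -> Tuple[List[Row], List[int], List[int]]:
    """Leave-one-out pass: hold out each instance, train on the rest, and
    duplicate the held-out instance if it is misclassified.

    Returns the augmented rows, augmented labels, and the indices that
    were misclassified. `copies` is an assumed knob, not from the paper.
    """
    X_aug, y_aug = list(X), list(y)
    missed: List[int] = []
    for i in range(len(X)):
        X_train = [row for j, row in enumerate(X) if j != i]
        y_train = [lab for j, lab in enumerate(y) if j != i]
        predict = learner(X_train, y_train)
        if predict(X[i]) != y[i]:
            missed.append(i)
            X_aug.extend([X[i]] * copies)
            y_aug.extend([y[i]] * copies)
    return X_aug, y_aug, missed


# Toy one-feature data (made up): the last point carries label 1 but sits
# closer to the label-0 cluster, so the leave-one-out pass misclassifies it.
X = [(0.0,), (0.2,), (0.4,), (2.0,), (2.2,), (0.9,)]
y = [0, 0, 0, 1, 1, 1]

X_aug, y_aug, missed = augment_with_loo_errors(X, y, nearest_neighbor_learner)
print(missed, len(X_aug))  # prints: [5] 7
```

The augmented set would then be passed to the classifier under evaluation, for example a bagged decision tree scored with 10-fold CV. Because duplicated instances can fall into both the training and test folds, scores obtained this way are measured on the augmented data rather than on unseen patients.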

Data Availability

The datasets are available for non-commercial use. Download from https://github.com/Abdulrakeeb/Heart-disease-dataset.

Notes

  1. Heart Disease Facts, https://www.cdc.gov/heartdisease/facts.htm.

  2. The report lists death per country for the latest available year.

  3. After the repository holding the individual datasets, the UCI (UC Irvine) machine learning repository. The two sets of numbers refer to the number of instances and the number of features.

  4. https://www.kaggle.com/johnsmith88/heart-disease-dataset.

  5. Downloaded from https://ieee-dataport.org/open-access/heart-disease-dataset-comprehensive.

  6. Available at https://scikit-learn.org/.

References

  1. Siddhartha M. Heart disease dataset (comprehensive). IEEE Dataport. 2020. https://doi.org/10.21227/dz4t-cm36.

  2. Wilkins E, Wilson L, Wickramasinghe K, Bhatnagar P, Leal J, Luengo-Fernandez R, et al. European cardiovascular disease statistics 2017. Brussels: European Heart Network; 2017.

  3. Mackay J, Mensah G. The atlas of heart disease and stroke. Geneva: World Health Organization; 2004.

  4. Virani SS, Alonso A, Aparicio HJ, Benjamin EJ, Bittencourt MS, Callaway CW, et al. Heart disease and stroke statistics-2021 update: A report from the American Heart Association. Circulation. 2021;143(8):e254–743.

  5. Durairaj M, Revathi V. Prediction of heart disease using back propagation MLP algorithm. Int J Sci Technol Res. 2015;4(8):235–9.

  6. Saxena K, Sharma R, et al. Efficient heart disease prediction system. Procedia Comput Sci. 2016;85:962–9.

  7. Mohan S, Thirumalai C, Srivastava G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 2019;7:81542–54.

  8. Kurt I, Ture M, Kurum AT. Comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease. Expert Syst Appl. 2008;34(1):366–74.

  9. Ambale-Venkatesh B, Yang X, Wu CO, Liu K, Hundley WG, McClelland R, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res. 2017;121(9):1092–101.

  10. Mahoto NA, Shaikh A, Sulaiman A, Al Reshan MS, Rajab A, Rajab K. A machine learning based data modeling for medical diagnosis. Biomed Signal Process Control. 2023;81:104481.

  11. Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep learning in mining biological data. Cogn Comput. 2021;13:1–33.

  12. Han J, Kamber M, Pei J. Data mining: concepts and techniques. 3rd ed. 2012.

  13. Theodoridis S. Machine Learning: a Bayesian and optimization perspective. 2nd ed. 2020.

  14. Breiman L. Random forests. Mach Learn. 2001;45:5–32.

  15. Latha CBC, Jeeva SC. Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform Med Unlocked. 2019;16:100203.

  16. Cummins RO, Hazinski MF. Guidelines based on fear of type II (false-negative) errors: why we dropped the pulse check for lay rescuers. Circulation. 2000;102(suppl_1):I–377.

  17. Kahramanli H, Allahverdi N. Design of a hybrid system for the diabetes and heart diseases. Expert Syst Appl. 2008;35(1–2):82–9.

  18. Das R, Turkoglu I, Sengur A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst Appl. 2009;36(4):7675–80.

  19. Lahsasna A, Ainon RN, Zainuddin R, Bulgiba A. Design of a fuzzy-based decision support system for coronary heart disease diagnosis. J Med Syst. 2012;36(5):3293–306.

  20. Shilaskar S, Ghatol A. Feature selection for medical diagnosis: Evaluation for cardiovascular diseases. Expert Syst Appl. 2013;40(10):4146–53.

  21. Verma L, Srivastava S, Negi P. A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data. J Med Syst. 2016;40(7):1–7.

  22. Hassan N, Sayed OR, Khalil AM, Ghany MA. Fuzzy soft expert system in prediction of coronary artery disease. Int J Fuzzy Syst. 2017;19(5):1546–59.

  23. Uyar K, İlhan A. Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Procedia Comput Sci. 2017;120:588–93.

  24. Samuel OW, Asogbon GM, Sangaiah AK, Fang P, Li G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst Appl. 2017;68:163–72.

  25. Paul AK, Shill PC, Rabin MRI, Murase K. Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Appl Intell. 2018;48(7):1739–56.

  26. Pouriyeh S, Vahid S, Sannino G, DePietro G, Arabnia H, Gutierrez J. A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: IEEE symposium on computers and communications (ISCC); 2017. p. 204–207.

  27. Alkeshuosh AH, Moghadam MZ, AlMansoori I, Abdar M. Using PSO algorithm for producing best rules in diagnosis of heart disease. In: International Conference on Computer and Applications (ICCA); 2017. p. 306–311.

  28. Haq AU, Li JP, Memon MH, Nazir S, Sun R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob Inf Syst. 2018;2018:3860146.

  29. Dutta A, Batabyal T, Basu M, Acton ST. An efficient convolutional neural network for coronary heart disease prediction. Expert Syst Appl. 2020;159:113408.

  30. Almustafa KM. Prediction of heart disease and classifiers’ sensitivity analysis. BMC Bioinf. 2020;21(1):1–18.

  31. Valarmathi R, Sheela T. Heart disease prediction using hyper parameter optimization (HPO) tuning. Biomed Signal Process Control. 2021;70:103033.

  32. Al-Ssulami AM, Mathkour H. Faster string matching based on hashing and bit-parallelism. Inf Process Lett. 2017;123:51–5.

  33. Al-Ssulami AM, Azmi AM, Mathkour H, Aboalsamh H. LsHASHq: A string matching algorithm exploiting longer q-gram shifting. Inf Process Manag. 2022;59(5):103057.

  34. Rosen KH. Discrete mathematics and its applications (7th edition). McGraw-Hill Companies, Inc.; 2011.

  35. Breiman L. Bagging predictors. Mach Learn. 1996;24:123–40.

  36. Rajendran R, Karthi A. Heart disease prediction using entropy based feature engineering and ensembling of machine learning classifiers. Expert Syst Appl. 2022;207:117882.

  37. Tiwari A, Chugh A, Sharma A. Ensemble framework for cardiovascular disease prediction. Comput Biol Med. 2022;146:105624.

  38. Budholiya K, Shrivastava SK, Sharma V. An optimized XGBoost based diagnostic system for effective prediction of heart disease. J King Saud Univ - Comput Inf Sci. 2022;34(7):4514–23.

  39. Ayon SI, Islam MM, Hossain MR. Coronary artery heart disease prediction: A comparative study of computational intelligence techniques. IETE J Res. 2020:1–20.

Acknowledgements

The authors would like to thank the anonymous reviewers for their critical review of the paper.

Funding

The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project no. IFKSURG-2-23. The funding sponsors were not involved in any matters related to research.

Author information

Corresponding author

Correspondence to Aqil M. Azmi.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by the authors.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Al-Ssulami, A.M., Alsorori, R.S., Azmi, A.M. et al. Improving Coronary Heart Disease Prediction Through Machine Learning and an Innovative Data Augmentation Technique. Cogn Comput 15, 1687–1702 (2023). https://doi.org/10.1007/s12559-023-10151-6

  • DOI: https://doi.org/10.1007/s12559-023-10151-6
