Abstract
Coronary heart disease (CHD) is a leading cause of death globally, with over 382,000 deaths in the USA alone in 2020. Early detection of CHD is critical to reducing mortality. Artificial intelligence (AI) is a constantly evolving field of computer science that uses computational models to extract insights from past data and to provide rapid, accurate predictions for future cases. This paper presents a novel approach that generates an augmented dataset by selectively duplicating the instances misclassified during leave-one-out cross-validation (CV), in order to overfit a model. We paired several machine learning classifiers with the augmented dataset and evaluated each pairing. The comprehensive heart disease dataset [1] served as our base dataset. Classifiers trained on the augmented dataset achieved higher accuracy than those trained on the base dataset, with the bagged decision tree (DT) algorithm outperforming state-of-the-art models and reaching an accuracy of 97.1% in the 10-fold CV test. Further experiments on the Cleveland dataset with the same 10-fold CV test yielded an even higher accuracy of 99.2%. Combining an augmented dataset with the bagged-DT algorithm holds great promise for early CHD prediction, which could help reduce CHD mortality. For an individual patient, early AI-assisted prediction of CHD could make the difference between life and death.
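The augmentation idea summarized above can be sketched in a few lines with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification`), the helper name `augment_with_misclassified` is hypothetical, a single decision tree is assumed as the model run under leave-one-out CV, and each misclassified instance is assumed to be duplicated exactly once. The abstract does not specify any of these details.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import (
    LeaveOneOut,
    StratifiedKFold,
    cross_val_predict,
    cross_val_score,
)
from sklearn.tree import DecisionTreeClassifier


def augment_with_misclassified(X, y, model):
    """Hypothetical helper: append one copy of every instance that
    `model` misclassifies when it is held out in leave-one-out CV."""
    held_out_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    wrong = held_out_pred != y
    X_aug = np.vstack([X, X[wrong]])
    y_aug = np.concatenate([y, y[wrong]])
    return X_aug, y_aug, int(wrong.sum())


# Synthetic stand-in data (assumption): NOT the heart disease dataset.
X, y = make_classification(
    n_samples=150, n_features=10, n_informative=5, random_state=0
)

# Step 1: find the hard instances with leave-one-out CV and duplicate them.
X_aug, y_aug, n_dup = augment_with_misclassified(
    X, y, DecisionTreeClassifier(random_state=0)
)

# Step 2: evaluate a bagged decision tree with 10-fold CV on both datasets.
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=25, random_state=0
)
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
acc_base = cross_val_score(bagged, X, y, cv=folds).mean()
acc_aug = cross_val_score(bagged, X_aug, y_aug, cv=folds).mean()

print(f"duplicated instances: {n_dup}")
print(f"10-fold accuracy, base dataset:      {acc_base:.3f}")
print(f"10-fold accuracy, augmented dataset: {acc_aug:.3f}")
```

Because a duplicated instance and its original can fall into different folds, the 10-fold score on the augmented dataset partly measures how well the model memorizes the duplicated cases, which is consistent with the stated aim of overfitting the model to them.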













Data Availability
The datasets are available for non-commercial use. Download from https://github.com/Abdulrakeeb/Heart-disease-dataset.
Notes
Heart Disease Facts, https://www.cdc.gov/heartdisease/facts.htm.
The report lists death per country for the latest available year.
After the repository holding the individual datasets, the UCI (UC Irvine) machine learning repository. The two sets of numbers refer to the number of instances and the number of features.
Available at https://scikit-learn.org/.
References
Siddhartha M. Heart disease dataset (comprehensive). IEEE Dataport. 2020. https://doi.org/10.21227/dz4t-cm36.
Wilkins E, Wilson L, Wickramasinghe K, Bhatnagar P, Leal J, Luengo-Fernandez R, et al. European cardiovascular disease statistics 2017. Brussels: European Heart Network; 2017.
Mackay J, Mensah G. The atlas of heart disease and stroke. Geneva: World Health Organization; 2004.
Virani SS, Alonso A, Aparicio HJ, Benjamin EJ, Bittencourt MS, Callaway CW, et al. Heart disease and stroke statistics-2021 update: A report from the American Heart Association. Circulation. 2021;143(8):e254–743.
Durairaj M, Revathi V. Prediction of heart disease using back propagation MLP algorithm. Int J Sci Technol Res. 2015;4(8):235–9.
Saxena K, Sharma R, et al. Efficient heart disease prediction system. Procedia Comput Sci. 2016;85:962–9.
Mohan S, Thirumalai C, Srivastava G. Effective heart disease prediction using hybrid machine learning techniques. IEEE Access. 2019;7:81542–54.
Kurt I, Ture M, Kurum AT. Comparing performances of logistic regression, classification and regression tree, and neural networks for predicting coronary artery disease. Expert Syst Appl. 2008;34(1):366–74.
Ambale-Venkatesh B, Yang X, Wu CO, Liu K, Hundley WG, McClelland R, et al. Cardiovascular event prediction by machine learning: the multi-ethnic study of atherosclerosis. Circ Res. 2017;121(9):1092–101.
Mahoto NA, Shaikh A, Sulaiman A, Al Reshan MS, Rajab A, Rajab K. A machine learning based data modeling for medical diagnosis. Biomed Signal Process Control. 2023;81:104481.
Mahmud M, Kaiser MS, McGinnity TM, Hussain A. Deep learning in mining biological data. Cogn Comput. 2021;13:1–33.
Han J, Kamber M, Pei J. Data mining: concepts and techniques. 3rd ed. 2012.
Theodoridis S. Machine learning: a Bayesian and optimization perspective. 2nd ed. 2020.
Breiman L. Random forests. Mach Learn. 2001;45:5–32.
Latha CBC, Jeeva SC. Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inform Med Unlocked. 2019;16:100203.
Cummins RO, Hazinski MF. Guidelines based on fear of type II (false-negative) errors: why we dropped the pulse check for lay rescuers. Circulation. 2000;102(suppl_1):I–377.
Kahramanli H, Allahverdi N. Design of a hybrid system for the diabetes and heart diseases. Expert Syst Appl. 2008;35(1–2):82–9.
Das R, Turkoglu I, Sengur A. Effective diagnosis of heart disease through neural networks ensembles. Expert Syst Appl. 2009;36(4):7675–80.
Lahsasna A, Ainon RN, Zainuddin R, Bulgiba A. Design of a fuzzy-based decision support system for coronary heart disease diagnosis. J Med Syst. 2012;36(5):3293–306.
Shilaskar S, Ghatol A. Feature selection for medical diagnosis: Evaluation for cardiovascular diseases. Expert Syst Appl. 2013;40(10):4146–53.
Verma L, Srivastava S, Negi P. A hybrid data mining model to predict coronary artery disease cases using non-invasive clinical data. J Med Syst. 2016;40(7):1–7.
Hassan N, Sayed OR, Khalil AM, Ghany MA. Fuzzy soft expert system in prediction of coronary artery disease. Int J Fuzzy Syst. 2017;19(5):1546–59.
Uyar K, İlhan A. Diagnosis of heart disease using genetic algorithm based trained recurrent fuzzy neural networks. Procedia Comput Sci. 2017;120:588–93.
Samuel OW, Asogbon GM, Sangaiah AK, Fang P, Li G. An integrated decision support system based on ANN and Fuzzy_AHP for heart failure risk prediction. Expert Syst Appl. 2017;68:163–72.
Paul AK, Shill PC, Rabin MRI, Murase K. Adaptive weighted fuzzy rule-based system for the risk level assessment of heart disease. Appl Intell. 2018;48(7):1739–56.
Pouriyeh S, Vahid S, Sannino G, DePietro G, Arabnia H, Gutierrez J. A comprehensive investigation and comparison of machine learning techniques in the domain of heart disease. In: IEEE symposium on computers and communications (ISCC); 2017. p. 204–207.
Alkeshuosh AH, Moghadam MZ, AlMansoori I, Abdar M. Using PSO algorithm for producing best rules in diagnosis of heart disease. In: International Conference on Computer and Applications (ICCA); 2017. p. 306–311.
Haq AU, Li JP, Memon MH, Nazir S, Sun R. A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mob Inf Syst. 2018;2018:3860146.
Dutta A, Batabyal T, Basu M, Acton ST. An efficient convolutional neural network for coronary heart disease prediction. Expert Syst Appl. 2020;159:113408.
Almustafa KM. Prediction of heart disease and classifiers’ sensitivity analysis. BMC Bioinf. 2020;21(1):1–18.
Valarmathi R, Sheela T. Heart disease prediction using hyper parameter optimization (HPO) tuning. Biomed Signal Process Control. 2021;70:103033.
Al-Ssulami AM, Mathkour H. Faster string matching based on hashing and bit-parallelism. Inf Process Lett. 2017;123:51–5.
Al-Ssulami AM, Azmi AM, Mathkour H, Aboalsamh H. LsHASHq: A string matching algorithm exploiting longer q-gram shifting. Inf Process Manag. 2022;59(5):103057.
Rosen KH. Discrete mathematics and its applications. 7th ed. McGraw-Hill; 2011.
Breiman L. Bagging predictors. Mach Learn. 1996;24:123–40.
Rajendran R, Karthi A. Heart disease prediction using entropy based feature engineering and ensembling of machine learning classifiers. Expert Syst Appl. 2022;207:117882.
Tiwari A, Chugh A, Sharma A. Ensemble framework for cardiovascular disease prediction. Comput Biol Med. 2022;146:105624.
Budholiya K, Shrivastava SK, Sharma V. An optimized XGBoost based diagnostic system for effective prediction of heart disease. J King Saud Univ - Comput Inf Sci. 2022;34(7):4514–23.
Ayon SI, Islam MM, Hossain MR. Coronary artery heart disease prediction: A comparative study of computational intelligence techniques. IETE J Res. 2020;1–20.
Acknowledgements
The authors would like to thank the anonymous reviewers for their critical review of the paper.
Funding
The authors extend their appreciation to the Deputyship for Research & Innovation, Ministry of Education in Saudi Arabia for funding this research work through the project no. IFKSURG-2-23. The funding sponsors were not involved in any matters related to research.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by the authors.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Al-Ssulami, A.M., Alsorori, R.S., Azmi, A.M. et al. Improving Coronary Heart Disease Prediction Through Machine Learning and an Innovative Data Augmentation Technique. Cogn Comput 15, 1687–1702 (2023). https://doi.org/10.1007/s12559-023-10151-6
DOI: https://doi.org/10.1007/s12559-023-10151-6