Performance analysis of supervised classification models on heart disease prediction

  • S.I.: Intelligence for Systems and Software Engineering
  • Innovations in Systems and Software Engineering

Abstract

This paper presents a predictive analysis of data on heart disease patients to determine the possible risk factors associated with their heart disease status. Two independent but similar published heart disease datasets were analysed: the Cleveland data, used to build the classification models, and the Statlog data, used to validate the results. A detailed exploratory analysis using the Chi-square test of independence was performed on the Cleveland data, after which ten standard classification models were trained for class prediction. The models were built by randomly partitioning the Cleveland data into 208 (70%) training samples and 89 (30%) test samples, repeated over 200 replications. Preliminary results showed that some of the bio-clinical categorical variables are strongly associated with the patients' heart disease status (p < 0.001). On the test samples, the support vector machine yielded the best predictive performance, with 85% accuracy, 82% sensitivity, 88% specificity, 87% precision, 91% area under the ROC curve, and a log loss of 0.38. These results were validated on the Statlog data using tenfold cross-validation, and the validation results were consistent with those obtained from the Cleveland dataset.
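The evaluation protocol summarised in the abstract (repeated random 70/30 partitions, then threshold-based metrics and log loss on each test set) could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the function names `split_indices`, `replicate_splits` and `binary_metrics`, the 0.5 decision threshold, and the seed are all assumptions made for the example, and only the Python standard library is used.

```python
import math
import random


def split_indices(n_samples, train_fraction, rng):
    """Randomly partition range(n_samples) into train and test index lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def replicate_splits(n_samples, train_fraction, n_replications, seed=0):
    """Draw n_replications independent random splits (seed is an assumption)."""
    rng = random.Random(seed)
    return [split_indices(n_samples, train_fraction, rng)
            for _ in range(n_replications)]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, sensitivity, specificity, precision and mean log loss.

    y_true holds 0/1 labels; y_prob holds predicted probabilities of class 1.
    The 0.5 threshold is an assumed default, not taken from the paper.
    """
    tp = fp = tn = fn = 0
    total_loss = 0.0
    eps = 1e-15  # clip probabilities so that log() stays finite
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
        p = min(max(p, eps), 1 - eps)
        total_loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    n = tp + fp + tn + fn
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "log_loss": _ratio(total_loss, n),
    }
```

With 297 samples and a training fraction of 0.7, each split from `replicate_splits(297, 0.7, 200)` contains 208 training and 89 test indices, matching the counts stated in the abstract. In a full analysis, the metrics from the 200 test sets would then be averaged per model.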



Author information

Corresponding author

Correspondence to Ezekiel Adebayo Ogundepo.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ogundepo, E.A., Yahya, W.B. Performance analysis of supervised classification models on heart disease prediction. Innovations Syst Softw Eng 19, 129–144 (2023). https://doi.org/10.1007/s11334-022-00524-9

  • DOI: https://doi.org/10.1007/s11334-022-00524-9
