Performance analysis of supervised classification models on heart disease prediction

Ogundepo, Ezekiel Adebayo; Yahya, Waheed Babatunde

doi:10.1007/s11334-022-00524-9

S.I.: Intelligence for Systems and Software Engineering
Published: 04 January 2023

Volume 19, pages 129–144 (2023)
Innovations in Systems and Software Engineering

Abstract

This paper presents a predictive analysis of data on heart disease patients to determine the possible risk factors associated with their heart disease status. Two independent but similar published heart disease datasets were analysed: the Cleveland data, used to build the classification models, and the Statlog data, used to validate the results. A detailed exploratory analysis using the Chi-square test of independence was performed on the Cleveland data, after which ten standard classification models were trained for class prediction. The classification models were built by randomly partitioning the Cleveland data into 208 (70%) training samples and 89 (30%) test samples over 200 replications. Preliminary results showed that some of the bio-clinical categorical variables are strongly associated with the heart disease conditions of the patients (p < 0.001). The classification results from the test samples indicated that the support vector machine yielded the best predictive performance, with 85% accuracy, 82% sensitivity, 88% specificity, 87% precision, 91% area under the ROC curve, and a log loss of 0.38. These results were validated on the Statlog data using tenfold cross-validation, and the validation results were all consistent with those obtained from the Cleveland dataset.
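
The evaluation protocol described in the abstract (a random 70/30 partition, repeated, with threshold-based metrics and log loss computed on the test samples) can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' code: the function names `split_indices` and `binary_metrics`, the 0.5 decision threshold, and the probability clipping constant are all assumptions made for the example, and the area under the ROC curve is left out for brevity.

```python
import math
import random


def split_indices(n_samples, train_fraction, rng):
    """Randomly partition range(n_samples) into (train, test) index lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_prob, threshold=0.5):
    """Compute threshold-based metrics and log loss for 0/1 labels.

    y_prob holds predicted probabilities of the positive class (label 1).
    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1

    eps = 1e-15  # clip probabilities so log(0) never occurs
    total_loss = 0.0
    for label, prob in zip(y_true, y_prob):
        p = min(max(prob, eps), 1.0 - eps)
        total_loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)

    n = tp + fp + tn + fn
    return {
        "accuracy": _ratio(tp + tn, n),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "precision": _ratio(tp, tp + fp),
        "log_loss": _ratio(total_loss, n),
    }


# One of the repeated random partitions: 297 samples split 70/30.
rng = random.Random(2023)  # the seed is arbitrary
train_idx, test_idx = split_indices(297, 0.70, rng)
print(len(train_idx), len(test_idx))  # 208 89
```

In a full replication study, the split would be redrawn for each of the 200 replications, a classifier would be fitted on the training indices, and the metrics from `binary_metrics` would be averaged over the replications.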

Author information

Authors and Affiliations

Department of Statistics, University of Ilorin, Ilorin, Nigeria
Ezekiel Adebayo Ogundepo & Waheed Babatunde Yahya

Corresponding author

Correspondence to Ezekiel Adebayo Ogundepo.

Ethics declarations

Conflict of interest

All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ogundepo, E.A., Yahya, W.B. Performance analysis of supervised classification models on heart disease prediction. Innovations Syst Softw Eng 19, 129–144 (2023). https://doi.org/10.1007/s11334-022-00524-9

Received: 23 May 2022
Accepted: 20 December 2022
Published: 04 January 2023
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11334-022-00524-9
