
Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines

  • Regular Paper

Knowledge and Information Systems

Abstract

The heart is one of the vital organs of the human body, and its failure is a major contributor to human deaths. Coronary heart disease may be asymptomatic, but it can be anticipated from medical tests and the subject's daily routine. Diagnosing coronary heart disease requires specialized medical expertise backed by extensive experience. Such experts are scarce worldwide, and particularly in developing countries, which makes diagnosis more difficult. In this paper, we present a clinical heart disease diagnostic system built on a feature subset selection methodology, with the objective of improving diagnostic performance. The methodology uses three algorithms to generate candidate feature subsets: (1) a mean Fisher score-based feature selection algorithm, (2) a forward feature selection algorithm and (3) a reverse feature selection algorithm. A feature subset selection algorithm then chooses the most decisive subset from these candidates. Features are added to the candidate subsets on the basis of their individual Fisher scores, while the choice of subset depends on its Matthews correlation coefficient (MCC) score and its dimension. The selected, reduced-dimension feature subset is fed to an RBF kernel-based support vector machine (SVM), which performs binary classification into (1) heart disease patients and (2) normal control subjects. The methodology is validated in terms of accuracy, specificity and sensitivity on four UCI datasets: Cleveland, Hungarian, Switzerland and SPECTF. Compared with existing techniques, the proposed technique performs better, with accuracies of 81.19, 84.52, 92.68 and 82.7% on Cleveland, Hungarian, Switzerland and SPECTF, respectively.
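
The abstract names the ingredients of the selection step without giving formulas, so the following is only a minimal Python sketch of one plausible reading, using the standard library alone. It assumes the standard two-class Fisher score (between-class scatter over within-class scatter, per feature), takes the mean Fisher score candidate to be all features scoring at least the mean score, and builds forward candidates as nested subsets in descending score order. The reverse algorithm is not sketched because the abstract does not describe it. `evaluate` is a hypothetical callback that would train and test the RBF-kernel SVM on a subset and return confusion-matrix counts, and the preference for fewer features when MCC ties is likewise an assumption about how dimension enters the choice. All function names are illustrative, not the authors' code.

```python
import math
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[int, ...]
Counts = Tuple[int, int, int, int]  # (tp, tn, fp, fn)


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Standard Fisher score of every feature (column of X) for labels y."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall = mean(col)
        between = within = 0.0
        for c in set(y):
            vals = [v for v, label in zip(col, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2  # class-mean spread
            within += len(vals) * pvariance(vals)                # within-class spread
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class variance: perfectly separating or constant feature.
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def candidate_subsets(scores: Sequence[float]) -> List[Subset]:
    """Assumed candidates: features scoring >= the mean Fisher score, plus
    nested forward subsets built by adding features in descending score order."""
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    forward = {tuple(sorted(ranking[:k])) for k in range(1, len(ranking) + 1)}
    threshold = sum(scores) / len(scores)
    above_mean = tuple(j for j, s in enumerate(scores) if s >= threshold)
    return sorted(forward | {above_mean})


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def select_subset(candidates: Sequence[Subset],
                  evaluate: Callable[[Subset], Counts]) -> Subset:
    """Highest MCC wins; ties go to the lower-dimensional subset (assumed rule)."""
    return max(candidates, key=lambda s: (mcc(*evaluate(s)), -len(s)))


def accuracy_sensitivity_specificity(tp: int, tn: int, fp: int, fn: int):
    """Validation metrics named in the abstract, from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    X = [[1.0, 5.0, 0.0], [1.2, 3.0, 1.0], [3.0, 4.0, 1.0], [3.2, 4.0, 2.0]]
    y = [0, 0, 1, 1]
    subsets = candidate_subsets(fisher_scores(X, y))

    # Hypothetical stand-in for "train an RBF-kernel SVM on this subset and
    # return (tp, tn, fp, fn) on held-out data"; not part of the source.
    def toy_evaluate(sub: Subset) -> Counts:
        return (2, 2, 0, 0) if 0 in sub else (1, 1, 1, 1)

    print(select_subset(subsets, toy_evaluate))  # → (0,)
```

Returning 0.0 from `mcc` when a row or column of the confusion matrix is empty, rather than raising, is a common convention that keeps the `max` comparison in `select_subset` well defined for degenerate subsets.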

Acknowledgements

This research did not receive any specific grant from funding agencies in the public, commercial or not-for-profit sectors.

Author information

Corresponding author

Correspondence to Syed Muhammad Saqlain.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest. There are no involvements that might raise the question of bias in the reported work or in its conclusions, implications or opinions.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

About this article

Cite this article

Saqlain, S.M., Sher, M., Shah, F.A. et al. Fisher score and Matthews correlation coefficient-based feature subset selection for heart disease diagnosis using support vector machines. Knowl Inf Syst 58, 139–167 (2019). https://doi.org/10.1007/s10115-018-1185-y

  • DOI: https://doi.org/10.1007/s10115-018-1185-y
