A new machine learning technique for an accurate diagnosis of coronary artery disease

https://doi.org/10.1016/j.cmpb.2019.104992

Highlights

  • Novel data mining method is proposed for CAD diagnosis.

  • Application of feature selection (based on GA and PSO) is proposed.

  • New genetic training (N2Genetic optimizer) based on fusion of 10-fold cross-validation with GA or PSO is employed.

  • SVM (SVC, nuSVM, LinSVM) is employed for classification.

  • High classification accuracy of 93.08% is obtained.

Abstract

Background and objective

Coronary artery disease (CAD) is one of the most common diseases worldwide. Early and accurate diagnosis of CAD allows timely administration of appropriate treatment and helps to reduce mortality. Herein, we describe an innovative machine learning methodology that enables accurate detection of CAD and apply it to data collected from Iranian patients.

Methods

We first tested ten traditional machine learning algorithms, and then the three best-performing algorithms (three types of SVM) were used in the rest of the study. To improve the performance of these algorithms, the data were pre-processed with normalization. Moreover, a genetic algorithm and particle swarm optimization, coupled with stratified 10-fold cross-validation, were used twice: for optimization of classifier parameters and for parallel selection of features.

Results

The presented approach enhanced the performance of all traditional machine learning algorithms used in this study. We also introduced a new optimization technique called N2Genetic optimizer (a new genetic training). Our experiments demonstrated that N2Genetic-nuSVM achieved an accuracy of 93.08% and an F1-score of 91.51% when predicting CAD outcomes among the patients included in the well-known Z-Alizadeh Sani dataset. These results are competitive and comparable to the best results in the field.

Conclusions

We showed that machine learning techniques optimized by the proposed approach can lead to highly accurate models intended for both clinical and research use.

Introduction

Coronary artery disease (CAD) is the most common type of cardiovascular disease (CVD). It is the main cause of death around the world [1]. Statistics published by the World Health Organization (WHO) reveal that about 17.7 million people died of CVD in 2015, accounting for 31% of all deaths [2], [3]. It is likely that accurate CAD detection and timely intervention could have averted many of the deaths attributable to CAD. This research aims to present a new methodology (a combination of several classical machine learning algorithms and evolutionary algorithms) to detect CAD.

Data mining (sometimes also named knowledge discovery) is a computer-based process to extract useful information from huge amounts of raw data. Some well-known data mining algorithms include: Support Vector Machines (SVMs), Neural Networks (NNs), Decision Trees (DTs), Genetic algorithms (GAs), and Bayesian Networks (BNs). Each of these methods has both advantages and disadvantages and should be used with caution. Based on the type of data and conditions of research (e.g., including variables and parameters for each algorithm), some of these methods produce good results for predicting a disease, while others may not perform optimally. The performance of the methods depends not only on the nature of the algorithm but also on the type of data and conditions of implementation, which may have a significant impact on the results. Therefore, customizing the approach precisely to the type of data being mined and the outcome that is sought can improve the performance of the method.

The use of machine learning algorithms for diverse applications is increasingly common. Because the data involved in these applications are large and complex, extracting hidden and useful information from them is essential. This information can be used to enhance the quality of various services. By analyzing it, organizations, governments, researchers and others can provide improved services and add value to their interactions with their users, customers, patients, etc. This study concentrates on the application of machine learning and data mining methods to healthcare data [4], [5], [6].

In the healthcare domain, different machine learning techniques have been applied in the past few decades to study various diseases, such as Parkinson's disease [7], [8], liver disease [9], [10], heart disease [11], [12], [13], [14], breast cancer [15], [16], lung cancer [17], etc. In general, the applied methods provided good results for predicting the respective diseases. Timely diagnosis of these diseases, including cancers, with the least possible error is the foremost expectation of patients. Not surprisingly, the detection process demands special knowledge and experience. We believe that machine learning as well as data mining can be applied to improve the accuracy of the approach, decrease the number of diagnostic errors, and ultimately deliver quality services to the patients.

The main objective of the current paper is to design a novel and effective model that uses several machine learning and data mining methods to detect CAD. In this study, we used a Diagnosis Support System (DSS) [18] as a clinical decision support system (CDSS), in line with its growing popularity and ubiquitous application in diverse information systems. We tested ten well-established traditional machine learning techniques, namely the Gaussian Naive Bayes classifier (gaussNB), linear discriminant analysis (LDA), K-nearest neighbors (KNN), quadratic discriminant analysis (QDA), nu-SVC (called nuSVM), MLP1 (a multilayer perceptron with one hidden layer containing at most 200 neurons), linear SVM (LinSVM), Random Forest (with 1000 trees), logistic regression (called reglog), and C-SVC (called SVC). In the end, the three best-performing algorithms (SVC, nuSVM, and LinSVM) were selected. The clinical CAD dataset used in our study included 303 records with 54 features.
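As a rough illustration of such a preliminary comparison, the sketch below scores the ten classifier families with stratified 10-fold cross-validation in scikit-learn. It is not the authors' code: the hyperparameters are library defaults unless the text states them, min-max scaling stands in for the unspecified normalization, and the data are a synthetic stand-in with the same shape (303 records, 54 features) rather than the Z-Alizadeh Sani dataset.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC, LinearSVC, NuSVC


def build_classifiers(n_trees=1000):
    """Return the ten classifier families named in the text.

    Settings not stated in the text are scikit-learn defaults (an assumption).
    """
    return {
        "gaussNB": GaussianNB(),
        "LDA": LinearDiscriminantAnalysis(),
        "KNN": KNeighborsClassifier(),
        "QDA": QuadraticDiscriminantAnalysis(),
        "nuSVM": NuSVC(),
        "MLP1": MLPClassifier(hidden_layer_sizes=(200,), max_iter=1000, random_state=0),
        "LinSVM": LinearSVC(max_iter=10000),
        "RandomForest": RandomForestClassifier(n_estimators=n_trees, random_state=0),
        "reglog": LogisticRegression(max_iter=1000),
        "SVC": SVC(),
    }


def compare(models, X, y):
    """Mean stratified 10-fold CV accuracy per model, with min-max scaling."""
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return {
        name: cross_val_score(make_pipeline(MinMaxScaler(), model), X, y, cv=cv).mean()
        for name, model in models.items()
    }


# Synthetic stand-in with the same shape as the dataset described (303 x 54).
X, y = make_classification(n_samples=303, n_features=54, n_informative=10,
                           n_redundant=0, random_state=0)
# Fewer trees than the 1000 quoted in the text, only to keep the demo quick.
scores = compare(build_classifiers(n_trees=100), X, y)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>12}: {acc:.3f}")
```

Placing the scaler inside the pipeline means it is refit on each training fold, so no information from the held-out fold leaks into the normalization.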

The first step in our experiments was pre-processing of both categorical and numerical attributes using a normalization approach. For feature selection, redundant features were eliminated using a GA and a PSO. We then optimized the classifier parameters in combination with feature selection and cross-validation. Our findings showed that the performance of the traditional algorithms was significantly enhanced by the proposed methodology (named New 2level Genetic optimizer: N2Genetic optimizer). The proposed methodology can potentially be applied for an early diagnosis of CAD as a powerful computer-aided diagnosis system.
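One common way to use PSO for feature selection is the binary variant, in which each particle is a 0/1 mask over the features and a sigmoid of the velocity gives the probability that a bit is switched on. The sketch below is a minimal version of that idea, scored by stratified 10-fold cross-validation accuracy. The swarm size, inertia and acceleration constants, the linear-kernel SVC, and the synthetic data are all assumptions chosen for illustration; the text does not say which PSO variant or settings the authors used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def cv_accuracy(mask, X, y):
    """Mean stratified 10-fold CV accuracy using only the masked features."""
    if not mask.any():  # an empty feature subset cannot be scored
        return 0.0
    model = make_pipeline(MinMaxScaler(), SVC(kernel="linear"))
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=cv).mean()


def binary_pso(X, y, n_particles=6, n_iter=4, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO over feature masks; returns (best_mask, best_score)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    # Positions are 0/1 floats; velocities are real-valued.
    pos = (rng.random((n_particles, n_features)) < 0.5).astype(float)
    vel = rng.uniform(-1.0, 1.0, size=(n_particles, n_features))
    pbest = pos.copy()
    pbest_score = np.array([cv_accuracy(p.astype(bool), X, y) for p in pos])
    g = int(pbest_score.argmax())
    gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        # Standard velocity update: inertia + pull to personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # Sigmoid transfer: velocity -> probability that a feature is selected.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random(vel.shape) < prob).astype(float)

        scores = np.array([cv_accuracy(p.astype(bool), X, y) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        g = int(pbest_score.argmax())
        if pbest_score[g] > gbest_score:
            gbest, gbest_score = pbest[g].copy(), pbest_score[g]

    return gbest.astype(bool), float(gbest_score)


# Synthetic stand-in data (not the clinical dataset).
X, y = make_classification(n_samples=303, n_features=20, n_informative=6,
                           n_redundant=0, random_state=0)
mask, score = binary_pso(X, y)
print(f"selected {int(mask.sum())} of {mask.size} features, CV accuracy {score:.3f}")
```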

Several reasons motivated us to apply data mining and machine learning techniques to CAD data. First, CAD is a leading cause of mortality and morbidity worldwide. Early diagnosis and effective management can prevent these adverse outcomes and improve quality of life. Second, the diagnostic accuracies of conventional methods are not satisfactory. We propose a new methodology for classifying CAD that we believe would surpass the current methods. Third, we aim to design a new DSS, a computer-aided diagnosis system for early detection of CAD with minimal clinical error in the diagnosis. Finally, we envisage that the proposed methodology can be useful in triage and can reduce specialists' input, ultimately leading to both cost and time savings during CAD diagnosis. This should appeal to physicians, patients and.

The three main contributions of our work are as follows: (1) investigating pre-processing and testing of well-known machine learning methods applied to CAD diagnosis; (2) studying the use of feature selection based on GA or PSO; and (3) designing a new genetic training technique (N2Genetic optimizer) based on the fusion of cross-validation with GA or PSO for parameter optimization and feature selection.

The rest of the work is organized as follows. The literature review is presented in Section 2, where we review some studies that used data mining and machine learning techniques for the detection of heart disease, especially CAD. Section 3 reports the proposed methodology. Experimental outcomes and discussion are presented in Section 4. The conclusion and future work are presented in Section 5.

Section snippets

An overview of related work

Remarkable progress has been made in the use of different machine learning algorithms applied to medical datasets for detecting different diseases, e.g., various types of cancer. The main clinical methods for assessment of heart disease include electrocardiogram (ECG) [19], echocardiogram [20], cardiac computerized tomography (CT) scan [21], Holter monitoring [22], cardiac magnetic resonance imaging (MRI) [23], blood tests [24], and cardiac catheterization [25]. In this section,

Methodology

Several studies on heart disease detection have been conducted using machine learning algorithms and different datasets [67], [68], [69], [70], [71], [72]. The current research investigates CAD detection using a novel approach. This section provides more details about the proposed methodology, which is described below:

  • 1. Pre-processing (data manipulation and normalization):

    • Categorical attributes -> change according to one hot encoding. Coding 1 with N, for
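The pre-processing step named above (one-hot encoding of categorical attributes plus normalization) could be sketched as follows with scikit-learn's `ColumnTransformer`. The toy records and column names are hypothetical, and min-max scaling to [0, 1] is an assumed choice of normalization; the snippet does not reproduce the exact coding scheme, which is cut off in the text.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Hypothetical toy records; column names are illustrative, not the dataset's.
records = pd.DataFrame({
    "age": [53, 67, 54, 66],
    "bmi": [29.4, 26.0, 27.5, 24.1],
    "sex": ["Male", "Female", "Male", "Female"],
    "chest_pain": ["typical", "none", "atypical", "typical"],
})

numeric_cols = ["age", "bmi"]
categorical_cols = ["sex", "chest_pain"]

preprocess = ColumnTransformer(
    transformers=[
        # Numerical attributes: rescale each column to the [0, 1] range.
        ("num", MinMaxScaler(), numeric_cols),
        # Categorical attributes: one binary column per category value.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

features = preprocess.fit_transform(records)
print(features.shape)  # 2 numeric + 2 (sex) + 3 (chest_pain) columns -> (4, 7)
```

In a cross-validated experiment the transformer would normally be fitted inside each training fold, for example by placing it at the front of a `Pipeline`, so that the scaling ranges are not learned from held-out records.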

Experimental results

In this section, feature selection is added to the previous methods. GA was used in parallel twice (parameter optimization + feature selection) with:

  • (1) Fitness function equal to accuracy; and

  • (2) Fitness function equal to F1-score.
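To make the two fitness definitions concrete, the sketch below runs a small generic GA in which each candidate carries a binary feature mask and a value of the nu parameter for a nu-SVC, and its fitness is the mean stratified 10-fold cross-validation score under either metric. This is an illustrative GA only, not the N2Genetic optimizer: the operators (size-2 tournament, uniform crossover, bit-flip and Gaussian mutation), the population size, the nu range, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import NuSVC

NU_MIN, NU_MAX = 0.05, 0.5  # assumed search range, feasible for balanced classes


def fitness(mask, nu, X, y, scoring):
    """Mean stratified 10-fold CV score ("accuracy" or "f1") for one candidate."""
    if not mask.any():  # an empty feature subset cannot be scored
        return 0.0
    model = make_pipeline(MinMaxScaler(), NuSVC(nu=nu))
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return cross_val_score(model, X[:, mask], y, cv=cv, scoring=scoring).mean()


def run_ga(X, y, scoring="accuracy", pop_size=8, generations=3, p_mut=0.1, seed=0):
    """Jointly search feature masks and nu; returns (best_score, best_mask, best_nu)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    masks = rng.random((pop_size, n_features)) < 0.5
    nus = rng.uniform(NU_MIN, NU_MAX, size=pop_size)
    best = (-1.0, None, None)

    for _ in range(generations):
        scores = np.array([fitness(m, v, X, y, scoring) for m, v in zip(masks, nus)])
        top = int(scores.argmax())
        if scores[top] > best[0]:
            best = (float(scores[top]), masks[top].copy(), float(nus[top]))

        # Tournament selection (size 2): the fitter of two random candidates wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where(scores[a] >= scores[b], a, b)
        partners = rng.permutation(parents)

        # Uniform crossover on the masks; arithmetic mean for the nu gene.
        take = rng.random((pop_size, n_features)) < 0.5
        child_masks = np.where(take, masks[parents], masks[partners])
        child_nus = 0.5 * (nus[parents] + nus[partners])

        # Mutation: random bit flips and a small Gaussian nudge on nu.
        child_masks = child_masks ^ (rng.random(child_masks.shape) < p_mut)
        child_nus = np.clip(child_nus + rng.normal(0.0, 0.05, pop_size), NU_MIN, NU_MAX)
        masks, nus = child_masks, child_nus

    return best


# Synthetic stand-in data (not the clinical dataset).
X, y = make_classification(n_samples=303, n_features=20, n_informative=6,
                           n_redundant=0, random_state=0)
for scoring in ("accuracy", "f1"):
    score, mask, nu = run_ga(X, y, scoring=scoring)
    print(f"{scoring}: score={score:.3f}, features={int(mask.sum())}, nu={nu:.3f}")
```

Running the search once per metric mirrors the two fitness settings listed above; the two runs may select different feature subsets, since accuracy and F1-score weight errors on the positive class differently.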

As stated earlier, ten algorithms (GaussNB, LDA, KNN, QDA, reglog, nuSVM, MLP, LinSVM, Random Forest, and SVC) were compared in the preliminary tests. However, only the best three classifiers (SVC, nuSVM, and LinSVM) were selected for the rest of our study.

Conclusion and future work

CAD is a common and deadly heart disease. We presented a novel approach for an early diagnosis of CAD, which may improve the clinical decision-making process. The principal contribution of this study is a machine-based detection system that predicts CAD with better performance than classical machine learning techniques. We first applied ten traditional machine learning algorithms to the Z-Alizadeh Sani heart disease dataset. In our experiments, we carried out data pre-processing

Declaration of Competing Interest

None.

References (88)

  • K.R. Nandalur et al.

    Diagnostic performance of stress cardiac magnetic resonance imaging in the detection of coronary artery disease: a meta-analysis

    J. Am. Coll. Cardiol.

    (2007)
  • R.M. Wyman et al.

    Current complications of diagnostic and therapeutic cardiac catheterization

    J. Am. Coll. Cardiol.

    (1988)
  • P. Pławiak

    Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system

    Expert Syst. Appl.

    (2018)
  • R. Alizadehsani et al.

    A data mining approach for diagnosis of coronary artery disease

    Comput. Methods Programs Biomed.

    (2013)
  • R. Alizadehsani et al.

    Coronary artery disease detection using computational intelligence methods

    Knowl. Based. Syst.

    (2016)
  • Z. Arabasadi et al.

    Computer aided decision making for heart disease detection using hybrid neural network-Genetic algorithm

    Comput. Methods Programs Biomed.

    (2017)
  • D. Han et al.

    Incremental role of resting myocardial computed tomography perfusion for predicting physiologically significant coronary artery disease: a machine learning approach

    J. Nuclear Cardiol.

    (2018)
  • L. Shi et al.

    A novel ensemble algorithm for biomedical classification based on ant colony optimization

    Appl. Soft. Comput.

    (2011)
  • R. Alizadehsani et al.

    Non-invasive detection of coronary artery disease in high-risk patients based on the stenosis prediction of separate coronary arteries

    Comput. Methods Programs Biomed.

    (2018)
  • T.T. Hoang et al.

    A novel differential particle swarm optimization for parameter selection of support vector machines for monitoring metal-oxide surge arrester conditions

    Swarm Evol. Comput.

    (2018)
  • O. Yildirim et al.

    An efficient compression of ECG signals using deep convolutional autoencoders

    Cogn. Syst. Res.

    (2018)
  • Ö Yildirim

    A novel wavelet sequence based on deep bidirectional LSTM network model for ECG signal classification

    Comput. Biol. Med.

    (2018)
  • U.R. Acharya et al.

    Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network

    Inf. Sci.

    (2017)
  • U.R. Acharya et al.

    Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals

    Inf. Sci.

    (2017)
  • P. Pławiak

    Novel genetic ensembles of classifiers applied to myocardium dysfunction recognition based on ECG signals

    Swarm Evol. Comput.

    (2018)
  • U.B. Baloglu et al.

    Classification of myocardial infarction with multi-lead ECG signals and deep CNN

    Pattern Recognit. Lett.

    (2019)
  • N. Michielli et al.

    Cascaded LSTM recurrent neural network for automated sleep stage classification using single-channel EEG signals

    Comput. Biol. Med.

    (2019)
  • S.L. Oh et al.

    Automated diagnosis of arrhythmia using combination of CNN and LSTM techniques with variable length heart beats

    Comput. Biol. Med.

    (2018)
  • U.R. Acharya et al.

    Automated detection of coronary artery disease using different durations of ECG segments with convolutional neural network

    Knowl. Based. Syst.

    (2017)
  • U. Raghavendra et al.

    Deep convolution neural network for accurate diagnosis of glaucoma using digital fundus images

    Inf. Sci.

    (2018)
  • W. Książek et al.

    Pławiak

    (2019)
  • P. Pławiak
    (2014)
  • T.Y. Wah et al.

    Automated diagnosis of coronary artery disease: a review and workflow

    Cardiol. Res. Pract.

    (2018)
  • Cardiovascular Diseases (CVDs), [accessed on 11/8/2018]...
  • Cardiovascular Disease, [accessed on 11/8/2018]...
  • T.B. Murdoch et al.

    The inevitable application of big data to health care

    JAMA

    (2013)
  • R.B. Parikh et al.

    Integrating predictive analytics into high-value care: the dawn of precision delivery

    JAMA

    (2016)
  • A.M. Darcy et al.

    Machine learning and the profession of medicine

    JAMA

    (2016)
  • M. Abdar et al.

    Impact of Patients’ gender on Parkinson's disease using classification algorithms

    J. AI Data Mining

    (2018)
  • M. Hassoon et al.

    Rule optimization of boosted C5.0 classification using genetic algorithm for liver disease prediction

  • M. Abdar et al.

    Improving the diagnosis of liver disease using multilayer perceptron neural network and boosted decision trees

    J. Med. Biol. Eng.

    (2017)
  • L. Ogiela et al.

    Cognitive computing in intelligent medical pattern recognition systems

    Intelligent Control and Automation

    (2006)
  • A.P. Davie et al.

    Value of the electrocardiogram in identifying heart failure due to left ventricular systolic dysfunction

    BMJ

    (1996)
  • D.J. Brenner et al.

    Computed tomography—an increasing source of radiation exposure

    N. Engl. J. Med.

    (2007)