Automated heart disease prediction model by hybrid heuristic-based feature optimization and enhanced clustering

https://doi.org/10.1016/j.bspc.2021.103260

Highlights

  • Exploits a new algorithm called Jaya Algorithm with Red Deer Algorithm (J-RDA).

  • Hybrid clustering is formed by integrating the optimized DBSCAN and optimized KMC.

  • Promotes a novel heart disease prediction approach using ECG signals and data attributes.

Abstract

The main intent of this paper is to implement a novel clustering model for heart disease prediction with numerical data and ECG signals using an optimal feature extraction approach. Rather than clustering the raw inputs directly, the Electrocardiogram (ECG) signals are first decomposed using Discrete Wavelet Transform (DWT), and their dimensionality is reduced through Principal Component Analysis (PCA). Both data types are then passed to the optimized feature extraction stage. Here, a hybrid meta-heuristic concept is adopted for the optimized feature extraction, based on Jaya Algorithm with Red Deer Algorithm (J-RDA). Once the feature optimization is done, the hybrid clustering is formed by integrating the optimized Density-based Spatial Clustering of Applications with Noise (DBSCAN) and optimized K-Means Clustering (KMC), in which the proposed J-RDA is used for tuning the significant parameters. Moreover, the objective model for feature optimization and optimized hybrid clustering in the proposed heart disease prediction is formulated as a multi-objective function. The results reveal that the proposed model achieves good performance in rectifying the problems in heart disease prediction for dual data types.
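As a rough illustration of the ECG preprocessing described above, the following sketch decomposes signals with a DWT and then reduces the coefficients with PCA. The Haar wavelet, the single decomposition level, the synthetic sinusoidal signals, and the helper names `haar_dwt` and `pca_reduce` are all assumptions made for illustration; the abstract does not specify the wavelet family, the number of levels, or the number of retained components.

```python
import numpy as np

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:  # pad odd-length signals by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail

def pca_reduce(features, n_components):
    """Project the rows of `features` onto the top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Synthetic stand-ins for ECG recordings (assumption: 8 signals, 64 samples each).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 64)
signals = np.array([
    np.sin(2 * np.pi * (3 + i) * t) + 0.1 * rng.standard_normal(t.size)
    for i in range(8)
])

# Concatenate approximation and detail coefficients, then reduce to 3 components.
coeffs = np.array([np.concatenate(haar_dwt(s)) for s in signals])
reduced = pca_reduce(coeffs, n_components=3)
```

Each row of `reduced` is a compact representation of one signal that a later feature selection or clustering stage could consume.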

Introduction

Research on computer-aided systems for medical applications plays an exciting and vital role [41]. Generally, doctors base their decisions on the symptoms of patients and confirmed diagnostic reports. The accuracy of a diagnosis therefore depends on the physician’s experience [5]. The diagnosis of heart disease is a complex medical task, which motivates researchers to focus on this field. Failures in diagnosis lead to death across all age groups [28]. Therefore, there is a strong need to develop prediction models for heart diseases or attacks through deep learning approaches. Accurate and precise diagnosis of heart disease primarily depends on previous information and knowledge from the relevant pathological events [16]. Medical prediction systems employ two types of input: data attributes and ECG signals [24]. However, manual ECG interpretation for prediction and diagnosis is time consuming and tedious, suffers from intra- and inter-observer inconsistency, and needs considerable expertise [28]. ECG is a signal recorded through electrodes placed on the human body, where the changes in voltage levels reflect the electrical activity of the heart [16]. Usually, the conventional diagnosis model uses ECG interpretation, which demands high expertise from doctors and more precise devices [21].

In real-world mapping or classification problems, the dataset size is very large, and hence the learning process may not work well unless the unnecessary features are eliminated [19]. Large feature sets also cause over-fitting problems, so the number of features has to be reduced [5]. As large databases are created and the requirements for better machine learning approaches grow, new issues arise and the demand for new feature selection techniques increases [14]. The significant task of any feature extraction technique is data reduction, to improve the system performance and the understanding of the data [12]. In any feature selection approach, however, evaluating the quality of the chosen features is a major and complex task [41]. Therefore, the size of the feature vector needs to be reduced by applying prominent algorithms that choose the optimal set of features [1]. As the amount of data increases progressively, it becomes more complex to analyze and process, and it becomes hard to maintain e-healthcare data in particular [42]. Hence, disease prediction remains significant to health care groups for maintaining better patient care [35].

Numerous data mining approaches are used in prediction models; among them, clustering is one of the popular and efficient approaches for grouping data items by their similarity [17]. The relationships and patterns in such data items are explored by applying different similarity and dissimilarity metrics [40]. Clustering algorithms such as partitioning and hierarchical algorithms, the EM algorithm, grid-based, distance-based, and density-based algorithms, Self-Organizing Maps, and k-means are applied to form a group C of representatives for the prediction process [39]. In recent years, diverse clustering methods have often been used in research to predict heart diseases [3], [30]. More precise heart disease prediction models have been offered by using diverse clustering schemes to attain better cluster performance. These strategies have received noteworthy consideration for predicting heart diseases and their progression [2].

The major contributions of this heart disease prediction model are as follows.

  • To design an automated heart disease prediction model by applying both data and ECG signals through an optimal feature selection and meta-heuristic-based hybrid clustering approach for overcoming the challenges in the conventional heart disease prediction models with superior performance.

  • To propose an optimal feature selection process that selects appropriate features to retain helpful information while minimizing the redundant data from both data and ECG signals. The optimal feature selection supports the clustering process by reducing the overfitting problem and improving the accuracy rate.

  • To verify the prediction process by developing a hybrid clustering model through integrating the optimized KMC and optimized DBSCAN to attain accurate results with better prediction accuracy.

  • To propose a new hybrid algorithm termed J-RDA for selecting optimal features from both data and ECG signals and for developing a hybrid clustering method. It is done by optimizing the significant parameter of DBSCAN and optimizing the centroid of KMC.

  • To analyze the performance of the proposed heart disease prediction model by comparing it with conventional clustering models and meta-heuristic approaches in terms of diverse performance measures to ensure precision.
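To give a feel for the Jaya half of the J-RDA hybrid mentioned in the contributions, here is a minimal sketch of the standard Jaya update, which moves each candidate toward the current best solution and away from the current worst. This is plain Jaya only: it does not include the Red Deer Algorithm or the authors' hybridization, and the sphere objective, the bounds, and the function name `jaya_minimize` are assumptions chosen for illustration rather than the paper's multi-objective function.

```python
import random

def jaya_minimize(objective, dim, bounds, pop_size=20, iterations=100, seed=0):
    """Plain Jaya search for a minimum of `objective` inside box `bounds`."""
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]
    for _ in range(iterations):
        # Best and worst candidates are fixed for the whole iteration.
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        for i, x in enumerate(pop):
            candidate = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Jaya update: move toward the best, away from the worst.
                value = (x[j]
                         + r1 * (best[j] - abs(x[j]))
                         - r2 * (worst[j] - abs(x[j])))
                candidate.append(min(hi, max(lo, value)))  # clip to bounds
            cand_score = objective(candidate)
            if cand_score < scores[i]:  # greedy acceptance
                pop[i], scores[i] = candidate, cand_score
    best_index = scores.index(min(scores))
    return pop[best_index], scores[best_index]

def sphere(x):
    """Toy objective (assumption): sum of squares, minimized at the origin."""
    return sum(v * v for v in x)

solution, score = jaya_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

In a feature selection or clustering setting, the candidate vector would instead encode a feature mask or the clustering parameters, and `objective` would be replaced by the corresponding fitness function.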

The remainder of this paper is organized as follows. Section II discusses the literature survey. Section III explains the proposed architectural representation of heart disease prediction with optimal feature selection and clustering. Section IV specifies the optimal feature selection for heart disease prediction by J-RDA. Section V describes the improved meta-heuristic-based hybrid clustering for heart disease prediction with data and signal. Section VI discusses the results. Section VII concludes this paper.

Section snippets

Related works

Irene et al. [22] proposed a hybrid model integrating Fuzzy K-Models Clustering-based Attribute Weighting (FKMAW) with Deep belief Network and Extreme Learning Machine (DBNKELM). It was developed for improving the medical diagnosis procedure. Initially, weights were added to the input attributes through the FKMAW approach. Correspondingly, the classification performance was enhanced through the weighting technique. These weighted attributes were employed with regression method for

Architectural view

Recent studies reveal the following challenges in heart disease prediction by diverse computational systems. These systems suffer from a restricted capability for accurate categorization, the need to select the kernel and additional parameters, low prediction accuracy and execution speed due to the minimal set of attributes, and false classifications. Moreover, poor medical decisions lead to a higher mortality rate when a huge amount of data is available. The time for constructing the

Optimal feature selection

The major contribution of the proposed heart disease prediction model is the optimal feature selection, which extracts the most useful information by reducing the number of input variables. Both the input data features $Fs_{ha}^{data}$ and the ECG signal features $Fs_{he}^{ECG}$ are given to the optimal feature selection process, and they are jointly termed $Fs_{h}^{feature} = \{Fs_{ha}^{data}, Fs_{he}^{ECG}\}$. The optimal feature selection is used for reducing the total length of features to select the significant features for

Optimized KMC

KMC [10] is a well-known clustering approach for prediction models. The proposed heart disease prediction model uses hybrid clustering through optimized KMC and optimized DBSCAN for the prediction of heart diseases by considering both data and ECG signals. This model uses the developed J-RDA to optimize the centroids of the KMC technique. KMC also finds groups that have not been explicitly labeled in the data. This procedure divides a group of data into a specific number of groups termed k
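The idea of letting a meta-heuristic tune the KMC centroids can be sketched as follows: a flat candidate vector is decoded into k centroids, and a fitness value scores how well those centroids partition the data. The within-cluster sum of squared distances used here is an assumed, commonly used fitness; the paper's own multi-objective function is not given in this excerpt. The helper names and the toy points are likewise hypothetical.

```python
def decode(vector, k, dim):
    """Split a flat candidate vector into k centroids of length dim."""
    return [vector[i * dim:(i + 1) * dim] for i in range(k)]

def squared_distance(p, c):
    return sum((a - b) ** 2 for a, b in zip(p, c))

def assign_clusters(points, centroids):
    """Label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        dists = [squared_distance(p, c) for c in centroids]
        labels.append(dists.index(min(dists)))
    return labels

def wcss(points, centroids):
    """Within-cluster sum of squares: lower means tighter clusters."""
    labels = assign_clusters(points, centroids)
    return sum(squared_distance(p, centroids[l]) for p, l in zip(points, labels))

# Toy data (assumption): two well-separated groups in 2-D.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

# Two candidate solutions an optimizer might propose.
good = decode([0.0, 0.5, 10.0, 10.5], k=2, dim=2)
poor = decode([0.0, 0.5, 1.0, 0.5], k=2, dim=2)

good_fitness = wcss(points, good)
poor_fitness = wcss(points, poor)
```

An optimizer would keep the candidate with the lower fitness, so `good` would be preferred over `poor` here.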

Experimental setup

The proposed heart disease prediction using data and signal was implemented in Python, and the experimental analysis was performed. The model was developed with the number of iterations set to 10. The proposed model was analyzed by comparing it with the traditional models in terms of Type I and Type II measures. Here, Type I measures were “positive measures like Accuracy, Sensitivity, Specificity, Precision, Negative Predictive Value (NPV),
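For reference, the positive measures named above can be computed from the four confusion-matrix counts using their standard definitions. The sketch below covers only the five measures listed in this excerpt; the function name and the example counts are made up for illustration and are not results from the paper.

```python
def positive_measures(tp, tn, fp, fn):
    """Standard definitions of the positive measures from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts, for illustration only.
measures = positive_measures(tp=40, tn=45, fp=5, fn=10)
```

For a clustering model, these counts are obtained after mapping each cluster to a class label, a step this sketch assumes has already been done.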

Discussion

The proposed J-RDA-DB-KMC is capable of solving constrained and unconstrained optimization problems. It converges quickly, particularly in large-scale global optimization. Because of these advantages, this method performs better than the other existing methods. In terms of accuracy, all the algorithms achieve consistently high values on all the datasets, but the proposed AAP-SFO-AR-NMF attains better values than the other algorithms. On the other hand,

Conclusion

This paper has developed an automated heart disease prediction model through hybrid heuristic-based feature optimization and enhanced clustering by considering both data and ECG signals. The optimal features were extracted using the newly proposed J-RDA from both the data and the dimensionality-reduced ECG signals, in which the ECG signals were decomposed using DWT and their dimensionality was reduced by PCA. The optimal features were subjected to the hybrid prediction model using the integrated clustering of optimized

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (42)

  • K.-M. Osei-Bryson et al. A hybrid clustering algorithm. Comput. Oper. Res. (2007)
  • M.H. Vafaie et al. Heart diseases prediction based on ECG signals’ classification using a genetic-fuzzy system and dynamical model of ECG signals. Biomed. Signal Process. Control (2014)
  • AL-Raddadi RM. Clustering of cardiovascular diseases risk factors, and cardiovascular risk prediction, primary health care centers. J. Saudi Heart Associat. (2013)
  • K. Balaji et al. Machine learning algorithm for clustering of heart disease and chemoinformatics datasets. Comput. Chem. Eng. (2020)
  • M.M. Beno et al. Threshold prediction for segmenting tumour from brain MRI scans. Int. J. Imaging Syst. Technol. (2014)
  • C. Beulah et al. Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Inf. Med. Unlocked (2019)
  • M.R. Bonyadi et al. Analysis of Stability, Local Convergence, and Transformation Sensitivity of a Variant of the Particle Swarm Optimization Algorithm. IEEE Trans. Evol. Comput. (2016)
  • R.P. Cherian et al. Weight optimized neural network for heart disease prediction using hybrid lion plus particle swarm algorithm. J. Biomed. Inform. (2020)
  • A. Dutta. An efficient convolutional neural network for coronary heart disease prediction. Expert Syst. Appl. (2020)
  • M.A. Elhossein et al. On the performance improvement of elephant herding optimization algorithm. Knowl.-Based Syst. (2019)
  • Eltrass AS, Tayel MB, Ammar AI. 2021. A new automated CNN deep learning approach for identification of ECG congestive...