Classification of electrocardiogram signals with support vector machines and genetic algorithms using power spectral features

https://doi.org/10.1016/j.bspc.2010.07.006

Abstract

This paper proposes a new power spectral-based hybrid genetic algorithm-support vector machines (SVMGA) technique to classify five types of electrocardiogram (ECG) beats, namely normal beats and four manifestations of heart arrhythmia. The method employs three modules: a feature extraction module, a classification module and an optimization module. The feature extraction module extracts spectral features and three timing interval features from the electrocardiogram. Non-parametric power spectral density (PSD) estimation methods are used to extract the spectral features. A support vector machine (SVM) is employed as the classifier to recognize the ECG beats. We investigate and compare two approaches to setting the classifier parameters. In the first, the parameters are specified experimentally by trial and error. In the second, the relevant parameters are optimized through an intelligent algorithm. These parameters are the Gaussian radial basis function (GRBF) kernel parameter σ and the penalty parameter C of the SVM classifier. The performance of both approaches in classifying ECG signals is then evaluated on eight files obtained from the MIT–BIH arrhythmia database. The classification accuracy of the SVMGA approach proves superior to that of the SVM with constant, manually selected parameters.

Introduction

An arrhythmia is any abnormal cardiac rhythm [1]. Heart arrhythmias result from any disturbance in the rate, regularity, site of origin or conduction of the cardiac electric impulse [2]. Classification of arrhythmia is an important step in developing devices for monitoring the health of individuals. The sequence of electrical signals of the heart provides symptomatic information for classifying cardiac arrhythmias. Classification of normal and abnormal beats requires offline analysis of the ECG record data. This paper investigates the detection and classification of ECG arrhythmias.

In the literature, several methods have been proposed for the automatic classification of ECG signals. Among the most recently published works are those presented in [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20].

These works are a clear indication of research maturation in the field of automatic ECG classification. However, there are still some open issues in the design of an ECG classification system whose resolution may lead to more robust and efficient classifiers. One of these issues is the choice of the classification approach. In particular, the SVM approach does not seem to have received the attention it deserves in the ECG classification literature despite its great potential. Indeed, the SVM classifier exhibits a promising generalization capability, thanks to the maximal margin principle (MMP) it is based upon [21]. Another important property is that it is less sensitive to the curse of dimensionality than traditional classification approaches. This is because the MMP makes it unnecessary to estimate explicitly the statistical distributions of the classes in the hyper-dimensional feature space in order to carry out the classification task. Thanks to these properties, the SVM classifier has proved successful in a number of different application fields. Turning back to ECG classification, another issue that needs to be addressed is that the best free parameters of the adopted classifier are generally selected empirically (the model selection issue).

In this paper, we propose an automated method for differentiating normal heartbeats (N) from left bundle branch block (LBBB or L), right bundle branch block (RBBB or R), atrial premature contraction (APC or A) and premature ventricular contraction (PVC or V) heartbeats [1]. Spectral feature extraction is used in combination with temporal features. As mentioned, the SVM classifier is used because of its popularity in various classification problems in recent years. One of the strengths of this study is that it uses the search capability of genetic algorithms to find optimum values of the SVM parameters (model selection). Two values must be optimized: the soft-margin penalty parameter C of the support vector machine, which is a positive integer, and the Gaussian radial basis function (GRBF) kernel parameter σ, which is a positive real number. In our proposed power spectral-based hybrid genetic algorithm-support vector machine (SVMGA) method, the values of the C and σ parameters of the SVM classifier are specified by genetic algorithms.
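The search over C and σ described above can be sketched as a small genetic algorithm. This is a minimal illustration only, not the authors' implementation: the search ranges, the operators (tournament selection, blend crossover, multiplicative mutation, elitism) and the function `toy_fitness` are all assumptions introduced here. In an actual run the fitness would be the classification accuracy of an SVM trained with the candidate (C, σ).

```python
import math
import random

# Search ranges are illustrative assumptions, not values from the paper.
C_RANGE = (1, 20000)        # C is a positive integer
SIGMA_RANGE = (0.01, 10.0)  # sigma is a positive real number


def toy_fitness(c, sigma):
    """Hypothetical stand-in for SVM validation accuracy.

    Peaks at C = 1000, sigma = 1.0 on a log scale. A real run would
    train an SVM with (c, sigma) and return its accuracy instead.
    """
    return math.exp(-(math.log10(c) - 3.0) ** 2 - math.log10(sigma) ** 2)


def random_individual(rng):
    """A chromosome is the pair (C, sigma)."""
    return (rng.randint(*C_RANGE), rng.uniform(*SIGMA_RANGE))


def clip(value, low, high):
    return max(low, min(high, value))


def crossover(a, b, rng):
    """Blend the two parents gene by gene with one random weight."""
    w = rng.random()
    c = round(w * a[0] + (1 - w) * b[0])
    sigma = w * a[1] + (1 - w) * b[1]
    return (clip(c, *C_RANGE), clip(sigma, *SIGMA_RANGE))


def mutate(ind, rng, rate=0.2):
    """Rescale each gene by a random factor with probability `rate`."""
    c, sigma = ind
    if rng.random() < rate:
        c = clip(round(c * rng.uniform(0.5, 2.0)), *C_RANGE)
    if rng.random() < rate:
        sigma = clip(sigma * rng.uniform(0.5, 2.0), *SIGMA_RANGE)
    return (c, sigma)


def tournament(pop, fitness, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=lambda ind: fitness(*ind))


def run_ga(fitness, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        best = max(pop, key=lambda ind: fitness(*ind))
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            child = crossover(tournament(pop, fitness, rng),
                              tournament(pop, fitness, rng), rng)
            children.append(mutate(child, rng))
        pop = children
    return max(pop, key=lambda ind: fitness(*ind))


best_c, best_sigma = run_ga(toy_fitness)
print(best_c, best_sigma, toy_fitness(best_c, best_sigma))
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next. The integer gene is kept integral by rounding after crossover and mutation, which matches the statement that C is a positive integer while σ is real-valued.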

The paper is organized as follows. Section 2 describes non-parametric power spectral density (PSD) estimation and the feature extraction module. Sections 3 and 4 explain support vector machines (SVM) and genetic algorithms (GA), respectively. Section 5 presents our proposed SVMGA method. Section 6 describes the database and performance metrics. Section 7 shows some simulation results. Section 8 discusses the results and, finally, Section 9 concludes the paper.

Section snippets

Feature extraction

Power spectrum estimation is perhaps the most widely used method of signal analysis. The power spectrum is related to the correlation function through the Fourier transform. The power spectrum reveals the repetitive and correlated patterns of a signal, which are important in detection, estimation, data forecasting and decision-making systems. The goal of spectral estimation is to describe the distribution (over frequency) of the power contained in a signal, based on a finite set of data. The

Support vector machine (SVM)

SVM is a supervised machine learning method. It uses the structural risk minimization (SRM) principle, whereas artificial neural networks (ANN) use empirical risk minimization (ERM) to minimize the training data error [25], [26].

SVM performs classification tasks by constructing an optimal separating hyper-plane (OSH). The OSH maximizes the margin between the two nearest data points belonging to two separate classes (Fig. 1).
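The margin idea can be made concrete with a short sketch. It assumes a hyper-plane written as w·x + b = 0 and compares two candidate hyper-planes on a toy two-class data set; the points, labels and function names are invented for illustration and do not come from the paper.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyper-plane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-signed distance over the data set.

    The value is positive only if every point lies on the side of the
    hyper-plane that matches its label.
    """
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


# Toy data: two points per class, labels in {-1, +1}
points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [+1, +1, -1, -1]

# Two hyper-planes that both separate the classes without error
margin_diagonal = margin((1.0, 1.0), -2.0, points, labels)
margin_vertical = margin((1.0, 0.0), -1.0, points, labels)
print(margin_diagonal, margin_vertical)
```

Both hyper-planes classify the toy data correctly, but the diagonal one leaves a larger gap to the nearest points of each class, so it is the one a maximal-margin criterion would prefer.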

Suppose the training set (x_i, y_i), i = 1, 2, …, l, with x_i ∈ R^d and y_i ∈ {−1, +1}, can be separated by the hyper-plane w^T x +

Genetic algorithms

Fig. 2 illustrates the operation of a general genetic algorithm. In GA, a candidate solution for a specific problem is called an individual or a chromosome and consists of a linear list of genes. Each individual represents a point in the search space, and hence a possible solution to the problem. A population consists of a finite number of individuals. Each individual is assessed by an evaluation mechanism to obtain its fitness value. Based on this fitness value and undergoing genetic operators,

Proposed method

The free parameters C and σ greatly affect the classification accuracy of the SVM model. However, it is not known beforehand which values of these parameters are appropriate. Therefore, GA is used to search for better combinations of the SVM parameters. Based on the Darwinian principle of ‘survival of the fittest’, GA can obtain the optimal solution after a series of iterative computations. Fig. 3 presents the whole process of the ECG beat classification method implemented in the paper. The obtaining

MIT–BIH arrhythmia database

The MIT–BIH arrhythmia database [30] was used as the data source in this study. The database contains 48 recordings. Each has a duration of 30 min and includes two leads; the modified limb lead II and one of the modified leads V1, V2, V4 or V5. The sampling frequency is 360 Hz, the data are bandpass filtered at 0.1–100 Hz and the resolution is 200 samples per mV. Twenty-three of the recordings are intended to serve as a representative sample of routine clinical recordings and 25 recordings contain

Results

We randomly selected 100 beats from each class and used these 500 beats to train the classifiers. The total number of beats in our database was 18,290. The training set therefore comprises less than 3% of all beats, so the evaluation tests how well the classifier generalizes. We conducted six experiments to evaluate our algorithm. In the first experiment we try to find the best feature extraction method among the four non-parametric PSD estimation methods introduced in Section 2.1. Since

Discussion

As seen in Table 2, the multitaper PSD estimation method achieved the best classification accuracy, 93.97%, among the four non-parametric PSD estimation methods (shown in bold). The next best method is the modified periodogram, followed by the periodogram and Welch methods.
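As a rough illustration of how two of these estimators differ, the sketch below computes a periodogram and a modified periodogram with a direct DFT, using only the Python standard library. The Hamming window, the segment length and the synthetic 10 Hz tone are assumptions chosen for the example; only the 360 Hz sampling frequency is taken from the database description. The Welch and multitaper methods are not covered here.

```python
import cmath
import math


def periodogram(x, fs, window=None):
    """One-sided PSD estimate of a real signal via a direct DFT.

    window=None gives the plain periodogram (rectangular window);
    passing window weights gives the modified periodogram.
    """
    n = len(x)
    w = window if window is not None else [1.0] * n
    scale = fs * sum(wi * wi for wi in w)  # normalizes for window power
    xw = [xi * wi for xi, wi in zip(x, w)]
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(xw))
        p = abs(coeff) ** 2 / scale
        # Double all bins except DC and Nyquist to fold in negative frequencies
        if 0 < k < n / 2:
            p *= 2.0
        freqs.append(k * fs / n)
        psd.append(p)
    return freqs, psd


def hamming(n):
    """Hamming window weights (an assumed choice of window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))
            for i in range(n)]


fs = 360.0  # MIT-BIH sampling frequency in Hz
n = 180     # half a second of samples (illustrative length)
# Synthetic 10 Hz tone standing in for an ECG segment
x = [math.sin(2 * math.pi * 10.0 * i / fs) for i in range(n)]

f_plain, p_plain = periodogram(x, fs)
f_mod, p_mod = periodogram(x, fs, window=hamming(n))
peak_plain = f_plain[p_plain.index(max(p_plain))]
peak_mod = f_mod[p_mod.index(max(p_mod))]
print(peak_plain, peak_mod)
```

Both estimates place their peak at the tone frequency. The windowed version spreads the peak over neighbouring bins in exchange for lower leakage at distant frequencies, which is the usual trade-off when choosing between the two.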

The values of C and σ used in Table 2 were determined experimentally. Repeated runs of the program with many parameter settings showed, by trial and error, that C = 10,000 and σ = 0.1 gave the better outcome.
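To give a feel for what σ = 0.1 implies, the sketch below evaluates a Gaussian RBF kernel at a few distances. It assumes the common parametrization k(x, z) = exp(−‖x − z‖² / (2σ²)); the exact form used in the paper is not shown in this excerpt, and the sample points are invented for illustration.

```python
import math


def grbf_kernel(x, z, sigma):
    """Gaussian RBF kernel, assuming exp(-||x - z||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


origin = (0.0, 0.0)
kernel_values = {}
for dist in (0.05, 0.1, 0.3):
    # Similarity between the origin and a point `dist` away from it
    kernel_values[dist] = grbf_kernel(origin, (dist, 0.0), sigma=0.1)
    print(dist, kernel_values[dist])
```

Under this parametrization the kernel equals exp(−0.5) ≈ 0.61 when two feature vectors are exactly σ apart and falls off quickly beyond that, so a small σ makes each training beat influence only a narrow neighbourhood. How narrow depends on how the features are scaled, which this excerpt does not specify.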

One of superiorities

Conclusion

In this study, an SVMGA approach is proposed for automatic ECG beat classification. The SVMGA approach simultaneously optimizes the GRBF kernel function parameter σ and the C parameter of the SVM classifier.

In the first experiment, the periodogram, modified periodogram, Welch and multitaper non-parametric PSD estimation methods are compared for feature extraction. As a result, the multitaper method (MTM) was selected to obtain a compact set of spectral features. Three timing features are extracted

References (31)

  • R.R. Sarvestani et al., VT and VF classification using trajectory analysis, Nonlinear Anal. (2009)
  • S. Osowski et al., ECG beat recognition using fuzzy hybrid neural network, IEEE Trans. Biomed. Eng. (2001)
  • E.D. Ubeyli, Recurrent neural networks employing Lyapunov exponents for analysis of ECG signals, Expert Syst. Appl. (2010)
  • P. Chazal et al., Automatic classification of heartbeats using ECG morphology and heartbeat interval features, IEEE Trans. Biomed. Eng. (2004)
  • M. Lagerholm, Clustering ECG complexes using Hermite functions and self-organizing maps, IEEE Trans. Biomed. Eng. (2000)