Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability

https://doi.org/10.1016/j.cmpb.2011.12.015

Abstract

Feature selection plays an important role in pattern recognition systems. In this study, we explored the problem of selecting effective heart rate variability (HRV) features for recognizing congestive heart failure (CHF) based on mutual information (MI). The MI-based greedy feature selection approach proposed by Battiti was adopted in the study. The mutual information conditioned by the first-selected feature was used as a criterion for feature selection. A uniform distribution assumption was used to reduce the computational load, and a logarithmic exponent weighting was added to model the relative importance of the MI with respect to the number of already-selected features. The CHF recognition system contained a feature extractor that generated four categories of features, 50 in total, from the input HRV sequences. The proposed feature selector, termed UCMIFS, then selected the most effective features for the succeeding support vector machine (SVM) classifier. Prior to feature selection, the 50 features produced a high accuracy of 96.38%, confirming the representativeness of the original feature set. The performance of the UCMIFS selector was demonstrated to be superior to that of the other MI-based feature selectors, including MIFS-U, CMIFS, and mRMR. When compared to other outstanding selectors published in the literature, the proposed UCMIFS outperformed them, achieving an accuracy as high as 97.59% in recognizing CHF using only 15 features. The results demonstrate the advantage of using the recruited features in characterizing HRV sequences for CHF recognition. The UCMIFS selector further improves the efficiency of the recognition system, with substantially lowered feature dimensions and an elevated recognition rate.

Highlights

► We propose a feature selector, UCMIFS, for congestive heart failure (CHF) recognition.
► UCMIFS is based on the mutual information conditioned by the first-selected feature.
► The performance of UCMIFS is superior to that of other MI-based feature selectors.
► UCMIFS selects only 15 features to achieve a high recognition rate of 97.59%.

Introduction

Heart rate variability (HRV) is a widely used tool for studying the role of cardiovascular diseases and their influences. Recently, numerous studies have focused on using HRV measurements for diagnostic purposes, especially in recognizing congestive heart failure (CHF) from normal sinus rhythm (NSR) [1], [2], [3]. CHF is a harbinger of cardiac morbidity: a dysfunction of the cardiovascular system in which the heart is unable to pump blood adequately. CHF is usually accompanied by chest tightness, abdominal swelling, and difficulty breathing. However, patients usually do not suffer pain in daily life, so the symptoms may be ignored.

In recent years, numerous methods have been developed to recognize CHF based on HRV [4], [5], [6]. In these studies, different categories of features calculated from long-term HRV sequences were recruited in an attempt to improve the performance of the classifier. This process resulted in increased feature dimensions and an elevated computational load. Therefore, it becomes important to select the most representative features from the original feature set such that the recognition rate is retained with considerably reduced feature dimensions.

In practice, the optimal subset of features is usually unknown, and it is common to have irrelevant or redundant features at the beginning of a pattern classification task. To tackle this problem, two main dimension-reduction approaches, namely feature extraction and feature selection, are usually applied [4]. Feature extraction creates new features based on transformations or weighted combinations of the original feature set. In contrast, feature selection refers to methods that select the best subset of features from the original feature set.

Feature selection can be further categorized into filters and wrappers [5]. A filter involves a predefined performance measure that is independent of the subsequent classifier. Alternatively, a wrapper requires a specific learning machine and uses its classification accuracy as a performance measure to search for an optimal feature subset. Although wrappers usually produce better accuracy than filters, they are criticized for being computationally expensive and prone to over-fitting to specific classifiers. Consequently, filters are usually preferred to wrappers.

A number of measures, such as distance [6], [7], correlation [8], and mutual information (MI) [9], have been applied in filters for evaluating the efficacy of a feature. Techniques such as linear discriminant analysis between features and classes [10], the fast correlation-based filter using an approximate Markov blanket method for feature relevance calculation [11], and filters using entropy and other information-theoretic concepts for feature selection [11] are some examples that have successfully applied feature selection in clinical practice. Among them, mutual information (MI) has been reported to be effective in selecting features for a broad category of pattern classification problems [9], [11]. The main advantages of using MI as a criterion for feature selection are twofold. First, MI is capable of measuring relationships among attributes, and between attributes and classes, that may not be easily characterized by other measures. Second, MI is invariant under space transformations. These advantages distinguish MI from other measures. In this study, we tackle the problems of how to improve the approximation of MI in a high-dimensional feature space and how to effectively use MIs as criteria for selecting the most representative features for CHF recognition.

Battiti was one of the major pioneers in applying a greedy algorithm based on MI to select relevant features from the original feature set [9]. The greedy algorithm sequentially selects optimal features from the remaining feature set. The criterion for selecting the next feature is to maximize the conditional mutual information between the candidate feature and the class attribute. This process becomes complicated and computationally expensive as the number of features increases. To cope with these problems, Battiti's algorithm, termed mutual information feature selection (MIFS), approximates the conditional MI with the summation of pairwise MIs between the candidate feature and each of the features inside the already-selected feature subset. However, a great deal of information is lost with this approximation.
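As an illustration of this approach, MIFS scores each candidate f by I(C; f) − β·Σ I(f; s) over the already-selected features s, trading relevance to the class against redundancy with the selected set. The sketch below implements this criterion on discretized features using empirical histogram estimates; the variable names, data, and the choice β = 0.5 are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), from empirical frequencies."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def mifs(features, labels, k, beta=0.5):
    """Battiti's MIFS: greedily select k features, each maximizing
    I(C; f) - beta * sum of I(f; s) over already-selected features s."""
    remaining = dict(features)  # name -> discretized value sequence
    selected = []
    while len(selected) < k and remaining:
        best = max(
            remaining,
            key=lambda f: mutual_info(remaining[f], labels)
            - beta * sum(mutual_info(remaining[f], features[s]) for s in selected),
        )
        selected.append(best)
        del remaining[best]
    return selected
```

On a toy set where two features perfectly predict the labels and a third is irrelevant, the selector picks the informative features first; the β term penalizes candidates that merely duplicate what the selected subset already carries.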

In view of MIFS's potential in feature selection, several attempts have been made to improve its performance. Kwak and Choi [12] assumed uniform distributions in the information of input features and proposed the MIFS-U algorithm, which amends the neglect of the joint probability term in MIFS. Cheng et al. [13] proposed a conditional mutual information feature selector (CMIFS), which conditions the calculation of mutual information on the first selected feature; the last feature selected just before the candidate feature is also taken into consideration. In this manner, the conditional MI required in MIFS is more reasonably approximated. Other techniques, including min-redundancy max-relevance (mRMR) [14] and normalized mutual information feature selection (NMIFS) [15], were also proposed to improve the performance of MIFS.

Inspired by MIFS-U and CMIFS, we propose to modify CMIFS and apply the uniform distribution approximation exploited in MIFS-U to simplify the calculation of the conditional MI. The result is a modified conditional mutual information feature selector with a uniform distribution assumption (UCMIFS). First, considering the significance of the first-selected feature f1 in the greedy algorithm, we employ the mutual information conditioned by f1 in the approximation; unlike CMIFS, however, all the features in the already-selected feature subset, instead of only the last one, are considered. Second, the uniform distribution assumption is recruited to simplify the calculation. Finally, a weighting parameter, represented as a logarithmic function of the size of the already-selected feature subset, is added to model the relative importance of the individual terms in the calculation of the MIs. The original feature set contains typical features, including personal data and features calculated from the time statistics, Poincaré plots, and frequency-domain distribution, as well as features calculated from the third-cumulant spectra (bispectra) [16] of the RR interval (RRI) sequences. In this study, the efficiency of the proposed UCMIFS algorithm in selecting features for CHF recognition is demonstrated and compared to that of other MI-based feature selectors. The performance of the proposed system is also compared to that of other outstanding CHF classifiers published in the literature.

Section 2 reviews the background knowledge of using MI for feature selection and discusses some of the popular and effective MI-based feature selectors. Section 3 proposes the modified conditional mutual information feature selector with uniform distribution assumption (UCMIFS). Section 4 establishes the original feature set applied in this study. Section 5 demonstrates the experimental results with some critical discussions. Finally, some conclusions are drawn in Section 6.

Section snippets

Entropy and mutual information related to feature selection

Shannon's information theory provides an approach to quantifying the information of random variables via entropy and mutual information (MI). In this section, we summarize the theoretical background required to calculate the MIs between features, and between features and classes, as quantitative measures for feature selection. Please refer to [9], [14] for details.

Assuming p(fi) represents the probability density function (pdf) of fi, the entropy of a feature fi is defined as

H(fi) = −∑ p(fi) log p(fi)

where the summation runs over the possible values of fi.
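These quantities are straightforward to estimate for discretized features. The following sketch computes empirical entropy and mutual information from value frequencies (a minimal illustration; feature discretization and the uniform-distribution approximations used later in the paper are not shown):

```python
import math
from collections import Counter

def entropy(xs):
    """H(X) = -sum p(x) log2 p(x), estimated from empirical frequencies."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

coin = [0, 1, 0, 1]
print(entropy(coin))                    # 1.0: a fair coin carries one bit
print(mutual_info(coin, coin))          # 1.0: identical variables share all of it
print(mutual_info(coin, [0, 0, 1, 1]))  # 0.0: these samples are empirically independent
```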

UCMIFS—the proposed modified conditional mutual information feature selector with uniform distribution assumption

The idea, proposed in the CMIFS algorithm, of using the first and last features selected into S for calculating the MI is inspirational. The first feature f1 is definitely the most significant feature in the selected feature set. Compared to MIFS, which considers only the pairwise relationship between the candidate feature and each of the features in the already-selected feature set S, recruiting f1 in the approximation of I(C; fi|S) undoubtedly enhances the precision of the estimation.
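For small discrete examples, the conditional MI I(C; fi|f1) that motivates this construction can be computed exactly via the chain rule, I(C; fi|f1) = I(C; fi, f1) − I(C; f1). The sketch below is illustrative only and is not the paper's UCMIFS estimator (which additionally applies the uniform distribution assumption and the logarithmic weighting):

```python
import math
from collections import Counter

def entropy(xs):
    """Empirical Shannon entropy in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_info(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

def cond_mutual_info(cs, fs, f1s):
    """I(C; f | f1) via the chain rule: I(C; (f, f1)) - I(C; f1)."""
    return mutual_info(cs, list(zip(fs, f1s))) - mutual_info(cs, f1s)

# XOR example: neither f nor f1 alone tells us anything about C,
# but given f1, f determines C completely.
C  = [0, 1, 1, 0]
f  = [0, 0, 1, 1]
f1 = [0, 1, 0, 1]
print(cond_mutual_info(C, f, f1))  # 1.0
```

The XOR case shows why conditioning matters: a pairwise criterion like MIFS would assign f zero relevance here, while the conditional MI correctly credits it with one full bit.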

Features exploited in this study

Two categories of features associated with heart rate variability (HRV) were used in this study to test the feature-selection power of the proposed greedy algorithm in discriminating congestive heart failure (CHF) from normal sinus rhythm (NSR). The first category contained typical features, including personal data and features calculated from the time statistics, Poincaré plot, and frequency-domain distribution of the RR interval (RRI) sequences. The second category

Experimental design

Data for this research were selected from the congestive heart failure (CHF) and normal sinus rhythm (NSR) databases, both available on PhysioNet [25]. Recordings from 29 CHF and 54 NSR subjects were selected from the CHF and NSR databases, respectively, for analysis. The data sampling rate was 128 samples/s. Each record also comprised a beat annotation file giving the occurrence times of the R peaks confirmed by specialists.

The HRV sequences were generated by first
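Although the preprocessing details are truncated in this snippet, the first step of turning annotated R-peak positions into an RR-interval sequence can be sketched as follows (sampling rate 128 samples/s as stated above; ectopic-beat handling and any resampling are omitted, and the function name is illustrative):

```python
def rr_intervals(r_peak_samples, fs=128):
    """Convert R-peak sample indices from a beat annotation file into
    an RR-interval (RRI) sequence in milliseconds.  fs = 128 samples/s
    matches the PhysioNet recordings used in this study."""
    times_ms = [s * 1000.0 / fs for s in r_peak_samples]
    return [t2 - t1 for t1, t2 in zip(times_ms, times_ms[1:])]

# Three beats one second apart, then a longer 1.5 s interval:
print(rr_intervals([0, 128, 256, 448]))  # [1000.0, 1000.0, 1500.0]
```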

Performance of the CHF classifiers using the original feature set

The entire set of 83 records (29 CHF and 54 NSR) in the database was used in the study. To test the baseline performance of the features and classifier, all 50 features were used in the simulation, and the discriminating capability of the classifier was compared to that of two well-known CHF classifiers proposed by Asyali [1] and Isler and Kuntalp [2], respectively, which also differentiated CHF without feature selection. They are referred to as Asyali's and Isler's methods, respectively, in

Discussion

The results in the previous section demonstrated the superiority of the proposed feature selector UCMIFS over the other MI-based feature selectors. To assess the advantages of using UCMIFS, the features selected by different MI-based selectors were compared. By using the leave-one-out procedure, a subset of features that resulted in optimal recognition rate was discovered in each trial. With a total of 83 records, we obtained 83 subsets of optimal features. After statistical analysis of the
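The leave-one-out procedure used throughout these experiments can be sketched as below. A nearest-centroid rule stands in for the paper's SVM classifier purely to keep the sketch self-contained; the data and function names are illustrative:

```python
def leave_one_out_accuracy(X, y, classify):
    """Leave-one-out: hold each record out once, train on the rest,
    and report the fraction of held-out records classified correctly."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += classify(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def nearest_centroid(train_X, train_y, x):
    """Toy stand-in for the SVM: assign x to the class with the
    nearer (squared-Euclidean) mean feature vector."""
    centroids = {}
    for label in set(train_y):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sum((a - b) ** 2
                                              for a, b in zip(x, centroids[lab])))
```

With 83 records, this yields 83 train/test splits, and hence 83 per-trial optimal feature subsets when feature selection is run inside each split, as described above.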

Conclusion

This paper proposed a feature selector based on mutual information for congestive heart failure (CHF) recognition using heart rate variability (HRV). The proposed algorithm, UCMIFS, takes advantage of conditional mutual information, extracting information from all the selected features conditioned on the first selected one. The uniform distribution assumption was adopted to simplify the calculation of mutual information, and a logarithmic weighting was applied to model the relative

Acknowledgements

This study was supported in part by the grants NSC 97-2220-E-194-010, NSC 98-2220-E-194-003, and NSC 99-2220-E-194-002 from the National Science Council, Taiwan, R.O.C.

References (29)

  • A. Schuman et al., Potential of feature selection methods in heart rate variability analysis for the classification of different cardiovascular disease, Statistics in Medicine (2002)

  • M.B. Malarvili et al., HRV feature selection based on discriminant and redundancy analysis for neonatal seizure detection

  • N. Kwak et al., Input feature selection for classification problems, IEEE Transactions on Neural Networks (2002)

  • H. Cheng et al., Conditional mutual information based feature selection