Conditional mutual information-based feature selection for congestive heart failure recognition using heart rate variability
Highlights
► We propose a feature selector UCMIFS for congestive heart failure (CHF) recognition.
► UCMIFS is based on the mutual information conditioned by the first-selected feature.
► The performance of UCMIFS is superior to the other MI-based feature selectors.
► UCMIFS selects only 15 features to achieve a high recognition rate of 97.59%.
Introduction
Heart rate variability (HRV) is a widely used tool for studying the influence of cardiovascular diseases and related afflictions. Recently, numerous studies have focused on using HRV measurements for diagnostic purposes, especially for recognizing congestive heart failure (CHF) from normal sinus rhythm (NSR) [1], [2], [3]. CHF is a harbinger of cardiac morbidity: a dysfunction of the cardiovascular system in which the heart is unable to pump blood adequately. CHF is usually accompanied by chest tightness, abdominal swelling, and labored breathing. However, patients often feel no pain in daily life, so the symptoms may be ignored.
In recent years, numerous methods have been developed to recognize CHF based on HRV [4], [5], [6]. In these studies, different categories of features calculated from long-term HRV sequences were recruited in an attempt to improve the performance of the classifier. This practice increased the feature dimensionality and the computational load. It therefore becomes important to select the most representative features from the original feature set, so that the recognition rate is retained while the feature dimensionality is considerably reduced.
In practice, the optimal subset of features is usually unknown, and it is common to have irrelevant or redundant features at the beginning of a pattern classification task. To tackle this problem, two main dimension-reduction approaches, namely feature extraction and feature selection, are usually applied [4]. Feature extraction creates new features by transforming or weighting combinations of the original feature set. In contrast, feature selection refers to methods that select the best subset of features from the original feature set.
Feature selection can be further categorized into filters and wrappers [5]. A filter involves a predefined performance measure which is independent of the subsequent classifier. Alternatively, a wrapper requires a specific learning machine and uses its classification accuracy as a performance measure to search for an optimal feature subset. Although wrappers usually produce better accuracy than filters, they are criticized as being computationally expensive and prone to over-fitting to specific classifiers. Consequently, filters are usually preferred to wrappers.
A number of measures, such as distance [6], [7], correlation [8], and mutual information (MI) [9], have been applied in filters for evaluating the efficacy of a feature. Techniques such as linear discriminant analysis between features and classes [10], a fast correlation-based filter using the approximate Markov blanket method for feature relevance calculation [11], and filters using entropy and other information-theoretic concepts for feature selection [11] are some examples that have successfully applied feature selection in clinical practice. Among them, mutual information (MI) has been reported to be effective in selecting features for a broad category of pattern classification problems [9], [11]. The main advantages of using MI as a criterion for feature selection are twofold. Firstly, MI is capable of measuring the relationships among attributes and between attributes and classes, which may not be easily characterized by other measures. Secondly, MI is invariant under space transformations. These advantages distinguish MI from other measures. In this study, we tackle the problem of how to improve the approximation of MI in a high-dimensional feature space and how to effectively use MIs as criteria for selecting the most representative features for CHF recognition.
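As a concrete illustration of MI as a relevance measure (a minimal sketch, not code from the paper), the following Python function estimates I(X; Y) for discrete data from the empirical contingency table; continuous HRV features would first be discretized, e.g. by histogram binning:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences.

    Joint and marginal probabilities are taken from the empirical
    contingency table of the observed pairs.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    xs, x_idx = np.unique(x, return_inverse=True)
    ys, y_idx = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xs), len(ys)))
    for i, j in zip(x_idx, y_idx):
        joint[i, j] += 1
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)   # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = joint > 0                          # avoid log(0) on empty cells
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

# A feature that is a deterministic function of the class carries H(C) bits:
classes = [0, 0, 1, 1]
feature = [5, 5, 9, 9]
print(mutual_information(feature, classes))   # -> 1.0 bit
```

With such an estimator, a simple filter can rank features by their MI with the class attribute.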
Battiti is one of the major pioneers who applied a greedy algorithm based on MI to select relevant features from the original feature set [9]. The greedy algorithm sequentially selects optimal features from the remaining feature set. The criterion for selecting the next feature is based on maximizing the conditional mutual information between the candidate feature and the class attribute. This process becomes complicated and computationally expensive as the number of features increases. To cope with these problems, Battiti's algorithm, termed mutual information feature selection (MIFS), approximates the conditional MI with the summation of paired MIs between the candidate feature and each of the features inside the already-selected feature subset. However, a great deal of information is lost with this approximation.
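Battiti's approximation can be sketched as follows (an illustrative Python implementation of the criterion described above, not the paper's code; `mi` is a simple plug-in estimator and feature vectors are assumed discrete):

```python
import numpy as np

def mi(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    x, y = np.asarray(x), np.asarray(y)
    total = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                total += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return total

def mifs(features, classes, n_select, beta=0.5):
    """Battiti's MIFS: greedily pick the feature maximizing
        I(C; f) - beta * sum_{s in S} I(f; s),
    i.e. relevance to the class penalized by pairwise redundancy
    with the already-selected subset S."""
    remaining = list(range(len(features)))
    selected = []
    relevance = [mi(f, classes) for f in features]
    while remaining and len(selected) < n_select:
        scores = [relevance[i] - beta * sum(mi(features[i], features[s])
                                            for s in selected)
                  for i in remaining]
        best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Feature 1 is a redundant copy of feature 0; feature 2 is complementary:
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = [0, 0, 0, 0, 1, 1, 1, 1]          # duplicates f0
f2 = [0, 0, 1, 1, 0, 0, 1, 1]          # independent of f0
C  = [0, 0, 1, 1, 1, 1, 1, 1]          # elementwise OR of f0 and f2
print(mifs([f0, f1, f2], C, n_select=2, beta=1.0))   # -> [0, 2]
```

The redundancy penalty is what makes MIFS skip the duplicate f1 in favor of the complementary f2, even though f1 is individually just as relevant as f0.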
In view of MIFS's potential in feature selection, several attempts have been made to improve its performance. Kwak and Choi [12] assumed uniform distributions in the information of input features and proposed the MIFS-U algorithm, which amends the neglect of the joint probability term in MIFS. Cheng et al. [13] proposed a conditional mutual information feature selector (CMIFS), which conditions the calculation of mutual information on the first selected feature. The last feature selected just prior to the candidate feature is also taken into consideration. In this manner, the conditional MI required in MIFS is more reasonably approximated. Other techniques, including min-redundancy max-relevance (mRMR) [14] and normalized mutual information feature selection (NMIFS) [15], were also proposed to improve the performance of MIFS.
Inspired by MIFS-U and CMIFS, we propose to modify CMIFS and apply the uniform distribution approximation exploited in MIFS-U to simplify the calculation of the conditional MI. The result is a modified conditional mutual information feature selector with uniform distribution assumption (UCMIFS). First, considering the significance of the first-selected feature f1 in the greedy algorithm, we employ the mutual information conditioned by f1 in the approximation; unlike CMIFS, however, all the features in the already-selected feature subset, rather than only the last one, are considered. Second, the uniform distribution assumption is recruited to simplify the calculation. Finally, a weighting parameter, expressed as a logarithmic function of the size of the already-selected feature subset, is added to model the relative importance of the individual terms in the calculation of MIs. The original feature set contains typical features, including personal data and features calculated from the time statistics, Poincare plots, and frequency-domain distribution, as well as features calculated from the third-cumulant spectra (bispectra) [16] of the RR interval (RRI) sequences. In this study, the efficiency of the proposed UCMIFS algorithm in selecting features for CHF recognition is evaluated and compared to that of other MI-based feature selectors. The performance of the proposed system is also compared to that of other outstanding CHF classifiers published in the literature.
Section 2 reviews the background knowledge of using MI for feature selection and discusses some of the popular and effective MI-based feature selectors. Section 3 proposes the modified conditional mutual information feature selector with uniform distribution assumption (UCMIFS). Section 4 establishes the original feature set applied in this study. Section 5 demonstrates the experimental results with some critical discussions. Finally, some conclusions are drawn in Section 6.
Section snippets
Entropy and mutual information related to feature selection
Shannon's information theory provides an approach to quantify the information of random variables with entropy and mutual information (MI). In this section, we summarize the theoretical background required to calculate the MIs between features, and between features and classes, as quantitative measures for feature selection. Please refer to [9], [14] for details.
Assume p(fi) represents the probability density function (pdf) of a feature fi; the entropy of fi is defined as H(fi) = −∫ p(fi) log p(fi) dfi.
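For discrete (or discretized) features, this entropy can be estimated directly from empirical frequencies. A minimal Python sketch (illustrative, not the paper's code):

```python
import numpy as np

def entropy(x):
    """Plug-in estimate of H(X) in bits: -sum_i p_i * log2(p_i),
    with probabilities taken from the empirical distribution of x."""
    _, counts = np.unique(np.asarray(x), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

print(entropy([0, 1, 0, 1]))   # -> 1.0 (a fair binary variable carries one bit)
print(entropy([0, 1, 2, 3]))   # -> 2.0 (four equiprobable values carry two bits)
```

In practice, continuous HRV features are typically binned into histograms before applying such plug-in estimators.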
UCMIFS—the proposed modified conditional mutual information feature selector with uniform distribution assumption
The idea, proposed in the CMIFS algorithm, of using the first and the last features selected into S for calculating the MI is inspirational. The first feature f1 is arguably the most significant feature in the selected feature set. Compared to MIFS, which considers only the paired relationship between the candidate feature and each feature in the already-selected feature set S, recruiting f1 in the approximation of I(C; fi|S) undoubtedly improves the precision of the estimate.
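The conditional MI that CMIFS-style selectors build on can be estimated for discrete variables by averaging MI over the conditioning variable's values. The sketch below (illustrative, not the paper's code) also shows why conditioning matters, using an XOR example in which the unconditional MI is zero:

```python
import numpy as np

def mi(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    x, y = np.asarray(x), np.asarray(y)
    total = 0.0
    for a in np.unique(x):
        for b in np.unique(y):
            pxy = np.mean((x == a) & (y == b))
            if pxy > 0:
                total += pxy * np.log2(pxy / (np.mean(x == a) * np.mean(y == b)))
    return total

def conditional_mi(x, y, z):
    """I(X; Y | Z) = sum_z p(z) * I(X; Y | Z = z)."""
    x, y, z = map(np.asarray, (x, y, z))
    return sum(np.mean(z == c) * mi(x[z == c], y[z == c])
               for c in np.unique(z))

# x and y are pairwise independent, but given z = x XOR y
# each one fully determines the other:
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [0, 1, 1, 0]
print(mi(x, y))                 # -> 0.0
print(conditional_mi(x, y, z))  # -> 1.0
```

This is the kind of dependency that pairwise-MI approximations like MIFS miss and that conditioning on an already-selected feature can recover.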
Features exploited in this study
Two categories of features associated with heart rate variability (HRV) were used in this study to test the feature-selection power of the proposed greedy algorithm in discriminating congestive heart failure (CHF) from normal sinus rhythm (NSR). The first category contained typical features, including personal data and features calculated from the time statistics, Poincare plot, and the frequency-domain distribution of the RR interval (RRI) sequences. The second category contained features calculated from the third-cumulant spectra (bispectra) of the RRI sequences.
Experimental design
Data for this research were selected from the congestive heart failure (CHF) and normal sinus rhythm (NSR) databases, both of which are available on PhysioNet [25]. Recordings from 29 CHF and 54 NSR subjects were selected from the CHF and NSR databases, respectively, for analysis. The data sampling rate was 128 samples/s. Each record also comprised a beat annotation file giving the occurrence times of the R peaks confirmed by specialists.
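For illustration (not the authors' preprocessing code), converting annotated R-peak positions into an RR-interval (RRI) sequence is straightforward, assuming the annotation files give peak positions as sample indices at 128 samples/s:

```python
import numpy as np

FS = 128  # sampling rate of the PhysioNet recordings, samples/s

def rr_intervals(r_peak_samples):
    """Convert annotated R-peak sample indices to RR intervals in seconds."""
    peaks = np.asarray(r_peak_samples, dtype=float)
    return np.diff(peaks) / FS

# Peaks 128 samples apart correspond to 1.0-second RR intervals:
print(rr_intervals([0, 128, 256, 390]).tolist())   # -> [1.0, 1.0, 1.046875]
```

The resulting RRI sequence is the raw material from which the time-domain, Poincare, frequency-domain, and bispectral features are computed.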
The HRV sequences were generated by first
Performance of the CHF classifiers using the original feature set
All 83 records (29 CHF and 54 NSR) in the database were used in the study. To test the baseline performance of the features and the classifier, all 50 features were used in the simulation, and the discriminating capability of the classifier was compared to that of two well-known CHF classifiers proposed by Asyali [1] and Isler and Kuntalp [2], respectively, which also differentiated CHF without feature selection. They are referred to as Asyali's and Isler's methods, respectively, in
Discussion
The results in the previous section demonstrated the superiority of the proposed feature selector UCMIFS over the other MI-based feature selectors. To assess the advantages of using UCMIFS, the features selected by the different MI-based selectors were compared. Using the leave-one-out procedure, a subset of features that resulted in the optimal recognition rate was discovered in each trial. With a total of 83 records, we obtained 83 subsets of optimal features. After statistical analysis of the
Conclusion
This paper proposed a mutual information-based feature selector for congestive heart failure (CHF) recognition using heart rate variability (HRV). The proposed algorithm, UCMIFS, took advantage of conditional mutual information, extracting information from all the selected features conditioned on the first-selected one. The uniform distribution assumption was adopted to simplify the calculation of mutual information, and a logarithmic weighting was applied to model the relative importance of the individual terms in the calculation.
Acknowledgements
This study was supported in part by the grants NSC 97-2220-E-194-010, NSC 98-2220-E-194-003, and NSC 99-2220-E-194-002 from the National Science Council, Taiwan, R.O.C.
References (29)
- et al., Combining classical HRV indices with wavelet entropy measures improves to performance in diagnosing congestive heart failure, Computers in Biology and Medicine (2007)
- et al., Feature selection for classification, Intelligent Data Analysis (1997)
- et al., Quantification of EEG irregularity by use of the entropy of the power spectrum, Electroencephalography and Clinical Neurophysiology (1991)
- Discrimination power of long-term heart rate variability measures
- et al., Discrimination power of long-term heart rate variability measures for chronic heart failure detection, Medical and Biological Engineering and Computing (2011)
- et al., Statistical pattern recognition: a review, IEEE Transactions on Pattern Analysis and Machine Intelligence (2000)
- et al., Irrelevant features and the subset selection problem
- et al., Feature selection from huge feature set
- M. A. Hall, Correlation-based feature selection for machine learning, Ph.D. Dissertation, Department of Computer...
- Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks (1994)
- Potential of feature selection methods in heart rate variability analysis for the classification of different cardiovascular diseases, Statistics in Medicine
- HRV feature selection based on discriminant and redundancy analysis for neonatal seizure detection
- Input feature selection for classification problems, IEEE Transactions on Neural Networks
- Conditional mutual information based feature selection