
Applied Soft Computing

Volume 95, October 2020, 106537

Feature selection considering Uncertainty Change Ratio of the class label

https://doi.org/10.1016/j.asoc.2020.106537

Highlights

  • We design a new term named UCR.

  • A new method named UCRFS based on UCR is proposed.

  • UCRFS takes the reduced uncertainty of class labels into account.

  • UCRFS takes the remaining uncertainty of class labels into account.

  • UCRFS outperforms seven other methods on multiple evaluation criteria.

Abstract

The topic of feature selection in high-dimensional data sets has attracted considerable attention. Feature selection can reduce the dimensionality of the feature space and improve the prediction accuracy of the classification model. Information-theoretic feature selection methods aim to extract as much classification information about the class labels as possible from the selected feature subset. Existing methods focus on the reduced uncertainty of the class labels while ignoring the change in their remaining uncertainty. During feature selection, a large reduction in the uncertainty of the class labels does not imply little remaining uncertainty when different candidate features are given. In this paper, we analyze the difference between the reduced uncertainty and the remaining uncertainty of the class labels and propose a new term, the Uncertainty Change Ratio, that captures this change in uncertainty. Finally, a novel method named Feature Selection considering Uncertainty Change Ratio (UCRFS) is proposed. To demonstrate the classification superiority of the proposed method, UCRFS is compared with three traditional methods and four state-of-the-art methods on fourteen benchmark data sets. The experimental results show that UCRFS outperforms the seven other methods in terms of average classification accuracy, AUC and F1 score.

Introduction

Feature selection is an essential step in data mining and machine learning [1], [2], [3], [4], [5]. In real-world applications, a high-dimensional feature space leads to issues in classification tasks such as over-fitting, high computational cost and poor classification performance [6], [7], [8], [9]. Feature selection aims to find the most informative feature subset within the original feature set and to eliminate irrelevant and redundant features, which effectively reduces the high dimensionality of data sets and improves classification performance [10], [11], [12], [13].

Feature selection methods can be divided into three groups based on the selection strategy: wrapper methods, embedded methods and filter methods [14], [15], [16]. Wrapper methods depend on a specific classifier to evaluate candidate features. Embedded methods perform feature selection simultaneously with the training of the learning algorithm. Filter methods evaluate feature subsets by predefined evaluation criteria that are independent of any classifier. Filter methods have attracted increasing attention because they are both effective and efficient. In this paper, we focus on filter methods.

There are many metrics for evaluating a feature subset, such as correlation-based [17], consistency-based [18] and information-theoretic [19] measures. Among them, information theory is widely applied because it can measure both linear and nonlinear relationships among variables. In recent years, a large number of information-theoretic feature selection methods have been proposed [20], [21], [22], [23], [24]. Peng et al. [25] propose the minimal-redundancy-maximal-relevance (mRMR) method, which employs mutual information to evaluate feature relevance and feature redundancy. Max-Relevance and Max-Independence (MRI) [26] uses conditional mutual information to select the most informative features. Let X, Y and Z be three variables. The mutual information I(X;Y) measures the reduction in the uncertainty of X given Y. The conditional mutual information I(X;Y|Z) measures the reduction in the uncertainty of X provided by Y when Z is known. Information-theoretic feature selection methods aim to select the features that maximize the reduction in the uncertainty of the class labels and capture as much of their classification information as possible. In other words, the optimal feature subset should ensure that the remaining uncertainty of the class labels is minimal. Existing methods focus on the reduced uncertainty of the class labels while ignoring the change in their remaining uncertainty. During feature selection, a large reduction in the uncertainty of the class labels does not imply little remaining uncertainty when different candidate features are given. To select informative features for the class labels, we propose a new feature selection method in this paper. We highlight the main contributions as follows:

  • (1)

    We analyze and discuss the difference between the reduced uncertainty and the remaining uncertainty of the class labels in the feature selection process.

  • (2)

    We design a new term named Uncertainty Change Ratio (UCR) that combines the reduced uncertainty and the remaining uncertainty of the class labels using conditional mutual information and conditional entropy.

  • (3)

    A novel Feature Selection method considering Uncertainty Change Ratio (UCRFS) is proposed. UCRFS combines UCR with the traditional feature relevance and feature redundancy terms to evaluate candidate features.

  • (4)

    To evaluate the classification performance of the proposed method, UCRFS is compared with three traditional methods and four state-of-the-art methods on fourteen benchmark data sets. The experimental results demonstrate that the proposed method achieves better classification performance than the compared feature selection methods.

The rest of this paper is organized as follows. Section 2 introduces the information theory concepts applied in feature selection. Section 3 reviews related work. In Section 4, we present the definition of the Uncertainty Change Ratio and propose a novel feature selection method (UCRFS). In Section 5, experimental results are presented to verify the effectiveness of our method. In Section 6, we conclude and outline future research directions.

Preliminaries

In this section, the fundamental concepts of information theory used in feature selection are reviewed. Information theory [27], [28] measures the amount of information carried by random variables according to their probability distributions. The information entropy of a discrete random variable $X=\{x_1,x_2,\ldots,x_n\}$ describes its uncertainty and is defined by $H(X)=-\sum_{i=1}^{n} p(x_i)\log p(x_i)$, where $p(x_i)$ is the probability of $x_i$. The base of the logarithm is 2, therefore $H(X)\ge 0$. Let $Y=\{y_1,y_2,\ldots,y_m\}$. The
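These quantities are straightforward to estimate from samples of discrete variables using plug-in probability estimates. The following is only a minimal NumPy sketch (the function names are ours, not the paper's):

```python
import numpy as np

def entropy(*cols):
    """Plug-in estimate of the (joint) Shannon entropy, in bits, of one or
    more discrete variables given as equal-length sample columns."""
    joint = np.stack([np.asarray(c) for c in cols], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): the reduction in the uncertainty of X given Y."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z): the reduction in the
    uncertainty of X provided by Y once Z is known."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# A fair coin carries one full bit of uncertainty.
print(entropy([0, 1, 0, 1]))  # 1.0
```

Note that for identical variables the mutual information equals the entropy, I(X;X) = H(X), while for independent variables it is zero, matching the interpretation of mutual information as reduced uncertainty.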

Related work

For a given data set $D\in\mathbb{R}^{N\times M}$ and class label $Y\in\mathbb{R}^{N\times 1}$, where $N$ is the number of instances and $M$ is the number of features, the task of feature selection is to select a feature subset of size $m$ related to the class labels, where $m\ll M$. In information-theoretic feature selection methods, the most straightforward strategy is Mutual Information Maximization (MIM) [29], which applies mutual information to measure the correlation between each feature and the class label. However, the MIM method
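MIM reduces to ranking features by their individual mutual information with the class label. As an illustrative sketch only (discrete features, plug-in entropy estimates; not the reference implementation):

```python
import numpy as np

def mim(features, y, k):
    """Mutual Information Maximization: rank each feature by I(f; Y) with the
    class label and keep the indices of the top k.
    `features` is an (N, M) array of discrete feature values, `y` the labels."""
    def entropy(cols):
        _, counts = np.unique(cols, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    y = np.asarray(y).reshape(-1, 1)
    scores = []
    for j in range(features.shape[1]):
        f = features[:, [j]]
        # I(f; Y) = H(f) + H(Y) - H(f, Y)
        scores.append(entropy(f) + entropy(y) - entropy(np.hstack([f, y])))
    return np.argsort(scores)[::-1][:k]
```

Because each feature is scored in isolation, MIM ignores redundancy among the selected features, which is precisely the weakness that mRMR-style criteria address.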

Proposed feature selection method

In Section 4.1, we discuss the reduced uncertainty and the remaining uncertainty of the class labels under the condition of different features. Then, we define a new term, the Uncertainty Change Ratio (UCR). In Section 4.2, we present the new feature selection method, Feature Selection considering Uncertainty Change Ratio (UCRFS), and give its pseudo-code.
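The exact UCR formula is given in Section 4 and is not reproduced in this excerpt. Purely as an illustration of the greedy forward search that such filter criteria share, here is a sketch with a hypothetical ratio-style score of our own invention (`ucr_like_score` is not the paper's UCRFS criterion):

```python
import numpy as np

def _H(*cols):
    """Joint Shannon entropy (bits) of discrete sample columns."""
    joint = np.stack([np.asarray(c) for c in cols], axis=1)
    _, n = np.unique(joint, axis=0, return_counts=True)
    p = n / n.sum()
    return -np.sum(p * np.log2(p))

def _I(x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return _H(x) + _H(y) - _H(x, y)

def ucr_like_score(f, selected, y):
    """HYPOTHETICAL criterion for illustration only: relevance I(f;Y) minus
    mean redundancy with already-selected features, boosted by a ratio of
    reduced to remaining class-label uncertainty, I(f;Y) / H(Y|f)."""
    relevance = _I(f, y)
    redundancy = np.mean([_I(f, s) for s in selected]) if selected else 0.0
    remaining = _H(f, y) - _H(f)                    # H(Y|f)
    ratio = relevance / remaining if remaining > 0 else relevance
    return relevance - redundancy + ratio

def greedy_select(features, y, k, score=ucr_like_score):
    """Greedy forward search shared by mRMR-style filter methods: at each
    step, add the candidate that maximizes the evaluation score."""
    remaining, selected = list(range(features.shape[1])), []
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: score(features[:, j],
                                       [features[:, s] for s in selected], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The skeleton shows where the relevance, redundancy and uncertainty-change terms plug in; the paper's actual criterion should be substituted for `ucr_like_score`.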

Experimental results and analysis

In this section, the proposed UCRFS method is compared with seven competitive feature selection methods: CIFE, JMI, mRMR, IWFS, MRI, JMIM and CFR. Section 5.1 describes the benchmark data sets used in our experiments and details the experimental setting. Section 5.2 analyzes and discusses the experimental results in terms of classification accuracy, AUC and F1 score.
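The three evaluation criteria follow their standard definitions; a minimal sketch for binary labels in {0, 1} (not tied to the paper's experimental code):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count half), i.e. the normalized Mann-Whitney U statistic."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Reporting all three criteria guards against misleading conclusions on imbalanced data sets, where accuracy alone can look strong while F1 and AUC reveal a weak minority-class performance.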

A real-life application

In chemistry, aluminophosphate zeolites (AlPOs) are important in various applications such as adsorption, separation and catalysis, and AlPO materials have become extremely important for industrial use [45], [46]. AlPOs can be divided into two categories, pure AlPOs and heteroatom-stabilized AlPOs, which can be regarded as the class labels of a binary classification problem in machine learning. Distinguishing these two types of AlPOs can help us determine

Conclusion and future work

To take both the reduced uncertainty and the remaining uncertainty of the class labels into account, a novel feature selection method named Feature Selection considering Uncertainty Change Ratio (UCRFS) is proposed. The primary contribution of our method is the new term UCR, which considers the dynamic change in the uncertainty of the class labels. Finally, UCRFS combines UCR with the traditional feature relevance and feature redundancy terms to select the most

CRediT authorship contribution statement

Ping Zhang: Writing - original draft, Writing - reviewing, Investigation. Wanfu Gao: Conceptualization, Methodology, Supervision, Editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Postdoctoral Innovative Talents Support Program, China under Grant No. BX20190137; the National Key R&D Plan of China under Grant No. 2017YFA0604500; the National Sci-Tech Support Plan of China under Grant No. 2014BAH02F00; the National Natural Science Foundation of China under Grant No. 61701190; the Youth Science Foundation of Jilin Province of China under Grant Nos. 20160520011JH and 20180520021JH; and the Youth Sci-Tech Innovation Leader and Team Project of

References (52)

  • Liu, H., et al., Feature Selection for Knowledge Discovery and Data Mining, Vol. 454 (2012)
  • Liu, H., et al., Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. (2005)
  • P.L. Varela, A. Martins, P. Aguiar, M. Figueiredo, An empirical study of feature selection for sentiment analysis, in:...
  • Zhang, K., et al., Feature selection for high-dimensional machinery fault diagnosis data using multiple models and radial basis function networks, Neurocomputing (2011)
  • Qian, W., et al., Mutual information criterion for feature selection from incomplete data, Neurocomputing (2015)
  • Saeys, Y., et al., A review of feature selection techniques in bioinformatics, Bioinformatics (2007)
  • M. Kolar, H. Liu, Feature selection in high-dimensional classification, in: International Conference on International...
  • Nguyen, X.V., et al., Effective global approaches for mutual information based feature selection
  • Bagherzadeh-Khiabani, F., et al., A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results, J. Clin. Epidemiol. (2016)
  • Song, L., et al., Feature selection via dependence maximization, J. Mach. Learn. Res. (2012)
  • Freeman, C., et al., An evaluation of classifier-specific filter measure performance for feature selection, Pattern Recognit. (2015)
  • Shishkin, A., et al., Efficient high-order interaction-aware feature selection based on conditional mutual information
  • Cai, J., et al., Feature selection in machine learning: A new perspective, Neurocomputing (2018)
  • Gao, S., et al., Variational information maximization for feature selection
  • Kohavi, R., et al., Wrappers for feature subset selection, Artif. Intell. (1997)
  • Guyon, I., et al., Gene selection for cancer classification using support vector machines, Mach. Learn. (2002)
  • Mursalin, M., et al., Automated epileptic seizure detection using improved correlation-based feature selection with random forest classifier, Neurocomputing (2017)
  • Dash, M., et al., Consistency-based search in feature selection, Artif. Intell. (2003)
  • Vergara, J.R., et al., A review of feature selection methods based on mutual information, Neural Comput. Appl. (2014)
  • Wei, M., et al., Heterogeneous feature subset selection using mutual information-based feature transformation, Neurocomputing (2015)
  • Peng, H., et al., Feature selection by optimizing a lower bound of conditional mutual information, Inform. Sci. (2017)
  • Hancer, E., et al., Differential evolution for filter feature selection based on information theory and feature ranking, Knowl.-Based Syst. (2018)
  • Sharmin, S., et al., Simultaneous feature selection and discretization based on mutual information, Pattern Recognit. (2019)
  • Macedo, F., et al., Theoretical foundations of forward feature selection methods based on mutual information, Neurocomputing (2019)
  • Peng, H., et al., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • Wang, J., et al., Feature selection by maximizing independent classification information, IEEE Trans. Knowl. Data Eng. (2017)