
Applied Soft Computing

Volume 95, October 2020, 106537

Feature selection considering Uncertainty Change Ratio of the class label

https://doi.org/10.1016/j.asoc.2020.106537

Highlights

  • We design a new term named UCR.

  • A new method named UCRFS based on UCR is proposed.

  • UCRFS takes the reduced uncertainty of class labels into account.

  • UCRFS takes the remaining uncertainty of class labels into account.

  • UCRFS outperforms seven other methods on multiple evaluation criteria.

Abstract

The topic of feature selection in high-dimensional data sets has attracted considerable attention. Feature selection can reduce the dimensionality of the feature space and improve the prediction accuracy of the classification model. Information-theoretic feature selection methods aim to extract as much classification information about the class labels as possible from the selected feature subset. Existing methods focus on the reduced uncertainty of the class labels while ignoring the change in their remaining uncertainty. During feature selection, a large reduction in the uncertainty of the class labels does not imply little remaining uncertainty when different candidate features are given. In this paper, we analyze the difference between the reduced uncertainty and the remaining uncertainty of the class labels and propose a new term, the Uncertainty Change Ratio, that captures this change in uncertainty. Finally, a novel method named Feature Selection considering Uncertainty Change Ratio (UCRFS) is proposed. To demonstrate the classification superiority of the proposed method, UCRFS is compared with three traditional methods and four state-of-the-art methods on fourteen benchmark data sets. The experimental results show that UCRFS outperforms the seven other methods in terms of average classification accuracy, AUC and F1 score.

Introduction

Feature selection is an essential step in data mining and machine learning [1], [2], [3], [4], [5]. In real-world applications, a high-dimensional feature space leads to issues in classification tasks such as over-fitting, high computational cost and poor classification performance [6], [7], [8], [9]. Feature selection aims to find the most informative feature subset within the original feature set and to eliminate irrelevant and redundant features, which effectively reduces the high dimensionality of data sets and improves classification performance [10], [11], [12], [13].

Feature selection methods can be divided into three groups based on the selection strategy: wrapper methods, embedded methods and filter methods [14], [15], [16]. Wrapper methods depend on a specific classifier to evaluate candidate features. Embedded methods perform feature selection simultaneously with the training of the learning algorithm. Filter methods evaluate feature subsets by predefined evaluation criteria that are independent of any classifier. Filter methods have attracted increasing attention because they are both effective and efficient. In this paper, we focus on filter methods.

There are many metrics for evaluating a feature subset, such as correlation-based [17], consistency-based [18] and information-theoretic [19] measures. Among them, information theory is widely applied because it can measure both linear and nonlinear relationships among variables. In recent years, a large number of information-theoretic feature selection methods have been proposed [20], [21], [22], [23], [24]. Peng et al. [25] propose the minimal-redundancy-maximal-relevance (mRMR) method, which employs mutual information to evaluate feature relevance and feature redundancy. Max-Relevance and Max-Independence (MRI) [26] uses conditional mutual information to select the most informative features. Let X, Y and Z be three variables. The mutual information I(X;Y) measures the reduction in the uncertainty of X given Y. The conditional mutual information I(X;Y|Z) measures the reduction in the uncertainty of X provided by Y when Z is known. Information-theoretic feature selection methods aim to select the features that maximize the reduction in the uncertainty of the class labels and capture as much of their classification information as possible. In other words, the optimal feature subset should ensure that the remaining uncertainty of the class labels is minimal. Existing methods focus on the reduced uncertainty of the class labels while ignoring the change in their remaining uncertainty. During feature selection, a large reduction in the uncertainty of the class labels does not imply little remaining uncertainty when different candidate features are given. To select informative features for the class labels, we propose a new feature selection method in this paper. We highlight the main contributions as follows:

  • (1)

    We analyze and discuss the difference between the reduced uncertainty and the remaining uncertainty of the class labels in the feature selection process.

  • (2)

    We design a new term named Uncertainty Change Ratio (UCR) that combines the reduced uncertainty and the remaining uncertainty of the class labels using conditional mutual information and conditional entropy.

  • (3)

    A novel Feature Selection method considering Uncertainty Change Ratio (UCRFS) is proposed. UCRFS combines UCR with the traditional feature relevance and feature redundancy terms to evaluate candidate features.

  • (4)

    To evaluate the classification performance of the proposed method, UCRFS is compared with three traditional methods and four state-of-the-art methods on fourteen benchmark data sets. The experimental results demonstrate that the proposed method achieves better classification performance than the compared feature selection methods.

The rest of this paper is organized as follows. Section 2 introduces the information theory concepts applied in feature selection. Section 3 reviews related work. In Section 4, we present the definition of the Uncertainty Change Ratio and propose a novel feature selection method (UCRFS). In Section 5, experimental results are presented to verify the effectiveness of our method. In Section 6, we conclude and outline future research directions.

Preliminaries

In this section, the fundamental concepts of information theory used in feature selection are reviewed. Information theory [27], [28] measures the amount of information carried by random variables according to their probability distributions. The information entropy of a discrete random variable $X=\{x_1,x_2,\ldots,x_n\}$ describes its uncertainty and is defined by $H(X)=-\sum_{i=1}^{n} p(x_i)\log p(x_i)$, where $p(x_i)$ is the probability of $x_i$. The base of the logarithm is 2, therefore $H(X)\ge 0$. Let $Y=\{y_1,y_2,\ldots,y_m\}$. The
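These quantities are straightforward to estimate from samples of discrete variables using plug-in probability estimates. The following is only a minimal NumPy sketch (the function names are ours, not the paper's):

```python
import numpy as np

def entropy(*cols):
    """Plug-in estimate of the (joint) Shannon entropy, in bits, of one or
    more discrete variables given as equal-length sample columns."""
    joint = np.stack([np.asarray(c) for c in cols], axis=1)
    _, counts = np.unique(joint, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y): the reduction in the uncertainty of X given Y."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z): the reduction in the
    uncertainty of X provided by Y once Z is known."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# A fair coin carries one full bit of uncertainty.
print(entropy([0, 1, 0, 1]))  # 1.0
```

Note that for identical variables the mutual information equals the entropy, I(X;X) = H(X), while for independent variables it is zero, matching the interpretation of mutual information as reduced uncertainty.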

Related work

For a given data set $D\in\mathbb{R}^{N\times M}$ and class label $Y\in\mathbb{R}^{N\times 1}$, where $N$ is the number of instances and $M$ is the number of features, the task of feature selection is to select a feature subset of size $m$ related to the class labels, where $m\ll M$. In information-theoretic feature selection methods, the most straightforward strategy is Mutual Information Maximization (MIM) [29], which applies mutual information to measure the correlation between each feature and the class label. However, the MIM method
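MIM reduces to ranking features by their individual mutual information with the class label. As an illustrative sketch only (discrete features, plug-in entropy estimates; not the reference implementation):

```python
import numpy as np

def mim(features, y, k):
    """Mutual Information Maximization: rank each feature by I(f; Y) with the
    class label and keep the indices of the top k.
    `features` is an (N, M) array of discrete feature values, `y` the labels."""
    def entropy(cols):
        _, counts = np.unique(cols, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    y = np.asarray(y).reshape(-1, 1)
    scores = []
    for j in range(features.shape[1]):
        f = features[:, [j]]
        # I(f; Y) = H(f) + H(Y) - H(f, Y)
        scores.append(entropy(f) + entropy(y) - entropy(np.hstack([f, y])))
    return np.argsort(scores)[::-1][:k]
```

Because each feature is scored in isolation, MIM ignores redundancy among the selected features, which is precisely the weakness that mRMR-style criteria address.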

Proposed feature selection method

In Section 4.1, we discuss the reduced uncertainty and the remaining uncertainty of the class labels under the condition of different features. Then, we define a new term, the Uncertainty Change Ratio (UCR). In Section 4.2, we present the new feature selection method, Feature Selection considering Uncertainty Change Ratio (UCRFS), and give its pseudo-code.
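The exact UCR formula is given in Section 4 and is not reproduced in this excerpt. Purely as an illustration of the greedy forward search that such filter criteria share, here is a sketch with a hypothetical ratio-style score of our own invention (`ucr_like_score` is not the paper's UCRFS criterion):

```python
import numpy as np

def _H(*cols):
    """Joint Shannon entropy (bits) of discrete sample columns."""
    joint = np.stack([np.asarray(c) for c in cols], axis=1)
    _, n = np.unique(joint, axis=0, return_counts=True)
    p = n / n.sum()
    return -np.sum(p * np.log2(p))

def _I(x, y):
    """Mutual information I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return _H(x) + _H(y) - _H(x, y)

def ucr_like_score(f, selected, y):
    """HYPOTHETICAL criterion for illustration only: relevance I(f;Y) minus
    mean redundancy with already-selected features, boosted by a ratio of
    reduced to remaining class-label uncertainty, I(f;Y) / H(Y|f)."""
    relevance = _I(f, y)
    redundancy = np.mean([_I(f, s) for s in selected]) if selected else 0.0
    remaining = _H(f, y) - _H(f)                    # H(Y|f)
    ratio = relevance / remaining if remaining > 0 else relevance
    return relevance - redundancy + ratio

def greedy_select(features, y, k, score=ucr_like_score):
    """Greedy forward search shared by mRMR-style filter methods: at each
    step, add the candidate that maximizes the evaluation score."""
    remaining, selected = list(range(features.shape[1])), []
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: score(features[:, j],
                                       [features[:, s] for s in selected], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

The skeleton shows where the relevance, redundancy and uncertainty-change terms plug in; the paper's actual criterion should be substituted for `ucr_like_score`.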

Experimental results and analysis

In this section, the proposed UCRFS method is compared with seven competitive feature selection methods: CIFE, JMI, mRMR, IWFS, MRI, JMIM and CFR. Section 5.1 describes the benchmark data sets used in our experiments and details the experimental setting. Section 5.2 analyzes and discusses the experimental results in terms of classification accuracy, AUC and F1 score.
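The three evaluation criteria follow their standard definitions; a minimal sketch for binary labels in {0, 1} (not tied to the paper's experimental code):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random
    negative (ties count half), i.e. the normalized Mann-Whitney U statistic."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Reporting all three criteria guards against misleading conclusions on imbalanced data sets, where accuracy alone can look strong while F1 and AUC reveal a weak minority-class performance.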

A real-life application

In chemistry, aluminophosphate zeolites (AlPOs) are important in various applications such as adsorption, separation and catalysis, and AlPO materials have become extremely important for industrial use [45], [46]. AlPOs can be divided into two categories, pure AlPOs and heteroatom-stabilized AlPOs, which can be regarded as the class labels of a binary classification problem in machine learning. Distinguishing these two types of AlPOs can help us determine

Conclusion and future work

To take both the reduced uncertainty and the remaining uncertainty of the class labels into account, a novel feature selection method named Feature Selection considering Uncertainty Change Ratio (UCRFS) is proposed. The primary contribution of our method is the new term UCR, which considers the dynamic change in the uncertainty of the class labels. Finally, UCRFS combines UCR with the traditional feature relevance and feature redundancy terms to select the most

CRediT authorship contribution statement

Ping Zhang: Writing - original draft, Writing - reviewing, Investigation. Wanfu Gao: Conceptualization, Methodology, Supervision, Editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the Postdoctoral Innovative Talents Support Program, China under Grant No. BX20190137; the National Key R&D Plan of China under Grant No. 2017YFA0604500; the National Sci-Tech Support Plan of China under Grant No. 2014BAH02F00; the National Natural Science Foundation of China under Grant No. 61701190; the Youth Science Foundation of Jilin Province of China under Grant Nos. 20160520011JH and 20180520021JH; and the Youth Sci-Tech Innovation Leader and Team Project of

References (52)

  • Liu, H., et al., Feature Selection for Knowledge Discovery and Data Mining, Vol. 454 (2012)
  • Liu, H., et al., Toward integrating feature selection algorithms for classification and clustering, IEEE Trans. Knowl. Data Eng. (2005)
  • P.L. Varela, A. Martins, P. Aguiar, M. Figueiredo, An empirical study of feature selection for sentiment analysis, in:...
  • Zhang, K., et al., Feature selection for high-dimensional machinery fault diagnosis data using multiple models and radial basis function networks, Neurocomputing (2011)
  • Qian, W., et al., Mutual information criterion for feature selection from incomplete data, Neurocomputing (2015)
  • Saeys, Y., et al., A review of feature selection techniques in bioinformatics, Bioinformatics (2007)
  • M. Kolar, H. Liu, Feature selection in high-dimensional classification, in: International Conference on International...
  • Nguyen, X.V., et al., Effective global approaches for mutual information based feature selection
  • Bagherzadeh-Khiabani, F., et al., A tutorial on variable selection for clinical prediction models: feature selection methods in data mining could improve the results, J. Clin. Epidemiol. (2016)
  • Song, L., et al., Feature selection via dependence maximization, J. Mach. Learn. Res. (2012)
  • Freeman, C., et al., An evaluation of classifier-specific filter measure performance for feature selection, Pattern Recognit. (2015)
  • Shishkin, A., et al., Efficient high-order interaction-aware feature selection based on conditional mutual information
  • Cai, J., et al., Feature selection in machine learning: A new perspective, Neurocomputing (2018)
  • Gao, S., et al., Variational information maximization for feature selection
  • Kohavi, R., et al., Wrappers for feature subset selection, Artif. Intell. (1997)
  • Guyon, I., et al., Gene selection for cancer classification using support vector machines, Mach. Learn. (2002)
  • Mursalin, M., et al., Automated epileptic seizure detection using improved correlation-based feature selection with random forest classifier, Neurocomputing (2017)
  • Dash, M., et al., Consistency-based search in feature selection, Artif. Intell. (2003)
  • Vergara, J.R., et al., A review of feature selection methods based on mutual information, Neural Comput. Appl. (2014)
  • Wei, M., et al., Heterogeneous feature subset selection using mutual information-based feature transformation, Neurocomputing (2015)
  • Peng, H., et al., Feature selection by optimizing a lower bound of conditional mutual information, Inform. Sci. (2017)
  • Hancer, E., et al., Differential evolution for filter feature selection based on information theory and feature ranking, Knowl.-Based Syst. (2018)
  • Sharmin, S., et al., Simultaneous feature selection and discretization based on mutual information, Pattern Recognit. (2019)
  • Macedo, F., et al., Theoretical foundations of forward feature selection methods based on mutual information, Neurocomputing (2019)
  • Peng, H., et al., Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy, IEEE Trans. Pattern Anal. Mach. Intell. (2005)
  • Wang, J., et al., Feature selection by maximizing independent classification information, IEEE Trans. Knowl. Data Eng. (2017)