Methods of forward feature selection based on the aggregation of classifiers generated by single attribute

https://doi.org/10.1016/j.compbiomed.2011.04.005

Abstract

Compared with backward feature selection (BFS) methods in gene expression data analysis, forward feature selection (FFS) methods can obtain an expected feature subset in fewer iterations. However, the number of FFS methods is considerably smaller than that of BFS methods, and more efficient FFS methods need to be developed. In this paper, two FFS methods based on the pruning of classifier ensembles, in which each classifier is generated from a single attribute, are proposed for gene selection. The main contributions are as follows: (1) a new loss function, the p-insensitive loss function, is proposed to overcome a disadvantage of the margin Euclidean distance loss function in the pruning of classifier ensembles; (2) two FFS methods based on the margin Euclidean distance loss function and the p-insensitive loss function, named FFS-ACSA1 and FFS-ACSA2 respectively, are proposed; (3) comparison experiments on four gene expression datasets show that FFS-ACSA2 obtains the best results among the three FFS methods (i.e. the signal-to-noise ratio (SNR) method, FFS-ACSA1 and FFS-ACSA2) and is competitive with the well-known support vector machine-based recursive feature elimination (SVM-RFE), while FFS-ACSA1 is unstable.

Introduction

Microarray data pose a great challenge to conventional data analysis: a large number of genes combined with a relatively small number of samples. Many studies in microarray data analysis reveal that gene selection, or feature selection, is more significant than the choice of classification algorithm [1], [2], [3], [4], [5], [6], [7]. A simple algorithm with proper feature selection may achieve the same or even better performance than a complex algorithm. In addition, feature selection can also improve the computational efficiency of the classification algorithm. Feature selection is therefore a key issue in microarray data analysis.

According to the direction of feature selection, the existing methods can be mainly divided into two types: backward and forward. At each iteration step, a backward feature selection (BFS) method eliminates the least important features, while a forward feature selection (FFS) method selects the most important ones. BFS methods are widely used in feature selection for gene expression data. The well-known support vector machine-based recursive feature elimination (SVM-RFE) proposed by Guyon et al. [2] is a representative BFS method (a minimal sketch is given below). Many scholars [5], [6], [7], [8] have developed improvements and extensions of SVM-RFE. In contrast, there are only a few FFS methods, such as the signal-to-noise ratio (SNR) method proposed by Golub et al. [1], SVM-IRFS proposed by Zhou et al. [9] and the incremental forward feature selection proposed by Lee et al. [10].
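For readers unfamiliar with the baseline, the following is a minimal sketch of SVM-RFE using scikit-learn's RFE wrapper. SVM-RFE trains a linear SVM, ranks features by the squared weights, removes the lowest-ranked features, and repeats. The synthetic data, target subset size and elimination step below are illustrative assumptions, not the settings used in this paper.

```python
# Minimal SVM-RFE sketch (illustrative; not this paper's exact protocol).
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))               # 60 samples, 500 synthetic "genes"
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)   # labels driven by two features

# A linear kernel is required so that the feature weights w are available.
selector = RFE(estimator=SVC(kernel="linear", C=1.0),
               n_features_to_select=10,      # assumed target subset size
               step=0.1)                     # remove 10% of features per round
selector.fit(X, y)
print("selected feature indices:", np.flatnonzero(selector.support_))
```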

An advantage of FFS methods is that they may obtain the desired feature subset in fewer iterations, whereas BFS methods usually need to iterate many times. This advantage is even more pronounced when computational efficiency and/or a small desired feature subset are primary concerns. However, FFS methods tend to miss complementary features or select redundant ones, so they often do not separate the data well. For example, the well-known SNR method can quickly rank features by importance, but the selected features are often redundant and miss complementary genes, so it is not as good as SVM-RFE [2]. Therefore, a key issue for FFS methods is how to select complementary features and remove redundant ones.

Martínez-Muñoz and Suárez [11] proposed a pruning method (MSPM) for bagging classifier ensembles. MSPM is a forward method: at each iteration step it adds to the growing aggregation the classifier that optimizes a given objective function, and the objective function is designed to favor complementary classifiers and to exclude redundant ones. MSPM achieves a lower generalization error with fewer classifiers than the original bagging ensemble. Since MSPM is a forward method, it can be applied to the forward feature selection problem. Based on this idea, we propose two FFS methods, named FFS-ACSA1 and FFS-ACSA2. In both methods, we first construct classifiers based on the signal-to-noise ratio of a single attribute (here each feature corresponds to one classifier), then apply MSPM to select a good combination of classifiers, i.e. a good feature subset; a sketch of this greedy scheme is given below. The difference between the two methods is that FFS-ACSA1 uses the Euclidean distance loss function while FFS-ACSA2 uses the p-insensitive loss function. To investigate the performance of the two FFS methods, we conduct comparison experiments with FFS-ACSA1, FFS-ACSA2, SNR and SVM-RFE on four gene expression datasets. The experimental results reveal that FFS-ACSA2 improves the classification performance compared with the traditional SNR method and obtains a performance competitive with the well-known SVM-RFE.
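To make the pruning step concrete, below is a minimal sketch of greedy forward aggregation under the margin Euclidean distance loss. The signature-vector encoding C[t, i] = y_i h_t(x_i) and the all-ones reference point are assumptions modeled on Martínez-Muñoz and Suárez's ordered aggregation [11], not the exact objective of FFS-ACSA1.

```python
# Greedy forward pruning of a classifier pool under a Euclidean distance
# loss on the ensemble "signature" vector (a sketch in the spirit of MSPM;
# the reference vector o is an illustrative assumption).
import numpy as np

def forward_prune(C: np.ndarray, k: int) -> list:
    """C[t, i] = y_i * h_t(x_i) in {-1, +1}; greedily select k of T classifiers.

    At each step, the classifier whose inclusion moves the averaged signature
    vector closest to the reference point o joins the aggregation (k <= T).
    """
    T, l = C.shape
    o = np.ones(l)                 # reference point: all samples correct
    selected = []
    acc = np.zeros(l)              # running sum of selected signatures
    for _ in range(k):
        best_t, best_d = -1, np.inf
        for t in range(T):
            if t in selected:
                continue
            sig = (acc + C[t]) / (len(selected) + 1)
            d = np.linalg.norm(sig - o)   # Euclidean distance loss
            if d < best_d:
                best_t, best_d = t, d
        selected.append(best_t)
        acc += C[best_t]
    return selected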

The rest of this paper is organized as follows. Section 2 briefly introduces the pruning method for bagging classifier ensembles. In Section 3, we present the framework of FFS-ACSA1 and FFS-ACSA2. In Section 4, we carry out experiments on four gene expression datasets to compare the classification performance of the classical SNR method, FFS-ACSA1, FFS-ACSA2 and the well-known SVM-RFE. Finally, some conclusions and discussions are given in Section 5.


The aggregation of classifiers

The goal of a classification algorithm is to generate a classification function from a given training dataset so that it can predict the labels of unseen samples well. For the sake of simplicity, we only consider the binary classification problem with a given training dataset
$$T_{\mathrm{data}} = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\; y_i \in \{-1, 1\},\; i = 1, 2, \ldots, l\}, \tag{1}$$
where $l$ is the number of samples and $n$ is the number of features (attributes).

Classifier ensembles often help to achieve a better classification performance than a single

Forward feature selection based on the aggregation of classifiers generated by single attribute

In this section, we construct two FFS methods based on the pruning of classifier ensembles in which each classifier is generated from a single attribute.

Let us consider the feature selection problem for binary classification with the training dataset $T_{\mathrm{data}}$ given in (1). The task is to select a feature subset $I$ with good classification performance from the feature set $\{1, 2, \ldots, n\}$.

In order to apply MSPM, we first construct a classifier for each single attribute. Let
$$s_i = \frac{\mu_+(i) - \mu_-(i)}{\sigma_+(i) + \sigma_-(i)},$$
where $\mu_+(i)$ and $\mu_-(i)$ denote the mean values of attribute $i$ over the positive and negative samples, and $\sigma_+(i)$ and $\sigma_-(i)$ the corresponding standard deviations.
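As a concrete illustration of this step, the sketch below computes $s_i$ for every attribute from the class-conditional statistics. The midpoint-threshold classifier at the end is a hypothetical example of a single-attribute classifier, since the snippet truncates before the paper's exact construction.

```python
# Per-gene signal-to-noise scores s_i = (mu_+ - mu_-) / (sigma_+ + sigma_-),
# plus a hypothetical single-attribute threshold classifier per gene.
import numpy as np

def snr_scores(X, y):
    """X: (l, n) expression matrix; y: labels in {-1, +1}. Returns (n,) scores."""
    pos, neg = X[y == 1], X[y == -1]
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    sd_p, sd_n = pos.std(axis=0), neg.std(axis=0)
    return (mu_p - mu_n) / (sd_p + sd_n + 1e-12)  # epsilon guards zero spread

def single_attribute_classifier(X, y, i):
    """Hypothetical classifier on gene i: threshold at the midpoint of the
    two class means, oriented by the sign of s_i."""
    pos, neg = X[y == 1, i], X[y == -1, i]
    b = (pos.mean() + neg.mean()) / 2.0
    sign = 1.0 if pos.mean() >= neg.mean() else -1.0
    return lambda Xnew: np.where(sign * (Xnew[:, i] - b) >= 0, 1, -1)
```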

Datasets

Four public datasets, ALL-AML Leukemia (Leukemia), DLBCL, Colon Tumor (Colon) and Duke, are used to investigate the performance of SNR, FFS-ACSA1, FFS-ACSA2 and SVM-RFE. Table 1 lists the basic information of these datasets. More details can be found at the source websites and in the references therein.

Experimental program

Considering the excellent performance of SVM-RFE in feature selection for gene expression datasets, we use it as a benchmark for SNR, FFS-ACSA1 and FFS-ACSA2. In order to accelerate SVM-RFE, we

Conclusions and discussions

In this paper, we first generate classifiers from the signal-to-noise ratio of each single attribute. We then apply pruning of the aggregation of classifiers to select a feature subset. Based on the traditional Euclidean distance loss function and the p-insensitive loss function presented in this paper, we propose two FFS methods: FFS-ACSA1 and FFS-ACSA2. The comparison experiments on four gene expression datasets reveal that FFS-ACSA2 is not only superior to the SNR method, but also achieves a performance competitive with the well-known SVM-RFE.

Acknowledgments

This work was partially supported by the Fundamental Research Funds of China for the Central Universities under Grant no. 2010121065 and the Natural Science Foundation of Fujian Province of China under Grant no. 2009J05153.

References

  • T.R. Golub, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science (1999).
  • I. Guyon, Gene selection for cancer classification using support vector machines, Machine Learning (2002).
  • M. Yousef et al., Recursive cluster elimination (RCE) for classification and feature selection from gene expression data, BMC Bioinformatics (2007).
  • L.K. Luo et al., Improving the computational efficiency of recursive cluster elimination for gene selection, IEEE/ACM Transactions on Computational Biology and Bioinformatics (2011).
  • K.B. Duan, Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Transactions on Nanobioscience (2005).
  • X. Zhou et al., MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, Bioinformatics (2006).
  • S. Deegalla, H. Boström, Classification of microarrays with kNN: comparison of dimensionality reduction methods, in:...
  • Y. Ding et al., Improving the performance of SVM-RFE to select genes in microarray data, BMC Bioinformatics (2006).
  • X. Zhou, X.Y. Wu, K.Z. Mao, P. Tuck David, Fast gene selection for microarray data using SVM-based evaluation...