Methods of forward feature selection based on the aggregation of classifiers generated by single attribute
Introduction
Microarray data pose a great challenge to conventional data analysis: the number of genes is large while the number of samples is relatively small. Many studies in microarray data analysis reveal that gene selection (feature selection) matters more than the choice of classification algorithm [1], [2], [3], [4], [5], [6], [7]: a simple algorithm with proper feature selection may achieve the same or even better performance than a complex one. In addition, feature selection can improve the computational efficiency of the classification algorithm. Feature selection is therefore a key issue in microarray data analysis.
According to the direction of search, existing feature selection methods can be divided into two main types: backward and forward. At each iteration, a backward feature selection (BFS) method eliminates the least important features, while a forward feature selection (FFS) method selects the most important ones. BFS methods are widely used for feature selection on gene expression data; the well-known support vector machine-based recursive feature elimination (SVM-RFE) proposed by Guyon et al. [2] is a representative example, and many researchers [5], [6], [7], [8] have proposed improvements and extensions of SVM-RFE. In contrast, there are only a few FFS methods, such as the signal-to-noise ratio (SNR) method proposed by Golub et al. [1], SVM-IRFS proposed by Zhou et al. [9], and the incremental forward feature selection proposed by Lee et al. [10].
An advantage of FFS methods is that they may obtain the desired feature subset in fewer iterations, whereas BFS methods usually need to iterate many times. This advantage is even more pronounced when computational efficiency matters or when a small feature subset is desired. However, FFS methods tend to miss complementary features and select redundant ones, so they often do not separate the data well. For example, the well-known SNR method ranks feature importance quickly, but the selected features are often redundant and miss complementary genes, so it does not perform as well as SVM-RFE [2]. A key issue in designing an FFS method is therefore how to select complementary features while removing redundant ones.
Martínez-Muñoz and Suárez [11] proposed a pruning method (MSPM) for bagging classifier ensembles. MSPM is a forward method: at each iteration, it adds to the growing aggregation the classifier that best optimizes a given objective function, where selecting complementary classifiers and removing redundant ones is the main consideration in the design of that objective. MSPM achieves a lower generalization error with fewer classifiers than the original bagging ensemble. Since MSPM is a forward method, it can be applied to the forward feature selection problem. Based on this idea, we propose two FFS methods, named FFS-ACSA1 and FFS-ACSA2. In both methods, we first construct classifiers based on the signal-to-noise ratio of single attributes (here each feature corresponds to one classifier), then apply MSPM to select a good combination of classifiers (i.e., a good feature subset). The two methods differ in their loss functions: FFS-ACSA1 uses the Euclidean distance loss, while FFS-ACSA2 uses the p-insensitive loss. To investigate their performance, we conduct comparison experiments of FFS-ACSA1, FFS-ACSA2, SNR, and SVM-RFE on four gene expression datasets. The experimental results reveal that FFS-ACSA2 improves the classification performance compared with the traditional SNR method and obtains performance competitive with the well-known SVM-RFE.
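To make the idea concrete, the following is a minimal sketch of such a forward, MSPM-style selection loop, assuming each single-attribute classifier outputs ±1 predictions and using the Euclidean distance between the ensemble's average vote and the labels as the objective (the FFS-ACSA1 flavor). The function name `forward_select` and the precomputed `preds` matrix are illustrative choices, not the authors' implementation.

```python
import numpy as np

def forward_select(preds, y, k):
    """Greedy forward selection of single-attribute classifiers (MSPM-style sketch).

    preds : (n_features, n_samples) array of +/-1 predictions, one row per
            single-attribute classifier.
    y     : (n_samples,) array of +/-1 labels.
    k     : number of classifiers (i.e. features) to select.
    Returns the indices of the selected features, in order of selection.
    """
    selected = []
    ensemble_sum = np.zeros(len(y), dtype=float)
    remaining = set(range(preds.shape[0]))
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in remaining:
            # Average vote of the tentative ensemble after adding classifier j.
            avg = (ensemble_sum + preds[j]) / (len(selected) + 1)
            # Euclidean distance loss between ensemble vote and true labels.
            loss = np.linalg.norm(avg - y)
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
        ensemble_sum += preds[best_j]
        remaining.remove(best_j)
    return selected
```

Because the loss is evaluated on the aggregated vote rather than on each classifier in isolation, a feature that complements the already selected ones can win over an individually stronger but redundant feature, which is precisely the behavior motivating this design.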
The rest of this paper is organized as follows. Section 2 briefly introduces the pruning method for bagging classifier ensembles. In Section 3, we present the framework of FFS-ACSA1 and FFS-ACSA2. In Section 4, we carry out experiments on four gene expression datasets to compare the classification performance of SNR, FFS-ACSA1, FFS-ACSA2, and SVM-RFE. Finally, conclusions and discussions are given in Section 5.
The aggregation of classifiers
The goal of a classification algorithm is to learn, from a given training dataset, a classification function that predicts the labels of unseen samples well. For simplicity, we consider only the binary classification problem with the given training dataset
$$T_{\mathrm{data}} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}, \quad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\}, \qquad (1)$$
where $l$ is the number of samples and $n$ is the number of features (attributes).
Classifier ensembles often achieve better classification performance than a single classifier.
Forward feature selection based on the aggregation of classifiers generated by single attribute
In this section, we construct two FFS methods based on pruning a classifier ensemble in which each classifier is generated from a single attribute.
Let us consider the feature selection problem for binary classification with the training dataset $T_{\mathrm{data}}$ given in (1). The task is to select, from the full feature set $\{1, 2, \ldots, n\}$, a feature subset $I$ with good classification performance.
In order to apply MSPM, we first construct a classifier from each single attribute. Let
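The paper's exact construction is cut off in the snippet above; as a hedged illustration, the sketch below builds one stump-like classifier per feature from the class-conditional means and standard deviations, the same statistics that define the SNR score of Golub et al. [1]. The names `snr_scores` and `single_attribute_predict` are our own, not the authors'.

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio of each feature (Golub et al. [1]):
    (mu_pos - mu_neg) / (sigma_pos + sigma_neg), computed per column.
    X : (l, n) expression matrix; y : (l,) labels in {-1, +1}."""
    pos, neg = X[y == 1], X[y == -1]
    return (pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0))

def single_attribute_predict(X, y, j):
    """+/-1 predictions of a one-feature classifier on feature j: a sample
    is assigned to the class whose mean it is closer to, with the decision
    threshold at the midpoint of the two class means."""
    pos, neg = X[y == 1, j], X[y == -1, j]
    threshold = (pos.mean() + neg.mean()) / 2.0
    sign = 1.0 if pos.mean() >= neg.mean() else -1.0
    return np.where(sign * (X[:, j] - threshold) >= 0, 1, -1)
```

Stacking `single_attribute_predict(X, y, j)` over all features j yields the `preds` matrix consumed by the forward selection sketch given earlier.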
Datasets
Four public datasets, ALL-AML Leukemia (Leukemia), DLBCL, Colon Tumor (Colon), and Duke, are used to investigate the performance of SNR, FFS-ACSA1, FFS-ACSA2, and SVM-RFE. Table 1 lists the basic information of these datasets; more details can be found at the source websites and in the references therein.
Experimental program
Considering the excellent performance of SVM-RFE in feature selection for gene expression datasets, we use it as a benchmark for SNR, FFS-ACSA1, and FFS-ACSA2. In order to accelerate SVM-RFE, we
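The snippet above is truncated, so the paper's specific acceleration is not reproduced here. For reference, the following is a minimal sketch of the standard SVM-RFE baseline of Guyon et al. [2], assuming scikit-learn's linear `SVC`; dropping a fixed fraction of features per iteration (rather than one at a time) is a common speed-up, which may or may not match the authors' choice.

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y, n_keep, drop_frac=0.1):
    """Standard SVM-RFE: repeatedly train a linear SVM, rank features by
    squared weight, and eliminate the lowest-ranked ones until n_keep remain.
    X : (l, n) data matrix; y : (l,) labels in {-1, +1}."""
    active = np.arange(X.shape[1])
    while active.size > n_keep:
        w = SVC(kernel="linear").fit(X[:, active], y).coef_.ravel()
        order = np.argsort(w ** 2)  # least important features first
        n_drop = max(1, min(int(drop_frac * active.size), active.size - n_keep))
        active = np.delete(active, order[:n_drop])
    return active
```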
Conclusions and discussions
In this paper, we first generate classifiers from the signal-to-noise ratio of each single attribute. We then apply pruning of the classifier aggregation to select a feature subset. Based on the traditional Euclidean distance loss function and the p-insensitive loss function presented in this paper, we propose two FFS methods: FFS-ACSA1 and FFS-ACSA2. Comparison experiments on four gene expression datasets reveal that FFS-ACSA2 is not only superior to the SNR method, but also achieves performance competitive with the well-known SVM-RFE.
Acknowledgments
This work was partially supported by the Fundamental Research Funds for the Central Universities of China under Grant no. 2010121065 and the Natural Science Foundation of Fujian Province of China under Grant no. 2009J05153.
References (18)
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science (1999).
- Gene selection for cancer classification using support vector machines, Machine Learning (2002).
- Recursive cluster elimination (RCE) for classification and feature selection from gene expression data, BMC Bioinformatics (2007).
- Improving the computational efficiency of recursive cluster elimination for gene selection, IEEE/ACM Transactions on Computational Biology and Bioinformatics (2011).
- Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Transactions on Nanobioscience (2005).
- MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, Bioinformatics (2006).
- S. Deegalla, H. Boström, Classification of microarrays with kNN: comparison of dimensionality reduction methods, in: ...
- Improving the performance of SVM-RFE to select genes in microarray data, BMC Bioinformatics (2006).
- X. Zhou, X.Y. Wu, K.Z. Mao, D.P. Tuck, Fast gene selection for microarray data using SVM-based evaluation ...