Methods of forward feature selection based on the aggregation of classifiers generated by single attribute
Introduction
Microarray data pose a great challenge to conventional data analysis: the number of genes is large while the number of samples is relatively small. Many studies in microarray data analysis reveal that gene selection (feature selection) matters more than the choice of classification algorithm [1], [2], [3], [4], [5], [6], [7]: a simple algorithm with proper feature selection may achieve the same or even better performance than a complex one. In addition, feature selection can improve the computational efficiency of the classification algorithm. Feature selection is therefore a key issue in microarray data analysis.
According to the direction of search, existing feature selection methods can be divided into two main types: backward and forward. At each iteration, a backward feature selection (BFS) method eliminates the least important features, while a forward feature selection (FFS) method selects the most important ones. BFS methods are widely used for feature selection on gene expression data; the well-known support vector machine-based recursive feature elimination (SVM-RFE) proposed by Guyon et al. [2] is a representative example, and many researchers [5], [6], [7], [8] have proposed improvements and extensions of SVM-RFE. In contrast, there are only a few FFS methods, such as the signal-to-noise ratio (SNR) method proposed by Golub et al. [1], SVM-IRFS proposed by Zhou et al. [9], and the incremental forward feature selection proposed by Lee et al. [10].
An advantage of FFS methods is that they may obtain the desired feature subset in fewer iterations, whereas BFS methods usually need to iterate many times. This advantage is even more pronounced when computational efficiency matters or when a small feature subset is desired. However, FFS methods tend to miss complementary features and select redundant ones, so they often do not separate the data well. For example, the well-known SNR method ranks feature importance quickly, but the selected features are often redundant and miss complementary genes, so it does not perform as well as SVM-RFE [2]. A key issue in designing an FFS method is therefore how to select complementary features while removing redundant ones.
Martínez-Muñoz and Suárez [11] proposed a pruning method (MSPM) for bagging classifier ensembles. MSPM is a forward method: at each iteration, it adds to the growing aggregation the classifier that best optimizes a given objective function, where selecting complementary classifiers and removing redundant ones is the main consideration in the design of that objective. MSPM achieves a lower generalization error with fewer classifiers than the original bagging ensemble. Since MSPM is a forward method, it can be applied to the forward feature selection problem. Based on this idea, we propose two FFS methods, named FFS-ACSA1 and FFS-ACSA2. In both methods, we first construct classifiers based on the signal-to-noise ratio of single attributes (here each feature corresponds to one classifier), then apply MSPM to select a good combination of classifiers (i.e., a good feature subset). The two methods differ in their loss functions: FFS-ACSA1 uses the Euclidean distance loss, while FFS-ACSA2 uses the p-insensitive loss. To investigate their performance, we conduct comparison experiments of FFS-ACSA1, FFS-ACSA2, SNR, and SVM-RFE on four gene expression datasets. The experimental results reveal that FFS-ACSA2 improves the classification performance compared with the traditional SNR method and obtains performance competitive with the well-known SVM-RFE.
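To make the idea concrete, the following is a minimal sketch of such a forward, MSPM-style selection loop, assuming each single-attribute classifier outputs ±1 predictions and using the Euclidean distance between the ensemble's average vote and the labels as the objective (the FFS-ACSA1 flavor). The function name `forward_select` and the precomputed `preds` matrix are illustrative choices, not the authors' implementation.

```python
import numpy as np

def forward_select(preds, y, k):
    """Greedy forward selection of single-attribute classifiers (MSPM-style sketch).

    preds : (n_features, n_samples) array of +/-1 predictions, one row per
            single-attribute classifier.
    y     : (n_samples,) array of +/-1 labels.
    k     : number of classifiers (i.e. features) to select.
    Returns the indices of the selected features, in order of selection.
    """
    selected = []
    ensemble_sum = np.zeros(len(y), dtype=float)
    remaining = set(range(preds.shape[0]))
    for _ in range(k):
        best_j, best_loss = None, np.inf
        for j in remaining:
            # Average vote of the tentative ensemble after adding classifier j.
            avg = (ensemble_sum + preds[j]) / (len(selected) + 1)
            # Euclidean distance loss between ensemble vote and true labels.
            loss = np.linalg.norm(avg - y)
            if loss < best_loss:
                best_j, best_loss = j, loss
        selected.append(best_j)
        ensemble_sum += preds[best_j]
        remaining.remove(best_j)
    return selected
```

Because the loss is evaluated on the aggregated vote rather than on each classifier in isolation, a feature that complements the already selected ones can win over an individually stronger but redundant feature, which is precisely the behavior motivating this design.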
The rest of this paper is organized as follows. Section 2 briefly introduces the pruning method for bagging classifier ensembles. In Section 3, we present the framework of FFS-ACSA1 and FFS-ACSA2. In Section 4, we carry out experiments on four gene expression datasets to compare the classification performance of SNR, FFS-ACSA1, FFS-ACSA2, and SVM-RFE. Finally, conclusions and discussions are given in Section 5.
The aggregation of classifiers
The goal of a classification algorithm is to learn, from a given training dataset, a classification function that predicts the labels of unseen samples well. For simplicity, we consider only the binary classification problem with the given training dataset
$$T_{\mathrm{data}} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}, \quad x_i \in \mathbb{R}^n,\; y_i \in \{-1, +1\}, \qquad (1)$$
where $l$ is the number of samples and $n$ is the number of features (attributes).
Classifier ensembles often achieve better classification performance than a single classifier.
Forward feature selection based on the aggregation of classifiers generated by single attribute
In this section, we construct two FFS methods based on pruning a classifier ensemble in which each classifier is generated from a single attribute.
Let us consider the feature selection problem for binary classification with the training dataset $T_{\mathrm{data}}$ given in (1). The task is to select, from the full feature set $\{1, 2, \ldots, n\}$, a feature subset $I$ with good classification performance.
In order to apply MSPM, we first construct a classifier from each single attribute. Let
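The paper's exact construction is cut off in the snippet above; as a hedged illustration, the sketch below builds one stump-like classifier per feature from the class-conditional means and standard deviations, the same statistics that define the SNR score of Golub et al. [1]. The names `snr_scores` and `single_attribute_predict` are our own, not the authors'.

```python
import numpy as np

def snr_scores(X, y):
    """Signal-to-noise ratio of each feature (Golub et al. [1]):
    (mu_pos - mu_neg) / (sigma_pos + sigma_neg), computed per column.
    X : (l, n) expression matrix; y : (l,) labels in {-1, +1}."""
    pos, neg = X[y == 1], X[y == -1]
    return (pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0))

def single_attribute_predict(X, y, j):
    """+/-1 predictions of a one-feature classifier on feature j: a sample
    is assigned to the class whose mean it is closer to, with the decision
    threshold at the midpoint of the two class means."""
    pos, neg = X[y == 1, j], X[y == -1, j]
    threshold = (pos.mean() + neg.mean()) / 2.0
    sign = 1.0 if pos.mean() >= neg.mean() else -1.0
    return np.where(sign * (X[:, j] - threshold) >= 0, 1, -1)
```

Stacking `single_attribute_predict(X, y, j)` over all features j yields the `preds` matrix consumed by the forward selection sketch given earlier.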
Datasets
Four public datasets, ALL-AML Leukemia (Leukemia), DLBCL, Colon Tumor (Colon), and Duke, are used to investigate the performance of SNR, FFS-ACSA1, FFS-ACSA2, and SVM-RFE. Table 1 lists the basic information of these datasets; more details can be found at the source websites and in the references therein.
Experimental program
Considering the excellent performance of SVM-RFE in feature selection for gene expression datasets, we use it as a benchmark for SNR, FFS-ACSA1, and FFS-ACSA2. In order to accelerate SVM-RFE, we
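The snippet above is truncated, so the paper's specific acceleration is not reproduced here. For reference, the following is a minimal sketch of the standard SVM-RFE baseline of Guyon et al. [2], assuming scikit-learn's linear `SVC`; dropping a fixed fraction of features per iteration (rather than one at a time) is a common speed-up, which may or may not match the authors' choice.

```python
import numpy as np
from sklearn.svm import SVC

def svm_rfe(X, y, n_keep, drop_frac=0.1):
    """Standard SVM-RFE: repeatedly train a linear SVM, rank features by
    squared weight, and eliminate the lowest-ranked ones until n_keep remain.
    X : (l, n) data matrix; y : (l,) labels in {-1, +1}."""
    active = np.arange(X.shape[1])
    while active.size > n_keep:
        w = SVC(kernel="linear").fit(X[:, active], y).coef_.ravel()
        order = np.argsort(w ** 2)  # least important features first
        n_drop = max(1, min(int(drop_frac * active.size), active.size - n_keep))
        active = np.delete(active, order[:n_drop])
    return active
```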
Conclusions and discussions
In this paper, we first generate classifiers from the signal-to-noise ratio of each single attribute. We then apply pruning of the classifier aggregation to select a feature subset. Based on the traditional Euclidean distance loss function and the p-insensitive loss function presented in this paper, we propose two FFS methods: FFS-ACSA1 and FFS-ACSA2. Comparison experiments on four gene expression datasets reveal that FFS-ACSA2 is not only superior to the SNR method, but also achieves performance competitive with the well-known SVM-RFE.
Acknowledgments
This work was partially supported by the Fundamental Research Funds for the Central Universities of China under Grant no. 2010121065 and the Natural Science Foundation of Fujian Province of China under Grant no. 2009J05153.
References (18)
- Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science (1999).
- Gene selection for cancer classification using support vector machines, Machine Learning (2002).
- Recursive cluster elimination (RCE) for classification and feature selection from gene expression data, BMC Bioinformatics (2007).
- Improving the computational efficiency of recursive cluster elimination for gene selection, IEEE/ACM Transactions on Computational Biology and Bioinformatics (2011).
- Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Transactions on Nanobioscience (2005).
- MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, Bioinformatics (2006).
- S. Deegalla, H. Boström, Classification of microarrays with kNN: comparison of dimensionality reduction methods, in: ...
- Improving the performance of SVM-RFE to select genes in microarray data, BMC Bioinformatics (2006).
- X. Zhou, X.Y. Wu, K.Z. Mao, D.P. Tuck, Fast gene selection for microarray data using SVM-based evaluation ...