Feature selection methods on gene expression microarray data for cancer classification: A systematic review

doi:10.1016/j.compbiomed.2021.105051

Computers in Biology and Medicine

Volume 140, January 2022, 105051

Abstract

This systematic review gives researchers interested in feature selection (FS) for microarray data a comprehensive overview of the main research directions in gene expression classification over the past seven years. A set of 132 papers from three publishers is reviewed. The papers are categorized into nine directions based on their objectives, and the directions are then summarized according to the level of attention each received. The review found that ‘propose hybrid FS methods’ was the most studied direction, accounting for 34.9% of the papers, while the remaining directions ranged from 13.6% down to 3%. This helps researchers choose a competitive research direction. Papers in each category are reviewed from six perspectives: method(s), classifier(s), dataset(s), dataset dimension range, performance metric(s), and result(s) achieved.

Introduction

Classifying microarray datasets, a form of large-scale biological data analysis, is a popular and attractive area of study, as microarray technology is one of the most important applications in molecular biology for cancer detection [1]. The task depends on developing effective classification models that, once trained on a specific training dataset, can classify unseen microarray data. Detecting and classifying cancer from microarray gene expression data poses a major challenge for computer science researchers, because these datasets contain a small number of examples but a huge number of genes. Many of these genes are irrelevant or redundant and must be removed with an efficient FS method to improve classification performance. Researchers have therefore put considerable effort into devising more effective FS techniques that increase classification accuracy and decrease computation time while using fewer genes in the diagnostic and prognostic prediction of cancer [2].

FS is the process of selecting the most relevant and informative features to improve classification performance on high-dimensional datasets [3]. The main types of FS methods are filters, wrappers, embedded methods, and hybrid methods; a more recently developed type is the ensemble method [4].

Filters use statistical measures to evaluate features against a class label. There are two categories of filters: ranking-based (univariate) and search-space-based (multivariate). The first category ranks each feature by its relationship with the class label and keeps the features whose rank passes a specific threshold, which removes the most irrelevant features. The second category also considers the relationships among features, so it can remove redundant features as well as irrelevant ones [5].
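As an illustration of the ranking-based (univariate) category, the following is a minimal sketch, not a method taken from any reviewed paper. It assumes a two-class problem, scores each gene with a Fisher-style ratio (squared gap between class means over the summed class variances), and keeps the top-k genes. The function names `fisher_score` and `rank_filter` and the toy data are hypothetical.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Two-class Fisher-style score: class-mean separation over within-class spread."""
    a = [v for v, lab in zip(values, labels) if lab == 0]
    b = [v for v, lab in zip(values, labels) if lab == 1]
    spread = pvariance(a) + pvariance(b)
    # Small constant avoids division by zero for constant features.
    return (mean(a) - mean(b)) ** 2 / (spread + 1e-12)

def rank_filter(X, labels, k):
    """Score every column of X independently and return the k best column indices."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order[:k], scores

# Toy data (invented): 6 samples (rows) x 4 genes (columns); 0 = normal, 1 = tumor.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.1],
    [0.9, 6.0, 0.3, 2.9],
    [3.0, 5.5, 0.2, 3.0],
    [3.1, 4.5, 0.1, 3.2],
    [2.9, 5.0, 0.3, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

selected, scores = rank_filter(X, y, k=1)
print(selected)  # gene 0 is the only column whose class means clearly differ
```

Because each gene is scored on its own, a sketch like this cannot detect two genes that carry the same information; handling that redundancy is what the multivariate category adds.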

Wrappers rely on classifier evaluation to select features: they choose the feature subset that achieves the best fitness value for a given classifier. A wrapper consists of three parts: a search algorithm, a classifier, and a fitness function [6]. Embedded methods, in contrast, select features that improve classification performance automatically as part of the learning stage [2]. See Fig. 1.
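The three parts of a wrapper can be sketched as follows. This is an illustrative example under stated assumptions rather than a method from the reviewed literature: the search algorithm is greedy forward selection, the classifier is a nearest-centroid rule, and the fitness function is leave-one-out accuracy. All function names and the toy data are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x, feats):
    """Classifier: predict the label whose class centroid (over feats) is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in feats]
        dist = sum((x[j] - c) ** 2 for j, c in zip(feats, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def fitness(X, y, feats):
    """Fitness function: leave-one-out accuracy using only the features in feats."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i], feats) == y[i]
    return hits / len(X)

def forward_selection(X, y, max_feats):
    """Search algorithm: add the feature that most improves fitness until no gain."""
    selected, best = [], 0.0
    while len(selected) < max_feats:
        candidates = [j for j in range(len(X[0])) if j not in selected]
        scored = [(fitness(X, y, selected + [j]), j) for j in candidates]
        # Highest fitness wins; ties go to the lowest feature index.
        top_fit, top_j = max(scored, key=lambda t: (t[0], -t[1]))
        if top_fit <= best:
            break
        selected.append(top_j)
        best = top_fit
    return selected, best

# Toy data (invented): 6 samples x 4 genes; 0 = normal, 1 = tumor.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.1],
    [0.9, 6.0, 0.3, 2.9],
    [3.0, 5.5, 0.2, 3.0],
    [3.1, 4.5, 0.1, 3.2],
    [2.9, 5.0, 0.3, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

selected, best = forward_selection(X, y, max_feats=2)
print(selected, best)
```

Every candidate subset requires retraining and re-evaluating the classifier, which is why wrappers cost more than filters as the number of genes grows.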

Ensemble methods have recently emerged to provide the stability that many FS methods lack. They achieve this by aggregating the results of different feature subsets, generated either by applying the same FS method to different training data (homogeneous ensemble FS approach) or by applying different FS methods to the same training data (heterogeneous ensemble FS approach) [7]. See Fig. 1 (d and e).
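A homogeneous ensemble can be sketched as below. This is a simplified illustration, not the procedure of [7]: it assumes the same ranking filter (absolute gap between class means) is applied to several variants of the training data, and that the per-variant ranks are aggregated by summation. To keep the example deterministic, the variants are leave-one-out subsamples, used here as a stand-in for random resampling. All names and data are hypothetical.

```python
def mean_gap(X, y, j):
    """Base filter score: absolute difference between the two class means of gene j."""
    a = [r[j] for r, lab in zip(X, y) if lab == 0]
    b = [r[j] for r, lab in zip(X, y) if lab == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))

def rank_features(X, y):
    """Return the rank position of every feature (0 = best) under the base filter."""
    n = len(X[0])
    order = sorted(range(n), key=lambda j: mean_gap(X, y, j), reverse=True)
    ranks = [0] * n
    for pos, j in enumerate(order):
        ranks[j] = pos
    return ranks

def homogeneous_ensemble(X, y, k):
    """Apply the same filter to each training variant and aggregate the rank positions."""
    n = len(X[0])
    totals = [0] * n
    for i in range(len(X)):
        # Variant i drops sample i (deterministic stand-in for resampling).
        sub_X = X[:i] + X[i + 1:]
        sub_y = y[:i] + y[i + 1:]
        for j, r in enumerate(rank_features(sub_X, sub_y)):
            totals[j] += r
    # Lowest summed rank means the feature was ranked highly most consistently.
    top = sorted(range(n), key=lambda j: totals[j])[:k]
    return top, totals

# Toy data (invented): 6 samples x 4 genes; 0 = normal, 1 = tumor.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.1],
    [0.9, 6.0, 0.3, 2.9],
    [3.0, 5.5, 0.2, 3.0],
    [3.1, 4.5, 0.1, 3.2],
    [2.9, 5.0, 0.3, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

top, totals = homogeneous_ensemble(X, y, k=2)
print(top, totals)
```

A heterogeneous variant would keep the training data fixed and loop over different scoring functions instead of different subsamples, aggregating the resulting ranks in the same way.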

Each approach has advantages and disadvantages. For example, filter-based FS methods have lower computational cost and are less prone to over-fitting than wrapper and embedded FS methods, whereas wrapper and embedded methods provide better accuracy than filters. Hybrid methods combine the advantages of wrappers and filters: they achieve better accuracy than filters at lower computational cost than wrappers. Ensemble methods are the most flexible FS methods for high-dimensional data and the least prone to over-fitting [2].
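The cost argument for hybrid methods can be made concrete with a small sketch. It is an assumption-laden illustration, not a specific hybrid method from the literature: a cheap filter first keeps m candidate genes, and a wrapper then searches subsets of only those m candidates, scoring each subset by the leave-one-out accuracy of a 1-nearest-neighbour classifier. The names `filter_stage`, `loo_1nn_accuracy`, and `hybrid_select` and the toy data are hypothetical.

```python
from itertools import combinations

def filter_stage(X, y, m):
    """Cheap univariate pre-filter: keep the m genes with the largest class-mean gap."""
    def gap(j):
        a = [r[j] for r, lab in zip(X, y) if lab == 0]
        b = [r[j] for r, lab in zip(X, y) if lab == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=gap, reverse=True)[:m]

def loo_1nn_accuracy(X, y, feats):
    """Wrapper fitness: leave-one-out accuracy of 1-nearest-neighbour on feats."""
    hits = 0
    for i, x in enumerate(X):
        others = [k for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda k: sum((x[j] - X[k][j]) ** 2 for j in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def hybrid_select(X, y, m, max_size):
    """Filter down to m candidates, then exhaustively search their small subsets."""
    candidates = sorted(filter_stage(X, y, m))
    best_subset, best_acc = (), -1.0
    for size in range(1, max_size + 1):
        for subset in combinations(candidates, size):
            acc = loo_1nn_accuracy(X, y, subset)
            # Strict improvement only, so smaller subsets win ties.
            if acc > best_acc:
                best_subset, best_acc = subset, acc
    return list(best_subset), best_acc

# Toy data (invented): 6 samples x 4 genes; 0 = normal, 1 = tumor.
X = [
    [1.0, 5.0, 0.2, 3.0],
    [1.2, 4.0, 0.1, 3.1],
    [0.9, 6.0, 0.3, 2.9],
    [3.0, 5.5, 0.2, 3.0],
    [3.1, 4.5, 0.1, 3.2],
    [2.9, 5.0, 0.3, 2.8],
]
y = [0, 0, 0, 1, 1, 1]

subset, acc = hybrid_select(X, y, m=2, max_size=2)
print(subset, acc)
```

The wrapper stage here evaluates subsets drawn from m genes rather than from all genes, which is where the saving over a pure wrapper comes from; the classifier-based evaluation is what a pure filter lacks.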

The literature contains many reviews of FS for microarray data, such as [8–23], but to the best of our knowledge, no review has categorized these studies by the main purpose of the reviewed papers.

In this systematic review, we categorize recent studies on FS for microarray data processing into nine directions based on the main objective of each study. Some of these studies developed new FS methods (filter, wrapper, embedded, ensemble, distributed, or parallel), others selected FS methods (FSM) from the literature and compared their performance in the same environment, and still others conducted surveys on FS.

Each paper in each direction was reviewed based on methods, datasets, classifiers, performance metrics, dataset dimension ranges, and results obtained. Finally, we summarize our general observations along with our conclusions.

The main contributions of this systematic review can be summarized as follows:

● It analyses 132 research papers on FS in microarray data processing, published by Elsevier, Springer, and IEEE in the last seven years, according to their main objectives, and categorizes them into nine main directions.
● It summarizes the FS directions that received the highest, middle, and lowest research attention in the last seven years, to guide researchers who plan to work on FS for microarray data processing in choosing a research direction.
● In each direction, each paper is reviewed based on methods, datasets, classifiers, performance metrics, dataset dimension ranges, and results obtained. This provides researchers who have already chosen a direction in FS for microarray data classification with valuable information about the related work in that direction, as displayed in Fig. 2.

The rest of this paper is organized as follows: Section 2 presents the background. Section 3 describes the methodology. Section 4 presents and discusses the nine main directions of FS research on microarray data in the last seven years proposed in this survey. Section 5 shows the development of FS publications over the last seven years. Section 6 identifies the main observations, and Section 7 presents the conclusion and future work.

Gene expression microarray data

Gene expression microarray data is structured medical data in which each instance represents a patient and the features represent gene expression coefficients in that patient's samples. Microarray datasets are usually high-dimensional, as they contain a huge number of features versus a small number of samples [24].

Importance of FS to microarray datasets analysis

Detecting cancer-infected genes and normal healthy genes is challenging in high-dimensional microarray datasets, which contain many redundant and irrelevant

Methodology

This survey was conducted by applying five main steps as summarized in Fig. 2:

- First, we collected 132 papers on FS in microarray data processing published by three popular publishers (Elsevier, Springer, and IEEE) during the last seven years.
- These papers were analyzed and distributed into nine main directions based on their main objectives, such as proposing new FS methods that do not exist in the literature, conducting surveys, or comparing the performance of existing FS methods.
- Each

FS in microarray data researches main directions

The literature contains many studies on FS for microarray data processing, which we categorize into nine main research directions as illustrated in Fig. 3. In each direction, papers were grouped and connected on two levels. At the first level, the papers were arranged in the order of presentation according to specific criteria that vary from one direction to another (for example, D3 and D5 connected papers that used the same meta-heuristic algorithm

Development of feature selection publications over the recent seven years

Tracking the research of the last seven years across all the directions discussed in this survey shows an increase in the number of papers published in the last four years, especially in 2018 and 2019, compared with the years before 2018. This holds for most directions.

We noticed that more works related to D1, D2, D3, D5, D6, and D8 were published in the last three years (2018–2021) than works

Observations and analyses

In this systematic review, we investigated 132 research papers on FS for microarray data processing published by three well-known publishers (Elsevier, Springer, and IEEE) during the last seven years. We observed that researchers focused on the fifth direction (D5), “Hybrid FSM”, which constituted 34.9% of the examined papers. We believe this high percentage could be due to the fact that hybrid methods generally improve classification accuracy without causing

Conclusion and future work

We examined all papers concerning FS for microarray data processing published during the last seven years by three well-known publishers (Elsevier, Springer, and IEEE). We found that 38% of these papers were published by Springer. The reviewed papers were categorized by their main purposes into nine directions and then summarized according to which directions received the most, middle, and least research attention across all 132 papers that were reviewed in this

CRediT author statement

Esra'a Alhenawi: Methodology, Writing - original draft preparation, Validation. Rizik Al-Sayyed: Supervision, Writing - review & editing, Visualization, Validation. Amjad Hudaib: Supervision, Writing - review, Validation. Seyedali Mirjalili: Supervision, Review & editing, Visualization, Validation.

Declaration of competing interest

The authors whose names are listed immediately below certify that they have NO affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers’ bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs) in

Acknowledgements

We would like to thank everyone who provided technical help, assisted in reviewing and editing the language of the manuscript, or offered general support and useful comments on this manuscript.

Esra'a Alhenawi is currently a PhD candidate in the Department of Computer Science, King Abdullah II School for Information Technology, The University of Jordan.

References (165)

B. Remeseiro et al. A review of feature selection methods in medical applications. Comput. Biol. Med. (2019)
G. Chandrashekar et al. A survey on feature selection methods. Comput. Electr. Eng. (2014)
V. Bolón-Canedo et al. Distributed feature selection: an application to microarray data classification. Appl. Soft Comput. (2015)
B. Seijo-Pardo et al. Ensemble feature selection: homogeneous and heterogeneous approaches. Knowl. Base Syst. (2017)
R.K. Singh et al. Feature selection of gene expression data for cancer classification: a review. Procedia Computer Science (2015)
V. Bolón-Canedo et al. An ensemble of filters and classifiers for microarray data classification. Pattern Recogn. (2012)
R. Xu et al. Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps. Artif. Intell. Med. (2010)
J. Zhao et al. Locality sensitive semi-supervised feature selection. Neurocomputing (2008)
M. Mandal et al. An improved minimum redundancy maximum relevance approach for feature selection in gene expression data. Procedia Technology (2013)
I.A. Gheyas et al. Feature subset selection in large dimensionality domains. Pattern Recogn. (2010)
X. Gao et al. A novel effective diagnosis model based on optimized least squares support machine for gene microarray. Appl. Soft Comput. (2018)
O. Dagliyan et al. Optimization based tumor classification from microarray gene expression data. PLoS One (2011)
G. Manikandan et al. A survey on feature selection and extraction techniques for high-dimensional microarray datasets.
T. Saw et al. Swarm intelligence based feature selection for high dimensional classification: a literature survey. Int. J. Comput. (2019)
G. Manikandan et al. Feature selection is important: state-of-the-art methods and application domains of feature selection on high-dimensional data.
T. Almutiri et al. Review on feature selection methods for gene expression data classification.
A.K. Shukla et al. A Study on Metaheuristics Approaches for Gene Selection in Microarray Data: Algorithms, Applications and Open Challenges. (2019)
A. Alonso-Betanzos et al. Feature selection applied to microarray data.
N. Sánchez-Maroño et al. Classification of microarray data.
N.A. Zamri et al. Review on the usage of swarm intelligence in gene expression data.
N. Almugren et al. A survey on hybrid feature selection methods in microarray gene expression data for cancer classification. IEEE Access (2019)
J.C. Ang et al. Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection. IEEE ACM Trans. Comput. Biol. Bioinf (2015)
V. Bolón-Canedo et al. Challenges and future trends for microarray analysis.
S. Vanjimalar et al. A review on feature selection techniques for gene expression data.
S.D. Bharathi et al. A survey on gene selection for microarray cancer classification based on soft computing techniques.
A. Jović et al. A review of feature selection methods with applications.
K.P. Shroff et al. A comparative study of various feature selection techniques in high-dimensional data set to improve classification accuracy.
V. Bolón-Canedo et al. Feature selection in DNA microarray classification.
Z. Mungloo-Dilmohamud et al. A meta-review of feature selection techniques in the context of microarray data.
J. Li et al. Feature selection: a data perspective. ACM Comput. Surv. (2017)
D. Cai et al. Unsupervised feature selection for multi-cluster data.
S. Huang. Supervised feature selection: a tutorial. Artif. Intell. Res. (2015)
L. Fu et al. Condition monitoring for the roller bearings of wind turbines under variable working conditions based on the Fisher score and permutation entropy. Energies (2019)
M.A. Sulaiman et al. Feature selection based on mutual information.
X. He et al. Laplacian score for feature selection. Adv. Neural Inf. Process. Syst. (2005)

Rizik Al-Sayyed is a professor in computer networks, cloud computing, database systems, and simulation in the Department of Information Technology, King Abdullah II School for Information Technology, The University of Jordan.

Amjad Hudaib is a professor in Software Engineering in the Department of Computer Information Systems, King Abdullah II School for Information Technology, The University of Jordan.

Seyedali Mirjalili is currently an Associate Professor and the director of the Centre for Artificial Intelligence Research and Optimization at Torrens University Australia. He is internationally well recognized in Swarm Intelligence and Optimization.
