A filter model for feature subset selection based on genetic algorithm

https://doi.org/10.1016/j.knosys.2009.02.006

Abstract

This paper describes a novel feature subset selection algorithm, which utilizes a genetic algorithm (GA) to optimize the output nodes of a trained artificial neural network (ANN). The new algorithm does not depend on the ANN training algorithm and does not modify the training results. The two groups of weights, between the input-hidden and hidden-output layers, are extracted after training the ANN on a given database. A general formula for each output node (class) of the ANN is then generated. This formula depends only on the input features, because the two groups of weights are constant, and the dependency is represented by a non-linear exponential function. The GA is then employed to find the optimal relevant features, which maximize the output function of each class. The features that are dominant across all classes constitute the subset selected from the input feature set.

Introduction

Reducing the dimensionality of a problem is, in many real-world settings, an essential step before any analysis of the data. The general criterion for reducing dimensionality is the desire to preserve most of the relevant information of the original data according to some optimality criterion [1]. Dimensionality reduction, or feature selection, has been an active research area in the pattern recognition, statistics and data mining communities. The main idea of feature selection is to choose a subset of input features by eliminating features with little or no predictive information. In particular, feature selection removes irrelevant features, increases the efficiency of learning tasks, improves learning performance and enhances the comprehensibility of learned results [2], [3].

The feature selection problem can be viewed as a special case of the feature-weighting problem. The weight associated with a feature measures its relevance or significance in the classification task [4]. If the weights are restricted to binary values, the feature-weighting problem reduces to the feature selection problem. Feature selection algorithms fall into two broad categories, the filter model and the wrapper model [5]. Filter models use an evaluation function that relies solely on properties of the data and is therefore independent of any particular induction algorithm. Wrapper models use the inductive algorithm itself to estimate the value of a given subset.

Most algorithms for feature selection perform either heuristic or exhaustive search [6]. Heuristic feature selection algorithms estimate a feature's quality with a heuristic measure such as information gain [7], the Gini index [8], the discrepancies measure [9] or the chi-square test [10]. Other examples of heuristic algorithms include the Relief algorithm and its extensions [11]. Exhaustive feature selection algorithms search all possible combinations of features and aim at finding a minimal combination of features sufficient to construct a model consistent with a given set of instances, such as the FOCUS algorithm [12]. Various approaches have been proposed for finding irrelevant features and removing them from the feature set. The C4.5 decision tree presented in [13] finds relevant features by keeping only those features that appear in the decision tree. The cross-validation method is applied in [14] to filter irrelevant features before constructing ID3 and C4.5 decision trees. A neural network is used in [15] to estimate the relative importance of each feature (with respect to the classification task) and to assign it a corresponding weight; when properly weighted, an important feature receives a larger weight than less important or irrelevant features.

In general, feature selection refers to the study of algorithms that select an optimal subset of the input feature set. Optimality normally depends on the evaluation criteria or the application's needs. Genetic algorithms (GAs) have therefore received much attention because of their ability to solve difficult optimization problems. GAs are search methods that have been widely used in feature selection, where the size of the search space is large [16]. The most important differences between GAs and traditional optimization algorithms are that genetic algorithms work with a coded version of the parameters, and that they search from a population of points rather than from a single point. A crucial issue in the design of a genetic algorithm is the choice of the fitness function. This function is used to evaluate the quality of each hypothesis, and it is the function to be optimized in the target problem.

This paper presents a novel algorithm for feature subset selection from trained neural network using genetic algorithm. It does not depend on the ANN training algorithms and it does not modify the training results. The GA is used to find the optimal input features (relevant), which maximize the output functions of trained neural network.
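To make this idea concrete, the sketch below (Python; all function names and hyper-parameters are illustrative and not taken from the paper) shows a generic genetic algorithm that maximizes an arbitrary fitness function over fixed-length binary strings. In the proposed method, the fitness would be the output function of the trained network for one class, and the search would be run once per class.

    import numpy as np

    def genetic_maximize(fitness, n_bits, pop_size=50, generations=100,
                         crossover_rate=0.8, mutation_rate=0.02, rng=None):
        """Maximize `fitness` over binary strings of length `n_bits`.

        A generic GA sketch: tournament selection, single-point crossover,
        bit-flip mutation.  All hyper-parameters are illustrative.
        """
        rng = np.random.default_rng(rng)
        pop = rng.integers(0, 2, size=(pop_size, n_bits))
        for _ in range(generations):
            scores = np.array([fitness(ind) for ind in pop])
            # Tournament selection: keep the better of two random individuals.
            idx = rng.integers(0, pop_size, size=(pop_size, 2))
            winners = np.where(scores[idx[:, 0]] >= scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])
            parents = pop[winners]
            # Single-point crossover on consecutive parent pairs.
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):
                if rng.random() < crossover_rate:
                    cut = rng.integers(1, n_bits)
                    children[i, cut:], children[i + 1, cut:] = (
                        parents[i + 1, cut:].copy(), parents[i, cut:].copy())
            # Bit-flip mutation.
            flips = rng.random(children.shape) < mutation_rate
            children = np.where(flips, 1 - children, children)
            pop = children
        scores = np.array([fitness(ind) for ind in pop])
        return pop[np.argmax(scores)], scores.max()

The best individuals found for each class can then be compared, and the features that dominate across all classes form the selected subset.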

The organization of this paper is as follows. The problem formulation is described in Section 2. The data preprocessing is performed in Section 3. The proposed feature selection algorithm is outlined in Section 4. An initial experiment is described in Section 5 to demonstrate the feasibility of the proposed algorithm. The application and results are reported in Section 6. The conclusion and future work are presented in Section 7.

Section snippets

Problem description

The proposed feature subset selection algorithm starts by training the artificial neural network on the input features and the corresponding classes. The ANN is trained until a satisfactory error level is reached. Each input unit typically corresponds to a single feature and each output unit corresponds to a class value. The main objective of our approach is to encode the network in such a way that a genetic algorithm can run over it. After training the ANN, the weights between input-hidden
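A minimal sketch of this weight-extraction step, assuming a single-hidden-layer network trained with scikit-learn's MLPClassifier and logistic (sigmoid) activations (the paper does not tie the method to any particular implementation):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def output_functions(model: MLPClassifier):
        """Build o_k(x) from the frozen weights of a trained one-hidden-layer MLP.

        W1, b1 map inputs to the hidden layer; W2, b2 map hidden to output.
        Once training is finished these are constants, so each output node
        is a fixed non-linear function of the input vector x only.
        """
        W1, W2 = model.coefs_          # input->hidden and hidden->output weights
        b1, b2 = model.intercepts_

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def o(x):
            x = np.asarray(x, dtype=float)
            hidden = sigmoid(x @ W1 + b1)        # hidden-layer activations
            return sigmoid(hidden @ W2 + b2)     # one value per output node (class)
        return o

Here o(x)[k] is the fixed non-linear exponential function of the inputs that the genetic algorithm later maximizes for class k.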

Data preprocessing

Some learning algorithms, such as neural networks, often train more successfully and quickly when discrete input features are used. Therefore, features that take numerical values in a given database must be treated with a discretization technique. The discretization technique splits the continuous feature values into a small set of intervals, each with a lower and an upper bound [X_lower, X_upper]. These intervals are then transformed into linguistic terms. The
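Purely as an illustration (the paper does not prescribe a particular discretization scheme), a continuous feature can be split into equal-width intervals with pandas and each interval replaced by a linguistic term:

    import pandas as pd

    # Hypothetical continuous feature values.
    temperature = pd.Series([64, 68, 71, 75, 80, 83, 85])

    # Split the range into three equal-width intervals [X_lower, X_upper]
    # and replace each numeric value with a linguistic term.
    terms = pd.cut(temperature, bins=3, labels=["cool", "mild", "hot"])
    print(terms.tolist())   # one linguistic term per original value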

The proposed algorithm

A supervised ANN uses a set of M examples or records. These records include N features. Each feature, F_n (n = 1, 2, …, N), can be encoded into a fixed-length binary sub-string x_1 … x_i … x_{V_n}, where V_n is the number of possible values of the nth feature. The element x_i = 1 if its corresponding feature value is present, while all other elements are 0. Therefore, the proposed number of input nodes, I, in the input layer of the ANN is given by:

I = Σ_{n=1}^{N} V_n

Consequently, the input feature vector, X_m, presented to the input layer can be
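The following sketch illustrates this encoding with a hypothetical two-feature example (the feature values are invented; only the construction of the fixed-length binary sub-strings and the count I = Σ_n V_n follow the description above):

    import numpy as np

    # Hypothetical records with N = 2 discrete features.
    records = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "cool")]

    # Distinct values per feature, in a fixed order.
    values = [sorted({r[n] for r in records}) for n in range(2)]
    V = [len(v) for v in values]          # V_n = number of values of feature n
    I = sum(V)                            # I = sum_n V_n input nodes (= 5 here)

    def encode(record):
        """Concatenate one-hot sub-strings x_1 ... x_{V_n} for every feature."""
        bits = []
        for n, value in enumerate(record):
            sub = [0] * V[n]
            sub[values[n].index(value)] = 1
            bits.extend(sub)
        return np.array(bits)

    print(I)                    # 5
    print(encode(records[0]))   # [0 1 0 1 0], the binary input vector X_m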

Initial experiment

The experiment described in this section was chosen to demonstrate the simplicity of the proposed feature subset selection algorithm. A given database, which has four features and two different output classes, is shown in Table 1 [18]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input feature vectors, X_m, and the corresponding output class vectors, T_m.

The number of input nodes is given by:

I = Σ_{n=1}^{N} V_n = V_1 + V_2 + V_3 + V_4 = 10

The number of output nodes is K = 2.
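For example, assuming the four features take 3, 3, 2 and 2 distinct values respectively (value counts consistent with the total of 10 above, but not listed here), the node counts work out as:

    # Hypothetical value counts per feature, consistent with the totals above.
    V = [3, 3, 2, 2]          # V_1 .. V_4
    I = sum(V)                # number of input nodes  -> 10
    K = 2                     # number of output nodes (two classes)
    print(I, K)               # 10 2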

Application and results

The proposed algorithm is evaluated on two different databases, the Monk1 database and the Car Evaluation database, for extracting the relevant features. These databases are drawn from the UCI machine learning repository [21].
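As a rough, unofficial sketch of how such an evaluation could be set up (the Car Evaluation data are assumed to be available locally in the standard UCI car.data layout; column names and network settings are illustrative):

    import pandas as pd
    from sklearn.neural_network import MLPClassifier

    # Car Evaluation data in the UCI car.data layout (path and availability assumed).
    cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
    data = pd.read_csv("car.data", names=cols)

    # The features are already discrete, so only the binary encoding is needed.
    X_bin = pd.get_dummies(data[cols[:-1]]).astype(int).to_numpy()
    y = data["class"].to_numpy()

    # Train the ANN with logistic activations so the output formula is sigmoidal.
    clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                        max_iter=2000).fit(X_bin, y)

    # The GA search over the encoded inputs (see the earlier sketch) would then
    # be run once per output class, and the dominant features kept.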

Conclusion and future work

A novel feature subset selection algorithm based on a trained artificial neural network and a genetic algorithm is presented in this paper. The proposed algorithm is applied to two different applications, the Monk1 database and the Car Evaluation database. The results demonstrate that the proposed algorithm reduces the dimensionality of the two databases by 50% and 33.33%, respectively. It is therefore very effective in reducing dimensionality, removing irrelevant features and improving the comprehensibility of the results.
