A filter model for feature subset selection based on genetic algorithm

https://doi.org/10.1016/j.knosys.2009.02.006

Abstract

This paper describes a novel feature subset selection algorithm, which utilizes a genetic algorithm (GA) to optimize the output nodes of a trained artificial neural network (ANN). The new algorithm does not depend on the ANN training algorithm and does not modify the training results. The two groups of weights, between the input-hidden and hidden-output layers, are extracted after training the ANN on a given database. A general formula for each output node (class) of the ANN is then generated. This formula depends only on the input features, because the two groups of weights are constant, and the dependency is represented by a non-linear exponential function. The GA is then employed to find the optimal relevant features, which maximize the output function of each class. The features that are dominant across all classes constitute the subset selected from the input feature set.

Introduction

Reducing the dimensionality of a problem is, in many real-world settings, an essential step before any analysis of the data. The general criterion for reducing dimensionality is the desire to preserve most of the relevant information of the original data according to some optimality criterion [1]. Dimensionality reduction, or feature selection, has been an active research area in the pattern recognition, statistics and data mining communities. The main idea of feature selection is to choose a subset of input features by eliminating features with little or no predictive information. In particular, feature selection removes irrelevant features, increases the efficiency of learning tasks, improves learning performance and enhances the comprehensibility of learned results [2], [3].

The feature selection problem can be viewed as a special case of the feature-weighting problem. The weight associated with a feature measures its relevance or significance in the classification task [4]. If the weights are restricted to binary values, the feature-weighting problem reduces to the feature selection problem. Feature selection algorithms fall into two broad categories, the filter model and the wrapper model [5]. Filter models use an evaluation function that relies solely on properties of the data and is therefore independent of any particular induction algorithm. Wrapper models use the inductive algorithm itself to estimate the value of a given subset.

Most algorithms for feature selection perform either heuristic or exhaustive search [6]. Heuristic feature selection algorithms estimate a feature's quality with a heuristic measure such as information gain [7], the Gini index [8], the discrepancies measure [9] or the chi-square test [10]. Other examples of heuristic algorithms include the Relief algorithm and its extensions [11]. Exhaustive feature selection algorithms search all possible combinations of features and aim at finding a minimal combination of features sufficient to construct a model consistent with a given set of instances, such as the FOCUS algorithm [12]. Various approaches have been proposed for finding irrelevant features and removing them from the feature set. The C4.5 decision tree presented in [13] finds relevant features by keeping only those features that appear in the decision tree. The cross-validation method is applied in [14] to filter irrelevant features before constructing ID3 and C4.5 decision trees. A neural network is used in [15] to estimate the relative importance of each feature (with respect to the classification task) and to assign it a corresponding weight; when properly weighted, an important feature receives a larger weight than less important or irrelevant features.

In general, feature selection refers to the study of algorithms that select an optimal subset of the input feature set. Optimality normally depends on the evaluation criteria or the application's needs. Genetic algorithms (GAs) have therefore received much attention because of their ability to solve difficult optimization problems. GAs are search methods that have been widely used in feature selection, where the size of the search space is large [16]. The most important differences between GAs and traditional optimization algorithms are that genetic algorithms work with a coded version of the parameters, and that they search from a population of points rather than from a single point. A crucial issue in the design of a genetic algorithm is the choice of the fitness function. This function is used to evaluate the quality of each hypothesis, and it is the function to be optimized in the target problem.

This paper presents a novel algorithm for feature subset selection from trained neural network using genetic algorithm. It does not depend on the ANN training algorithms and it does not modify the training results. The GA is used to find the optimal input features (relevant), which maximize the output functions of trained neural network.
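To make this idea concrete, the sketch below (Python; all function names and hyper-parameters are illustrative and not taken from the paper) shows a generic genetic algorithm that maximizes an arbitrary fitness function over fixed-length binary strings. In the proposed method, the fitness would be the output function of the trained network for one class, and the search would be run once per class.

    import numpy as np

    def genetic_maximize(fitness, n_bits, pop_size=50, generations=100,
                         crossover_rate=0.8, mutation_rate=0.02, rng=None):
        """Maximize `fitness` over binary strings of length `n_bits`.

        A generic GA sketch: tournament selection, single-point crossover,
        bit-flip mutation.  All hyper-parameters are illustrative.
        """
        rng = np.random.default_rng(rng)
        pop = rng.integers(0, 2, size=(pop_size, n_bits))
        for _ in range(generations):
            scores = np.array([fitness(ind) for ind in pop])
            # Tournament selection: keep the better of two random individuals.
            idx = rng.integers(0, pop_size, size=(pop_size, 2))
            winners = np.where(scores[idx[:, 0]] >= scores[idx[:, 1]],
                               idx[:, 0], idx[:, 1])
            parents = pop[winners]
            # Single-point crossover on consecutive parent pairs.
            children = parents.copy()
            for i in range(0, pop_size - 1, 2):
                if rng.random() < crossover_rate:
                    cut = rng.integers(1, n_bits)
                    children[i, cut:], children[i + 1, cut:] = (
                        parents[i + 1, cut:].copy(), parents[i, cut:].copy())
            # Bit-flip mutation.
            flips = rng.random(children.shape) < mutation_rate
            children = np.where(flips, 1 - children, children)
            pop = children
        scores = np.array([fitness(ind) for ind in pop])
        return pop[np.argmax(scores)], scores.max()

The best individuals found for each class can then be compared, and the features that dominate across all classes form the selected subset.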

The organization of this paper is as follows. The problem formulation is described in Section 2. The data preprocessing is performed in Section 3. The proposed feature selection algorithm is outlined in Section 4. An initial experiment is described in Section 5 to demonstrate the feasibility of the proposed algorithm. The application and results are reported in Section 6. The conclusion and future work are presented in Section 7.

Section snippets

Problem description

The proposed feature subset selection algorithm starts by training the artificial neural network on the input features and the corresponding classes. The ANN is trained until a satisfactory error level is reached. Each input unit typically corresponds to a single feature and each output unit corresponds to a class value. The main objective of our approach is to encode the network in such a way that a genetic algorithm can run over it. After training the ANN, the weights between input-hidden
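A minimal sketch of this weight-extraction step, assuming a single-hidden-layer network trained with scikit-learn's MLPClassifier and logistic (sigmoid) activations (the paper does not tie the method to any particular implementation):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def output_functions(model: MLPClassifier):
        """Build o_k(x) from the frozen weights of a trained one-hidden-layer MLP.

        W1, b1 map inputs to the hidden layer; W2, b2 map hidden to output.
        Once training is finished these are constants, so each output node
        is a fixed non-linear function of the input vector x only.
        """
        W1, W2 = model.coefs_          # input->hidden and hidden->output weights
        b1, b2 = model.intercepts_

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def o(x):
            x = np.asarray(x, dtype=float)
            hidden = sigmoid(x @ W1 + b1)        # hidden-layer activations
            return sigmoid(hidden @ W2 + b2)     # one value per output node (class)
        return o

Here o(x)[k] is the fixed non-linear exponential function of the inputs that the genetic algorithm later maximizes for class k.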

Data preprocessing

Some learning algorithms, such as neural networks, often train more successfully and quickly when discrete input features are used. Therefore, features that take numerical values in a given database must be treated with a discretization technique. The discretization technique splits the continuous feature values into a small set of intervals, each with a lower and an upper bound [X_lower, X_upper]. These intervals are then transformed into linguistic terms. The
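Purely as an illustration (the paper does not prescribe a particular discretization scheme), a continuous feature can be split into equal-width intervals with pandas and each interval replaced by a linguistic term:

    import pandas as pd

    # Hypothetical continuous feature values.
    temperature = pd.Series([64, 68, 71, 75, 80, 83, 85])

    # Split the range into three equal-width intervals [X_lower, X_upper]
    # and replace each numeric value with a linguistic term.
    terms = pd.cut(temperature, bins=3, labels=["cool", "mild", "hot"])
    print(terms.tolist())   # one linguistic term per original value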

The proposed algorithm

A supervised ANN uses a set of M examples or records. These records include N features. Each feature, F_n (n = 1, 2, …, N), can be encoded into a fixed-length binary sub-string x_1 … x_i … x_{V_n}, where V_n is the number of possible values of the nth feature. The element x_i = 1 if its corresponding feature value is present, while all other elements are 0. Therefore, the proposed number of input nodes, I, in the input layer of the ANN is given by:

I = Σ_{n=1}^{N} V_n

Consequently, the input feature vector, X_m, presented to the input layer can be
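The following sketch illustrates this encoding with a hypothetical two-feature example (the feature values are invented; only the construction of the fixed-length binary sub-strings and the count I = Σ_n V_n follow the description above):

    import numpy as np

    # Hypothetical records with N = 2 discrete features.
    records = [("sunny", "hot"), ("rainy", "mild"), ("sunny", "cool")]

    # Distinct values per feature, in a fixed order.
    values = [sorted({r[n] for r in records}) for n in range(2)]
    V = [len(v) for v in values]          # V_n = number of values of feature n
    I = sum(V)                            # I = sum_n V_n input nodes (= 5 here)

    def encode(record):
        """Concatenate one-hot sub-strings x_1 ... x_{V_n} for every feature."""
        bits = []
        for n, value in enumerate(record):
            sub = [0] * V[n]
            sub[values[n].index(value)] = 1
            bits.extend(sub)
        return np.array(bits)

    print(I)                    # 5
    print(encode(records[0]))   # [0 1 0 1 0], the binary input vector X_m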

Initial experiment

The experiment described in this section was chosen to demonstrate the simplicity of the proposed feature subset selection algorithm. A given database, which has four features and two different output classes, is shown in Table 1 [18]. The encoded values of the given database are shown in Table 2. The ANN is trained on the encoded input feature vectors, X_m, and the corresponding output class vectors, T_m.

The number of input nodes is given by:

I = Σ_{n=1}^{N} V_n = V_1 + V_2 + V_3 + V_4 = 10

The number of output nodes is K = 2.
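For example, assuming the four features take 3, 3, 2 and 2 distinct values respectively (value counts consistent with the total of 10 above, but not listed here), the node counts work out as:

    # Hypothetical value counts per feature, consistent with the totals above.
    V = [3, 3, 2, 2]          # V_1 .. V_4
    I = sum(V)                # number of input nodes  -> 10
    K = 2                     # number of output nodes (two classes)
    print(I, K)               # 10 2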

Application and results

The proposed algorithm is evaluated on two different databases, the Monk1 database and the Car Evaluation database, for extracting the relevant features. These databases are drawn from the UCI machine learning repository [21].
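As a rough, unofficial sketch of how such an evaluation could be set up (the Car Evaluation data are assumed to be available locally in the standard UCI car.data layout; column names and network settings are illustrative):

    import pandas as pd
    from sklearn.neural_network import MLPClassifier

    # Car Evaluation data in the UCI car.data layout (path and availability assumed).
    cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]
    data = pd.read_csv("car.data", names=cols)

    # The features are already discrete, so only the binary encoding is needed.
    X_bin = pd.get_dummies(data[cols[:-1]]).astype(int).to_numpy()
    y = data["class"].to_numpy()

    # Train the ANN with logistic activations so the output formula is sigmoidal.
    clf = MLPClassifier(hidden_layer_sizes=(10,), activation="logistic",
                        max_iter=2000).fit(X_bin, y)

    # The GA search over the encoded inputs (see the earlier sketch) would then
    # be run once per output class, and the dominant features kept.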

Conclusion and future work

A novel feature subset selection algorithm based on a trained artificial neural network and a genetic algorithm is presented in this paper. The proposed algorithm is applied to two different applications, the Monk1 database and the Car Evaluation database. The results demonstrate that the proposed algorithm reduces the dimensionality of the two databases by 50% and 33.33%, respectively. It is therefore very effective in reducing dimensionality, removing irrelevant features and improving the comprehensibility of the results.
