Neurocomputing
Volume 74, Issue 17, October 2011, Pages 2914-2928

A new local search based hybrid genetic algorithm for feature selection

https://doi.org/10.1016/j.neucom.2011.03.034

Abstract

This paper presents a new hybrid genetic algorithm (HGA) for feature selection (FS), called HGAFS. The key aspect of this algorithm is the selection of a salient feature subset of reduced size. HGAFS incorporates a new local search operation, devised and embedded in the HGA to fine-tune the search in the FS process. The local search technique works on the basis of the distinct and informative nature of the input features, computed from their correlation information. The aim is to guide the search process so that newly generated offspring can be adjusted by less correlated (distinct) features covering both the general and the special characteristics of a given dataset. Thus, the proposed HGAFS reduces the redundancy of information among the selected features. In addition, HGAFS emphasizes selecting a small subset of salient features by means of a subset size determination scheme. We have tested HGAFS on 11 real-world classification datasets with dimensionality ranging from 8 to 7129. The performance of HGAFS has been compared with the results of ten other well-known FS algorithms. HGAFS consistently performs better at selecting subsets of salient features, resulting in better classification accuracy.

Introduction

The recent trend toward high-dimensional data collection and problem representation demands the use of feature selection (FS) in many machine learning tasks. Real-world datasets often contain a large number of irrelevant and/or redundant features that can significantly degrade the accuracy of learned models and reduce their learning speed. FS addresses this by finding a subset of salient features that improves predictive accuracy and by removing the useless features. The learning model built using only the selected salient features thus has a concise structure without sacrificing predictive accuracy. FS is therefore an active research topic in machine learning. It also provides other benefits, such as data visualization and data understanding, and reduced measurement and storage requirements [15].

The traditional approaches to FS can be broadly categorized into three groups: filter, wrapper, and hybrid [32]. The filter approach solves the FS task by statistical analysis of the feature set alone, without utilizing any learning model [10]. The wrapper approach involves a predetermined learning model and selects features by measuring the learning performance of that particular model [15]. The hybrid approach attempts to take advantage of both the filter and wrapper approaches [18], [51]. A hybrid technique is often capable of locating a good solution where a single technique is often trapped in an immature one.

In addition, a new category of FS approach has recently been proposed: the embedded approach, in which the FS process is integrated with classifier construction (e.g., SIMBA [61], SVM-RFE [17], L1-regularized learning [54]). The performance of this approach is similar to the wrapper approach, since its main concern is the interaction between feature selection and classification.

The success of the different approaches mainly depends on adopting a fruitful search strategy in the FS process. Different approaches use different ways to generate subsets and advance the search. One way is to start the search with an empty set and successively add features (e.g., [14], [43]), called sequential forward search (SFS). Another is to start with the full set and successively remove features [1], [12], [19], [49], [52], called sequential backward search (SBS). This sequential strategy is simple to implement and fast, but it suffers from the nesting effect [44]: once a feature is added (or deleted), it cannot be deleted (or added) later. To overcome this effect, the floating search strategy [44] modifies the sequential strategy. Alternatively, one can start the search from a randomly selected subset (e.g., [33], [53]) combined with a sequential strategy, called the random search strategy. A good solution can also be found by a complete search strategy [10], [32], since it covers the whole feature space; however, such a strategy is infeasible for larger feature sets because of the time it requires.
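To make the sequential strategy concrete, the following minimal sketch illustrates SFS; the scoring function `evaluate` is a hypothetical placeholder (for example, cross-validated classifier accuracy on the candidate subset) and is not taken from the paper.

```python
# A minimal sketch of sequential forward search (SFS).
# `evaluate(subset)` is a hypothetical scoring function (an assumption here),
# e.g., cross-validated classifier accuracy on the features in `subset`.

def sequential_forward_search(n_features, evaluate, target_size):
    selected = set()
    while len(selected) < target_size:
        remaining = [f for f in range(n_features) if f not in selected]
        # Greedily add the single feature that improves the score the most.
        best = max(remaining, key=lambda f: evaluate(selected | {f}))
        selected.add(best)  # nesting effect: once added, never removed
    return selected
```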

Most of the search strategies discussed above, however, find solutions that range between sub-optimal and near-optimal, since they search locally rather than globally. Optimal or near-optimal solutions are difficult for those algorithms to reach, because they search only part of the solution space or suffer from computational complexity. Research has therefore shifted towards global search algorithms (meta-heuristics), which explore the full search space with their global search capability while making suitable use of local search. Global search algorithms work on the basis of the activity of multiple agents, which helps them find very high-quality solutions within a reasonable time [9]. Among the global search algorithms, such as ant colony optimization (ACO) [3], [27], [51] and particle swarm optimization (PSO) [58], the genetic algorithm (GA) [8], [18], [40], [41], [56], [61], [62] is one that can successfully solve FS tasks.

In this paper, we propose a new hybrid genetic algorithm (HGA) for feature selection, called HGAFS. The proposed approach hybridizes the GA by integrating a new local search operation (LSO). Embedding this operation in the GA fine-tunes the search process for FS in an organized fashion. HGAFS combines FS with determining a reduced subset size, and it uses a bounded random selection scheme involving correlation information in the LSO to select salient features. The idea implemented here is an extension of our earlier work [24]. Our algorithm, HGAFS, differs from previous works on selecting salient features from a given dataset (e.g., [8], [18], [40], [62]) in two aspects.

First, HGAFS emphasizes not only selecting salient features, but also keeping their number small. HGAFS selects a reduced number of salient features using a subset size determination scheme. This scheme works within a bounded region and tries to keep the subset size small. The number of 1-bits in the individual strings of a population in HGAFS is thus distributed according to the subset size so determined. This approach is quite different from existing works (e.g., [8], [18], [35], [40], [61], [62]), where the most common practice is to choose the number of 1-bits using an unbounded random function and then select relevant features using the GA. Although finding relevant features using a GA is a good step, the unbounded random function hampers the FS process because it may produce a value that is either too low or too high. If it is too high, the search space becomes larger, the computational time increases considerably, and the least significant features might ultimately be selected. If the number of 1-bits is too low, on the other hand, the search process cannot complete properly. Selecting a subset of salient features of reduced size thus constitutes a novel approach to FS using GAs.
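As an illustration of the bounded idea, the sketch below initializes a population whose chromosomes carry a bounded number of 1-bits; the bounds `lower` and `upper` are hypothetical parameters, and the paper's actual subset size determination scheme may differ.

```python
import random

# Sketch: initialize a GA population whose chromosomes carry a bounded
# number of 1-bits, instead of an unbounded random count. `lower` and
# `upper` are hypothetical parameters, not the paper's actual scheme.

def init_population(pop_size, n_features, lower, upper):
    population = []
    for _ in range(pop_size):
        k = random.randint(lower, upper)              # bounded subset size
        ones = set(random.sample(range(n_features), k))
        population.append([1 if i in ones else 0 for i in range(n_features)])
    return population
```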

Second, HGAFS uses correlation information in conjunction with the bounded scheme to select a subset of relevant features. The aim of using the correlation information of features is to guide the GA's search so that relatively less correlated (distinct) features are injected into consecutive generations in a higher proportion than more correlated (similar) features. Note that the correlation information guides the search process within the GA, while neural networks (NNs) assist in carrying out the genetic process of the GA. The existing FS approaches (e.g., [8], [18], [40], [62]) do not use correlation information to guide the search process; the redundant information in their solutions might therefore increase owing to the selection of correlated features.
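The following sketch shows one plausible way to rank and partition features by correlation for such a guided search; the even split into "distinct" and "similar" groups is an illustrative assumption rather than the paper's exact rule.

```python
import numpy as np

# Sketch: rank features by how correlated they are with the others, so a
# local search can favor "distinct" (less correlated) features. The 50/50
# split into distinct/similar groups is an illustrative assumption.

def partition_by_correlation(X):
    """X: (n_samples, n_features) data matrix."""
    corr = np.abs(np.corrcoef(X, rowvar=False))  # |Pearson| between features
    np.fill_diagonal(corr, 0.0)
    avg_corr = corr.mean(axis=1)                 # mean correlation per feature
    order = np.argsort(avg_corr)                 # least correlated first
    half = len(order) // 2
    return order[:half], order[half:]            # (distinct, similar)
```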

In addition, some approaches, such as RELIEF [29], I-RELIEF [55], and SIMBA [5], also assign a weight to each feature and select a reduced number of relevant features. On close observation, these methods are indeed filter methods, except SIMBA, while our proposed HGAFS is a wrapper-based feature selection algorithm using a GA and an NN. It is well known that wrapper methods generally outperform filter methods for feature selection [10], [32]. SIMBA, on the other hand, involves a classifier in selecting salient features but fails to provide global search.

Recently, a constructive approach for FS, CAFS, has been proposed [25]. It automatically selects relevant features using a correlation-information-based sequential search strategy and determines appropriate NN architectures during training. One major disadvantage of this approach is that CAFS may suffer from the nesting effect, since it uses a sequential approach to select a set of salient features. An efficient way to avoid this effect is to incorporate a global search strategy such as a GA [25]. We utilize such a correlation-based search strategy in the GA as a local search operation, which ultimately produces a significant performance gain for FS in HGAFS.

The rest of this paper is organized as follows. Section 2 reviews the literature on existing FS work. Section 3 discusses HGAFS in detail, including its computational complexity. Section 4 presents our experimental studies, including the experimental methodology, results, and comparisons with other existing FS algorithms. Finally, Section 5 discusses the algorithm and future directions, and Section 6 concludes the paper with a brief summary.

Section snippets

A review of the literature

The performance of any FS task depends greatly on the search technique used to find the salient features in a given dataset [35]. Among numerous FS algorithms, most involve either a sequential search technique [1], [7], [12], [14], [17], [19], [44], [45], [50], [52], [57], [59] or a global search technique [3], [8], [18], [27], [31], [35], [40], [41], [51], [56], [58], [61], [62]. In terms of how the search is guided and how the generated subsets are evaluated, the existing FS algorithms can be…

Proposed HGAFS

A GA provides genetic search, a global search strategy, for finding an optimal solution to a given problem. In FS tasks, a GA provides better solutions but is affected by two shortcomings: premature convergence and weakness in fine-tuning near local optimum points [18], [40]. To overcome these weaknesses, hybridizing the GA, i.e., incorporating domain-specific knowledge into the GA, is nowadays an active research topic.

Our proposed HGAFS uses an HGA technique combining a bounded scheme,…
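Although the snippet is truncated here, the general shape of such a hybrid GA can be sketched as a standard generational loop with a local search operation applied to each offspring; every operator below is a hypothetical placeholder standing in for the paper's actual choices.

```python
# Skeleton of a hybrid GA: a generational loop with a local search
# operation (LSO) applied to each offspring. All operators passed in
# (select, crossover, mutate, local_search, fitness) are hypothetical
# placeholders, not the paper's actual operators.

def hybrid_ga(population, fitness, select, crossover, mutate,
              local_search, generations):
    for _ in range(generations):
        offspring = []
        while len(offspring) < len(population):
            p1 = select(population, fitness)
            p2 = select(population, fitness)
            child = mutate(crossover(p1, p2))
            child = local_search(child)  # fine-tune with domain knowledge
            offspring.append(child)
        population = offspring
    return max(population, key=fitness)
```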

Experimental studies

This section presents HGAFS's performance on several well-known real-world benchmark and gene-expression classification datasets: diabetes, breast cancer, glass, vehicle, hepatitis, horse, sonar, splice, colon cancer, lymphoma, and leukemia. These datasets have been the subject of many studies in NNs and machine learning, and they cover examples of small, medium, large, and very large dimensional datasets. The characteristics of these datasets are shown in Table 1, which shows a…

Discussions

This section briefly explains why HGAFS performs better than the other FS algorithms. Three major differences might contribute to its better performance.

First, HGAFS is guided in selecting the salient features by the subset size determination scheme. This scheme encourages HGAFS to generate subsets of reduced size, whereas other approaches (e.g., [8], [18], [35], [40], [62]) use a random function instead. Thus, subsets of larger…

Conclusions

We have proposed a method that integrates two quite different new techniques into a GA for feature selection: restricting the number of 1-bits in the individual strings, and the local search operation. The fitness function, which combines the performance of an NN with the correlation information of the features, assists the LSO in HGAFS in finding the most salient features with less redundancy of information.
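As a rough illustration of such a combined fitness, the sketch below uses a weighted sum of NN accuracy and a correlation-based redundancy penalty; the weighted-sum form and the weight `alpha` are assumptions, since this preview does not give the paper's actual formula.

```python
# Sketch of a fitness combining NN performance with a correlation-based
# redundancy penalty. The weighted-sum form and `alpha` are assumptions;
# the paper's actual fitness function may differ.

def fitness(nn_accuracy, redundancy, alpha=0.8):
    """nn_accuracy: validation accuracy of an NN trained on the subset;
    redundancy: mean |correlation| among the selected features."""
    return alpha * nn_accuracy - (1.0 - alpha) * redundancy
```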

In HGAFS, neither the computation of a training-based classifier nor the computation of mutual information nor…

Acknowledgments

This work was supported by grants to K. Murase from the Japan Society for the Promotion of Science and the University of Fukui.

References (62)

  • S. Li et al., An effective feature selection method for hyperspectral image classification based on genetic algorithm and support vector machine, Knowledge-Based Systems (2011)
  • S. Nemati et al., A novel ACO-GA hybrid algorithm for feature selection in protein function prediction, Expert Systems with Applications (2009)
  • L. Prechelt, A quantitative study of experimental evaluations of neural network learning algorithms, Neural Networks (1996)
  • P. Pudil et al., Floating search methods in feature selection, Pattern Recognition Letters (1994)
  • R.K. Sivagaminathan et al., A hybrid approach for feature subset selection using neural networks and ant colony optimization, Expert Systems with Applications (2007)
  • A. Verikas et al., Feature selection with neural networks, Pattern Recognition Letters (2002)
  • X. Wang et al., Feature selection based on rough sets and particle swarm optimization, Pattern Recognition Letters (2007)
  • Y.L. Wu et al., Feature selection using genetic algorithm and cluster validation, Expert Systems with Applications (2011)
  • Z. Zhu et al., Markov blanket-embedded genetic algorithm for gene selection, Pattern Recognition (2007)
  • S. Abe, Modified backward feature selection by cross validation, Proceedings of the European Symposium on Artificial Neural Networks (2005)
  • A. Alizadeh, Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature (2000)
  • U. Alon et al., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays, Proceedings of the National Academy of Sciences of the USA (1999)
  • R.G. Bachrach, A. Navot, N. Tishby, Margin based feature selection-theory and algorithms, in: Proceedings of the 21st...
  • A.D. Back et al., Selecting inputs for modeling using normalized higher order statistics and independent component analysis, IEEE Transactions on Neural Networks (2001)
  • D. Chakraborty et al., A neuro-fuzzy scheme for simultaneous feature selection and fuzzy rule-based classification, IEEE Transactions on Neural Networks (2004)
  • M. Dorigo et al., Ant Colony Optimization (2004)
  • D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (1989)
  • S. Guan et al., An incremental approach to contribution-based feature selection, Journal of Intelligent Systems (2004)
  • I. Guyon et al., An introduction to variable and feature selection, Journal of Machine Learning Research (2003)
  • T. Golub, Molecular classification of cancer: class discovery and class prediction by gene expression, Science (1999)
  • I. Guyon et al., Gene selection for cancer classification using support vector machines, Machine Learning (2002)

Md. Monirul Kabir received the B.E. degree in Electrical and Electronic Engineering from Bangladesh Institute of Technology (BIT), Khulna, now Khulna University of Engineering and Technology (KUET), Bangladesh, in 1999. He received a master of engineering degree from the Department of Human and Artificial Intelligent Systems, University of Fukui, Japan, in 2008, and a doctor of engineering degree in System Design Engineering from the University of Fukui in March 2011. He was an assistant programmer from 2002 to 2005 at the Dhaka University of Engineering and Technology (DUET), Bangladesh. His research interests include data mining, artificial neural networks, evolutionary approaches, swarm intelligence, and mobile ad hoc networks.

Md. Shahjahan is an Associate Professor in the Department of Electrical and Electronic Engineering, Khulna University of Engineering and Technology (KUET), Khulna, Bangladesh. He received his B.E. from Bangladesh Institute of Technology (BIT) in January 1996, his M.E. in Information Science from the University of Fukui, Japan, in 2003, and his D.E. from the Department of System Design Engineering, University of Fukui, in 2006. He joined the Department of Electrical and Electronic Engineering, KUET, as a Lecturer in September 1996 and became an assistant professor in 2006. He received the best student award from IEICE, Hokuriku part, Japan, in 2003. He is a member of the Institution of Engineers, Bangladesh (IEB). He has published a number of papers in international conferences and journals.

Kazuyuki Murase has been a Professor in the Department of Human and Artificial Intelligence Systems, Graduate School of Engineering, University of Fukui, Fukui, Japan, since 1999. He received his M.E. in Electrical Engineering from Nagoya University in 1978 and his Ph.D. in Biomedical Engineering from Iowa State University in 1983. He joined the Department of Information Science of Toyohashi University of Technology as a Research Associate in 1984, became an Associate Professor in the Department of Information Science of Fukui University in 1988, and became a professor in 1992. He is a member of the Institute of Electronics, Information and Communication Engineers (IEICE), the Japanese Society for Medical and Biological Engineering (JSMBE), the Japan Neuroscience Society (JSN), the International Neural Network Society (INNS), and the Society for Neuroscience (SFN). He serves as a Councilor of the Physiological Society of Japan (PSJ) and the Japanese Association for the Study of Pain (JASP).
