Incorporation of multimodal multiobjective optimization in designing a filter based feature selection technique
Introduction
Real-world machine learning problems commonly involve a large number of features, but not all of them are useful; many are redundant or irrelevant. To remove much of this redundancy and irrelevancy, dimensionality reduction is carried out during the preprocessing stage. This reduces the computational complexity and enhances the performance of the operations carried out on these datasets. Feature selection [1] is one way to reduce the dimension of the feature space. It involves selecting an optimal feature subset of cardinality d (d < D, where D is the total number of features in the original feature space) based on predetermined evaluation criteria. It aims at achieving satisfactory performance while keeping the number of selected features minimal.
Four categories of feature selection algorithms are available in the literature: filter, wrapper, hybrid, and embedded. Filter-based methods [2] evaluate the relevance of a feature subset using its intrinsic properties, such as information, correlation, and consistency. Under-performing features are discarded from the optimal feature subset. Because they do not depend on a classifier, these methods are often more generic, execute faster, and have lower computational complexity, but they usually yield lower accuracy [3].
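As a minimal sketch of how a filter criterion scores a feature without any classifier, the snippet below computes the mutual information between a discrete feature and the class labels. The function name `mutual_information` and the toy data are illustrative assumptions, not the measures or datasets used in this paper.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """Mutual information I(X; Y), in bits, between two discrete sequences."""
    n = len(feature)
    px = Counter(feature)                # marginal counts of feature values
    py = Counter(labels)                 # marginal counts of class labels
    pxy = Counter(zip(feature, labels))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: f1 mirrors the class, f2 is independent of it.
labels = [0, 0, 1, 1]
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]

print(mutual_information(f1, labels))  # 1.0 bit: fully informative
print(mutual_information(f2, labels))  # 0.0 bits: uninformative
```

A filter method would rank or combine such scores to decide which features to keep, independently of any downstream classifier.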
In contrast, wrapper approaches [4] depend on a classification method. Here, feature subsets are randomly selected and evaluated with the help of a classifier. The method aims to increase accuracy by sequentially adding features to, or removing features from, the subset. In pursuing better accuracy, it generally suffers from over-fitting. Wrapper-based approaches are not generic: the selected feature subset depends heavily on the classifier used to evaluate its quality, so a change of classifier requires re-executing the feature selection technique. Hybrid methods [5] combine wrapper and filter methods to select useful features, whereas the embedded approach integrates learning and feature selection into a single process.
Feature selection requires optimizing both the number of selected features and the values of other objective functions (obtained from the quality measures of feature selection). Feature selection algorithms operate in a search space that, when large, reduces the efficiency of the search techniques used to explore it for the best feature subset. Evolutionary algorithms, with their strong search ability, are a preferable approach to this problem.
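To give a sense of scale, a dataset with D features admits 2^D - 1 non-empty feature subsets, so exhaustive search quickly becomes impractical. The short sketch below prints this count for a few values of D; the helper name `subset_count` is hypothetical.

```python
def subset_count(num_features):
    """Number of non-empty feature subsets of a set with num_features features."""
    return 2 ** num_features - 1


for d in (10, 20, 50):
    print(d, subset_count(d))
```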
Evolutionary computation methods for feature selection have already been studied extensively. Popular evolutionary algorithms adopted in such studies include Particle Swarm Optimization (PSO) [6], [7], [8], Genetic Algorithms (GA) [9], Differential Evolution (DE) [10], and Ant Colony Optimization (ACO) [11], [12]. Early research focused on single-objective evolutionary algorithms as a way to arrive at the optimal feature subset. This, however, did not account for the multiple objectives that need to be optimized in feature selection, such as the number of selected features, the error rate (1 - accuracy) in the case of wrapper methods, and the computing time.
This has led to the adoption of multiobjective evolutionary algorithms to select feature subsets [6], [13]. Using a traditional (unimodal) multiobjective optimization algorithm to solve these problems yields only a limited number of Pareto-optimal solutions and may therefore leave out some important feature subsets. Since different feature subsets can have the same or similar objective values, many feature selection problems can be categorized as multimodal optimization problems.
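The multimodal aspect can be illustrated with a small sketch: two different feature subsets may map to the same point in objective space, and a method that keeps only one solution per objective vector would discard one of them. In the code below, both objectives are minimized; the subsets and objective values are made up for illustration, and the helper names `dominates` and `non_dominated` are assumptions rather than part of the algorithm used in this work.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(candidates):
    """Keep every (subset, objectives) pair that no other candidate dominates."""
    return [
        (subset, objs)
        for subset, objs in candidates
        if not any(dominates(other, objs) for _, other in candidates)
    ]


# Hypothetical candidates: (feature subset, (number of features, second objective)).
candidates = [
    (frozenset({0, 3}), (2, 0.10)),
    (frozenset({1, 4}), (2, 0.10)),     # different subset, identical objectives
    (frozenset({0, 1, 2}), (3, 0.05)),
    (frozenset({2, 5, 6}), (3, 0.20)),  # dominated by {0, 1, 2}
]

front = non_dominated(candidates)
for subset, objs in front:
    print(sorted(subset), objs)
```

Both `{0, 3}` and `{1, 4}` survive because equal objective vectors do not dominate each other; a multimodal method aims to retain such equivalent subsets so that the decision-maker can choose between them.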
Multimodal multiobjective optimization combines the strengths of multimodal optimization and multiobjective optimization by maintaining optimal solutions that have similar objective values while simultaneously optimizing two or more objective functions. One of the first attempts to solve the feature selection problem using a multimodal optimization algorithm was made by Kamyab and Eftekhari [14], who used only a multimodal single-objective algorithm. Some recent studies have developed filter-based feature selection approaches [15], [16] in a multiobjective setting, but none of them dealt with the multimodal aspect of feature selection.
In [17], the feature selection problem is solved as a multimodal multiobjective optimization problem. The number of selected features and the classification error rate are the two objective functions used in that work. Ring-based PSO is used to induce multiple niches, and special crowding distance (SCD) is used to maintain more optimal solutions. Because a wrapper-based method is used to evaluate the feature subsets, the performance depends on the classifier used to evaluate feature-subset quality, and the approach is therefore not generic.
The current work aims to combine the effectiveness of filter-based approaches with a multimodal multiobjective algorithm for feature selection. To quantify the quality of the feature subsets obtained, several information-theoretic measures, such as mutual information and correlation with respect to classes, are considered. A large number of objective functions are then optimized simultaneously using a multimodal multiobjective optimization (MMO) technique. For this purpose, multi-objective PSO with ring topology and SCD (MO_RING_PSO_SCD) [18] is used. Multiple optima (multimodality) are located by adopting the ring topology, and diversity is ensured using SCD. A large set of experiments is performed on seven datasets, in which all objective functions are simultaneously optimized, and the results are compared with those obtained by the existing wrapper-based MMO approach to feature selection.
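The ring topology mentioned above can be sketched as an index lookup: each particle exchanges information only with its immediate neighbors on a ring, rather than with the whole swarm, which lets separate niches form. The snippet assumes a neighborhood of one particle on each side; this neighborhood size and the function name `ring_neighbors` are illustrative assumptions and may differ from the configuration in [18].

```python
def ring_neighbors(i, swarm_size):
    """Indices in particle i's ring neighborhood: left neighbor, itself, right neighbor."""
    return [(i - 1) % swarm_size, i, (i + 1) % swarm_size]


print(ring_neighbors(0, 5))  # [4, 0, 1]: the ring wraps around at the start
print(ring_neighbors(4, 5))  # [3, 4, 0]: and at the end
```

Restricting each particle's guidance to such a local neighborhood slows the spread of any single best solution through the swarm, which is what allows several optima to be located in parallel.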
Some of the important contributions of our present work are listed below:
- (1) To the best of our knowledge, the filter-based approach adopted in this work has not previously been used in a multimodal multiobjective environment to solve the feature selection problem.
- (2) It provides a larger number of solutions, enabling decision-makers to choose one according to their preferences.
- (3) Experimental results show that some of the obtained solutions give better accuracy values for most of the datasets used in this work.
The remainder of this paper is divided into four sections. Section 2 describes the preliminaries of MMO and different feature-subset quality measures. Section 3 explains the methodology of multimodal multiobjective filter-based feature selection. Section 4 presents the experimental results and a comparative study, and Section 5 concludes the paper.
Background
This section briefly covers the basics of multimodal multiobjective optimization and the various quality measures used to evaluate feature subsets.
Methodology
This section contains a detailed description of the proposed multimodal multiobjective filter-based optimization technique for feature selection. The mathematical formulation of the problem statement is as follows:
Input: A dataset with N samples, having a total number of F features.
Output: A proper subset F′ of F such that the two conditions given below are satisfied:
- (1) Simultaneously optimize all objective functions used in this work.
- (2) Maintain all feature subsets having similar objective
Datasets, experimental results and analysis
This section briefly introduces the datasets used in this work (available at the UCI Machine Learning Repository), the performance metrics that measure the goodness of the feature subsets present in the Pareto-optimal solutions, and the experimental results together with a thorough analysis.
Conclusion
In this paper, a filter-based multimodal multiobjective optimization (MMO) framework is used to solve the problem of feature selection. Multimodality is induced using a ring-based Particle Swarm Optimization (PSO) algorithm with special crowding distance (SCD). The ring-based PSO helps in finding more optimal feature subsets for the feature selection problem, while SCD maintains these optimal solutions and enhances diversity. Different objective functions based on information-theoretic
CRediT authorship contribution statement
Kanchan Jha: Software, Validation, Formal analysis, Investigation, Writing - original draft. Sriparna Saha: Conceptualization, Methodology, Supervision, Writing - review & editing, Funding acquisition, Project administration.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgement
Dr. Sriparna Saha would like to acknowledge the support of the Science and Engineering Research Board (SERB) [ECR/2017/001915] of the Department of Science and Technology, India, in carrying out this research.
References (42)
- et al. Wrappers for feature subset selection. Artificial Intelligence (1997)
- et al. Feature selection based on rough sets and particle swarm optimization. Pattern Recognit. Lett. (2007)
- et al. A GA-based feature selection and parameters optimization for support vector machines. Expert Syst. Appl. (2006)
- et al. Feature subset selection using differential evolution and a wheel based search strategy. Swarm Evol. Comput. (2013)
- et al. A new hybrid ant colony optimization algorithm for feature selection. Expert Syst. Appl. (2012)
- et al. A feature selection method based on modified binary coded ant colony optimization algorithm. Appl. Soft Comput. (2016)
- et al. Feature selection by multi-objective optimisation: Application to network anomaly detection by hierarchical self-organising maps. Knowl.-Based Syst. (2014)
- et al. Feature selection using multimodal optimization techniques. Neurocomputing (2016)
- et al. Unsupervised feature selection using an improved version of differential evolution. Expert Syst. Appl. (2015)
- et al. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evol. Comput. (2011)