Multimodal particle swarm optimization for feature selection

https://doi.org/10.1016/j.asoc.2021.107887

Highlights

  • First attempt to study the feature selection problem from the perspective of multimodal optimization.

  • A novel multimodal niching particle swarm optimization framework to locate all the best feature combinations.

  • New strategies on Hamming distance measurement, niching techniques, and particle velocity update.

  • Finds many more multimodal solutions than other algorithms.

Abstract

The purpose of feature selection (FS) is to eliminate redundant and irrelevant features and keep the useful features for classification, which can not only reduce the cost of classification but also improve the classification accuracy. Existing algorithms mainly focus on finding one best feature subset for an optimization target, or some Pareto solutions that best fit multiple targets, neglecting the fact that an FS problem may have more than one best feature subset for a single target. In fact, different feature subsets are likely to exhibit similar classification ability, so the FS problem is also a multimodal optimization problem. This paper makes a first attempt to study the FS problem from the perspective of multimodal optimization. A novel multimodal niching particle swarm optimization (MNPSO) algorithm, aiming at finding all the best feature combinations of an FS problem, is proposed. Unlike traditional niching methods, the proposed algorithm uses the Hamming distance to measure the distance between any two particles. Two niching updating strategies are adopted for multimodal FS, and the two proposed variants of MNPSO are termed MNPSO-C (using crowding clustering) and MNPSO-S (using speciation clustering), respectively. To enable the particles in the same niche to exchange information properly, the particle velocity update is based on the best particle in the niche instead of the traditional global best. An external archive stores the feature subsets with the highest classification accuracy. Datasets with various numbers of attributes have been tested. In particular, the numbers of multimodal solutions and the success rates of the proposed algorithms have been extensively analyzed and compared with state-of-the-art algorithms. The experimental results show that the proposed algorithms can find more multimodal feature solutions and have advantages in classification accuracy.

Introduction

Classification is an important task in machine learning and data mining. Its purpose is to assign each data instance to the right category according to the characteristics of the instance. In many real-world classification problems, the samples to be classified have a large number of features. If the features are not screened, the cost of classification can be considerable [1]. Moreover, there may be redundancy among features, and some features can be noisy or even irrelevant to predicting the class label. Including these features may not only be unhelpful but even detrimental to classification performance. As a pre-processing operation, feature selection (FS) eliminates redundant and irrelevant features and selects only useful features for classification. It is conducive to reducing computing cost and potentially improving classification accuracy.

Based on their evaluation mechanisms, FS methods are mainly divided into two approaches: filters and wrappers [2]. Filter-based methods evaluate the statistical characteristics of the features and select the best-ranked features into a feature subset [3]. As no classifier is involved in obtaining the feature subset, they have low computational cost and are widely applied. In contrast, wrapper methods use classification algorithms to guide the search process in the evaluation. Paired with an effective search process, they tend to select feature subsets with better performance.

Traditional wrapper-based methods combine a classifier with search algorithms such as sequential floating forward selection (SFFS) and sequential floating backward selection (SFBS) [4]. SFFS starts from an empty feature subset. In each iteration, L features are added to the subset to optimize the classification outcome, and then R features are deleted from the subset to optimize the classification again. The values of L and R can fluctuate during the iterative search. SFBS starts from the complete feature set and proceeds in the opposite direction: in each round, R features are deleted from the subset first, and then L features are added back to optimize the classification. These two algorithms combine the characteristics of sequential forward selection, sequential backward selection, and the “plus-L-take-away-R” method, which leads to good performance. However, these traditional FS methods easily get trapped in local optima [5], [6].
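The floating forward search described above can be sketched as follows. This is an illustrative sketch, not the algorithm from [4]: the `evaluate` callback is a hypothetical user-supplied scorer (e.g. cross-validated accuracy), and here the backward "take-away" step is triggered only when dropping a feature strictly improves the score.

```python
def sffs(evaluate, n_features, max_size):
    """Sequential floating forward selection (SFFS) sketch.

    `evaluate` is a hypothetical callback scoring a feature subset
    (higher is better), standing in for a wrapped classifier.
    """
    selected = set()
    best_score = evaluate(selected)
    while len(selected) < max_size:
        # Forward step: add the single feature that helps most.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: evaluate(selected | {f}))
        new_score = evaluate(selected | {f_add})
        if new_score <= best_score:
            break  # no remaining feature improves the subset
        selected.add(f_add)
        best_score = new_score
        # Floating backward step: drop features as long as removing
        # one strictly improves the score.
        improved = True
        while improved and len(selected) > 1:
            improved = False
            for f in sorted(selected):
                if evaluate(selected - {f}) > best_score:
                    selected.remove(f)
                    best_score = evaluate(selected)
                    improved = True
                    break
    return selected, best_score
```

With a toy scorer that rewards a known target subset, the search recovers that subset; a real wrapper would call a classifier inside `evaluate`, which is exactly why wrappers are costly.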

Population-based metaheuristic algorithms such as the genetic algorithm (GA) [7], [8], differential evolution (DE) [9], ant colony optimization (ACO) [10], [11], and particle swarm optimization (PSO) [12], [13] have been widely used in FS research due to their stronger global search ability and faster convergence than traditional methods. The main idea of population-based metaheuristics is to maintain and evolve a population according to certain rules. Evolutionary algorithms such as GA and DE are inspired by Darwinian evolution theory, whereas swarm intelligence methodologies (e.g., ACO and PSO) are mainly based on the emergent collective behavior of groups of living organisms. In PSO in particular, every particle keeps a memory of its own best solution, and all particles learn from the best solution found so far by the swarm. These operations guide the swarm toward the global optimal solutions through iterations.

In previous research, FS has mainly been considered a multi-objective optimization problem or a binary discrete problem. In multi-objective FS [14], [15], the goal is generally to minimize two conflicting objectives, such as the number of selected features and the classification error. Population-based metaheuristic algorithms have been applied to find the non-dominated solutions and finally construct an optimal Pareto front. The non-dominated sorting GA (NSGA) II is one of the most widely used approaches to multi-objective problems and has been applied to FS [8]. A multi-objective PSO algorithm called CMDPSOFS [12] applied PSO to FS; it maintained an external set to preserve the non-dominated solutions and selected the optimal solutions from this set by a tournament selection based on the crowding distance. A multi-objective DE algorithm in [16] used Pareto-dominance-based randomized local mutation to improve the convergence of DE, and used adaptive crossover to dynamically assign the crossover probability of each individual. Its follow-up extension [17] minimized the cost of the selected features and the classification error. By considering the fuzziness of feature cost, a fuzzy multi-objective FS method with PSO was proposed in [13].

Binary representation is generally used in solving FS problems, where 1 or 0 denotes whether a feature is selected or not. For PSO, although particle positions are continuous real values, they can be converted into binary feature strings using coding strategies. Binary PSO (BPSO) was proposed by Kennedy and Eberhart [18] and has been gradually improved by other researchers. BPSO maps the velocity of each particle to its position through the sigmoid function to form a string of 0s and 1s. A chaos BPSO [19] was proposed for FS based on chaos theory. An improved BPSO based on an adaptive disturbance method, termed adaptive bare-bones PSO (ABPSO) [20], [21], was proposed for discrete problems and modified for FS in [22]. Variants of PSO have also been applied to breast cancer prediction [23], [24] and multi-label FS [25]. Other binary swarm intelligence algorithms have been proposed for FS in recent years, such as the binary grey wolf optimization algorithm (BGWO) [26], [27], frog leaping [28], and the binary dragonfly algorithm [29]. For a comprehensive review of swarm intelligence algorithms for feature selection, one can refer to [30].
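The sigmoid mapping used by BPSO can be sketched as follows: bit i of the position is set to 1 with probability sigmoid(v_i). The function name and the fixed random seed are illustrative choices, not from the original BPSO paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def bpso_position(velocity):
    """Map a real-valued velocity vector to a 0/1 feature string.

    Following binary PSO, bit i is set to 1 with probability
    sigmoid(velocity[i]); large positive velocities make selection
    near-certain, large negative ones make it near-impossible.
    """
    prob = 1.0 / (1.0 + np.exp(-velocity))          # sigmoid squashing
    return (rng.random(velocity.shape) < prob).astype(int)

# Strongly "off" / coin-flip / strongly "on" components:
v = np.array([-6.0, 0.0, 6.0])
bits = bpso_position(v)
```

Repeated sampling makes the probabilistic rounding visible: the first bit is almost always 0, the last almost always 1, and the middle bit is selected about half the time.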

In FS, different feature solutions are likely to yield the same or similar classification ability, which means FS may have a multimodal property. Current mainstream FS methods find either a single global optimal solution or a group of non-dominated solutions trading off conflicting objectives. Finding multiple global optimal solutions is valuable in practice, as it gives practitioners more options to choose from according to actual needs. Studying the multimodal property of FS is therefore necessary.

The task of finding multiple global optimal solutions is called multimodal optimization. At present, there are three main techniques for extending algorithms from unimodal to multimodal optimization [31], [32], [33]. The first is to run the same optimization algorithm repeatedly to locate the multiple optima of a multimodal function one by one. The second is to use explicit parallel sub-population methods that divide the whole population into multiple sub-populations evolving in parallel, such as the adaptive isolation model [34], the island model [35], and multi-swarm [36]. The third is to use implicit parallel sub-population approaches that preserve population diversity by introducing niching or speciation techniques, such as species conservation [37].

In recent years, research on multimodal optimization has gradually focused on proposing new niching techniques [38]. These methods use niching to divide the population into multiple sub-populations, and then let the sub-populations iterate independently to find all global and local optimal solutions [39], [40], [41]. When searching for an optimum, such multimodal algorithms can judge whether it has been found because the function model is known. However, traditional multimodal algorithms cannot be directly used to solve the FS problem.

Although the FS problem also has the multimodal property that different feature combinations may correspond to the same classification ability, it has no explicit continuous function model. In fact, the evaluation function of FS is discrete and expensive to evaluate. This demands new niching techniques for dividing the swarm into sub-populations so that multimodal optima can be found.

In order to study the multimodal characteristics of FS, this paper proposes two novel multimodal niching PSO (MNPSO) algorithms. Two fundamental niching strategies, i.e. crowding and speciation in [38], are adopted in this paper. Hence the two proposed variants of MNPSO are termed MNPSO-C (using crowding clustering) and MNPSO-S (using speciation clustering) respectively. The MNPSO-C divides the population according to the distance between the individual and the random reference points, whereas the MNPSO-S divides the population according to the fitness of the swarm. Compared with existing work, the novelties and contributions of the MNPSO algorithm are summarized as follows.

  • (1)

MNPSO adopts a multimodal niching strategy to solve the FS problem. In the division of niches, the Hamming distance is used to measure the distance between particles. Particles with large differences are divided into different niches, and particles with small differences are grouped into the same niche. The target of the optimization is to find as many different best solutions as possible.

  • (2)

For the particle velocity update, the traditional global best particle position is replaced by the best particle position in each niche, so that particles learn from their own niche, enabling the algorithm to locate more distinct peaks.

  • (3)

The update of the historical best position of a particle is modified by adding a condition for when two different solutions have the same classification accuracy. If the classification accuracy of a particle's new best solution equals that of its recorded historical best, the historical best is replaced by the new solution with a probability of 50%. This modification increases diversity and prevents particles from being trapped at the same peak.

  • (4)

    An external archive is added for solution reservation, and a screening rule is applied to delete duplicate solutions.

  • (5)

    We compare the performance of the proposed algorithm with several state-of-the-art algorithms on FS problems with different numbers of attributes of various characteristics. The experimental results indicate that the proposed MNPSO algorithm outperforms the others both on the classification accuracy and the number of multimodal solutions.
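Contributions (1)-(4) can be illustrated with a minimal sketch. This is an illustrative reconstruction rather than the authors' implementation: the function names, the coefficient values (w, c1, c2), and the archive-reset policy are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def hamming(a, b):
    """Hamming distance between two 0/1 feature strings (used for niching, contribution 1)."""
    return int(np.sum(a != b))

def niche_velocity_update(v, x, pbest, niche_best, w=0.7, c1=1.5, c2=1.5):
    """Velocity update with the niche-best particle replacing the
    traditional global best (contribution 2)."""
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    return w * v + c1 * r1 * (pbest - x) + c2 * r2 * (niche_best - x)

def update_pbest(pbest, pbest_acc, x, acc):
    """Historical-best update with the 50% tie rule (contribution 3):
    on equal accuracy, accept the new position with probability 0.5."""
    if acc > pbest_acc or (acc == pbest_acc and rng.random() < 0.5):
        return x.copy(), acc
    return pbest, pbest_acc

def archive_add(archive, x, acc):
    """External archive keeping only distinct subsets tied at the best
    accuracy seen so far (contribution 4, assumed reset policy)."""
    best = max((a for _, a in archive), default=-np.inf)
    if acc > best:
        archive.clear()                      # a strictly better subset resets the archive
        archive.append((x.copy(), acc))
    elif acc == best and not any(hamming(x, s) == 0 for s, _ in archive):
        archive.append((x.copy(), acc))      # new distinct subset with equal accuracy
    return archive
```

In a full run, each particle's continuous position would be binarized (e.g. via the BPSO sigmoid rule), evaluated by a classifier, and fed through `update_pbest` and `archive_add`, with niches formed by crowding or speciation over Hamming distances.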

The rest of the paper is organized as follows: Section 2 introduces the basic operations of PSO, including the velocity and position updates, the encoding scheme, and the evaluation function. Section 3 describes the framework and algorithmic steps of the proposed MNPSO algorithm. Section 4 presents the experimental design and parameter settings. Comparison results are given in Section 5, contrasting the two implementations of MNPSO with other mainstream algorithms. Finally, Section 6 concludes the paper.

Section snippets

Particle swarm optimization for FS

This section introduces the traditional PSO algorithm, along with the coding scheme and the evaluation function to be used for FS in the paper.

The proposed algorithms

In this section, the proposed multimodal niching PSO (MNPSO) algorithm for FS is described in detail. First, the multimodal niching methods for FS are described. Then the particle updating strategies are introduced. Finally, the framework of the proposed algorithm is presented.

Datasets

Table 1 shows the benchmark datasets used to verify the performance of the proposed multimodal algorithm. The datasets are from the UCI database [54] and the LIBSVM database [55], with numbers of attributes ranging from 10 to 2000, categories from 2 to 11, and samples from 101 to 2310. Among these, the first seven datasets (1 to 7), from Vowel to Segmentation, are used as the low dimensional datasets because their optimal solutions can be found by enumeration. The remaining datasets (8 to

Analysis on low dimensional datasets

For the first seven datasets in Table 1, the number and value of the optimal peaks can be determined by enumeration because of their low dimensions. Before comparing the performance of the algorithms, the enumeration results for the first four datasets are illustrated in Fig. 6. Notice that the numbers of attributes of the first four datasets range from 10 to 14, so the numbers of possible 0/1 feature combinations range from 2^10 = 1024 to 2^14 = 16384. For the datasets having more
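Such an enumeration baseline can be sketched as follows, assuming a hypothetical `evaluate` callback that returns the classification accuracy of a 0/1 feature mask; it returns every mask tied at the best score, i.e. the multimodal ground truth against which the niching algorithms are judged.

```python
from itertools import product

def enumerate_optima(evaluate, n_features):
    """Exhaustively score every 0/1 feature mask and return all masks
    tied at the best score.

    Feasible only for low-dimensional datasets: 10 to 14 attributes
    means 2**10 = 1024 to 2**14 = 16384 evaluations.
    """
    best, optima = float("-inf"), []
    for mask in product((0, 1), repeat=n_features):
        score = evaluate(mask)
        if score > best:
            best, optima = score, [mask]     # strictly better: restart the tie list
        elif score == best:
            optima.append(mask)              # tied: another global optimum
    return best, optima
```

A toy objective with deliberately tied optima shows why this matters: an XOR-style scorer over the first two bits has four equally good masks in a 3-feature space, and all four are returned.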

Conclusions

The paper studies the multimodal characteristics of FS and proposes two novel multimodal PSO algorithms, MNPSO-C and MNPSO-S, which use crowding and speciation niching respectively to divide the population. Through particle updates and external archive selection, the optimal individuals are saved to guide the search of PSO. To demonstrate the performance of multimodal algorithms, empirical comparison studies have been carried out on a number of FS

CRediT authorship contribution statement

Xiao-Min Hu: Conceptualization, Methodology, Software, Resources, Formal analysis, Data curation, Writing – original draft. Shou-Rong Zhang: Software, Validation. Min Li: Resources, Supervision. Jeremiah D. Deng: Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC) Grant No. 61772142, in part by Natural Science Foundation of Guangdong No. 2019A1515011270, in part by Pearl River S&T Nova Program of Guangzhou, China No. 201806010059.

References (55)

  • S.H. Choi et al., Efficient ranking and selection for stochastic simulation model based on hypothesis test, IEEE Trans. Syst. Man Cybern. Syst. (2018)
  • T.M. Hamdani et al., Multi-objective feature selection with NSGA II
  • M.I. Sameen et al., Integration of ant colony optimization and object-based analysis for LiDAR data classification, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. (2017)
  • C.M. Fernandes et al., KANTS: a stigmergic ant algorithm for cluster analysis and swarm art, IEEE Trans. Cybern. (2017)
  • B. Xue et al., Particle swarm optimization for feature selection in classification: A multi-objective approach, IEEE Trans. Cybern. (2013)
  • Y. Hu et al., Multiobjective particle swarm optimization for feature selection with fuzzy cost, IEEE Trans. Cybern. (2020)
  • B. Abdollahzadeh et al., A multi-objective optimization algorithm for feature selection problems, Eng. Comput. (2021)
  • A.A. Bidgoli, H. Ebrahimpour-Komleh, S. Rahnamayan, A novel multi-objective binary differential evolution algorithm for...
  • Y. Zhang, M. Rong, D. Gong, A multi-objective feature selection based on differential evolution, in: 2015 International...
  • Y. Zhang et al., Multi-objective particle swarm optimization approach for cost-based feature selection in classification, IEEE/ACM Trans. Comput. Biol. Bioinform. (2017)
  • J. Kennedy, R.C. Eberhart, A discrete binary version of the particle swarm algorithm, in: 1997 IEEE International...
  • T. Blackwell, A study of collapse in bare bones particle swarm optimization, IEEE Trans. Evol. Comput. (2012)
  • Y. Zhang et al., Adaptive bare-bones particle swarm optimization algorithm and its convergence analysis, Soft Comput. (2014)
  • C. Li, H. Hu, H. Gao, et al., Adaptive bare bones particle swarm optimization for feature selection, in: 2016 Chinese...
  • S.B. Sakri et al., Particle swarm optimization feature selection for breast cancer recurrence prediction, IEEE Access (2018)
  • Nurhayati, F. Agustian, M.D.I. Lubis, Particle swarm optimization feature selection for breast cancer prediction, in:...
  • H. Bayati et al., MLPSO: a filter multi-label feature selection based on particle swarm optimization
