An efficient binary slime mould algorithm integrated with a novel attacking-feeding strategy for feature selection

https://doi.org/10.1016/j.cie.2020.107078

Highlights

  • The slime mould optimization algorithm is used for the first time for feature selection.

  • A new local search strategy is developed to avoid the local optima problem.

  • The proposed algorithms show high efficiency over 28 datasets.

  • The proposed algorithms are compared with six well-known optimization algorithms.

  • The proposed algorithms outperform all the competing ones.

Abstract

Feature selection refers to a process used to reduce the dimension of a dataset in order to obtain the optimal features subset for machine learning and data mining algorithms. This aids the achievement of higher classification accuracy, in addition to reducing the training time of a learning algorithm, as a result of the removal of redundant and less-informative features. In this paper, four binary versions of the slime mould algorithm (SMA) are proposed for feature selection, in which the standard SMA is incorporated with the most appropriate of eight V-shaped and S-shaped transfer functions. The first version converts the standard SMA, which has not yet been used for feature selection to the best of our knowledge, into a binary version (BSMA). The second, abbreviated as TMBSMA, integrates BSMA with two-phase mutation (TM) to further exploit better solutions around the best-so-far. The third version, abbreviated as AFBSMA, combines BSMA with a novel attacking-feeding strategy (AF) that trades off exploration and exploitation based on the memory saving of each particle. Finally, TM and AF are integrated with BSMA to produce better solutions, in a version called FMBSMA. The k-nearest neighbors (KNN) algorithm, one of the common classification and regression algorithms in machine learning, is used to measure the classification accuracy of the selected features. To validate the performance of the four proposed versions of BSMA, 28 well-known datasets are employed from the UCI repository. The experiments confirm the efficacy of the AF method in providing better results. Furthermore, after comparing the four versions, the FMBSMA version is shown to be the best, compared with the other three versions and six state-of-the-art feature selection algorithms.

Introduction

Feature selection (FS) is an essential process in many fields, such as biology, finance, and telecommunications, for removing insignificant and redundant features that increase training time and lead to over-fitting when applying machine learning methods (Abdel-Basset et al., 2020; Anter & Ali, 2020). Training a classification approach, such as an artificial neural network (Xu, Qiao, Peng, & Zhang, 2004), a decision tree (Hssina, Merbouha, Ezzikouri, & Erritali, 2014), k-nearest neighbors (Amendolia, Cossu, Ganadu, Masala, & Mura, 2003), or a support vector machine (Amendolia et al., 2003), on a high-dimensional dataset may lead to over-fitting, such that the trained model is unable to predict new test instances with sufficient accuracy. The FS process is therefore used to remove redundant and noisy features from the original feature set, which reduces the training time of the machine learning model and improves the classification accuracy. The classification process assigns each sample to a specific class. Techniques proposed in the literature for tackling FS problems fall into three categories: wrapper, filter, and embedded methods.

In wrapper methods, a machine learning algorithm is typically used to measure the accuracy of the selected features (Balasaraswathi, Sugumaran, & Hamid, 2017). Unfortunately, these methods are time-consuming because, for every candidate subset of features, a machine learning algorithm must be trained on the training dataset and then evaluated on the test dataset. When the number of features is large, as in large-scale problems, this approach becomes unwieldy. As a result, filter methods were developed to score the extracted features based on properties of the data alone. Although filter methods reduce the time required, they generally cannot achieve the same classification accuracy as wrapper methods for the identified features. Embedded (hybrid) methods build feature selection into the training of the machine learning model itself, and therefore achieve high classification accuracy with reduced time complexity.
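To make the wrapper idea concrete, the following is a minimal sketch of how a candidate feature subset could be scored with KNN (the classifier used later in this paper). The weighted fitness combining classification error and subset size is a common formulation in the FS literature rather than one taken from this excerpt, so the weight alpha and the cross-validation setup are illustrative assumptions.

```python
# Minimal sketch of a wrapper-style fitness evaluation with KNN, assuming
# the common formulation fitness = alpha*error + (1-alpha)*|S|/d; the exact
# weighting used in the paper is not shown in this excerpt.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def wrapper_fitness(mask, X, y, alpha=0.99, k=5):
    """Evaluate a binary feature mask: lower fitness is better."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:            # an empty subset is invalid
        return np.inf
    knn = KNeighborsClassifier(n_neighbors=k)
    acc = cross_val_score(knn, X[:, selected], y, cv=5).mean()
    # Trade off classification error against the size of the subset.
    return alpha * (1.0 - acc) + (1.0 - alpha) * selected.size / X.shape[1]
```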

Many techniques, such as mutual information (Peng, Long, & Ding, 2005), information gain (Uğuz, 2011), relief (Urbanowicz, Meeker, La Cava, Olson, & Moore, 2018), depth search (Dash & Liu, 2003), and breadth search (Dash & Liu, 2003), are used to select the optimal subset of features (Chandrashekar & Sahin, 2014). However, such techniques are hamstrung in their search for the optimal subset for datasets with a very large number of features, because the number of candidate combinations (solutions) grows exponentially with the number of features: for d features, 2^d combinations must be examined to guarantee the optimal one, which is practically impossible (for d = 100, that is already more than 10^30 subsets). Researchers have therefore moved towards meta-heuristic algorithms, which have shown promising performance in many fields and have become the best alternatives to alleviate the drawbacks of traditional search techniques. Despite their superiority, however, most meta-heuristic algorithms still suffer from problems such as becoming trapped in local minima, an imbalance between the exploration and exploitation operators, and a lack of diversity.

Recently, a new meta-heuristic algorithm known as the slime mould algorithm (SMA) has been proposed for solving continuous optimization problems (Li et al., 2020). SMA simulates the behaviour of slime mould, which uses an oscillation mode to find the optimal path for collecting food, with significant exploration ability and intensification propensity. The success of SMA in solving continuous optimization problems motivates us to propose a binary version for solving FS. To the best of our knowledge, no binary version of SMA has so far been proposed for the feature selection problem. Therefore, in this paper, we propose three variants of SMA in addition to the standard one for tackling the feature selection problem. The first version converts the standard continuous SMA into a binary technique using selected families of V-shaped and S-shaped transfer functions, identifying the best transfer function for each dataset. The second version integrates SMA with two-phase mutation to improve SMA's exploitation ability. The third version is integrated with a novel attacking-feeding strategy to improve both the exploration and the exploitation properties of the technique. In the final version, both the two-phase mutation and the attacking-feeding strategy are combined with the SMA. To check the classification accuracy of the extracted features, the k-nearest neighbors (KNN) algorithm, one of the common classification and regression algorithms in machine learning, is used. In addition, the four versions are validated using 28 well-known datasets taken from the UCI repository and compared with a number of well-known existing algorithms. The experimental outcomes show the superiority of the FMBSMA version relative to the other three versions and six state-of-the-art feature selection algorithms. Finally, the main contributions of this work are as follows:

  • Proposing four binary versions of the standard SMA for tackling FS problems.

  • Investigating the performance of the standard SMA with V-shaped and S-shaped transfer functions (a brief sketch of the two families follows this list).

  • Proposing a new strategy called the Attacking-Feeding strategy (AF) to explore and exploit more solutions based on the previous status of each particle.

  • After comparing the four versions with the other algorithms, the hybrid version that integrates the binary version of SMA (BSMA) with two-phase mutation (TM) and AF is shown to outperform all other algorithms used in the comparison.
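As noted above, here is a brief sketch of the two transfer-function families investigated (cf. Mirjalili et al., 2013). The specific S-shaped and V-shaped functions and the binarization rules shown below are common choices from the binary-PSO literature; which of the eight candidate functions BSMA ultimately adopts depends on the dataset.

```python
# Sketch of the two transfer-function families (Mirjalili et al., 2013).
# The concrete functions below are illustrative choices, not necessarily
# the ones selected by BSMA for a given dataset.
import numpy as np

def s_shaped(v):
    """Sigmoid: maps a continuous step to a probability of the bit being 1."""
    return 1.0 / (1.0 + np.exp(-v))

def v_shaped(v):
    """|tanh|: maps a continuous step to a probability of flipping the bit."""
    return np.abs(np.tanh(v))

def binarize(x_cont, x_bin, rng, family="s"):
    r = rng.random(x_cont.shape)
    if family == "s":                 # S-shaped: resample each bit
        return (r < s_shaped(x_cont)).astype(int)
    # V-shaped: flip the current bit with the computed probability
    return np.where(r < v_shaped(x_cont), 1 - x_bin, x_bin)
```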

The rest of this paper is organized as follows: Section 2 briefly reviews some of the works proposed for tackling FS, Section 3 describes the SMA, and Section 4 explains and discusses the four proposed versions of BSMA, which are compared with other algorithms in Section 5. Section 6 draws some conclusions and suggests future work.

Section snippets

Literature review

Meta-heuristics have been applied to a wide range of real-world problems, such as parameter extraction (Abdel-Basset, Mohamed, et al., 2020), the DNA fragment assembly problem (Allaoui, Ahiod, & El Yafrani, 2018), the image segmentation problem (Abdel-Basset, Chang, & Mohamed, 2020), job shop scheduling problems (Kurdi, 2019), feature selection (Abualigah, 2019; Abualigah et al., 2018b), and many other problems (Abualigah et al., 2018a; Abualigah et al., 2020). Our interest in this paper is in

Slime mould algorithm (SMA)

Recently, a new meta-heuristic called the slime mould algorithm (SMA) has been proposed by Li et al. (2020), mimicking the foraging behavior of slime moulds. The mathematical model of SMA is divided into three phases: approach, wrap, and grabble food. The mathematical model of these three phases, adapted from Li et al. (2020), is described as follows:

1) Approach food

Slime mould can reach the food based on its odor, which can be mathematically formulated as follows (Li et al., 2020):

X(t+1) = X_b(t) + v_b · (W · X_A(t) − X_B(t)),  if r < p
X(t+1) = v_c · X(t),                            if r ≥ p

where X_b(t) is the individual with the best fitness found so far, X_A(t) and X_B(t) are two individuals selected randomly from the population, W is the weight of the slime mould, v_b is a parameter oscillating in [−a, a], v_c decreases linearly from one to zero, r is a random number in [0, 1], and p controls the switch between the two update rules.
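A minimal sketch of this approach-food update is given below, assuming the standard SMA formulation of Li et al. (2020); the weight matrix W, the probability vector p, and the decay parameters a and vc are assumed to be computed elsewhere as in the original paper.

```python
# Sketch of the SMA "approach food" position update (Li et al., 2020).
# W, p, a, and vc are assumed to be computed per the original paper;
# only the update rule itself is shown.
import numpy as np

def approach_food(X, X_best, W, p, a, vc, rng):
    n, d = X.shape
    X_new = np.empty_like(X)
    for i in range(n):
        if rng.random() < p[i]:
            A, B = rng.integers(0, n, size=2)     # two random individuals
            vb = rng.uniform(-a, a, size=d)       # oscillates in [-a, a]
            X_new[i] = X_best + vb * (W[i] * X[A] - X[B])
        else:
            X_new[i] = vc * X[i]                  # vc decays towards zero
    return X_new
```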

Proposed algorithm

This section briefly describes the steps in adapting the standard SMA and the other three versions of SMA proposed here. The main steps of the proposed FS algorithms are: initialization, evaluation, transfer function, two-phase mutation, and attacking-feeding strategy.
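This excerpt does not spell out the internals of the two-phase mutation, so the following is only a hedged sketch of one common two-phase scheme that is consistent with the stated goal of exploiting solutions around the best-so-far: a first phase that tries to drop selected features, and a second phase that tries to add unselected ones. The mutation rates p1 and p2 and the greedy acceptance rule are illustrative assumptions.

```python
# Hedged sketch of a two-phase mutation around the best-so-far solution.
# Phase 1 drops some selected features; phase 2 adds some unselected ones.
# Rates and acceptance rule are assumptions, not taken from the paper.
import numpy as np

def two_phase_mutation(best, fitness, rng, p1=0.1, p2=0.05):
    ones, zeros = np.flatnonzero(best == 1), np.flatnonzero(best == 0)
    # Phase 1: try removing redundant features from the best solution.
    trial = best.copy()
    trial[ones[rng.random(ones.size) < p1]] = 0
    if fitness(trial) < fitness(best):      # keep only if it improves
        best = trial
    # Phase 2: try adding potentially informative unselected features.
    trial = best.copy()
    trial[zeros[rng.random(zeros.size) < p2]] = 1
    return trial if fitness(trial) < fitness(best) else best
```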

Experimental studies and discussion

In this section, the proposed approaches are validated on several small-, medium-, and large-scale instances to assess their quality and stability. In addition, they are compared with several state-of-the-art algorithms under various performance metrics: classification accuracy, fitness value, number of selected features, standard deviation, computational time, and the Wilcoxon rank-sum test, to demonstrate their superiority.
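For instance, the Wilcoxon rank-sum comparison between two algorithms could be carried out on their per-run accuracies as in the following sketch; the accuracy arrays are placeholders, not results from the paper.

```python
# Sketch of a Wilcoxon rank-sum comparison between two algorithms'
# per-run accuracies; both arrays below are hypothetical placeholders.
from scipy.stats import ranksums

acc_fmbsma   = [0.91, 0.93, 0.92, 0.94, 0.90]
acc_baseline = [0.88, 0.90, 0.87, 0.89, 0.91]
stat, p_value = ranksums(acc_fmbsma, acc_baseline)
# A p-value below 0.05 would indicate a statistically significant difference.
print(f"statistic={stat:.3f}, p={p_value:.4f}")
```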

Conclusion and future work

In this work, four binary versions of the novel slime mould algorithm (SMA) have been proposed for feature selection problems. Since the standard SMA was proposed for solving continuous optimization problems and feature selection involves discrete variables, a suitable transfer function must be used with SMA to convert the continuous values into discrete values. After investigation, it is concluded that the appropriate choice of the transfer function depends heavily on the specific dataset to which the

Funding

This research has no funding source.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

CRediT authorship contribution statement

Mohamed Abdel-Basset: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing - original draft. Reda Mohamed: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft. Ripon K. Chakrabortty: Conceptualization, Methodology, Validation, Visualization, Writing - review & editing. Michael J. Ryan: Project

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (56)

  • S. Li et al., Slime mould algorithm: A new method for stochastic optimization, Future Generation Computer Systems (2020).
  • M.M. Mafarja et al., Hybrid whale optimization algorithm with simulated annealing for feature selection, Neurocomputing (2017).
  • M. Mafarja et al., Whale optimization approaches for wrapper feature selection, Applied Soft Computing (2018).
  • S. Mirjalili et al., S-shaped versus V-shaped transfer functions for binary particle swarm optimization, Swarm and Evolutionary Computation (2013).
  • J.P. Papa et al., Feature selection through binary brain storm optimization, Computers & Electrical Engineering (2018).
  • B. Sahu et al., A novel feature selection algorithm using particle swarm optimization for cancer microarray data, Procedia Engineering (2012).
  • H. Uğuz, A two-stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm, Knowledge-Based Systems (2011).
  • R.J. Urbanowicz et al., Relief-based feature selection: Introduction and review, Journal of Biomedical Informatics (2018).
  • Z.-B. Xu et al., A comparative study of two modeling approaches in neural networks, Neural Networks (2004).
  • L. Zhang et al., Feature selection using firefly optimization for classification and regression models, Decision Support Systems (2018).
  • E. Zorarpacı et al., A hybrid approach of differential evolution and artificial bee colony for feature selection, Expert Systems with Applications (2016).
  • S. Aalaei et al., Feature selection using genetic algorithm for breast cancer diagnosis: Experiment on three different datasets, Iranian Journal of Basic Medical Sciences (2016).
  • M. Abdel-Basset et al., A novel equilibrium optimization algorithm for multi-thresholding image segmentation problems, Neural Computing and Applications (2020).
  • M. Abdel-Basset, D. El-Shahat, I. El-henawy, V.H.C. de Albuquerque, S. Mirjalili, A new fusion of grey... (2020).
  • L.M. Abualigah et al., Hybrid clustering analysis using improved krill herd algorithm, Applied Intelligence (2018).
  • L.M.Q. Abualigah, Feature selection and enhanced krill herd algorithm for text document clustering... (2019).
  • L. Abualigah et al., Selection scheme sensitivity for a hybrid Salp Swarm Algorithm: Analysis and applications, Engineering with Computers (2020).
  • A.M. Anter et al., Feature selection strategy based on hybrid crow search optimization algorithm integrated with chaos theory and fuzzy c-means algorithm for medical diagnosis problems, Soft Computing (2020).