Parallel alternatives for evolutionary multi-objective optimization in unsupervised feature selection

https://doi.org/10.1016/j.eswa.2015.01.061

Highlights

  • Multiobjective unsupervised feature selection with many decision variables is tackled.

  • EEG signals for Brain–Computer Interface (BCI) applications are used as benchmarks.

  • Cooperative evolutionary algorithms for multiobjective optimization are given.

  • Parallel implementations obtain quality results in terms of hypervolume and speedup.

  • Superlinear speedups are justified by fitting models to the experimental results.

Abstract

Many machine learning and pattern recognition applications require reducing dimensionality to improve learning accuracy by removing irrelevant inputs. Feature selection has thus become an important issue in these research areas. Nevertheless, as the number of patterns and, more specifically, the number of features to be selected have grown very fast in recent years, parallel processing constitutes an important tool for building efficient approaches that can tackle complex problems within reasonable computing times. In this paper we propose parallel multi-objective optimization approaches to cope with high-dimensional feature selection problems. Several parallel multi-objective evolutionary alternatives are proposed and experimentally evaluated on synthetic and BCI (Brain–Computer Interface) benchmarks. The experimental results show that the cooperation of parallel evolving subpopulations improves solution quality and provides computing-time speedups that depend on the parallel alternative and the data profile.

Introduction

Many relevant applications involve high-dimensional pattern classification or modelling tasks where feature selection techniques must be applied to reduce the dimensionality and thereby remove redundant, noise-dominated, or irrelevant features. In particular, dimensionality reduction is very important when the number of features in the input pattern exceeds the number of available patterns. Thus, feature selection is important for increasing learning accuracy and result comprehensibility.

An interesting review of feature selection techniques used in bioinformatics is provided in Saeys, Inza, and Larrañaga (2007), along with analyses and references for feature selection in bioinformatics applications such as sequence analysis, microarray analysis, and mass spectra analysis. For example, one of the problems in sequence analysis is the identification of relevant motifs by relating them to levels of gene expression through regression models, where feature selection is useful to improve the model fit. In the prediction of protein function from sequence, feature selection techniques can be useful for determining relevant amino acid subsets. Dimension reduction of the input patterns has also been applied to Electroencephalogram (EEG) classification for recognizing epileptiform patterns (Acir & Güzelis, 2004). In particular, EEG classification has to cope with four main difficulties (Lotte, Congedo, Lécuyer, Lamarche, & Arnaldi, 2007): (1) the presence of noise or outliers in the features (as EEG signals have a low signal-to-noise ratio); (2) the need to represent time information in the features (as brain patterns are usually related to changes over time in the EEG signals); (3) the non-stationarity of EEG signals, which may change quickly over time or across experiments; and (4) the low number of patterns (EEGs) available for training (as the experimental work required to register the EEGs for different events is time consuming). As the solution to these problems usually implies increasing the dimensionality of the feature vectors, the classification of EEG signals, for example in BCI applications, has to be accomplished from relatively few feature vectors of very high dimensionality. This circumstance gives rise to the so-called curse-of-dimensionality problem, as the number of patterns needed to properly define the different classes increases very fast with the dimension of the feature vectors (from five to ten times as many training samples per class as the feature dimension; Raudys & Jain, 2014).

Thus, feature selection reduces the dimension of the input patterns to be classified, which makes it possible to: (1) decrease the computational complexity of the procedure, (2) remove irrelevant/redundant features that would make training the classifier more difficult, and (3) avoid the curse of dimensionality in problems with many features and few available data to be classified. Nevertheless, as the size of the search space grows exponentially with the number of possible features, an exhaustive search for the best subset is not feasible, even for a modest number of features. Procedures based on branch-and-bound, simulated annealing, or evolutionary algorithms have been proposed instead. Moreover, parallel processing is an interesting alternative to take advantage of high-performance computer architectures for feature selection in high-dimensional cases.
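As an illustrative aside, the evolutionary alternative mentioned above can be sketched as a minimal genetic algorithm over binary feature masks. This is our own toy sketch, not an algorithm from the paper: the function names and the toy fitness (which rewards three "informative" features and penalizes subset size) are assumptions for illustration only.

```python
import random

def evolve_feature_masks(n_features, fitness, pop_size=20,
                         generations=50, seed=0):
    """Minimal elitist GA over binary feature masks: truncation
    selection keeps the top half, bit-flip mutation refills the rest.
    `fitness(mask) -> float` is maximized; a mask is a 0/1 tuple."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]
        # bit-flip mutation with rate 1/n_features
        children = [tuple(b ^ (rng.random() < 1.0 / n_features) for b in p)
                    for p in parents]
        pop = parents + children
    return max(pop, key=fitness)

def toy_fitness(mask):
    # reward the first three (informative) features, penalize subset size
    return sum(mask[:3]) - 0.1 * sum(mask)

best = evolve_feature_masks(10, toy_fitness)
```

Because the top half of the population always survives, the best mask found never degrades across generations; in a real feature selection setting the fitness would of course be a classifier- or clustering-based measure rather than this toy function.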

This paper deals with parallel processing in feature selection, considered as a multi-objective optimization problem. In papers such as Emmanouilidis et al. (2000), Kim et al. (2002), Morita et al. (2003), Oliveira et al. (2003), Handl and Knowles (2006), Mierswa and Wurst (2006), and Huang et al. (2010), feature selection for either supervised or unsupervised classification problems has been approached as a multi-objective optimization problem. Indeed, in Mierswa and Wurst (2006), it is shown that feature selection in unsupervised learning problems is inherently a multi-objective problem. With respect to the use of parallel processing for feature selection, related papers are Garcia et al. (2004), De Souza et al. (2006), Guillén et al. (2009), Zhao et al. (2013), and Sun (1991). In Sun (1991), a MapReduce model is used to obtain a parallel implementation of a feature selection procedure based on mutual information to evaluate the statistical dependence between variables. Parallel feature selection based on a forward–backward algorithm applied to a k-nearest neighbours clustering method, to separate genuine and non-genuine images in steganography problems, is shown in Guillén et al. (2009). In Zhao et al. (2013), a large-scale feature selection algorithm based on the abilities of the features to explain the data variance is proposed. Random feature selection implemented in parallel is compared with a wrapper method in Garcia et al. (2004), where Support Vector Machines (SVMs) are used as the supervised learning algorithm for classification. Finally, De Souza et al. (2006) compare parallel implementations of different feature selection procedures previously proposed (including a genetic algorithm) with the FortalFS algorithm presented in the paper. A description of the essential characteristics to generate efficient parallel procedures from sequential ones is also provided by De Souza et al. (2006).
However, these previous papers have not considered feature selection from a parallel multi-objective approach as we propose in this paper. The reason for using a multi-objective formulation of the feature selection problem is that the performance of a classifier is usually expressed not only by its accuracy on a given set of patterns but also by other measures that quantify properties such as the generalization capability. In this way, a multi-objective formulation can be considered a straightforward approach to feature selection. Along with the previous justifications for using parallel processing and multi-objective optimization, the reasons for using unsupervised classification should also be provided. In this case, it has been taken into account that in many classification problems the patterns are not labelled. Moreover, unsupervised learning is mandatory whenever the number of classes is unknown or unknown relations among the features are to be extracted, and it is thus also a suitable approach to select the best set of features.

Once the relevance of feature selection and the reasons for a parallel multi-objective unsupervised learning approach to this problem have been summarized in this introduction, Section 2 deals with the formulation of feature selection as an unsupervised multi-objective optimization problem, while Sections 3 and 4 are devoted to its parallel implementation. More specifically, Section 3 reviews the parallel processing issues of evolutionary multi-objective optimization, including references to previous works, and Section 4 describes the parallel procedures we propose for unsupervised feature selection. The results obtained from the experiments performed on different benchmarks are presented and discussed in Section 5. Finally, the conclusions are given in Section 6.

Section snippets

Multiobjective optimization in unsupervised feature selection

Approaches for dimensionality reduction can be classified into two main alternatives: (1) feature space transformation through linear or non-linear transformations, and (2) the selection of a subset of features. In this paper, we consider feature selection, which can be defined as the search for a set of features that optimizes a cost function evaluating the utility of these features according to the classifier performance once it has been trained with patterns whose components are the …
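Since this section casts feature selection as a multi-objective problem, a minimal sketch of the core machinery may help: Pareto dominance and non-dominated filtering over bi-objective values, where (as our own illustrative assumption, not the paper's formulation) the two minimized objectives stand for subset size and a clustering dispersion measure.

```python
def dominates(a, b):
    """Pareto dominance for minimization: `a` is no worse than `b` in
    every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Made-up bi-objective values (n_selected_features, cluster_dispersion):
candidates = [(2, 0.9), (3, 0.4), (5, 0.4), (4, 0.2), (6, 0.5)]
front = pareto_front(candidates)
# (5, 0.4) and (6, 0.5) are dominated by (3, 0.4), so the front keeps
# only the trade-off solutions (2, 0.9), (3, 0.4), (4, 0.2).
```

A multi-objective evolutionary algorithm such as those discussed later evolves a population toward this kind of trade-off front instead of a single best subset.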

Parallel evolutionary multi-objective algorithms

Parallel processing can be useful to improve the performance of serial multi-objective evolutionary optimization algorithms (Van Veldhuizen et al., 2003; Luna et al., 2006), not only by speeding up execution times but also by improving the quality of the solutions found. Two decomposition alternatives are usually implemented in parallel algorithms: functional decomposition and data decomposition (hybrid alternatives are also possible). While functional decomposition techniques identify …
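As a hedged illustration of the data-decomposition alternative, the following sketch simulates an island model: subpopulations evolve independently and periodically exchange their best individuals in a ring. All names, parameters, and the toy objective are our own assumptions (a real implementation would run the islands on separate processors); the sketch only shows the decomposition logic.

```python
import random

def evolve_island(pop, fitness, rng, generations):
    """Independently evolve one subpopulation: truncation selection
    keeps the top half, which is mutated to refill the island."""
    n = len(pop[0])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:len(pop) // 2]
        children = [tuple(b ^ (rng.random() < 1.0 / n) for b in s)
                    for s in survivors]
        pop[:] = survivors + children

def island_model(n_islands, island_size, n_features, fitness,
                 epochs=5, gens_per_epoch=10, seed=0):
    """Data-decomposition sketch: islands evolve independently and
    exchange their best individuals in a ring after each epoch."""
    rng = random.Random(seed)
    islands = [[tuple(rng.randint(0, 1) for _ in range(n_features))
                for _ in range(island_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        for isl in islands:
            evolve_island(isl, fitness, rng, gens_per_epoch)
        bests = [max(isl, key=fitness) for isl in islands]
        for i, isl in enumerate(islands):
            isl[-1] = bests[i - 1]  # ring migration: receive neighbour's best
    return max((ind for isl in islands for ind in isl), key=fitness)

def sparse_fitness(mask):
    # toy objective: reward the first two features, penalize subset size
    return sum(mask[:2]) - 0.1 * sum(mask)

best = island_model(n_islands=4, island_size=10, n_features=8,
                    fitness=sparse_fitness)
```

Migration after each epoch is what distinguishes cooperating subpopulations from fully independent runs: good building blocks found on one island spread to the others at a low communication cost.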

Parallel multi-objective unsupervised feature selection

In the parallel procedures proposed here, we try to obtain (as far as possible) disjoint subpopulations that provide a sufficiently diversified search. The characteristics of many applications could make the strategies described here efficient, not only for finding solutions of good quality, but also for reducing the processing time by taking advantage of parallel processing.

Figs. 2 and 3 respectively show the approaches of Algorithms 1 and 2 described in Section 3, whenever they are …
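The paper's Algorithms 1 and 2 are not reproduced in this snippet, but the idea of disjoint subpopulations over many decision variables can be sketched generically as cooperative coevolution: the feature bits are split into disjoint groups, each evolved by its own subpopulation, and a partial mask is evaluated by combining it with the current best partial masks of the other groups. Everything below (names, mutation rate, the toy objective) is our own illustrative assumption.

```python
import random

def cooperative_masks(n_features, n_groups, fitness, pop_size=8,
                      generations=30, seed=0):
    """Cooperative-coevolution sketch: decision variables (feature bits)
    are partitioned into disjoint groups, each with its own subpopulation.
    A partial mask is scored by combining it with the other groups'
    current best partial masks."""
    rng = random.Random(seed)
    groups = [list(range(g, n_features, n_groups)) for g in range(n_groups)]
    subpops = [[{i: rng.randint(0, 1) for i in grp} for _ in range(pop_size)]
               for grp in groups]
    best_parts = [pop[0] for pop in subpops]      # current collaborators

    def full_mask(g, part):
        combined = {}
        for h, bp in enumerate(best_parts):
            combined.update(part if h == g else bp)
        return tuple(combined[i] for i in range(n_features))

    for _ in range(generations):
        for g, pop in enumerate(subpops):
            pop.sort(key=lambda p: fitness(full_mask(g, p)), reverse=True)
            # mutate the top half to refill the bottom half
            for k in range(pop_size // 2, pop_size):
                parent = pop[k - pop_size // 2]
                pop[k] = {i: b ^ (rng.random() < 0.2)
                          for i, b in parent.items()}
            best_parts[g] = pop[0]
    return full_mask(0, best_parts[0])            # combine all best parts

def informative_first(mask):
    # toy objective: reward the first three features, penalize subset size
    return sum(mask[:3]) - 0.1 * sum(mask)

selected = cooperative_masks(n_features=9, n_groups=3,
                             fitness=informative_first)
```

Because each subpopulation only searches its own group of bits, the per-group search space shrinks exponentially, which is why this decomposition is attractive for problems with many decision variables.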

Experimental results

In this section, the proposed parallel procedures are evaluated on a set of benchmarks. As our research deals with the parallel implementation of a multi-objective unsupervised feature selection procedure, we do not provide an exhaustive comparison of the classification performance of our multi-objective approach with that of other previously proposed procedures. Moreover, we also consider that many improvements in the clustering evaluation and in the classifier can still be made. With …
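The highlights state that solution quality is measured by hypervolume. As a hedged aside, a minimal two-objective hypervolume computation (for a minimization problem, with a made-up front and reference point of our own choosing, not data from the paper) looks like this:

```python
def hypervolume_2d(front, ref):
    """Hypervolume of a 2-D Pareto front (both objectives minimized),
    measured against a reference point `ref` that every front member
    dominates. `front` must contain mutually non-dominated points."""
    pts = sorted(front)                     # ascending in the first objective
    hv, prev_y = 0.0, ref[1]
    for x, y in pts:
        hv += (ref[0] - x) * (prev_y - y)   # slab between successive points
        prev_y = y
    return hv

# Illustrative front: (1,4)+(2,2)+(4,1) against reference (5,5) covers 11.0
hv = hypervolume_2d([(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)], ref=(5.0, 5.0))
```

A larger hypervolume means the front covers more of the objective space between itself and the reference point, so it is a natural single-number quality measure for comparing the parallel alternatives.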

Conclusions

This paper contributes to the parallel implementation of multi-objective evolutionary algorithms (MOEAs) using independent evolution of subpopulations that cooperate after a given number of independent generations. The approach represents a significant advance for the resolution of multi-objective evolutionary problems with a high number of decision variables, such as the feature selection problem considered here. Moreover, feature selection has been tackled as an unsupervised multi-objective clustering problem. Although some proposals …

Acknowledgments

This work has been funded by projects TIN2012–32039 (Spanish Ministerio de Economía y Competitividad and FEDER funds) and P11-TIC-7983 (Junta de Andalucía). The authors would like to thank the reviewers for their comments and suggestions.

References (49)

  • M. Cámara et al.

    Comparison of frameworks for parallel multiobjective evolutionary optimization in dynamic problems

  • C. Coello Coello et al.

    A coevolutionary multi-objective evolutionary algorithm

  • J. Cohen

    A coefficient of agreement for nominal scales

    Educational and Psychological Measurement

    (1960)
  • G. Corder et al.

    Nonparametric statistics for non-statisticians

    (2009)
  • D.L. Davies et al.

    A cluster separation measure

    IEEE Transactions on Pattern Analysis and Machine Intelligence

    (1979)
  • T. De Souza et al.

    Parallelizing feature selection

    Algorithmica

    (2006)
  • K. Deb et al.

    A fast and elitist multiobjective genetic algorithm: NSGA-II

    IEEE Transactions on Evolutionary Computation

    (2002)
  • K. Deb et al.

    Distributed computing of Pareto-optimal solutions using multi-objective evolutionary algorithms

  • EEG motor activity data set. (2014). Project BCI - EEG motor activity data set (Computer Interface research at NUST...
  • C. Emmanouilidis et al.

    A multiobjective evolutionary setting for feature selection and a commonality-based crossover operator

  • Garcia, D. J., Hall, L. O., Goldgof, D. B., & Kramer, K. (2004). A parallel feature selection algorithm from random...
  • C. Goh et al.

    A coevolutionary paradigm for dynamic multiobjective optimization

  • D.E. Goldberg

    Genetic algorithms in search, optimization and machine learning

    (1989)
  • J. Handl et al.

    Feature subset selection in unsupervised learning via multiobjective optimization

    International Journal of Computational Intelligence

    (2006)