Evolutionary parallel extreme learning machines for the data classification problem

https://doi.org/10.1016/j.cie.2019.02.024

Highlights

  • First evolutionary parallel ELM algorithm for the data classification problem.

  • The ELM is enhanced with feature selection.

  • The proposed algorithm tunes its parameters at run time.

  • The scalability of the proposed algorithm is (near)-linear.

  • The state-of-the-art algorithms are outperformed.

Abstract

This study proposes an Island Parallel Evolutionary Extreme Learning Machine algorithm (IPE-ELM) for the well-known data classification problem. The ELM is a fast and efficient machine learning technique based on a single-hidden-layer feed-forward neural network (SLFN). The high prediction accuracy and learning speed of the ELM make it an elegant tool for the fitness calculation process of evolutionary algorithms. The IPE-ELM algorithm combines evolutionary genetic algorithms (for feature selection), the ELM machine learning technique (for prediction accuracy calculation), parallel computation (for faster fitness evaluation), and parameter tuning (activation function selection and the number of hidden neurons) to solve this important problem. Each ELM instance, running on a different processor, selects one of four activation functions (Sine, Cosine, Sigmoid, and Hyperbolic Tangent) and uses a randomized number of hidden neurons to achieve higher prediction accuracy. The proposed algorithm provides high quality results with (near)-linear scalability. The IPE-ELM algorithm is compared with state-of-the-art data classification algorithms on UCI benchmark datasets, and significant improvements in prediction accuracy are reported with reasonable execution times. To the best of our knowledge, the scalable IPE-ELM algorithm is the first island-parallel evolutionary classification algorithm, and its prediction accuracy outperforms state-of-the-art algorithms in the literature.

Introduction

Data classification is a crucial mining technique with many applications in our daily life (Witten & Frank, 2011). Scientists can identify patterns, acquire knowledge, and derive statistical/predictive models by making use of data classification techniques. Accuracy and execution speed are important issues in the data classification process. One of the best means of extracting interesting and valuable patterns is to make use of recent machine learning techniques. Dealing with large datasets (having many features and rows) requires advanced supervised machine learning techniques (Nasrabadi, 2007; Unler & Murat, 2010). Supervised machine learning techniques build a model to predict the class labels of data by using a set of training data. It has become very common to have large amounts of data with many attributes/features, and it is not easy to properly clean and classify such data and obtain the distilled information. Many supervised techniques have been proposed in the literature for the solution of the data classification problem. However, the demand for fast algorithms with high prediction accuracy still makes this a valuable research area.

A high-quality classifier is a crucial part of a data classification process. The classifier should have good prediction accuracy and good generalization ability. The training speed of a classifier is another important consideration. The Extreme Learning Machine (ELM) is a recent and fast supervised machine learning technique with high performance on the data classification problem (Guang-Bin Huang & Zhu, 2004). The learning speed of feed-forward neural networks is generally slow, and this has been a major drawback in machine learning applications: slow gradient-based learning algorithms are used to train the networks, and the network parameters are tuned with such learning techniques. However, the ELM is a different machine learning technique for single-hidden-layer feed-forward neural networks (SLFNs): it randomly chooses the hidden node parameters and analytically determines the output weights of the SLFN (Guang-Bin Huang & Siew, 2006). This property makes the ELM a suitable technique for the intensive chromosome fitness evaluation of evolutionary genetic algorithms that select the best feature subset. The ELM has been applied to many important problems, and many studies are still in progress to improve the performance of this valuable machine learning technique (Huang, Wang, & Lan, 2011).
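
To make the ELM's training step concrete, the following is a minimal sketch in NumPy, assuming a sample matrix X (instances × features) and one-hot encoded targets T; the function names and the uniform weight initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng, activation=np.tanh):
    """Train an ELM: random hidden layer, analytically computed output weights."""
    n_features = X.shape[1]
    # Hidden-node parameters are drawn at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = activation(X @ W + b)         # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T      # output weights via the Moore-Penrose pseudo-inverse
    return W, b, beta

def predict_elm(X, W, b, beta, activation=np.tanh):
    """Predict class labels with a trained ELM."""
    H = activation(X @ W + b)
    return np.argmax(H @ beta, axis=1)
```

Because training reduces to one random projection and one pseudo-inverse, a single fitness evaluation is far cheaper than training a network with back-propagation, which is what makes the ELM attractive inside an evolutionary loop.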

With recent developments in computer science, the need for real-time processing of large datasets poses big challenges to traditional ways of data processing (Bolón-Canedo & Alonso-Betanzos, 2012). For this reason, Feature Subset Selection (FSS) has attracted the attention of scientists as a way to filter out unnecessary data and greatly reduce processing time (Tan & Tsang, 2014). Ensemble-based wrapper methods (which use an exploration method for efficient FSS and a machine learning method to measure the accuracy level) provide good results for the data classification problem (Xue & Wu, 2017). Wrapper methods are computationally expensive, since they must compute fitness values for many explored subsets; however, they are still the best performing methods.
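
As a concrete illustration of such a wrapper evaluation, the sketch below scores a binary feature mask by training an ELM on the selected columns and measuring held-out accuracy. It reuses the hypothetical train_elm/predict_elm helpers from the previous sketch, and the train/test split convention is an assumption.

```python
import numpy as np

def subset_fitness(mask, X_train, T_train, X_test, y_test, n_hidden, rng):
    """Wrapper fitness: held-out accuracy of an ELM trained on the
    features selected by a binary mask (1 = feature kept)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0  # an empty feature subset cannot classify anything
    W, b, beta = train_elm(X_train[:, cols], T_train, n_hidden, rng)
    y_pred = predict_elm(X_test[:, cols], W, b, beta)
    return float(np.mean(y_pred == y_test))
```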

We use a parallel computation environment to select the features of a dataset with a genetic algorithm and apply the ELM to the selected features to evaluate the prediction accuracy through fast evaluation of the instances. Genetic algorithms and ELM machine learning have been used before to solve the data classification problem; however, this is the first application of these methods with a parallel island genetic algorithm approach. We tune the parameters of the ELM dynamically and experimentally show that our method outperforms state-of-the-art metaheuristics. There have been earlier attempts to parallelize the ELM. Our method differs in that we do not parallelize the matrix multiplication phase of the ELM, a straightforward way to speed up execution with parallel processing; that approach requires intensive communication between processors and is not scalable. Instead, we propose an island-parallel method and simultaneously execute as many ELM instances as there are processors in the environment. This approach provides an effective diversification mechanism for improving the population quality of the genetic algorithm.

Considering all the issues mentioned above, we propose a novel evolutionary island-parallel ELM-based classifier for the data classification problem. To the best of our knowledge, the IPE-ELM algorithm is the first island-parallel machine learning algorithm in the literature applied to the data classification problem (Aggarwal, 2014). The IPE-ELM generates diversified populations in each processor's memory and improves the populations' fitness qualities independently. Our approach provides a very effective diversification mechanism for increasing the population quality of the genetic algorithm by initializing the random number generator of each processor with a different seed. At the termination phase of the processes on each processor, the best results of the slave nodes are collected by the master node, and the overall best solution is reported.
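
A minimal mpi4py sketch of this master/slave pattern, under stated assumptions: run_island_ga stands in for the per-island GA+ELM loop (not shown), and the additive seeding scheme is only one plausible reading of the per-processor seeds described above.

```python
# Launched with, e.g.: mpirun -np 8 python ipe_elm_islands.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each island seeds its own random number generator differently,
# so the populations diversify instead of exploring the same space.
seed = 12345 + rank                        # hypothetical seeding scheme
best_mask, best_acc = run_island_ga(seed)  # assumed per-island GA+ELM loop

# The master (rank 0) gathers every island's best individual
# and reports the overall winner.
results = comm.gather((best_acc, best_mask), root=0)
if rank == 0:
    overall_acc, overall_mask = max(results, key=lambda r: r[0])
    print(f"overall best accuracy: {overall_acc:.4f}")
```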

Parallel machine learning is a newly developing research area, and designing scalable parallel algorithms in this field is challenging. In our opinion, the IPE-ELM algorithm is unique among the algorithms in this domain with its efficient features. Four different activation functions are used during the classification process: Sine, Cosine, Sigmoid, and Hyperbolic Tangent. No single type of activation function is the best one for every possible dataset; different activation functions do better on different datasets. Each processor randomly selects one of these activation functions and continues its optimization process. The number of hidden neurons is another criterion to be considered during classification; tuning this parameter greatly affects performance, and the results observed in our experiments are reported. Therefore, each processor selects a different number of hidden neurons (in the range of 2–10% of the instance size) and runs the ELM. The size of the population was chosen as 70 after comprehensive tests. The best performing convergence, truncation, crossover, and mutation ratios are applied on all processors of the parallel distributed-memory environment by using Message Passing Interface (MPI) libraries. The parallel and diversified populations of the IPE-ELM algorithm provide a stagnation-prevention mechanism: at each processor, we select different seeds for the randomization of all parameters in the FSS and ELM phases of the algorithm, which prevents the genetic algorithm from repeatedly exploring the same areas of the search space. Comprehensive experiments comparing our scalable algorithm with state-of-the-art classification algorithms show that the IPE-ELM algorithm outperforms them in terms of prediction accuracy with reasonable execution times.
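
The per-processor parameter draw described above might look like the following sketch; the four activation functions and the 2–10% range come from the text, while the uniform draws and the dictionary layout are assumptions.

```python
import numpy as np

# The four activation functions named in the text.
ACTIVATIONS = {
    "sine": np.sin,
    "cosine": np.cos,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
}

def draw_island_parameters(n_instances, rng):
    """Each island independently picks an activation function and a
    hidden-layer size between 2% and 10% of the number of instances."""
    name = rng.choice(list(ACTIVATIONS))
    low = max(1, int(0.02 * n_instances))
    high = max(low + 1, int(0.10 * n_instances))
    n_hidden = int(rng.integers(low, high + 1))
    return ACTIVATIONS[name], n_hidden
```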

Some of the state-of-the-art metaheuristics that have been applied to the data classification problem are Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Attribute Bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features (Bryll & Gutierrez-Osuna, 2003), Multi-View Adaboost (MVA) (Xu, 2010), the Random Subspace Method for constructing decision forests (RSE) (Ho, 1998), Correlation-based Feature Selection (CFS-SFS) (Hall, 1998), C4.5 (Quinlan, 1996), the Hybrid Genetic Algorithm and ELM-based feature selection algorithm (HGEFS) (Xue & Wu, 2017), Advanced Binary Ant Colony Optimization (ABACO) (Kashef & Nezamabadi-pour, 2015), and the ACO-based feature selection algorithm (ACOFS) (Chen, Chen, & Chen, 2013). The algorithms mentioned here need to calculate the fitness of each candidate solution, which is the most time-consuming part of these algorithms. In our experiments, we compare our solutions with the results of these algorithms.

In Section 2, related studies on state-of-the-art ELM techniques and data classification algorithms are given. The details of the ELM are presented in Section 3. The proposed IPE-ELM algorithm is introduced in Section 4. The setup of the experimental environment, the results of the experiments, and a comparison with state-of-the-art methods are reported in Section 5. Concluding remarks are provided in the last section.

Section snippets

Related work

In this section, we give information about the ELM, FSS, state-of-the-art evolutionary data classification techniques, and parallel implementations of the ELM. Huang et al. introduced the ELM in 2004 (Guang-Bin Huang & Zhu, 2004). They later studied the ELM for classification from a standard optimization-method perspective and extended it to an SLFN-based support vector network (Guang-Bin Huang & Zhou, 2010). Guang-Bin Huang and Ding (2012) show that least square SVM (LS-SVM) and proximal SVM (PSVM) …

Extreme learning machines

In this section, we give information about the ELM used by the IPE-ELM algorithm (Guang-Bin Huang & Siew, 2006; Guang-Bin Huang & Zhu, 2004). The ELM uses an SLFN with a learning speed faster than traditional feed-forward network learning algorithms (e.g. back-propagation (BP)) (see Fig. 1 for the SLFN). Due to its simplicity, remarkable efficiency, and impressive generalization performance, the ELM has been applied in a variety of domains, such as computer vision, bioinformatics, data …
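
For reference, the ELM training step this section alludes to can be written compactly; this is the standard textbook formulation (the notation below is assumed, not reproduced from the truncated text):

```latex
% For N training pairs (x_i, t_i) and L hidden nodes with randomly
% assigned parameters (w_j, b_j) and activation g, the SLFN output
% is linear in the output weights beta:
\[
  H\beta = T, \qquad H_{ij} = g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j),
  \quad 1 \le i \le N,\ 1 \le j \le L,
\]
% so training reduces to one linear solve via the Moore-Penrose
% generalized inverse of the hidden-layer output matrix H:
\[
  \hat{\beta} = H^{\dagger} T .
\]
```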

Island Parallel Evolutionary Extreme Learning Machine Algorithm (IPE-ELM)

In this section, we introduce our proposed island parallel evolutionary algorithm, IPE-ELM. The main goal of the algorithm is to discover the subset of features that produces the highest prediction accuracy for the data classification problem. The IPE-ELM algorithm has two main components (phases): evolutionary computation (for selecting feature subsets) and the ELM (for computing the prediction accuracy of the selected features).
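
A compact sketch of the evolutionary phase over binary feature masks follows, reusing the hypothetical subset_fitness wrapper from the introduction. The population size of 70 is taken from the text, but the truncation selection, uniform crossover, and bit-flip mutation details shown are assumptions rather than the paper's exact operators.

```python
import numpy as np

def evolve_masks(fitness, n_features, rng, pop_size=70,
                 generations=100, p_mut=0.01):
    """Simple GA over binary feature masks (1 = feature selected)."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]           # truncation selection
        # Uniform crossover between randomly paired elite parents.
        pa = elite[rng.integers(0, len(elite), size=pop_size)]
        pb = elite[rng.integers(0, len(elite), size=pop_size)]
        mix = rng.random((pop_size, n_features)) < 0.5
        children = np.where(mix, pa, pb)
        # Bit-flip mutation.
        flips = rng.random((pop_size, n_features)) < p_mut
        pop = np.where(flips, 1 - children, children)
    scores = np.array([fitness(ind) for ind in pop])
    best = pop[int(np.argmax(scores))]
    return best, float(scores.max())

# Usage sketch (binding the hypothetical wrapper fitness from earlier):
# from functools import partial
# fitness = partial(subset_fitness, X_train=..., T_train=..., X_test=...,
#                   y_test=..., n_hidden=n_hidden, rng=rng)
# best_mask, best_acc = evolve_masks(fitness, X_train.shape[1], rng)
```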

Island parallel genetic algorithms are novel …

Performance evaluation of the IPE-ELM algorithm

In this section, we present the experimental setup, the results of a series of experiments carried out to evaluate the prediction accuracy of the IPE-ELM algorithm, the parameter sensitivity of the algorithm, and a comparison with state-of-the-art data classification algorithms in the literature. We carry out the experiments on an 8-core 64-bit CPU server; each core can run 8 hardware threads, providing up to 64 simultaneous threads. The server has 256 GB of RAM and 1.5 TB of hard-disk storage.

The …

Conclusions and future work

In this study, we present a novel Island Parallel Evolutionary Extreme Learning Machine algorithm (IPE-ELM) for the data classification problem. The ELM is an efficient machine learning technique with fast learning speed and high prediction accuracy. In addition, it is well known that effective feature selection methods can improve the quality of the ELM. We combine the ELM with a parallel evolutionary algorithm and propose a robust data classification algorithm. Activation functions …


References (42)

  • B. He et al., Fast face recognition via sparse coding and extreme learning machine, Cognitive Computation (2013)

  • V. Bolón-Canedo et al., A review of feature selection methods on synthetic data, Knowledge and Information Systems (2012)

  • R. Bryll et al., Attribute bagging: Improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognition (2003)

  • E. Cantú-Paz, A survey of parallel genetic algorithms, Calculateurs parallèles, réseaux et systèmes répartis (1998)

  • A. Deniz et al., Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques, Neurocomputing (2017)

  • T. Dokeroglu et al., Optimization of one-dimensional bin packing problem with island parallel grouping genetic algorithms, Computers & Industrial Engineering (2014)

  • J. García-Nieto et al., Sensitivity and specificity based multiobjective approach for feature selection: Application to cancer diagnosis, Information Processing Letters (2009)

  • G.-B. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) (2012)

  • G.-B. Huang et al., Extreme learning machine: Theory and applications, Neurocomputing (2006)

  • G.-B. Huang et al., Optimization method based extreme learning machine for classification, Neurocomputing (2010)

  • G.-B. Huang et al., Extreme learning machine: A new learning scheme of feedforward neural networks (2004)

Cited by (22)

  • A comprehensive survey on recent metaheuristics for feature selection, Neurocomputing (2022). Citation excerpt: "The speed of the classifier is a serious criterion while selecting the learning algorithm as thousands of fitness evaluations are performed during the experiments. Better results can be observed with faster machine learning algorithms such as Extreme Learning Machines [104,105]. SVM can achieve better classification performance, but it is computationally an expensive classifier."

  • An empowered AdaBoost algorithm implementation: A COVID-19 dataset study, Computers and Industrial Engineering (2022). Citation excerpt: "The aim is to find a generic solution in a reasonable amount of time after optimizing an improved GA. Similarly, in (Deniz, Kiziloz, Dokeroglu, & Cosar, 2017 and Dokeroglu & Sevinc, 2019), some filtering mechanisms and methodologies supported by extreme learning machines have been developed and experimented with for feature subset selection. These studies are good examples of the filter-based feature selection approach, while Xue et al. (2019) present a wrapper feature selection algorithm for classification."

  • Deep transfer Wasserstein adversarial network for wafer map defect recognition, Computers and Industrial Engineering (2021). Citation excerpt: "Thus, it is necessary to learn features directly from wafer maps to automatically capture effective features, so as to ensure the industrial applicability of intelligent diagnosis systems. Deep learning has been widely applied in various fields because deep models are increasingly able to extract features from a large amount of data (Dokeroglu & Sevinc, 2019; Jiao, Jia, & Cai, 2018; LeCun, Bengio, & Hinton, 2015). Some researchers have employed DNNs, e.g., stacked denoising autoencoder (SDAE) and convolutional neural networks (CNNs), for classification of wafer map defects."

  • An evolutionary parallel multiobjective feature selection framework, Computers and Industrial Engineering (2021). Citation excerpt: "This fact shows that the quality of individuals improves as individuals evolve through generations. Finally, to verify the efficiency of our proposed framework, we compare our results with seven state-of-the-art methods in the literature: Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Ant Colony Optimization (ABACO) (Kashef & Nezamabadi-pour, 2015), Grey Wolf Optimization (bGWO1 and bGWO2) (Emary, Zawbaa, & Hassanien, 2016), Genetic Algorithm (HGEFS) (Xue, Yao, & Wu, 2018), Grasshopper Optimization Algorithm (BGOA-M) (Mafarja et al., 2019), and Island Parallel Evolutionary Algorithm (IPE-ELM) (Dokeroglu & Sevinc, 2019). We share the maximum accuracy values obtained by all studies in Table 5."


Tansel Dokeroglu received his B.S. from the Mechanical Engineering Department of the Turkish Military Academy in 1991. He received his M.S. and Ph.D. degrees from the Computer Science Department of Middle East Technical University in 2006 and 2014, respectively. He worked in the Ministry of Defence, Turkish General Staff, and Land Forces as a software engineer, database administrator, decision support expert, and distance learning system administrator. Currently, he is the director of the Research and Development Department of SIMSOFT Computer Technologies Company in Teknokent/Ankara. He works as a project manager and consultant for TUBITAK, European FP7, and Horizon 2020 projects. His academic interests are big data query optimization, Cloud databases, MapReduce, discrete optimization, parallel/distributed genetic algorithms, machine learning, and business process modeling and optimization. He has more than 30 conference and journal articles in his research areas. He is a lecturer at the Computer Engineering Department of TED University, Ankara, Turkey.

Ender Sevinc received his B.S. degree from the Electrical/Electronics Department of the Military Academy in 1991. He then received his M.S. and Ph.D. degrees from the Computer Engineering Department of Middle East Technical University in 2000 and 2009, respectively. In industry, he most recently worked as a simulation engineer at NATO JFTC in Poland in 2016. He resumed his academic career at the University of Turkish Aeronautical Association as an Assistant Professor in 2017. His study and publication areas are query optimization, deep learning, and genetic algorithms.
