Evolutionary parallel extreme learning machines for the data classification problem

https://doi.org/10.1016/j.cie.2019.02.024

Highlights

  • First evolutionary parallel ELM algorithm for the data classification problem.

  • The ELM is enhanced with feature selection.

  • The proposed algorithm tunes its parameters at run time.

  • The scalability of the proposed algorithm is (near)-linear.

  • The state-of-the-art algorithms are outperformed.

Abstract

This study proposes an Island Parallel Evolutionary Extreme Learning Machine algorithm (IPE-ELM) for the well-known data classification problem. The ELM is a fast and efficient machine learning technique based on a single-hidden-layer feed-forward neural network (SLFN). The high prediction accuracy and learning speed of the ELM make it an elegant tool for the fitness calculation process of evolutionary algorithms. The IPE-ELM algorithm combines evolutionary genetic algorithms (for feature selection), the ELM machine learning technique (for prediction accuracy calculation), parallel computation (for faster fitness evaluation), and parameter tuning (activation function selection and the number of hidden neurons) to solve this important problem. Each ELM instance, running on a different processor, selects one of four activation functions (Sine, Cosine, Sigmoid, and Hyperbolic Tangent) and uses a randomized number of hidden neurons to achieve higher prediction accuracy. The proposed algorithm provides high quality results with (near)-linear scalability. The IPE-ELM algorithm is compared with state-of-the-art data classification algorithms on UCI benchmark datasets, and significant improvements in prediction accuracy are reported with reasonable execution times. To the best of our knowledge, the scalable IPE-ELM algorithm is the first island-parallel evolutionary classification algorithm, and its prediction accuracy outperforms state-of-the-art algorithms in the literature.

Introduction

Data classification is a crucial mining technique with many applications in our daily life (Witten & Frank, 2011). Scientists can identify patterns, acquire knowledge, and derive statistical/predictive models by making use of data classification techniques. Accuracy and execution speed are important issues in the data classification process. One of the best means of extracting interesting and valuable patterns is to make use of recent machine learning techniques. Dealing with large datasets (having many features and rows) requires advanced supervised machine learning techniques (Nasrabadi, 2007; Unler & Murat, 2010). Supervised machine learning techniques build a model to predict the class labels of data by using a set of training data. It has become very common to have large amounts of data with many attributes/features, and it is not easy to properly clean and classify such data and obtain the distilled information. Many supervised techniques have been proposed in the literature for the solution of the data classification problem. However, the demand for fast algorithms with high prediction accuracy still makes this a valuable research area.

A high-quality classifier is a crucial part of a data classification process. The classifier should have good prediction accuracy and good generalization ability. The training speed of a classifier is another important consideration. The Extreme Learning Machine (ELM) is a recent and fast supervised machine learning technique with high performance on the data classification problem (Guang-Bin Huang & Zhu, 2004). The learning speed of feed-forward neural networks is generally slow, and this has been a major drawback in machine learning applications: slow gradient-based learning algorithms are used to train the networks, and the network parameters are tuned with such learning techniques. However, the ELM is a different machine learning technique for single-hidden-layer feed-forward neural networks (SLFNs): it randomly chooses the hidden node parameters and analytically determines the output weights of the SLFN (Guang-Bin Huang & Siew, 2006). This property makes the ELM a suitable technique for the intensive chromosome fitness evaluation of evolutionary genetic algorithms that select the best feature subset. The ELM has been applied to many important problems, and many studies are still in progress to improve the performance of this valuable machine learning technique (Huang, Wang, & Lan, 2011).
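
To make the ELM's training step concrete, the following is a minimal sketch in NumPy, assuming a sample matrix X (instances × features) and one-hot encoded targets T; the function names and the uniform weight initialization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng, activation=np.tanh):
    """Train an ELM: random hidden layer, analytically computed output weights."""
    n_features = X.shape[1]
    # Hidden-node parameters are drawn at random and never updated.
    W = rng.uniform(-1.0, 1.0, size=(n_features, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = activation(X @ W + b)         # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T      # output weights via the Moore-Penrose pseudo-inverse
    return W, b, beta

def predict_elm(X, W, b, beta, activation=np.tanh):
    """Predict class labels with a trained ELM."""
    H = activation(X @ W + b)
    return np.argmax(H @ beta, axis=1)
```

Because training reduces to one random projection and one pseudo-inverse, a single fitness evaluation is far cheaper than training a network with back-propagation, which is what makes the ELM attractive inside an evolutionary loop.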

With recent developments in computer science, the need for real-time processing of large datasets poses big challenges to traditional ways of data processing (Bolón-Canedo & Alonso-Betanzos, 2012). For this reason, Feature Subset Selection (FSS) has attracted the attention of scientists as a way to filter out unnecessary data and greatly reduce processing time (Tan & Tsang, 2014). Ensemble-based wrapper methods (which use an exploration method for efficient FSS and a machine learning method to measure the accuracy level) provide good results for the data classification problem (Xue & Wu, 2017). Wrapper methods are computationally expensive, since they must compute fitness values for many explored subsets; however, they are still the best performing methods.
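
As a concrete illustration of such a wrapper evaluation, the sketch below scores a binary feature mask by training an ELM on the selected columns and measuring held-out accuracy. It reuses the hypothetical train_elm/predict_elm helpers from the previous sketch, and the train/test split convention is an assumption.

```python
import numpy as np

def subset_fitness(mask, X_train, T_train, X_test, y_test, n_hidden, rng):
    """Wrapper fitness: held-out accuracy of an ELM trained on the
    features selected by a binary mask (1 = feature kept)."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0  # an empty feature subset cannot classify anything
    W, b, beta = train_elm(X_train[:, cols], T_train, n_hidden, rng)
    y_pred = predict_elm(X_test[:, cols], W, b, beta)
    return float(np.mean(y_pred == y_test))
```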

We use a parallel computation environment to select the features of a dataset with a genetic algorithm and apply the ELM to the selected features to evaluate the prediction accuracy through fast evaluation of the instances. Genetic algorithms and ELM machine learning have been used before to solve the data classification problem; however, this is the first application of these methods with a parallel island genetic algorithm approach. We tune the parameters of the ELM dynamically and experimentally show that our method outperforms state-of-the-art metaheuristics. There have been earlier attempts to parallelize the ELM. Our method differs in that we do not parallelize the matrix multiplication phase of the ELM, a straightforward way to speed up execution with parallel processing; that approach requires intensive communication between processors and is not scalable. Instead, we propose an island-parallel method and simultaneously execute as many ELM instances as there are processors in the environment. This approach provides an effective diversification mechanism for improving the population quality of the genetic algorithm.

Considering all the issues mentioned above, we propose a novel evolutionary island-parallel ELM-based classifier for the data classification problem. To the best of our knowledge, the IPE-ELM algorithm is the first island-parallel machine learning algorithm in the literature applied to the data classification problem (Aggarwal, 2014). The IPE-ELM generates diversified populations in each processor's memory and improves the populations' fitness qualities independently. Our approach provides a very effective diversification mechanism for increasing the population quality of the genetic algorithm by initializing the random number generator of each processor with a different seed. At the termination phase of the processes on each processor, the best results of the slave nodes are collected by the master node, and the overall best solution is reported.
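
A minimal mpi4py sketch of this master/slave pattern, under stated assumptions: run_island_ga stands in for the per-island GA+ELM loop (not shown), and the additive seeding scheme is only one plausible reading of the per-processor seeds described above.

```python
# Launched with, e.g.: mpirun -np 8 python ipe_elm_islands.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each island seeds its own random number generator differently,
# so the populations diversify instead of exploring the same space.
seed = 12345 + rank                        # hypothetical seeding scheme
best_mask, best_acc = run_island_ga(seed)  # assumed per-island GA+ELM loop

# The master (rank 0) gathers every island's best individual
# and reports the overall winner.
results = comm.gather((best_acc, best_mask), root=0)
if rank == 0:
    overall_acc, overall_mask = max(results, key=lambda r: r[0])
    print(f"overall best accuracy: {overall_acc:.4f}")
```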

Parallel machine learning is a newly developing research area, and designing scalable parallel algorithms in this field is challenging. In our opinion, the IPE-ELM algorithm is unique among the algorithms in this domain with its efficient features. Four different activation functions are used during the classification process: Sine, Cosine, Sigmoid, and Hyperbolic Tangent. No single type of activation function is the best one for every possible dataset; different activation functions do better on different datasets. Each processor randomly selects one of these activation functions and continues its optimization process. The number of hidden neurons is another criterion to be considered during classification; tuning this parameter greatly affects performance, and the results observed in our experiments are reported. Therefore, each processor selects a different number of hidden neurons (in the range of 2–10% of the instance size) and runs the ELM. The size of the population was chosen as 70 after comprehensive tests. The best performing convergence, truncation, crossover, and mutation ratios are applied on all processors of the parallel distributed-memory environment by using Message Passing Interface (MPI) libraries. The parallel and diversified populations of the IPE-ELM algorithm provide a stagnation-prevention mechanism: at each processor, we select different seeds for the randomization of all parameters in the FSS and ELM phases of the algorithm, which prevents the genetic algorithm from repeatedly exploring the same areas of the search space. Comprehensive experiments comparing our scalable algorithm with state-of-the-art classification algorithms show that the IPE-ELM algorithm outperforms them in terms of prediction accuracy with reasonable execution times.
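
The per-processor parameter draw described above might look like the following sketch; the four activation functions and the 2–10% range come from the text, while the uniform draws and the dictionary layout are assumptions.

```python
import numpy as np

# The four activation functions named in the text.
ACTIVATIONS = {
    "sine": np.sin,
    "cosine": np.cos,
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "tanh": np.tanh,
}

def draw_island_parameters(n_instances, rng):
    """Each island independently picks an activation function and a
    hidden-layer size between 2% and 10% of the number of instances."""
    name = rng.choice(list(ACTIVATIONS))
    low = max(1, int(0.02 * n_instances))
    high = max(low + 1, int(0.10 * n_instances))
    n_hidden = int(rng.integers(low, high + 1))
    return ACTIVATIONS[name], n_hidden
```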

Some of the state-of-the-art metaheuristics that have been applied to the data classification problem are Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Attribute Bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features (Bryll & Gutierrez-Osuna, 2003), Multi-View Adaboost (MVA) (Xu, 2010), the Random Subspace Method for constructing decision forests (RSE) (Ho, 1998), Correlation-based Feature Selection (CFS-SFS) (Hall, 1998), C4.5 (Quinlan, 1996), the Hybrid Genetic Algorithm and ELM-based feature selection algorithm (HGEFS) (Xue & Wu, 2017), Advanced Binary Ant Colony Optimization (ABACO) (Kashef & Nezamabadi-pour, 2015), and the ACO-based feature selection algorithm (ACOFS) (Chen, Chen, & Chen, 2013). The algorithms mentioned here need to calculate the fitness of each candidate solution, which is the most time-consuming part of these algorithms. In our experiments, we compare our solutions with the results of these algorithms.

In Section 2, related studies on state-of-the-art ELM techniques and data classification algorithms are given. The details of the ELM are presented in Section 3. The proposed IPE-ELM algorithm is introduced in Section 4. The setup of the experimental environment, the results of the experiments, and a comparison with state-of-the-art methods are reported in Section 5. Concluding remarks are provided in the last section.

Section snippets

Related work

In this section, we give information about the ELM, FSS, state-of-the-art evolutionary data classification techniques, and parallel implementations of the ELM. Huang et al. introduced the ELM in 2004 (Guang-Bin Huang & Zhu, 2004). They later studied the ELM for classification from a standard optimization-method perspective and extended it to an SLFN-based support vector network (Guang-Bin Huang & Zhou, 2010). Guang-Bin Huang and Ding (2012) show that least square SVM (LS-SVM) and proximal SVM (PSVM) …

Extreme learning machines

In this section, we give information about the ELM used by the IPE-ELM algorithm (Guang-Bin Huang & Siew, 2006; Guang-Bin Huang & Zhu, 2004). The ELM uses an SLFN with a learning speed faster than traditional feed-forward network learning algorithms (e.g. back-propagation (BP)) (see Fig. 1 for the SLFN). Due to its simplicity, remarkable efficiency, and impressive generalization performance, the ELM has been applied in a variety of domains, such as computer vision, bioinformatics, data …
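
For reference, the ELM training step this section alludes to can be written compactly; this is the standard textbook formulation (the notation below is assumed, not reproduced from the truncated text):

```latex
% For N training pairs (x_i, t_i) and L hidden nodes with randomly
% assigned parameters (w_j, b_j) and activation g, the SLFN output
% is linear in the output weights beta:
\[
  H\beta = T, \qquad H_{ij} = g(\mathbf{w}_j \cdot \mathbf{x}_i + b_j),
  \quad 1 \le i \le N,\ 1 \le j \le L,
\]
% so training reduces to one linear solve via the Moore-Penrose
% generalized inverse of the hidden-layer output matrix H:
\[
  \hat{\beta} = H^{\dagger} T .
\]
```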

Island Parallel Evolutionary Extreme Learning Machine Algorithm (IPE-ELM)

In this section, we introduce our proposed island parallel evolutionary algorithm, IPE-ELM. The main goal of the algorithm is to discover the subset of features that produces the highest prediction accuracy for the data classification problem. The IPE-ELM algorithm has two main components (phases): evolutionary computation (for selecting feature subsets) and the ELM (for computing the prediction accuracy of the selected features).
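
A compact sketch of the evolutionary phase over binary feature masks follows, reusing the hypothetical subset_fitness wrapper from the introduction. The population size of 70 is taken from the text, but the truncation selection, uniform crossover, and bit-flip mutation details shown are assumptions rather than the paper's exact operators.

```python
import numpy as np

def evolve_masks(fitness, n_features, rng, pop_size=70,
                 generations=100, p_mut=0.01):
    """Simple GA over binary feature masks (1 = feature selected)."""
    pop = rng.integers(0, 2, size=(pop_size, n_features))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        order = np.argsort(scores)[::-1]
        elite = pop[order[: pop_size // 2]]           # truncation selection
        # Uniform crossover between randomly paired elite parents.
        pa = elite[rng.integers(0, len(elite), size=pop_size)]
        pb = elite[rng.integers(0, len(elite), size=pop_size)]
        mix = rng.random((pop_size, n_features)) < 0.5
        children = np.where(mix, pa, pb)
        # Bit-flip mutation.
        flips = rng.random((pop_size, n_features)) < p_mut
        pop = np.where(flips, 1 - children, children)
    scores = np.array([fitness(ind) for ind in pop])
    best = pop[int(np.argmax(scores))]
    return best, float(scores.max())

# Usage sketch (binding the hypothetical wrapper fitness from earlier):
# from functools import partial
# fitness = partial(subset_fitness, X_train=..., T_train=..., X_test=...,
#                   y_test=..., n_hidden=n_hidden, rng=rng)
# best_mask, best_acc = evolve_masks(fitness, X_train.shape[1], rng)
```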

Island parallel genetic algorithms are novel …

Performance evaluation of the IPE-ELM algorithm

In this section, we present the experimental setup, the results of a series of experiments carried out to evaluate the prediction accuracy of the IPE-ELM algorithm, the parameter sensitivity of the algorithm, and a comparison with state-of-the-art data classification algorithms in the literature. We carry out the experiments on an 8-core 64-bit CPU server; each core can run 8 hardware threads, providing up to 64 simultaneous threads. The server has 256 GB of RAM and 1.5 TB of hard-disk storage.

The …

Conclusions and future work

In this study, we present a novel Island Parallel Evolutionary Extreme Learning Machine algorithm (IPE-ELM) for the data classification problem. The ELM is an efficient machine learning technique with fast learning speed and high prediction accuracy. In addition, it is well known that effective feature selection methods can improve the quality of the ELM. We combine the ELM with a parallel evolutionary algorithm and propose a robust data classification algorithm. Activation functions …


References (42)

  • B. He et al., Fast face recognition via sparse coding and extreme learning machine, Cognitive Computation (2013)

  • V. Bolón-Canedo et al., A review of feature selection methods on synthetic data, Knowledge and Information Systems (2012)

  • R. Bryll et al., Attribute bagging: Improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognition (2003)

  • E. Cantú-Paz, A survey of parallel genetic algorithms, Calculateurs parallèles, réseaux et systèmes répartis (1998)

  • A. Deniz et al., Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques, Neurocomputing (2017)

  • T. Dokeroglu et al., Optimization of one-dimensional bin packing problem with island parallel grouping genetic algorithms, Computers & Industrial Engineering (2014)

  • J. García-Nieto et al., Sensitivity and specificity based multiobjective approach for feature selection: Application to cancer diagnosis, Information Processing Letters (2009)

  • G.-B. Huang et al., Extreme learning machine for regression and multiclass classification, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) (2012)

  • G.-B. Huang et al., Extreme learning machine: Theory and applications, Neurocomputing (2006)

  • G.-B. Huang et al., Optimization method based extreme learning machine for classification, Neurocomputing (2010)

  • G.-B. Huang et al., Extreme learning machine: A new learning scheme of feedforward neural networks (2004)

Cited by (22)

  • A comprehensive survey on recent metaheuristics for feature selection, Neurocomputing (2022). Citation excerpt: "The speed of the classifier is a serious criterion while selecting the learning algorithm as thousands of fitness evaluations are performed during the experiments. Better results can be observed with faster machine learning algorithms such as Extreme Learning Machines [104,105]. SVM can achieve better classification performance, but it is computationally an expensive classifier."

  • An empowered AdaBoost algorithm implementation: A COVID-19 dataset study, Computers and Industrial Engineering (2022). Citation excerpt: "The aim is to find a generic solution in a reasonable amount of time after optimizing an improved GA. Similarly, in (Deniz, Kiziloz, Dokeroglu, & Cosar, 2017 and Dokeroglu & Sevinc, 2019), some filtering mechanisms and methodologies supported by extreme learning machines have been developed and experimented with for feature subset selection. These studies are good examples of the filter-based feature selection approach, while Xue et al. (2019) present a wrapper feature selection algorithm for classification."

  • Deep transfer Wasserstein adversarial network for wafer map defect recognition, Computers and Industrial Engineering (2021). Citation excerpt: "Thus, it is necessary to learn features directly from wafer maps to automatically capture effective features, so as to ensure the industrial applicability of intelligent diagnosis systems. Deep learning has been widely applied in various fields because deep models are increasingly able to extract features from a large amount of data (Dokeroglu & Sevinc, 2019; Jiao, Jia, & Cai, 2018; LeCun, Bengio, & Hinton, 2015). Some researchers have employed DNNs, e.g., stacked denoising autoencoder (SDAE) and convolutional neural networks (CNNs), for classification of wafer map defects."

  • An evolutionary parallel multiobjective feature selection framework, Computers and Industrial Engineering (2021). Citation excerpt: "This fact shows that the quality of individuals improves as individuals evolve through generations. Finally, to verify the efficiency of our proposed framework, we compare our results with seven state-of-the-art methods in the literature: Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Ant Colony Optimization (ABACO) (Kashef & Nezamabadi-pour, 2015), Grey Wolf Optimization (bGWO1 and bGWO2) (Emary, Zawbaa, & Hassanien, 2016), Genetic Algorithm (HGEFS) (Xue, Yao, & Wu, 2018), Grasshopper Optimization Algorithm (BGOA-M) (Mafarja et al., 2019), and Island Parallel Evolutionary Algorithm (IPE-ELM) (Dokeroglu & Sevinc, 2019). We share the maximum accuracy values obtained by all studies in Table 5."


Tansel Dokeroglu received his B.S. from the Mechanical Engineering Department of the Turkish Military Academy in 1991. He received his M.S. and Ph.D. degrees from the Computer Science Department of Middle East Technical University in 2006 and 2014, respectively. He worked in the Ministry of Defence, Turkish General Staff, and Land Forces as a software engineer, database administrator, decision support expert, and distance learning system administrator. Currently, he is the director of the Research and Development Department of SIMSOFT Computer Technologies Company in Teknokent/Ankara. He works as a project manager and consultant for TUBITAK, European FP7, and Horizon 2020 projects. His academic interests are big data query optimization, Cloud databases, MapReduce, discrete optimization, parallel/distributed genetic algorithms, machine learning, and business process modeling and optimization. He has more than 30 conference and journal articles in his research areas. He is a lecturer at the Computer Engineering Department of TED University, Ankara, Turkey.

Ender Sevinc received his B.S. degree from the Electrical/Electronics Department of the Military Academy in 1991. He then received his M.S. and Ph.D. degrees from the Computer Engineering Department of Middle East Technical University in 2000 and 2009, respectively. In industry, he most recently worked as a simulation engineer at NATO JFTC in Poland in 2016. He resumed his academic career at the University of Turkish Aeronautical Association as an Assistant Professor in 2017. His study and publication areas are query optimization, deep learning, and genetic algorithms.
