Evolutionary parallel extreme learning machines for the data classification problem
Introduction
Data classification is a crucial data mining technique with many applications in daily life (Ian, Witten, & Frank, 2011). Scientists use data classification techniques to identify patterns, acquire knowledge, and derive statistical/predictive models. Accuracy and execution speed are key concerns of the data classification process. Recent machine learning techniques are among the best means of extracting interesting and valuable patterns, and large datasets (with many features and rows) require advanced supervised machine learning techniques (Nasrabadi, 2007, Unler and Murat, 2010). Supervised machine learning techniques use a set of training data to build a model that predicts the class labels of data. Large amounts of data with many attributes/features are now common, and cleaning and classifying such data correctly to obtain the distilled information is not an easy process. Many supervised techniques have been proposed in the literature for the data classification problem. However, fast algorithms with high prediction accuracy are still in demand, which keeps this a valuable research area.
A high-quality classifier is a crucial part of a data classification process. The classifier should have good prediction accuracy and good generalization ability, and its training speed is another important consideration. The Extreme Learning Machine (ELM) is a recent, fast supervised machine learning technique that performs well on the data classification problem (Guang-Bin Huang & Zhu, 2004). The learning speed of feed-forward neural networks is generally slow, which has been a major drawback in machine learning applications. The main reasons for this slow learning process are that slow gradient-based learning algorithms are used to train these networks and that all of the network parameters are tuned iteratively by such algorithms. In contrast, the ELM is a learning technique for single-hidden layer feed-forward neural networks (SLFNs) that randomly chooses the hidden nodes (their input weights and biases) and analytically determines the output weights of the SLFN (Guang-Bin Huang & Siew, 2006). This property makes the ELM well suited to the intensive chromosome fitness evaluations performed by evolutionary genetic algorithms that select the best feature subset. The ELM has been applied to many important problems, and many studies are still in progress to improve the performance of this valuable machine learning technique (Huang, Wang, & Lan, 2011).
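The two-step ELM training rule described above can be sketched briefly. In the standard formulation of Huang et al., the hidden-layer output matrix H has entries H[i][j] = g(w_j · x_i + b_j), the input weights w_j and biases b_j are drawn at random, and the output weights are the least-squares solution β = H†T, where H† is the Moore–Penrose pseudoinverse and T holds the targets. The pure-Python sketch below is not the IPE-ELM implementation: it assumes one-hot targets, uniform weights in [-1, 1], the sigmoid activation, and, for numerical stability, solves the ridge-regularized normal equations (HᵀH + λI)β = HᵀT instead of forming H† exactly. The function names and the value of λ are illustrative.

```python
import math
import random


def sigmoid(z):
    # Logistic function written via tanh so that large |z| cannot overflow
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x m, as lists of lists) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def hidden_layer(x, weights, biases, g):
    # H[i][j] = g(w_j . x_i + b_j): one row per sample, one column per hidden node
    return [[g(sum(wk * xk for wk, xk in zip(w, xi)) + b) for w, b in zip(weights, biases)]
            for xi in x]


def train_elm(x, y, n_hidden, n_classes, g=sigmoid, lam=1e-3, seed=0):
    rng = random.Random(seed)
    d = len(x[0])
    # Step 1: random input weights and biases; they are never tuned afterwards
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_hidden)]
    biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    h = hidden_layer(x, weights, biases, g)
    t = [[1.0 if c == label else 0.0 for c in range(n_classes)] for label in y]  # one-hot targets
    # Step 2: output weights from the ridge-regularized normal equations (H^T H + lam*I) beta = H^T T
    hth = [[sum(row[p] * row[q] for row in h) + (lam if p == q else 0.0) for q in range(n_hidden)]
           for p in range(n_hidden)]
    htt = [[sum(row[p] * tr[c] for row, tr in zip(h, t)) for c in range(n_classes)]
           for p in range(n_hidden)]
    beta = solve(hth, htt)
    return weights, biases, beta, g


def predict(model, x):
    weights, biases, beta, g = model
    h = hidden_layer(x, weights, biases, g)
    n_classes = len(beta[0])
    # Output scores H @ beta; the predicted class is the index of the largest score
    scores = [[sum(hj * beta[j][c] for j, hj in enumerate(row)) for c in range(n_classes)]
              for row in h]
    return [max(range(n_classes), key=s.__getitem__) for s in scores]
```

Only the output weights are computed; no gradient-based iteration takes place. This is why a single ELM fit is cheap enough to serve as a fitness evaluation inside a genetic algorithm.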
With recent developments in computer science, the need for real-time processing of large datasets presents big challenges to traditional data processing methods (Bolón-Canedo & Alonso-Betanzos, 2012). For this reason, Feature Subset Selection (FSS), which filters out unnecessary features and can greatly reduce processing time, has attracted the attention of scientists (Mingkui Tan & Tsang, 2014). Ensemble-based wrapper methods applied to FSS, which use a search method to explore candidate feature subsets and a machine learning method to measure their accuracy, provide good results for the data classification problem (Xiaowei Xue & Wu, 2017). Wrapper methods are computationally expensive because they must compute a fitness value for every subset they explore. Nevertheless, they are still the best-performing methods.
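To make the wrapper idea concrete: a candidate feature subset can be encoded as a binary mask, and its fitness taken as the accuracy that a classifier trained on only the selected columns achieves on held-out data. The sketch below uses a nearest-centroid classifier purely as a simple stand-in for the learning method (the IPE-ELM uses the ELM in this role). The train/validation split and the function names are illustrative assumptions.

```python
def select_columns(rows, mask):
    """Keep only the features whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def nearest_centroid_accuracy(x_train, y_train, x_val, y_val):
    # Stand-in classifier: assign each sample to the class whose mean vector is closest
    centroids = {}
    for label in set(y_train):
        members = [x for x, y in zip(x_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def classify(x):
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

    correct = sum(classify(x) == y for x, y in zip(x_val, y_val))
    return correct / len(y_val)


def wrapper_fitness(mask, x_train, y_train, x_val, y_val):
    """Fitness of a feature subset = validation accuracy using only the selected features."""
    if not any(mask):
        return 0.0  # an empty subset cannot classify anything
    return nearest_centroid_accuracy(select_columns(x_train, mask), y_train,
                                     select_columns(x_val, mask), y_val)
```

Every call trains and scores a classifier from scratch. This is the computational cost that makes wrapper methods expensive and that a fast learner such as the ELM reduces.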
We use a parallel computation environment to select the features of a dataset with a genetic algorithm, and we apply the ELM to the selected features to evaluate their prediction accuracy through fast evaluation of the instances. Genetic algorithms and the ELM have been used before for the data classification problem; however, this is the first application of these methods within an island parallel genetic algorithm. We tune the parameters of the ELM dynamically and show experimentally that our method outperforms state-of-the-art metaheuristics. Earlier works have tried to parallelize the ELM itself. Our method differs from them in that we do not parallelize the matrix multiplication phase of the ELM. Parallelizing that phase is the straightforward way to speed up ELM execution, but it requires intensive communication between processors and does not scale. Instead, we propose an island parallel method that runs as many ELMs simultaneously as there are processors in the environment. This approach provides an effective diversification mechanism for improving the population quality of the genetic algorithm.
Considering all the issues mentioned above, we propose a novel evolutionary island parallel ELM-based classifier for the data classification problem. To the best of our knowledge, the IPE-ELM algorithm is the first island parallel machine learning algorithm in the literature to be applied to the data classification problem (Aggarwal, 2014). The IPE-ELM generates a diversified population in each processor's memory and improves the fitness of each population independently. Initializing the random number generator of each processor with a different seed makes this a very effective diversification mechanism for increasing the population quality of the genetic algorithm. When the processes terminate, the master node collects the best results of the slave nodes and reports the overall best solution.
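The island scheme can be pictured with a single-process simulation. Each island owns a population created from its own random seed and evolves it independently, and a master step collects each island's best solution and keeps the overall best. This sketch makes several simplifications, all of them assumptions. It substitutes a toy fitness (the number of 1 bits in a feature mask) for the ELM accuracy. It runs the islands one after another rather than as MPI processes. Each island uses a simple mutate-and-keep-if-not-worse step rather than the genetic operators of IPE-ELM. All names and parameter values are hypothetical.

```python
import random


def fitness(mask):
    # Toy stand-in for the ELM accuracy of a feature mask: count of selected features
    return sum(mask)


def evolve_island(seed, n_features=20, pop_size=10, generations=30):
    """Evolve one island's population independently with its own seeded RNG."""
    rng = random.Random(seed)  # a distinct seed per island diversifies the search
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        for i, parent in enumerate(pop):
            child = parent[:]
            j = rng.randrange(n_features)
            child[j] = 1 - child[j]  # single bit-flip mutation
            if fitness(child) >= fitness(parent):
                pop[i] = child  # keep the child if it is at least as good
    best = max(pop, key=fitness)
    return best, fitness(best)


def master(n_islands, base_seed=42):
    """Collect each island's best result and return the overall best (mask, fitness)."""
    # In IPE-ELM every island is an MPI process; here the islands simply run one after another.
    results = [evolve_island(base_seed + rank) for rank in range(n_islands)]
    return max(results, key=lambda r: r[1])
```

Because the islands share nothing until the final collection step, the only communication is one small message per island. This contrasts with parallelizing the ELM's matrix operations.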
Parallel machine learning is an emerging research area, and designing scalable parallel algorithms in this field is challenging. In our opinion, the efficiency of the IPE-ELM algorithm distinguishes it from other algorithms in this domain. Four activation functions are used during the classification process: Sine, Cosine, Sigmoid, and Hyperbolic Tangent. No single activation function is the best choice for every dataset; different activation functions perform better on different datasets. For this reason, each processor randomly selects one of these activation functions and continues its optimization process with it. The number of hidden neurons is another parameter to consider during classification, and its tuning greatly affects performance; we observed this effect in our experiments and report the results. Therefore, each processor chooses a different number of hidden neurons (in the range of 2–10% of the instance size) and runs the ELM. The population size was set to 70 after comprehensive tests. The best-performing convergence, truncation, crossover, and mutation ratios are used on all processors of the parallel distributed-memory environment, which is implemented with the Message Passing Interface (MPI) libraries. The parallel, diversified populations of the IPE-ELM algorithm act as a stagnation prevention mechanism. Each processor uses a different seed to randomize all parameters in the FSS and ELM phases of the algorithm, which prevents the genetic algorithm from repeatedly exploring the same areas of the search space. Comprehensive experiments comparing our scalable algorithm with state-of-the-art classification algorithms show that the IPE-ELM algorithm outperforms them in prediction accuracy with reasonable execution times.
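The per-processor diversification described above amounts to drawing a few settings from each island's own seeded generator. The four activation functions, the 2–10% range for the number of hidden neurons, and the population size of 70 come from the text. The following are assumptions: uniform sampling, interpreting "instance size" as the number of training instances, the rounding rule with a floor of one neuron, the base seed, and the function name `island_config`.

```python
import math
import random

ACTIVATIONS = {
    "sine": math.sin,
    "cosine": math.cos,
    "sigmoid": lambda z: 0.5 * (1.0 + math.tanh(0.5 * z)),  # overflow-free logistic
    "tanh": math.tanh,
}
POPULATION_SIZE = 70  # value reported in the text


def island_config(rank, n_instances, base_seed=2019):
    """Draw one island's ELM settings from an RNG seeded by the island's rank (hypothetical)."""
    rng = random.Random(base_seed + rank)  # different seed per processor
    name = rng.choice(sorted(ACTIVATIONS))
    low = max(1, round(0.02 * n_instances))    # 2% of the instances
    high = max(low, round(0.10 * n_instances))  # 10% of the instances
    n_hidden = rng.randint(low, high)
    return {"activation": name, "n_hidden": n_hidden, "population": POPULATION_SIZE}
```

For example, with 500 instances every island receives between 10 and 50 hidden neurons. Because the seed depends on the rank, each processor gets a reproducible but generally different configuration.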
Some of the state-of-the-art algorithms that have been applied to the data classification problem are Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Attribute Bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features (Bryll & Gutierrez, 2003), Multi-View Adaboost (MVA) (Xu, 2010), the Random Subspace Method for constructing decision forests (RSE) (Ho, 1998), Correlation-based Feature Selection (CFS-SFS) (Hall, 1998), C4.5 (Quinlan, 1996), the Hybrid Genetic Algorithm and ELM-based feature selection algorithm (HGEFS) (Xiaowei Xue & Wu, 2017), Advanced Binary Ant Colony Optimization (ABACO) (Shima Kashef, 2015), and the ACO-based feature selection algorithm (ACOFS) (Chen, Chen, & Chen, 2013). These algorithms must calculate the fitness of every candidate solution they consider, which is the most time-consuming part of their execution. In our experiments, we compare our results with those of these algorithms.
Section 2 reviews related studies on state-of-the-art ELM techniques and data classification algorithms. The details of the ELM are presented in Section 3, and the proposed IPE-ELM algorithm is introduced in Section 4. Section 5 describes the experimental setup, reports the results of the experiments, and compares them with state-of-the-art methods. Concluding remarks are provided in the last section.
Related work
In this section, we give information about the ELM, FSS, state-of-the-art evolutionary data classification techniques, and parallel implementations of the ELM. Huang et al. introduced the ELM in 2004 (Guang-Bin Huang & Zhu, 2004). Huang et al. study the ELM for classification from the standpoint of the standard optimization method and extend the ELM to a specific type of generalized SLFN, the support vector network (Guang-Bin Huang & Zhou, 2010). Guang-Bin Huang and Ding (2012) show that least square SVM (LS-SVM) and proximal SVM (PSVM)
Extreme learning machines
In this section, we give information about the ELM used by the IPE-ELM algorithm (Guang-Bin Huang and Siew, 2006, Guang-Bin Huang and Zhu, 2004). The ELM trains an SLFN faster than traditional feed-forward network learning algorithms such as back-propagation (BP) (see Fig. 1 for the SLFN). Due to its simplicity, remarkable efficiency, and impressive generalization performance, the ELM has been applied in a variety of domains, such as computer vision, bioinformatics, data
Island Parallel Evolutionary Extreme Learning Machine Algorithm (IPE-ELM)
In this section, we introduce our proposed island parallel evolutionary algorithm, IPE-ELM. The main goal of the algorithm is to discover the best subset of features that will produce the highest prediction accuracy for the data classification problem. The IPE-ELM algorithm has two main components (phases), evolutionary computation (for selecting feature subsets) and the ELM (for finding the prediction accuracy of the selected features).
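One generation of the evolutionary (feature-selection) phase can be sketched as follows. The text names truncation, crossover, and mutation but does not give their ratios or exact operators here. The sketch therefore assumes the following: truncation selection keeping the best 50% as parents, one-point crossover, independent bit-flip mutation with probability 0.05, and elitism that copies the current best mask forward. These choices and the function name are hypothetical. The `fitness` argument stands in for the ELM prediction accuracy of the selected features.

```python
import random


def next_generation(pop, fitness, rng, truncation=0.5, mutation_rate=0.05):
    """Build a new population of binary feature masks from the current one."""
    ranked = sorted(pop, key=fitness, reverse=True)
    parents = ranked[:max(2, int(len(pop) * truncation))]  # truncation selection
    new_pop = [ranked[0][:]]  # elitism (an assumption): carry the best mask over unchanged
    while len(new_pop) < len(pop):
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # one-point crossover position
        child = a[:cut] + b[cut:]
        # bit-flip mutation: each gene flips independently with probability mutation_rate
        child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
        new_pop.append(child)
    return new_pop
```

In an island setting, each processor would call a step like this on its own population with its own seeded `random.Random` instance.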
Island parallel genetic algorithms are novel
Performance evaluation of the IPE-ELM algorithm
In this section, we present the experimental setup, the results of a series of experiments carried out to evaluate the prediction accuracy of the IPE-ELM algorithm, the parameter sensitivity of the algorithm, and a comparison with state-of-the-art data classification algorithms in the literature. We carry out the experiments on a server with an 8-core 64-bit CPU. Each core can run 8 threads, providing 64 hardware threads simultaneously. The server has 256 GB of RAM and 1.5 TB of hard-disk storage.
The
Conclusions and future work
In this study, we present a novel Island Parallel Evolutionary Extreme Learning Machine algorithm (IPE-ELM) for the data classification problem. The ELM is an efficient machine learning technique with fast learning speed and high prediction accuracy. In addition, it is well known that effective feature selection methods can improve the quality of the ELM. We combine the ELM with a parallel evolutionary algorithm and propose a robust data classification algorithm. Activation functions
References (42)
- Efficient ant colony optimization for image feature selection. Signal Processing (2013).
- Feature selection for classification. Intelligent Data Analysis (1997).
- A robust and cooperative parallel tabu search algorithm for the maximum vertex weight clique problem. Computers & Industrial Engineering (2018).
- Cooperative parallel grouping genetic algorithm for the one-dimensional bin packing problem. Computers & Industrial Engineering (2018).
- Extreme learning machine based transfer learning for data classification. Neurocomputing (2016).
- An OS-ELM based distributed ensemble classification framework in P2P networks. Neurocomputing (2011).
- A discrete particle swarm optimization method for feature selection in binary classification problems. European Journal of Operational Research (2010).
- A mixed integer optimisation model for data classification. Computers & Industrial Engineering (2009).
- Algorithmic graph theory and perfect graphs (2014).
- Hybridizing extreme learning machines and genetic algorithms to select acoustic features in vehicle classification applications. Neurocomputing (2015).
- Fast face recognition via sparse coding and extreme learning machine. Cognitive Computation.
- A review of feature selection methods on synthetic data. Knowledge and Information Systems.
- Attribute bagging: Improving accuracy of classifier ensembles by using random feature subsets. Pattern Recognition.
- A survey of parallel genetic algorithms. Calculateurs parallèles, réseaux et systèmes répartis.
- Robust multiobjective evolutionary feature subset selection algorithm for binary classification using machine learning techniques. Neurocomputing.
- Optimization of one-dimensional bin packing problem with island parallel grouping genetic algorithms. Computers & Industrial Engineering.
- Sensitivity and specificity based multiobjective approach for feature selection: Application to cancer diagnosis. Information Processing Letters.
- Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).
- Extreme learning machine: Theory and applications. Neurocomputing.
- Optimization method based extreme learning machine for classification. Neurocomputing.
- Extreme learning machine: A new learning scheme of feedforward neural networks.
Cited by (22)
A comprehensive survey on recent metaheuristics for feature selection
2022, Neurocomputing
Citation excerpt: The speed of the classifier is a serious criterion while selecting the learning algorithm as thousands of fitness evaluations are performed during the experiments. Better results can be observed with faster machine learning algorithms such as Extreme Learning Machines [104,105]. SVM can achieve better classification performance, but it is computationally an expensive classifier.
An empowered AdaBoost algorithm implementation: A COVID-19 dataset study
2022, Computers and Industrial Engineering
Citation excerpt: The aim is to find a generic solution in a reasonable amount of time after optimizing an improved GA. Similarly, in (Deniz, Kiziloz, Dokeroglu, & Cosar, 2017 and Dokeroglu & Sevinc, 2019), some filtering mechanisms and methodologies supported by extreme learning machines have been developed and experimented with for feature subset selection. These studies are good examples of filter-based feature selection approach while Xue et al. (Xue et al., 2019) presents a wrapper feature selection algorithm for classification.
Deep transfer Wasserstein adversarial network for wafer map defect recognition
2021, Computers and Industrial Engineering
Citation excerpt: Thus, it is necessary to learn features directly from wafer maps to automatically capture effective features, so as to ensure the industrial applicability of intelligent diagnosis systems. Deep learning has been widely applied in various fields because they are increasingly able to extract features from a large amount of data (Dokeroglu & Sevinc, 2019; Jiao, Jia, & Cai, 2018; LeCun, Bengio, & Hinton, 2015). Some researchers have employed DNNs, e.g., stacked denoising autoencoder (SDAE), convolutional neural networks (CNNs) for classification of wafer map defects.
An evolutionary parallel multiobjective feature selection framework
2021, Computers and Industrial Engineering
Citation excerpt: This fact shows that the quality of individuals improves as individuals evolve through generations. Finally, to verify the efficiency of our proposed framework, we compare our results with seven state-of-the-art methods in the literature: Particle Swarm Optimization (PSO) (Unler & Murat, 2010), Ant Colony Optimization (ABACO) (Kashef & Nezamabadi-pour, 2015), Grey Wolf Optimization (bGWO1 and bGWO2) (Emary, Zawbaa, & Hassanien, 2016), Genetic Algorithm (HGEFS) (Xue, Yao, & Wu, 2018), Grasshopper Optimization Algorithm (BGOA-M) (Mafarja et al., 2019), and Island Parallel Evolutionary Algorithm (IPE-ELM) (Dokeroglu & Sevinc, 2019). We share the maximum accuracy values obtained by all studies in Table 5.
Optimizing water desalination: A novel fusion of extreme learning machine and game theory for enhanced pH prediction - unveiling revolutionary insights
2024, Journal of Theoretical and Applied Information Technology
Tansel Dökeroglu received his B.S. degree from the Mechanical Engineering Department of the Turkish Military Academy in 1991. He received his M.S. and Ph.D. degrees from the Computer Science Department of Middle East Technical University in 2006 and 2014, respectively. He worked for the Ministry of Defence, the Turkish General Staff, and the Land Forces as a software engineer, database administrator, decision support expert, and distance learning system administrator. Currently, he is the director of the Research and Development Department of SIMSOFT Computer Technologies Company in Teknokent/Ankara. He works as a project manager and consultant for TUBITAK, European FP7, and Horizon 2020 projects. His academic interests are big data query optimization, cloud databases, MapReduce, discrete optimization, parallel/distributed genetic algorithms, machine learning, and business process modeling and optimization. He has more than 30 conference and journal articles in his research areas. He is a lecturer at the Computer Engineering Department of TED University, Ankara, Turkey.
Ender Sevinc
He received his B.S. degree from the Electrical/Electronics Department of the Military Academy in 1991. He then received his M.S. and Ph.D. degrees from the Computer Engineering Department of Middle East Technical University in 2000 and 2009, respectively. In industry, he most recently worked as a Simulation Engineer at NATO JFTC in Poland in 2016. He then resumed his academic career at the University of Turkish Aeronautical Association as an Assistant Professor in 2017. His study and publication areas are query optimization, deep learning, and genetic algorithms.