Abstract
In the era of big data, ever more models must be trained to mine useful knowledge from large-scale data, and training multiple models accurately and efficiently under limited computing resources has become a challenging problem. As a variant of the extreme learning machine (ELM), the online sequential extreme learning machine (OS-ELM) can learn from data that arrive incrementally. MapReduce, which provides a simple, scalable, and fault-tolerant framework, can be utilized for large-scale learning. In this paper, we propose an efficient parallel method for batched online sequential extreme learning machine (BPOS-ELM) training using MapReduce. Map execution time is estimated from historical statistics using regression and inverse distance weighted interpolation, while Reduce execution time is estimated from complexity analysis combined with regression. Based on these estimates, BPOS-ELM generates a Map execution plan and a Reduce execution plan, launches a single MapReduce job that trains multiple OS-ELM models according to the plan, and collects execution information to further improve estimation accuracy. The proposal is evaluated on real and synthetic data. The experimental results show that the accuracy of BPOS-ELM is at the same level as that of OS-ELM and parallel OS-ELM (POS-ELM), while achieving higher training efficiency.
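The abstract states that Map execution time is estimated from historical statistics using regression and inverse distance weighted (IDW) interpolation. The sketch below is a minimal illustration of the IDW interpolation step only, not the authors' implementation: it assumes hypothetical historical records keyed by input-chunk size and hidden-node count, and all class, field, and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal sketch of inverse distance weighted (IDW) estimation of Map task
 *  execution time from historical statistics. The record fields (chunk size,
 *  number of hidden nodes, observed seconds) are illustrative assumptions. */
public class IdwMapTimeEstimator {

    /** One historical observation: input chunk size, hidden nodes, measured time. */
    static final class Record {
        final double chunkSize, hiddenNodes, seconds;
        Record(double chunkSize, double hiddenNodes, double seconds) {
            this.chunkSize = chunkSize;
            this.hiddenNodes = hiddenNodes;
            this.seconds = seconds;
        }
    }

    private final List<Record> history = new ArrayList<>();
    private final double power;   // IDW exponent, typically 2

    IdwMapTimeEstimator(double power) { this.power = power; }

    void addObservation(double chunkSize, double hiddenNodes, double seconds) {
        history.add(new Record(chunkSize, hiddenNodes, seconds));
    }

    /** Estimate execution time for an unseen (chunkSize, hiddenNodes) point as a
     *  weighted average of historical times, with weights 1 / distance^power. */
    double estimate(double chunkSize, double hiddenNodes) {
        double weightedSum = 0.0, weightTotal = 0.0;
        for (Record r : history) {
            double dx = r.chunkSize - chunkSize;
            double dy = r.hiddenNodes - hiddenNodes;
            double dist = Math.sqrt(dx * dx + dy * dy);
            if (dist < 1e-9) {            // exact match in the history: reuse it
                return r.seconds;
            }
            double w = 1.0 / Math.pow(dist, power);
            weightedSum += w * r.seconds;
            weightTotal += w;
        }
        return weightedSum / weightTotal; // NaN if no history has been recorded
    }

    public static void main(String[] args) {
        IdwMapTimeEstimator est = new IdwMapTimeEstimator(2.0);
        // Hypothetical historical statistics: (records in chunk, hidden nodes, seconds).
        est.addObservation(100_000, 128, 42.0);
        est.addObservation(200_000, 128, 81.0);
        est.addObservation(100_000, 256, 95.0);
        System.out.printf("Estimated Map time: %.1f s%n", est.estimate(150_000, 128));
    }
}
```

In BPOS-ELM, such per-task estimates feed the Map execution plan that packs multiple OS-ELM training tasks into a single MapReduce job; the features and the regression component actually used are described in the paper body.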
Notes
Downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Downloaded from http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
Downloaded from http://www.datatang.com/data/13152.
Downloaded from https://www.cs.toronto.edu/~kriz/cifar.html.
Acknowledgments
This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 61173030, 61272181, 61272182, 61173029, 61332014; and the Fundamental Research Funds for the Central Universities (N120816001).
Cite this article
Huang, S., Wang, B., Chen, Y. et al. An efficient parallel method for batched OS-ELM training using MapReduce. Memetic Comp. 9, 183–197 (2017). https://doi.org/10.1007/s12293-016-0190-5