Abstract
In the big data era, the extreme learning machine (ELM) is a good candidate for learning from large sample sets because it combines high generalization performance with fast training. However, emerging big and distributed data blocks can still challenge the method: they lead to large-scale training that a single commodity machine cannot finish in a limited time. In this paper, we propose a MapReduce-based distributed framework, named MR-ELM, to enable large-scale ELM training. Under the framework, ELM submodels are trained in parallel on the distributed data blocks across the cluster and then combined into a complete single-hidden-layer feedforward neural network. Both the classification and the regression capabilities of MR-ELM are proven theoretically, and experiments on many typical benchmarks show that its generalization performance is as high as that of the original ELM and of some common ELM ensemble methods. Compared with the original ELM and other parallel ELM algorithms, MR-ELM is a general and scalable ELM training framework for both classification and regression, and it is suitable for big data learning in cloud environments, where the data are usually distributed rather than located on one machine.
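The train-then-combine idea described above can be illustrated with a minimal single-process sketch in Python with NumPy. It is not the authors' implementation: the abstract does not state the combination rule, so the sketch assumes one simple possibility, namely concatenating the hidden nodes of all submodels and scaling their output weights by 1/k, which yields one single-hidden-layer network whose output equals the average of the submodel outputs. The function names (`train_elm`, `map_phase`, `reduce_phase`, `predict`), the tanh activation, and the synthetic sine data are all hypothetical choices made for illustration, and the "map" step runs sequentially here rather than on a cluster.

```python
import numpy as np

def train_elm(X, T, n_hidden, rng):
    """Train one basic ELM: random hidden layer, least-squares output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose least-squares solution
    return W, b, beta

def map_phase(blocks, n_hidden, seed=0):
    """'Map' step: one ELM submodel per data block (sequential stand-in for a cluster)."""
    return [train_elm(X, T, n_hidden, np.random.default_rng(seed + i))
            for i, (X, T) in enumerate(blocks)]

def reduce_phase(submodels):
    """'Reduce' step under the ASSUMED rule: concatenate hidden nodes, scale beta by 1/k."""
    k = len(submodels)
    W = np.hstack([m[0] for m in submodels])
    b = np.concatenate([m[1] for m in submodels])
    beta = np.vstack([m[2] for m in submodels]) / k
    return W, b, beta

def predict(model, X):
    """Forward pass of a single-hidden-layer feedforward network."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Synthetic regression data, split into 4 blocks as if stored on 4 nodes.
rng = np.random.default_rng(42)
X = rng.uniform(-3.0, 3.0, size=(400, 1))
T = np.sin(X)
blocks = list(zip(np.array_split(X, 4), np.array_split(T, 4)))

submodels = map_phase(blocks, n_hidden=20)
model = reduce_phase(submodels)

rmse = float(np.sqrt(np.mean((predict(model, X) - T) ** 2)))
print("combined-model RMSE on the training data:", rmse)
```

Under this assumed rule, the combined network has k times as many hidden nodes as each submodel, and no training data has to leave the block it is stored on; only the small weight matrices are gathered in the reduce step.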
Acknowledgments
This work is funded by grant LY13F020005 of the NSF of Zhejiang, grant NSFC61070156, grant YB2013120143 of Huawei, and the Fundamental Research Funds for the Central Universities.
Cite this article
Chen, J., Chen, H., Wan, X. et al. MR-ELM: a MapReduce-based framework for large-scale ELM training in big data era. Neural Comput & Applic 27, 101–110 (2016). https://doi.org/10.1007/s00521-014-1559-3