Abstract
Extreme Learning Machine (ELM) has been widely used in fields such as text classification, image recognition and bioinformatics, as it provides good generalization performance at an extremely fast learning speed. However, as the data volume in real-world applications grows, the traditional centralized ELM cannot learn such massive data efficiently. In this paper, we therefore propose a novel Distributed Extreme Learning Machine based on the MapReduce framework, named ELM∗, which overcomes the weak learning ability of traditional ELM on huge datasets. First, an analysis of traditional ELM shows that the most expensive part of the Moore-Penrose generalized inverse computation in the output weight vector calculation is matrix multiplication. Then, because matrix multiplication is decomposable, ELM∗ first calculates the matrix products in parallel with MapReduce and then calculates the corresponding output weight vector with centralized computing, so that massive data can be learned efficiently. Finally, we conduct extensive experiments on synthetic data to verify the effectiveness and efficiency of ELM∗ in learning massive data under various experimental settings.
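The decomposition described in the abstract can be illustrated with a minimal single-process sketch. It is not the paper's implementation: it assumes the output weights are obtained from the normal equations, beta = (H^T H)^{-1} H^T T, which agrees with the Moore-Penrose solution when the hidden-layer matrix H has full column rank. Since H^T H and H^T T are sums over training samples, each data split can compute its own partial sums (map) and the partial sums can then be added (reduce). The names `U`, `V`, `map_partition`, `reduce_partials` and `train`, the sigmoid activation, the scalar target, and the small `ridge` term added for numerical stability are all assumptions made for this illustration.

```python
import math
import random
from functools import reduce

def hidden_row(x, weights, biases):
    """Sigmoid hidden-layer output for one sample, i.e. one row of H."""
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
            for ws, b in zip(weights, biases)]

def map_partition(rows, weights, biases):
    """Map step: partial sums U_p = H_p^T H_p and V_p = H_p^T T_p for one split."""
    L = len(biases)
    U = [[0.0] * L for _ in range(L)]
    V = [0.0] * L
    for x, t in rows:
        h = hidden_row(x, weights, biases)
        for i in range(L):
            V[i] += h[i] * t
            for j in range(L):
                U[i][j] += h[i] * h[j]
    return U, V

def reduce_partials(a, b):
    """Reduce step: element-wise sum of two (U, V) partial results."""
    (Ua, Va), (Ub, Vb) = a, b
    U = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(Ua, Ub)]
    V = [p + q for p, q in zip(Va, Vb)]
    return U, V

def solve(A, b):
    """Centralized step: solve A * beta = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * beta[k] for k in range(r + 1, n))
        beta[r] = (M[r][n] - tail) / M[r][r]
    return beta

def train(partitions, weights, biases, ridge=1e-2):
    """Sum the per-split results, then solve the small L x L system in one place."""
    partials = (map_partition(p, weights, biases) for p in partitions)
    U, V = reduce(reduce_partials, partials)
    for i in range(len(V)):
        U[i][i] += ridge  # assumed stabilizer, not taken from the abstract
    return solve(U, V)

# Hypothetical toy data: 40 samples, 2 inputs, 5 random hidden nodes.
random.seed(0)
L, d = 5, 2
weights = [[random.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
biases = [random.uniform(-1, 1) for _ in range(L)]
data = [([random.random(), random.random()], random.random()) for _ in range(40)]

partitions = [data[i::4] for i in range(4)]  # four simulated splits
beta_distributed = train(partitions, weights, biases)
beta_single = train([data], weights, biases)  # same data as one split
```

In this sketch `beta_distributed` and `beta_single` should agree up to floating-point rounding, because splitting the data only changes the order in which the per-sample products are summed. The matrices passed to the centralized step have size L x L and L x 1, independent of the number of training samples, which is why only the multiplication needs to be distributed.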
Cite this article
Xin, J., Wang, Z., Chen, C. et al. ELM∗: distributed extreme learning machine with MapReduce. World Wide Web 17, 1189–1204 (2014). https://doi.org/10.1007/s11280-013-0236-2
DOI: https://doi.org/10.1007/s11280-013-0236-2