Distributed SmSVM Ensemble Learning

Hajewski, Jeff; Oliveira, Suely

doi:10.1007/978-3-030-16841-4_2

Part of the book series: Proceedings of the International Neural Networks Society (INNS, volume 1)

Included in the following conference series:

INNS Big Data and Deep Learning conference

Abstract

Traditional ensemble methods are typically built from models that are fast to construct and evaluate, such as random trees and naive Bayes. More complex models frequently suffer from increased computational load in both training and inference. In this work, we present a distributed ensemble method using SmoothSVM, a fast support vector machine (SVM) algorithm. We build and evaluate a large ensemble of SVMs in parallel, with little overhead compared to a single SVM. The ensemble of SVMs trains in less time than a single SVM while maintaining the same test accuracy and, in some cases, even exhibits improved test accuracy. Our approach also has the added benefit of trivially scaling to much larger systems.
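
The general idea of training many SVMs in parallel on subsets of the data and combining their predictions can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: plain subgradient descent on the regularized hinge loss stands in for SmoothSVM, bootstrap sampling and majority voting are one common way to build and combine an ensemble, local threads stand in for distributed workers, and the function names and toy data are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def train_linear_svm(samples, epochs=50, lr=0.1, reg=0.01, seed=0):
    """Fit (w, b) by subgradient descent on the L2-regularized hinge loss.

    This is a stand-in for SmoothSVM, which is not reproduced here.
    """
    rng = random.Random(seed)
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # The regularization term shrinks w on every step.
            w = [wi * (1.0 - lr * reg) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move toward the correct side.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict_one(model, x):
    w, b = model
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0.0 else -1


def train_ensemble(samples, n_models=8, subset_size=None, seed=0):
    """Train n_models SVMs concurrently, each on its own bootstrap subset."""
    rng = random.Random(seed)
    k = subset_size or len(samples)
    # Bootstrap: draw each subset with replacement from the training data.
    subsets = [[rng.choice(samples) for _ in range(k)] for _ in range(n_models)]
    # Threads stand in for the workers of a distributed system (assumption).
    with ThreadPoolExecutor(max_workers=n_models) as pool:
        futures = [
            pool.submit(train_linear_svm, subset, seed=i)
            for i, subset in enumerate(subsets)
        ]
        return [f.result() for f in futures]


def predict_ensemble(models, x):
    """Majority vote over the ensemble; ties resolve to +1."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if votes >= 0 else -1


def make_toy_data(n=200, seed=1):
    """Two Gaussian blobs centered at (2, 2) and (-2, -2), labels +1 and -1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.choice([-1, 1])
        data.append(((rng.gauss(2.0 * y, 1.0), rng.gauss(2.0 * y, 1.0)), y))
    return data


data = make_toy_data()
train, test = data[:150], data[150:]
models = train_ensemble(train, n_models=8, subset_size=50)
accuracy = sum(predict_ensemble(models, x) == y for x, y in test) / len(test)
print(f"ensemble test accuracy: {accuracy:.2f}")
```

Because each model sees only a subset of the data and the models are independent, the training jobs share no state and can be placed on separate workers; this independence is what makes this style of ensemble straightforward to scale out.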

Author information

Authors and Affiliations

Department of Computer Science, University of Iowa, Iowa City, IA, USA
Jeff Hajewski & Suely Oliveira

Corresponding author

Correspondence to Jeff Hajewski.

Editor information

Editors and Affiliations

Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Luca Oneto
Department of Mathematics, University of Padova, Padua, Italy
Nicolò Navarin
Department of Mathematics, University of Padova, Padua, Italy
Alessandro Sperduti
Department of Informatics, Bioengineering, Robotics, and Systems Engineering, University of Genova, Genoa, Italy
Davide Anguita

Cite this paper

Hajewski, J., Oliveira, S. (2020). Distributed SmSVM Ensemble Learning. In: Oneto, L., Navarin, N., Sperduti, A., Anguita, D. (eds) Recent Advances in Big Data and Deep Learning. INNSBDDL 2019. Proceedings of the International Neural Networks Society, vol 1. Springer, Cham. https://doi.org/10.1007/978-3-030-16841-4_2

DOI: https://doi.org/10.1007/978-3-030-16841-4_2
Published: 03 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16840-7
Online ISBN: 978-3-030-16841-4
eBook Packages: Intelligent Technologies and Robotics (R0)
