
Support Vector Machines on Large Data Sets: Simple Parallel Approaches

Conference paper

Abstract

Support Vector Machines (SVMs) are well known for their excellent performance in statistical classification. However, their high computational cost, due to cubic runtime complexity, is problematic for larger data sets. To mitigate this, Graf et al. (Adv. Neural Inf. Process. Syst. 17:521–528, 2005) proposed the Cascade SVM: a simple, stepwise procedure in which SVMs are iteratively trained on subsets of the original data set and the support vectors of the resulting models are combined to form new training sets. The general idea is to bound the size of all considered training sets and thereby obtain a significant speedup. Another important advantage is that the approach is easily parallelized, because a number of independent models must be fitted during each stage of the cascade. Initial experiments show that even moderate parallelization can reduce computation time considerably, with only a minor loss in accuracy. We compare the Cascade SVM to the standard SVM and to a simple parallel bagging method with respect to both classification accuracy and training time. We also introduce a new stepwise bagging approach that exploits parallelization better than the Cascade SVM and includes an adaptive stopping criterion for selecting the number of stages, which improves accuracy.
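To make the cascade procedure concrete, the following is a minimal single-machine sketch of the idea described in the abstract, assuming scikit-learn's SVC as the base learner. The partition count, the pairwise merging scheme, and the function name cascade_svm are illustrative assumptions, not the authors' implementation.

    # Sketch of a Cascade-SVM-style procedure (assumptions noted above).
    import numpy as np
    from sklearn.svm import SVC

    def cascade_svm(X, y, n_partitions=8, **svc_params):
        """Fit an SVM by repeatedly merging support vectors of partial models."""
        rng = np.random.default_rng(0)
        # Stage 0: split the data into disjoint subsets (index arrays).
        subsets = np.array_split(rng.permutation(len(X)), n_partitions)
        while len(subsets) > 1:
            merged = []
            # Each pair of subsets could be handled by an independent worker.
            for a, b in zip(subsets[::2], subsets[1::2]):
                part = np.concatenate([a, b])
                model = SVC(**svc_params).fit(X[part], y[part])
                merged.append(part[model.support_])   # keep only the support vectors
            if len(subsets) % 2 == 1:                 # carry an unpaired subset forward
                merged.append(subsets[-1])
            subsets = merged
        # Final stage: train once on the surviving support vectors.
        return SVC(**svc_params).fit(X[subsets[0]], y[subsets[0]])

At every pass the number of subsets halves, so each individual training set stays bounded while the fits within a stage are independent and can run in parallel; the full scheme of Graf et al. additionally feeds the final support vectors back into the first stage and repeats until convergence.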


Notes

  1. Can be obtained either from the LIBSVM web page or the UCI repository.

References

  • Bordes, A., Ertekin, S., Weston, J., & Bottou, L. (2005). Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, 1579–1619.

  • Chawla, N. V., Moore, T. E., Jr., Hall, L. O., Bowyer, K. W., Kegelmeyer, P., & Springer, C. (2003). Distributed learning with bagging-like performance. Pattern Recognition Letters, 24, 455–471.

  • Graf, H. P., Cosatto, E., Bottou, L., Durdanovic, I., & Vapnik, V. (2005). Parallel support vector machines: The cascade SVM. Advances in Neural Information Processing Systems, 17, 521–528.

  • Karatzoglou, A., Smola, A., Hornik, K., & Zeileis, A. (2004). kernlab - An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 1–20.

  • Koch, P., Bischl, B., Flasch, O., Bartz-Beielstein, T., & Konen, W. (2012). On the tuning and evolution of support vector kernels. Evolutionary Intelligence, 5, 153–170.

  • Schölkopf, B., & Smola, A. J. (2002). Learning with kernels: Support vector machines, regularization, optimization, and beyond. Cambridge, MA: MIT Press.


Author information

Corresponding author

Correspondence to Oliver Meyer.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Meyer, O., Bischl, B., Weihs, C. (2014). Support Vector Machines on Large Data Sets: Simple Parallel Approaches. In: Spiliopoulou, M., Schmidt-Thieme, L., Janning, R. (eds) Data Analysis, Machine Learning and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Cham. https://doi.org/10.1007/978-3-319-01595-8_10
