Support Vector Machines with Clustering for Training with Very Large Datasets

Conference paper
First Online: 01 January 2002

pp 346–354

Methods and Applications of Artificial Intelligence (SETN 2002)

Theodoros Evgeniou & Massimiliano Pontil

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2308)

Included in the following conference series:

Hellenic Conference on Artificial Intelligence


Abstract

We present a method for training Support Vector Machine (SVM) classifiers on very large datasets. A clustering algorithm preprocesses the standard training data, and we show how SVMs can be simply extended to handle clustered data, which amounts to training on a set of weighted examples. The algorithm computes large clusters for points far from the decision boundary and small clusters for points near it. As a result, an SVM trained on the preprocessed, clustered dataset finds nearly the same decision boundary while the computation time decreases significantly. When the input dimensionality is not large, for example on the order of ten, the clustering algorithm can significantly reduce the effective number of training examples, which is useful for training SVMs on large datasets. Preliminary experimental results indicate the benefits of our approach.
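The idea of training on weighted examples can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the clustering procedure, so the sketch substitutes a fixed grid (which does not adapt cluster size to the distance from the boundary) and trains a linear SVM by subgradient descent on a weighted hinge loss. All names (`make_data`, `cluster_by_grid`, `train_weighted_linear_svm`, `predict`), the toy data, and the parameter values are assumptions made for illustration only.

```python
import random


def make_data(n_per_class=200, seed=0):
    """Hypothetical toy data: class +1 in [1, 3]^2, class -1 in [-3, -1]^2."""
    rng = random.Random(seed)
    points, labels = [], []
    for y in (1, -1):
        for _ in range(n_per_class):
            points.append([y * rng.uniform(1.0, 3.0), y * rng.uniform(1.0, 3.0)])
            labels.append(y)
    return points, labels


def cluster_by_grid(points, labels, cell):
    """Stand-in clustering: merge same-label points sharing a grid cell.

    Returns centroids, their labels, and weights (points per cluster).
    """
    buckets = {}
    for x, y in zip(points, labels):
        key = (y, tuple(int(v // cell) for v in x))
        buckets.setdefault(key, []).append(x)
    centroids, c_labels, weights = [], [], []
    for (y, _), members in buckets.items():
        n = len(members)
        dim = len(members[0])
        centroids.append([sum(m[d] for m in members) / n for d in range(dim)])
        c_labels.append(y)
        weights.append(n)
    return centroids, c_labels, weights


def train_weighted_linear_svm(xs, ys, weights, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on (lam/2)||w||^2 + weighted mean hinge loss.

    Each example's hinge loss is scaled by its weight, so a centroid that
    stands for n points counts n times.
    """
    dim = len(xs[0])
    w = [0.0] * dim
    b = 0.0
    total = float(sum(weights))
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y, n in zip(xs, ys, weights):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # only margin violators contribute to the hinge term
                for d in range(dim):
                    grad_w[d] -= n / total * y * x[d]
                grad_b -= n / total * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Sign of the linear decision function."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


if __name__ == "__main__":
    points, labels = make_data()
    centroids, c_labels, weights = cluster_by_grid(points, labels, cell=0.5)
    w, b = train_weighted_linear_svm(centroids, c_labels, weights)
    correct = sum(predict(w, b, x) == y for x, y in zip(points, labels))
    print(len(points), "points ->", len(centroids), "weighted examples")
    print("accuracy on the original points:", correct / len(points))
```

The optimizer only ever sees the centroids and their weights, so the cost of each pass depends on the number of clusters rather than the number of original points. A version closer to the abstract would use smaller cells near the decision boundary and larger ones far from it.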



Author information

Authors and Affiliations

Department of Information Engineering, University of Siena, Siena, Italy
Theodoros Evgeniou & Massimiliano Pontil
Technology Management, INSEAD, Bd de Constance, 77300, Fontainebleau, France
Theodoros Evgeniou & Massimiliano Pontil
Department of Mathematics, City University of Hong Kong, Hong Kong
Theodoros Evgeniou & Massimiliano Pontil


Editor information

Editors and Affiliations

Dept. of Informatics, Aristotle University of Thessaloniki, 54006, Thessaloniki, Greece
Ioannis P. Vlahavas
N.C.S.R. “Demokritos”, Inst. of Informatics & Telecommunications Software and Knowledge Engineering Lab, 15310, Aghia Paraskevi, Greece
Constantine D. Spyropoulos


Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Evgeniou, T., Pontil, M. (2002). Support Vector Machines with Clustering for Training with Very Large Datasets. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_31


DOI: https://doi.org/10.1007/3-540-46014-4_31
Published: 19 March 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43472-6
Online ISBN: 978-3-540-46014-5
eBook Packages: Springer Book Archive
