Abstract
We present a method for training Support Vector Machine (SVM) classifiers on very large datasets. We describe a clustering algorithm for preprocessing standard training data and show how SVMs can be extended in a simple way to handle clustered data, which effectively amounts to training on a set of weighted examples. The algorithm computes large clusters for points far from the decision boundary and small clusters for points near it. As a result, an SVM trained on the preprocessed, clustered dataset finds nearly the same decision boundary while the computation time decreases significantly. When the input dimensionality is not large, for example on the order of ten, the clustering algorithm can significantly reduce the effective number of training examples, which is useful for training SVMs on large datasets. Preliminary experimental results indicate the benefits of our approach.
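The idea of training on weighted cluster representatives can be illustrated with a minimal sketch. The abstract does not specify the clustering procedure or the solver, so everything below is an assumption made for illustration: the hypothetical `cluster_by_grid` uses uniform grid cells (the paper's algorithm instead adapts cluster size to the distance from the decision boundary), the hypothetical `train_weighted_linear_svm` is a plain subgradient method for a linear SVM whose hinge loss is weighted by cluster size, and the data are synthetic.

```python
import random

def cluster_by_grid(points, labels, cell=1.0):
    """Group same-label points that share a grid cell.

    Returns (center, label, weight) triples, where weight is the cluster size.
    Uniform cells are a simplification assumed here, not the paper's method.
    """
    bins = {}
    for x, y in zip(points, labels):
        key = (y, tuple(int(v // cell) for v in x))
        bins.setdefault(key, []).append(x)
    clustered = []
    for (y, _cell_index), members in bins.items():
        n = len(members)
        center = [sum(col) / n for col in zip(*members)]
        clustered.append((center, y, n))
    return clustered

def train_weighted_linear_svm(examples, C=10.0, epochs=500, lr=0.01):
    """Subgradient descent on
    0.5*||w||^2 + (C/N) * sum_i n_i * max(0, 1 - y_i*(w.x_i + b)),
    where n_i is the weight of example i and N is the sum of all weights.
    """
    dim = len(examples[0][0])
    total = sum(n for _, _, n in examples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regulariser is w itself
        for x, y, n in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score < 1.0:  # margin violated: weighted hinge term is active
                scale = C * n / total
                for j in range(dim):
                    grad_w[j] -= scale * y * x[j]
                grad_b -= scale * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Synthetic two-class data (an assumption for the demo): two Gaussian blobs.
rng = random.Random(0)
points, labels = [], []
for label, mean in ((1, 2.0), (-1, -2.0)):
    for _ in range(200):
        points.append([rng.gauss(mean, 0.5), rng.gauss(mean, 0.5)])
        labels.append(label)

clustered = cluster_by_grid(points, labels, cell=1.0)
w, b = train_weighted_linear_svm(clustered)
accuracy = sum(predict(w, b, x) == y for x, y in zip(points, labels)) / len(points)
print(len(points), "points ->", len(clustered), "weighted examples; accuracy", accuracy)
```

The classifier is trained only on the cluster centers but evaluated on the original points, which mirrors the claim that the clustered data can yield nearly the same decision boundary at lower cost. With a library SVM that accepts per-example weights, the cluster sizes could presumably be passed as those weights instead of using the hand-written solver.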
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Evgeniou, T., Pontil, M. (2002). Support Vector Machines with Clustering for Training with Very Large Datasets. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_31
DOI: https://doi.org/10.1007/3-540-46014-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43472-6
Online ISBN: 978-3-540-46014-5
eBook Packages: Springer Book Archive