Support Vector Machines with Clustering for Training with Very Large Datasets

  • Conference paper
Methods and Applications of Artificial Intelligence (SETN 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2308)

Abstract

We present a method for training Support Vector Machine (SVM) classifiers on very large datasets. A clustering algorithm preprocesses standard training data, and we show how SVMs can be simply extended to handle clustered data, which amounts to training on a set of weighted examples. The algorithm computes large clusters for points far from the decision boundary and small clusters for points near it. As a result, an SVM trained on the preprocessed, clustered dataset finds nearly the same decision boundary while the computation time decreases significantly. When the input dimensionality is not large, for example of the order of ten, the clustering algorithm can significantly reduce the effective number of training examples, which is useful when training SVMs on large datasets. Preliminary experimental results indicate the benefits of our approach.
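The idea of training on weighted cluster representatives can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a fixed-size grid to form same-class clusters (the paper adapts cluster size to the distance from the decision boundary), a linear kernel, and plain subgradient descent on a weighted hinge loss instead of a quadratic programming solver. The names `cluster_by_grid`, `train_weighted_linear_svm`, and the parameter `cell` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def cluster_by_grid(points, labels, cell):
    """Merge same-class points that share a grid cell (assumed scheme, not the paper's).

    Returns cluster centers, their labels, and weights (points per cluster).
    """
    buckets = defaultdict(list)
    for x, y in zip(points, labels):
        key = (y,) + tuple(int(v // cell) for v in x)
        buckets[key].append(x)
    centers, c_labels, weights = [], [], []
    for key, members in buckets.items():
        dim = len(members[0])
        centers.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
        c_labels.append(key[0])
        weights.append(len(members))
    return centers, c_labels, weights


def train_weighted_linear_svm(X, y, weights, C=1.0, epochs=200, lr=0.01):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum_i (n_i/N) * hinge_i."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    total = float(sum(weights))
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the regularizer is w itself
        for xi, yi, ni in zip(X, y, weights):
            margin = yi * (sum(wd * xd for wd, xd in zip(w, xi)) + b)
            if margin < 1.0:  # example violates the margin: weighted hinge subgradient
                scale = C * ni / total
                for d in range(dim):
                    grad_w[d] -= scale * yi * xi[d]
                grad_b -= scale * yi
        w = [wd - lr * gd for wd, gd in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Synthetic two-class data (an assumption for the demo, not data from the paper).
random.seed(0)
points = [[random.gauss(2.0, 0.5), random.gauss(2.0, 0.5)] for _ in range(500)]
points += [[random.gauss(-2.0, 0.5), random.gauss(-2.0, 0.5)] for _ in range(500)]
labels = [1] * 500 + [-1] * 500

# Train on the weighted cluster centers, then evaluate on the original points.
centers, c_labels, weights = cluster_by_grid(points, labels, cell=1.0)
w, b = train_weighted_linear_svm(centers, c_labels, weights)


def predict(x):
    return 1 if sum(wd * xd for wd, xd in zip(w, x)) + b >= 0 else -1


accuracy = sum(predict(x) == y for x, y in zip(points, labels)) / len(points)
print(len(points), "points ->", len(centers), "weighted clusters; accuracy:", accuracy)
```

Each cluster center stands in for all of its members, and its weight scales its contribution to the hinge loss, so the optimizer sees far fewer examples while the objective stays close to the one defined on the full dataset.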

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Evgeniou, T., Pontil, M. (2002). Support Vector Machines with Clustering for Training with Very Large Datasets. In: Vlahavas, I.P., Spyropoulos, C.D. (eds) Methods and Applications of Artificial Intelligence. SETN 2002. Lecture Notes in Computer Science, vol 2308. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46014-4_31

  • DOI: https://doi.org/10.1007/3-540-46014-4_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43472-6

  • Online ISBN: 978-3-540-46014-5

  • eBook Packages: Springer Book Archive
