Abstract
The training time of a support vector machine (SVM) depends largely on the size of the training set, which makes SVMs impractical for large data sets. This paper presents a new method that reduces the training set size by combining two complementary algorithms. First, the training data are partitioned into pairwise disjoint clusters with the k-means clustering algorithm. The cluster representatives are then edited with the Gabriel graph algorithm, which lets us approximately identify support vectors and non-support vectors. Marginal boundary clusters, represented by support vectors, are de-clustered, and internal clusters, represented by non-support vectors, are deleted. This significantly reduces the number of training points and thereby speeds up training. The proposed method was tested on both artificial and real data. Experimental results show that training on the edited set obtained from k-means clustering and the Gabriel graph algorithm, instead of the full training set, significantly reduces SVM training time while classification accuracy remains nearly undegraded.
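The reduction pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes k-means is run separately within each class so that every cluster carries a single label, it uses the cluster centroid as the representative, and it treats a cluster as a "boundary" cluster when its centroid has a Gabriel neighbour of the opposite class. The function names (`kmeans`, `gabriel_edges`, `reduce_training_set`) and the parameter `k_per_class` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return centroids, assign


def gabriel_edges(pts):
    """Pairs (i, j) such that no third point lies strictly inside the
    ball whose diameter is the segment from pts[i] to pts[j]."""
    n = len(pts)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            dij = sq_dist(pts[i], pts[j])
            if all(sq_dist(pts[i], pts[m]) + sq_dist(pts[j], pts[m]) >= dij
                   for m in range(n) if m != i and m != j):
                edges.append((i, j))
    return edges


def reduce_training_set(X, y, k_per_class=3, seed=0):
    """Keep only the points of clusters whose centroid has a Gabriel
    neighbour of a different class (assumed boundary clusters)."""
    reps = []  # one (centroid, label, member points) entry per cluster
    for label in sorted(set(y)):
        pts = [p for p, lab in zip(X, y) if lab == label]
        k = min(k_per_class, len(pts))
        centroids, assign = kmeans(pts, k, seed=seed)
        for j in range(k):
            members = [p for p, a in zip(pts, assign) if a == j]
            if members:
                reps.append((centroids[j], label, members))

    # Gabriel editing on the representatives only.
    boundary = set()
    for i, j in gabriel_edges([r[0] for r in reps]):
        if reps[i][1] != reps[j][1]:
            boundary.update((i, j))

    # De-cluster boundary clusters; internal clusters are dropped.
    X_red, y_red = [], []
    for i in sorted(boundary):
        _, label, members = reps[i]
        X_red.extend(members)
        y_red.extend([label] * len(members))
    return X_red, y_red
```

The reduced pair `(X_red, y_red)` would then be passed to any SVM trainer in place of the full training set. Note that the brute-force Gabriel test above is cubic in the number of representatives, which is tolerable only because it runs on cluster centroids rather than on the raw data.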
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Li, X., Wang, N., Li, SY. (2007). A Fast Training Algorithm for SVM Via Clustering Technique and Gabriel Graph. In: Huang, DS., Heutte, L., Loog, M. (eds) Advanced Intelligent Computing Theories and Applications. With Aspects of Contemporary Intelligent Computing Techniques. ICIC 2007. Communications in Computer and Information Science, vol 2. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74282-1_46
DOI: https://doi.org/10.1007/978-3-540-74282-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74281-4
Online ISBN: 978-3-540-74282-1
eBook Packages: Computer Science (R0)