Abstract
Support vector data description (SVDD) is a well-known technique for one-class classification problems. However, it incurs high time complexity when handling large-scale datasets. In this paper, we propose a novel approach, named K-Farthest-Neighbor-based Concept Boundary Detection (KFN-CBD), to improve the training efficiency of SVDD. KFN-CBD identifies the examples lying close to the boundary of the target class, and these examples, rather than the entire dataset, are then used to train the classifier. Extensive experiments have shown that KFN-CBD obtains substantial speedup compared to standard SVDD, while maintaining accuracy comparable to that of training on the entire dataset.
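The selection idea behind KFN-CBD, as described in the abstract, is that examples far from the bulk of the target class tend to lie near its boundary. The following is a minimal illustrative sketch of a k-farthest-neighbor boundary score, not the authors' exact algorithm; the scoring rule, the choice of `k`, and the `keep` fraction are all assumptions for demonstration:

```python
import numpy as np

def kfn_boundary_scores(X, k):
    """Score each example by the mean distance to its k farthest
    neighbors; in a compact target class, boundary points tend to
    score higher than interior points (illustrative heuristic)."""
    # pairwise squared Euclidean distances via the expansion
    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative values
    # mean distance to the k farthest neighbors of each point
    farthest = np.sort(np.sqrt(d2), axis=1)[:, -k:]
    return farthest.mean(axis=1)

def select_boundary(X, k=5, keep=0.3):
    """Keep the fraction of examples with the highest KFN scores,
    to be used as the (reduced) SVDD training set."""
    scores = kfn_boundary_scores(X, k)
    n_keep = max(1, int(keep * len(X)))
    idx = np.argsort(scores)[-n_keep:]
    return X[idx], idx
```

On a cluster of points with one distant outlier, the outlier receives the highest score and is retained, while deep interior points are discarded first; a real implementation would also need an efficient neighbor search (e.g. a metric tree) to avoid the quadratic distance matrix.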
Notes
A quadratic programming (QP) problem optimizes a quadratic function of several variables subject to linear constraints on those variables. The optimization problems of SVM and SVDD are QP problems.
Available at http://yann.lecun.com/exdb/mnist/.
Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
Available at http://archive.ics.uci.edu.
Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.
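To make the QP note above concrete, here is a hedged sketch of the SVDD dual for a tiny dataset, solved with a general-purpose constrained optimizer rather than a dedicated QP package. The RBF kernel and the values of `gamma` and `C` are illustrative assumptions, not part of the paper:

```python
import numpy as np
from scipy.optimize import minimize

def svdd_dual(X, C=1.0, gamma=1.0):
    """Solve the SVDD dual QP for a small dataset with an RBF kernel:
        max   sum_i a_i K_ii - sum_ij a_i a_j K_ij
        s.t.  sum_i a_i = 1,  0 <= a_i <= C
    Returns the dual coefficients a (nonzero entries mark the
    support vectors that describe the ball around the target class)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * d2)
    n = len(X)
    # minimize the negated dual objective
    obj = lambda a: a @ K @ a - a @ np.diag(K)
    cons = {"type": "eq", "fun": lambda a: a.sum() - 1.0}
    res = minimize(obj, np.full(n, 1.0 / n),
                   bounds=[(0.0, C)] * n,
                   constraints=cons, method="SLSQP")
    return res.x
```

This quadratic objective with linear constraints is exactly the QP structure the note refers to; solving it for thousands of examples is what makes SVDD training expensive, which motivates shrinking the training set beforehand.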
Acknowledgements
This work was supported by the Natural Science Foundation of China (61070033, 61203280, 61202270), Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014133), Natural Science Foundation of Guangdong Province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, GDUT Overseas Outstanding Doctoral Fund (405120095), Science and Technology Plan Project of Guangzhou City (12C42111607, 201200000031, 2012J5100054), Science and Technology Plan Project of Panyu District, Guangzhou (2012-Z-03-67), Australian Research Council Discovery Grant (DP1096218, DP130102691), and ARC Linkage Grant (LP100200774, LP120100566).
Cite this article
Xiao, Y., Liu, B., Hao, Z. et al. A K-Farthest-Neighbor-based approach for support vector data description. Appl Intell 41, 196–211 (2014). https://doi.org/10.1007/s10489-013-0502-0