
A K-Farthest-Neighbor-based approach for support vector data description

Published in: Applied Intelligence

Abstract

Support vector data description (SVDD) is a well-known technique for one-class classification. However, its training incurs high time complexity on large-scale datasets. In this paper, we propose a novel approach, named K-Farthest-Neighbor-based Concept Boundary Detection (KFN-CBD), to improve the training efficiency of SVDD. KFN-CBD identifies the examples lying close to the boundary of the target class, and these examples, rather than the entire dataset, are then used to learn the classifier. Extensive experiments show that KFN-CBD achieves a substantial speedup over standard SVDD while maintaining accuracy comparable to training on the entire dataset.
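The paper's exact algorithm is not reproduced here, but the core idea of the abstract can be sketched as follows: score each point by its mean distance to its K farthest neighbors (boundary points of a roughly convex class tend to score highest), keep only the top-scoring fraction, and train an SVDD-style model on that subset. The function name, parameters, and the use of scikit-learn's `OneClassSVM` (with an RBF kernel, as a stand-in for SVDD) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def kfn_boundary_subset(X, k=5, keep_frac=0.3):
    """Hypothetical KFN-style selection: score each point by the mean
    distance to its k farthest neighbors and keep the top fraction.
    Points near the class boundary tend to receive the highest scores."""
    # Full pairwise Euclidean distance matrix (fine for a small sketch;
    # the paper uses an M-tree index to avoid this quadratic cost).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    kfn = np.sort(d, axis=1)[:, -k:]        # k largest distances per point
    scores = kfn.mean(axis=1)
    n_keep = max(1, int(keep_frac * len(X)))
    return np.argsort(scores)[-n_keep:]     # indices of highest-scoring points

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))               # toy single-class dataset

idx = kfn_boundary_subset(X, k=10, keep_frac=0.25)
print(len(idx))                             # 75 examples instead of 300

# Train the one-class model on the boundary candidates only.
model = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X[idx])
```

The speedup comes from the smaller training set: SVDD training is quadratic-or-worse in the number of examples, so fitting on the retained fraction is much cheaper, while the boundary candidates are exactly the points likely to become support vectors.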


Notes

  1. A quadratic programming (QP) problem optimizes a quadratic function of several variables subject to linear constraints on those variables. The optimization problems of SVM and SVDD are QP problems.

  2. Available at http://yann.lecun.com/exdb/mnist/.

  3. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/.

  4. Available at http://archive.ics.uci.edu.

  5. Available at www.csie.ntu.edu.tw/~cjlin/libsvm/.
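Note 1 above states that the SVDD optimization is a QP. For reference, the dual form of SVDD in Tax and Duin's formulation (kernel $K$, penalty parameter $C$, Lagrange multipliers $\alpha_i$) can be written as:

```latex
\max_{\alpha} \; \sum_i \alpha_i K(\mathbf{x}_i, \mathbf{x}_i)
  \;-\; \sum_{i,j} \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i = 1
```

The objective is quadratic in the multipliers $\alpha_i$ and the constraints are linear, matching the definition in Note 1; examples with $\alpha_i > 0$ are the support vectors describing the data boundary.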


Acknowledgements

This work is supported by Natural Science Foundation of China (61070033, 61203280, 61202270), Guangdong Natural Science Funds for Distinguished Young Scholar (S2013050014133), Natural Science Foundation of Guangdong province (9251009001000005, S2011040004187, S2012040007078), Specialized Research Fund for the Doctoral Program of Higher Education (20124420120004), Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry, GDUT Overseas Outstanding Doctoral Fund (405120095), Science and Technology Plan Project of Guangzhou City (12C42111607, 201200000031, 2012J5100054), Science and Technology Plan Project of Panyu District Guangzhou (2012-Z-03-67), Australian Research Council Discovery Grant (DP1096218, DP130102691), and ARC Linkage Grant (LP100200774, LP120100566).

Author information

Correspondence to Yanshan Xiao.


Cite this article

Xiao, Y., Liu, B., Hao, Z. et al. A K-Farthest-Neighbor-based approach for support vector data description. Appl Intell 41, 196–211 (2014). https://doi.org/10.1007/s10489-013-0502-0
