Training data reduction to speed up SVM training

Abstract

The traditional Support Vector Machine (SVM) solution suffers from \(O(n^2)\) time complexity, which makes it impractical for very large datasets. To reduce this high computational cost, several data reduction methods have been proposed in previous studies. However, such methods are not effective at extracting informative patterns. In this paper, a two-stage informative pattern extraction approach is proposed. The first stage of our approach is data cleaning based on bootstrap sampling: a bundle of weak SVM classifiers is constructed on the sampled datasets, and training samples correctly classified by all the weak classifiers are removed, since they carry little useful information for training. To extract still more informative training data, two informative pattern extraction algorithms are proposed in the second stage. As most training data are eliminated and only the more informative samples remain, the final SVM training time is reduced significantly. The contributions of this paper are three-fold. (1) First, a parallelized bootstrap-sampling-based method is proposed to clean the initial training data; in this way, a large number of training samples carrying little information are eliminated. (2) Second, we present two algorithms that effectively extract the more informative training data. Both algorithms select samples with maximum information entropy according to the empirical misclassification probability of each sample estimated in the first stage, so the training data, and hence the training time, are reduced further. (3) Finally, empirical studies on four large datasets show the effectiveness of our approach in reducing the training data size and the computational cost, compared with state-of-the-art algorithms including PEGASOS, LIBLINEAR SVM, and RSVM, while the generalization performance of our approach remains comparable with these baseline methods.
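As a rough illustration of the two-stage procedure sketched above, the following Python snippet (using scikit-learn's LinearSVC) shows one way the idea could be realized. The function name, the number of weak classifiers, the sampling fraction, and the fraction of samples kept are illustrative assumptions rather than the authors' settings, and the entropy-based second stage is simplified to keeping the samples whose empirical misclassification probability is closest to 0.5.

```python
# Minimal sketch of the two-stage reduction idea described in the abstract.
# All hyperparameters and names here are illustrative assumptions.
import numpy as np
from sklearn.svm import LinearSVC

def reduce_training_data(X, y, n_weak=10, sample_frac=0.1, keep_frac=0.2, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Stage 1: train a bundle of weak SVMs on small bootstrap samples and
    # count how often each training point is misclassified.
    miss = np.zeros(n)
    for _ in range(n_weak):
        idx = rng.choice(n, size=max(1, int(sample_frac * n)), replace=True)
        weak = LinearSVC().fit(X[idx], y[idx])
        miss += (weak.predict(X) != y)

    p_err = miss / n_weak  # empirical misclassification probability per sample

    # Points classified correctly by every weak classifier carry little
    # information for training and are cleaned away.
    remaining = np.flatnonzero(p_err > 0)

    # Stage 2 (simplified): among the remaining points, keep those whose error
    # probability is closest to 0.5, i.e. with maximum entropy of the
    # Bernoulli misclassification estimate.
    order = np.argsort(np.abs(p_err[remaining] - 0.5))
    keep = remaining[order[: max(1, int(keep_frac * n))]]
    return X[keep], y[keep]

# Usage: train the final SVM on the reduced data instead of the full set.
# X_red, y_red = reduce_training_data(X_train, y_train)
# model = LinearSVC().fit(X_red, y_red)
```

The paper itself parallelizes the stage-one weak classifiers and proposes two distinct entropy-based extraction algorithms; the sketch collapses these into a single nearest-to-0.5 ranking for brevity.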

Notes

  1. d is a bound on the number of non-zero features per example in the dataset, and λ is the regularization parameter of the SVM.

  2. The \(\widetilde {O}(\cdot)\) notation hides logarithmic factors.
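  For context, these symbols most likely refer to the PEGASOS runtime bound [20], which for a linear kernel is \(\widetilde{O}(d/(\lambda\epsilon))\) for a solution of optimization accuracy \(\epsilon\).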

References

  1. Wang SZ, Li ZJ, Chao WH, Cao QH (2012) Applying adaptive over-sampling technique based on data density and cost-sensitive SVM to imbalanced learning. In: Proceedings of IJCNN

  2. Cao YB, Xu J, Liu TY, Li H, Huang YL, Hon HW (2006) Adapting ranking SVM to document retrieval. In: Proceedings of SIGIR, pp 186–193

  3. Hasan MA, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: SIAM workshop on link analysis, counter-terrorism and security

  4. Burges C (1999) Geometry and invariance in kernel based methods. In: Advances in kernel methods: support vector learning. MIT Press, Cambridge

  5. Panda N, Chang EY, Wu G (2006) Concept boundary detection for speeding up SVMs. In: Proceedings of ICML, pp 681–688

  6. Graf HP, Cosatto E, Bottou L, Durdanovic I, Vapnik V (2006) Parallel support vector machines: the cascade SVM. In: Advances in neural information processing systems, vol 17. MIT Press, Cambridge, pp 521–528

  7. Lawrence ND, Seeger M, Herbrich R (2003) Fast sparse Gaussian process methods: the informative vector machine. In: Advances in neural information processing systems. MIT Press, Cambridge

  8. Yu H, Yang J, Han J (2003) Classifying large datasets using SVM with hierarchical clusters. In: Proceedings of KDD

  9. Vapnik V (1998) Statistical learning theory. Wiley, New York

  10. Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208

  11. Joachims T (1999) Making large-scale support vector machine learning practical. In: Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 169–184

  12. Kao WC, Chung KM, Sun CL, Lin CJ (2004) Decomposition methods for linear support vector machines. Neural Comput 16(8):1689–1704

  13. Tsang IW, Kwok JT, Cheung PM (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392

  14. Lee YJ, Mangasarian OL (2001) RSVM: reduced support vector machines. In: Proceedings of SDM

  15. Fine S, Scheinberg K (2001) Efficient SVM training using low-rank kernel representations. J Mach Learn Res 2:243–264

  16. Shalev-Shwartz S, Srebro N (2008) SVM optimization: inverse dependence on training set size. In: Proceedings of ICML

  17. Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of KDD

  18. Smola A, Vishwanathan S, Le Q (2008) Bundle methods for machine learning. In: Advances in neural information processing systems

  19. Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874

  20. Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-GrAdient solver for SVM. In: Proceedings of ICML

  21. Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482

  22. Guyon I, Matic N, Vapnik V (1994) Discovering informative patterns and data cleaning. In: Proceedings of AAAI workshop on knowledge discovery in databases

  23. MacKay D (1992) Information-based objective functions for active data selection. Neural Comput 4(4):590–604

  24. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27

  25. Chang CC, Lin CJ (2001) IJCNN 2001 challenge: generalization ability and text decoding. In: Proceedings of IJCNN

  26. Smits GF, Jordan EM (2002) Improved SVM regression using mixtures of kernels. In: Proceedings of IJCNN

  27. Kumar A, Ghosh SK, Dadhwal VK (2006) Study of mixed kernel effect on classification accuracy using density estimation. In: Mid-term ISPRS symposium, ITC

  28. Shi YH, Gao Y, Wang RL, Zhang Y, Wang D (2013) Transductive cost-sensitive lung cancer image classification. Appl Intell 38(1):16–28

  29. Collobert R, Bengio S, Bengio Y (2002) A parallel mixture of SVMs for very large scale problems. Neural Comput 14:1105–1114

  30. Wang CW, You WH (2013) Boosting-SVM: effective learning with reduced data dimension. Appl Intell 39(3):465–474

  31. Idris A, Khan A, Lee YS (2013) Intelligent churn prediction in Telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39(3):659–672

  32. Maudes J, Diez JJR, Osorio CG, Pardo C (2011) Random projections for linear SVM ensembles. Appl Intell 34(3):347–359


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61170189, 61370126, 61202239), the Research Fund for the Doctoral Program of Higher Education (Grant No. 20111102130003), and the Fund of the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2013ZX-19).

Author information

Corresponding authors

Correspondence to Zhoujun Li or Chunyang Liu.

About this article

Cite this article

Wang, S., Li, Z., Liu, C. et al. Training data reduction to speed up SVM training. Appl Intell 41, 405–420 (2014). https://doi.org/10.1007/s10489-014-0524-2
