Abstract
Traditional Support Vector Machine (SVM) solvers suffer from O(n²) time complexity, which makes them impractical for very large datasets. To reduce this high computational cost, several data reduction methods have been proposed in previous studies. However, such methods are not effective at extracting informative patterns. In this paper, a two-stage informative pattern extraction approach is proposed. The first stage of our approach is data cleaning based on bootstrap sampling: an ensemble of weak SVM classifiers is constructed on the sampled datasets, and training samples correctly classified by all of the weak classifiers are removed, since they carry little useful information for training. To extract still more informative training data, two informative pattern extraction algorithms are proposed in the second stage. As most training data are eliminated and only the more informative samples remain, the final SVM training time is reduced significantly. The contributions of this paper are three-fold. (1) First, a parallelized bootstrap-sampling-based method is proposed to clean the initial training data, eliminating a large number of training samples that carry little information. (2) Second, we present two algorithms to effectively extract more informative training data. Both algorithms select samples with maximum information entropy, computed from the empirical misclassification probability of each sample estimated in the first stage; because the training set is further reduced, training time decreases accordingly. (3) Finally, empirical studies on four large datasets show the effectiveness of our approach in reducing the training data size and the computational cost, compared with state-of-the-art algorithms including PEGASOS, LIBLINEAR SVM and RSVM. Meanwhile, the generalization performance of our approach is comparable with that of the baseline methods.
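The two stages described above can be sketched in code. The following is a minimal, illustrative Python sketch, not the authors' implementation: function names, the use of `LinearSVC` as the weak classifier, and all parameter values (`n_classifiers`, `sample_frac`, `budget`) are assumptions for illustration. Stage 1 trains weak SVMs on bootstrap samples and drops every point that all weak classifiers get right; stage 2 ranks the survivors by the binary entropy of their empirical misclassification probability, which peaks at p = 0.5, i.e. near the decision boundary.

```python
import numpy as np
from sklearn.svm import LinearSVC

def bootstrap_clean(X, y, n_classifiers=10, sample_frac=0.1, rng=None):
    """Stage 1 (sketch): train weak SVMs on bootstrap samples and
    estimate each point's empirical misclassification probability.
    Assumes each bootstrap sample contains both classes."""
    rng = np.random.default_rng(rng)
    n = len(y)
    miscls = np.zeros(n)
    for _ in range(n_classifiers):
        idx = rng.choice(n, size=int(sample_frac * n), replace=True)
        clf = LinearSVC().fit(X[idx], y[idx])
        miscls += (clf.predict(X) != y)          # count ensemble errors
    p = miscls / n_classifiers                   # empirical misclassification prob.
    keep = p > 0                                 # drop points every weak SVM got right
    return keep, p

def entropy_select(p, keep, budget):
    """Stage 2 (sketch): among retained points, keep the `budget` samples
    with the highest binary entropy of p (most informative)."""
    eps = 1e-12                                  # avoid log(0)
    h = -(p * np.log(p + eps) + (1.0 - p) * np.log(1.0 - p + eps))
    cand = np.flatnonzero(keep)
    order = cand[np.argsort(h[cand])[::-1]]      # descending entropy
    return order[:budget]
```

A final (strong) SVM would then be trained only on the indices returned by `entropy_select`, which is where the reported training-time savings come from.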
Notes
d is a bound on the number of non-zero features per example in the dataset, and λ is the regularization parameter of the SVM.
The \(\widetilde {O}(\cdot)\) notation hides logarithmic factors.
References
Wang SZ, Li ZJ, Chao WH, Cao QH (2012) Applying adaptive over-sampling technique based on data density and cost-sensitive SVM to imbalanced learning. In: Proceedings of IJCNN
Cao YB, Xu J, Liu TY, Li H, Huang YL, Hon HW (2006) Adapting ranking SVM to document retrieval. In: Proceedings of SIGIR, pp 186–193
Hasan MA, Chaoji V, Salem S, Zaki M (2006) Link prediction using supervised learning. In: SIAM workshop on link analysis, counter-terrorism and security
Burges C (1999) Geometry and invariance in kernel based methods. In: Advances in kernel methods: support vector learning. MIT Press, Cambridge
Panda N, Chang EY, Wu G (2006) Concept boundary detection for speeding up SVMs. In: Proceedings of ICML, pp 681–688
Graf HP, Cosatto E, Bottou L, Durdanovic I, Vapnik V (2006) Parallel support vector machines: the cascade SVM. In: Advances in neural information processing system, vol 17. MIT Press, Cambridge, pp 521–528
Lawrence ND, Seeger M, Herbrich R (2003) Fast sparse Gaussian process methods: the informative vector machine. In: Advances in neural information processing systems. MIT Press, Cambridge
Yu H, Yang J, Han J (2003) Classifying large datasets using SVM with hierarchical clusters. In: Proceedings of KDD
Vapnik V (1998) Statistical learning theory. Wiley, New York
Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 185–208
Joachims T (1999) Making large-scale support vector machine learning practical. In: Advances in kernel methods—support vector learning. MIT Press, Cambridge, pp 169–184
Kao WC, Chung KM, Sun CL, Lin CJ (2004) Decomposition methods for linear support vector machines. Neural Comput 16(8):1689–1704
Tsang IW, Kwok JT, Cheung PM (2005) Core vector machines: fast SVM training on very large data sets. J Mach Learn Res 6:363–392
Lee YJ, Mangasarian OL (2001) RSVM: reduced support vector machines. In: Proceedings of SDM
Fine S, Scheinberg K (2001) Efficient SVM training using low-rank kernel representations. J Mach Learn Res 2:243–264
Shalev-Shwartz S, Srebro N (2008) SVM optimization: inverse dependence on training set size. In: Proceedings of ICML
Joachims T (2006) Training linear SVMs in linear time. In: Proceedings of KDD
Smola A, Vishwanathan S, Le Q (2008) Bundle methods for machine learning. In: Advances in neural information processing systems
Fan RE, Chang KW, Hsieh CJ, Wang XR, Lin CJ (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
Shalev-Shwartz S, Singer Y, Srebro N (2007) Pegasos: primal estimated sub-gradient solver for SVM. In: Proceedings of ICML
Bartlett PL, Mendelson S (2002) Rademacher and Gaussian complexities: risk bounds and structural results. J Mach Learn Res 3:463–482
Guyon I, Matic N, Vapnik V (1994) Discovering informative patterns and data cleaning. In: Proceedings of AAAI workshop on knowledge discovery in databases
MacKay D (1992) Information-based objective functions for active data selection. Neural Comput 4(4):590–604
Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27
Chang CC, Lin CJ (2001) IJCNN 2001 challenge: generalization ability and text decoding. In: Proceedings of IJCNN
Smits GF, Jordan EM (2002) Improved SVM regression using mixtures of kernels. In: Proceedings of IJCNN
Kumar A, Ghosh SK, Dadhwal VK (2006) Study of mixed kernel effect on classification accuracy using density estimation. In: Mid-term ISPRS symposium, ITC
Shi YH, Gao Y, Wang RL, Zhang Y, Wang D (2013) Transductive cost-sensitive lung cancer image classification. Appl Intell 38(1):16–28
Collobert R, Bengio S, Bengio Y (2002) A parallel mixture of SVMs for very large scale problems. Neural Comput 14:1105–1114
Wang CW, You WH (2013) Boosting-SVM: effective learning with reduced data dimension. Appl Intell 39(3):465–474
Idris A, Khan A, Lee YS (2013) Intelligent churn prediction in Telecom: employing mRMR feature selection and RotBoost based ensemble classification. Appl Intell 39(3):659–672
Maudes J, Diez JJR, Osorio CG, Pardo C (2011) Random projections for linear SVM ensembles. Appl Intell 34(3):347–359
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61170189, 61370126, 61202239), the Research Fund for the Doctoral Program of Higher Education (Grant No. 20111102130003), and the Fund of the State Key Laboratory of Software Development Environment (Grant No. SKLSDE-2013ZX-19).
About this article
Cite this article
Wang, S., Li, Z., Liu, C. et al. Training data reduction to speed up SVM training. Appl Intell 41, 405–420 (2014). https://doi.org/10.1007/s10489-014-0524-2