
CBCH (clustering-based convex hull) for reducing training time of support vector machine


Abstract

Support vector machine (SVM) is an efficient machine learning technique widely applied to various classification problems due to its robustness. However, its training time grows dramatically as the number of training samples increases, which limits the applicability of SVM to large-scale datasets. In SVM, only a few training samples, called support vectors (SVs), determine the separating hyperplane. Therefore, removing training data irrelevant to the SVs does not degrade the performance of SVM. In this paper the clustering-based convex hull (CBCH) scheme is introduced, which efficiently removes insignificant data and thereby reduces the training time of SVM. The CBCH scheme first applies the k-means clustering algorithm to the training data points, and then the convex hull of each cluster is obtained. Only the vertices of the convex hulls and the data points relevant to the SVs are retained as training data. Computer simulation over datasets of various sizes and types reveals that the proposed scheme is considerably faster and more accurate than existing SVM classifiers. The proposed algorithm is based on a geometric interpretation of the SVM and is applicable to both linearly separable and linearly inseparable datasets.
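The data-reduction idea summarized above can be illustrated with a short sketch. The Python code below is a minimal, hedged example under stated assumptions (NumPy, SciPy, and scikit-learn available); the function name reduce_with_cbch, the per-class clustering loop, and the parameter n_clusters are illustrative choices, not the authors' implementation, and the additional step of retaining points relevant to the SVs is omitted.

```python
# Sketch of the reduction step: cluster each class with k-means, keep only the
# convex-hull vertices of every cluster, then train a standard SVM on the
# reduced set. Names and defaults here are illustrative assumptions.
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import KMeans
from sklearn.svm import SVC


def reduce_with_cbch(X, y, n_clusters=10):
    """X: (n_samples, n_features) array, y: (n_samples,) labels.
    Returns the convex-hull vertices of each per-class k-means cluster."""
    keep = []
    for label in np.unique(y):
        Xc = X[y == label]
        km = KMeans(n_clusters=min(n_clusters, len(Xc)), n_init=10).fit(Xc)
        for c in range(km.n_clusters):
            pts = Xc[km.labels_ == c]
            if len(pts) <= X.shape[1] + 1:
                # Too few points to form a hull; keep the whole cluster.
                keep.append((pts, np.full(len(pts), label)))
                continue
            try:
                hull = ConvexHull(pts)          # QuickHull via SciPy
                pts = pts[hull.vertices]        # keep only the hull vertices
            except Exception:
                pass                            # degenerate cluster: keep all points
            keep.append((pts, np.full(len(pts), label)))
    X_red = np.vstack([p for p, _ in keep])
    y_red = np.concatenate([l for _, l in keep])
    return X_red, y_red


# Usage: reduce the training set, then train an ordinary SVM on it.
# X_red, y_red = reduce_with_cbch(X_train, y_train)
# clf = SVC(kernel="rbf").fit(X_red, y_red)
```

Note that on linearly inseparable data the hull vertices alone may discard points near the decision boundary; the paper additionally keeps the data points relevant to the SVs, which this sketch does not model.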




Acknowledgements

This work was partly supported by the Institute for Information and communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2016-0-00133, Research on Edge computing via collective intelligence of hyperconnection IoT nodes); by the National Program for Excellence in SW supervised by the IITP (2015-0-00914); by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2016R1A6A3A11931385, Research of key technologies based on software defined wireless sensor network for real-time public safety service; 2017R1A2B2009095, Research on SDN-based WSN Supporting Real-time Stream Data Processing and Multiconnectivity); by the second Brain Korea 21 PLUS Project; and by Samsung Electronics.

Author information


Corresponding author

Correspondence to Hee Yong Youn.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Birzhandi, P., Youn, H.Y. CBCH (clustering-based convex hull) for reducing training time of support vector machine. J Supercomput 75, 5261–5279 (2019). https://doi.org/10.1007/s11227-019-02795-9

