Multiresolution hierarchical support vector machine for classification of large datasets

  • Regular Paper
Knowledge and Information Systems

Abstract

The support vector machine (SVM) is a popular supervised learning algorithm based on margin maximization. Its training cost is high, and it does not scale well to large numbers of data points. We propose a multiresolution algorithm, MRH-SVM, that trains an SVM on a hierarchical data aggregation structure, which also serves as a common data input to other learning algorithms. The algorithm learns SVM models from high-level data aggregates and visits aggregates at more detailed levels only where support vectors reside. Beyond improved performance, the algorithm can handle data streams and datasets with imbalanced classes. Experimental results show significant performance improvements over existing SVM algorithms.
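The coarse-to-fine idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the uniform grid used for aggregation, the weighted linear subgradient solver, the tolerance `tol`, and every function name below are assumptions chosen to keep the example self-contained. Each aggregate is represented by its centroid and weighted by its point count, and only aggregates whose centroid lies on or near the margin are replaced by finer ones before retraining.

```python
import random
from collections import defaultdict

def aggregate(points, size):
    """Group (x, y, label) points into grid cells of width `size`, per class."""
    cells = defaultdict(list)
    for x, y, label in points:
        cells[(int(x // size), int(y // size), label)].append((x, y, label))
    return list(cells.values())

def summarize(members):
    """Return (centroid_x, centroid_y, label, count) for one aggregate."""
    n = len(members)
    cx = sum(p[0] for p in members) / n
    cy = sum(p[1] for p in members) / n
    return cx, cy, members[0][2], n

def train_linear_svm(samples, lam=0.1, epochs=500):
    """Count-weighted hinge-loss linear SVM, full-batch subgradient descent.

    A stand-in solver (assumption), with step size 1 / (lam * t).
    """
    w0 = w1 = b = 0.0
    total = float(sum(s[3] for s in samples))
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        g0, g1, gb = lam * w0, lam * w1, 0.0
        for x, y, label, count in samples:
            if label * (w0 * x + w1 * y + b) < 1.0:  # margin violated
                weight = count / total
                g0 -= weight * label * x
                g1 -= weight * label * y
                gb -= weight * label
        w0, w1, b = w0 - eta * g0, w1 - eta * g1, b - eta * gb
    return w0, w1, b

def mrh_svm_sketch(points, sizes=(4.0, 2.0, 1.0), tol=0.25):
    """Train on coarse aggregates, then refine only near-margin aggregates."""
    groups = aggregate(points, sizes[0])
    model = train_linear_svm([summarize(g) for g in groups])
    for size in sizes[1:]:
        w0, w1, b = model
        refined = []
        for g in groups:
            cx, cy, label, _ = summarize(g)
            if label * (w0 * cx + w1 * cy + b) <= 1.0 + tol:
                refined.extend(aggregate(g, size))  # support-vector candidate
            else:
                refined.append(g)  # far from the boundary: keep coarse
        groups = refined
        model = train_linear_svm([summarize(g) for g in groups])
    return model, groups

# Synthetic two-class data (illustrative only, not from the paper).
random.seed(0)
points = [(random.gauss(2, 0.7), random.gauss(2, 0.7), 1) for _ in range(200)]
points += [(random.gauss(-2, 0.7), random.gauss(-2, 0.7), -1) for _ in range(200)]

model, groups = mrh_svm_sketch(points)
w0, w1, b = model
accuracy = sum(1 for x, y, l in points if l * (w0 * x + w1 * y + b) > 0) / len(points)
print(len(groups), "aggregates used for", len(points), "points; accuracy", accuracy)
```

For brevity the sketch keeps the raw points inside each aggregate and re-bins them on demand. A scalable version would presumably precompute the hierarchy once so that refinement never touches the raw data.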

Author information

Corresponding author

Correspondence to Li Yang.

About this article

Cite this article

Alwajidi, S., Yang, L. Multiresolution hierarchical support vector machine for classification of large datasets. Knowl Inf Syst 64, 3447–3462 (2022). https://doi.org/10.1007/s10115-022-01755-9

  • DOI: https://doi.org/10.1007/s10115-022-01755-9
