Multiresolution hierarchical support vector machine for classification of large datasets

Alwajidi, Safaa; Yang, Li

doi:10.1007/s10115-022-01755-9

Regular Paper
Published: 15 September 2022

Volume 64, pages 3447–3462 (2022)

Knowledge and Information Systems

Abstract

Support vector machine (SVM) is a popular supervised learning algorithm based on margin maximization. It has a high training cost and does not scale well to a large number of data points. We propose a multiresolution algorithm, MRH-SVM, that trains SVM on a hierarchical data aggregation structure, which also serves as a common data input to other learning algorithms. The proposed algorithm learns SVM models from high-level data aggregates and visits data aggregates at more detailed levels only where support vectors reside. In addition to performance improvements, the algorithm offers other advantages, such as the ability to handle data streams and datasets with imbalanced classes. Experimental results show significant performance improvements in comparison with existing SVM algorithms.
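The coarse-to-fine idea described in the abstract can be illustrated with a small sketch. This is not the authors' MRH-SVM, and several parts are assumptions: a uniform grid at decreasing cell sizes (`cell_sizes`) stands in for the hierarchical aggregation structure, a weighted linear soft-margin SVM trained by full-batch subgradient descent (`lam`, `epochs`, `lr`) replaces a kernel SVM solver, and an aggregate is treated as likely to contain support vectors when its centroid lies within a widened margin (`tol`). The function names `aggregate`, `train_linear_svm` and `mrh_svm_sketch`, along with the toy data, are hypothetical.

```python
from collections import defaultdict


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def aggregate(points, labels, cell_size):
    """Group points by (grid cell, class label).

    Assumption: a uniform grid of the given cell size stands in for one level
    of the paper's hierarchical aggregation structure.
    Returns a list of (centroid, label, count, members) tuples.
    """
    groups = defaultdict(list)
    for p, y in zip(points, labels):
        cell = tuple(int(c // cell_size) for c in p)
        groups[(cell, y)].append(p)
    result = []
    for (_, y), members in groups.items():
        n = len(members)
        centroid = [sum(m[d] for m in members) / n for d in range(len(members[0]))]
        result.append((centroid, y, n, members))
    return result


def train_linear_svm(X, y, weights, lam=0.1, epochs=500, lr=0.1):
    """Weighted linear soft-margin SVM via full-batch subgradient descent.

    Minimizes lam/2 * ||w||^2
      + sum_i weights[i] * max(0, 1 - y[i] * (w.x[i] + b)) / sum(weights).
    """
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    total = float(sum(weights))
    for t in range(1, epochs + 1):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, weights):
            if yi * (dot(w, xi) + b) < 1.0:  # hinge loss is active for this aggregate
                for j in range(dim):
                    grad_w[j] -= ci * yi * xi[j] / total
                grad_b -= ci * yi / total
        step = lr / t ** 0.5  # decaying step size
        w = [wj - step * gj for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def mrh_svm_sketch(points, labels, cell_sizes=(8.0, 2.0), tol=0.5, **svm_kw):
    """Hypothetical coarse-to-fine training: refine only aggregates near the margin."""
    active = aggregate(points, labels, cell_sizes[0])
    for finer in cell_sizes[1:]:
        w, b = train_linear_svm([g[0] for g in active], [g[1] for g in active],
                                [g[2] for g in active], **svm_kw)
        next_active = []
        for centroid, y, count, members in active:
            if y * (dot(w, centroid) + b) <= 1.0 + tol:
                # Heuristic: support vectors are likely here, so descend one level.
                next_active.extend(aggregate(members, [y] * count, finer))
            else:
                # Far from the boundary: keep the coarse aggregate unchanged.
                next_active.append((centroid, y, count, members))
        active = next_active
    w, b = train_linear_svm([g[0] for g in active], [g[1] for g in active],
                            [g[2] for g in active], **svm_kw)
    return w, b, active


if __name__ == "__main__":
    # Toy data (illustrative only): two mirrored, separable classes of 48 points each.
    pos = [(3.1 + 0.4 * i, -1.9 + 0.5 * j) for i in range(6) for j in range(8)]
    neg = [(-x0, -x1) for x0, x1 in pos]
    points, labels = pos + neg, [1] * len(pos) + [-1] * len(neg)
    w, b, active = mrh_svm_sketch(points, labels)
    correct = sum((dot(w, p) + b > 0) == (y == 1) for p, y in zip(points, labels))
    print(f"{len(active)} aggregates in final training set; {correct}/{len(points)} correct")
```

Each aggregate is weighted by its point count, so a coarse cell far from the decision boundary still contributes its full mass to the loss while costing only one training example; only cells near the boundary are split into finer aggregates.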

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of North Carolina at Pembroke, 1 University Drive, Pembroke, NC, 28372, USA
Safaa Alwajidi
Department of Computer Science, Western Michigan University, 1903 West Michigan Avenue, Kalamazoo, MI, 49008, USA
Li Yang

Corresponding author

Correspondence to Li Yang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Alwajidi, S., Yang, L. Multiresolution hierarchical support vector machine for classification of large datasets. Knowl Inf Syst 64, 3447–3462 (2022). https://doi.org/10.1007/s10115-022-01755-9

Received: 08 April 2021
Revised: 20 August 2022
Accepted: 27 August 2022
Published: 15 September 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10115-022-01755-9
