A network anomaly detection algorithm based on semi-supervised learning and adaptive multiclass balancing

Zhang, Hao; Xiao, Zude; Gu, Jason; Liu, Yanhua

doi:10.1007/s11227-023-05474-y

Published: 15 June 2023

Volume 79, pages 20445–20480, (2023)

The Journal of Supercomputing

Hao Zhang^1,2, Zude Xiao^1,2, Jason Gu^3 & Yanhua Liu^1,2

Abstract

With the rapid development of network technology, the Internet has become central to many sectors of society. Because malicious attacks can have unpredictable and severe consequences, the detection of anomalous network traffic has attracted considerable research attention over the past few decades. Network traffic is generated rapidly and in massive volumes, so accurately labeling enough of it to form a training dataset within a short period is difficult. Furthermore, malicious traffic makes up only a small proportion of overall traffic, and the volume of traffic varies significantly across attack types. To address these challenges, this paper presents a novel network anomaly detection algorithm based on semi-supervised learning and adaptive multiclass balancing. Building on the assumption that labeled and unlabeled data follow a consistent distribution, the paper introduces a multiclass split balancing strategy and an adaptive confidence threshold function to tackle multiclass imbalance in traffic data. By leveraging the mutually beneficial relationship between semi-supervised learning and ensemble learning, the paper also presents the collaborative rotation forest algorithm, designed to improve anomaly detection performance when labels are scarce. Comparative experiments on the NSL-KDD, UNSW-NB15, and ToN-IoT datasets show that the proposed algorithm improves precision by 1.5–5.7%, recall by 1.5–5.7%, and F-Measure by 1.4–4.3% compared with state-of-the-art algorithms.
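
The abstract does not define the adaptive confidence threshold function, so the following is only a minimal sketch of the general idea of class-dependent thresholds in pseudo-labeling, not the authors' formulation. The linear scaling rule, the function names `adaptive_thresholds` and `select_pseudo_labels`, the parameters `base` and `floor`, and all sample numbers are assumptions made for illustration.

```python
def adaptive_thresholds(label_counts, base=0.9, floor=0.6):
    """Hypothetical rule (not from the paper): scale each class's confidence
    threshold linearly with its size relative to the largest class, so that
    rare classes need less confidence to receive a pseudo-label."""
    n_max = max(label_counts.values())
    return {
        cls: floor + (base - floor) * (count / n_max)
        for cls, count in label_counts.items()
    }


def select_pseudo_labels(probabilities, thresholds):
    """Return (sample_index, predicted_class) pairs whose top predicted
    probability meets the threshold of the predicted class."""
    selected = []
    for index, class_probs in enumerate(probabilities):
        predicted = max(class_probs, key=class_probs.get)
        if class_probs[predicted] >= thresholds[predicted]:
            selected.append((index, predicted))
    return selected


# Made-up labeled class counts and classifier outputs for unlabeled samples.
counts = {"normal": 1000, "dos": 500, "u2r": 10}
thresholds = adaptive_thresholds(counts)

unlabeled_probs = [
    {"normal": 0.95, "dos": 0.03, "u2r": 0.02},  # confident majority class
    {"normal": 0.85, "dos": 0.10, "u2r": 0.05},  # below the "normal" threshold
    {"normal": 0.20, "dos": 0.10, "u2r": 0.70},  # rare class, lower threshold
    {"normal": 0.20, "dos": 0.70, "u2r": 0.10},  # below the "dos" threshold
]

pseudo = select_pseudo_labels(unlabeled_probs, thresholds)
print(pseudo)  # [(0, 'normal'), (2, 'u2r')]
```

Under this assumed rule, a fixed threshold of 0.9 would have rejected the rare-class sample at index 2, which illustrates why a single global threshold tends to starve minority classes of pseudo-labels.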

Availability of data and materials

Not applicable.

Funding

This work was supported in part by the Fund of the China Scholarship Council, the National Natural Science Foundation of China under Grants U1804263 and 61877010, the Natural Science Foundation of Fujian Province China under Grants 2021J01616, 2020J01130167 and 2021J01625, and the Joint Straits Fund of Key Program of the National Natural Science Foundation of China under Grant U1705262.

Author information

Authors and Affiliations

College of Computer and Data Science, Fuzhou University, Fuzhou, 350116, China
Hao Zhang, Zude Xiao & Yanhua Liu
Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, 350116, China
Hao Zhang, Zude Xiao & Yanhua Liu
Department of Electrical and Computer Engineering, Dalhousie University, Halifax, NS, B3J 1Z1, Canada
Jason Gu

Contributions

HZ contributed to the conception of the study and performed the data analyses. ZX performed the experiment and wrote the main manuscript text. JG contributed significantly to analysis and manuscript preparation. YL helped perform the analysis with constructive discussions. All authors reviewed the manuscript.

Corresponding author

Correspondence to Yanhua Liu.

Ethics declarations

Ethical approval

Applicable for both human and/or animal studies.

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, H., Xiao, Z., Gu, J. et al. A network anomaly detection algorithm based on semi-supervised learning and adaptive multiclass balancing. J Supercomput 79, 20445–20480 (2023). https://doi.org/10.1007/s11227-023-05474-y

Accepted: 01 June 2023
Published: 15 June 2023
Issue Date: December 2023
DOI: https://doi.org/10.1007/s11227-023-05474-y
