Semi-supervised machine learning framework for network intrusion detection

Li, Jieling; Zhang, Hao; Liu, Yanhua; Liu, Zhihuang

doi:10.1007/s11227-022-04390-x

Published: 14 March 2022

Volume 78, pages 13122–13144, (2022)

The Journal of Supercomputing

Jieling Li^1,2,
Hao Zhang^1,2 (ORCID: orcid.org/0000-0002-2092-074X),
Yanhua Liu^1,2 &
Zhihuang Liu^1,2

Abstract

Network intrusion detection plays an important role as a tool for identifying and managing potential threats, but it faces several challenges. Redundant features and the difficulty of labeling data have long been problems in network traffic detection. In this paper, we propose a semi-supervised machine learning framework based on multi-strategy feature filtering, principal component analysis (PCA), and an improved Tri-Light Gradient Boosting Machine (Tri-LightGBM) based on stratified sampling. The multi-strategy feature filtering method, which employs the Fisher score and information gain, selects features that discriminate well between categories and are more relevant to the category labels. We then apply PCA to convert multiple features into comprehensive features, which serve as the input to the Tri-LightGBM model. Tri-LightGBM can exploit unlabeled data cooperatively while maintaining a large disagreement among the base learners. Moreover, we propose stratified sampling based on labeled categories to reduce the probability that the samples selected during the model update process all belong to the same category. Thus, the Tri-LightGBM based on stratified sampling can compensate for the classification errors caused by the imbalance of the dataset. The framework is evaluated on two intrusion detection datasets, UNSW-NB15 and CIC-IDS-2017. The evaluation results show that the multi-strategy feature filtering method can increase the accuracy, recall, precision, and F-measure by up to 0.5% and reduce the false-positive rate by up to 0.5%. Furthermore, the precision of minority categories can be increased by about 1–2%.
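
Two of the ideas in the abstract, Fisher-score feature filtering and per-category stratified sampling, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fisher_scores` and `stratified_sample`, the toy data, and the `per_class` cap are assumptions made for illustration, and the information gain, PCA, and Tri-LightGBM stages are omitted. The Fisher score used here is the standard ratio of class-size-weighted between-class spread to within-class variance.

```python
import random
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X (a list of rows)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean([row[j] for row in X])
        between = 0.0  # class-size-weighted spread of the class means
        within = 0.0   # class-size-weighted variance inside each class
        for rows in by_class.values():
            col = [row[j] for row in rows]
            between += len(col) * (fmean(col) - overall_mean) ** 2
            within += len(col) * pvariance(col)
        if within > 0:
            scores.append(between / within)
        else:
            # Constant within every class: perfectly separating or useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def stratified_sample(indices, labels, per_class, seed=0):
    """Pick at most `per_class` indices from each label group (hypothetical rule)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, label in zip(indices, labels):
        groups[label].append(idx)
    chosen = []
    for label in sorted(groups):
        members = groups[label]
        chosen.extend(rng.sample(members, min(per_class, len(members))))
    return chosen


# Toy data: feature 0 separates the two classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 1.0], [0.8, 3.0], [3.0, 4.0], [3.2, 2.0], [2.8, 0.0]]
y = [0, 0, 0, 1, 1, 1]
print("Fisher scores:", fisher_scores(X, y))

# Imbalanced pseudo-labels: eight samples of class 0, two of class 1.
pseudo_labels = [0] * 8 + [1] * 2
print("Stratified pick:", stratified_sample(list(range(10)), pseudo_labels, per_class=2))
```

Capping the draw per category means a majority class cannot fill the whole update batch, which is the intuition behind sampling by labeled category; the paper's actual sampling rule may differ.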

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants U1804263 and 61877010, the Natural Science Foundation of Fujian Province, China, under Grants 2021J01616, 2020J01130167, and 2021J01625, and the Joint Straits Fund of Key Program of the National Natural Science Foundation of China under Grant U1705262.

Author information

Authors and Affiliations

College of Computer and Data Science, Fuzhou University, Fuzhou, 350116, China
Jieling Li, Hao Zhang, Yanhua Liu & Zhihuang Liu
Fujian Key Laboratory of Network Computing and Intelligent Information Processing, Fuzhou University, Fuzhou, 350116, China
Jieling Li, Hao Zhang, Yanhua Liu & Zhihuang Liu

Corresponding author

Correspondence to Hao Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, J., Zhang, H., Liu, Y. et al. Semi-supervised machine learning framework for network intrusion detection. J Supercomput 78, 13122–13144 (2022). https://doi.org/10.1007/s11227-022-04390-x

Accepted: 19 February 2022
Published: 14 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11227-022-04390-x
