Abstract
Network intrusion detection is an important tool for identifying and managing potential threats, but it faces several challenges. Redundant features and the difficulty of labeling data are long-standing problems in network traffic detection. In this paper, we propose a semi-supervised machine learning framework based on multi-strategy feature filtering, principal component analysis (PCA), and an improved Tri-Light Gradient Boosting Machine (Tri-LightGBM) based on stratified sampling. The multi-strategy feature filtering method uses the Fisher score and information gain to select features that discriminate well between categories and are more relevant to the category labels. We then apply PCA to transform the selected features into composite features, which serve as the input to the Tri-LightGBM model. Tri-LightGBM can exploit unlabeled data cooperatively and maintain a large disagreement among the base learners. Moreover, we propose stratified sampling based on the labeled categories to reduce the probability that samples of the same category are selected during the model update process. Thus, Tri-LightGBM with stratified sampling can compensate for the classification errors caused by dataset imbalance. The framework is evaluated on two intrusion detection datasets, UNSW-NB15 and CIC-IDS-2017. The results show that the multi-strategy feature filtering method can increase accuracy, recall, precision, and F-measure by up to 0.5% and reduce the false-positive rate by up to 0.5%. Furthermore, the precision on minority categories can be increased by about 1–2%.
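As a rough illustration of the feature filtering step, the sketch below scores each feature by Fisher score and by information gain and keeps those ranked highly by both. It is not the paper's implementation: the abstract does not state how the two criteria are combined, so the intersection rule, the equal-width binning used for information gain, the function names, and the toy data are all assumptions made for this example. PCA and Tri-LightGBM are not covered.

```python
import math
from collections import Counter, defaultdict


def fisher_score(values, labels):
    """Between-class scatter of one feature divided by its within-class scatter."""
    overall_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[y].append(v)
    between = within = 0.0
    for members in groups.values():
        mean = sum(members) / len(members)
        between += len(members) * (mean - overall_mean) ** 2
        within += sum((v - mean) ** 2 for v in members)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, bins=4):
    """Label entropy minus conditional entropy after equal-width binning (assumed)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant feature: fall back to a single bin
    buckets = defaultdict(list)
    for v, y in zip(values, labels):
        buckets[min(int((v - lo) / width), bins - 1)].append(y)
    conditional = sum(len(b) / len(labels) * entropy(b) for b in buckets.values())
    return entropy(labels) - conditional


def select_features(columns, labels, top_k):
    """Keep feature indices in the top_k of both rankings (assumed combination rule)."""
    def top(score_fn):
        ranked = sorted(range(len(columns)),
                        key=lambda j: score_fn(columns[j], labels), reverse=True)
        return set(ranked[:top_k])
    return sorted(top(fisher_score) & top(information_gain))


# Toy data (hypothetical): three feature columns, two classes.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.1, 0.3, 0.9, 1.0, 0.8, 1.1],  # separates the classes well
    [5, 1, 4, 2, 5, 1, 4, 2],                  # same distribution in both classes
    [0.0, 0.1, 0.2, 0.3, 0.2, 0.3, 0.4, 0.5],  # weakly discriminative
]
print(select_features(columns, labels, top_k=2))  # prints [0, 2]
```

On this toy data the second column scores zero on both criteria and is filtered out; on real traffic features the two rankings can disagree, which is where the choice of combination rule matters.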








Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grants U1804263 and 61877010, the Natural Science Foundation of Fujian Province, China, under Grants 2021J01616, 2020J01130167, and 2021J01625, and the Joint Straits Fund of Key Program of the National Natural Science Foundation of China under Grant U1705262.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, J., Zhang, H., Liu, Y. et al. Semi-supervised machine learning framework for network intrusion detection. J Supercomput 78, 13122–13144 (2022). https://doi.org/10.1007/s11227-022-04390-x
DOI: https://doi.org/10.1007/s11227-022-04390-x
Profiles
- Hao Zhang
- Zhihuang Liu