Semi-supervised machine learning framework for network intrusion detection

The Journal of Supercomputing

Abstract

Network intrusion detection plays an important role in identifying and managing potential threats, but it faces several challenges. In particular, redundant features and the difficulty of labeling traffic data have long been problems in network traffic detection. In this paper, we propose a semi-supervised machine learning framework based on multi-strategy feature filtering, principal component analysis (PCA), and an improved Tri-Light Gradient Boosting Machine (Tri-LightGBM) based on stratified sampling. The multi-strategy feature filtering method, which employs the Fisher score and information gain, selects features that discriminate well between categories and are more relevant to the category labels. PCA then combines the selected features into composite features, which serve as the input to the Tri-LightGBM model. Tri-LightGBM exploits unlabeled data cooperatively while maintaining a large disagreement among its base learners. Moreover, we propose a stratified sampling method based on the labeled categories to reduce the probability that samples of the same category are repeatedly selected during the model update process. As a result, Tri-LightGBM with stratified sampling can compensate for the classification errors caused by class imbalance in the dataset. The framework is evaluated on two intrusion detection datasets, UNSW-NB15 and CIC-IDS-2017. The results show that the multi-strategy feature filtering method increases accuracy, recall, precision, and F-measure by up to 0.5% and reduces the false-positive rate by up to 0.5%. Furthermore, the precision for minority categories increases by about 1–2%.
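The abstract describes the feature filtering and sampling steps only at a high level. The following is a minimal sketch of those two steps under assumptions that are not taken from the paper: the Fisher score is the common ratio of between-class to within-class scatter, information gain is computed after discretizing each feature into equal-width bins, the two criteria are combined by keeping only features that rank in the top k under both (a hypothetical combination rule), and stratified sampling is a per-class bootstrap that preserves each labeled category's count. The function names `fisher_score`, `information_gain`, `multi_strategy_filter`, and `stratified_bootstrap` are hypothetical, and the PCA and LightGBM training steps are not shown.

```python
import math
import random
from collections import Counter, defaultdict


def fisher_score(X, y):
    """Fisher score per feature: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        overall_mean = sum(col) / len(col)
        between, within = 0.0, 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            n_c = len(vals)
            mean_c = sum(vals) / n_c
            var_c = sum((v - mean_c) ** 2 for v in vals) / n_c
            between += n_c * (mean_c - overall_mean) ** 2
            within += n_c * var_c
        # Zero within-class variance: infinite score if class means differ, else 0.
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(X, y, n_bins=10):
    """IG(Y; X_j) = H(Y) - H(Y | X_j), with X_j cut into equal-width bins (assumption)."""
    base = entropy(y)
    gains = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins or 1.0  # constant feature: everything lands in bin 0
        bins = defaultdict(list)
        for v, label in zip(col, y):
            bins[min(int((v - lo) / width), n_bins - 1)].append(label)
        conditional = sum(len(ls) / len(y) * entropy(ls) for ls in bins.values())
        gains.append(base - conditional)
    return gains


def multi_strategy_filter(X, y, k):
    """Hypothetical rule: keep feature indices in the top-k of BOTH rankings."""
    def top_k(scores):
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        return set(ranked[:k])

    return sorted(top_k(fisher_score(X, y)) & top_k(information_gain(X, y)))


def stratified_bootstrap(y, rng):
    """Resample indices with replacement separately within each labeled class,
    so every class keeps its original count in the sample (assumption)."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    sample = []
    for label in sorted(by_class):
        idx = by_class[label]
        sample.extend(rng.choice(idx) for _ in range(len(idx)))
    return sample


if __name__ == "__main__":
    X = [[0.1, 5.0, 1.0], [0.2, 5.1, 0.0], [0.9, 4.9, 1.0], [1.0, 5.0, 0.0]]
    y = [0, 0, 1, 1]
    print(multi_strategy_filter(X, y, k=1))
    print(Counter(y[i] for i in stratified_bootstrap(y, random.Random(0))))
```

In a tri-training loop, each of the three base learners could be trained on its own `stratified_bootstrap` sample, so a minority category cannot disappear from a learner's training set by chance. This is one plausible reading of how stratified sampling limits the imbalance effect described in the abstract, not a statement of the authors' exact procedure.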

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grants U1804263 and 61877010, the Natural Science Foundation of Fujian Province, China, under Grants 2021J01616, 2020J01130167, and 2021J01625, and the Joint Straits Fund of Key Program of the National Natural Science Foundation of China under Grant U1705262.

Author information

Corresponding author

Correspondence to Hao Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

About this article

Cite this article

Li, J., Zhang, H., Liu, Y. et al. Semi-supervised machine learning framework for network intrusion detection. J Supercomput 78, 13122–13144 (2022). https://doi.org/10.1007/s11227-022-04390-x

  • DOI: https://doi.org/10.1007/s11227-022-04390-x
