Enhancing network intrusion detection by lifelong active online learning

The Journal of Supercomputing

Abstract

Machine learning has been widely used to build intrusion detection models that can detect unknown attack traffic. How to train such a model properly to attain the desired detection performance is an important topic. Compared with offline learning, online learning is more practical because it can update the model during the detection process to keep up with real network traffic. Active learning is an effective way to realize online learning. However, most existing active learning mechanisms for intrusion detection either fail to fit a real online environment or cannot run persistently. This paper presents a new active online learning mechanism to achieve better intrusion detection performance. The new mechanism advances related work by bringing lifelong learning practice into the online environment. It uses the efficient random forest (RF) as the detection model and, when updating the model at each online stage, adds a new tree trained on the new batch of data, in pursuit of lifelong learning. Because only the new batch of data is trained, the previously trained weights are kept from being updated, which preserves past knowledge. Experimental results show that our mechanism yields better overall results than existing mechanisms: it achieves superior training efficiency and detection performance, with the least training time, the best training data quality and a much reduced training data quantity.
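The update rule described in the abstract (one new tree per online stage, trained only on the new batch, with earlier trees left untouched) can be sketched with scikit-learn's `warm_start` option for `RandomForestClassifier`. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how samples are chosen for labelling, so plain uncertainty sampling is assumed as the active query strategy; the helper names `select_uncertain` and `add_tree_for_batch`, the query budget of 40, the initial 10 trees and the synthetic data from `make_classification` are all hypothetical stand-ins for real labelled traffic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def select_uncertain(model, X_pool, budget):
    """Hypothetical helper. Uncertainty sampling (an ASSUMED query strategy):
    return indices of the `budget` samples whose predicted attack probability
    is closest to 0.5."""
    p_attack = model.predict_proba(X_pool)[:, 1]
    return np.argsort(np.abs(p_attack - 0.5))[:budget]


def add_tree_for_batch(model, X_batch, y_batch):
    """Hypothetical helper. Grow the forest by exactly one tree trained only
    on the new batch. With warm_start=True, scikit-learn keeps the already
    fitted trees as they are and fits only the extra estimator on the data
    passed to fit(). Returns False and leaves the model unchanged if the
    batch lacks a class, because trees trained on different class sets
    cannot be averaged by the forest."""
    if len(np.unique(y_batch)) < 2:
        return False
    model.n_estimators += 1
    model.fit(X_batch, y_batch)
    return True


# Synthetic stand-in for labelled flow features (0 = benign, 1 = attack).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.7], random_state=0)
X_init, y_init = X[:200], y[:200]
stream = np.array_split(np.arange(200, 2000), 6)  # six online stages

model = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
model.fit(X_init, y_init)
first_tree = model.estimators_[0]  # kept to check that old trees survive

labels_used = len(y_init)
for stage_idx in stream:
    X_pool = X[stage_idx]
    query = select_uncertain(model, X_pool, budget=40)
    # Assumed oracle: in practice an analyst would label the queried flows.
    X_batch, y_batch = X_pool[query], y[stage_idx][query]
    labels_used += len(query)
    add_tree_for_batch(model, X_batch, y_batch)

print(model.n_estimators, labels_used)
```

Since each added tree sees only its own batch, earlier trees are never refit, which mirrors the abstract's point about preserving past knowledge; the trade-off in this sketch is that the forest grows by one tree per stage, so a long-running deployment would eventually need some pruning or size limit.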


Figs. 1–19 (not reproduced)


Data availability

All of the material is owned by the authors and/or no permissions are required.


Funding

No funding was received.

Author information

Contributions

P-J Chuang and P-Y Huang wrote the main manuscript text, prepared all the figures, and reviewed the manuscript.

Corresponding author

Correspondence to Po-Jen Chuang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chuang, PJ., Huang, PY. Enhancing network intrusion detection by lifelong active online learning. J Supercomput 80, 16428–16451 (2024). https://doi.org/10.1007/s11227-024-06070-4

  • DOI: https://doi.org/10.1007/s11227-024-06070-4

Keywords