Abstract
Machine learning is widely used to build intrusion detection models that can detect unknown attack traffic. How to train such a model properly to attain the desired detection performance is an important topic. In contrast to offline learning, online learning is more practical because it can update the model during detection to keep pace with real network traffic. Active learning is an effective way to realize online learning. However, most existing active learning mechanisms proposed for intrusion detection either fail to suit a real online environment or cannot run persistently. This paper presents a new active online learning mechanism that achieves better intrusion detection performance. The mechanism advances related work by bringing lifelong learning into the online environment. It uses the efficient random forest (RF) as the detection model and, at each online stage, updates the model by adding a new tree trained on the new batch of data, in pursuit of lifelong learning. Because only the new batch is trained, the previously trained weights are not updated and past knowledge is preserved. Experiments show that our mechanism yields better overall results than existing mechanisms: it delivers superior training efficiency and detection performance, with the least training time, the best training data quality and a much reduced training data quantity.
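The update rule described in the abstract (add one tree per new batch, leave earlier trees untouched) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name GrowingForest, the depth-1 tree learner (a decision stump standing in for a full tree) and the toy data are all assumptions made for illustration, and the bootstrap sampling and feature subsampling of a real random forest are omitted.

```python
from collections import Counter


def train_stump(samples, labels):
    """Fit a depth-1 tree: choose the (feature, threshold) split with fewest errors."""
    best = None
    for f in range(len(samples[0])):
        for t in sorted({x[f] for x in samples}):
            left = [y for x, y in zip(samples, labels) if x[f] <= t]
            right = [y for x, y in zip(samples, labels) if x[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left)
            errors += sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def stump_predict(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label


class GrowingForest:
    """Hypothetical ensemble that grows by one tree per online stage."""

    def __init__(self):
        self.trees = []

    def update(self, batch_x, batch_y):
        # Train only on the new batch; earlier trees are never modified.
        self.trees.append(train_stump(batch_x, batch_y))

    def predict(self, x):
        # Majority vote over all trees trained so far.
        votes = Counter(stump_predict(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]


forest = GrowingForest()
# Stage 1 (toy data): feature 0 separates benign (0) from attack (1).
forest.update([[1.0, 0.2], [2.0, 0.1], [8.0, 0.3], [9.0, 0.2]], [0, 0, 1, 1])
first_tree = forest.trees[0]
# Stage 2 (toy data): a later batch, where feature 1 is the informative one.
forest.update([[1.5, 0.9], [2.5, 0.8], [1.0, 0.1], [2.0, 0.2]], [1, 1, 0, 0])
```

After the second stage the forest holds two trees, and the tree from the first stage is the same object as before the update, which is the sense in which past knowledge is kept in this sketch.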
Data availability
All of the material is owned by the authors and/or no permissions are required.
Funding
No funding was received.
Author information
Contributions
P-J Chuang and P-Y Huang wrote the main manuscript text, prepared all the figures and reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chuang, PJ., Huang, PY. Enhancing network intrusion detection by lifelong active online learning. J Supercomput 80, 16428–16451 (2024). https://doi.org/10.1007/s11227-024-06070-4
DOI: https://doi.org/10.1007/s11227-024-06070-4