B-VAE: a new dataset balancing approach using batched Variational AutoEncoders to enhance network intrusion detection

The Journal of Supercomputing

Abstract

Data imbalance in network intrusion detection datasets tends to cause underfitting or bias in classifier training. This investigation applies Batched Variational AutoEncoders (B-VAE) to build a data generation model that can balance intrusion detection datasets and thereby improve detection. To remedy the insufficient decoder training of the standard VAE approach, B-VAE trains one decoder for each piece of data on a batch of duplicates of that data, forming multiple batched VAEs that each provide sufficient decoder training. As a result, the generated data are all similar to, but different from, the original data, which secures the data balance needed for better classifier training and classification results. An experimental comparison with related balancing approaches shows that B-VAE outperforms the others in that it maintains the same classification accuracy (in terms of F1-scores) regardless of changes in the Imbalance Ratio (IR). Specifically, B-VAE solves the problem of insufficient decoder training in existing approaches and so enhances intrusion detection performance, mainly because sufficient decoder training and the use of exact features secure balanced data generation and lift classification accuracy.
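The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the IR definition (majority count divided by minority count), the function names, and the toy data are all assumptions made for illustration, and the per-sample VAE training that B-VAE performs on each duplicated batch is deliberately left out.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.

    Assumed definition; the abstract does not spell out how IR is computed.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def duplicated_batches(samples, batch_size):
    """Yield one batch per sample, each holding `batch_size` copies of it.

    In B-VAE each such batch would train its own VAE; that training step
    is not shown here.
    """
    for sample in samples:
        yield [list(sample) for _ in range(batch_size)]


def f1_score(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 8 normal records and 2 attack records.
labels = ["normal"] * 8 + ["attack"] * 2
minority = [[0.1, 0.9], [0.2, 0.8]]

ratio = imbalance_ratio(labels)
batches = list(duplicated_batches(minority, batch_size=4))
```

Here each minority record yields its own batch of identical copies, which mirrors the "one decoder for each piece of data" idea at the data-preparation level only.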

Data availability

All of the material is owned by the authors and/or no permissions are required.

Funding

No funding.

Author information

Contributions

P-JC and P-YH wrote the main manuscript text, prepared all the figures and reviewed the manuscript.

Corresponding author

Correspondence to Po-Jen Chuang.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical Approval

Ethical committees, Internal Review Boards and guidelines followed must be named. When applicable, additional headings with statements on consent to participate and consent to publish are also required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Chuang, PJ., Huang, PY. B-VAE: a new dataset balancing approach using batched Variational AutoEncoders to enhance network intrusion detection. J Supercomput 79, 13262–13286 (2023). https://doi.org/10.1007/s11227-023-05171-w

  • DOI: https://doi.org/10.1007/s11227-023-05171-w
