B-VAE: a new dataset balancing approach using batched Variational AutoEncoders to enhance network intrusion detection

Chuang, Po-Jen; Huang, Pang-Yu

doi:10.1007/s11227-023-05171-w


Published: 22 March 2023

Volume 79, pages 13262–13286 (2023)

The Journal of Supercomputing



Abstract

Data imbalance in network intrusion detection datasets tends to cause underfitting or biased classifier training. This investigation applies Batched Variational AutoEncoders (B-VAE) to build a data generation model that balances intrusion detection datasets and thus improves detection. To remedy insufficient decoder training in the conventional VAE approach, B-VAE trains one decoder for each data sample on a batch of duplicates of that sample, forming multiple batched VAEs that together provide sufficient decoder training. This makes every generated sample similar to, but not identical to, the original data, which secures a well-balanced dataset for better classifier training and classification results. An experimental comparison with related balancing approaches shows that B-VAE outperforms them: it maintains the same classification accuracy (in terms of F1-score) regardless of changes in the Imbalance Ratio (IR). Specifically, B-VAE solves the problem of insufficient decoder training in existing approaches and thereby improves intrusion detection performance, mainly because sufficient decoder training and the use of exact features let it generate balanced data that raise classification accuracy.
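The per-sample, batched-duplicate training idea summarized in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several choices are our own assumptions: the encoder collapses to free parameters (mu, logvar) because each VAE only ever sees one input, the decoder is linear (x_hat = W z + b), the latent size, learning rate, and epoch count are arbitrary, synthetic samples are drawn from each sample's learned posterior, and IR is taken as largest-class size over smallest-class size. The function names `imbalance_ratio`, `train_sample_vae`, `generate`, and `balance_minority` are hypothetical.

```python
import math
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed IR definition: size of the largest class / size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def train_sample_vae(x, latent_dim=2, batch_size=32, epochs=300, lr=0.01, seed=0):
    """Fit one tiny VAE to a batch holding `batch_size` duplicates of one sample x.

    Assumed simplifications (not from the paper): since x is the only input, the
    encoder reduces to free parameters (mu, logvar); the decoder is linear,
    x_hat = W z + b; training is plain gradient descent on the duplicated batch.
    """
    rng = random.Random(seed)
    d, k = len(x), latent_dim
    mu, logvar = [0.0] * k, [0.0] * k
    W = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(d)]
    b = [0.0] * d
    for _ in range(epochs):
        gW = [[0.0] * k for _ in range(d)]
        gb, gmu, glv = [0.0] * d, [0.0] * k, [0.0] * k
        sigma = [math.exp(0.5 * lv) for lv in logvar]
        for _ in range(batch_size):  # each duplicate of x gets its own noise draw
            eps = [rng.gauss(0.0, 1.0) for _ in range(k)]
            z = [mu[j] + sigma[j] * eps[j] for j in range(k)]  # reparameterization trick
            x_hat = [sum(W[i][j] * z[j] for j in range(k)) + b[i] for i in range(d)]
            # gradient of the batch-averaged squared reconstruction error w.r.t. x_hat
            g = [2.0 * (x_hat[i] - x[i]) / batch_size for i in range(d)]
            for i in range(d):
                gb[i] += g[i]
                for j in range(k):
                    gW[i][j] += g[i] * z[j]
            for j in range(k):
                gz = sum(g[i] * W[i][j] for i in range(d))
                gmu[j] += gz
                glv[j] += gz * eps[j] * 0.5 * sigma[j]
        for j in range(k):  # gradient of KL(N(mu, sigma^2) || N(0, I))
            gmu[j] += mu[j]
            glv[j] += 0.5 * (math.exp(logvar[j]) - 1.0)
        for i in range(d):  # gradient-descent step on the decoder
            b[i] -= lr * gb[i]
            for j in range(k):
                W[i][j] -= lr * gW[i][j]
        for j in range(k):  # gradient-descent step on the per-sample posterior
            mu[j] -= lr * gmu[j]
            logvar[j] -= lr * glv[j]
    return mu, logvar, W, b


def generate(model, n, rng):
    """Decode n latent draws z ~ N(mu, sigma^2) taken from one sample's posterior."""
    mu, logvar, W, b = model
    out = []
    for _ in range(n):
        z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
        out.append([sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)])
    return out


def balance_minority(minority, n_majority, seed=0):
    """Train one VAE per minority sample, then cycle through the VAEs until the
    minority class reaches n_majority samples (IR = 1)."""
    models = [train_sample_vae(x, seed=seed + i) for i, x in enumerate(minority)]
    rng = random.Random(seed)
    synthetic = []
    for t in range(n_majority - len(minority)):
        synthetic += generate(models[t % len(models)], 1, rng)
    return minority + synthetic
```

`balance_minority(minority, n_majority)` returns the original minority records followed by synthetic ones, each decoded by the VAE trained on a single original record, so the output stays close to, but not equal to, its source. A practical version would use nonlinear encoder and decoder networks in a deep learning framework; the pure-Python form only keeps the one-decoder-per-sample, duplicated-batch structure visible.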



Data availability

All of the material is owned by the authors and/or no permissions are required.


Funding

No funding.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Tamkang University, Tamsui, New Taipei City, 25137, Taiwan, R. O. C.
Po-Jen Chuang & Pang-Yu Huang


Contributions

P-JC and P-YH wrote the main manuscript text, prepared all the figures and reviewed the manuscript.

Corresponding author

Correspondence to Po-Jen Chuang.

Ethics declarations

Conflict of interests

The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.


Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Chuang, PJ., Huang, PY. B-VAE: a new dataset balancing approach using batched Variational AutoEncoders to enhance network intrusion detection. J Supercomput 79, 13262–13286 (2023). https://doi.org/10.1007/s11227-023-05171-w


Accepted: 07 March 2023
Published: 22 March 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s11227-023-05171-w

