VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss

Ding, Hongwei; Sun, Yu; Huang, Nana; Cui, Xiaohui

doi:10.1007/s00521-023-09180-x

VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss

Original Article
Published: 25 November 2023

Volume 36, pages 2883–2899, (2024)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Hongwei Ding^1,2^na1,
Yu Sun^1,3^na1,
Nana Huang² &
…
Xiaohui Cui ORCID: orcid.org/0000-0002-0851-1994^1,2

567 Accesses
Explore all metrics

Abstract

The purpose of imbalanced data classification is to solve the problem of unfair learning caused by the large difference in data distribution. Traditional classifiers are designed on the basis of balanced data, but the performance of imbalanced data will decline sharply. Therefore, balancing the majority class and minority class samples before classification is a popular strategy for solving imbalanced learning. Current methods for data balance mainly include oversampling and undersampling. However, the existing undersampling will face the problem of losing important sample information, while oversampling cannot effectively fit the global distribution and generate noise. In recent years, generative adversarial network (GAN) has shown great potential in fitting real sample distributions. Based on this, this paper proposes an improved GAN and biased loss combined model, namely VGAN-BL, to solve the learning problem under imbalanced conditions. In the improvement based on GAN, VAE is used to generate latent vectors with posterior distribution as the input of GAN, and KL similarity measurement loss is introduced into the generator to improve the quality of minority samples generated by GAN. In addition, we propose a biased loss definition method based on the discriminator to improve the performance of classifier. Experiments on 20 real datasets show that the classification performance of the proposed method is significantly improved compared with other advanced methods. The source code can be found here: https://github.com/universuen/VGAN-BL.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Data Augment in Imbalanced Learning Based on Generative Adversarial Networks

Imbalanced data classification based on diverse sample generation and classifier fusion

Article 12 April 2021

SMOTE oversampling algorithm based on generative adversarial network

Article 25 February 2025

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data availability

The data used in this article are a common dataset for imbalanced learning, which can be found here: http://www.keel.es/.

References

Sáez JA, Luengo J, Stefanowski J, Herrera F (2015) SMOTE-IPF: addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inf Sci 291:184–203
Article Google Scholar
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Article Google Scholar
Vuttipittayamongkol P, Elyan E (2020) Neighbourhood-based undersampling approach for handling imbalanced and overlapped data. Inf Sci 509:47–70
Article Google Scholar
Mirzaei B, Nikpour B, Nezamabadi-pour H (2021) CDBH: a clustering and density-based hybrid approach for imbalanced data classification. Expert Syst Appl 164:114035
Article Google Scholar
Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R (2017) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst 29(8):3573–3587
Google Scholar
Sun Z, Song Q, Zhu X, Sun H, Baowen X, Zhou Y (2015) A novel ensemble method for classifying imbalanced data. Pattern Recogn 48(5):1623–1637
Article Google Scholar
Hao X, Jiang Z, Xiao Q, Wang Q, Yao Y, Liu B, Liu J (2021) Producing more with less: a GAN-based network attack detection approach for imbalanced data. In: 2021 IEEE 24th international conference on computer supported cooperative work in design (CSCWD), IEEE, pp 384–390
Zhang W, Peng P, Zhang H (2021) Using bidirectional GAN with improved training architecture for imbalanced tasks. In: 2021 IEEE 24th international conference on computer supported cooperative work in design (CSCWD), IEEE, pp 714–719
García V, Sánchez JS, Mollineda RA (2012) On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowl-Based Syst 25(1):13–21
Article Google Scholar
Douzas G, Bacao F (2017) Self-organizing map oversampling (SOMO) for imbalanced data set learning. Expert Syst Appl 82:40–52
Article Google Scholar
Zhaozhao X, Shen D, Nie T, Kou Y, Yin N, Han X (2021) A cluster-based oversampling algorithm combining smote and k-means for imbalanced medical data. Inf Sci 572:574–589
Article MathSciNet Google Scholar
Han H, Wang WY, Mao BH (2005) Borderline-smote: a new over-sampling method in imbalanced data sets learning. In: International conference on intelligent computing, Springer, pp 878–887
Douzas G, Bacao F, Last F (2018) Improving imbalanced learning through a heuristic oversampling method based on k-means and smote. Inf Sci 465:1–20
Article Google Scholar
He H, Bai Y, Garcia EA, Li S (2008) ADASYN: adaptive synthetic sampling approach for imbalanced learning. In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), IEEE, pp 1322–1328
Lee D, Kim K (2021) An efficient method to determine sample size in oversampling based on classification complexity for imbalanced data. Expert Syst Appl 184:115442
Article Google Scholar
Koziarski M (2021) Potential anchoring for imbalanced data classification. Pattern recognition, p 108114
Das B, Krishnan NC, Cook DJ (2014) RACOG and wRACOG: two probabilistic oversampling techniques. IEEE Trans Knowl Data Eng 27(1):222–234
Article Google Scholar
Dongdong L, Ziqiu C, Bolu W, Zhe W, Hai Y, Wenli D (2021) Entropy-based hybrid sampling ensemble learning for imbalanced data. Int J Intell Syst
Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans Knowl Data Eng 28(1):238–251
Article Google Scholar
He Y, Lin F, Tzeng NF (2021) Interpretable minority synthesis for imbalanced classification. In: Proceedings of the thirtieth international joint conference on artificial intelligence
Choi J, Yi KM, Kim J, Choo J, Kim B, Chang J, Gwon Y, Chang HJ (2021) Vab-al: incorporating class imbalance and difficulty with variational Bayes for active learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6749–6758
Adiban M, Siniscalchi SM, Salvi G (2023) A step-by-step training method for multi generator GANs with application to anomaly detection and cybersecurity. Neurocomputing 537:296–308
Article Google Scholar
Lan ZC, Huang GY, Li YP, Rho S, Vimal S, Chen BW (2022) Conquering insufficient/imbalanced data learning for the internet of medical things. Neural Computing and Applications, pp 1–10
Zhu B, Pan X, Broucke S, Xiao J (2022) A GAN-based hybrid sampling method for imbalanced customer classification. Inf Sci 609:1397–1411
Article Google Scholar
Son M, Jung S, Jung S, Hwang E (2021) BCGAN: A CGAN-based over-sampling model using the boundary class for data balancing. J Supercomput 1–25
Teng H, Wang C, Yang Q, Chen X, Li R (2023) Leveraging adversarial augmentation on imbalance data for online trading fraud detection. IEEE Trans Comput Soc Syst
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. Adv Neural Inf Process Syst 27
Arjovsky M, Chintala S, Bottou L (2017) Wasserstein generative adversarial networks. In: International conference on machine learning, PMLR, pp 214–223
Gulrajani I, Ahmed F, Arjovsky M, Dumoulin V, Courville AC (2017) Improved training of Wasserstein GANs. Preprint arXiv:1704.00028
Miyato T, Kataoka T, Koyama M, Yoshida Y (2018) Spectral normalization for generative adversarial networks. Preprint arXiv:1802.05957
Kingma DP, Welling M (2013) Auto-encoding variational Bayes. Preprint arXiv:1312.6114
Gu Q, Cai Z, Zhu L, Huang B (2008) Data mining on imbalanced data sets. In: 2008 International conference on advanced computer theory and engineering, IEEE, pp 1020–1024
Moustafa N, Slay J (2015) UNSW-NB15: a comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In: 2015 military communications and information systems conference (MilCIS), IEEE, pp 1–6
Sharafaldin I, Lashkari AH, Ghorbani AA (2018) Toward generating a new intrusion detection dataset and intrusion traffic characterization. ICISSp 1:108–116
Google Scholar

Download references

Acknowledgements

The numerical calculations in this paper have been done on the supercomputing system in the Supercomputing Center of Wuhan University. This research was supported by the National Key R &D Program of China (No.2018YFC1604000).

Author information

Hongwei Ding and Yu Sun contributed equally to this work.

Authors and Affiliations

School of Cyber Science and Engineering, Wuhan University, Wuhan, China
Hongwei Ding, Yu Sun & Xiaohui Cui
Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan, China
Hongwei Ding, Nana Huang & Xiaohui Cui
School of Computing, National University of Singapore, Singapore, Singapore
Yu Sun

Authors

Hongwei Ding
View author publications
You can also search for this author inPubMed Google Scholar
Yu Sun
View author publications
You can also search for this author inPubMed Google Scholar
Nana Huang
View author publications
You can also search for this author inPubMed Google Scholar
Xiaohui Cui
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Xiaohui Cui.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Ding, H., Sun, Y., Huang, N. et al. VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss. Neural Comput & Applic 36, 2883–2899 (2024). https://doi.org/10.1007/s00521-023-09180-x

Download citation

Received: 12 April 2022
Accepted: 20 October 2023
Published: 25 November 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s00521-023-09180-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Data Augment in Imbalanced Learning Based on Generative Adversarial Networks

Imbalanced data classification based on diverse sample generation and classifier fusion

SMOTE oversampling algorithm based on generative adversarial network

Explore related subjects

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now