VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss

  • Original Article
  • Neural Computing and Applications

Abstract

Imbalanced data classification aims to address the unfair learning caused by large differences in class distribution. Traditional classifiers are designed for balanced data, and their performance declines sharply on imbalanced data. Balancing the majority-class and minority-class samples before classification is therefore a popular strategy for imbalanced learning. Current data-balancing methods mainly comprise oversampling and undersampling. However, existing undersampling methods risk losing important sample information, while oversampling methods cannot effectively fit the global distribution and tend to generate noise. In recent years, generative adversarial networks (GANs) have shown great potential for fitting real sample distributions. Building on this, this paper proposes VGAN-BL, a model that combines an improved GAN with a biased loss, to address learning under imbalanced conditions. To improve the GAN, a variational autoencoder (VAE) is used to generate latent vectors with a posterior distribution as the input of the GAN, and a KL similarity measurement loss is introduced into the generator to improve the quality of the minority samples that the GAN generates. In addition, we propose a method for defining a biased loss based on the discriminator to improve the performance of the classifier. Experiments on 20 real datasets show that the proposed method significantly improves classification performance compared with other advanced methods. The source code is available at https://github.com/universuen/VGAN-BL.
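The ingredients named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation. The reparameterized sampling and the closed-form KL term are the standard ones for a diagonal-Gaussian VAE posterior measured against a standard normal prior; whether VGAN-BL uses exactly this KL term in its generator loss is an assumption. The abstract also does not give the form of the biased loss, so `biased_bce` and the idea of using discriminator scores as per-sample weights are hypothetical stand-ins, as are all function names.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps elementwise, with eps ~ N(0, 1).

    mu and log_var are the mean and log-variance a VAE encoder would output.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))


def biased_bce(y_true, p_pred, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the total weight.

    ASSUMPTION: the weights stand in for a discriminator-derived bias, e.g.
    1.0 for real samples and the discriminator's "realness" score for
    generated ones. The source does not specify this form.
    """
    total = 0.0
    for y, p, w in zip(y_true, p_pred, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)
```

For example, `reparameterize(mu, log_var, random.Random(0))` draws one latent vector that could be fed to a generator, and `kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])` is zero because that posterior already equals the prior. With all weights set to 1.0, `biased_bce` reduces to ordinary mean binary cross-entropy, so the weighting is the only place where a bias enters.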

Data availability

The datasets used in this article are common benchmarks for imbalanced learning and can be found at http://www.keel.es/.

Acknowledgements

The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University. This research was supported by the National Key R&D Program of China (No. 2018YFC1604000).

Author information

Corresponding author

Correspondence to Xiaohui Cui.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, H., Sun, Y., Huang, N. et al. VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss. Neural Comput & Applic 36, 2883–2899 (2024). https://doi.org/10.1007/s00521-023-09180-x

  • DOI: https://doi.org/10.1007/s00521-023-09180-x
