Abstract
Imbalanced data classification aims to counter the biased learning that arises when class distributions differ greatly. Traditional classifiers are designed for balanced data, and their performance declines sharply on imbalanced data. Balancing the majority and minority class samples before classification is therefore a popular strategy for imbalanced learning. Current data-balancing methods mainly comprise oversampling and undersampling. However, existing undersampling methods risk discarding important sample information, while oversampling methods cannot effectively fit the global distribution and tend to generate noise. In recent years, generative adversarial networks (GANs) have shown great potential for fitting real sample distributions. Building on this, this paper proposes VGAN-BL, a model that combines an improved GAN with a biased loss, to address learning under imbalanced conditions. In the improved GAN, a variational autoencoder (VAE) generates latent vectors drawn from the posterior distribution as the input of the GAN, and a Kullback-Leibler (KL) similarity measurement loss is introduced into the generator to improve the quality of the minority samples it generates. In addition, we propose a discriminator-based method for defining a biased loss to improve classifier performance. Experiments on 20 real datasets show that the proposed method significantly improves classification performance compared with other advanced methods. The source code can be found here: https://github.com/universuen/VGAN-BL.
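As a rough illustration of the VAE step described in the abstract, the sketch below draws a latent vector with the standard reparameterization trick and computes the usual closed-form KL term of a diagonal Gaussian posterior against a standard normal prior. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-dimensional latent space, and the encoder outputs are hypothetical, and the abstract does not specify the exact form of the KL loss used in the VGAN-BL generator, which may differ from the term shown here.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


# Hypothetical encoder output for one minority-class sample (made-up numbers).
mu = [0.5, -1.0]
log_var = [0.0, 0.0]  # unit variance in each latent dimension

rng = random.Random(0)  # fixed seed so the draw is repeatable
z = sample_latent(mu, log_var, rng)      # latent vector a generator could take as input
kl = kl_to_standard_normal(mu, log_var)  # with unit variance this is 0.5 * (0.5**2 + 1.0**2)
```

In a full model, a vector such as `z` would replace the plain noise input of the generator, and a KL term of this kind would be added to the training loss with some weighting; both choices are assumptions here rather than details taken from the paper.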
Data availability
The data used in this article are common benchmark datasets for imbalanced learning, which can be found here: http://www.keel.es/.
Acknowledgements
The numerical calculations in this paper were performed on the supercomputing system of the Supercomputing Center of Wuhan University. This research was supported by the National Key R&D Program of China (No. 2018YFC1604000).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Ding, H., Sun, Y., Huang, N. et al. VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss. Neural Comput & Applic 36, 2883–2899 (2024). https://doi.org/10.1007/s00521-023-09180-x
DOI: https://doi.org/10.1007/s00521-023-09180-x