A dual encoder DAE neural network for imbalanced binary classification based on NSGA-III and GAN

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

In real-world datasets, the number of samples per class is often imbalanced, which degrades classifier performance. Deep-learning approaches to imbalanced binary classification have achieved good results and are attracting growing attention. In this study, we present a dual-encoder denoising autoencoder (DAE) neural network based on the non-dominated sorting genetic algorithm III (NSGA-III) and a generative adversarial network (GAN) to address the imbalanced binary classification problem. The primary aim of our approach is to increase the separability between the reconstruction errors of minority-class latent features and those of majority-class latent features. To this end, we first build a dual-encoder DAE network to obtain the reconstruction error of the latent features of the training data. Second, when training the neural network, we introduce a GAN to perform layer-wise training, which improves the training of the model. Third, to increase the separability of the minority-class and majority-class reconstruction errors, we use NSGA-III to optimize the parameters of the second encoder, which yields a set of non-dominated solutions. Finally, we apply the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to select the best solution from this set, that is, the parameter set of the second encoder that best distinguishes the minority class from the majority class. Experimental results on benchmark datasets and on a real-world dataset for communication anomaly detection demonstrate the superiority of the proposed approach on imbalanced binary classification problems.
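
The final selection step can be illustrated with a minimal sketch of standard TOPSIS applied to a small set of non-dominated solutions. This is not the authors' implementation: the abstract does not state the objectives, weights, or normalization used, so the function name topsis, the equal weights, the vector normalization, and the two minimized objectives in the example are all assumptions made for illustration.

```python
import numpy as np


def topsis(scores, weights=None, benefit=None):
    """Rank candidate solutions with standard TOPSIS (illustrative sketch).

    scores  : (n_solutions, n_objectives) array of objective values.
    weights : per-objective weights; equal weights are ASSUMED if omitted.
    benefit : per-objective flags, True if larger is better; all objectives
              are ASSUMED to be benefit-type if omitted.
    Returns the index of the best solution and all closeness coefficients.
    """
    scores = np.asarray(scores, dtype=float)
    n, m = scores.shape
    weights = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, dtype=float)
    benefit = np.ones(m, dtype=bool) if benefit is None else np.asarray(benefit, dtype=bool)

    # Vector-normalize each objective column, then apply the weights.
    norms = np.linalg.norm(scores, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for all-zero columns
    v = scores / norms * weights

    # Ideal and anti-ideal points, per objective.
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))

    # Euclidean distance of every solution to both reference points.
    d_plus = np.linalg.norm(v - ideal, axis=1)
    d_minus = np.linalg.norm(v - anti, axis=1)

    # Closeness coefficient in [0, 1]; higher means closer to the ideal.
    denom = d_plus + d_minus
    closeness = np.divide(d_minus, denom, out=np.zeros(n), where=denom > 0)
    return int(np.argmax(closeness)), closeness


# Hypothetical front of three solutions with two minimized objectives.
front = [[0.1, 0.9], [0.3, 0.3], [0.9, 0.1]]
best, closeness = topsis(front, benefit=[False, False])
print(best, closeness)
```

In this toy front the balanced middle solution is selected, because it lies closest to the ideal point and farthest from the anti-ideal point. In the setting described above, each row would correspond to one candidate parameter set of the second encoder returned by NSGA-III.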


Funding

This work was partially supported by the Young Elite Scientist Sponsorship Program by Henan Association for Science and Technology, China under Grant No. 2020HYTP008.

Author information

Corresponding author

Correspondence to Jiantao Qu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Qu, J., Liu, F. & Ma, Y. A dual encoder DAE neural network for imbalanced binary classification based on NSGA-III and GAN. Pattern Anal Applic 25, 17–34 (2022). https://doi.org/10.1007/s10044-021-01035-2

  • DOI: https://doi.org/10.1007/s10044-021-01035-2
