A dual encoder DAE neural network for imbalanced binary classification based on NSGA-III and GAN

Qu, Jiantao; Liu, Feng; Ma, Yuxiang

doi:10.1007/s10044-021-01035-2

Theoretical Advances
Published: 16 October 2021

Volume 25, pages 17–34 (2022)
Pattern Analysis and Applications

Abstract

In real-world datasets, the number of samples per class is often imbalanced, which degrades classifier performance. Deep-learning approaches to imbalanced binary classification have achieved good results and are attracting growing attention. In this study, we present a dual encoder denoising auto-encoder (DAE) neural network based on the non-dominated sorting genetic algorithm (NSGA-III) and a generative adversarial network (GAN) to address the imbalanced binary classification problem. The primary aim of our approach is to increase the separability between the reconstruction errors of minority class latent features and those of majority class latent features. To this end, we first build a dual encoder DAE network to obtain the reconstruction error of the latent features of the training data. Second, when training the network, we introduce a GAN to perform layer-wise training, which improves the training of the model. Third, to increase the separability of the minority class and majority class reconstruction errors, we use NSGA-III to optimize the parameters of the second encoder, which yields a set of non-dominated solutions. Finally, using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS), we select the best solution, that is, the parameter set of the second encoder best suited to distinguishing the minority class from the majority class. Experimental results on benchmark datasets and on a real-world dataset for communication anomaly detection demonstrate the superiority of the proposed approach on the imbalanced binary classification problem.
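
The final selection step can be illustrated with a minimal TOPSIS sketch. It is not the authors' implementation: the abstract does not state the objectives optimized by NSGA-III, so the two objectives used here (a gap between class-wise reconstruction errors to maximize, and a spread to minimize), the equal weights, and the names `topsis` and `pareto_front` are all assumptions chosen for illustration.

```python
import math


def topsis(scores, weights=None, maximize=None):
    """Rank candidate solutions (rows) over objectives (columns) with TOPSIS.

    Returns the index of the best candidate and the closeness coefficient
    of every candidate (higher means closer to the ideal solution).
    """
    n_obj = len(scores[0])
    weights = weights or [1.0 / n_obj] * n_obj  # assumption: equal weights
    maximize = maximize or [True] * n_obj

    # Vector-normalise each objective column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) or 1.0
             for j in range(n_obj)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_obj)]
                for row in scores]

    # Ideal and anti-ideal points, taken column by column.
    columns = list(zip(*weighted))
    ideal = [max(c) if m else min(c) for c, m in zip(columns, maximize)]
    anti = [min(c) if m else max(c) for c, m in zip(columns, maximize)]

    # Closeness coefficient: distance to anti-ideal over total distance.
    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)
        d_neg = math.dist(row, anti)
        total = d_pos + d_neg
        closeness.append(d_neg / total if total else 0.0)

    best = max(range(len(scores)), key=closeness.__getitem__)
    return best, closeness


# Hypothetical non-dominated set: (error gap to maximize, spread to minimize).
pareto_front = [(0.9, 0.5), (0.7, 0.3), (0.4, 0.1)]
best, closeness = topsis(pareto_front, maximize=[True, False])
print(best, [round(c, 3) for c in closeness])
```

In this setting each row would correspond to one parameter set of the second encoder returned by NSGA-III, and the row with the highest closeness coefficient would be the one kept. Different weights or objective definitions can change which candidate is selected.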

Funding

This work was partially supported by the Young Elite Scientist Sponsorship Program by Henan Association for Science and Technology, China under Grant No. 2020HYTP008.

Author information

Authors and Affiliations

School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
Jiantao Qu & Feng Liu
Engineering Research Center of Network Management Technology for High Speed Railway, Ministry of Education, Beijing, 100044, China
Jiantao Qu & Feng Liu
School of Computer and Information Engineering, Henan University, Kaifeng, 475004, China
Jiantao Qu & Yuxiang Ma

Corresponding author

Correspondence to Jiantao Qu.

About this article

Cite this article

Qu, J., Liu, F. & Ma, Y. A dual encoder DAE neural network for imbalanced binary classification based on NSGA-III and GAN. Pattern Anal Applic 25, 17–34 (2022). https://doi.org/10.1007/s10044-021-01035-2

Received: 18 September 2020
Accepted: 23 September 2021
Published: 16 October 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10044-021-01035-2
