Abstract
The imbalanced data classification problem arises widely in commercial and industrial applications. It refers to scenarios with a considerable gap in sample counts between classes, which significantly degrades the performance of traditional classification algorithms. Previous approaches have typically focused on resampling and algorithmic adjustment, while neglecting to strengthen feature learning itself. In this study, we propose a novel algorithm for imbalanced data classification from the feature-learning perspective: the Maximum Mean Discrepancy-Encouraging Convolutional Autoencoder (MMD-CAE). The algorithm adopts a two-phase training process with distinct objectives. Cross-entropy loss is employed to measure the reconstruction error of the data, while Maximum Mean Discrepancy (MMD) with an intra-variance constraint is used to stimulate feature discrepancy in the bottleneck layer. By encouraging the maximization of MMD between the two classes' samples, and mapping the original space to a higher-dimensional space via the kernel trick, the learned features form a more effective feature space. The proposed algorithm is tested on ten groups of samples with different imbalance ratios. Performance metrics including recall, F1 score, G-means and AUC verify that the proposed algorithm surpasses existing state-of-the-art methods in this field, with stronger generalization ability. This study could shed new light on related research by constituting a more effective feature space via the proposed MMD-with-intra-variance-constraint method, and by the overall MMD-CAE algorithm for imbalanced data classification.
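The abstract does not give implementation details, but the core idea of maximizing MMD between the two classes' bottleneck codes while keeping each class compact can be illustrated with a rough sketch. The RBF kernel choice, the function names, and the `var_weight` coefficient below are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix between rows of x and y.
    sq_dists = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimator of the squared Maximum Mean Discrepancy between
    # samples x (n, d) and y (m, d) in the kernel-induced feature space.
    k_xx = rbf_kernel(x, x, gamma)
    k_yy = rbf_kernel(y, y, gamma)
    k_xy = rbf_kernel(x, y, gamma)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

def mmd_cae_feature_loss(z_majority, z_minority, gamma=1.0, var_weight=0.1):
    # Hypothetical bottleneck-layer loss term: encourage a large MMD between
    # the two classes' codes (negative sign, since the loss is minimized)
    # while an intra-variance penalty keeps each class compact.
    separation = mmd2(z_majority, z_minority, gamma)
    intra_var = z_majority.var(axis=0).sum() + z_minority.var(axis=0).sum()
    return -separation + var_weight * intra_var
```

In a full two-phase setup, this term would be combined with the cross-entropy reconstruction loss of the autoencoder; here it is shown in isolation on plain NumPy arrays standing in for bottleneck codes.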















Acknowledgements
We would like to thank Michael Tan in University College London, the UK for proofreading our work. This work is supported by the Sichuan Science and Technology Program (2020YFG0051), and the University-Enterprise Cooperation Projects (17H1199, 19H0355, 19H1121).
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, B., Gong, X., Wang, C. et al. MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data. Appl Intell 51, 7384–7401 (2021). https://doi.org/10.1007/s10489-021-02235-3