Abstract
The imbalanced data classification problem arises widely in commercial and industrial applications. It refers to scenarios with a considerable gap in sample counts between classes, which significantly degrades the performance of traditional classification algorithms. Previous approaches have typically focused on resampling and algorithmic adjustment, while neglecting to strengthen feature learning itself. In this study, we propose a novel algorithm for imbalanced data classification from the feature-learning perspective: the Maximum Mean Discrepancy-Encouraging Convolutional Autoencoder (MMD-CAE). The algorithm adopts a two-phase training process with distinct objectives. Cross-entropy loss is employed to measure the reconstruction error of the data, while Maximum Mean Discrepancy (MMD) with an intra-variance constraint is used to stimulate feature discrepancy in the bottleneck layer. By encouraging the maximization of MMD between the two classes' samples, and mapping the original space to a higher-dimensional space via the kernel trick, the learned features form a more effective feature space. The proposed algorithm is tested on ten groups of samples with different imbalance ratios. Performance metrics including recall, F1 score, G-means and AUC verify that the proposed algorithm surpasses existing state-of-the-art methods in this field, with stronger generalization ability. This study could shed new light on related research by constituting a more effective feature space via the proposed MMD-with-intra-variance-constraint method, and by the overall MMD-CAE algorithm for imbalanced data classification.
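The abstract does not give implementation details, but the core idea of maximizing MMD between the two classes' bottleneck codes while keeping each class compact can be illustrated with a rough sketch. The RBF kernel choice, the function names, and the `var_weight` coefficient below are assumptions for illustration, not the paper's actual formulation:

```python
import numpy as np

def rbf_kernel(x, y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix between rows of x and y.
    sq_dists = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2 * x @ y.T
    return np.exp(-gamma * sq_dists)

def mmd2(x, y, gamma=1.0):
    # Biased estimator of the squared Maximum Mean Discrepancy between
    # samples x (n, d) and y (m, d) in the kernel-induced feature space.
    k_xx = rbf_kernel(x, x, gamma)
    k_yy = rbf_kernel(y, y, gamma)
    k_xy = rbf_kernel(x, y, gamma)
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

def mmd_cae_feature_loss(z_majority, z_minority, gamma=1.0, var_weight=0.1):
    # Hypothetical bottleneck-layer loss term: encourage a large MMD between
    # the two classes' codes (negative sign, since the loss is minimized)
    # while an intra-variance penalty keeps each class compact.
    separation = mmd2(z_majority, z_minority, gamma)
    intra_var = z_majority.var(axis=0).sum() + z_minority.var(axis=0).sum()
    return -separation + var_weight * intra_var
```

In a full two-phase setup, this term would be combined with the cross-entropy reconstruction loss of the autoencoder; here it is shown in isolation on plain NumPy arrays standing in for bottleneck codes.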















Acknowledgements
We would like to thank Michael Tan in University College London, the UK for proofreading our work. This work is supported by the Sichuan Science and Technology Program (2020YFG0051), and the University-Enterprise Cooperation Projects (17H1199, 19H0355, 19H1121).
Ethics declarations
Conflict of Interests
The authors declare no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Li, B., Gong, X., Wang, C. et al. MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data. Appl Intell 51, 7384–7401 (2021). https://doi.org/10.1007/s10489-021-02235-3