
MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data

Published in: Applied Intelligence

Abstract

The imbalanced data classification problem widely exists in commercial activities and social production. It refers to scenarios with a considerable gap in sample counts among classes, which significantly degrades the performance of traditional classification algorithms. Previous approaches often focus on resampling and algorithm adjustment, but neglect enhancing the ability of feature learning. In this study, we propose a novel algorithm for imbalanced data classification from the perspective of feature learning: the Maximum Mean Discrepancy-Encouraging Convolutional Autoencoder (MMD-CAE). The algorithm adopts a two-phase target training process. Cross-entropy loss is employed to calculate the reconstruction loss of the data, and Maximum Mean Discrepancy (MMD) with an intra-variance constraint is used to stimulate feature discrepancy in the bottleneck layer. By encouraging maximization of the MMD between the two classes of samples, and mapping the original space to a higher-dimensional space via kernel techniques, the learned features form a more effective feature space. The proposed algorithm is tested on ten groups of samples with different imbalance ratios. The performance metrics of recall, F1 score, G-means and AUC verify that the proposed algorithm surpasses existing state-of-the-art methods in this field, with stronger generalization ability. This study could shed new light on related work, both through the proposed MMD with intra-variance constraint for constituting a more effective feature space, and through the holistic MMD-CAE algorithm for imbalanced data classification.
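The MMD criterion at the heart of the algorithm can be sketched with a standard kernel two-sample estimator. The following is a minimal illustration, not the authors' implementation: the Gaussian kernel choice, the fixed bandwidth `sigma`, and the function names are assumptions made for the example, and the intra-variance constraint described in the abstract is omitted.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # RBF kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    diff = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    # Biased estimate of squared Maximum Mean Discrepancy:
    #   MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    # X, Y: (n, d) and (m, d) arrays of bottleneck-layer features
    # for the majority and minority class, respectively.
    kxx = gaussian_kernel(X, X, sigma).mean()
    kyy = gaussian_kernel(Y, Y, sigma).mean()
    kxy = gaussian_kernel(X, Y, sigma).mean()
    return kxx + kyy - 2.0 * kxy

# Two well-separated synthetic classes yield a clearly positive MMD^2,
# while a distribution compared against itself yields zero.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, (50, 8))
Y = rng.normal(3.0, 1.0, (50, 8))
print(mmd2(X, X))  # ~0: identical samples
print(mmd2(X, Y))  # > 0: separated classes
```

In a training loop of the kind the paper describes, a term such as `-mmd2(X, Y)` would be added to the reconstruction loss so that gradient descent *maximizes* the discrepancy between the two classes' bottleneck features.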




Acknowledgements

We would like to thank Michael Tan of University College London, UK, for proofreading our work. This work is supported by the Sichuan Science and Technology Program (2020YFG0051) and the University-Enterprise Cooperation Projects (17H1199, 19H0355, 19H1121).

Author information


Corresponding author

Correspondence to Ruisen Luo.

Ethics declarations

Conflict of Interests

The authors declare no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Li, B., Gong, X., Wang, C. et al. MMD-encouraging convolutional autoencoder: a novel classification algorithm for imbalanced data. Appl Intell 51, 7384–7401 (2021). https://doi.org/10.1007/s10489-021-02235-3
