
One-class ensemble classifier for data imbalance problems

Published in Applied Intelligence.

Abstract

Imbalanced data classification is an important issue in machine learning, and despite numerous studies it remains difficult. Because oversampling methods generate synthetic minority data, they can be untrustworthy and introduce instability in security-sensitive applications. The main objective of this paper is to improve classification accuracy on imbalanced data without generating synthetic minority data. To this end, a reliable strategy based on an ensemble of one-class classifiers is proposed. A one-class classifier does not suffer from the data imbalance problem because it learns from a single class. Specifically, the training data is split into minority and majority sets, one-class classifiers are trained on each set separately, and each test instance receives a minority score and a majority score. The final classification is made by combining the two scores. The proposed method is evaluated on imbalanced-learn datasets and compared with sampling methods using Decision Tree and K-Nearest Neighbors classifiers. The one-class ensemble classifier outperforms the sampling methods on 20 datasets.
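The train-per-class and compare-scores strategy summarized in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of OneClassSVM as the base one-class model, the helper names, and the simple "higher score wins" combination rule are all assumptions for the sketch.

```python
# Sketch of a one-class ensemble for imbalanced classification:
# fit one one-class model per class, then label each test point
# with the class whose model scores it as more typical.
import numpy as np
from sklearn.svm import OneClassSVM  # illustrative base model choice


def fit_one_class_ensemble(X, y, minority_label=1):
    """Train separate one-class models on the minority and majority sets."""
    X_min = X[y == minority_label]
    X_maj = X[y != minority_label]
    clf_min = OneClassSVM(gamma="scale").fit(X_min)
    clf_maj = OneClassSVM(gamma="scale").fit(X_maj)
    return clf_min, clf_maj


def predict_one_class_ensemble(clf_min, clf_maj, X_test,
                               minority_label=1, majority_label=0):
    """Combine the two membership scores: higher score wins (an assumed rule)."""
    # score_samples: larger values mean "more typical of the training class"
    s_min = clf_min.score_samples(X_test)
    s_maj = clf_maj.score_samples(X_test)
    return np.where(s_min > s_maj, minority_label, majority_label)
```

Because each model sees only its own class during training, neither fit is affected by the imbalance ratio; the imbalance only enters at prediction time through the score comparison.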


Figures 1–10 (available in the full article).



Acknowledgements

This study is supported by JSPS/JAPAN KAKENHI (Grants-in-Aid for Scientific Research) #JP20K11955.

Author information

Correspondence to Toshitaka Hayashi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hayashi, T., Fujita, H. One-class ensemble classifier for data imbalance problems. Appl Intell 52, 17073–17089 (2022). https://doi.org/10.1007/s10489-021-02671-1

