One-class ensemble classifier for data imbalance problems

Hayashi, Toshitaka; Fujita, Hamido

doi:10.1007/s10489-021-02671-1

Published: 27 July 2021

Applied Intelligence, volume 52, pages 17073–17089 (2022)

Abstract

Imbalanced data classification is an important issue in machine learning. Despite various studies, the data imbalance problem remains difficult to solve. Because oversampling methods rely on fake (synthetic) minority data, they are hard to trust and can cause security instability. The main objective of this paper is to improve accuracy in imbalanced data classification without generating fake minority data. For this purpose, a reliable strategy is proposed that uses an ensemble of one-class classifiers. Such a classifier does not suffer from the data imbalance problem, since each model learns from a single class. Specifically, the training data is split into a minority set and a majority set. One-class classifiers are then trained separately on each set and applied to compute minority and majority scores for the test data. Finally, classification is made based on the combination of both scores. The proposed method is evaluated on the imbalanced-learn datasets, and the results are compared with sampling methods paired with Decision Tree and K-Nearest Neighbors classifiers. The one-class ensemble classifier outperforms the sampling methods on 20 datasets.
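The pipeline described in the abstract (split by class, train one model per class, score test points with both, combine the scores) can be illustrated with a minimal sketch. The abstract does not name the one-class models or the score-combination rule, so everything specific here is an assumption: the toy `NearestNeighborOneClass` scorer, the helper names `fit_one_class_ensemble` and `predict`, and the "higher score wins" rule are hypothetical stand-ins, not the authors' implementation.

```python
import math


class NearestNeighborOneClass:
    """Toy one-class model (assumed): scores a point by closeness to its class."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, points):
        # A one-class model only ever sees samples from a single class.
        self.points = [tuple(p) for p in points]
        return self

    def score(self, x):
        # Negative distance to the k-th nearest training point:
        # a higher score means "looks more like this class".
        dists = sorted(math.dist(x, p) for p in self.points)
        k = min(self.k, len(dists))
        return -dists[k - 1]


def fit_one_class_ensemble(X, y, minority_label=1, k=1):
    # Split the training data into minority and majority sets,
    # then train one model on each set separately.
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [x for x, label in zip(X, y) if label != minority_label]
    return (NearestNeighborOneClass(k).fit(minority),
            NearestNeighborOneClass(k).fit(majority))


def predict(models, X, minority_label=1, majority_label=0):
    min_model, maj_model = models
    labels = []
    for x in X:
        # Assumed combination rule: the class whose model scores higher wins.
        is_minority = min_model.score(x) > maj_model.score(x)
        labels.append(minority_label if is_minority else majority_label)
    return labels


# Imbalanced toy data: six majority points (label 0), two minority points (label 1).
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4), (0.4, 0.1),
           (2.0, 2.0), (2.2, 1.9)]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

models = fit_one_class_ensemble(X_train, y_train)
predictions = predict(models, [(0.2, 0.2), (2.1, 2.0)])
print(predictions)
```

No synthetic minority samples are created at any point; the minority model simply learns from the two real minority points. In practice the two models' scores may need to be put on a comparable scale before they are combined, which this sketch sidesteps by using the same distance-based score for both classes.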

Acknowledgements

This study is supported by JSPS/JAPAN KAKENHI (Grants-in-Aid for Scientific Research) #JP20K11955.

Author information

Authors and Affiliations

Faculty of Software and Information Science, Iwate Prefectural University, Takizawa, Japan
Toshitaka Hayashi
i-somet.org Incorporated Association, Morioka, Japan
Hamido Fujita
Regional Research Center, Iwate Prefectural University, Takizawa, Japan
Hamido Fujita

Corresponding author

Correspondence to Toshitaka Hayashi.

About this article

Cite this article

Hayashi, T., Fujita, H. One-class ensemble classifier for data imbalance problems. Appl Intell 52, 17073–17089 (2022). https://doi.org/10.1007/s10489-021-02671-1

Accepted: 07 July 2021
Published: 27 July 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10489-021-02671-1
