Abstract
Imbalanced data classification is one of the most critical challenges in data mining. State-of-the-art class-overlap under-sampling algorithms assume that the majority-class nearest neighbors of minority-class instances are the instances most prone to class overlap. When there are few minority instances, however, such methods do not remove the overlapping majority instances thoroughly. We therefore propose a Sequential Three-way Rules Class-overlap Under-sampling method (S3RCU) based on fuzzy hierarchical subspaces and inspired by sequential three-way decisions. First, the concept of a fuzzy hierarchical subspace (FHS) is introduced and used to construct the fuzzy hierarchical subspace structure of the dataset. Then, sequential three-way rules are constructed to find, within the fuzzy hierarchical subspaces, the majority instances that are equivalent to minority instances. We assume that these equivalent majority instances overlap with the minority class. Finally, to preserve the information carried by the majority instances in each equivalence class, we keep the majority instances with the largest Mahalanobis distance from the center of the equivalence class. Experimental results on 18 real datasets show that S3RCU outperforms, or partially outperforms, state-of-the-art class-overlap under-sampling methods on two evaluation metrics, F-measure and Kappa.
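The final retention step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the center or the covariance of an equivalence class is estimated, so the sketch assumes the center is the mean of the majority instances in the class, the covariance is their sample covariance (inverted with a pseudo-inverse for stability), and a single instance is retained per class. The function name `keep_farthest_majority` is hypothetical.

```python
import numpy as np


def keep_farthest_majority(majority_block: np.ndarray) -> int:
    """Return the row index of the instance with the largest Mahalanobis
    distance from the center of one equivalence class.

    Assumptions (not stated in the abstract): the center is the mean of
    the block and the covariance is the block's sample covariance.
    """
    if len(majority_block) == 1:
        # A single instance is trivially the one to keep.
        return 0

    center = majority_block.mean(axis=0)
    # atleast_2d covers the single-feature case, where np.cov returns a scalar.
    cov = np.atleast_2d(np.cov(majority_block, rowvar=False))
    # The pseudo-inverse guards against a singular covariance in small blocks.
    inv_cov = np.linalg.pinv(cov)

    diffs = majority_block - center
    # Squared Mahalanobis distance of every row: diff^T * inv_cov * diff.
    squared_dist = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)
    return int(np.argmax(squared_dist))


if __name__ == "__main__":
    # Toy equivalence class: four clustered majority instances and one far away.
    block = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]])
    kept = keep_farthest_majority(block)
    print("retained instance:", block[kept])
```

Under these assumptions, the other majority instances in the equivalence class would be treated as overlapping and removed, while the retained instance stands in for the class.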
This work was supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Dai, Q., Liu, J.W., Yang, J.P. (2023). Sequential Three-Way Rules Class-Overlap Under-Sampling Based on Fuzzy Hierarchical Subspace for Imbalanced Data. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_2
DOI: https://doi.org/10.1007/978-981-99-1639-9_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1638-2
Online ISBN: 978-981-99-1639-9
eBook Packages: Computer Science, Computer Science (R0)