Skip to main content

Sequential Three-Way Rules Class-Overlap Under-Sampling Based on Fuzzy Hierarchical Subspace for Imbalanced Data

  • Conference paper
  • First Online:
Neural Information Processing (ICONIP 2022)

Abstract

The imbalanced data classification is one of the most critical challenges in the field of data mining. The state-of-the-art class-overlap under-sampling algorithm considers that the majority nearest neighbors of minority class instances are more prone to class-overlap. When the number of minority instances is small, the instances removed by such methods are not thorough. Therefore, a Sequential Three-way Rules class-overlap undersampling method based on fuzzy hierarchical subspace is proposed, which is inspired by sequential three-way decision. First, the fuzzy hierarchical subspace (FHS) concept is proposed to construct the fuzzy hierarchical subspace structure of the dataset. Then, a sequential three-way rules is constructed to find the equivalent majority instances of the minority instances from the fuzzy hierarchical subspace. We assume that the equivalent majority instances are overlapping instances of the minority class. Finally, in order to preserve the information of the majority instances in the equivalence class, we keep the majority instances with the largest Mahalanobis distance from the center of the equivalence class. Experimental results on 18 real datasets show that S3RCU outperforms or partially outperforms state-of-the-art class-overlap under-sampling methods on two evaluation metrics, F-measure and KAPPA.

This work was supported by the Science Foundation of China.

University of Petroleum, Beijing (No. 2462020YXZZ023).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263–1284 (2009)

    Article  Google Scholar 

  2. Japkowicz, N., Stephen, S.: The class imbalance problem: a systematic study. Intell. Data Anal. 6(5), 429–449 (2002)

    Article  MATH  Google Scholar 

  3. Al, S., Dener, M.: STL-HDL: a new hybrid network intrusion detection system for imbalanced dataset on big data environment. Comput. Secur. 110, 102435 (2021)

    Article  Google Scholar 

  4. Pozi, M.S.M., Sulaiman, M.N., Mustapha, N., Perumal, T.: Improving anomalous rare attack detection rate for intrusion detection system using support vector machine and genetic programming. Neural Process. Lett. 44(2), 279–290 (2016)

    Article  Google Scholar 

  5. Naderalvojoud, B., Sezer, E.A.: Term evaluation metrics in imbalanced text categorization. Nat. Lang. Eng. 26(1), 31–47 (2020)

    Article  Google Scholar 

  6. Wu, Q., Ye, Y., Zhang, H., Ng, M.K., Ho, S.S.: ForesTexter: an efficient random forest algorithm for imbalanced text categorization. Knowl. Based Syst. 67, 105–116 (2014)

    Article  Google Scholar 

  7. Krawczyk, B., Galar, M., Jeleń, Ł, Herrera, F.: Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy. Appl. Soft Comput. 38, 714–726 (2016)

    Article  Google Scholar 

  8. Rupapara, V., Rustam, F., Shahzad, H.F., Mehmood, A., Ashraf, I., Choi, G.S.: Impact of SMOTE on imbalanced text features for toxic comments classification using RVVC model. IEEE Access 9, 78621–78634 (2021)

    Article  Google Scholar 

  9. Krawczyk, B.: Learning from imbalanced data: open challenges and future directions. Progr. Artif. Intell. 5(4), 221–232 (2016). https://doi.org/10.1007/s13748-016-0094-0

    Article  Google Scholar 

  10. Wang, S., Yao, X.: Multiclass imbalance problems: analysis and potential solutions. IEEE Trans. Syst. Man Cybern. Part B (Cybern.). 42(4), 1119–1130 (2012)

    Google Scholar 

  11. Yao, J.T., Vasilakos, A.V., Pedrycz, W.: Granular computing: perspectives and challenges. IEEE Trans. Cybern. 43(6), 1977–1989 (2013)

    Article  Google Scholar 

  12. Yan, Y.T., Wu, Z.B., Du, X.Q., Chen, J., Zhao, S., Zhang, Y.P.: A three-way decision ensemble method for imbalanced data oversampling. Int. J. Approx. Reason. 107, 1–16 (2019)

    Article  MathSciNet  MATH  Google Scholar 

  13. Dai, Q., Liu, J.W., Liu, Y.: Multi-granularity relabeled under-sampling algorithm for imbalanced data. Appl. Soft Comput. 124, 109083 (2022)

    Article  Google Scholar 

  14. Liu, Y., Liu, Y., Yu, B.X.B., Zhong, S., Hu, Z.: Noise-robust oversampling for imbalanced data classification. Pattern Recogn. 133, 109008 (2023). https://doi.org/10.1016/j.patcog.2022.109008

    Article  Google Scholar 

  15. Liang, T., Xu, J., Zou, B., Wang, Z., Zeng, J.: LDAMSS: fast and efficient undersampling method for imbalanced learning. Appl. Intell. 52(6), 6794–6811 (2021). https://doi.org/10.1007/s10489-021-02780-x

    Article  Google Scholar 

  16. Wang, G., Wong, K.W.: An accuracy-maximization learning framework for supervised and semi-supervised imbalanced data. Knowl. Based Syst. 255, 109678 (2022)

    Google Scholar 

  17. Liu, J.: Fuzzy support vector machine for imbalanced data with borderline noise. Fuzzy Sets Syst. 413, 64–73 (2021)

    Article  MathSciNet  Google Scholar 

  18. Peng, P., Zhang, W., Zhang, Y., Wang, H., Zhang, H.: Non-revisiting genetic cost-sensitive sparse autoencoder for imbalanced fault diagnosis. Appl. Soft Comput. 114, 108138 (2022)

    Article  Google Scholar 

  19. Aram, K.Y., Lam, S.S., Khasawneh, M.T.: Linear cost-sensitive max-margin embedded feature selection for SVM. Expert Syst. Appl. 197, 116683 (2022)

    Article  Google Scholar 

  20. Ren, J., Wang, Y., Mao, M., Cheung, Y.M.: Equalization ensemble for large scale highly imbalanced data classification. Knowl. Based Syst. 242, 108295 (2022)

    Article  Google Scholar 

  21. Gupta, N., Jindal, V., Bedi, P.: CSE-IDS: Using cost-sensitive deep learning and ensemble algorithms to handle class imbalance in network-based intrusion detection systems. Comput. Secur. 112, 102499 (2022)

    Article  Google Scholar 

  22. Yan, Y., Zhu, Y., Liu, R., Zhang, Y., Zhang, Y., Zhang, L.: Spatial distribution-based imbalanced undersampling. IEEE Trans. Knowl. Data Eng. (2022). https://doi.org/10.1109/TKDE.2022.3161537

  23. Azhar, N.A., Pozi, M.S.M., Din, A.M., Jatowt, A.: An investigation of SMOTE based methods for imbalanced datasets with data complexity analysis. IEEE Trans. Knowl. Data Eng. (2022). https://doi.org/10.1109/TKDE.2022.3179381

  24. Koziarski, M., Bellinger, C., Woźniak, M.: RB-CCR: radial-based combined cleaning and resampling algorithm for imbalanced data classification. Mach. Learn. 110(11–12), 3059–3093 (2021). https://doi.org/10.1007/s10994-021-06012-8

    Article  MathSciNet  MATH  Google Scholar 

  25. Tsai, C.F., Lin, W.C., Hu, Y.H., Yao, G.T.: Under-sampling class imbalanced datasets by combining clustering analysis and instance selection. Inf. Sci. 477, 47–54 (2019)

    Article  Google Scholar 

  26. Xie, X., Liu, H., Zeng, S., Lin, L., Li, W.: A novel progressively undersampling method based on the density peaks sequence for imbalanced data. Knowl. Based Syst. 213, 106689 (2021)

    Article  Google Scholar 

  27. Li, Z., Huang, M., Liu, G., Jiang, C.: A hybrid method with dynamic weighted entropy for handling the problem of class imbalance with overlap in credit card fraud detection. Expert Syst. Appl. 175, 114750 (2021)

    Article  Google Scholar 

  28. Yuan, B.W., Zhang, Z.L., Luo, X.G., Yu, Y., Zou, X.H., Zou, X.D.: OIS-RF: a novel overlap and imbalance sensitive random forest. Eng. Appl. Artif. Intell. 104, 104335 (2021)

    Article  Google Scholar 

  29. Wang, C., He, Q., Shao, M., Xu, Y., Hu, Q.: A unified information measure for general binary relations. Knowl. Based Syst. 135, 18–28 (2017)

    Article  Google Scholar 

  30. Yao, Y.: Three-way decisions with probabilistic rough sets. Inf. Sci. 180(3), 341–353 (2010)

    Article  MathSciNet  Google Scholar 

  31. Vuttipittayamongkol, P., Elyan, E., Petrovski, A.: On the class overlap problem in imbalanced data classification. Knowl. Based Syst. 212, 106631 (2021)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jian- wei Liu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Dai, Q., Liu, J.w., Yang, J.p. (2023). Sequential Three-Way Rules Class-Overlap Under-Sampling Based on Fuzzy Hierarchical Subspace for Imbalanced Data. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_2

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-1639-9_2

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-1638-2

  • Online ISBN: 978-981-99-1639-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics