Sequential Three-Way Rules Class-Overlap Under-Sampling Based on Fuzzy Hierarchical Subspace for Imbalanced Data

Dai, Qi; Liu, Jian-wei; Yang, Jia-peng

doi:10.1007/978-981-99-1639-9_2

Qi Dai, Jian-wei Liu (ORCID: orcid.org/0000-0002-0634-4408) & Jia-peng Yang

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1791)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Imbalanced data classification is one of the most critical challenges in data mining. State-of-the-art class-overlap under-sampling algorithms assume that the majority-class nearest neighbors of minority-class instances are the instances most prone to class overlap. When the number of minority instances is small, however, such methods do not remove the overlapping instances thoroughly. Therefore, a Sequential Three-way Rules Class-overlap Under-sampling method (S3RCU) based on fuzzy hierarchical subspace is proposed, inspired by sequential three-way decision. First, the concept of fuzzy hierarchical subspace (FHS) is proposed and used to construct the fuzzy hierarchical subspace structure of the dataset. Then, sequential three-way rules are constructed to find, within the fuzzy hierarchical subspace, the majority instances that are equivalent to minority instances. We assume that these equivalent majority instances are the instances that overlap with the minority class. Finally, to preserve the information carried by the majority instances in an equivalence class, we keep the majority instances with the largest Mahalanobis distance from the center of the equivalence class. Experimental results on 18 real datasets show that S3RCU outperforms or partially outperforms state-of-the-art class-overlap under-sampling methods on two evaluation metrics, F-measure and Kappa.
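The final retention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `n_keep` parameter, and the small ridge term are hypothetical, and the "center of the equivalence class" is assumed here to be the mean of all instances in the class, with the covariance estimated from the same instances. How the equivalence classes are obtained from the fuzzy hierarchical subspace is not covered.

```python
def mean_vector(rows):
    """Column-wise mean; assumed here to be the 'center' of the class."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu, ridge=1e-6):
    """Sample covariance with a small ridge (an assumption, for invertibility)."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    denom = max(len(rows) - 1, 1)
    for i in range(d):
        for j in range(d):
            cov[i][j] /= denom
        cov[i][i] += ridge
    return cov


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def mahalanobis(point, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), computed without forming the inverse."""
    diff = [p - m for p, m in zip(point, mu)]
    y = solve(cov, diff)
    return sum(d * v for d, v in zip(diff, y)) ** 0.5


def keep_farthest_majority(equiv_class, labels, majority_label=0, n_keep=1):
    """Indices of the n_keep majority instances farthest from the class center."""
    mu = mean_vector(equiv_class)
    cov = covariance(equiv_class, mu)
    majority_idx = [i for i, lab in enumerate(labels) if lab == majority_label]
    ranked = sorted(
        majority_idx,
        key=lambda i: mahalanobis(equiv_class[i], mu, cov),
        reverse=True,
    )
    return ranked[:n_keep]


if __name__ == "__main__":
    # Toy equivalence class: one minority instance (label 1), four majority (label 0).
    points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
    labels = [1, 0, 0, 0, 0]
    print(keep_farthest_majority(points, labels))
```

In this sketch the remaining majority instances of the equivalence class would be the ones removed as overlapping; whether one or several instances are retained per class is left open by the abstract, hence the hypothetical `n_keep` parameter.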

This work was supported by the Science Foundation of China University of Petroleum, Beijing (No. 2462020YXZZ023).

Author information

Authors and Affiliations

Department of Automation, China University of Petroleum, Beijing, China
Qi Dai & Jian-wei Liu
College of Science, North China University of Science and Technology, Tangshan, China
Jia-peng Yang

Corresponding author

Correspondence to Jian-wei Liu.

Editor information

Editors and Affiliations

Indian Institute of Technology Indore, Indore, India
Mohammad Tanveer
Indian Institute of Information Technology - Allahabad, Prayagraj, India
Sonali Agarwal
Kobe University, Kobe, Japan
Seiichi Ozawa
Indian Institute of Technology Patna, Patna, India
Asif Ekbal
University of Innsbruck, Innsbruck, Austria
Adam Jatowt

Cite this paper

Dai, Q., Liu, J.w., Yang, J.p. (2023). Sequential Three-Way Rules Class-Overlap Under-Sampling Based on Fuzzy Hierarchical Subspace for Imbalanced Data. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1791. Springer, Singapore. https://doi.org/10.1007/978-981-99-1639-9_2

DOI: https://doi.org/10.1007/978-981-99-1639-9_2
Published: 15 April 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1638-2
Online ISBN: 978-981-99-1639-9
eBook Packages: Computer Science, Computer Science (R0)
