Abstract
Data classification is a widely used data processing method in networks and distributed systems and has attracted extensive attention in recent years. However, existing classification algorithms mainly target relatively balanced datasets, whereas real-world data is often imbalanced. In this paper, we propose a novel Hybrid Resampling-based Ensemble (HRE) model for classifying highly skewed data. The main idea of HRE is to tackle class imbalance through resampling and then to combine twelve classifiers into an ensemble model. In addition, we propose a novel combination of under-sampling and over-sampling to balance the heterogeneity among different data categories. We determine the resampling rate empirically, which provides a practical guideline for the use of sampling methods. We compare the effects of different resampling methods on an imbalanced network anomaly detection dataset, in which a small number of abnormal samples must be distinguished from a large volume of normal network traffic. Extensive experiments show that the HRE model achieves higher accuracy than methods without hybrid resampling.
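The abstract does not say which under-sampling and over-sampling techniques HRE combines, nor which resampling rates it uses. The following is therefore only a minimal sketch of the general idea, assuming plain random under-sampling of the majority class followed by random over-sampling (duplication with replacement) of the remaining samples. The function names `hybrid_resample` and `majority_vote`, the parameters `under_rate` and `over_rate`, and the plurality-vote combination rule are all hypothetical choices made for illustration, not the authors' method.

```python
import random
from collections import Counter


def hybrid_resample(X, y, majority_label, under_rate, over_rate, seed=0):
    """Random under-sampling plus random over-sampling (illustrative only).

    under_rate: fraction of majority samples to keep (0 < under_rate <= 1).
    over_rate:  factor by which the non-majority samples grow (>= 1).
    Both rates are assumptions; the paper selects its rate empirically.
    """
    rng = random.Random(seed)
    majority = [i for i, label in enumerate(y) if label == majority_label]
    minority = [i for i, label in enumerate(y) if label != majority_label]

    # Under-sample: keep a random subset of the majority class.
    keep = rng.sample(majority, max(1, round(len(majority) * under_rate)))

    # Over-sample: duplicate randomly chosen non-majority samples.
    extra_count = round(len(minority) * (over_rate - 1))
    extra = [rng.choice(minority) for _ in range(extra_count)]

    idx = keep + minority + extra
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]


def majority_vote(predictions):
    """Combine one sample's labels from several classifiers by plurality."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 950 "normal" samples (label 0) and 50 "abnormal" ones (label 1).
X = list(range(1000))
y = [0] * 950 + [1] * 50
X_res, y_res = hybrid_resample(X, y, majority_label=0,
                               under_rate=0.2, over_rate=3.0)
class_counts = Counter(y_res)
```

With these illustrative rates, the 950/50 split becomes 190 majority and 150 minority samples. In an ensemble setting, each base classifier would be trained on such a resampled set and `majority_vote` would merge their per-sample predictions.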
Acknowledgement
This work is supported by the National Key Research and Development Program of China under grant 2018YFB0204301, the National Natural Science Foundation (NSF) under grants 62072306 and 62002378, the Tianjin Science and Technology Foundation under grant No. 18ZXJMTG00290, and the Open Fund of Science and Technology on Parallel and Distributed Processing Laboratory under grant 6142110200407.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, R., Luo, L., Chen, Y., Xia, J., Guo, D. (2021). A Hybrid Framework for Class-Imbalanced Classification. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_24
DOI: https://doi.org/10.1007/978-3-030-85928-2_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85927-5
Online ISBN: 978-3-030-85928-2
eBook Packages: Computer Science, Computer Science (R0)