A Hybrid Framework for Class-Imbalanced Classification

  • Conference paper
Wireless Algorithms, Systems, and Applications (WASA 2021)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12937)

Abstract

Data classification is a widely used data processing method in networks and distributed systems, and it has attracted extensive attention in recent years. However, existing classification algorithms mainly target relatively balanced datasets, whereas real-world data is often imbalanced. In this paper, we propose a novel Hybrid Resampling-based Ensemble (HRE) model for classifying highly skewed data. The main idea of HRE is to tackle class imbalance through resampling and then to combine twelve classifiers into an ensemble model. In addition, we propose a novel combination of under-sampling and over-sampling to balance the heterogeneity among different data categories. We determine the resampling rate empirically, which provides a practical guideline for applying sampling methods. We compare the effect of different resampling methods on an imbalanced network anomaly detection dataset, in which a small number of abnormal records must be distinguished from a large volume of normal network traffic. Extensive experiments show that the HRE model achieves higher accuracy than methods without hybrid resampling.
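The general idea of combining under-sampling and over-sampling before an ensemble vote can be illustrated with a minimal sketch. This is not the HRE model itself: the abstract does not specify the sampling algorithms, the resampling rates, or the twelve classifiers, so the sketch assumes plain random under-sampling of the largest class, random duplication of the other classes, and a simple plurality vote. The function names and the `majority_rate` and `minority_rate` parameters are hypothetical.

```python
import random
from collections import Counter


def hybrid_resample(X, y, majority_rate=0.5, minority_rate=2.0, seed=0):
    """Randomly under-sample the largest class and over-sample the others.

    majority_rate: fraction of majority-class samples to keep (hypothetical).
    minority_rate: multiplier applied to each other class size (hypothetical).
    """
    rng = random.Random(seed)

    # Group the feature rows by class label.
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)

    # The class with the most samples is treated as the majority class.
    majority = max(by_class, key=lambda c: len(by_class[c]))

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if label == majority:
            # Under-sampling: keep a random subset without replacement.
            keep = max(1, int(len(rows) * majority_rate))
            chosen = rng.sample(rows, keep)
        else:
            # Over-sampling: keep all rows and add random duplicates.
            target = int(len(rows) * minority_rate)
            extra = max(0, target - len(rows))
            chosen = rows + [rng.choice(rows) for _ in range(extra)]
        X_out.extend(chosen)
        y_out.extend([label] * len(chosen))
    return X_out, y_out


def majority_vote(predictions):
    """Combine the labels predicted by several classifiers for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data: 90 "normal" samples (label 0) and 10 "abnormal" samples (label 1).
X = [[i] for i in range(100)]
y = [0] * 90 + [1] * 10
X_res, y_res = hybrid_resample(X, y, majority_rate=0.5, minority_rate=3.0)
print(Counter(y_res))
```

With these illustrative rates the class ratio moves from 90:10 to 45:30. In practice the rates would be tuned empirically, as the abstract describes, and the random duplication could be replaced by a synthetic over-sampling method.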

Acknowledgement

This work is supported by the National Key Research and Development Program of China under grant 2018YFB0204301, the National Natural Science Foundation (NSF) under grants 62072306 and 62002378, the Tianjin Science and Technology Foundation under grant No. 18ZXJMTG00290, and the Open Fund of the Science and Technology on Parallel and Distributed Processing Laboratory under grant 6142110200407.

Author information

Corresponding authors

Correspondence to Lailong Luo or Yingwen Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Chen, R., Luo, L., Chen, Y., Xia, J., Guo, D. (2021). A Hybrid Framework for Class-Imbalanced Classification. In: Liu, Z., Wu, F., Das, S.K. (eds) Wireless Algorithms, Systems, and Applications. WASA 2021. Lecture Notes in Computer Science, vol 12937. Springer, Cham. https://doi.org/10.1007/978-3-030-85928-2_24

  • DOI: https://doi.org/10.1007/978-3-030-85928-2_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-85927-5

  • Online ISBN: 978-3-030-85928-2

  • eBook Packages: Computer Science, Computer Science (R0)
