A Classification Method for Imbalanced Data Based on Ant Lion Optimizer

  • Conference paper
Data Mining and Big Data (DMBD 2022)

Abstract

Imbalanced data are common in data engineering and complicate data processing. Such data often have complex distributions, and data with different distributions call for different resampling methods, yet traditional approaches adopt a fixed one. To select an appropriate resampling method for each dataset, we propose ALOID, a novel classification method for Imbalanced Data based on the Ant Lion Optimizer. ALOID combines an adaptive resampling strategy, feature selection, and an ensemble classifier. The adaptive resampling strategy assigns each resampling method a variable probability and applies roulette wheel selection, so that the method best suited to a dataset is chosen with higher probability. Feature selection follows a two-stage approach: preprocessing and enhancing. The ensemble classifier uses dynamic weights. The variable probabilities of the resampling methods, the features, and the weights of the base classifiers are encoded in each individual solution. In comprehensive experiments, ALOID is compared with 8 state-of-the-art algorithms on 33 publicly available imbalanced datasets. With K-nearest neighbor as the base classifier, ALOID outperforms the other methods in most cases, especially on high-dimensional imbalanced datasets.
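The abstract's three encoded components can be illustrated with a minimal Python sketch. It is not the authors' implementation: the flat layout of the solution vector, the 0.5 threshold for the feature mask, the binary weighted vote, and the function and method names (`decode_solution`, `roulette_wheel_select`, `weighted_vote`, `method_a`, and so on) are all assumptions made for illustration, since the abstract only states that the probabilities, features, and weights are encoded in individual solutions.

```python
import random
from typing import Dict, List, Sequence, Tuple


def decode_solution(
    solution: Sequence[float],
    method_names: Sequence[str],
    n_features: int,
    n_classifiers: int,
) -> Tuple[Dict[str, float], List[bool], List[float]]:
    """Split a flat solution vector into its three parts.

    Assumed (hypothetical) layout:
    [resampling probabilities | feature genes | base-classifier weights]
    """
    n_methods = len(method_names)
    if len(solution) != n_methods + n_features + n_classifiers:
        raise ValueError("solution length does not match the assumed layout")
    probabilities = dict(zip(method_names, solution[:n_methods]))
    # Assumption: a feature is kept when its gene exceeds 0.5.
    feature_mask = [v > 0.5 for v in solution[n_methods:n_methods + n_features]]
    weights = list(solution[n_methods + n_features:])
    return probabilities, feature_mask, weights


def roulette_wheel_select(probabilities: Dict[str, float], rng: random.Random) -> str:
    """Pick one key with chance proportional to its non-negative weight."""
    total = sum(probabilities.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total
    cumulative = 0.0
    for name, weight in probabilities.items():
        cumulative += weight
        if threshold < cumulative:
            return name
    return name  # guard against floating-point round-off at the upper edge


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Combine binary votes (0 or 1) from base classifiers using their weights."""
    positive = sum(w for p, w in zip(predictions, weights) if p == 1)
    negative = sum(w for p, w in zip(predictions, weights) if p == 0)
    return 1 if positive > negative else 0


if __name__ == "__main__":
    # Placeholder method names; the abstract does not list the candidates.
    methods = ["method_a", "method_b", "method_c"]
    solution = [0.6, 0.3, 0.1, 0.9, 0.2, 0.7, 0.6, 0.3, 0.1]
    probs, mask, weights = decode_solution(solution, methods, 3, 3)
    rng = random.Random(0)
    print("chosen resampling method:", roulette_wheel_select(probs, rng))
    print("feature mask:", mask)
    print("ensemble prediction:", weighted_vote([1, 0, 0], weights))
```

In a scheme like this, the optimizer would adjust the vector between iterations, so a resampling method that leads to better fitness would gradually receive a larger share of the wheel while remaining a stochastic choice.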

Notes

  1. https://sci2s.ugr.es/keel/imbalanced.php

  2. https://archive.ics.uci.edu/ml/datasets.php

  3. https://jundongl.github.io/scikit-feature/datasets.html

Corresponding author

Correspondence to Yi Liu.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Li, M., Liu, Y., Zheng, Q., Li, X., Qin, W. (2022). A Classification Method for Imbalanced Data Based on Ant Lion Optimizer. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_26

  • DOI: https://doi.org/10.1007/978-981-19-9297-1_26

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-9296-4

  • Online ISBN: 978-981-19-9297-1

  • eBook Packages: Computer Science (R0)
