A Classification Method for Imbalanced Data Based on Ant Lion Optimizer

Li, Mengmeng; Liu, Yi; Zheng, Qibin; Li, Xiang; Qin, Wei

doi:10.1007/978-981-19-9297-1_26

Mengmeng Li (ORCID: orcid.org/0000-0002-9380-8097), Yi Liu (ORCID: orcid.org/0000-0002-8490-6285), Qibin Zheng, Xiang Li & Wei Qin

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1744)

Included in the following conference series:

International Conference on Data Mining and Big Data

Abstract

Imbalanced data, which is very common in data engineering, makes data processing difficult, and such data usually have complex distributions. Data with different distributions call for different resampling methods, yet traditional approaches apply a fixed one. To select an appropriate resampling method for data with these characteristics, we propose ALOID, a novel classification method for Imbalanced Data based on the Ant Lion Optimizer. ALOID combines an adaptive resampling strategy, feature selection, and an ensemble classifier. The adaptive resampling strategy uses roulette wheel selection over variable per-method probabilities, so that the most suitable resampling method for each dataset is chosen with greater probability. Feature selection then follows a two-stage approach: preprocessing and enhancing. In addition, we adopt an ensemble classifier with dynamic weights. The variable probabilities of the resampling methods, the features, and the weights of the base classifiers are all encoded in each individual solution. We carry out extensive experiments comparing ALOID with 8 state-of-the-art algorithms on 33 publicly available imbalanced datasets. With k-nearest neighbors as the base classifier, ALOID outperforms the other methods in most cases, especially on high-dimensional imbalanced datasets, demonstrating its performance advantage over comparable algorithms.
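The abstract states that resampling-method probabilities, features, and base-classifier weights are encoded in each individual solution, and that roulette wheel selection picks the resampling method. The Python sketch below illustrates those three pieces under loudly stated assumptions: a flat real-valued solution vector laid out as [method probabilities | feature values | classifier weights], a 0.5 threshold for keeping a feature, and a plain weighted majority vote over binary labels. The method pool `RESAMPLERS` and the helpers `roulette_wheel`, `decode`, and `weighted_vote` are hypothetical illustrations, not ALOID's implementation; the Ant Lion Optimizer search that evolves the vectors, the two-stage feature selection, and the KNN base classifiers are not shown.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical pool of resampling methods; the abstract does not name ALOID's actual candidates.
RESAMPLERS = ["random_oversampling", "smote", "random_undersampling"]


def roulette_wheel(probabilities: Sequence[float], rng: random.Random) -> int:
    """Pick index i with probability probabilities[i] / sum(probabilities)."""
    if any(p < 0 for p in probabilities):
        raise ValueError("probabilities must be non-negative")
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    r = rng.random() * total  # uniform point on the wheel, in [0, total)
    cumulative = 0.0
    for i, p in enumerate(probabilities):
        cumulative += p
        if r < cumulative:
            return i
    # Floating-point round-off can leave r just above the loop's final sum;
    # fall back to the last slot that actually has a positive share.
    return max(i for i, p in enumerate(probabilities) if p > 0)


def decode(solution: Sequence[float], n_methods: int, n_features: int,
           n_classifiers: int, threshold: float = 0.5
           ) -> Tuple[List[float], List[bool], List[float]]:
    """Split one flat solution vector into three assumed segments:
    resampling-method probabilities, a feature mask, and base-classifier weights."""
    expected = n_methods + n_features + n_classifiers
    if len(solution) != expected:
        raise ValueError(f"expected {expected} values, got {len(solution)}")
    probs = list(solution[:n_methods])
    # Assumed binarization: a feature is kept when its value exceeds the threshold.
    mask = [v > threshold for v in solution[n_methods:n_methods + n_features]]
    weights = list(solution[n_methods + n_features:])
    return probs, mask, weights


def weighted_vote(predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Combine binary (0/1) base-classifier predictions by summing weights per label."""
    score = {0: 0.0, 1: 0.0}
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)  # ties resolve to label 0


if __name__ == "__main__":
    rng = random.Random(42)
    solution = [0.7, 0.2, 0.1,        # resampling-method probabilities
                0.9, 0.1, 0.6, 0.3,   # feature values -> mask
                0.5, 0.3, 0.2]        # base-classifier weights
    probs, mask, weights = decode(solution, n_methods=3, n_features=4, n_classifiers=3)
    print("resampler:", RESAMPLERS[roulette_wheel(probs, rng)])
    print("feature mask:", mask)                                # [True, False, True, False]
    print("ensemble vote:", weighted_vote([1, 0, 1], weights))  # 1 (0.7 vs 0.3)
```

Roulette wheel selection keeps every method with a nonzero probability in play while favoring those with larger probabilities, which is the property the abstract relies on when it says the most suitable method is chosen with greater probability.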

Author information

Authors and Affiliations

Academy of Military Sciences, Beijing, China
Mengmeng Li, Yi Liu, Qibin Zheng, Xiang Li & Wei Qin

Corresponding author

Correspondence to Yi Liu.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Ying Tan
Southern University of Science and Technology, Shenzhen, China
Yuhui Shi

About this paper

Cite this paper

Li, M., Liu, Y., Zheng, Q., Li, X., Qin, W. (2022). A Classification Method for Imbalanced Data Based on Ant Lion Optimizer. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_26

DOI: https://doi.org/10.1007/978-981-19-9297-1_26
Published: 20 January 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer Science, Computer Science (R0)
