Abstract
Imbalanced data will bring difficulties in data processing, which is very common in data engineering. These data usually have sophisticated distributions. Different resampling methods are required for dealing with data with different distributions, while fixed ones are adopted traditionally. Therefore, to select appropriate resampling methods for data with such characteristics, we propose a novel classification method for Imbalanced Data based on Ant Lion Optimizer, called ALOID. It combines adaptive resampling strategies, feature selection, and ensemble classifiers. The adaptive resampling strategy refers to utilizing roulette wheel selection to choose the most suitable resampling method with a greater probability for each dataset according to the variable probabilities of resampling methods. Then a two-stage approach is further used in feature selection: preprocessing and enhancing. In addition, we adopt an ensemble classifier with dynamic weights. The variable probabilities of resampling methods, features, and the weights of base classifiers are coded in individual solutions. A large number of comprehensive experiments have been carried out in this paper. ALOID is compared with 8 state-of-the-art algorithms on 33 publicly available imbalanced datasets. Using K-nearest neighbor as the base classifier, we have found ALOID outperforms other methods in most cases, especially on high-dimensional imbalanced datasets. Experiment results demonstrate the performance advantage of ALOID over other comparable algorithms.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Guo, H., Li, Y., Jennifer, S., Gu, M., Huang, Y., Gong, B.: Learning from class-imbalanced data: review of methods and applications. Expert Syst. Appl. 73, 220ā239 (2017)
Branco, P., Torgo, L., Ribeiro, R.P.: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv. 49(2), 1ā50 (2016)
Wang, C., Deng, C., Yu, Z., Hui, D., Gong, X., Luo, R.: Adaptive ensemble of classifiers with regularization for imbalanced data classification. Inf. Fusion 69, 81ā102 (2021)
Alkuhlani, A., Nassef, M., Farag, I.: Multistage feature selection approach for high-dimensional cancer data. Soft Comput. 21, 6895ā6906 (2017)
Mousavian, M., Chen, J., Greening, S.: Feature selection and imbalanced data handling for depression detection. In: Wang, S., et al. (eds.) BI 2018. LNCS (LNAI), vol. 11309, pp. 349ā358. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05587-5_33
Sun, J., et al.: FDHelper: assist unsupervised fraud detection experts with interactive feature selection and evaluation. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, pp. 1ā12. Association for Computing Machinery (2020)
Al-Mandhari, I., Guan, L., Edirisinghe, E.A.: Impact of the structure of data pre-processing pipelines on the performance of classifiers when applied to imbalanced network intrusion detection system dataset. In: Bi, Y., Bhatia, R., Kapoor, S. (eds.) IntelliSys 2019. AISC, vol. 1037, pp. 577ā589. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-29516-5_45
Sharma, S., Somayaji, A., Japkowicz, N.: Learning over subconcepts: strategies for 1-class classification. Comput. Intell. 34, 440ā467 (2018)
Zhang, X., Hu, B.: A new strategy of cost-free learning in the class imbalance problem. IEEE Trans. Knowl. Data Eng. 26(12), 2872ā2885 (2014)
RodrĆguez, J.J., DĆez-Pastor, J.F., Arnaiz-GonzĆ”lez, l., Kuncheva, L.I.: Random balance ensembles for multiclass imbalance learning. Knowl.-Based Syst. 193, 105434 (2020)
Liu, Y., Wang, Y., Ren, X., Zhou, H., Diao, X.: A classification method based on feature selection for imbalanced data. IEEE Access 7, 81794ā81807 (2019)
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 21(9), 1263ā1284 (2009)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: Smote: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16(1), 321ā357 (2002)
Soltanzadeh, P., Hashemzadeh, M.: RCSMOTE: range-controlled synthetic minority over-sampling technique for handling the class imbalance problem. Inf. Sci. 542, 92ā111 (2021)
Turlapati, V.P.K., Prusty, M.R.: Outlier-smote: a refined oversampling technique for improved detection of COVID-19. Intell.-Based Med. 3ā4, 100023 (2020)
Hamidzadeh, J., Kashefi, N., Moradi, M.: Combined weighted multi-objective optimizer for instance reduction in two-class imbalanced data problem. Eng. Appl. Artif. Intell. 90, 103500 (2020)
Li, J., Fong, S., Wong, R.K., Chu, V.W.: Adaptive multi-objective swarm fusion for imbalanced data classification. Inf. Fusion 39, 1ā24 (2018)
Trittenbach, H., Englhardt, A., Bƶhm, K.: An overview and a benchmark of active learning for outlier detection with one-class classifiers. Expert Syst. Appl. 168, 114372 (2021)
Almaghrabi, F., Xu, D., Yang, J.: An evidential reasoning rule based feature selection for improving trauma outcome prediction. Appl. Soft Comput. 103, 107112 (2021)
Effrosynidis, D., Arampatzis, A.: An evaluation of feature selection methods for environmental data. Eco. Inform. 61, 101224 (2021)
Mena, L.J., Gonzalez, J.A.: Symbolic one-class learning from imbalanced datasets: application in medical diagnosis. Int. J. Artif. Intell. Tools 18(2), 273ā309 (2009)
Tsai, C.F., Lin, W.C.: Feature selection and ensemble learning techniques in one-class classifiers: an empirical study of two-class imbalanced datasets. IEEE Access 9, 13717ā13726 (2021)
Lee, J., Lee, Y.C., Kim, J.T.: Fault detection based on one-class deep learning for manufacturing applications limited to an imbalanced database. J. Manuf. Syst. 57, 357ā366 (2020)
Gao, L., Zhang, L., Liu, C., Wu, S.: Handling imbalanced medical image data: a deep-learning-based one-class classification approach. Artif. Intell. Med. 108, 101935 (2020)
Li, F., Zhang, X., Zhang, X., Du, C., Xu, Y., Tian, Y.: Cost-sensitive and hybrid-attribute measure multi-decision tree over imbalanced data sets. Inf. Sci. 422, 242ā256 (2018)
Wang, Z., Wang, B., Cheng, Y., Li, D., Zhang, J.: Cost-sensitive fuzzy multiple kernel learning for imbalanced problem. Neurocomputing 366, 178ā193 (2019)
Chen, Z., Duan, J., Kang, L., Qiu, G.: A hybrid data-level ensemble to enable learning from highly imbalanced dataset. Inf. Sci. 554, 157ā176 (2020)
LĆ³pez, V., FernĆ”ndez, A., GarcĆa, S., Palade, V., Herrera, F.: An insight into classification with imbalanced data: empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250(250), 113ā141 (2013)
Guo, L., Boukir, S.: Margin-based ordered aggregation for ensemble pruning. Pattern Recogn. Lett. 34(6), 603ā609 (2013)
Seng, Z., Kareem, S.A., Varathan, K.D.: A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification. Expert Syst. Appl. 168, 114246 (2021)
Napierala, K., Stefanowski, J.: Identification of different types of minority class examples in imbalanced data. In: Corchado, E., SnĆ”Å”el, V., Abraham, A., WoÅŗniak, M., GraƱa, M., Cho, S.-B. (eds.) HAIS 2012. LNCS (LNAI), vol. 7209, pp. 139ā150. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28931-6_14
Moayedikia, A., Ong, K.L., Boo, Y.L., Yeoh, W.G., Jensen, R.: Feature selection for high dimensional imbalanced class data using harmony search. Eng. Appl. Artif. Intell. 57, 38ā49 (2017)
Mirjalili, S.: The ant lion optimizer. Adv. Eng. Softw. 83, 80ā98 (2015)
Wang, S., Yao, X.: Diversity analysis on imbalanced data sets by using ensemble models. In: 2009 IEEE Symposium on Computational Intelligence and Data Mining, pp. 324ā331 (2009)
FernĆ”ndez, A., GarcĆa, S., Galar, M., Prati, R.C., Krawczyk, B., Herrera, F.: Learning from Imbalanced Data Sets (2018)
Beheshti, Z.: BMNABC: binary multi-neighborhood artificial bee colony for high-dimensional discrete optimization problems. Cybern. Syst. 49, 452ā474 (2018)
He, X., Zhang, Q., Sun, N., Dong, Y.: Feature selection with discrete binary differential evolution. In: 2009 International Conference on Artificial Intelligence and Computational Intelligence, vol. 4, pp. 327ā330 (2009)
Emary, E., Zawbaa, H.M., Hassanien, A.E.: Binary grey wolf optimization approaches for feature selection. Neurocomputing 172(8), 371ā381 (2016)
Guyon, I., Weston, J., Barnhill, S., Vapnik, V.: Gene selection for cancer classification using support vector machines. Mach. Learn. 46(1), 389ā422 (2002)
Yan, K., Zhang, D.: Feature selection and analysis on correlated gas sensor data with recursive feature elimination. Sens. Actuators B Chem. 212, 353ā363 (2015)
Kubat, M., Holte, R.C., Matwin, S.: Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 30(2), 195ā215 (1998)
Viera, A.J., Garrett, J.M.: Understanding interobserver agreement: the kappa statistic. Fam. Med. 37(5), 360ā363 (2005)
Chen, Y., Lin, C.: Combining SVMs with various feature selection strategies. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds.) Feature Extraction, pp. 315ā324. Springer, Heidelberg (2006). https://doi.org/10.1007/978-3-540-35488-8_13
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
Ā© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, M., Liu, Y., Zheng, Q., Li, X., Qin, W. (2022). A Classification Method for Imbalanced Data Based on Ant Lion Optimizer. In: Tan, Y., Shi, Y. (eds) Data Mining and Big Data. DMBD 2022. Communications in Computer and Information Science, vol 1744. Springer, Singapore. https://doi.org/10.1007/978-981-19-9297-1_26
Download citation
DOI: https://doi.org/10.1007/978-981-19-9297-1_26
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-9296-4
Online ISBN: 978-981-19-9297-1
eBook Packages: Computer ScienceComputer Science (R0)