Abstract
Given a new class-imbalanced dataset D and limited computational resources, the challenge is to select promising class imbalanced learning (CIL) pipelines, each comprising a resampling method, a classification model, and their corresponding hyperparameters. To address this challenge, we study zero-shot automated machine learning and propose an approach tailored to class-imbalanced data, called Zero-shot Automated Class Imbalance Learning (ZAutoCIL). ZAutoCIL employs domain-independent meta-learning to build a zero-shot surrogate model that recommends effective CIL pipelines for new, unseen imbalanced datasets without requiring additional search. Specifically, we adapt a two-tower model from recommender systems to serve as the surrogate and meta-train it with a pairwise ranking loss on a meta-dataset of performance data collected across a wide range of CIL pipelines and a comprehensive repository of class-imbalanced datasets. We perform extensive experiments on 100 datasets, divided into 4 groups according to their imbalance ratio. The experimental results demonstrate the efficacy of our approach in automatically recommending CIL pipelines for a given target imbalanced dataset.
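The two-tower surrogate and pairwise ranking loss described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not specify the tower architecture, the scoring function, the exact loss, or the features, so the sketch uses single-layer tanh towers, dot-product scoring, a logistic pairwise loss, and hypothetical feature vectors and function names.

```python
import math
import random


def make_tower(in_dim, out_dim, rng):
    """Hypothetical tower: one weight matrix (out_dim x in_dim) with small random values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def embed(tower, features):
    """Map a feature vector into the shared embedding space (linear layer + tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in tower]


def score(dataset_tower, pipeline_tower, meta_features, pipeline_features):
    """Assumed scoring rule: dot product of the dataset and pipeline embeddings."""
    d = embed(dataset_tower, meta_features)
    p = embed(pipeline_tower, pipeline_features)
    return sum(a * b for a, b in zip(d, p))


def pairwise_ranking_loss(scores, performances):
    """Mean logistic loss over pipeline pairs (i, j) where pipeline i
    performed better than pipeline j on the same dataset.
    The loss is small when the better pipeline also gets the higher score."""
    losses = []
    for i in range(len(scores)):
        for j in range(len(scores)):
            if performances[i] > performances[j]:
                margin = scores[i] - scores[j]
                losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses) if losses else 0.0


def recommend(dataset_tower, pipeline_tower, meta_features, pipelines):
    """Zero-shot recommendation: score every candidate pipeline for the new
    dataset and return the index of the top-scoring one (no search or
    pipeline evaluation on the new dataset)."""
    scores = [
        score(dataset_tower, pipeline_tower, meta_features, p) for p in pipelines
    ]
    return max(range(len(pipelines)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    dataset_tower = make_tower(in_dim=3, out_dim=4, rng=rng)
    pipeline_tower = make_tower(in_dim=2, out_dim=4, rng=rng)

    # Hypothetical meta-features of one dataset and encodings of three pipelines.
    meta_features = [0.1, 0.2, 0.3]
    pipelines = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    observed = [0.80, 0.65, 0.72]  # made-up performance values for the loss demo

    scores = [score(dataset_tower, pipeline_tower, meta_features, p) for p in pipelines]
    print("loss:", pairwise_ranking_loss(scores, observed))
    print("recommended pipeline index:",
          recommend(dataset_tower, pipeline_tower, meta_features, pipelines))
```

In this sketch, meta-training would mean adjusting both towers' weights to reduce `pairwise_ranking_loss` over many (dataset, pipeline, performance) records; at recommendation time only `recommend` is called, which is what makes the procedure zero-shot.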
Acknowledgements
This work is supported by the EPSRC Early Career Researchers International Collaboration Grants [EP/Y002539/1].
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, Z., Wang, S. (2025). Zero-shot Automated Class Imbalanced Learning. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15324. Springer, Cham. https://doi.org/10.1007/978-3-031-78383-8_10
DOI: https://doi.org/10.1007/978-3-031-78383-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78382-1
Online ISBN: 978-3-031-78383-8
eBook Packages: Computer Science (R0)