Abstract
Bulk construction of pattern classifiers, whether for optimizing input data configurations or method hyperparameters, is a computationally expensive task. The main bottleneck is the function that evaluates prediction quality, which relies on estimation under a selected experimental protocol. In iterative optimization algorithms, this evaluation is computationally intensive, runs in every iteration, and requires a separate data partition for quality estimation. So-called proxy models offer a possible alternative: they estimate classifier quality from data characteristics without the need to train the prediction model. There are indications that problem complexity measures can be used for this purpose. However, this paper verifies this hypothesis negatively: it confirms that complexity measures have predictive potential for evaluating model effectiveness, but it also shows a relatively large measurement error in the direct relation between the quality metric and the proxy measure.
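To make the proxy idea concrete, the following minimal sketch compares a complexity measure computed directly from the data with a holdout accuracy obtained by actually training a classifier. It is an illustration under stated assumptions, not the experimental protocol of the paper: the synthetic Gaussian problems, the nearest-centroid classifier, the simplified two-class Fisher discriminant ratio, and all function names (`make_problem`, `fisher_ratio_complexity`, `holdout_accuracy`, `pearson`) are hypothetical choices made for this example.

```python
import random
import statistics


def make_problem(separation, rng, n_per_class=200, n_features=2):
    """Synthetic two-class problem (assumption): unit-variance Gaussians
    whose class means differ by `separation` on every feature."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * separation, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fisher_ratio_complexity(X, y):
    """Simplified two-class Fisher-ratio complexity: 1 / (1 + max_j r_j),
    with r_j = (mean0 - mean1)^2 / (var0 + var1) for feature j.
    Values near 1 suggest heavy overlap; values near 0 suggest easy separation."""
    best = 0.0
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        numerator = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        denominator = statistics.pvariance(a) + statistics.pvariance(b)
        best = max(best, numerator / denominator)
    return 1.0 / (1.0 + best)


def holdout_accuracy(X, y, rng, test_fraction=0.3):
    """The expensive path: fit a nearest-centroid classifier on a training
    split and measure accuracy on a separate test split."""
    indices = list(range(len(X)))
    rng.shuffle(indices)
    cut = int(len(indices) * test_fraction)
    test, train = indices[:cut], indices[cut:]
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(row):
        # Assign the label of the closest centroid (squared Euclidean distance).
        return min(
            centroids,
            key=lambda c: sum((v - m) ** 2 for v, m in zip(row, centroids[c])),
        )

    return sum(predict(X[i]) == y[i] for i in test) / len(test)


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    covariance = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    spread_a = sum((p - mean_a) ** 2 for p in a)
    spread_b = sum((q - mean_b) ** 2 for q in b)
    return covariance / (spread_a * spread_b) ** 0.5


rng = random.Random(0)
complexities, accuracies = [], []
for separation in [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]:
    X, y = make_problem(separation, rng)
    complexities.append(fisher_ratio_complexity(X, y))  # proxy: no training
    accuracies.append(holdout_accuracy(X, y, rng))      # reference: train and test

r = pearson(complexities, accuracies)
print(f"Pearson correlation between complexity and accuracy: {r:.3f}")
```

On such deliberately simple problems the two quantities move in opposite directions, which reflects the predictive potential mentioned above. The sketch does not address the harder question of how large the error becomes when the proxy value replaces the quality metric directly.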
Acknowledgement
This work was supported by the Polish National Science Centre under the grant No. 2019/35/B/ST6/04442.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Komorniczak, J., Ksieniewicz, P., Woźniak, M. (2023). Analysis of the Possibility to Employ Relationship Between the Problem Complexity and the Classification Quality as Model Optimization Proxy. In: Burduk, R., Choraś, M., Kozik, R., Ksieniewicz, P., Marciniak, T., Trajdos, P. (eds) Progress on Pattern Classification, Image Processing and Communications. CORES IP&C 2023 2023. Lecture Notes in Networks and Systems, vol 766. Springer, Cham. https://doi.org/10.1007/978-3-031-41630-9_8
DOI: https://doi.org/10.1007/978-3-031-41630-9_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41629-3
Online ISBN: 978-3-031-41630-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)