Analysis of the Possibility to Employ Relationship Between the Problem Complexity and the Classification Quality as Model Optimization Proxy

Komorniczak, Joanna; Ksieniewicz, Paweł; Woźniak, Michał

doi:10.1007/978-3-031-41630-9_8

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 766)

Abstract

Constructing pattern classifiers in bulk, whether to optimize input data configurations or method hyperparameters, is a computationally demanding task. The main bottleneck is the function that evaluates prediction quality, which relies on estimation under a selected experimental protocol. In iterative optimization algorithms, this evaluation is computationally intensive, runs in every iteration, and requires a separate data partition for quality estimation. So-called proxy models may offer an alternative: they estimate classifier quality from data characteristics without the need to train the prediction model. There are indications that problem complexity measures can serve this purpose. However, this paper verifies that hypothesis negatively: it confirms the predictive potential of complexity measures for evaluating model effectiveness, but also shows a relatively large measurement error in the direct relation between the quality metric and the proxy measure.
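
The proxy idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' experimental protocol: the synthetic Gaussian data, the Fisher-ratio-style complexity measure, the nearest-centroid classifier, and every function name below are assumptions chosen only to keep the example self-contained. The sketch computes a complexity score without training a model, estimates accuracy on a holdout split, and reports how strongly the two are correlated across a set of problems.

```python
import random
import statistics


def make_dataset(separation, n_per_class=200, n_features=2, seed=0):
    """Two Gaussian classes whose means differ by `separation` on every feature."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(label * separation, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def fisher_complexity(X, y):
    """Hypothetical proxy: 1 / (1 + max per-feature Fisher ratio); higher means harder."""
    best = 0.0
    for f in range(len(X[0])):
        a = [row[f] for row, label in zip(X, y) if label == 0]
        b = [row[f] for row, label in zip(X, y) if label == 1]
        ratio = (statistics.fmean(a) - statistics.fmean(b)) ** 2 / (
            statistics.pvariance(a) + statistics.pvariance(b)
        )
        best = max(best, ratio)
    return 1.0 / (1.0 + best)


def holdout_accuracy(X, y, seed=0):
    """Nearest-centroid classifier fitted on a random half and scored on the other half."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    centroids = {}
    for label in (0, 1):
        rows = [X[i] for i in train if y[i] == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(row):
        # Pick the class whose centroid has the smallest squared distance to the row.
        return min(
            centroids,
            key=lambda c: sum((r - m) ** 2 for r, m in zip(row, centroids[c])),
        )

    hits = sum(predict(X[i]) == y[i] for i in test)
    return hits / len(test)


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    var_u = sum((a - mu) ** 2 for a in u)
    var_v = sum((b - mv) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


# One synthetic problem per class separation, from heavily overlapping to well separated.
separations = [0.25 * k for k in range(1, 13)]
complexities, accuracies = [], []
for k, sep in enumerate(separations):
    X, y = make_dataset(sep, seed=k)
    complexities.append(fisher_complexity(X, y))  # no model training needed
    accuracies.append(holdout_accuracy(X, y, seed=k))  # requires fit and holdout

r = pearson(complexities, accuracies)
print(f"Pearson r between complexity and accuracy: {r:.3f}")
```

On such clean synthetic data the correlation is expected to be strongly negative. A strong correlation alone does not make the proxy usable for optimization, because the scatter of individual points around the trend determines how well accuracy can be predicted from the complexity score; that distinction is the subject of the abstract's conclusion.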

Acknowledgement

This work was supported by the Polish National Science Centre under the grant No. 2019/35/B/ST6/04442.

Author information

Authors and Affiliations

Department of Systems and Computer Networks, Wrocław University of Science and Technology, Wrocław, Poland
Joanna Komorniczak, Paweł Ksieniewicz & Michał Woźniak

Authors

Joanna Komorniczak
Paweł Ksieniewicz
Michał Woźniak

Corresponding author

Correspondence to Joanna Komorniczak.

Editor information

Editors and Affiliations

Wrocław University of Science and Technology, Wrocław, Poland
Robert Burduk
University of Technology and Life Science, Bydgoszcz, Poland
Michał Choraś
University of Technology and Life Science, Bydgoszcz, Poland
Rafał Kozik
Wrocław University of Science and Technology, Wrocław, Poland
Paweł Ksieniewicz
University of Technology and Life Science, Bydgoszcz, Poland
Tomasz Marciniak
Wrocław University of Science and Technology, Wrocław, Poland
Paweł Trajdos

About this paper

Cite this paper

Komorniczak, J., Ksieniewicz, P., Woźniak, M. (2023). Analysis of the Possibility to Employ Relationship Between the Problem Complexity and the Classification Quality as Model Optimization Proxy. In: Burduk, R., Choraś, M., Kozik, R., Ksieniewicz, P., Marciniak, T., Trajdos, P. (eds) Progress on Pattern Classification, Image Processing and Communications. CORES IP&C 2023. Lecture Notes in Networks and Systems, vol 766. Springer, Cham. https://doi.org/10.1007/978-3-031-41630-9_8

DOI: https://doi.org/10.1007/978-3-031-41630-9_8
Published: 01 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41629-3
Online ISBN: 978-3-031-41630-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
