Abstract
Whenever a new supervised machine learning (ML) algorithm or solution is developed, it is imperative to evaluate the predictive performance it attains on diverse datasets. This is done to stress test the strengths and weaknesses of the novel algorithm and to provide evidence for the situations in which it is most useful. A common practice is to gather datasets from public benchmark repositories for such an evaluation, but few or no specific criteria guide this selection, which is often ad hoc. This paper investigates the importance of assembling a diverse benchmark of datasets in order to properly evaluate ML models and understand their capabilities. Building on meta-learning studies that evaluate the diversity of public dataset repositories, it introduces an optimization method for choosing varied classification and regression datasets from a pool of candidates. The method is based on maximum coverage, circular packing, and the meta-heuristic Lichtenberg Algorithm, ensuring that diverse datasets able to challenge the ML algorithms more broadly are chosen. The selections were compared experimentally with a random selection of datasets and with clustering by k-medoids, and proved more effective regarding the diversity of the chosen benchmarks and their ability to challenge the ML algorithms at different levels.













Availability of data and materials
The benchmark datasets and the outputs of their analysis are available at https://matilda.unimelb.edu.au/matilda/.
Code availability
The source code for the Lichtenberg-MATILDA algorithm is available at https://www.mathworks.com/matlabcentral/fileexchange/123930-lichtenberg-algorithm-for-benchmark-datasets-selection.
References
Aguiar GJ, Santana EJ, de Carvalho AC, Junior SB (2022) Using meta-learning for multi-target regression. Inf Sci 584:665–684
Alcalá-Fdez J, Fernández A, Luengo J, Derrac J, García S, Sánchez L, Herrera F (2011) Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Multiple-Valued Logic Soft Comput 17:255–287
Alipour H, Muñoz MA, Smith-Miles K (2023) Enhanced instance space analysis for the maximum flow problem. Eur J Oper Res 304(2):411–428
Arora P, Varshney S et al (2016) Analysis of k-means and k-medoids algorithm for big data. Procedia Comput Sci 78:507–512
Bang-Jensen J, Gutin G, Yeo A (2004) When the greedy algorithm fails. Discret Optim 1(2):121–127
Benavoli A, Corani G, Demšar J, Zaffalon M (2017) Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis. J Mach Learn Res 18(1):2653–2688
Bischl B, Casalicchio G, Feurer M, Hutter F, Lang M, Mantovani RG, van Rijn JN, Vanschoren J (2017) OpenML benchmarking suites. arXiv preprint
Botchkarev A (2018) Performance metrics (error measures) in machine learning regression, forecasting and prognostics: properties and typology. arXiv preprint arXiv:1809.03006
Broyden CG (1970) The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J Appl Math 6(1):76–90
Calvo B, Santafé Rodrigo G (2016) scmamp: statistical comparison of multiple algorithms in multiple problems. The R Journal, Vol 8/1, Aug 2016
Castillo I, Kampas FJ, Pintér JD (2008) Solving circle packing problems by global optimization: numerical results and industrial applications. Eur J Oper Res 191(3):786–802
Clement CL, Kauwe SK, Sparks TD (2020) Benchmark aflow data sets for machine learning. Integr Mater Manuf Innov 9(2):153–156
Cohen R, Katzir L (2008) The generalized maximum coverage problem. Inf Process Lett 108(1):15–22
Corani G, Benavoli A (2015) A Bayesian approach for comparing cross-validated algorithms on multiple data sets. Mach Learn 100(2–3):285–304
Davenport TH, Ronanki R (2018) Artificial intelligence for the real world. Harv Bus Rev 96(1):108–116
Demsar J (2006) Statistical comparisons of classifiers over multiple datasets. J Mach Learn Res 7:1–30
Dua D, Graff C (2017) UCI machine learning repository. http://archive.ics.uci.edu/ml
Dueben PD, Schultz MG, Chantry M, Gagne DJ, Hall DM, McGovern A (2022) Challenges and benchmark datasets for machine learning in the atmospheric sciences: definition, status, and outlook. Artif Intell Earth Syst 1(3):e210002
Ferri C, Hernández-Orallo J, Modroiu R (2009) An experimental comparison of performance measures for classification. Pattern Recogn Lett 30(1):27–38
Flores JJ, Martínez J, Calderón F (2016) Evolutionary computation solutions to the circle packing problem. Soft Comput 20(4):1521–1535
Garcia LP, Lorena AC, de Souto M, Ho TK (2018) Classifier recommendation using data complexity measures. In: IEEE Proceedings of ICPR 2018
Hannousse A, Yahiouche S (2021) Towards benchmark datasets for machine learning based website phishing detection: an experimental study. Eng Appl Artif Intell 104:104347
Hansen N, Auger A, Finck S, Ros R (2014) Real-parameter black-box optimization benchmarking BBOB-2010: Experimental setup. Tech. Rep. RR-7215, INRIA, http://coco.lri.fr/downloads/download15.02/bbobdocexperiment.pdf
Hochbaum DS (1996) Approximating covering and packing problems: set cover, vertex cover, independent set, and related problems. In: Approximation algorithms for NP-hard problems, pp 94–143
Hooker JN (1995) Testing heuristics: we have it all wrong. J Heurist 1:33–42
Hu W, Fey M, Zitnik M, Dong Y, Ren H, Liu B, Catasta M, Leskovec J (2020) Open graph benchmark: datasets for machine learning on graphs. Adv Neural Inf Process Syst 33:22118–22133
Janairo AG, Baun JJ, Concepcion R, Relano RJ, Francisco K, Enriquez ML, Bandala A, Vicerra RR, Alipio M, Dadios EP (2022) Optimization of subsurface imaging antenna capacitance through geometry modeling using archimedes, lichtenberg and henry gas solubility metaheuristics. In: 2022 IEEE international IOT, electronics and mechatronics conference (IEMTRONICS), IEEE, pp 1–8
Joyce T, Herrmann JM (2018) A review of no free lunch theorems, and their implications for metaheuristic optimisation. In: Yang XS (ed) Nature-inspired algorithms and applied optimization. Springer, Cham, pp 27–51
Khuller S, Moss A, Naor JS (1999) The budgeted maximum coverage problem. Inf Process Lett 70(1):39–45
Kumar A, Nadeem M, Banka H (2023) Nature inspired optimization algorithms: a comprehensive overview. Evol Syst 14(1):141–156
LLC M (2019) International Institute of Forecasters. https://forecasters.org/resources/time-series-data/m3-competition/
Lorena AC, Maciel AI, de Miranda PB, Costa IG, Prudêncio RB (2018) Data complexity meta-features for regression problems. Mach Learn 107(1):209–246
Lorena AC, Garcia LP, Lehmann J, Souto MC, Ho TK (2019) How complex is your classification problem? A survey on measuring classification complexity. ACM Comput Surv (CSUR) 52(5):1–34
Luengo J, Herrera F (2015) An automatic extraction method of the domains of competence for learning classifiers using data complexity measures. Knowl Inf Syst 42(1):147–180
Ma BJ, Pereira JLJ, Oliva D, Liu S, Kuo YH (2023) Manta ray foraging optimizer-based image segmentation with a two-strategy enhancement. Knowl Based Syst 28:110247
Macià N, Bernadó-Mansilla E (2014) Towards UCI+: a mindful repository design. Inf Sci 261:237–262
Matt PA, Ziegler R, Brajovic D, Roth M, Huber MF (2022) A nested genetic algorithm for explaining classification data sets with decision rules. arXiv preprint arXiv:2209.07575
Muñoz MA, Smith-Miles KA (2019) Generating new space-filling test instances for continuous black-box optimization. Evolut Comput. https://doi.org/10.1162/evco_a_00262
Muñoz MA, Smith-Miles K (2020) Generating new space-filling test instances for continuous black-box optimization. Evol Comput 28(3):379–404
Munoz MA, Villanova L, Baatar D, Smith-Miles K (2018) Instance spaces for machine learning classification. Mach Learn 107(1):109–147
Muñoz MA, Yan T, Leal MR, Smith-Miles K, Lorena AC, Pappa GL, Rodrigues RM (2021) An instance space analysis of regression problems. ACM Trans Knowl Discov Data (TKDD) 15(2):1–25
Nascimento AI, Bastos-Filho CJ (2010) A particle swarm optimization based approach for the maximum coverage problem in cellular base stations positioning. In: 2010 10th international conference on hybrid intelligent systems, IEEE, pp 91–96
Olson RS, La Cava W, Orzechowski P, Urbanowicz RJ, Moore JH (2017) PMLB: a large benchmark suite for machine learning evaluation and comparison. BioData Min 10(1):1–13
Orriols-Puig A, Macia N, Ho TK (2010) Documentation for the data complexity library in C++. Universitat Ramon Llull La Salle 196(1–40):12
Paleyes A, Urma RG, Lawrence ND (2022) Challenges in deploying machine learning: a survey of case studies. ACM Comput Surv 55(6):1–29
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36(2):3336–3341
Pereira JLJ, Francisco MB, da Cunha Jr SS, Gomes GF (2021a) A powerful Lichtenberg optimization algorithm: a damage identification case study. Eng Appl Artif Intell 97:104055
Pereira JLJ, Francisco MB, Diniz CA, Oliver GA, Cunha SS Jr, Gomes GF (2021b) Lichtenberg algorithm: a novel hybrid physics-based meta-heuristic for global optimization. Expert Syst Appl 170:114522
Pereira JLJ, Oliver GA, Francisco MB, Cunha SS, Gomes GF (2021c) A review of multi-objective optimization: methods and algorithms in mechanical engineering problems. Arch Comput Methods Eng. https://doi.org/10.1007/s11831-021-09663-x
Pereira JLJ, Francisco MB, de Oliveira LA, Chaves JAS, Cunha SS Jr, Gomes GF (2022a) Multi-objective sensor placement optimization of helicopter rotor blade based on feature selection. Mech Syst Signal Process 180:109466
Pereira JLJ, Francisco MB, Ribeiro RF, Cunha SS, Gomes GF (2022b) Deep multiobjective design optimization of CFRP isogrid tubes using Lichtenberg algorithm. Soft Comput 26:7195–7209
Pereira JLJ, Oliver GA, Francisco MB, Cunha SS Jr, Gomes GF (2022c) Multi-objective Lichtenberg algorithm: a hybrid physics-based meta-heuristic for solving engineering problems. Expert Syst Appl 187:115939
Rahmani O, Naderi B, Mohammadi M, Koupaei MN (2018) A novel genetic algorithm for the maximum coverage problem in the three-level supply chain network. Int J Ind Syst Eng 30(2):219–236
Ristoski P, Vries GKDd, Paulheim H (2016) A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web. In: International semantic web conference. Springer, pp 186–194
Rivolli A, Garcia LP, Soares C, Vanschoren J, de Carvalho AC (2022) Meta-features for meta-learning. Knowl-Based Syst 240:108101
Smith-Miles K, Muñoz MA (2023) Instance space analysis for algorithm testing: methodology and software tools. ACM Comput Surv. https://doi.org/10.1145/3572895
Smith-Miles KA (2009) Cross-disciplinary perspectives on meta-learning for algorithm selection. ACM Comput Surv (CSUR) 41(1):6
Soares C (2009) UCI++: improved support for algorithm selection using datasetoids. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 499–506
Takamoto M, Praditia T, Leiteritz R, MacKinlay D, Alesiani F, Pflüger D, Niepert M (2022) Pdebench: an extensive benchmark for scientific machine learning. arXiv preprint arXiv:2210.07182
Taşdemir A, Demirci S, Aslan S (2022) Performance investigation of immune plasma algorithm on solving wireless sensor deployment problem. In: 2022 9th international conference on electrical and electronics engineering (ICEEE), IEEE, pp 296–300
Thiyagalingam J, Shankar M, Fox G, Hey T (2022) Scientific machine learning benchmarks. Nat Rev Phys 4(6):413–420
Tian Z, Wang J (2022) Variable frequency wind speed trend prediction system based on combined neural network and improved multi-objective optimization algorithm. Energy 254:124249
Tossa F, Abdou W, Ansari K, Ezin EC, Gouton P (2022) Area coverage maximization under connectivity constraint in wireless sensor networks. Sensors 22(5):1712
Vanschoren J (2019) Meta-learning. In: Hutter F, Kotthoff L, Vanschoren J (eds) Automated machine learning. Springer, Cham, pp 35–61
Vanschoren J, Van Rijn JN, Bischl B, Torgo L (2014) Openml: networked science in machine learning. ACM SIGKDD Explor Newsl 15(2):49–60
Witten TA Jr, Sander LM (1981) Diffusion-limited aggregation, a kinetic critical phenomenon. Phys Rev Lett 47(19):1400
Wolpert DH (2002) The supervised learning no-free-lunch theorems. In: Roy R, Koppen M, Ovaska S, Furuhashi T, Hoffmann F (eds) Soft computing and industry. Springer, London, pp 25–42
Xiao H, Cheng Y (2022) The image segmentation of Osmanthus fragrans based on optimization algorithms. In: 2022 4th international conference on advances in computer technology. Information science and communications (CTISC), IEEE, pp 1–5
Yang XS (2020) Nature-inspired optimization algorithms. Academic Press, New York
Yarrow S, Razak KA, Seitz AR, Seriès P (2014) Detecting and quantifying topography in neural maps. PLoS ONE 9(2):e87178
Yuan Y, Tole K, Ni F, He K, Xiong Z, Liu J (2022) Adaptive simulated annealing with greedy search for the circle bin packing problem. Comput Oper Res 144:105826
Zhang Z, Schwartz S, Wagner L, Miller W (2000) A greedy algorithm for aligning DNA sequences. J Comput Biol 7(1–2):203–214
Funding
This work was partially supported by the Brazilian research agencies CNPq (Grant 307892/2020-4) and FAPESP (Grants 2021/06870-3 and 2022/10683-7). The Australian authors gratefully acknowledge funding from the Australian Research Council (Grant IC200100009) provided to the ARC Training Centre in Optimisation Technologies, Integrated Methodologies and Applications (OPTIMA).
Author information
Authors and Affiliations
Contributions
JLJP implemented the optimization framework dedicated to benchmark selection and ran the computational experiments reported in the paper. KS-M is the proponent of the ISA framework. MAM implemented the MATILDA tool and generated the instance spaces for the classification and regression problems. ACL proposed and supervised all the work in this paper. All authors contributed to the writing and organization of the paper.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Responsible editor: Charalampos Tsourakakis.
A Supplementary Material
A.1 Extra figures and tables
Figure 14 summarizes the Lichtenberg Algorithm.
Table 1 presents the list of classification datasets selected by the LM algorithm, for each benchmark size M.
Table 2 shows the list of classification datasets chosen when only the hardest quadrant of the IS is considered as search space.
Table 3 presents the list of regression datasets selected by the LM algorithm, for each benchmark size M.
Table 4 shows the list of regression datasets chosen when only the hardest quadrant of the IS is considered as search space.
A.2 Additional analysis with non-parametric tests
Figure 15 shows the CD diagrams obtained when comparing the pool of regression algorithms in \(\mathcal A\) using the diverse and the hard benchmarks containing \(M=30\) datasets. As in the case of classification problems, the Friedman multiple comparison test is employed, followed by the Nemenyi test at a 95% confidence level (Demsar 2006; Calvo and Santafé Rodrigo 2016). The differences in the rankings of the algorithms are more noticeable here. For instance, Bayesian ARD (ARD), the best-performing algorithm on the set of hard datasets, attains only an intermediate ranking on the diverse benchmark of datasets.
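For completeness, a minimal sketch of this style of analysis is given below (Python with NumPy/SciPy; the score matrix is synthetic and purely illustrative, not results from the paper). It computes the Friedman statistic and the average ranks that underlie a CD diagram; the Nemenyi post-hoc step additionally requires studentized-range critical values, which tools such as the scmamp package (Calvo and Santafé Rodrigo 2016) provide.

```python
# Minimal sketch: Friedman test and average ranks over a datasets x algorithms
# score matrix (illustrative random data, NOT results from the paper).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(0)
scores = rng.random((30, 14))          # 30 datasets x 14 regressors, higher = better

# Friedman test: one sample per algorithm (column), paired across datasets (rows).
stat, p_value = friedmanchisquare(*scores.T)

# Average rank of each algorithm (rank 1 = best on a dataset), the quantity shown in a CD diagram.
ranks = np.array([rankdata(-row) for row in scores])
avg_ranks = ranks.mean(axis=0)

print(f"Friedman chi-square = {stat:.2f}, p-value = {p_value:.4f}")
print("Average ranks per algorithm:", np.round(avg_ranks, 2))
```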
In addition to the Friedman multiple comparison non-parametric test used in Figs. 7 and 15, which is graphically valuable for comparing multiple algorithms across multiple datasets, the Bayesian non-parametric test is also used here. This method allows a more detailed, pairwise comparison of the performance of the employed algorithms, both regressors and classifiers (Benavoli et al. 2017; Corani and Benavoli 2015).
Starting from the performance differences between two algorithms across all the datasets, this test estimates the probability p that one algorithm is the best on these datasets, as well as the probability that both are practically equivalent, through the determination of a region of practical equivalence (rope). Three probabilistic regions therefore have to be determined. With 10 classifiers and 14 regressors in this work, an all-pairs analysis would require 45 and 91 comparisons, respectively, and 272 comparisons when both the diverse and the hardest datasets are considered. Since the Friedman test already pointed out the best-ranked algorithm in each case, these algorithms are used as the reference to be compared with the others.
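To make the rope idea concrete, the sketch below implements a Bayesian sign test with a rope, a simple member of the family of Bayesian tests described by Benavoli et al. (2017), on a synthetic vector of accuracy differences. The rope half-width, the prior, and the data are assumptions of the example, not values used in the paper.

```python
# Minimal sketch of a Bayesian sign test with a region of practical equivalence (rope),
# in the spirit of Benavoli et al. (2017). The score differences are synthetic.
import numpy as np

rng = np.random.default_rng(0)
diff = rng.normal(0.02, 0.05, size=30)   # accuracy(A) - accuracy(B) on 30 datasets (illustrative)

rope = 0.01                              # |difference| below this is "practically equivalent"
counts = np.array([
    np.sum(diff < -rope),                # B practically better
    np.sum(np.abs(diff) <= rope),        # within the rope
    np.sum(diff > rope),                 # A practically better
])

# Dirichlet posterior over (p_B, p_rope, p_A) with a weak symmetric prior.
prior = np.array([1.0, 1.0, 1.0])
samples = rng.dirichlet(prior + counts, size=50_000)

# Probability that each region is the most probable one, estimated from posterior samples.
p_B, p_rope, p_A = (samples.argmax(axis=1) == np.arange(3)[:, None]).mean(axis=1)
print(f"P(B better) = {p_B:.3f}, P(rope) = {p_rope:.3f}, P(A better) = {p_A:.3f}")
```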
Table 5 shows the Bayesian test results for the ML algorithms on the diverse and the hardest classification datasets. In both cases, Classifier 1 is the one with the best ranking in the Friedman test. The results coincide with those of the Friedman test in indicating that LSVM outperforms NB, PSVM and QDA on the diverse datasets with high certainty. On the hardest datasets, RF was best ranked and there is high confidence (above 90%) that its results are superior to those of QDA, PSVM, LDA and RSVM, whereas in the Friedman test (diagram of Fig. 7b) the differences between RF, LDA, and RSVM are inconclusive.
Table 6 presents the results of the Bayesian test for the regressors on the diverse and the hardest regression datasets. For the diverse regression datasets, BAG was the best-ranked algorithm and is compared against the other regressors. BAG outperformed most of the algorithms, with the exception of RF and GB, for which the rope probability precludes this assertion; it is also only slightly better than ARD. For the hard datasets, the ARD algorithm is clearly the most accurate across all datasets, whereas in the Friedman test ARD showed performance similar to GB, BAG, AB, RF and nSVR.
A.3 Comparison to other meta-heuristics
Meta-heuristics are nature-inspired optimization algorithms that computationally mimic some natural behavior to explore and exploit search spaces in order to find the best possible solutions. According to their source of inspiration, they can be divided into the following categories: (i) evolutionary (the most common); (ii) swarm-based; (iii) based on physical phenomena; and (iv) based on human behaviors. They can also be divided, according to their search strategy, into population-based and trajectory-based methods, the former comprising the vast majority of known algorithms (Yang 2020).
In recent years, the literature has seen an explosion of meta-heuristic applications to optimization problems, often in place of classical and gradient-based methods. Some of the factors that contribute to their success are: (i) a better ability to escape from local optima; (ii) a better ability to deal with multimodal, non-convex, and discrete problems; (iii) a better capacity to handle many variables and objectives; (iv) gradient independence; and (v) independence from explicit equations, since they can, for example, be easily coupled with numerical analysis software and ML algorithms that produce responses from inputs (Kumar et al. 2023; Pereira et al. 2021c).
According to the no-free-lunch theorem, no single meta-heuristic can be the best in all applications, and they compete to deliver the best results at the lowest computational cost (Wolpert 2002; Joyce and Herrmann 2018). As seen before, the optimization problem proposed in this study is combinatorial and was solved in the paper with the LA algorithm. For a fair comparison, three other meta-heuristics are applied here: GA, PSO, and DE. They are the most popular classical meta-heuristics and have many good reported results (Yang 2020). All of these algorithms share the population size and the number of iterations as common parameters.
The GA is the most popular evolution-based meta-heuristic in the literature and is inspired by natural selection and genetics. The agents with the best fitness survive, while the others tend to vanish. It uses the principles of reproduction, crossover, and mutation to guide the population through the search space over the generations. Crossover improves exploitation, while mutation guarantees better exploration. Its particular parameters are the crossover and mutation rates.
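Purely as an illustration (not the GA implementation used in the experiments), one GA variation step with one-point crossover and Gaussian mutation on real-coded chromosomes could look as follows; the rates, bounds, and mutation scale are assumptions of the example.

```python
# Minimal sketch of one GA variation step (one-point crossover + Gaussian mutation)
# on real-coded chromosomes; rates and bounds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_vars, p_cross, p_mut = 20, 0.9, 0.05
lower, upper = 0.0, 1.0

parent1 = rng.uniform(lower, upper, n_vars)
parent2 = rng.uniform(lower, upper, n_vars)

# One-point crossover applied with probability p_cross.
child = parent1.copy()
if rng.random() < p_cross:
    cut = rng.integers(1, n_vars)          # crossover point
    child[cut:] = parent2[cut:]

# Gaussian mutation applied gene-wise with probability p_mut.
mask = rng.random(n_vars) < p_mut
child[mask] += rng.normal(0.0, 0.1, mask.sum())
child = np.clip(child, lower, upper)
print(child)
```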
DE has a similar inspiration to GA, but at each iteration it randomly selects three agents from the population and combines their characteristics. Its particular parameters are the crossover rate (the probability that each variable of the new solution is taken from the combination of the three agents) and the differential weight (which scales the difference between them).
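Again as an illustration only, a single DE/rand/1/bin step for one target vector is sketched below; the differential weight F, the crossover rate CR, and the population are assumptions of the example.

```python
# Minimal sketch of one DE/rand/1/bin step for a single target vector;
# F (differential weight) and CR (crossover rate) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
pop = rng.random((50, 20))                 # population of 50 candidate solutions, 20 variables
F, CR, target_idx = 0.8, 0.9, 0

# Pick three distinct agents different from the target.
candidates = [i for i in range(len(pop)) if i != target_idx]
a, b, c = pop[rng.choice(candidates, size=3, replace=False)]

mutant = a + F * (b - c)                   # differential mutation

# Binomial crossover: each variable comes from the mutant with probability CR,
# and at least one randomly chosen variable is always taken from the mutant.
cross = rng.random(pop.shape[1]) < CR
cross[rng.integers(pop.shape[1])] = True
trial = np.where(cross, mutant, pop[target_idx])
print(trial)
```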
PSO is the most popular swarm-based optimizer and is inspired by the social behavior of bird flocking: a set of particles (potential solutions) moves around the search space, each updating its position based on its own best position and the best position found by the swarm. It has three particular parameters: the cognitive factor (attraction of a particle towards its personal best position), the social factor (attraction towards the swarm's best position), and the inertia weight (which controls the influence of the particle's previous velocity on its current velocity).
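The sketch below illustrates one PSO velocity and position update for the whole swarm; the inertia weight, the cognitive and social factors, and the bounds are illustrative values of the example, not those of Table 7.

```python
# Minimal sketch of one PSO velocity/position update for the whole swarm;
# inertia weight and cognitive/social factors are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_particles, n_vars = 30, 20
w, c1, c2 = 0.7, 1.5, 1.5                  # inertia, cognitive and social factors

x = rng.random((n_particles, n_vars))      # current positions
v = np.zeros_like(x)                       # current velocities
pbest = x.copy()                           # personal best positions (here: initial positions)
gbest = x[0]                               # swarm best position (placeholder)

r1, r2 = rng.random(x.shape), rng.random(x.shape)
v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
x = np.clip(x + v, 0.0, 1.0)               # move and keep particles inside [0, 1]^n
```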
The main parameters and the values recommended by the authors who published these algorithms are given in Table 7. Beyond these parameters, the population size and the number of iterations are shared by all algorithms; they are set to ten times the number of optimization variables and to one hundred, respectively (Yang 2020).
The algorithms with the parameters in Table 7 were applied to the problem of Eq. 14 for the classification IS to select ten diverse datasets, which results in twenty design variables. The only objective here is to observe which of the four algorithms finds the maximum coverage of the IS. Ten datasets were chosen because this represents a median dimensionality among those adopted in this study, with a moderate computational cost; running all cases would be computationally costly, and comparing meta-heuristics is not the main purpose of this study. All simulations were run in MATLAB R2022b on a Dell Core i7 computer with 8 GB of RAM and a 1 TB HDD. Each meta-heuristic was run 10 times. The mean and standard deviation of the maximum coverage attained and the total time spent on the simulations are given in Table 8.
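Equation 14 is not reproduced in this appendix. Purely to illustrate what a maximum-coverage objective over a two-dimensional instance space can look like, the sketch below estimates by Monte Carlo the fraction of a bounding box covered by fixed-radius circles centred on the selected datasets' IS coordinates; the coordinates, radius, and bounds are synthetic assumptions of the example, not the exact formulation optimized in the paper. A meta-heuristic then searches over the choice of selected coordinates (i.e., datasets) to maximize this value.

```python
# Illustrative Monte-Carlo estimate of instance-space coverage by fixed-radius circles
# centred on a candidate benchmark selection (coordinates, radius and bounds are synthetic).
import numpy as np

rng = np.random.default_rng(0)
selected = rng.uniform(-4, 4, size=(10, 2))    # 2D IS coordinates of 10 selected datasets
radius = 1.0                                   # coverage radius of each dataset

def coverage(points, r, bounds=(-4, 4), n_samples=100_000):
    """Fraction of the square instance space lying within distance r of any selected point."""
    samples = rng.uniform(*bounds, size=(n_samples, 2))
    d2 = ((samples[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    return (d2.min(axis=1) <= r ** 2).mean()

print(f"Estimated coverage: {coverage(selected, radius):.3f}")
```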
LA was the most accurate technique, finding the best maximum coverage values on average (in bold in Table 8) with a lower standard deviation, followed by GA, PSO, and DE. However, it had the third-highest computational cost, behind DE and PSO. Since the algorithm is run in advance in order to select a benchmark that will be used multiple times, our choice was for the technique with the highest accuracy and the most stable results on this problem.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pereira, J.L.J., Smith-Miles, K., Muñoz, M.A. et al. Optimal selection of benchmarking datasets for unbiased machine learning algorithm evaluation. Data Min Knowl Disc 38, 461–500 (2024). https://doi.org/10.1007/s10618-023-00957-1