Abstract
Interactive visual analytics tools support the quest for deeper insight into algorithm strengths and weaknesses, as revealed by studying algorithm performance on large collections of test problems. A recent advance is Instance Space Analysis, which visualizes the space occupied by the test datasets together with the performance of algorithms across that space. The strengths and weaknesses of algorithms can be assessed visually, and the adequacy of the test datasets can be scrutinized. This article presents the first Instance Space Analysis of regression problems in machine learning, considering the performance of 14 popular algorithms on 4,855 test datasets drawn from a variety of sources. The two-dimensional instance space is defined by measurable characteristics of regression problems, selected from over 26 candidate features. It makes visible the similarities and differences between test instances, along with the predictive performance of regression algorithms across the entire instance space. The purpose of creating this framework for visual analysis is twofold: to assess the capability and suitability of various regression techniques, and to reveal the bias, diversity, and level of difficulty of the regression problems popularly used by the community. This article demonstrates how the regression instance space provides insights into the strengths and weaknesses of regression algorithms, and into opportunities to diversify the benchmark test instances to support greater insights.
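The workflow the abstract describes — characterizing each test dataset by measurable features, projecting the feature vectors into a two-dimensional instance space, and comparing algorithm performance across that space — can be sketched in a few lines. The sketch below is a minimal illustration only, not the authors' ISA pipeline: synthetic regression datasets stand in for the 4,855 benchmark problems, five simple meta-features stand in for the 26 candidate features, a PCA projection stands in for ISA's purpose-built optimal projection, and two scikit-learn regressors stand in for the 14 algorithms studied.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)

def meta_features(X, y):
    # A few simple, measurable characteristics of a regression problem
    # (hypothetical stand-ins for the article's candidate features).
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.array([
        X.shape[0],                              # number of observations
        X.shape[1],                              # number of input features
        corr.max(),                              # strongest feature-target correlation
        corr.mean(),                             # average feature-target correlation
        np.std(y) / (abs(np.mean(y)) + 1e-9),    # coefficient of variation of the target
    ])

# A small collection of synthetic regression "test instances".
datasets = [
    make_regression(n_samples=rng.randint(50, 300),
                    n_features=rng.randint(2, 20),
                    noise=rng.uniform(0, 20), random_state=i)
    for i in range(30)
]

# Compute and standardize the meta-feature vectors.
F = np.array([meta_features(X, y) for X, y in datasets])
F = (F - F.mean(axis=0)) / F.std(axis=0)

# Project into a 2-D "instance space" (PCA here; ISA optimizes its own projection).
space = PCA(n_components=2).fit_transform(F)

# For each instance, record which algorithm performs better under cross-validation;
# plotting `space` colored by `winners` would reveal each algorithm's footprint.
winners = []
for X, y in datasets:
    scores = {name: cross_val_score(model, X, y, cv=3).mean()
              for name, model in [("ridge", Ridge()),
                                  ("tree", DecisionTreeRegressor(random_state=0))]}
    winners.append(max(scores, key=scores.get))

print(space.shape)   # each dataset is now a point in the 2-D instance space
```

Regions where one label dominates correspond to that algorithm's footprint; empty regions of the space indicate the kind of benchmark diversity gap the article aims to reveal.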
An Instance Space Analysis of Regression Problems