An Instance Space Analysis of Regression Problems

Published: 27 March 2021

Abstract

The quest for greater insights into algorithm strengths and weaknesses, as revealed when studying algorithm performance on large collections of test problems, is supported by interactive visual analytics tools. A recent advance is Instance Space Analysis, which visualizes both the space occupied by the test datasets and the performance of algorithms across that space. The strengths and weaknesses of algorithms can be assessed visually, and the adequacy of the test datasets can be scrutinized through visual analytics. This article presents the first Instance Space Analysis of regression problems in Machine Learning, considering the performance of 14 popular algorithms on 4,855 test datasets from a variety of sources. The two-dimensional instance space is defined by measurable characteristics of regression problems, selected from over 26 candidate features. It enables the similarities and differences between test instances to be visualized, along with the predictive performance of regression algorithms across the entire instance space. The framework for visual analysis of an instance space serves two purposes: it allows the capability and suitability of various regression techniques to be assessed, and it visually reveals the bias, diversity, and level of difficulty of the regression problems popularly used by the community. This article shows how the regression instance space provides insights into the strengths and weaknesses of regression algorithms, and highlights opportunities to diversify the benchmark test instances to support greater insights.
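
To make the instance-space idea concrete, below is a minimal Python sketch of the workflow the abstract describes: compute meta-features for each regression dataset, record which algorithm performs best on it, and project the meta-feature matrix into two dimensions. Everything in the sketch is an illustrative assumption rather than the paper's method: the four meta-features, the two candidate regressors, and the use of PCA in place of the authors' performance-aware projection were chosen only to make the example self-contained and runnable.

```python
# A minimal, hypothetical sketch of the instance-space workflow:
# per-dataset meta-features, per-dataset algorithm performance, and a
# 2-D projection of the meta-feature matrix. PCA stands in for the
# paper's tailored projection; the meta-features are illustrative only.
import numpy as np
from scipy.stats import skew
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def meta_features(X, y):
    """Four simple, illustrative characteristics of a regression dataset."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return [np.log(X.shape[0]),   # dataset size
            np.log(X.shape[1]),   # dimensionality
            float(skew(y)),       # skewness of the target
            max(corr)]            # strongest feature-target correlation

# Stand-ins for a benchmark collection: 40 synthetic regression datasets.
datasets = [make_regression(n_samples=int(rng.integers(100, 500)),
                            n_features=int(rng.integers(5, 30)),
                            noise=float(rng.uniform(0.0, 20.0)),
                            random_state=i)
            for i in range(40)]

algorithms = {"linear": LinearRegression(),
              "tree": DecisionTreeRegressor(max_depth=5, random_state=0)}

features, winners = [], []
for X, y in datasets:
    features.append(meta_features(X, y))
    # Cross-validated RMSE of each algorithm on this dataset.
    rmse = {name: -cross_val_score(model, X, y, cv=3,
                                   scoring="neg_root_mean_squared_error").mean()
            for name, model in algorithms.items()}
    winners.append(min(rmse, key=rmse.get))

# Project the standardized meta-feature matrix onto two dimensions;
# each dataset becomes one point, labelled by its best algorithm.
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(features))
for (z1, z2), w in zip(Z[:5], winners[:5]):
    print(f"({z1:+.2f}, {z2:+.2f}) -> best: {w}")
```

A full Instance Space Analysis goes further than this sketch: the projection is chosen so that regions of strong and weak performance (algorithm "footprints") separate as cleanly as possible in the plane, and the resulting plot is used both to compare algorithms and to expose gaps and biases in the benchmark collections.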



Published in

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2 (Survey Paper and Regular Papers)
April 2021, 524 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3446665

Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 27 March 2021
• Accepted: 1 November 2020
• Revised: 1 July 2020
• Received: 1 February 2020
