An Instance Space Analysis of Regression Problems

Published: 27 March 2021

Abstract

The quest for greater insights into algorithm strengths and weaknesses, as revealed when studying algorithm performance on large collections of test problems, is supported by interactive visual analytics tools. A recent advance is Instance Space Analysis, which visualizes both the space occupied by the test datasets and the performance of algorithms across that space. The strengths and weaknesses of algorithms can be assessed visually, and the adequacy of the test datasets can be scrutinized through visual analytics. This article presents the first Instance Space Analysis of regression problems in Machine Learning, considering the performance of 14 popular algorithms on 4,855 test datasets from a variety of sources. The two-dimensional instance space is defined by measurable characteristics of regression problems, selected from over 26 candidate features. It enables the similarities and differences between test instances to be visualized, along with the predictive performance of regression algorithms across the entire instance space. The framework for visual analysis of an instance space serves two purposes: it allows the capability and suitability of various regression techniques to be assessed, and it visually reveals the bias, diversity, and level of difficulty of the regression problems popularly used by the community. This article shows how the regression instance space provides insights into the strengths and weaknesses of regression algorithms, and highlights opportunities to diversify the benchmark test instances to support greater insights.
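
To make the instance-space idea concrete, below is a minimal Python sketch of the workflow the abstract describes: compute meta-features for each regression dataset, record which algorithm performs best on it, and project the meta-feature matrix into two dimensions. Everything in the sketch is an illustrative assumption rather than the paper's method: the four meta-features, the two candidate regressors, and the use of PCA in place of the authors' performance-aware projection were chosen only to make the example self-contained and runnable.

```python
# A minimal, hypothetical sketch of the instance-space workflow:
# per-dataset meta-features, per-dataset algorithm performance, and a
# 2-D projection of the meta-feature matrix. PCA stands in for the
# paper's tailored projection; the meta-features are illustrative only.
import numpy as np
from scipy.stats import skew
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def meta_features(X, y):
    """Four simple, illustrative characteristics of a regression dataset."""
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return [np.log(X.shape[0]),   # dataset size
            np.log(X.shape[1]),   # dimensionality
            float(skew(y)),       # skewness of the target
            max(corr)]            # strongest feature-target correlation

# Stand-ins for a benchmark collection: 40 synthetic regression datasets.
datasets = [make_regression(n_samples=int(rng.integers(100, 500)),
                            n_features=int(rng.integers(5, 30)),
                            noise=float(rng.uniform(0.0, 20.0)),
                            random_state=i)
            for i in range(40)]

algorithms = {"linear": LinearRegression(),
              "tree": DecisionTreeRegressor(max_depth=5, random_state=0)}

features, winners = [], []
for X, y in datasets:
    features.append(meta_features(X, y))
    # Cross-validated RMSE of each algorithm on this dataset.
    rmse = {name: -cross_val_score(model, X, y, cv=3,
                                   scoring="neg_root_mean_squared_error").mean()
            for name, model in algorithms.items()}
    winners.append(min(rmse, key=rmse.get))

# Project the standardized meta-feature matrix onto two dimensions;
# each dataset becomes one point, labelled by its best algorithm.
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(features))
for (z1, z2), w in zip(Z[:5], winners[:5]):
    print(f"({z1:+.2f}, {z2:+.2f}) -> best: {w}")
```

A full Instance Space Analysis goes further than this sketch: the projection is chosen so that regions of strong and weak performance (algorithm "footprints") separate as cleanly as possible in the plane, and the resulting plot is used both to compare algorithms and to expose gaps and biases in the benchmark collections.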



Published in

ACM Transactions on Knowledge Discovery from Data, Volume 15, Issue 2 (Survey Paper and Regular Papers)
April 2021, 524 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3446665

Copyright © 2021 ACM


Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

• Published: 27 March 2021
• Accepted: 1 November 2020
• Revised: 1 July 2020
• Received: 1 February 2020
