Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms before Choosing

Soares, Carlos; Petrak, Johann; Brazdil, Pavel

doi:10.1007/3-540-45329-6_12


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2258)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence


Abstract

When they need to select the most appropriate algorithm to apply to a new data set, data analysts often follow an approach comparable to test-driving cars before deciding which one to buy: they apply the algorithms to a sample of the data to quickly obtain rough estimates of their performance. These estimates are then used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach that builds on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data, used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms. RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-based Relative Landmarks, is better for ranking than traditional data characterization measures.
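The test-driving idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the two stand-in learners, the holdout split, and every function name below are assumptions made for illustration. The abstract does not say how relative landmarks aggregate the individual landmarks, so a plain ranking by sample accuracy is used here as a placeholder.

```python
import random


def majority_classifier(train):
    """Return a predictor that always outputs the most frequent training label."""
    labels = [y for _, y in train]
    top = max(set(labels), key=labels.count)
    return lambda x: top


def nearest_neighbor_classifier(train):
    """Return a 1-nearest-neighbour predictor (squared Euclidean distance)."""
    def predict(x):
        nearest = min(
            train,
            key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)),
        )
        return nearest[1]
    return predict


def holdout_accuracy(learner, data, rng, test_fraction=0.3):
    """Train on one part of `data`, report accuracy on the held-out rest."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    train, test = shuffled[:cut], shuffled[cut:]
    predict = learner(train)
    return sum(predict(x) == y for x, y in test) / len(test)


def sampling_landmarks(learners, data, sample_size, seed=0):
    """Estimate each learner's accuracy on a small random sample of the data."""
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    return {
        name: holdout_accuracy(learner, sample, rng)
        for name, learner in learners.items()
    }


def relative_ranks(landmarks):
    """Placeholder aggregation (an assumption): rank learners by sample accuracy."""
    ordered = sorted(landmarks, key=landmarks.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def make_toy_data(n, seed=1):
    """Hypothetical data set: two well-separated Gaussian blobs in 2-D."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 3.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


LEARNERS = {
    "majority": majority_classifier,
    "1-nn": nearest_neighbor_classifier,
}

data = make_toy_data(2000)
# "Test-drive" every learner on 200 of the 2000 examples only.
landmarks = sampling_landmarks(LEARNERS, data, sample_size=200)
ranks = relative_ranks(landmarks)
print(landmarks, ranks)
```

In this sketch only the top-ranked learner would then be run on the full data set, which is the saving the test-driving analogy points at. The sample size of 200 is an arbitrary choice for the toy data, not a value taken from the paper.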



Author information

Authors and Affiliations

LIACC/FEP, University of Porto, Porto
Carlos Soares & Pavel Brazdil
Austrian Research Institute for Artificial Intelligence, Austria
Johann Petrak


Editor information

Editors and Affiliations

Faculty of Economics LIACC, Laboratório de Inteligência Artificial e Ciência de Computadores, University of Porto, Rua do Campo Alegre, 823, 4150-180, Porto, Portugal
Pavel Brazdil & Alípio Jorge


About this paper

Cite this paper

Soares, C., Petrak, J., Brazdil, P. (2001). Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms before Choosing. In: Brazdil, P., Jorge, A. (eds) Progress in Artificial Intelligence. EPIA 2001. Lecture Notes in Computer Science, vol 2258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45329-6_12


DOI: https://doi.org/10.1007/3-540-45329-6_12
Published: 23 April 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43030-8
Online ISBN: 978-3-540-45329-1
eBook Packages: Springer Book Archive
