Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms before Choosing

  • Conference paper
Progress in Artificial Intelligence (EPIA 2001)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2258)

Abstract

When facing the need to select the most appropriate algorithm to apply to a new data set, data analysts often follow an approach that resembles test-driving cars before deciding which one to buy: they apply the algorithms to a sample of the data to quickly obtain rough estimates of their performance. These estimates are used to select one or a few of those algorithms to be tried out on the full data set. We describe sampling-based landmarks (SL), a systematization of this approach, building on earlier work on landmarking and sampling. SL are estimates of the performance of algorithms on a small sample of the data that are used as predictors of the performance of those algorithms on the full set. We also describe relative landmarks (RL), which address the inability of earlier landmarks to assess the relative performance of algorithms. RL aggregate landmarks to obtain predictors of relative performance. Our experiments indicate that the combination of these two improvements, which we call Sampling-Based Relative Landmarks, is better for ranking than traditional data characterization measures.
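
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two toy learners, the holdout split of the sample, the function names, and the ratio-to-best aggregation used for the relative landmarks are all assumptions made for illustration, and the aggregation used in the paper may differ.

```python
import random


def majority_classifier(train):
    """Hypothetical baseline learner: always predict the most frequent label."""
    labels = [y for _, y in train]
    top = max(sorted(set(labels)), key=labels.count)
    return lambda x: top


def nearest_neighbour(train):
    """Hypothetical 1-NN learner on a single numeric feature."""
    def predict(x):
        return min(train, key=lambda ex: abs(ex[0] - x))[1]
    return predict


def accuracy(model, test):
    """Fraction of test examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


def sampling_landmarks(data, learners, sample_size, seed=0):
    """Estimate each learner's accuracy on a small random sample of the data.

    Assumption: the sample is split in half, one half for training and
    one half for testing. Any other estimation scheme would work as well.
    """
    rng = random.Random(seed)
    sample = rng.sample(data, sample_size)
    half = sample_size // 2
    train, test = sample[:half], sample[half:]
    return {name: accuracy(fit(train), test) for name, fit in learners.items()}


def relative_landmarks(landmarks):
    """Turn absolute landmarks into relative ones.

    Assumption: each score is divided by the best score, so the best
    learner gets 1.0. This is one simple way to aggregate landmarks.
    """
    best = max(landmarks.values())
    return {name: (score / best if best > 0 else 0.0)
            for name, score in landmarks.items()}


def rank(relative):
    """Order learner names from most to least promising."""
    return sorted(relative, key=relative.get, reverse=True)


# Toy data set (an assumption): one feature, label is 1 when the feature exceeds 0.5.
DATA = [(i / 1000, int(i / 1000 > 0.5)) for i in range(1000)]
LEARNERS = {"majority": majority_classifier, "nearest_neighbour": nearest_neighbour}
```

A possible usage: `rank(relative_landmarks(sampling_landmarks(DATA, LEARNERS, sample_size=100)))` returns the learner names ordered by their relative landmark, and only the top one or few would then be run on the full data set.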

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soares, C., Petrak, J., Brazdil, P. (2001). Sampling-Based Relative Landmarks: Systematically Test-Driving Algorithms before Choosing. In: Brazdil, P., Jorge, A. (eds) Progress in Artificial Intelligence. EPIA 2001. Lecture Notes in Computer Science, vol 2258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45329-6_12

  • DOI: https://doi.org/10.1007/3-540-45329-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43030-8

  • Online ISBN: 978-3-540-45329-1

  • eBook Packages: Springer Book Archive
