Organizing the World’s Machine Learning Information

  • Conference paper
Leveraging Applications of Formal Methods, Verification and Validation (ISoLA 2008)

Abstract

All around the globe, thousands of learning experiments are executed every day, only to be discarded after interpretation. Yet the information contained in these experiments might have uses beyond their original intent and, if properly stored, could be of great use to future research. In this paper, we hope to stimulate the development of such learning experiment repositories by providing a bird’s-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels with how experiments are curated in other sciences, and subsequently discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of possible services such a resource can offer, either used directly or integrated into data mining tools.

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. (2008). Organizing the World’s Machine Learning Information. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. ISoLA 2008. Communications in Computer and Information Science, vol 17. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88479-8_50

  • DOI: https://doi.org/10.1007/978-3-540-88479-8_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-88478-1

  • Online ISBN: 978-3-540-88479-8

  • eBook Packages: Computer Science, Computer Science (R0)