Abstract
All around the globe, thousands of learning experiments are executed every day, only to be discarded after interpretation. Yet the information contained in these experiments may have uses beyond their original intent and, if properly stored, could be of great value to future research. In this paper, we hope to stimulate the development of such learning experiment repositories by providing a bird’s-eye view of how they can be created and used in practice, bringing together existing approaches and new ideas. We draw parallels with how experiments are curated in other sciences, and subsequently discuss how both the empirical and theoretical details of learning experiments can be expressed, organized and made universally accessible. Finally, we discuss a range of possible services such a resource can offer, either used directly or integrated into data mining tools.
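Here is a minimal sketch of keeping experiments in a queryable repository instead of discarding them. It uses Python's built-in `sqlite3` module and a hypothetical single-table schema. The table `experiment`, its columns, and the helper functions are assumptions for illustration only and are not the experiment database design described in the paper. Learner and dataset names are placeholders, and the accuracies are dummy values, not real results.

```python
import sqlite3

# Hypothetical, minimal schema: one row per learning experiment run.
# This is NOT the schema proposed in the paper; column names are illustrative.
SCHEMA = """
CREATE TABLE experiment (
    id         INTEGER PRIMARY KEY,
    algorithm  TEXT NOT NULL,   -- learner identifier
    parameters TEXT NOT NULL,   -- parameter settings, serialized as text
    dataset    TEXT NOT NULL,   -- dataset identifier
    evaluation TEXT NOT NULL,   -- evaluation procedure, e.g. '10-fold CV'
    accuracy   REAL NOT NULL    -- resulting predictive accuracy
)
"""


def open_repository() -> sqlite3.Connection:
    """Create an (in-memory) experiment repository with the toy schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def store(conn, algorithm, parameters, dataset, evaluation, accuracy):
    """Record one experiment so it can be reused after its original analysis."""
    conn.execute(
        "INSERT INTO experiment (algorithm, parameters, dataset, evaluation, accuracy) "
        "VALUES (?, ?, ?, ?, ?)",
        (algorithm, parameters, dataset, evaluation, accuracy),
    )


def best_algorithm_per_dataset(conn):
    """Reuse stored results: return (dataset, algorithm, accuracy) for the
    most accurate stored run on each dataset (ties return several rows)."""
    rows = conn.execute(
        """
        SELECT e.dataset, e.algorithm, e.accuracy
        FROM experiment AS e
        WHERE e.accuracy = (
            SELECT MAX(accuracy) FROM experiment WHERE dataset = e.dataset
        )
        ORDER BY e.dataset
        """
    )
    return [tuple(r) for r in rows]


if __name__ == "__main__":
    repo = open_repository()
    # Dummy placeholder runs, not real experimental results.
    store(repo, "learner_a", "{}", "dataset_x", "10-fold CV", 0.80)
    store(repo, "learner_b", "{}", "dataset_x", "10-fold CV", 0.90)
    store(repo, "learner_a", "{}", "dataset_y", "10-fold CV", 0.70)
    print(best_algorithm_per_dataset(repo))
```

A relational store is only one possible choice. Because every run is kept, new questions (such as which learner does best on which dataset) can be answered with a query rather than by rerunning experiments.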
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vanschoren, J., Blockeel, H., Pfahringer, B., Holmes, G. (2008). Organizing the World’s Machine Learning Information. In: Margaria, T., Steffen, B. (eds) Leveraging Applications of Formal Methods, Verification and Validation. ISoLA 2008. Communications in Computer and Information Science, vol 17. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88479-8_50
DOI: https://doi.org/10.1007/978-3-540-88479-8_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88478-1
Online ISBN: 978-3-540-88479-8
eBook Packages: Computer Science, Computer Science (R0)