Abstract
We discuss the application of a machine learning method, Random Forest (RF), to the extraction of relevant biological knowledge from metabolomics fingerprinting experiments. RF margins and variable significance are discussed alongside prediction accuracy to provide insight into model generalisability and explanatory power. A method is described for detecting relevant features while conserving the redundant structure of the fingerprint data. The methodology is illustrated using two electrospray ionisation mass spectrometry datasets, from 27 Arabidopsis genotypes and from a set of transgenic potato lines.
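To make the notion of an RF margin concrete, the following minimal sketch assumes the commonly used definition: for each sample, the share of tree votes cast for the true class minus the largest share cast for any other class. The function name `rf_margins` and the vote shares are hypothetical and are not taken from the chapter; in practice the vote shares would come from the out-of-bag predictions of a fitted forest.

```python
from typing import List, Sequence


def rf_margins(vote_fractions: Sequence[Sequence[float]],
               true_labels: Sequence[int]) -> List[float]:
    """Per-sample margin: vote share of the true class minus the largest
    vote share among the other classes. Values lie in [-1, 1]."""
    margins = []
    for votes, label in zip(vote_fractions, true_labels):
        if len(votes) < 2:
            raise ValueError("need at least two classes")
        # Strongest competing class, excluding the true one
        best_other = max(v for k, v in enumerate(votes) if k != label)
        margins.append(votes[label] - best_other)
    return margins


# Hypothetical vote shares for three samples over three classes
votes = [
    [0.80, 0.15, 0.05],  # confidently correct
    [0.40, 0.35, 0.25],  # correct, but only narrowly
    [0.20, 0.70, 0.10],  # misclassified (true class is 0)
]
labels = [0, 0, 0]

margins = rf_margins(votes, labels)
mean_margin = sum(margins) / len(margins)
print(margins, mean_margin)
```

A positive margin means the sample is classified correctly, and a value near 1 means the forest is close to unanimous. Two models with the same accuracy can therefore differ in their margin distributions, which is why margins are informative about generalisability beyond the error rate alone.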
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Enot, D.P., Beckmann, M., Draper, J. (2006). On the Interpretation of High Throughput MS Based Metabolomics Fingerprints with Random Forest. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_22
DOI: https://doi.org/10.1007/11875741_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)