On the Interpretation of High Throughput MS Based Metabolomics Fingerprints with Random Forest

Enot, David P.; Beckmann, Manfred; Draper, John

doi:10.1007/11875741_22

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 4216)

Included in the following conference series:

International Symposium on Computational Life Science

Abstract

We discuss the application of a machine learning method, Random Forest (RF), to the extraction of relevant biological knowledge from metabolomics fingerprinting experiments. RF margins and variable significance are considered alongside prediction accuracy, since together they give insight into model generalisability and explanatory power. We describe a method for detecting relevant features while conserving the redundant structure of the fingerprint data. The methodology is illustrated using two electrospray ionisation mass spectrometry datasets: one from 27 Arabidopsis genotypes and one from a set of transgenic potato lines.
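To make the notion of an RF margin concrete, here is a minimal sketch that assumes the usual definition of the margin: for one sample, the fraction of trees voting for the true class minus the largest fraction voting for any other class. The function name `rf_margin`, the class labels, and the vote lists are hypothetical illustrations, not taken from the paper, and this is not the authors' implementation.

```python
from collections import Counter


def rf_margin(votes, true_label):
    """Margin of one sample from per-tree votes (assumed definition).

    Fraction of trees voting for the true class minus the largest
    fraction voting for any single other class. The value lies in
    [-1, 1]; a negative value means the forest misclassifies the sample.
    """
    counts = Counter(votes)
    true_count = counts.get(true_label, 0)
    other_counts = [c for label, c in counts.items() if label != true_label]
    best_other = max(other_counts) if other_counts else 0
    return (true_count - best_other) / len(votes)


# Hypothetical votes from a 10-tree forest for three samples,
# all of which truly belong to class "wt".
tree_votes = [
    ["wt"] * 9 + ["mutant"] * 1,  # confident and correct
    ["wt"] * 5 + ["mutant"] * 5,  # tie between the two classes
    ["wt"] * 3 + ["mutant"] * 7,  # misclassified
]
true_labels = ["wt", "wt", "wt"]

margins = [rf_margin(v, y) for v, y in zip(tree_votes, true_labels)]
print(margins)  # [0.8, 0.0, -0.4]
```

Two models with the same prediction accuracy can differ in their margins, which is one reason margins are informative about how confidently a model separates the classes.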

Author information

Authors and Affiliations

Institute of Biological Sciences, University of Wales, Aberystwyth, SY23 3DA, UK
David P. Enot, Manfred Beckmann & John Draper

Editor information

Editors and Affiliations

Berkeley Initiative in Soft Computing (BISC), University of California at Berkeley, USA
Michael R. Berthold
Department of Chemistry, Unilever Centre for Molecular Informatics, Cambridge University, CB2 1EW, Cambridge, UK
Robert C. Glen
ALTANA Chair for Bioinformatics and Information Mining, University of Konstanz, Germany
Ingrid Fischer

About this paper

Cite this paper

Enot, D.P., Beckmann, M., Draper, J. (2006). On the Interpretation of High Throughput MS Based Metabolomics Fingerprints with Random Forest. In: Berthold, M.R., Glen, R.C., Fischer, I. (eds) Computational Life Sciences II. CompLife 2006. Lecture Notes in Computer Science, vol 4216. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875741_22

DOI: https://doi.org/10.1007/11875741_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45767-1
Online ISBN: 978-3-540-45768-8
eBook Packages: Computer Science, Computer Science (R0)
