DOI: 10.1145/1383529.1383535
research-article

SWARM: a scientific workflow for supporting bayesian approaches to improve metabolic models

Published: 23 June 2008

ABSTRACT

With the exponential growth in the number of complete genome sequences, the analysis of these sequences is becoming a powerful approach to building genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and eventually to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive. As a result, far fewer genome-scale metabolic models are available than the hundreds of genome sequences available. To tackle this problem, we design SWARM, a scientific workflow that can be used to improve genome-scale metabolic models in a high-throughput fashion. SWARM deals with a range of issues, including the integration of data across distributed resources, data format conversion, data updates, and data provenance. Taken together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble the predictors into an ensemble, running inference on the ensemble to fill in missing data, and finally improving draft metabolic networks automatically. By speeding up metabolic model construction, SWARM enables scientists to generate many genome-scale metabolic models within a short period of time and with less effort.
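
The ensemble step can be illustrated with a minimal sketch. The abstract does not say which Bayesian technique SWARM uses, so the following assumes a naive Bayes combination: each predictor casts a yes/no vote on whether a candidate gene fills a missing reaction, the votes are treated as conditionally independent, and each predictor's true-positive and false-positive rates are assumed to have been estimated from the training datasets. The function name, parameters, and numbers are hypothetical and are not taken from the paper.

```python
def posterior_probability(prior, predictors, votes):
    """Combine binary predictor votes into a posterior probability.

    prior      -- assumed prior probability that the candidate is correct
    predictors -- list of (true_positive_rate, false_positive_rate) pairs,
                  assumed to be estimated from training data
    votes      -- list of booleans, one vote per predictor

    Assumes the predictors are conditionally independent (naive Bayes),
    which is an illustrative simplification.
    """
    # Work in odds form: posterior odds = prior odds * product of likelihood ratios.
    odds = prior / (1.0 - prior)
    for (tpr, fpr), vote in zip(predictors, votes):
        if vote:
            odds *= tpr / fpr                   # evidence in favor
        else:
            odds *= (1.0 - tpr) / (1.0 - fpr)   # evidence against
    return odds / (1.0 + odds)


# Hypothetical example: two predictors of equal quality, both voting "yes".
example = posterior_probability(0.5, [(0.8, 0.2), (0.8, 0.2)], [True, True])
```

Under this assumption, agreement between predictors multiplies the odds, so two moderately reliable predictors that agree give a stronger result than either one alone. A candidate whose posterior exceeds a chosen threshold could then be proposed as a fill-in for the draft network.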


Published in

CLADE '08: Proceedings of the 6th international workshop on Challenges of large applications in distributed environments
June 2008
74 pages
ISBN: 9781605581569
DOI: 10.1145/1383529

Copyright © 2008 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States
