ABSTRACT
With the exponential growth in the number of complete genome sequences, sequence analysis is becoming a powerful way to build genome-scale metabolic models. These models can be used to study individual molecular components and their relationships, and ultimately to study cells as systems. However, constructing genome-scale metabolic models manually is time-consuming and labor-intensive. As a result, far fewer genome-scale metabolic models exist than the hundreds of genome sequences now available. To tackle this problem, we designed SWARM, a scientific workflow that can be used to improve genome-scale metabolic models in a high-throughput fashion. SWARM addresses a range of issues, including the integration of data across distributed resources, data format conversion, data updates, and data provenance. Taken together, SWARM streamlines the whole modeling process: extracting data from various resources, deriving training datasets to train a set of predictors, applying Bayesian techniques to assemble the predictors, running inference on the ensemble of predictors to fill in missing data, and finally improving draft metabolic networks automatically. By improving metabolic model construction in this way, SWARM enables scientists to generate many genome-scale metabolic models in a short period of time and with less effort.
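The step of assembling predictors with Bayesian techniques can be illustrated with a minimal sketch. This is not SWARM's implementation, since the abstract does not specify the method. The sketch assumes that each predictor reports a likelihood ratio for a candidate gene filling a missing reaction, and that the predictors are conditionally independent (a naive Bayes assumption). The function names, gene labels, prior, and ratios are all hypothetical.

```python
from math import prod


def posterior_from_evidence(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine independent evidence with Bayes' rule in odds form.

    prior: assumed prior probability that the candidate fills the gap.
    likelihood_ratios: one ratio P(evidence | fills) / P(evidence | does not)
        per predictor; values are illustrative, not taken from the paper.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Under conditional independence the ratios simply multiply.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


def rank_candidates(prior: float, evidence: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every candidate gene and sort from most to least probable."""
    scores = {
        gene: posterior_from_evidence(prior, ratios)
        for gene, ratios in evidence.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidates for one missing reaction, each with the
    # likelihood ratios reported by three hypothetical predictors.
    example_evidence = {
        "geneA": [4.0, 2.5, 1.2],
        "geneB": [0.8, 1.1, 0.9],
        "geneC": [6.0, 0.5, 3.0],
    }
    for gene, probability in rank_candidates(0.05, example_evidence):
        print(f"{gene}: {probability:.3f}")
```

The odds form keeps the combination to a single multiplication per predictor. If the predictors are correlated, the independence assumption overstates the combined evidence, and a model that accounts for the dependence would be more appropriate.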
Index Terms
- SWARM: a scientific workflow for supporting Bayesian approaches to improve metabolic models