Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of “-Omics” Data and Systems Biology

Butler, Greg

doi:10.1007/978-3-642-39437-9_10

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 7970)

Included in the following conference series:

International Conference on Data Integration in the Life Sciences

Abstract

The context for bioinformatics continues to change as new technology brings more varied data in greater volume. We present the preliminary design of a pipeline for functional annotation of fungal genomes. Genome-wide functional annotation benefits from the variety and volume of data available from “-omics” technology, and benefits from the perspective of systems biology.

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada
Greg Butler

Editor information

Editors and Affiliations

Department of Computer Science and Applied Statistics, University of New Brunswick, E2L 4L5, Saint John, NB, Canada
Christopher J. O. Baker
Department of Computer Science, Concordia University, H3G 1M8, Montreal, QC, Canada
Greg Butler
Ontario Cancer Institute, University of Toronto, M5G 1L7, Toronto, ON, Canada
Igor Jurisica

About this paper

Cite this paper

Butler, G. (2013). Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of “-Omics” Data and Systems Biology. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_10

DOI: https://doi.org/10.1007/978-3-642-39437-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39436-2
Online ISBN: 978-3-642-39437-9
eBook Packages: Computer Science, Computer Science (R0)
