Abstract
The context for bioinformatics continues to change as new technologies produce data in greater variety and volume. We present the preliminary design of a pipeline for the functional annotation of fungal genomes. Genome-wide functional annotation benefits both from the variety and volume of data available from “-omics” technologies and from the perspective of systems biology.
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Butler, G. (2013). Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of “-Omics” Data and Systems Biology. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science, vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_10
DOI: https://doi.org/10.1007/978-3-642-39437-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39436-2
Online ISBN: 978-3-642-39437-9