Skip to main content

Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of “-Omics” Data and Systems Biology

  • Conference paper
Data Integration in the Life Sciences (DILS 2013)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 7970))

Included in the following conference series:

Abstract

The context for bioinformatics continues to change as new technology brings more varied data in greater volume. We present the preliminary design of a pipeline for functional annotation of fungal genomes. Genome-wide functional annotation benefits from the variety and volume of data available from “-omics” technology, and benefits from the perspective of systems biology.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 49.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Aziz, R.K., Bartels, D., Best, A.A., DeJongh, M., Disz, T., Edwards, R.A., Formsma, K., Gerdes, S., Glass, E.M., Kubal, M., Meyer, F., Olsen, G.J., Olson, R., Osterman, A.L., Overbeek, R.A., McNeil, L.K., Paarmann, D., Paczian, T., Parrello, B., Pusch, G.D., Reich, C., Stevens, R., Vassieva, O., Vonstein, V., Wilke, A., Zagnitkos, O.: The RAST server: rapid annotations using subsystems technology. BMC Genomics 9, 75 (2008)

    Article  Google Scholar 

  2. Friedberg, I.: Automated protein function prediction–the genomic challenge. Brief. Bioinform. 7(3), 225–242 (2006)

    Article  Google Scholar 

  3. Erdin, S., Lisewski, A.M., Lichtarge, O.: Protein function prediction: towards integration of similarity metrics. Curr. Opin. Struct. Biol. 21(2), 180–188 (2011)

    Article  Google Scholar 

  4. Galens, K., Daugherty, S., Creasy, H.H., Angiuoli, S., White, O., Wortman, J., Mahurkar, A., Giglio, M.G.: The IGS standard operating procedure for automated prokaryotic annotation. Stand. Genomic Sci. 4(2), 244–251 (2011)

    Article  Google Scholar 

  5. Mi, H., Muruganujan, A., Gaudet, P., Lewis, S., Thomas, P.D.: PANTHER version 7: improved phylogenetic trees, orthologs and collaboration with the Gene Ontology Consortium. Nucleic Acids Res. 38, D204–D210 (2010)

    Google Scholar 

  6. Ooi, H.S., Kwo, C.Y., Wildpaner, M., Sirota, F.L., Eisenhaber, B., Maurer-Stroh, S., Wong, W.C., Schleiffer, A., Schneider, G.: ANNIE: integrated de novo protein sequence annotation. Nucleic Acids Res. 37, W435–W440 (2009)

    Google Scholar 

  7. Martinez, D., Grigoriev, I.V., Salamov, A.A.: Annotation of fungal genomes. Proc. ANAS (Biol.) 65(5-6), 177–183 (2010)

    Google Scholar 

  8. Haas, B.J., Pearson, M.D., Cuomo, C.A., Wortman, J.R.: Approaches to fungal genome annotation. Mycology 2(3), 118–141 (2011)

    Google Scholar 

  9. Mewes, H.W., Frishman, D., Gregory, R., Mannhaupt, G., Mayer, K.F., Münsterkötter, M., Ruepp, A., Spannagl, M., Stümpflen, V., Rattei, T.: MIPS: analysis and annotation of genome information in 2007. Nucleic Acids Res. 36, D196–D201 (2008)

    Google Scholar 

  10. Martin, T., Durrens, P.: Génolevures: Policy for automated annotation of genome sequences, http://www.pasteur.fr/ip/resource/filecenter/document/01s-00004f-0e5/abstract-156.pdf

  11. Angiuoli, S.V., Matalka, M., Gussman, G., Galens, K., Vangala, M., Riley, D.R., Arze, C., White, J.R., White, O., Fricke, W.F.: CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinformatics 12, 356 (2011)

    Article  Google Scholar 

  12. Frishman, D.: Protein annotation at genomic scale: the current status. Chem. Rev. 107(8), 3448–3466 (2007)

    Article  Google Scholar 

  13. Hawkins, T., Kihara, D.: Function prediction of uncharacterized proteins. J. Bioinform. Comput. Biol. 5(1), 1–30 (2007)

    Article  Google Scholar 

  14. Janga, S.C., Moreno-Hagelsieb, G.: Network-based function prediction and interactomics: the case for metabolic enzymes. Metab. Eng. 13(1), 1–10 (2011)

    Article  Google Scholar 

  15. Watson, J.D., Laskowski, R.A., Thornton, J.M.: Predicting protein function from sequence and structural data. Curr. Opin. Struct. Biol. 15(3), 275–284 (2005)

    Article  Google Scholar 

  16. Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Mol. Systems Biol. 3, 88 (2007)

    Article  Google Scholar 

  17. Claudel-Renard, C., Faraut, T., Kahn, D.: Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. 31(22), 6633–6639 (2003)

    Article  Google Scholar 

  18. Ferrer, L., Dale, J.M., Karp, P.D.: A systematic study of genome context methods: calibration, normalization and combination. BMC Bioinformatics 11, 493 (2010)

    Article  Google Scholar 

  19. Lima, T., Coudert, E., Keller, G., Michoud, K., Rivoire, C., Bulliard, V., de Castro, E., Lachaize, C., Baratin, D., Phan, I., Bougueleret, L., Bairoch, A.: HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 37, D471–D478 (2009)

    Google Scholar 

  20. Kretschmann, E., Apweiler, R.: Automatic rule generation for protein annotation with the C4. data mining algorithm applied on SWISS-PROT. Bioinformatics 17(10), 920–926 (2001)

    Article  Google Scholar 

  21. Yu, G.X.: Ruleminer: a knowledge system for supporting high-throughput protein function annotations. J. Bioinform. Comput. Biol. 2(4), 615–637 (2004)

    Article  Google Scholar 

  22. Artamonova, I.I., Gelfand, M.S., Frishman, D.: Mining sequence annotation databanks for association patterns. Bioinformatics 21, iii49–iii57 (2005)

    Article  Google Scholar 

  23. Poptsova, M.S., Gogarten, J.P.: Using comparative genome analysis to identify problems in annotated microbial genomes. Microbiology 156(7), 1909–1917 (2010)

    Article  Google Scholar 

  24. Madupu, R., Dodson, R.J., Brinkac, L., Harkins, D., Durkin, S., Shrivastava, S., Sutton, G., Haft, D.: CharProtDB: a database of experimentally characterized protein annotations. Nucleic Acids Res. 40, D237–D241 (2012)

    Article  Google Scholar 

  25. Overbeek, R., Devine, D., Vonstein, V.: Curation is forever: comparative genomics approaches to functional annotation. Targets 2(4), 138–146 (2003)

    Article  Google Scholar 

  26. Overbeek, R., Begley, T., Butler, R.M., Choudhuri, J.V., Chuang, H.Y., Cohoon, M., de Crécy-Lagard, V., Diaz, N., Disz, T., Edwards, R., Fonstein, M., Frank, E.D., Gerdes, S., Glass, E.M., Goesmann, A., Hanson, A., Iwata-Reuyl, D., Jensen, R., Jamshidi, N., Krause, L., Kubal, M., Larsen, N., Linke, B., McHardy, A.C., Meyer, F., Neuweger, H., Olsen, G., Olson, R., Osterman, A., Portnoy, V., Pusch, G.D., Rodionov, D.A., Rückert, C., Steiner, J., Stevens, R., Thiele, I., Vassieva, O., Ye, Y., Zagnitko, O., Vonstein, V.: The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 33(17), 5691–5702 (2005)

    Article  Google Scholar 

  27. Kuzniar, A., van Ham, R.C., Pongor, S., Leunissen, J.A.: The quest for orthologs: finding the corresponding gene across genomes. Trends Genet. 24(11), 539–551 (2008)

    Article  Google Scholar 

  28. Kristensen, D.M., Wolf, Y.I., Mushegian, A.R., Koonin, E.V.: Computational methods for Gene Orthology inference. Brief. Bioinform. 12(5), 379–391 (2011)

    Article  Google Scholar 

  29. Engelhardt, B.E., Srouji, J.R., Brenner, S.E.: Genome-scale phylogenetic function annotation of large and diverse protein families. Genome Res. 21(11), 1969–1980 (2011)

    Article  Google Scholar 

  30. Hawkins, T., Luban, S., Kihara, D.: PFP: Automated prediction of gene ontology functional annotations with confidence scores using protein sequence data. Proteins 74(3), 566–582 (2009)

    Article  Google Scholar 

  31. Chitale, M., Hawkins, T., Park, C., Kihara, D.: ESG: extended similarity group method for automated protein function prediction. Bioinformatics 25(14), 1739–1745 (2009)

    Article  Google Scholar 

  32. Hawkins, T., Kihara, D.: Functional enrichment analyses and construction of functional similarity networks with high confidence function prediction by PFP. BMC Bioinformatics 11, 265 (2010)

    Article  Google Scholar 

  33. Santos, F., Boele, J., Teusink, B.: A practical guide to genome-scale metabolic models and their analysis. Methods Enzymol. 500, 509–532 (2011)

    Article  Google Scholar 

  34. Orth, J.D., Palsson, B.Ø.: Systematizing the generation of missing metabolic knowledge. Biotechnol. Bioeng. 107(3), 403–412 (2010)

    Article  Google Scholar 

  35. Karp, P.D., Krummenacker, M., Latendresse, M., Dale, J.M., Lee, T.J., Kaipa, P., Gilham, F., Spaulding, A., Popescu, L., Altman, T., Paulsen, I., Keseler, I.M., Caspi, R.: Pathway Tools version 13.0: integrated software for pathway/genome informatics and systems biology. Brief. Bioinform. 11(1), 40–79 (2010)

    Article  Google Scholar 

  36. Karp, P.D., Latendresse, M., Caspi, R.: The pathway tools pathway prediction algorithm. Stand. Genomic Sci. 5(3), 424–429 (2011)

    Article  Google Scholar 

  37. Dale, J.M., Popescu, L., Karp, P.D.: Machine learning methods for metabolic pathway prediction. BMC Bioinformatics 11, 15 (2010)

    Article  Google Scholar 

  38. Green, M.L., Karp, P.D.: A bayesian method for identifying missing enzymes in predicted metabolic pathway databases. BMC Bioinformatics 5, 76 (2004)

    Article  Google Scholar 

  39. Ferrer, L., Karp, P.D.: Discovering novel subsystems using comparative genomics. Bioinformatics 27(18), 2478–2485 (2011)

    Article  Google Scholar 

  40. Warde-Farley, D., Comes, O., Zuberi, K., Badrawi, R., Chao, P., Franz, M., Grouios, C., Kazi, F., Lopes, C.T., Maitland, A., Mostafavi, S., Montojo, J., Shao, O., Wright, G., Bader, G.D., Morris, Q.: The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010)

    Google Scholar 

  41. Zhou, D., Bousquet, O., Lal, T.N., Weston, J., Schölkopf, B.: Learning with local and global consistency. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) Neural Information Processing Systems 16. MIT Press (2004)

    Google Scholar 

  42. Tsuda, K., Shin, H.J., Schölkopf, B.: Fast protein classification with multiple networks. Bioinformatics 21(suppl. 2), ii59–ii65 (2005)

    Google Scholar 

  43. Mostafavi, S., Warde-Farley, D., Grouios, C., Morris, Q.: GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology 9(suppl. 1), S4 (2008)

    Google Scholar 

  44. Rattei, T., Arnold, R., Tischler, P., Lindner, D., Stümpflen, V., Mewes, H.W.: SIMAP: the similarity matrix of proteins. Nucleic Acids Res. 34, D252–D256 (2006)

    Article  Google Scholar 

  45. von Mering, C., Kuhn, M., Chaffron, S., Doerks, T., Krüger, B., Snel, B., Bork, P.: STRING 7–recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 35, D358–D362 (2007)

    Article  Google Scholar 

  46. Powell, S., Trachana, K., Roth, A., Kuhn, M., Muller, J., Arnold, R., Rattei, T., Letunic, I., Doerks, T., Jensen, L.J., von Mering, C., Bork, P.: eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges. Nucleic Acids Res. 40, D284–D289 (2012)

    Article  Google Scholar 

  47. Jensen, L.J., Stark, M., Chaffron, S., Creevey, C., Muller, J., Doerks, T., Julien, P., Roth, A., Simonovic, M., Bork, P., von Mering, C.: STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 37, D412–D416 (2009)

    Article  Google Scholar 

  48. Armengaud, J.: A perfect genome annotation is within reach with the proteomics and genomics alliance. Curr. Opin. Microbiol. 12(3), 292–300 (2009)

    Article  Google Scholar 

  49. Renuse, S., Chaerkady, R., Pandey, A.: Proteogenomics. Proteomics 11(4), 620–630 (2011)

    Article  Google Scholar 

  50. Castellana, N., Bafna, V.: Proteogenomics to discover the full coding content of genomes: a computational perspective. J. Proteomics 73(11), 2124–2135 (2010)

    Article  Google Scholar 

  51. Majoros, W.H.: Methods for Computational Gene Prediction. CUP (2007)

    Google Scholar 

  52. Stanke, M., Schöffmann, O., Morgenstern, B., Waack, S.: Gene prediction in eukaryotes with a generalized hidden markov model that uses hints from external sources. BMC Bioinformatics 7, 62 (2006)

    Article  Google Scholar 

  53. Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H.: SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat. Methods 8(10), 785–786 (2011)

    Article  Google Scholar 

  54. Käll, L., Krogh, A., Sonnhammer, E.L.: A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol. 338(5), 1027–1036 (2004)

    Article  Google Scholar 

  55. Emanuelsson, O., Nielsen, H., Brunak, S., von Heijne, G.: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J. Mol. Biol. 300(4), 1005–1016 (2000)

    Article  Google Scholar 

  56. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L.: Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305(3), 567–580 (2001)

    Article  Google Scholar 

  57. Horton, P., Park, K.J., Obayashi, T., Fujita, N., Harada, H., Adams-Collier, C.J., Nakai, K.: WoLF PSORT: protein localization predictor. Nucleic Acids Res. 35, W585–W587 (2007)

    Article  Google Scholar 

  58. Blum, T., Briesemeister, S., Kohlbacher, O.: MultiLoc2: integrating phylogeny and gene ontology terms improves subcellular protein localization prediction. BMC Bioinformatics 10, 274 (2009)

    Article  Google Scholar 

  59. Li, L., Stoeckert Jr., C.J., Roos, D.S.: OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13(9), 2178–2189 (2003)

    Article  Google Scholar 

  60. Ostlund, G., Schmitt, T., Forslund, K., Köstler, T., Messina, D.N., Roopra, S., Frings, O., Sonnhammer, E.L.: InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 38, D196–D203 (2010)

    Article  Google Scholar 

  61. Altenhoff, A.M., Schneider, A., Gonnet, G.H., Dessimoz, C.: OMA 2011: orthology inference among 1000 complete genomes. Nucleic Acids Res. 39, D289–D294 (2011)

    Article  Google Scholar 

  62. Kurtz, S., Phillippy, A., Delcher, A.L., Smoot, M., Shumway, M., Antonescu, C., Salzberg, S.L.: Versatile and open software for comparing large genomes. Genome Biol. 5(2), R12 (2004)

    Google Scholar 

  63. Soderlund, C., Nelson, W., Shoemaker, A., Paterson, A.: SyMAP: A system for discovering and viewing syntenic regions of fpc maps. Genome Res 16(9), 1159–1168 (2006)

    Article  Google Scholar 

  64. Green, M.L., Karp, P.D.: Using genome-context data to identify specific types of functional associations in pathway/genome databases. Bioinformatics 23(13), i205–i211 (2007)

    Article  Google Scholar 

  65. Notebaart, R.A., van Enckevort, F.H., Francke, C., Siezen, R.J., Teusink, B.: Accelerating the reconstruction of genome-scale metabolic networks. BMC Bioinformatics 7, 296 (2006)

    Article  Google Scholar 

  66. Plata, G., Fuhrer, T., Hsiao, T.L., Sauer, U., Vitkup, D.: Global probabilistic annotation of metabolic networks enables enzyme discovery. Nat. Chem. Biol. (September 9, 2012)

    Google Scholar 

  67. Murphy, C., Wu, M., Butler, G., Tsang, A.: Curation of characterized glycoside hydrolases of fungal origin. Database (May 26, 2011)

    Google Scholar 

  68. Cvijovic, M., Olivares-Hernández, R., Agren, R., Dahr, N., Vongsangnak, W., Nookaew, I., Patil, K.R., Nielsen, J.: BioMet toolbox: genome-wide analysis of metabolism. Nucleic Acids Res. 38, W144–W149 (2010)

    Article  Google Scholar 

  69. Brown, D.P., Krishnamurthy, N., Sjölander, K.: Automated protein subfamily identification and classification. PLoS Comput. Biol. 3(8), e160 (2007)

    Google Scholar 

  70. Plewniak, F., Bianchetti, L., Brelivet, Y., Carles, A., Chalmel, F., Lecompte, O., Mochel, T., Moulinier, L., Muller, A., Muller, J., Prigent, V., Ripp, R., Thierr, J.C., Thompson, D.T., Wicker, N., Poch, O.: PipeAlign: A new toolkit for protein family analysis. Nucleic Acids Res. 31(13), 3829–3832 (2003)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Butler, G. (2013). Putting It All Together: The Design of a Pipeline for Genome-Wide Functional Annotation of Fungi in the Modern Era of “-Omics” Data and Systems Biology. In: Baker, C.J.O., Butler, G., Jurisica, I. (eds) Data Integration in the Life Sciences. DILS 2013. Lecture Notes in Computer Science(), vol 7970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39437-9_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-39437-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39436-2

  • Online ISBN: 978-3-642-39437-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics