Bayesian Data Integration and Enrichment Analysis for Predicting Gene Function in Malaria

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5635)

Abstract

Malaria is one of the world’s deadliest diseases; its most lethal form is caused by the parasite Plasmodium falciparum. Sixty percent of P. falciparum genes have no known function, so new methods of gene function prediction are needed. To address this problem, we train a naïve Bayes classifier on multiple sources of data and then apply a modified version of the Gene Set Enrichment Analysis (GSEA) algorithm to predict gene function in P. falciparum. To define gene function, we exploit the hierarchical structure of the Gene Ontology, specifically its Biological Process category. We demonstrate the value of integrating multiple data sources by making accurate predictions for genes that cannot be annotated by simple sequence-similarity-based methods.
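The abstract does not specify the classifier's features or class label, so the following is only a minimal sketch of how a discrete naïve Bayes model can integrate several data sources. It assumes each source has been discretized into categorical values and that the label is binary (hypothetically, whether a pair of genes shares a Biological Process annotation). The class name `DiscreteNaiveBayes`, the feature names, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict


class DiscreteNaiveBayes:
    """Categorical naive Bayes with add-alpha (Laplace) smoothing.

    Hypothetical sketch: each example is a tuple with one categorical value
    per data source. None marks a missing source; under the naive Bayes
    independence assumption, marginalising it out just drops its term.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.class_counts = {c: 0 for c in self.classes}
        # counts[c][j][v]: class-c examples whose source j has value v
        self.counts = {c: [defaultdict(int) for _ in range(n_features)]
                       for c in self.classes}
        # values[j]: distinct observed values of source j (used for smoothing)
        self.values = [set() for _ in range(n_features)]
        for xs, c in zip(X, y):
            self.class_counts[c] += 1
            for j, v in enumerate(xs):
                if v is None:
                    continue
                self.counts[c][j][v] += 1
                self.values[j].add(v)
        return self

    def predict_proba(self, xs):
        n = sum(self.class_counts.values())
        log_post = {}
        for c in self.classes:
            lp = math.log(self.class_counts[c] / n)  # log prior P(c)
            for j, v in enumerate(xs):
                k = len(self.values[j])
                if v is None or k == 0:
                    continue  # missing or never-observed source: no evidence
                num = self.counts[c][j].get(v, 0) + self.alpha
                den = sum(self.counts[c][j].values()) + self.alpha * k
                lp += math.log(num / den)  # log P(source j = v | c)
            log_post[c] = lp
        # Normalise in log space for numerical stability.
        m = max(log_post.values())
        z = sum(math.exp(v - m) for v in log_post.values())
        return {c: math.exp(v - m) / z for c, v in log_post.items()}


# Hypothetical features per gene pair: (co-expression bin, shared protein domain)
X = [("high", "yes"), ("high", "yes"), ("high", "no"),
     ("low", "no"), ("low", "no"), ("low", "yes")]
y = [1, 1, 1, 0, 0, 0]  # 1 = pair shares a Biological Process term (assumed label)
nb = DiscreteNaiveBayes().fit(X, y)
print(nb.predict_proba(("high", "yes"))[1])  # 6/7, about 0.857
print(nb.predict_proba((None, "yes"))[1])    # expression data missing: about 0.6
```

The missing-value handling illustrates one reason naïve Bayes suits data integration: a gene with no data from one source can still be scored from the sources that are available.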

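The abstract states only that a modified version of GSEA is applied; the modification itself is not described here. For orientation, the following sketches the standard weighted running-sum enrichment score from the original GSEA method (Subramanian et al., 2005), plus a permutation p-value. Two points are assumptions: the p-value randomizes gene-set membership rather than permuting phenotype labels as the original GSEA does (a classifier-derived ranking has no phenotypes), and the intended use, ranking genes by a classifier score relative to a query gene and testing each GO Biological Process term, is illustrative only. Function names and toy data are hypothetical.

```python
import random


def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Weighted Kolmogorov-Smirnov-style running-sum enrichment score.

    ranked_genes must be sorted by score, highest first; scores[i] is the
    score of ranked_genes[i]. Returns the signed maximum deviation from zero.
    """
    hits = [g in gene_set for g in ranked_genes]
    n, n_hit = len(ranked_genes), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must be a non-empty proper subset of the ranking")
    # Hit steps are weighted by |score|^p and sum to 1 over the gene set.
    n_r = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if n_r == 0:
        raise ValueError("all gene-set scores are zero; use p=0")
    miss_step = 1.0 / (n - n_hit)  # misses are equally weighted, summing to 1
    running, es = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / n_r
        else:
            running -= miss_step
        if abs(running) > abs(es):
            es = running
    return es


def permutation_pvalue(ranked_genes, scores, gene_set, n_perm=1000, seed=0):
    """One-sided p-value for enrichment near the top of the ranking.

    Assumption: the null is built by drawing random gene sets of the same
    size (gene-set permutation), not by permuting phenotype labels.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked_genes, scores, gene_set)
    k = sum(g in gene_set for g in ranked_genes)
    exceed = 0
    for _ in range(n_perm):
        null_set = set(rng.sample(ranked_genes, k))
        if enrichment_score(ranked_genes, scores, null_set) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one to avoid p = 0


# Hypothetical ranking of genes by classifier score for one query gene
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
print(round(enrichment_score(ranked, scores, {"g1", "g2"}), 3))  # 1.0
print(round(enrichment_score(ranked, scores, {"g5", "g6"}), 3))  # -1.0
```

With `p=0` every hit step is equal and the statistic reduces to an unweighted Kolmogorov-Smirnov running sum; `p=1` is the weighting recommended by Subramanian et al.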
Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tedder, P.M.R., Bradford, J.R., Needham, C.J., McConkey, G.A., Bulpitt, A.J., Westhead, D.R. (2009). Bayesian Data Integration and Enrichment Analysis for Predicting Gene Function in Malaria. In: Ambos-Spies, K., Löwe, B., Merkle, W. (eds) Mathematical Theory and Computational Practice. CiE 2009. Lecture Notes in Computer Science, vol 5635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03073-4_47

  • DOI: https://doi.org/10.1007/978-3-642-03073-4_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03072-7

  • Online ISBN: 978-3-642-03073-4

  • eBook Packages: Computer Science, Computer Science (R0)
