Data Integration and Knowledge Discovery in Life Sciences

  • Conference paper
Trends in Applied Intelligent Systems (IEA/AIE 2010)

Abstract

Recent advances in various omics technologies have generated huge amounts of data. To fully exploit these data sets, many of which are publicly available, robust computational methodologies need to be developed for their storage, integration, analysis, visualization, and dissemination. In this paper, we describe some of our research activities in data integration leading to novel knowledge discovery in life sciences. Our multi-strategy approach, which integrates prior knowledge, provides a novel means to identify informative genes that could be missed by commonly used methods. Our transcriptomics-proteomics integrative framework serves to enhance confidence in, and to complement, transcriptomics discoveries. Our new research direction in integrative analysis of omics data aims to identify molecular associations with disease and therapeutic response signatures. The ultimate goal of this research is to facilitate the development of clinical test kits for early detection, accurate diagnosis and prognosis of disease, and better personalized therapeutic management.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Famili, F., Phan, S., Fauteux, F., Liu, Z., Pan, Y. (2010). Data Integration and Knowledge Discovery in Life Sciences. In: García-Pedrajas, N., Herrera, F., Fyfe, C., Benítez, J.M., Ali, M. (eds) Trends in Applied Intelligent Systems. IEA/AIE 2010. Lecture Notes in Computer Science, vol 6098. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13033-5_11

  • DOI: https://doi.org/10.1007/978-3-642-13033-5_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13032-8

  • Online ISBN: 978-3-642-13033-5

  • eBook Packages: Computer Science (R0)
