A New Graph Database System for Multi-omics Data Integration and Mining Complex Biological Information

  • Conference paper
Computational Advances in Bio and Medical Sciences (ICCABS 2019)

Abstract

Advances in high-throughput technologies and robust experimental designs have led many recent studies to combine heterogeneous data from multiple technologies to improve our understanding of the molecular dynamics associated with biological processes. Currently available technologies produce large volumes of widely varied data spanning genomics, transcriptomics, proteomics, and epigenetics. Because such multi-omics data are highly diverse and come from different biological levels, developing a model that properly integrates all available and relevant data to advance biomedical research has been a major research challenge. Many researchers have argued that integrating multi-omics data to extract relevant biological information is currently one of the major challenges in biomedical informatics. This paper proposes a new graph database model to efficiently store and mine multi-omics data. We show a working model of this graph database with transcriptomics, genomics, epigenetics, and clinical data for three cancer types from The Cancer Genome Atlas. Moreover, we highlight the usefulness of graph database mining for extracting relevant biological interpretations and for finding novel relationships between different data levels.

Acknowledgment

This work was partly funded by the System Science Grant supported by the Nebraska Research Initiative (NRI).

Author information

Corresponding author

Correspondence to Hesham Ali.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Thapa, I., Ali, H. (2020). A New Graph Database System for Multi-omics Data Integration and Mining Complex Biological Information. In: Măndoiu, I., Murali, T., Narasimhan, G., Rajasekaran, S., Skums, P., Zelikovsky, A. (eds) Computational Advances in Bio and Medical Sciences. ICCABS 2019. Lecture Notes in Computer Science, vol 12029. Springer, Cham. https://doi.org/10.1007/978-3-030-46165-2_14

  • DOI: https://doi.org/10.1007/978-3-030-46165-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46164-5

  • Online ISBN: 978-3-030-46165-2

  • eBook Packages: Computer Science, Computer Science (R0)