Abstract
Advances in high-throughput technologies and robust experimental designs have led many recent studies to incorporate heterogeneous data obtained from multiple technologies in order to improve our understanding of the molecular dynamics associated with biological processes. Currently available technologies produce large volumes of widely varied data spanning genomics, transcriptomics, proteomics, and epigenetics. Because such multi-omics data are very diverse and come from different biological levels, developing a model that properly integrates all available and relevant data to advance biomedical research has been a major research challenge. Many researchers have argued that integrating multi-omics data to extract relevant biological information is currently one of the major challenges in biomedical informatics. This paper proposes a new graph database model to efficiently store and mine multi-omics data. We present a working model of this graph database with transcriptomics, genomics, epigenetics, and clinical data for three cancer types from The Cancer Genome Atlas. Moreover, we highlight the usefulness of graph database mining for extracting relevant biological interpretations and for finding novel relationships between different data levels.
Acknowledgment
This work was partly funded by the System Science Grant supported by the Nebraska Research Initiative (NRI).
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Thapa, I., Ali, H. (2020). A New Graph Database System for Multi-omics Data Integration and Mining Complex Biological Information. In: Măndoiu, I., Murali, T., Narasimhan, G., Rajasekaran, S., Skums, P., Zelikovsky, A. (eds) Computational Advances in Bio and Medical Sciences. ICCABS 2019. Lecture Notes in Computer Science, vol 12029. Springer, Cham. https://doi.org/10.1007/978-3-030-46165-2_14
DOI: https://doi.org/10.1007/978-3-030-46165-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46164-5
Online ISBN: 978-3-030-46165-2
eBook Packages: Computer Science (R0)