Abstract
In recent years, the high availability of omic data has greatly expanded data-driven biology. However, the joint analysis of different data sources remains an open challenge. A few multi-omic approaches have been proposed in the literature, but none of them takes into account the intrinsic topology of each omic. This work proposes an unsupervised learning method based on deep neural networks. A separate network is trained for each omic, and the network outputs are fused into a single graph; for this purpose, an innovative loss function has been designed to better represent the data cluster manifolds. A graph adjacency matrix is used to determine similarities among samples. With this approach, omics with different numbers of features are merged into a single representation. Quantitative and qualitative analyses show that the proposed method achieves results comparable to the state of the art. The method is intrinsically flexible, since it can be customized according to the complexity of the task, and it leaves considerable room for improvement compared with more fine-tuned methods, opening the way for future research.
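The idea of merging omics with different feature counts through a shared sample graph can be illustrated with a minimal sketch. It is not the method of this chapter: the per-omic networks are replaced by fixed random linear projections, the loss function is omitted, and the fusion rule (averaging k-nearest-neighbour adjacency matrices) is an assumption chosen for simplicity. The names `knn_adjacency`, `fuse_omics`, and `embed` are hypothetical.

```python
import numpy as np


def knn_adjacency(embedding, k):
    """Symmetric k-nearest-neighbour adjacency matrix for one omic."""
    n = embedding.shape[0]
    # Pairwise squared Euclidean distances between samples.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1.0
    return np.maximum(adj, adj.T)  # make the graph undirected


def fuse_omics(embeddings, k=3):
    """Average per-omic adjacency matrices (assumed fusion rule)."""
    return np.mean([knn_adjacency(e, k) for e in embeddings], axis=0)


rng = np.random.default_rng(0)
n_samples = 20

# Two hypothetical omics measured on the same samples,
# with very different numbers of features.
omic_a = rng.normal(size=(n_samples, 500))
omic_b = rng.normal(size=(n_samples, 50))


def embed(x, dim=8):
    """Stand-in for a trained per-omic network: random linear projection."""
    return x @ rng.normal(size=(x.shape[1], dim))


fused = fuse_omics([embed(omic_a), embed(omic_b)], k=3)
print(fused.shape)
```

Whatever the number of features in each omic, every adjacency matrix has shape (samples × samples), which is what makes a single fused representation possible. Each entry of `fused` is the fraction of omics in which two samples are neighbours.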
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Barbiero, P. et al. (2020). Unsupervised Multi-omic Data Fusion: The Neural Graph Learning Network. In: Huang, DS., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2020. Lecture Notes in Computer Science, vol 12463. Springer, Cham. https://doi.org/10.1007/978-3-030-60799-9_15
DOI: https://doi.org/10.1007/978-3-030-60799-9_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60798-2
Online ISBN: 978-3-030-60799-9
eBook Packages: Computer Science (R0)