Abstract
Cancer is one of the deadliest diseases in the world. Accurate cancer subtype classification is critical for patient diagnosis, treatment, and prognosis. The growing volume of multi-omics data describes patients from different views and provides complementary information for cancer subtype identification. However, omics data types generally follow different distributions and are high-dimensional, so integrating them effectively to classify cancer subtypes accurately remains a challenge. This work proposes MCRGCN, a method that integrates multi-omics data through supervised graph contrastive learning to classify cancer subtypes. The method considers both the distinct feature distribution of each omics data type and the interactions among features of different omics data types to improve classification accuracy. MCRGCN first constructs a separate sample network from each omics data type. It then feeds each omics feature matrix, together with the adjacency matrix of the corresponding sample network, into its own residual graph convolution model to obtain omics-specific sample features. These models are trained with a supervised contrastive loss that encourages the features of a sample to be as consistent as possible across omics. Finally, the omics-specific features of each sample are combined and passed to a classifier to predict the cancer subtype. We applied MCRGCN to the invasive breast carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets, integrating gene expression, miRNA expression, and DNA methylation data. The results show that our model outperforms other methods in integrating multi-omics data. Moreover, survival analysis shows that the cancer subtypes identified by our model are clinically significant. Furthermore, our model can help to identify potential biomarkers and pathways associated with cancer subtypes.
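The steps outlined in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation (their code is linked under Data availability), and several choices are assumptions made for illustration only: the k-nearest-neighbour cosine graph, the symmetric GCN normalisation, the single square-weight residual layer, the temperature value, and every function name below are hypothetical. The loss follows the general supervised contrastive form, in which all samples sharing a label across the stacked omics views count as positives.

```python
import numpy as np

def knn_cosine_adjacency(x, k):
    """Assumed graph construction: symmetric k-NN graph from cosine similarity."""
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)             # a sample is not its own neighbour
    idx = np.argsort(-sim, axis=1)[:, :k]      # k most similar samples per row
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, idx.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # make the graph undirected

def normalize_adjacency(adj):
    """D^{-1/2} (A + I) D^{-1/2}, the common GCN normalisation."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def residual_gcn_layer(a_norm, h, w):
    """One graph convolution (ReLU) plus a skip connection; w is square here."""
    return np.maximum(a_norm @ h @ w, 0.0) + h

def supervised_contrastive_loss(z, labels, temperature=0.5):
    """Pull same-label embeddings together, push different-label ones apart."""
    z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Log-softmax of each anchor over all other samples.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    pos = (labels[:, None] == labels[None, :]) & not_self
    n_pos = pos.sum(axis=1)
    has_pos = n_pos > 0                        # skip anchors without positives
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] / n_pos[has_pos]
    return per_anchor.mean()

# Toy data: 6 samples, 2 subtypes, 2 random "omics" matrices (purely synthetic).
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
views = [rng.normal(size=(6, 8)), rng.normal(size=(6, 8))]

embeddings = []
for x in views:
    a_norm = normalize_adjacency(knn_cosine_adjacency(x, k=2))
    w = rng.normal(scale=0.1, size=(8, 8))     # untrained weights, for shape only
    embeddings.append(residual_gcn_layer(a_norm, x, w))

# Stack the views so same-label samples from every omics act as positives.
loss = supervised_contrastive_loss(np.vstack(embeddings), np.tile(labels, len(views)))
fused = np.hstack(embeddings)                  # combined features for a classifier
```

In a real model the weights would be learned by minimising this loss together with a classification loss; the sketch only shows how the pieces fit together and what shapes flow between them.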
Data availability
Data and code are publicly available via https://github.com/weiba/MCRGCN.
Funding
This research is supported by the National Natural Science Foundation of China (No. 61972185), the Natural Science Foundation of Yunnan Province of China (2019FA024), and the Yunnan Ten Thousand Talents Plan young.
Ethics declarations
Conflict of interest
The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Not applicable.
Informed consent
Not applicable.
About this article
Cite this article
Chen, F., Peng, W., Dai, W. et al. Supervised graph contrastive learning for cancer subtype identification through multi-omics data integration. Health Inf Sci Syst 12, 12 (2024). https://doi.org/10.1007/s13755-024-00274-x
DOI: https://doi.org/10.1007/s13755-024-00274-x