Supervised graph contrastive learning for cancer subtype identification through multi-omics data integration

  • Research
Health Information Science and Systems

Abstract

Cancer is one of the deadliest diseases in the world. Accurate cancer subtype classification is critical for patient diagnosis, treatment, and prognosis. The growing volume of multi-omics data describes patients from different views and provides complementary information for cancer subtype identification. However, omics data types generally follow different distributions and are high-dimensional, so integrating them effectively to classify cancer subtypes accurately remains a challenge. This work proposes MCRGCN, a method based on supervised graph contrastive learning that integrates multi-omics data to classify cancer subtypes. The method accounts for both the distinct feature distribution of each omics data type and the interactions among features of different omics types. MCRGCN first constructs a separate sample network from each omics data type. It then feeds each omics feature matrix and the adjacency matrix of the corresponding sample network into its own residual graph convolution model to obtain omics-specific sample features; these models are trained with a supervised contrastive loss that encourages the features of a sample to be as consistent as possible across omics. Finally, the multi-omics features of each sample are combined and passed to a classifier to predict the cancer subtype. We applied MCRGCN to the invasive breast carcinoma (BRCA) and glioblastoma multiforme (GBM) datasets, integrating gene expression, miRNA expression, and DNA methylation data. The results show that our model outperforms the compared methods in integrating multi-omics data. Moreover, survival analysis indicates that the cancer subtypes identified by our model are clinically significant. Furthermore, our model can help identify potential biomarkers and pathways associated with cancer subtypes.
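
The pipeline summarized in the abstract (one sample network per omics type, a residual graph convolution per view, a supervised contrastive loss across views, then fused features for a classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not specify the similarity measure, the neighbourhood size, the exact form of the loss, or the fusion step, so the cosine k-nearest-neighbour graph, the single layer with a skip connection, the commonly used SupCon-style loss, and concatenation are all assumptions made for illustration, as are the function names and the toy data.

```python
import numpy as np

def knn_cosine_adjacency(X, k=3):
    """Symmetric k-NN sample graph from cosine similarity, with self-loops (assumed construction)."""
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)              # never pick a sample as its own neighbour
    idx = np.argsort(-S, axis=1)[:, :k]       # k most similar samples per row
    A = np.zeros(S.shape)
    rows = np.repeat(np.arange(X.shape[0]), k)
    A[rows, idx.ravel()] = 1.0
    A = np.maximum(A, A.T)                    # symmetrise
    np.fill_diagonal(A, 1.0)                  # add self-loops
    return A

def normalize_adjacency(A):
    """Symmetric normalisation D^{-1/2} A D^{-1/2}."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def residual_gcn_layer(A_hat, H, W):
    """One graph convolution with a skip connection: ReLU(A_hat H W) + H (W must be square)."""
    return np.maximum(A_hat @ H @ W, 0.0) + H

def supervised_contrastive_loss(Z, y, temperature=0.5):
    """SupCon-style loss: rows of Z with the same label are pulled together."""
    Z = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-12)
    logits = Z @ Z.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    not_self = ~np.eye(Z.shape[0], dtype=bool)
    denom = (np.exp(logits) * not_self).sum(axis=1, keepdims=True)
    log_prob = logits - np.log(denom)
    pos = (y[:, None] == y[None, :]) & not_self           # same-label pairs, excluding self
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0                                      # anchors with at least one positive
    per_anchor = -(log_prob * pos).sum(axis=1)[valid] / n_pos[valid]
    return per_anchor.mean()

# Toy data: two hypothetical "omics" views of 6 samples with different feature widths.
rng = np.random.default_rng(0)
y = np.array([0, 0, 0, 1, 1, 1])
views = [rng.normal(size=(6, 8)) + y[:, None], rng.normal(size=(6, 5)) - y[:, None]]

embeddings = []
for X in views:
    A_hat = normalize_adjacency(knn_cosine_adjacency(X, k=2))
    H = X @ rng.normal(scale=0.1, size=(X.shape[1], 4))    # project each view to a shared width
    H = residual_gcn_layer(A_hat, H, rng.normal(scale=0.1, size=(4, 4)))
    embeddings.append(H)

# Stacking the views with repeated labels makes same-subtype samples positives across omics.
Z = np.vstack(embeddings)
loss = supervised_contrastive_loss(Z, np.tile(y, len(views)))
fused = np.hstack(embeddings)                              # concatenated features for a classifier
print(float(loss), fused.shape)
```

The weights here are random and untrained; in practice they would be optimized jointly with a classification loss in an automatic differentiation framework. The sketch only shows how the pieces named in the abstract fit together.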

Data availability

Data and code are publicly available via https://github.com/weiba/MCRGCN.

Funding

This research is supported by the National Natural Science Foundation of China (No. 61972185), the Natural Science Foundation of Yunnan Province of China (2019FA024), and the Yunnan Ten Thousand Talents Plan young.

Author information

Corresponding author

Correspondence to Wei Peng.

Ethics declarations

Conflict of interest

The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Ethical approval

Not applicable.

Informed consent

Not applicable.

About this article

Cite this article

Chen, F., Peng, W., Dai, W. et al. Supervised graph contrastive learning for cancer subtype identification through multi-omics data integration. Health Inf Sci Syst 12, 12 (2024). https://doi.org/10.1007/s13755-024-00274-x

  • DOI: https://doi.org/10.1007/s13755-024-00274-x
