Abstract
Tumor heterogeneity is one of the challenges to study malignant tumors. In general, tumors are driven by combinations of mutated genes that vary greatly from patient to patient. Constructing cancer gene networks using omics data is of great significance for understanding the underlying mechanisms of cancer. In this work, we present a method for network embedding based on gene networks constructed by random forest (RFNE). Gene networks are constructed using random forests (RF) by integrating omics data from heterogeneous samples of different cancers. We assume that if genes are located on leaf nodes of the same tree, then their proximity is plus one. Summing all leaf nodes and dividing by the total number of trees in the forest yields the pairwise closeness between genes. A multi-layer weighted graph is then constructed and random roaming is used to capture distant but structurally similar gene pairs in the network. Finally, we use a network embedding approach to code genes and integrate mutation data from 5290 samples of 14 cancer types to construct patient features. We then used the lightGBM classification algorithm to make predictions for patients. Experimental results show that compared with other methods, our method has better performance than other three methods. We can separate patients with specific cancer types into several subtypes by applying an unsupervised clustering method to learn patient features. Among 14 cancer types. In most cancer types, patient subgroups revealed by RFNE are significantly linked with patient survival.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Code Availability:
The code of RFNE is available on GitHub: github.com/Feng Li12/RFNE.
References
Hanahan, D., Weinberg, R.A.: Hallmarks of cancer: the next generation. Cell 144(5), 646–674 (2011)
Dobin, A., et al.: STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1), 15–21 (2013)
Anders, S., Pyl, P.T., Huber, W.: HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31(2), 166–169 (2015)
Weinstein, J.N., et al.: The cancer genome atlas pan-cancer analysis project. Nat. Genet. 45(10), 1113–1120 (2013)
Zhao, L., Lee, V.H., Ng, M.K., Yan, H., Bijlsma, M.F.: Molecular subtyping of cancer: current status and moving toward clinical applications. Brief. Bioinform. 20(2), 572–584 (2019)
Cheng, F., Jia, P., Wang, Q., Lin, C.-C., Li, W.-H., Zhao, Z.: Studying tumorigenesis through network evolution and somatic mutational perturbations in the cancer interactome. Mol. Biol. Evol. 31(8), 2156–2169 (2014)
Liu, H., Zhao, R., Fang, H., Cheng, F., Fu, Y., Liu, Y.-Y.: Entropy-based consensus clustering for patient stratification. Bioinformatics 33(17), 2691–2698 (2017)
Network, C.G.A.R.: Integrated genomic analyses of ovarian carcinoma. Nature 474(7353), 609 (2011)
Levine, D.A.: Integrated genomic characterization of endometrial carcinoma. Nature 497(7447), 67–73 (2013)
Esteva, F.J., et al.: Prognostic role of a multigene reverse transcriptase-PCR assay in patients with node-negative breast cancer not receiving adjuvant systemic therapy. Clin. Cancer Res. 11(9), 3315–3319 (2005)
Hoadley, K.A., et al.: Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin. Cell 158(4), 929–944 (2014)
Koboldt, D., et al.: Comprehensive molecular portraits of human breast tumours. Nature 490(7418), 61–70 (2012)
Cheng, F., et al.: A gene gravity model for the evolution of cancer genomes: a study of 3,000 cancer genomes across 9 cancer types. PLoS Comput. Biol. 11(9), e1004497 (2015)
Hofree, M., Shen, J.P., Carter, H., Gross, A., Ideker, T.: Network-based stratification of tumor mutations. Nat Methods 10(11), 1108–1115 (2013)
Liu, C., Han, Z., Zhang, Z.-K., Nussinov, R., Cheng, F.: A network-based deep learning methodology for stratification of tumor mutations. Bioinformatics 37(1), 82–88 (2021)
Liu, C., et al.: Computational network biology: data, models, and applications. Phys. Rep. 846, 1–66 (2020)
Peng, J., Guan, J., Shang, X.: Predicting Parkinson’s disease genes based on node2vec and autoencoder. Front. Genet. 10, 226 (2019)
Zeng, X., et al.: Target identification among known drugs by deep learning from heterogeneous networks. Chem. Sci. 11(7), 1775–1797 (2020)
Zong, N., Kim, H., Ngo, V., Harismendy, O.: Deep mining heterogeneous networks of biomedical linked data to predict novel drug–target associations. Bioinformatics 33(15), 2337–2344 (2017)
Wang, B., et al.: Similarity network fusion for aggregating data types on a genomic scale. Nat. Methods 11(3), 333–337 (2014)
Lee, J.-H., et al.: Integrative analysis of mutational and transcriptional profiles reveals driver mutations of metastatic breast cancers. Cell Discov. 2(1), 1–14 (2016)
Breiman, L.: Random forests. Mach Learn 45(1), 5–32 (2001)
Chen, X., Liu, X.: A weighted bagging LightGBM model for potential lncRNA-disease association identification. In: Qiao, J., Zhao, X., Pan, L., Zuo, X., Zhang, X., Zhang, Q., Huang, S. (eds.) BIC-TA 2018. CCIS, vol. 951, pp. 307–314. Springer, Singapore (2018). https://doi.org/10.1007/978-981-13-2826-8_27
Dassun, J.C., Reyes, A., Yokoyama, H., Dolendo, M.: Ordering points to identify the clustering structure algorithm in fingerprint-based age classification. Virtutis Incunabula 2(1), 17–27 (2015)
Tomczak, K., Czerwińska, P., Wiznerowicz, M.: The Cancer Genome Atlas (TCGA): an immeasurable source of knowledge. Contemp. Oncol. 19(1A), A68 (2015)
Zhu, Y., Qiu, P., Ji, Y.: TCGA-assembler: open-source software for retrieving and processing TCGA data. Nat. Methods 11(6), 599–600 (2014)
Freund Y, Mason L: The alternating decision tree learning algorithm. In: icml: 1999. Citeseer: 124–133
Altman, N.S.: An introduction to kernel and nearest-neighbor nonparametric regression. Am. Stat. 46(3), 175–185 (1992)
Grover, A., Leskovec, J.: node2vec: scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
Ribeiro, L.F., Saverese, P.H., Figueiredo, D.R.: struc2vec: Learning node representations from structural identity. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 385–394 (2017)
Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD Workshop, Seattle, WA, USA, pp. 359–370 (1994)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:13013781 (2013)
Chen, T., et al.: Xgboost: extreme gradient boosting. R package version 04–2 1(4), 1–4 (2015)
Rao, H., et al.: Feature selection based on artificial bee colony and gradient boosting decision tree. Appl. Soft Comput. 74, 634–642 (2019)
Yang, S., Berdine, G.: The receiver operating characteristic (ROC) curve. Southwest Respiratory Critical Care Chronicles 5(19), 34–36 (2017)
He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., Wang, M.: Lightgcn: Simplifying and powering graph convolution network for recommendation. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 639–648 (2020)
Funding
This work has been supported by the National Natural Science Foundation of China (61902216, 61972236 and 61972226), and Natural Science Foundation of Shandong Province (No. ZR2018MF013).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sun, Z. et al. (2022). Construction of Gene Network Based on Inter-tumor Heterogeneity for Tumor Type Identification. In: Huang, DS., Jo, KH., Jing, J., Premaratne, P., Bevilacqua, V., Hussain, A. (eds) Intelligent Computing Theories and Application. ICIC 2022. Lecture Notes in Computer Science, vol 13394. Springer, Cham. https://doi.org/10.1007/978-3-031-13829-4_29
Download citation
DOI: https://doi.org/10.1007/978-3-031-13829-4_29
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-13828-7
Online ISBN: 978-3-031-13829-4
eBook Packages: Computer ScienceComputer Science (R0)