Abstract
Network information is gaining importance in the generation of predictive models in cancer genomics, on the premise that prior biological knowledge lends the models interpretability and reproducibility, an invaluable contribution to precision medicine. This work evaluates the usefulness of accounting for gene network information, provided by the data correlation structure and by the external STRING database, when classifying Breast Invasive Carcinoma (BRCA) RNA-Seq data from The Cancer Genome Atlas (TCGA) into tumor and normal tissue by sparse logistic regression (SLR). Within the correlation-based approaches, two directions were investigated: first, imposing smaller penalties on hub genes, i.e., highly connected genes in the network (hubSLR); second, favouring the selection of orphan, or weakly correlated, genes (orphanSLR). Without loss of predictive ability, the genes selected by the different methods overlapped considerably, with relatively few genes selected exclusively by each method. Besides a consensus list of genes, the complementary sets of genes selected exclusively by each model, each based on different network information, should be regarded as a means to enhance biological interpretability, drawing attention to genes with a known role in the network: hubs, orphans, or genes highly connected in protein-protein interaction networks. This is a major advantage over non-network-based methods, as it discloses the relevance of known gene subnetworks in the disease under study while boosting biomarker discovery and precision medicine.
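The two penalization directions can be illustrated with a weighted-lasso logistic regression, in which each gene j carries its own penalty weight w_j derived from its degree in a gene network. The following is a minimal, stdlib-only Python sketch under stated assumptions, not the authors' implementation: the network is built from absolute Pearson correlations above a hypothetical `cutoff`, the degree is normalized to [0, 1], and the `hub`/`orphan` weights (1 - s_j + delta and s_j + delta, with a hypothetical delta = 0.1) are illustrative heuristics rather than the exact weighting used in the paper. The model is fitted by proximal gradient descent (ISTA) instead of the coordinate descent used by glmnet, and a degree vector computed from STRING could be passed to `penalty_weights` in place of the correlation-based one. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]  # rows = samples (tissues), columns = genes


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_degree(X: Matrix, cutoff: float = 0.0) -> List[float]:
    """Weighted degree of each gene: sum of |r| over the genes linked to it (|r| > cutoff)."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    deg = [0.0] * p
    for i in range(p):
        for j in range(i + 1, p):
            r = abs(pearson(cols[i], cols[j]))
            if r > cutoff:
                deg[i] += r
                deg[j] += r
    return deg


def penalty_weights(degree: Sequence[float], mode: str, delta: float = 0.1) -> List[float]:
    """Hypothetical heuristic: 'hub' penalizes highly connected genes less, 'orphan' the reverse."""
    top = max(degree)
    s = [d / top if top > 0 else 0.0 for d in degree]  # normalized degree in [0, 1]
    if mode == "hub":
        return [1.0 - si + delta for si in s]
    if mode == "orphan":
        return [si + delta for si in s]
    raise ValueError("mode must be 'hub' or 'orphan'")


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |.|."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_weighted_slr(X: Matrix, y: Sequence[int], lam: float, w: Sequence[float],
                     n_iter: int = 2000) -> Tuple[float, List[float]]:
    """ISTA for (1/n) sum_i [log(1 + exp(eta_i)) - y_i * eta_i] + lam * sum_j w_j |beta_j|,
    where eta_i = b0 + x_i . beta and the intercept b0 is not penalized."""
    n, p = len(X), len(X[0])
    # Step 1/L with L <= ||[1, X]||_F^2 / (4n), a Lipschitz bound for the logistic loss gradient.
    step = 4.0 * n / (n + sum(v * v for row in X for v in row))
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals sigma(eta_i) - y_i at the current (old) parameters.
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * sum(resid) / n  # plain gradient step for the intercept
        beta = [soft_threshold(bj - step * gj, step * lam * wj)  # gene-specific threshold
                for bj, gj, wj in zip(beta, grad, w)]
    return b0, beta


if __name__ == "__main__":
    # Toy data (4 samples x 3 genes): gene 1 = 2 * gene 0; gene 2 is uncorrelated with both.
    X = [[1, 2, 1], [2, 4, -1], [3, 6, -1], [4, 8, 1]]
    y = [0, 0, 1, 1]  # 0 = normal, 1 = tumor (toy labels)
    deg = correlation_degree(X)
    for mode in ("hub", "orphan"):
        b0, beta = fit_weighted_slr(X, y, lam=0.01, w=penalty_weights(deg, mode))
        print(mode, [round(b, 3) for b in beta])
```

Only the per-gene threshold `step * lam * wj` differs between the two variants: a small weight lets a gene enter the model more easily. In R, the same kind of per-variable weighting can typically be supplied to `glmnet()` through its `penalty.factor` argument rather than through a hand-written solver.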
Partially funded by H2020 (No. 633974) and the Portuguese Foundation for Science & Technology (UID/EEA/50008/2019, UID/CEC/50021/2019, UID/EMS/50022/2019, PTDC/CCI-CIF/29877/2017, PTDC/EMS-SIS/0642/2014, and SFRH/BD/97415/2013).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lopes, M.B., Veríssimo, A., Carrasquinha, E., Vinga, S. (2019). On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_52
DOI: https://doi.org/10.1007/978-3-030-37599-7_52
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science, Computer Science (R0)