On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma

Lopes, Marta B.; Veríssimo, André; Carrasquinha, Eunice; Vinga, Susana

doi:10.1007/978-3-030-37599-7_52

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11943)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science

Abstract

Network information is gaining importance in the generation of predictive models in cancer genomics, on the premise that prior biological knowledge makes the models more interpretable and reproducible, an invaluable contribution to precision medicine. This work evaluates the usefulness of accounting for gene network information, provided by the data correlation structure and by external STRING information, in the classification of Breast Invasive Carcinoma (BRCA) RNA-Seq data from The Cancer Genome Atlas (TCGA) into tumor and normal tissue by sparse logistic regression (SLR). Within the correlation-based approaches, two directions were investigated: first, imposing smaller penalties on hub genes, i.e., highly connected genes in the network (hubSLR); second, favouring the selection of orphan or weakly correlated genes (orphanSLR). Without loss of predictive ability, a considerable overlap between the genes selected by the methods was achieved, with fewer genes exclusively selected by each method. Besides a consensus list of genes, the complementary sets of genes exclusively selected by each model, based on different network information, can be regarded as a means to enhance biological interpretability, drawing attention to genes with a known role in the network, whether hubs, orphans or highly connected genes in protein-protein interaction networks. This represents a major advantage over non-network-based methods, enabling the disclosure of the relevance of known gene subnetworks in the disease under study, while boosting biomarker discovery and precision medicine.
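
To make the hub and orphan weighting idea concrete, the following is a minimal sketch of how per-gene penalty factors might be derived from a correlation network. It is not the authors' implementation: the function names, the correlation cutoff of 0.5, the offset `eps`, the linear rescaling of the degree, and the toy expression values are all assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant gene carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def network_degrees(columns, cutoff=0.5):
    """Weighted degree per gene: sum of |correlation| values at or above the cutoff.

    The cutoff is an assumed, illustrative value.
    """
    p = len(columns)
    degree = [0.0] * p
    for i in range(p):
        for j in range(i + 1, p):
            r = abs(pearson(columns[i], columns[j]))
            if r >= cutoff:
                degree[i] += r
                degree[j] += r
    return degree


def penalty_factors(degree, mode, eps=1e-3):
    """Map degrees to positive penalty factors (assumed linear heuristic).

    mode="hub":    high degree -> small penalty (hubs are favoured)
    mode="orphan": low degree  -> small penalty (orphans are favoured)
    """
    top = max(degree) or 1.0  # avoid division by zero in an empty network
    scaled = [d / top for d in degree]
    if mode == "hub":
        return [1.0 - s + eps for s in scaled]
    if mode == "orphan":
        return [s + eps for s in scaled]
    raise ValueError("mode must be 'hub' or 'orphan'")


# Toy expression data (hypothetical): one list per gene, one value per sample.
# Genes 0-2 are strongly co-expressed; gene 3 is only weakly correlated with them.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
    [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
]

degree = network_degrees(genes, cutoff=0.5)
hub = penalty_factors(degree, "hub")
orphan = penalty_factors(degree, "orphan")

for g, (d, h, o) in enumerate(zip(degree, hub, orphan)):
    print(f"gene {g}: degree={d:.3f} hub_penalty={h:.3f} orphan_penalty={o:.3f}")
```

Factors of this kind would typically be passed as per-feature weights on the L1 term of a penalized logistic regression, for example through the `penalty.factor` argument of `glmnet` in R; whether the chapter uses exactly this mapping is not stated in the abstract.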

Partially funded by H2020 (No. 633974) and the Portuguese Foundation for Science & Technology (UID/EEA/50008/2019, UID/CEC/50021/2019, UID/EMS/50022/2019, PTDC/CCI-CIF/29877/2017, PTDC/EMS-SIS/0642/2014, and SFRH/BD/97415/2013).

Author information

Authors and Affiliations

Instituto de Telecomunicações, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
Marta B. Lopes
INESC-ID, Instituto de Engenharia de Sistemas e Computadores - Investigação e Desenvolvimento, Rua Alves Redol, 9, 1000-029, Lisbon, Portugal
Marta B. Lopes, André Veríssimo, Eunice Carrasquinha & Susana Vinga
IDMEC, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001, Lisbon, Portugal
André Veríssimo & Susana Vinga

Corresponding author

Correspondence to Marta B. Lopes.

Editor information

Editors and Affiliations

University of Cambridge, Cambridge, UK
Giuseppe Nicosia
University of Florida, Gainesville, FL, USA
Panos Pardalos
Harvard University, Cambridge, MA, USA
Renato Umeton
Università di Catania, Catania, Italy
Giovanni Giuffrida
Almawave, Rome, Italy
Vincenzo Sciacca

About this paper

Cite this paper

Lopes, M.B., Veríssimo, A., Carrasquinha, E., Vinga, S. (2019). On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_52

DOI: https://doi.org/10.1007/978-3-030-37599-7_52
Published: 03 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37598-0
Online ISBN: 978-3-030-37599-7
eBook Packages: Computer Science (R0)
