On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2019)

Abstract

Network information is gaining importance in the generation of predictive models in cancer genomics, on the premise that prior biological knowledge makes the models more interpretable and reproducible, an invaluable contribution to precision medicine. This work evaluates the usefulness of accounting for gene network information, provided by the data correlation structure and by external STRING information, in the classification of Breast Invasive Carcinoma (BRCA) RNA-Seq data from The Cancer Genome Atlas (TCGA) into tumor and normal tissue by sparse logistic regression (SLR). Within the correlation-based approaches, two directions were investigated: first, imposing smaller penalties on hub genes, i.e., highly connected genes in the network (hubSLR); second, favouring the selection of orphan or weakly correlated genes (orphanSLR). Without loss of predictive ability, a considerable overlap between the genes selected by the methods was achieved, with fewer genes exclusively selected by each method. Besides a consensus list of genes, the complementarity offered by the sets of genes exclusively selected by each model, based on different network information, can be regarded as a means to enhance biological interpretability, drawing attention to genes with a known role in the network: hubs, orphans, or highly connected genes in protein-protein interaction networks. This represents a major advantage over non-network-based methods, enabling the disclosure of the relevance of known gene subnetworks in the disease under study, while boosting biomarker discovery and precision medicine.
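The idea of penalizing hub and orphan genes differently can be illustrated with a minimal sketch. It is not the authors' implementation: the degree definition (sum of absolute Pearson correlations), the linear mapping from degree to penalty weight, the `floor` value, and the toy expression values are all assumptions made for illustration. In a hub-favouring scheme a highly connected gene receives a small penalty weight, while in an orphan-favouring scheme a weakly correlated gene does. Such weights could then be passed to a weighted L1-penalized logistic regression, for example through the `penalty.factor` argument of glmnet in R.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def weighted_degree(columns):
    """Per-gene degree: sum of |correlation| with every other gene."""
    p = len(columns)
    degree = [0.0] * p
    for i in range(p):
        for j in range(i + 1, p):
            r = abs(pearson(columns[i], columns[j]))
            degree[i] += r
            degree[j] += r
    return degree

def penalty_factors(degree, mode, floor=0.05):
    """Map degrees to per-gene penalty weights in [floor, 1].

    mode="hub":    high degree -> small penalty (hubs are favoured)
    mode="orphan": low degree  -> small penalty (orphans are favoured)
    The linear mapping and the floor are illustrative assumptions.
    """
    top = max(degree)
    scaled = [d / top if top > 0 else 0.0 for d in degree]
    if mode == "hub":
        raw = [1.0 - s for s in scaled]
    elif mode == "orphan":
        raw = scaled
    else:
        raise ValueError("mode must be 'hub' or 'orphan'")
    return [max(floor, w) for w in raw]

# Toy expression data (hypothetical): one list per gene, six samples each.
# Genes 0-2 are strongly co-expressed; gene 3 is only weakly correlated.
genes = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 4.0, 6.0, 8.0, 10.0, 12.5],
    [1.1, 1.9, 3.2, 3.8, 5.1, 6.0],
    [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
]

degree = weighted_degree(genes)
hub_w = penalty_factors(degree, "hub")
orphan_w = penalty_factors(degree, "orphan")

for g, (d, h, o) in enumerate(zip(degree, hub_w, orphan_w)):
    print(f"gene {g}: degree={d:.2f} hub_penalty={h:.2f} orphan_penalty={o:.2f}")
```

In this toy example gene 3 has the lowest degree, so it carries the largest penalty under the hub scheme and the smallest under the orphan scheme. The floor keeps every weight strictly positive so that no gene enters the model unpenalized.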

Partially funded by H2020 (No. 633974) and the Portuguese Foundation for Science & Technology (UID/EEA/50008/2019, UID/CEC/50021/2019, UID/EMS/50022/2019, PTDC/CCI-CIF/29877/2017, PTDC/EMS-SIS/0642/2014, and SFRH/BD/97415/2013).

Author information

Corresponding author

Correspondence to Marta B. Lopes.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lopes, M.B., Veríssimo, A., Carrasquinha, E., Vinga, S. (2019). On the Role of Hub and Orphan Genes in the Diagnosis of Breast Invasive Carcinoma. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science, vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_52

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_52

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer Science, Computer Science (R0)
