Explainable Machine Learning Unveils Novel Insights into Breast Cancer Metastases Sites Bio-Markers

  • Conference paper
ICT Innovations 2023. Learning: Humans, Theory, Machines, and Data (ICT Innovations 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1991)

Abstract

Tumor metastasis is the major cause of cancer fatality. Examining gene expression in malignant cells, and the alterations in their transcriptome, is therefore important for investigating the molecular mechanisms and cellular phenomena associated with tumor metastasis. Accurately assessing a patient’s cancer condition and predicting their prognosis is the central hurdle in formulating an effective therapeutic schedule. In recent years, a variety of machine learning techniques have been widely used to analyze empirical gene expression data from real biological contexts, predict medical outcomes, and support decision-making. This paper focuses on extracting important genes linked with each of the most common metastasis sites for breast cancer. Furthermore, the effect of the expression levels of each identified set of bio-markers on the predicted probability that a given metastasis occurs is illustrated using Shapley values as a model explainability framework. This approach has never been applied to this problem before, and it unveils novel insights and directions for future research. The main contributions of this research are the application of specific feature selection methods and compatible evaluation metrics to produce a small set of bio-markers targeting a specific metastasis site, and an explanatory analysis of the impact of gene expression values on each of the examined metastasis sites.

M. Trajanoska, V. Mijalcheva and M. Simjanoska contributed equally to this work.

Corresponding authors

Correspondence to Milena Trajanoska, Viktorija Mijalcheva or Monika Simjanoska.

Appendix A Supporting Information

Fig. 4. Bio-markers importance plot for each target metastasis. Red data points denote elevated expression levels of a probe in the dataset, and blue data points denote low expression levels of the same probe. The X-axis values indicate the magnitude and direction of the influence that each gene expression value has on the prediction of the target metastasis. Positive values, to the right of 0, correspond to an increase in the predicted likelihood that the target metastasis occurs in a patient. Conversely, negative values, to the left of 0, correspond to a decrease in that predicted likelihood. (Color figure online)
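The reading of the X-axis described in the caption can be illustrated with a minimal sketch that computes exact Shapley values for a toy model. This is not the authors' pipeline: the scoring function `toy_score`, the three probe names, the all-zero baseline, and the patient values are all hypothetical, chosen only so the arithmetic is easy to follow. Exact enumeration is feasible here because there are only three features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)
    features = range(n)

    def value(subset):
        # Features in `subset` take the patient's values; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in features]
        return predict(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy score over three made-up probes (an assumption for illustration).
def toy_score(expr):
    gene_a, gene_b, gene_c = expr
    return 0.2 + 0.3 * gene_a - 0.2 * gene_b + 0.1 * gene_a * gene_c


baseline = [0.0, 0.0, 0.0]  # assumed reference expression profile
patient = [1.0, 1.0, 1.0]   # assumed profile with all three probes elevated
phi = shapley_values(toy_score, patient, baseline)

for name, contribution in zip(["gene_a", "gene_b", "gene_c"], phi):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name}: {contribution:+.3f} ({direction} the predicted score)")
```

In this toy setting, elevated `gene_a` and `gene_c` receive positive values (points to the right of 0 in a plot like Fig. 4) and elevated `gene_b` receives a negative value (to the left of 0). The three contributions sum to the difference between the patient's score and the baseline score, which is the property that makes the values additive explanations of a single prediction.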

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Trajanoska, M., Mijalcheva, V., Simjanoska, M. (2024). Explainable Machine Learning Unveils Novel Insights into Breast Cancer Metastases Sites Bio-Markers. In: Mihova, M., Jovanov, M. (eds) ICT Innovations 2023. Learning: Humans, Theory, Machines, and Data. ICT Innovations 2023. Communications in Computer and Information Science, vol 1991. Springer, Cham. https://doi.org/10.1007/978-3-031-54321-0_3

  • DOI: https://doi.org/10.1007/978-3-031-54321-0_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-54320-3

  • Online ISBN: 978-3-031-54321-0

  • eBook Packages: Computer Science (R0)
