Abstract
Tumor metastasis is the major cause of cancer fatality. From this perspective, examining gene expression within malignant cells and the alterations in their transcriptome is significant for investigating the molecular mechanisms and cellular phenomena associated with tumor metastasis. Accurately assessing a patient’s cancer condition and predicting their prognosis is the central hurdle in formulating an effective therapeutic schedule. In recent years, a variety of machine learning techniques have contributed widely to analyzing empirical gene expression data from real biological contexts, predicting medical outcomes, and supporting decision-making processes. This paper focuses on extracting important genes linked with each of the most common metastasis sites for breast cancer. Furthermore, the effect of the expression levels of each identified set of bio-markers on the predicted probability of a given metastasis is illustrated using Shapley values as a model explainability framework. This approach has not previously been applied to this problem, and it unveils novel insights and directions for future research. The novel contributions of this research lie in the application of specific feature selection methods and compatible evaluation metrics to produce a small set of bio-markers for targeting a specific metastasis site, and in the subsequent explanatory analysis of the impact of gene expression values on each of the examined metastasis sites.
M. Trajanoska, V. Mijalcheva and M. Simjanoska—These authors contributed equally to this work.
Appendix A Supporting Information
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Trajanoska, M., Mijalcheva, V., Simjanoska, M. (2024). Explainable Machine Learning Unveils Novel Insights into Breast Cancer Metastases Sites Bio-Markers. In: Mihova, M., Jovanov, M. (eds) ICT Innovations 2023. Learning: Humans, Theory, Machines, and Data. ICT Innovations 2023. Communications in Computer and Information Science, vol 1991. Springer, Cham. https://doi.org/10.1007/978-3-031-54321-0_3
DOI: https://doi.org/10.1007/978-3-031-54321-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-54320-3
Online ISBN: 978-3-031-54321-0
eBook Packages: Computer Science, Computer Science (R0)