Abstract
Breast cancer is one of the most common cancers affecting women worldwide, and continuous efforts are being made to identify significant genes (biomarkers) for its prognosis. Such prognostic biomarkers are useful for estimating the resemblance between query (newly admitted) patients and reference (existing) patients. Here, the 1d-DDg (one-dimensional data-driven grouping) model is used to build a prognostic model for breast cancer. The Cox proportional hazards regression model is applied to microarray gene expression data and clinical information to select the predictive genes. Based on these biomarkers, patients are categorized into two groups: low-risk and high-risk. The Manhattan distance is then applied to compute the resemblance (similarity) between query and reference patients. Two breast cancer datasets, with accession numbers GSE2990 and GSE45255, obtained from the National Center for Biotechnology Information (NCBI) data portal and containing miRNA and mRNA expression profiles, are used in the experiments. Clinical information, namely disease relapse, overall survival, body mass index, and age, is available in both datasets. Both the microarray gene expression data and the clinical data are considered when computing the resemblance between query and reference patients. The literature suggests that the Manhattan distance is more appropriate than the Euclidean distance for high-dimensional data. A comparison between the Manhattan and Euclidean distances on the basis of elapsed time is therefore also made, and the experimental results show that the Manhattan distance executes faster than the Euclidean distance.
Therefore, to obtain a faster response without losing the quality and accuracy of the solution, the reference patients are ranked using the Manhattan distance. Treatment for the query patient is provided based on the reference patient ranked first in resemblance. This Manhattan distance-based algorithm, which uses genetic as well as clinical data, is a new approach to breast cancer prognosis.
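The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, patient identifiers, and feature values below are hypothetical, and the feature vectors are toy numbers standing in for selected gene expression values plus one clinical variable.

```python
import math


def manhattan(a, b):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """Euclidean (L2) distance, included only for comparison."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_references(query, references, distance=manhattan):
    """Return (patient_id, distance) pairs, most similar reference first."""
    scored = [(pid, distance(query, vec)) for pid, vec in references.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical reference patients: three expression values and age (toy data).
references = {
    "ref_A": [2.0, 5.0, 1.0, 60.0],
    "ref_B": [2.5, 4.0, 1.5, 45.0],
    "ref_C": [8.0, 1.0, 6.0, 52.0],
}

# Hypothetical query (newly admitted) patient with the same feature layout.
query = [2.4, 4.2, 1.4, 47.0]

ranking = rank_references(query, references)
best_match = ranking[0][0]  # reference patient occupying the first rank

for patient_id, dist in ranking:
    print(f"{patient_id}: {dist:.2f}")
```

In this toy example the unscaled age value dominates the distance. Whether and how the features are normalized before the distance is computed is not stated in the abstract, so any scaling step would be a further assumption.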
Cite this article
Das, M., Jana, B., Mitra, S. et al. Estimation of resemblance and risk level of a breast cancer patient by prognostic variables using microarray gene expression data. Innovations Syst Softw Eng 17, 73–88 (2021). https://doi.org/10.1007/s11334-020-00367-2
DOI: https://doi.org/10.1007/s11334-020-00367-2