Estimation of resemblance and risk level of a breast cancer patient by prognostic variables using microarray gene expression data

  • S.I. : ARIIS
Innovations in Systems and Software Engineering

Abstract

Breast cancer is a common type of cancer affecting women worldwide. Continuous efforts are being made to identify significant genes/biomarkers for breast cancer prognosis. These prognostic biomarkers are useful for predicting the resemblance between query (new) patients and reference (existing) patients. Here, the 1d-DDg (one-dimensional data-driven grouping) model has been used to build a prognostic model for breast cancer. The Cox proportional hazards regression model has been applied to select the predictive genes, using microarray gene expression data together with clinical information. Based on these biomarkers, patients are categorized into two groups: a low-risk group and a high-risk group. The Manhattan distance has then been applied to compute the resemblance (similarity) between query (newly admitted) patients and reference (existing) patients. Two breast cancer datasets, with accession numbers GSE2990 and GSE45255, were obtained from the National Center for Biotechnology Information (NCBI) data portal; they contain miRNA and mRNA expression profiles and were used for the experiments. Both datasets include clinical information, namely disease relapse, overall survival, body mass index, and age. Microarray gene expression data, along with the clinical data, have been considered when computing the resemblance between query and reference patients. For computing resemblance, the literature suggests that the Manhattan distance is more appropriate than the Euclidean distance for high-dimensional vectors/data. Accordingly, the Manhattan and Euclidean distances have also been compared on the basis of elapsed time. The experimental results show that the Manhattan distance executes faster than the Euclidean distance.
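The two distances being compared can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the helper names `manhattan`, `euclidean`, and `time_metric`, as well as the synthetic data, are hypothetical. The timing loop records the best-of-several wall-clock time for scoring one query vector against every reference vector.

```python
import math
import random
import time


def manhattan(a, b):
    """L1 (city-block) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def time_metric(metric, query, references, repeats=5):
    """Best-of-`repeats` elapsed time (seconds) to score one query against all references."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for ref in references:
            metric(query, ref)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Hypothetical synthetic data: 100 reference patients x 500 features.
    rng = random.Random(0)
    n_features = 500
    references = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(100)]
    query = [rng.gauss(0, 1) for _ in range(n_features)]
    t_l1 = time_metric(manhattan, query, references)
    t_l2 = time_metric(euclidean, query, references)
    print(f"Manhattan: {t_l1:.4f} s, Euclidean: {t_l2:.4f} s")
```

Elapsed times from a sketch like this depend on the language, on whether the arithmetic is vectorized, and on the hardware, so it should not be read as reproducing the paper's timing result.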
Therefore, to obtain a faster response without losing the quality and accuracy of the solution, the reference patients have been ranked using the Manhattan distance. The query patient is then treated on the basis of the reference patient ranked first in resemblance. This Manhattan distance-based algorithm, which uses both genetic and clinical data, is a new approach to breast cancer prognosis.
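The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: each patient is assumed to be represented by one numeric vector that combines selected gene expression values with clinical variables, and the patient IDs, feature values, and risk labels below are hypothetical. A smaller Manhattan distance is read as greater resemblance, so the first entry of the sorted list is the rank-1 reference patient.

```python
from typing import Dict, List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_references(
    query: Sequence[float], references: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (patient_id, distance) pairs, most similar reference first.

    Ties in distance are broken by patient id so the ranking is deterministic.
    """
    scored = [(pid, manhattan(query, feats)) for pid, feats in references.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical reference cohort: each vector holds already-scaled values of
    # selected gene expressions followed by clinical variables.
    refs = {
        "R1": [0.2, 1.1, 0.5, 0.30],
        "R2": [1.4, -0.3, 0.9, 0.70],
        "R3": [0.1, 1.0, 0.4, 0.35],
    }
    risk_group = {"R1": "low", "R2": "high", "R3": "low"}  # hypothetical labels
    query = [0.15, 1.05, 0.45, 0.32]

    ranking = rank_references(query, refs)
    best_id, best_dist = ranking[0]
    print(ranking)
    print(f"rank-1 reference: {best_id} ({risk_group[best_id]} risk), distance {best_dist:.2f}")
```

Because the Manhattan distance sums raw differences, variables measured on large scales (for example, age in years) would dominate unscaled expression values. The sketch therefore assumes the features have already been brought to comparable scales, a preprocessing detail the abstract does not specify.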

Fig. 1
Fig. 2
Fig. 3

Author information

Corresponding author

Correspondence to Sriyankar Acharyya.

About this article

Cite this article

Das, M., Jana, B., Mitra, S. et al. Estimation of resemblance and risk level of a breast cancer patient by prognostic variables using microarray gene expression data. Innovations Syst Softw Eng 17, 73–88 (2021). https://doi.org/10.1007/s11334-020-00367-2

  • DOI: https://doi.org/10.1007/s11334-020-00367-2
