Estimation of resemblance and risk level of a breast cancer patient by prognostic variables using microarray gene expression data

Das, Madhurima; Jana, Biswajit; Mitra, Suman; Acharyya, Sriyankar

doi:10.1007/s11334-020-00367-2


S.I. : ARIIS
Published: 22 July 2020

Volume 17, pages 73–88, (2021)

Innovations in Systems and Software Engineering



Abstract

Breast cancer is a common type of cancer affecting women worldwide. Efforts continue to identify significant genes (biomarkers) for the prognosis of breast cancer. These prognostic biomarkers are useful for estimating the resemblance between query (new) patients and reference (existing) patients. Here, the one-dimensional data-driven grouping (1d-DDg) model has been used to build a prognostic model for breast cancer diagnosis. The Cox proportional hazards regression model has been applied to microarray gene expression data and clinical information to select the predictive genes. Based on these biomarkers, patients are categorized into two groups: low-risk and high-risk. The Manhattan distance has then been applied to compute the resemblance (similarity) between query (newly admitted) patients and reference (existing) patients. Two breast cancer datasets, with accession numbers GSE2990 and GSE45255, obtained from the National Center for Biotechnology Information (NCBI) data portal and containing miRNA and mRNA expression profiles, have been used in the experiments. The clinical information, i.e., disease relapse, overall survival, body mass index, and age, is available in both datasets. Microarray gene expression data together with clinical data have been used to compute the resemblance between query and reference patients. The literature suggests that the Manhattan distance is more appropriate than the Euclidean distance for high-dimensional data. Accordingly, the two distances have also been compared on the basis of elapsed time. The experimental results show that the Manhattan distance executes faster than the Euclidean distance.
Therefore, to obtain a faster response without losing the quality and accuracy of the solution, the reference patients have been ranked using the Manhattan distance. Treatment for a query patient is chosen based on the reference patient ranked first in resemblance. This Manhattan distance-based algorithm, which uses genetic as well as clinical data, is a new approach to breast cancer prognosis.
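
The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the patient identifiers, and the feature values below are hypothetical, and the sketch assumes each patient is already represented as a numeric vector of selected gene expression values and clinical variables on comparable scales. The abstract does not say how these features are combined or scaled.

```python
from typing import Dict, List, Sequence, Tuple


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_references(
    query: Sequence[float],
    references: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (patient_id, distance) pairs, most similar reference first."""
    scored = [(pid, manhattan(query, vec)) for pid, vec in references.items()]
    return sorted(scored, key=lambda item: item[1])


# Hypothetical, made-up feature vectors (not taken from GSE2990 or GSE45255):
# three gene expression values followed by age.
references = {
    "ref_A": [2.0, 5.0, 1.0, 60.0],
    "ref_B": [2.5, 4.0, 1.5, 48.0],
    "ref_C": [7.0, 1.0, 3.0, 35.0],
}
query = [2.4, 4.2, 1.4, 50.0]

ranking = rank_references(query, references)
best_id, best_distance = ranking[0]  # reference patient ranked first
print(best_id, round(best_distance, 2))
```

With these made-up values, the smallest distance belongs to `ref_B`, so that reference patient would occupy the first rank. In this sketch the unscaled age value dominates the distance, which is why comparable feature scales are stated as an assumption.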
