Robust breast cancer prediction system based on rough set theory at National Cancer Institute of Egypt

https://doi.org/10.1016/j.cmpb.2017.10.016

Highlights

  • This is one of the first studies using Rough Set theory for early prediction and diagnosis of breast cancer according to the High Risk Status attribute.

  • Our approach has been compared with the clinical, RBF, and ANN classifications.

  • A novel robust breast cancer prediction and diagnosis system based on the Rough Set (RS) has been proposed.

Abstract

Background: Breast cancer is one of the leading causes of death among women worldwide. Every year, more than a million women are diagnosed with breast cancer, and more than half of them will die because of inaccuracies and delays in diagnosis. High accuracy in cancer prediction is important for improving treatment quality and patient survival rates.

Objectives: In this paper, we propose a new and robust breast cancer prediction and diagnosis system based on Rough Set (RS) theory. We also introduce a robust classification process based on a set of new and highly effective attributes, and we compare and evaluate the performance of our approach against the clinical, Radial Basis Function (RBF), and Artificial Neural Network (ANN) classification schemes.

Methods: The dataset used in our experiments consists of 60 samples obtained from the National Cancer Institute (NCI) of Egypt. We have used the RS theory to robustly find dependence relationships among data, and evaluate the importance of attributes through:

  • Applying the Approximation Sets on this dataset to identify the patient’s cancer stage (0, IA, IB, IIA, IIB, IIIA, IIIB, IIIC, IV); and

  • Running the Reduction process on this dataset to identify which attributes (symptoms) are most effective for describing and predicting the breast cancer stage.

Results:

  • Our approach has classified the patients into 9 different stages: Stage 0 with accuracy 75%, Stage IA with accuracy 71%, Stage IB with zero patients, Stage IIA with accuracy 86%, Stage IIB with accuracy 67.5%, Stage IIIA with accuracy 85%, Stage IIIB with accuracy 100%, Stage IIIC with zero patients, and Stage IV with accuracy 100%;

  • The Reduction process outputs the symptoms that are most effective for early prediction and accurate diagnosis of breast cancer: Lymph Node Status, Tumor Size, Estrogen Receptor Status, Progesterone Receptor Status, and Metastasis; and

  • Last but not least, we have found two patients (patients No. 11 and 51 in our dataset) in High Risk Status, which requires intensive and special treatment.

Conclusion: We have demonstrated the robustness of RS theory in the early prediction and diagnosis of breast cancer. This lends more weight to the contribution and efficiency of RS theory in the field of computational biology.

Introduction

The body is made up of trillions of living cells. Normal body cells grow, divide into new cells, and die in an orderly way. Cancer begins when cells of the body start to divide without stopping (i.e., out of control) [1], [2]. Breast cancer is characterized by the uncontrolled growth of abnormal cells in the milk-producing glands of the breast. It is the most common neoplasm among women in the majority of developed countries, accounting for one-third of newly diagnosed malignancies [3]. It is a highly heterogeneous disease, encompassing a number of biologically distinct entities with specific pathologic features and biological behaviors [3], [4]. Different breast tumor subtypes have different risk factors, clinical presentations, histopathological features, outcomes, and responses to systemic therapies. Thus, an accurate and early classification of breast cancer is urgently required. Improvements in prevention and diagnosis have resulted in earlier diagnosis and treatment. Cancer that is detected early is more likely to be curable, which may increase survivability, but detecting cancer at an early stage is difficult [1], [2], [5], [6].

Medical databases hold large amounts of data about patients and their medical conditions. Breast cancer is one of the most important medical problems. The growth in the amount of data and in the number of existing databases far exceeds the ability of humans to analyze this data, which makes analyzing these databases one of the difficult tasks in the medical environment. Thus, there is both a need and an opportunity to extract knowledge from databases. To arrive at the correct treatment, physicians classify each breast cancer according to standard parameters that include the type, grade, stage, and gene expression of the breast cancer. In this paper, we are interested in classifying breast cancer according to stage, so that the correct treatment can be prescribed as early as possible. For screening purposes, mammography is quite often used, since it gives the physician the best chance of tracing the exact location of microcalcifications and other possible indicators in the breast tissue.

In this paper, we evaluate the classification accuracy of the TNM staging process using RS theory. The results are also compared with the previously proposed RS, RBF, and ANN approaches. The total number of breast cancer patients studied is 60. For all types of classification, the input variables are the TNM variables (such as Metastasis (M), Tumour Size (TS), Lymph Node Status (LNS), Estrogen Receptor Status (ERS), Progesterone Receptor Status (PRS), Histological Type (HT), and Histological Grade (HG)). To increase the classification accuracy, we have consulted the physicians of the Tumor Departments at both Zagazig University Hospitals and Suez Canal University Hospitals, Egypt, about the most effective and redundant attributes.

RS theory is a mathematical tool for extracting knowledge from uncertain and incomplete database information [7], [8]. It can be used to find dependence relationships among data and to evaluate the importance of attributes. The theory assumes that every object is described by some available information; if we have exactly the same information about two objects, then we say that they are indiscernible (similar), i.e., we cannot distinguish them with the known knowledge.
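The indiscernibility idea can be illustrated with a minimal Python sketch (the system described in this paper was implemented in Matlab, so this is only an illustration, not the authors' code). The patient records and the helper name `indiscernibility_classes` are hypothetical; the attribute abbreviations TS, LNS, and M follow the paper, but the values are invented.

```python
from collections import defaultdict

# Hypothetical toy records: attribute names follow the paper's abbreviations,
# but every value here is invented for illustration only.
patients = {
    "p1": {"TS": "T1", "LNS": "N0", "M": "M0"},
    "p2": {"TS": "T1", "LNS": "N0", "M": "M0"},
    "p3": {"TS": "T2", "LNS": "N1", "M": "M0"},
    "p4": {"TS": "T2", "LNS": "N0", "M": "M0"},
}


def indiscernibility_classes(table, attributes):
    """Group objects that share the same value on every chosen attribute."""
    groups = defaultdict(set)
    for name, record in table.items():
        key = tuple(record[a] for a in attributes)
        groups[key].add(name)
    return list(groups.values())


# With all three attributes, only p1 and p2 cannot be told apart.
classes = indiscernibility_classes(patients, ["TS", "LNS", "M"])

# With tumour size alone, p3 and p4 also become indiscernible.
coarse = indiscernibility_classes(patients, ["TS"])

print(classes)
print(coarse)
```

Using fewer attributes can only merge classes, never split them, which is why the choice of attributes determines how finely patients can be distinguished.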

To the best of our knowledge and according to the most recent literature on breast cancer classification and treatment, most of the proposed approaches work on international databases such as the WBC data set. They also base the classification process on static and old attributes, and they give no clear treatment description. This motivates us to propose a novel intelligent approach for breast cancer prediction and diagnosis based on rough set theory, new and highly effective attributes, and High Risk Status (HRS). The patient is classified as HRS if Age 35 (ERS = +ve PRS = +ve) T > 2 cm HG = 2:3, where ERS and PRS being +ve means that their values are greater than 50. This rule follows from discussions with specialist physicians at Zagazig University Hospitals and Suez Canal University Hospitals.

Also, the authors in [9] have introduced a comprehensive breast cancer classification with Radial Basis Function and Gaussian Mixture Model, which is based on only five classification stages. We therefore introduce nine classification stages, to be more accurate and specific.

The rest of this paper is organized as follows. Section 2 reviews the related literature, discussing breast cancer classification with two well-known methods (RBF and ANN) and other contributions in the field. Section 3 gives an overview of breast cancer and the basic concepts of Rough Set theory. Section 4 presents our proposed approach with its materials and methods, approximations and accuracy, and reduction and core attributes. The implementation of the proposed approach is presented in Section 5. Section 6 provides a detailed comparison of our approach with the clinical, RBF, and ANN classifications, together with our obtained results. Section 7 concludes the paper.

Rough set theory and breast cancer

Chul-Heui Lee et al. [10] proposed a new classification method based on a hierarchical granulation structure using rough set theory. The hierarchical granulation structure was adopted to find the classification rules effectively. The classification rules had minimal attributes, and the knowledge reduction was accomplished by using the upper and lower approximations of rough sets. A simulation was performed on the WBC dataset to show the effectiveness of the proposed method. The simulation

Breast cancer overview

Breast cancer is the most common cancer among women. Staging gathers information about the tumor from certain examinations and diagnostic tests to determine how widespread the cancer is. The anatomy of the normal breast [1], [2], [5] is shown in Fig. 1(a). This figure shows the lobes and ducts inside the breast. It also shows lymph nodes near the breast.

Proposed scheme

Our proposed approach is composed of two basic stages, Rough Set (RS) theory and Classification by Matlab. The RS theory stage is internally composed of two basic processes, Approximation and Reduction. The Classification stage is based on Table 1. The flowchart of our proposed approach is shown in Fig. 3.
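The Reduction process can be sketched as follows, assuming the standard rough set notions of dependency degree, reduct, and core; this snippet does not show the authors' Matlab routine, so the brute-force search below is a stand-in rather than their method. The decision table, the helper names `dependency` and `reducts`, and all attribute values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical decision table: (condition attributes, stage). Values are invented.
rows = [
    ({"TS": "T1", "LNS": "N0", "HT": "ductal"}, "IA"),
    ({"TS": "T1", "LNS": "N1", "HT": "ductal"}, "IIA"),
    ({"TS": "T2", "LNS": "N0", "HT": "lobular"}, "IIA"),
    ({"TS": "T2", "LNS": "N1", "HT": "ductal"}, "IIB"),
    ({"TS": "T1", "LNS": "N0", "HT": "lobular"}, "IA"),
]
ATTRS = ["TS", "LNS", "HT"]


def dependency(rows, attrs):
    """Fraction of rows whose stage is fully determined by the chosen attributes."""
    blocks = defaultdict(list)
    for record, decision in rows:
        blocks[tuple(record[a] for a in attrs)].append(decision)
    consistent = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return consistent / len(rows)


def reducts(rows, attrs):
    """Minimal attribute subsets that keep the dependency of the full set."""
    full = dependency(rows, attrs)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            # Skip supersets of a reduct already found: they are not minimal.
            if any(set(r) <= set(subset) for r in found):
                continue
            if dependency(rows, subset) == full:
                found.append(subset)
    return found


all_reducts = reducts(rows, ATTRS)
# The core is the set of attributes shared by every reduct.
core = set.intersection(*(set(r) for r in all_reducts))

print(all_reducts)
print(core)
```

In this toy table, HT can be dropped without losing any ability to determine the stage, so it is redundant. The exhaustive search is exponential in the number of attributes, which is acceptable only for a small attribute set such as the seven listed in this paper.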

Implementation

We have implemented the proposed approach in Matlab, running under macOS Sierra Version 10.12.6 on a MacBook Pro (3.3 GHz Intel Core i7, 16 GB 2133 MHz LPDDR3). We have introduced a friendly GUI (Fig. 4), which collects the patient’s medical attributes directly from the patient for the classification process.

Approximation sets

After constructing our approximation algorithm in Matlab, we obtained the lower approximation, the upper approximation, and the accuracy of the attribute description of each class (stage). We see that stages IIIB and IV are crisply describable in the system, and the remaining stages are roughly describable with the accuracy given in the last column. Thus, we can say that the data (symptoms) available from the patients exactly characterize classes IIIB and IV only, and the remaining classes are not characterized exactly by
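The distinction between crisply and roughly describable stages can be made concrete with a short Python sketch. It assumes the standard rough set definition of accuracy as the size of the lower approximation divided by the size of the upper approximation; the snippet above does not state the formula, so this is an assumption, and the code is not the authors' Matlab algorithm. The patient rows and the helper names `approximations` and `accuracy` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (patient, symptom tuple, stage). All values are invented.
rows = [
    ("p1", ("T1", "N0", "M0"), "IA"),
    ("p2", ("T1", "N0", "M0"), "IA"),
    ("p3", ("T2", "N0", "M0"), "IIA"),
    ("p4", ("T2", "N0", "M0"), "IIB"),
    ("p5", ("T2", "N1", "M0"), "IIB"),
    ("p6", ("T4", "N1", "M1"), "IV"),
]


def approximations(rows, stage):
    """Return the (lower, upper) approximations of the set of patients in a stage."""
    blocks = defaultdict(set)
    for name, symptoms, _ in rows:
        blocks[symptoms].add(name)
    target = {name for name, _, s in rows if s == stage}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= target:   # every patient in the block has this stage
            lower |= block
        if block & target:    # at least one patient in the block has this stage
            upper |= block
    return lower, upper


def accuracy(rows, stage):
    """Assumed definition: |lower| / |upper|; 1.0 means the stage is crisp."""
    lower, upper = approximations(rows, stage)
    return len(lower) / len(upper) if upper else 0.0


for stage in ("IA", "IIA", "IIB", "IV"):
    print(stage, accuracy(rows, stage))
```

In this toy table, p3 and p4 share identical symptoms but belong to different stages, so stages IIA and IIB are only roughly describable, while IA and IV are crisp because their lower and upper approximations coincide.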

Conclusion

In order to increase the survivability rate among breast cancer patients, there should be a novel robust breast cancer prediction and diagnosis system. We have introduced a new and robust breast cancer prediction and diagnosis system based on RS theory. RS theory has proved its efficiency and robustness in the early prediction and diagnosis of breast cancer. It should be mentioned that patients No. 11 and 51 from our dataset are in High Risk Status. The comparison of our

Acknowledgments

We would like to thank the anonymous reviewers for their valuable and helpful comments and suggestions, which have improved our paper.

References (41)

  • A. Spitale et al., Breast cancer classification according to immunohistochemical markers: clinicopathologic features and short-term survival analysis in a population-based study from the South of Switzerland, Ann. Oncol.: Off. J. Eur. Soc. Med. Oncol. / ESMO (2009)
  • P. Tang et al., Molecular classifications of breast carcinoma with similar terminology and different definitions: are they the same?, Hum. Pathol. (2009)
  • American Cancer Society, (2017), Available online:...
  • National Cancer Institute, (2017), Available online:...
  • American Cancer Society, California Department of Health Services, A Woman’s Guide to Breast Cancer Diagnosis and...
  • Z. Pawlak et al., Rough sets, Commun. ACM (1995)
  • Harikumar Rajaguru et al., A comprehensive analysis on breast cancer classification with radial basis function and gaussian mixture model, The 16th International Conference on Biomedical Engineering (ICBME) (2016)
  • Chul-Heui Lee et al., Rule discovery using hierarchial classification structure with rough sets, FSA World Congress and 20th NAFIPS International Conference (2001)
  • H.E. Aboul et al., Rough set approach for generation of classification rules of breast cancer data, J. Inf. (2001)
  • National Cancer Institute of Egypt, (2017), Available online:...