Multi-label transfer learning for the early diagnosis of breast cancer
Introduction
Breast cancer is the most common cancer in women, and the second most common cause of cancer death after lung cancer [1]. About 40,920 women in the U.S. are expected to die in 2018 from breast cancer, though death rates have been decreasing since 1989 [1]. These decreases are thought to be the result of treatment advances, earlier detection through screening, and increased awareness [1]. Early diagnosis of breast cancer continues to be the best way to save lives and decrease healthcare costs over time. Technologies to detect and diagnose breast cancer continue to advance for the purpose of giving patients less invasive options and better diagnoses [2].
Mammography is the primary factor in breast cancer mortality reduction, despite the potential drawbacks to the procedure [3]. In fact, reading a mammogram accurately is challenging for most radiologists. In some recent surveys [4], error in diagnosis was the most common cause of malpractice suits against radiologists. The majority of such cases arose from failure to diagnose breast cancer on mammography [4]. To reduce the rate of false-negative diagnoses, lesions with a 2% chance of being malignant are recommended for a biopsy [5]. However, only 15 to 30% of the biopsies are found to be malignant [5]. Benign biopsies have many negative consequences, including fear, pain, anxiety, direct financial expenses, indirect costs related to missed work, and risk of complications [6], [7], [8].
One way researchers have sought to improve the performance of mammography and increase the accuracy of the diagnoses is through a better estimation of breast cancer risk on the basis of mammography findings [9], [10].
Mammograms are normally subject to multiple annotations. The labels commonly describe the density of the breast according to the BI-RADS categories, the type of findings (masses, calcifications, etc.) and the pathology [11]. Initially, the radiologist inspects the images looking for abnormalities in the form of masses, calcifications or other findings. Masses are three-dimensional, space-occupying lesions with completely or partially convex-outward borders. When a new mass is identified, a diagnostic evaluation may be warranted. Calcifications are deposits of calcium salts in the breast. Calcifications can sometimes be associated with cancer, and certain of their characteristics help the radiologist decide whether further action is needed. The Breast Imaging Reporting and Data System (BI-RADS) [12] was developed in part to improve the predictive capability of mammography. Radiologists classify density for each mammographic examination into one of four categories, as defined in the BI-RADS lexicon, fourth edition [13]. Approximately 9% of women have almost entirely fatty breasts (BI-RADS I), 40% have scattered fibro-glandular densities (BI-RADS II), 45% have heterogeneously dense breasts (BI-RADS III), and 6% have extremely dense breasts (BI-RADS IV) [13]. Dense breasts are defined as BI-RADS density categories III or IV. Thus, approximately 50% of the women who undergo mammography are categorized as having dense breasts [14]. The risk of breast cancer is higher for women with higher breast densities: women with a high breast density have been reported to have a four- to six-fold increased risk of developing the disease compared to women with a low breast density [15], [16]. After careful annotation, the radiologist must provide an accurate, specific, and sufficiently comprehensive diagnosis from the mammogram, to enable the clinician to estimate the prognosis and develop an optimal plan of treatment.
Computer-aided diagnosis (CAD) systems have proven very efficient in recognizing patterns in mammogram images that might suggest malignancy [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. Automated screening of mammograms, or CAD of breast cancer, is a vast field of research. In [29], [30], the authors provide an extensive review of the different stages of a CAD methodology for breast cancer. To the best of our knowledge, most existing works focus on single-label classification problems, where each mammogram is assumed to have only one class label from the aforementioned mammogram characteristics. Some works focused on the types of findings in a mammogram to give a diagnosis. For example, in [17], [18], [19], the authors used micro-calcifications to identify breast cancer, while in [20], [21], [22], [23], [24], [25] the authors focused on the morphology of the contour and shape of the breast mass lesion in mammography images, considering them the most discriminating criteria between benign and malignant masses. In other works [16], [26], [27], [28], the authors relied on mammographic density and parenchymal patterns for breast cancer risk assessment. These techniques, although they work well, fail to exploit the dependencies that exist between the different annotations and so cannot provide a full diagnosis. Multi-label classification is the extension of single-label classification in which the goal is to predict the set of relevant labels for a given input. The inputs used to train a model for this type of problem each carry several labels. It differs from multi-class classification in its output label space: labels are not assumed to be mutually exclusive, and multiple labels may be associated with a single training example [31], [32]. Historically, multi-label classification has primarily been applied to text categorisation [33] and medical diagnosis [34]. 
More recently, the use of multi-label classification has become increasingly popular for a variety of problem types [35], such as image and video annotation [36], [37], genomics [38], [39], [40] and sentiment classification [41], [42].
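The difference in output label space between multi-class and multi-label classification can be illustrated with a minimal sketch. The scores and the 0.5 threshold below are made-up values chosen for illustration, not taken from this work: a softmax makes the labels compete so that exactly one wins, whereas independent sigmoids let several labels be active at once.

```python
import math

def softmax(scores):
    """Multi-class view: scores compete and the probabilities sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    """Multi-label view: each label gets its own independent probability."""
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

# Hypothetical raw scores for three labels (illustrative values only).
scores = [2.0, 1.5, -1.0]

class_probs = softmax(scores)
label_probs = sigmoid(scores)

# Multi-class: exactly one label is chosen.
multi_class_pred = max(range(len(scores)), key=lambda i: class_probs[i])

# Multi-label: every label whose probability passes the (assumed) 0.5 threshold.
multi_label_pred = [i for i, p in enumerate(label_probs) if p >= 0.5]

print(multi_class_pred, multi_label_pred)
```

With these scores the multi-class decision keeps only the first label, while the multi-label decision keeps the first two.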
There are two prevailing categories of algorithms in multi-label learning, namely algorithm adaptation and problem transformation [34], [35]. Algorithm adaptation methods extend specific learning models to handle multi-labelled data. Problem transformation algorithms, on the other hand, transform the multi-label learning task into either several binary classification problems or one multi-class classification problem. Another category is multi-label ensemble methods [31], where models are defined as meta-algorithms built on top of common multi-label learners. Many other methods have been proposed in the literature, some of which exploit graphical models to capture the label dependencies and conduct structured classification, including those using Bayesian networks [43], [44], [45] and conditional random fields [46].
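The two problem transformation routes described above can be sketched as follows. This is an illustrative sketch on made-up label sets: the function names are hypothetical, and "binary relevance" and "label powerset" are the usual names in the multi-label literature for the two transformations, used here as an assumption about which variants are meant.

```python
def binary_relevance(label_sets, all_labels):
    """Several binary problems: one 0/1 target list per label."""
    return {lab: [1 if lab in s else 0 for s in label_sets] for lab in all_labels}

def label_powerset(label_sets):
    """One multi-class problem: each distinct label set becomes a class id."""
    classes = {}
    targets = []
    for s in label_sets:
        key = tuple(sorted(s))
        if key not in classes:
            classes[key] = len(classes)
        targets.append(classes[key])
    return targets, classes

# Made-up annotations for four regions of interest.
label_sets = [
    {"Mass", "Benign"},
    {"Mass", "Malignant"},
    {"Calcification", "Benign"},
    {"Mass", "Benign"},
]
all_labels = ["Mass", "Calcification", "Benign", "Malignant"]

br_targets = binary_relevance(label_sets, all_labels)
lp_targets, lp_classes = label_powerset(label_sets)

print(br_targets["Mass"])   # binary target for the "Mass" classifier
print(lp_targets)           # one class id per example
```

Binary relevance treats each label independently, so it cannot by itself capture label dependencies; the label powerset keeps them but the number of classes grows with the number of distinct label combinations.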
Work on CAD for mammography [47], [48], [49] has been ongoing since the early nineties. However, most of the proposed methods were developed on private data sets [50], [51], [52] that are not always shared, which makes the algorithms difficult to compare [49]. Moreover, the proposed methods for early diagnosis of breast cancer relied on only one of the indicators in the mammogram to diagnose the patient's condition, whether by identifying differences in the shapes and patterns of the findings (i.e. masses, calcifications, etc.) or by assessing breast density. Based on one of these indicators, separate CAD systems are then developed to identify the disease. A drawback of such methods is that they completely ignore the interdependencies among the multiple labels. We were therefore motivated to improve upon the state of the art and propose the joint learning of the tasks using multi-label image classification.
Fig. 1 demonstrates that some labels may be related to other labels. For instance, there is a connection between the findings and the BI-RADS density class IV in most datasets, which is noticeable in the bottom left corner of the co-occurrence matrices. We can also see that, in all datasets, the masses and calcifications found in BI-RADS I or II density tissue are most likely to be benign. The matrices also indicate that most of the malignant cases are strongly linked to the finding of a mass. Modelling the rich semantic information contained in a mammography image, and the label dependencies that may exist, is therefore essential for understanding the image as a whole. It is thus a practical and important problem to design a framework that can accurately assign multiple labels to a suspected region.
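A label co-occurrence matrix of the kind discussed above can be computed by counting, for every pair of labels, how many examples carry both. The sketch below uses four made-up annotation vectors over the eight labels of this task; it does not reproduce the data behind Fig. 1, and the function name is hypothetical.

```python
LABELS = ["BI-RADS I", "BI-RADS II", "BI-RADS III", "BI-RADS IV",
          "Mass", "Calcification", "Benign", "Malignant"]

def cooccurrence(label_vectors):
    """counts[i][j] = number of examples in which labels i and j are both set."""
    n = len(label_vectors[0])
    counts = [[0] * n for _ in range(n)]
    for vec in label_vectors:
        active = [i for i, bit in enumerate(vec) if bit]
        for i in active:
            for j in active:
                counts[i][j] += 1
    return counts

# Made-up binary annotations (one row per region of interest).
toy = [
    [1, 0, 0, 0, 1, 0, 1, 0],  # BI-RADS I, Mass, Benign
    [0, 1, 0, 0, 0, 1, 1, 0],  # BI-RADS II, Calcification, Benign
    [0, 0, 0, 1, 1, 0, 0, 1],  # BI-RADS IV, Mass, Malignant
    [0, 0, 1, 0, 1, 0, 0, 1],  # BI-RADS III, Mass, Malignant
]

counts = cooccurrence(toy)
mass, malignant = LABELS.index("Mass"), LABELS.index("Malignant")
print(counts[mass][malignant])  # examples labelled both Mass and Malignant
```

The diagonal holds the per-label frequencies, and the matrix is symmetric, so only one triangle needs to be inspected.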
On a separate note, transfer learning [53], [54] is an attractive technique when dealing with little data, which is the case in the medical domain. Our previous work [21] demonstrated that, using transfer learning from natural images with fine-tuning, we could efficiently learn from mammography image datasets and achieve better results than when learning from scratch. Some of the limitations of that work were due to the texture of some of the images: when examining them, we noticed that the texture of some benign and malignant mass lesion images was similar, which resulted in misclassification of the suspected region. Most of the misclassified mass lesion images were also labelled as highly dense. Consistent with this, research has shown that cancer is more difficult to detect in mammograms of women with radiographically dense breasts [55]. Breasts are composed of lobules, ducts, and fatty and fibrous connective tissue. Breasts are dense when they contain a lot of glandular tissue and not much fat. On mammograms, dense breast tissue looks white, and breast masses or tumours also look white, so dense tissue can hide potential findings. Fatty tissue, on the other hand, looks almost black, and on a black background it is clearly easier to identify a tumour that looks white (see Fig. 2). Therefore, mammograms can be less accurate in women with dense breasts.
Inspired by the success of Convolutional Neural Networks (CNNs) in single-label mammography classification [17], [18], [21], we seek to build an end-to-end deep learning framework for multi-label breast lesion classification. We want to take advantage of the very expressive CNN architecture [56] to build an automatic multi-labelling framework that can assist radiologists in giving a full report and more accurate diagnoses to their patients. This is a follow-up to and improvement of our previous work [21], extending the image classification from a single-label to a multi-label problem. In this work, we compare the performance of the CNN under different initialization and optimization procedures. For fine-tuning, we use the strategy we previously presented as a preliminary proposal in [57]; this work goes into much more depth, giving detailed explanations and extensive experiments that underline the superiority of the proposed approach. The method is a new training procedure for fine-tuning in transfer learning. The idea behind it is that, when fine-tuning, we do not want all the weights to change in the same manner: we want some layers to be more or less receptive to change, depending on their nature. Accordingly, the proposed fine-tuning strategy optimizes the model using SGD with momentum and an exponentially decaying learning rate to customize all the pre-trained weights and make them better suited to our type of data. The per-layer decaying learning rate helps control the rate at which weights change in each part of the network, i.e. the change becomes small to non-existent as we go backwards in the network towards the first layers. The results obtained with this approach show robustness and efficiency when predicting labels for new breast lesions, whether the model is trained on a small or a slightly larger dataset. 
We also adopt a final decision labelling mechanism adapted to this task, and we evaluate the proposed approach on four benchmark datasets using several evaluation criteria.
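The idea of a learning rate that shrinks exponentially towards the first layers, combined with SGD with momentum, can be sketched as follows. This is only an illustration under stated assumptions: the decay formula `base_lr * decay ** depth_from_output`, the values of `base_lr`, `decay` and `momentum`, and the function names are all hypothetical, and each "layer" is reduced to a single scalar weight for readability. The exact schedule used in this work is not given in this excerpt.

```python
def layer_learning_rates(num_layers, base_lr=0.01, decay=0.1):
    """Per-layer learning rates, shrinking exponentially towards the input.

    Layer num_layers - 1 (closest to the output) gets base_lr; each layer
    further back is multiplied by `decay` once more (assumed formula).
    """
    return [base_lr * decay ** (num_layers - 1 - i) for i in range(num_layers)]

def sgd_momentum_step(weights, grads, velocities, lrs, momentum=0.9):
    """One SGD-with-momentum update where every layer uses its own rate."""
    new_weights, new_velocities = [], []
    for w, g, v, lr in zip(weights, grads, velocities, lrs):
        v_next = momentum * v - lr * g
        new_velocities.append(v_next)
        new_weights.append(w + v_next)
    return new_weights, new_velocities

# Toy network of four "layers", each represented by one scalar weight.
lrs = layer_learning_rates(4)
weights = [1.0, 1.0, 1.0, 1.0]
grads = [1.0, 1.0, 1.0, 1.0]
velocities = [0.0, 0.0, 0.0, 0.0]

weights, velocities = sgd_momentum_step(weights, grads, velocities, lrs)
print(lrs)
print(weights)
```

With identical gradients, the first layer barely moves while the last layer takes the full step, which is the behaviour the text describes: early, generic features stay close to their pre-trained values and later layers adapt more freely.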
The remainder of this paper is organized as follows: Section 2 formulates the problem and describes the methodology proposed to solve it. In Section 3 we provide details of the experiments conducted to evaluate the proposed approach, and in Section 4 we give the results along with a discussion analyzing them. Finally, Section 5 concludes the paper.
Problem formulation
Let $D = \{(x_i, Y_i)\}_{i=1}^{n}$ be our dataset. The task is a multi-label classification problem, where the input is a Region-Of-Interest (ROI) image $x_i$ and the output is the set of labels $Y_i$.
Each ROI instance is associated with multiple class labels $Y_i \subseteq L$, in this case $|L| = 8$, and the predefined set of class labels is $L$ = {1: BI-RADS I, 2: BI-RADS II, 3: BI-RADS III, 4: BI-RADS IV, 5: Mass, 6: Calcification, 7: Benign, 8: Malignant}.
For an image x the target is the label set
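As an illustration of how a label set over the eight predefined classes can be handed to a classifier, the sketch below encodes it as a binary vector of length 8 and decodes it back. The function names and the binary-vector representation are assumptions made for illustration; the encoding actually used in this work is not shown in this excerpt.

```python
# Label index taken from the predefined set of class labels above.
LABELS = {
    1: "BI-RADS I", 2: "BI-RADS II", 3: "BI-RADS III", 4: "BI-RADS IV",
    5: "Mass", 6: "Calcification", 7: "Benign", 8: "Malignant",
}

def encode(label_ids):
    """Binary vector of length 8; position k - 1 is 1 when label k applies."""
    unknown = set(label_ids) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown label ids: {sorted(unknown)}")
    return [1 if k in label_ids else 0 for k in sorted(LABELS)]

def decode(vector):
    """Names of the labels whose bit is set."""
    return [LABELS[k] for k, bit in zip(sorted(LABELS), vector) if bit]

# Hypothetical ROI annotated as heterogeneously dense, mass, malignant.
y = encode({3, 5, 8})
print(y)
print(decode(y))
```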
Data
We used four publicly available datasets, CBIS-DDSM [76], BCDR [77], INbreast [78] and MIAS [79], and adopted the same pre-processing steps described in [21] for cropping, normalizing and augmenting the images. First, we cropped fixed-size regions of interest using the ground truth provided with the datasets and resized them to 224 × 224. Then, we used global contrast normalization, where every image was normalized by subtracting the mean and dividing by the standard deviation of its elements.
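The global contrast normalization step can be sketched in a few lines. The version below works on a nested list standing in for a grayscale ROI; the use of the population standard deviation and the small `eps` guard against division by zero on constant images are assumptions, not details stated in the text.

```python
import math

def global_contrast_normalize(image, eps=1e-8):
    """Subtract the image's own mean and divide by its own standard deviation."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    std = math.sqrt(variance)
    # eps (assumed) avoids dividing by zero when the image is constant.
    return [[(p - mean) / (std + eps) for p in row] for row in image]

# Toy 2 x 2 patch standing in for a 224 x 224 region of interest.
roi = [[0.0, 50.0], [100.0, 150.0]]
normalized = global_contrast_normalize(roi)

flat = [p for row in normalized for p in row]
print(sum(flat) / len(flat))                             # close to 0
print(math.sqrt(sum(p * p for p in flat) / len(flat)))   # close to 1
```

After normalization each image has approximately zero mean and unit standard deviation, independently of the brightness and contrast of the original acquisition.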
Results and discussion
Table 1 compares the performance of the four models on the CBIS-DDSM [76], BCDR [77], INbreast [78] and MIAS [79] datasets. We report results with respect to the metrics introduced above. The results indicate that the proposed fine-tuning strategy achieves a substantial improvement over existing methods on all of the metrics used and across all the datasets.
When comparing the models’ performance on the different datasets, we can see that the under-achieving model is either VGG-Sc or
Conclusions
In this paper, we propose a computer-aided diagnosis system that makes use of all the annotations in a mammography image, which are normally neglected when building the diagnosis in other proposed CADs, in order to provide a full diagnosis.
The CAD is a multi-label image classification system that aims to capture spontaneous label correlation relationships while taking advantage of an end-to-end image representation learning architecture. The
Acknowledgment
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
The BCDR database used in this work was a courtesy of MA Guevara Lopez and coauthors, Breast Cancer Digital Repository Consortium.
The INbreast database used in this work was a courtesy of the Breast Research Group, INESC Porto, Portugal.
Declarations of interest
None.
Hiba Chougrad received her Master of Engineering degree in Computer Science from the National School of Applied Sciences of Safi, Morocco in 2013. Currently, she is a Ph.D. candidate in the department of mathematics and computer science at Chouaib Doukkali University. Her current research interests include computer vision, machine learning, deep learning & AI for healthcare and medical research.
References (88)
et al. Why have breast cancer mortality rates declined? J. Cancer Policy (2015)
et al. Scar formation after stereotactic vacuum-assisted core biopsy of benign breast lesions. Clin. Radiol. (2006)
et al. Pain in different methods of breast biopsy: emphasis on vacuum-assisted breast biopsy. Breast Edinb. Scotl. (2008)
et al. A deep feature based framework for breast masses classification. Neurocomputing (2016)
et al. Breast masses in mammography classification with local contour features. Biomed. Eng. Online (2017)
et al. Masses in mammography: what are the underlying anatomopathological lesions? Diagn. Interv. Imaging (2014)
et al. Deep Convolutional Neural Networks for breast cancer screening. Comput. Methods Programs Biomed. (2018)
et al. Representation learning for mammography mass lesion classification with convolutional neural networks. Comput. Methods Programs Biomed. (2016)
et al. Computer-aided detection and diagnosis in mammography. Handb. Image Video Process. (2005)
et al. A review of computer-aided diagnosis of breast cancer: toward the detection of subtle signs. J. Frankl. Inst. (2007)
An extensive experimental comparison of methods for multi-label learning. Pattern Recognit.
Learning multi-label scene classification. Pattern Recognit.
A multi-label approach using binary relevance and decision trees applied to functional genomics. J. Biomed. Inform.
A multi-label classification based approach for sentiment classification. Expert Syst. Appl.
Multi-dimensional classification with Bayesian networks. Int. J. Approx. Reason.
Computer-aided detection in mammography. Clin. Radiol.
Current status and future directions of computer-aided diagnosis in mammography. Comput. Med. Imaging Graph.
Computer-aided detection: the effect of training databases on detection of subtle breast masses. Acad. Radiol.
Large scale deep learning for computer aided detection of mammographic lesions. Med. Image Anal.
On the momentum term in gradient descent learning algorithms. Neural Netw.
INbreast: toward a full-field digital mammographic database. Acad. Radiol.
INbreast. Acad. Radiol.
Labelling strategies for hierarchical multi-label classification techniques. Pattern Recognit.
Cancer statistics, 2018. CA Cancer J. Clin.
Advantages and disadvantages of mammography screening. Breast Care
The causes of medical malpractice suits against radiologists in the United States. Radiology
Periodic mammographic follow-up of probably benign lesions: results in 3184 consecutive cases. Radiology
Systematic review: the long-term effects of false-positive mammograms. Ann. Intern. Med.
Breast imaging reporting and data system standardized mammography lexicon: observer variability in lesion description. Am. J. Roentgenol.
The American College of Radiology mammography lexicon: an initial attempt to standardize terminology. Am. J. Roentgenol.
Trends in breast tissue sampling and pathology diagnoses among women undergoing mammography in the U.S.: a report from the breast cancer surveillance consortium. Cancer
Breast imaging reporting and data system. Am. J. Roentgenol.
ACR BI-RADS®-Mammography: Breast Imaging Reporting and Data System
Prevalence of mammographically dense breasts in the United States. J. Natl. Cancer Inst.
Mammographic features and subsequent risk of breast cancer: a comparison of qualitative and quantitative evaluations in the Guernsey prospective studies. Cancer Epidemiol. Biomark. Prev. Publ. Am. Assoc. Cancer Res. Cosponsored Am. Soc. Prev. Oncol.
Mammographic density, breast cancer risk and risk prediction. Breast Cancer Res. BCR.
Breast cancer diagnosis: analyzing texture of tissue surrounding microcalcifications. IEEE Trans. Inf. Technol. Biomed.
A CADx scheme for mammography empowered with topological information from clustered microcalcifications’ atlases. IEEE J. Biomed. Health Inform.
Discrimination of breast cancer with microcalcifications on mammography by deep learning. Sci. Rep.
Computer-aided diagnosis of mammographic masses using vocabulary tree-based image retrieval
Mammographic density and structural features can individually and jointly contribute to breast cancer risk assessment in mammography screening: a case–control study. BMC Cancer
Classification of mammographic breast density using a combined classifier paradigm
Beyond breast density: a review on the advancing role of parenchymal texture analysis in breast cancer risk assessment. Breast Cancer Res.
A review on multi-label learning algorithms. IEEE Trans. Knowl. Data Eng.
Hamid Zouaki received his Ph.D. In Mathematics from the institute of Applied Mathematics of Grenoble, France, 1991. He is currently Professor at the department of Mathematics of the University Chouaib Doukkali, El Jadida (Morocco). His research interests are representation tools for shape description, numerical optimization, image analysis, CBIR and machine learning.
Omar Alehyane received his Ph.D. in Mathematics from the University Paul Sabatier of Toulouse, France, 1997. He is currently Professor at the department of Mathematics of the University Chouaib Doukkali, El Jadida (Morocco). His research interests are complex geometry, geodesic distances and hyperbolic geometry.