
Neurocomputing

Volume 392, 7 June 2020, Pages 168-180

Multi-label transfer learning for the early diagnosis of breast cancer

https://doi.org/10.1016/j.neucom.2019.01.112

Abstract

Early diagnosis of breast cancer, when the tumour is small and has not spread, makes the disease easier to treat and increases the patient's chances of survival. Recently proposed methods for the early diagnosis of breast cancer, while showing great success in achieving this goal, rely on a single indicator in the mammogram to diagnose the patient's condition. Whether it is identifying differences in the shapes and patterns of the findings (i.e. masses, calcifications, etc.) or assessing the breast density as a risk indicator, these Computer-aided Diagnosis (CAD) systems, by using single-label classification, fail to exploit the useful intrinsic correlation information among data from correlated domains.

Rather than learning to identify the disease based on a single indicator, we propose the joint learning of the tasks using multi-label image classification. Furthermore, we introduce a new fine-tuning strategy for transfer learning that takes advantage of end-to-end image representation learning when adapting the pre-trained Convolutional Neural Network (CNN) to the new task. We also propose a customized label decision scheme, adapted to this problem, which estimates the optimal confidence for each visual concept. We demonstrate the effectiveness of our approach on four benchmark datasets, CBIS-DDSM, BCDR, INBreast and MIAS, obtaining better results than other commonly used baselines.

Introduction

Breast cancer is the most common cancer in women, and the second most common cause of death by cancer after lung cancer [1]. About 40,920 women in the U.S. were expected to die from breast cancer in 2018, though death rates have been decreasing since 1989 [1]. These decreases are thought to be the result of treatment advances, earlier detection through screening, and increased awareness [1]. Early diagnosis of breast cancer continues to be the best way to save lives and decrease healthcare costs over time. Technologies to detect and diagnose breast cancer continue to advance for the purpose of giving patients less invasive options and better diagnoses [2].

Mammography is the primary factor in breast cancer mortality reduction, despite the potential drawbacks of the procedure [3]. In fact, reading a mammogram accurately is challenging for most radiologists. In some recent surveys [4], error in diagnosis was the most common cause of malpractice suits against radiologists. The majority of such cases arose from failure to diagnose breast cancer on mammography [4]. To reduce the rate of false-negative diagnoses, lesions with a 2% chance of being malignant are recommended for biopsy [5]. However, only 15 to 30% of these biopsies are found to be malignant [5]. Benign biopsies cause many negative consequences, which include fear, pain, anxiety, direct financial expenses, indirect costs related to work missed, and risk of complications [6], [7], [8].

One way researchers have sought to improve the performance of mammography and increase the accuracy of diagnoses is through a better estimation of breast cancer risk on the basis of mammographic findings [9], [10].

Mammograms are normally subject to multiple annotations. The labels commonly describe the density of the breast according to the BI-RADS categories, the type of findings (masses, calcifications, etc.) and the pathology [11]. Initially, the radiologist inspects the images looking for abnormalities in the form of masses, calcifications or others. Masses are defined as three-dimensional, space-occupying lesions with completely or partially convex-outward borders. When a new mass is identified, a diagnostic evaluation may be warranted. Calcifications are deposits of calcium salts in the breast. Sometimes calcifications can be associated with cancer, and certain of their characteristics help the radiologist decide whether further action is needed. The Breast Imaging Reporting and Data System (BI-RADS) [12] was developed in part to improve the predictive capability of mammography. Radiologists classify the density of each mammographic examination into one of four categories, as defined in the BI-RADS lexicon, fourth edition [13]. Approximately 9% of women have almost entirely fatty breasts (BI-RADS I), 40% have scattered fibro-glandular densities (BI-RADS II), 45% have heterogeneously dense breasts (BI-RADS III), and 6% have extremely dense breasts (BI-RADS IV) [13]. Dense breasts are defined as BI-RADS density categories III or IV. Thus, approximately 50% of the population who undergo mammography have been categorized as having dense breasts [14]. The risk of breast cancer is higher for women with higher breast density: it has been reported that women with a high breast density have a four- to six-fold increased risk of developing the disease compared to women with a low breast density [15], [16]. After careful annotation, the radiologist must then provide an accurate, specific, and sufficiently comprehensive diagnosis from the mammogram, to enable the clinician to estimate the prognosis and develop an optimal plan of treatment.

Computer-aided diagnosis (CAD) systems have proven very efficient in recognizing patterns in mammogram images that might suggest malignancy [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. Automated screening of mammograms, or computer-aided diagnosis (CAD) of breast cancer, is a vast field of research. In [29], [30], the authors provide an extensive review of the different stages of a CAD methodology for breast cancer. To the best of our knowledge, most existing works focus on single-label classification problems, where each mammogram is assumed to have only one class label from the aforementioned mammogram characteristics. On the one hand, some works focused on the types of findings in a mammogram to give a diagnosis. For example, in [17], [18], [19], the authors used micro-calcifications to identify breast cancer, while in [20], [21], [22], [23], [24], [25] the authors focused on the morphology of the contour and shape of the breast mass lesion in mammography images, considering them the most discriminating criteria between benign and malignant masses. On the other hand, in [16], [26], [27], [28], the authors relied on the mammographic density and parenchymal patterns for breast cancer risk assessment. These techniques, although working well, fail to exploit the dependencies that exist between the different annotations and therefore cannot provide a full diagnosis. Multi-label classification is the extension of single-label classification in which the goal is to predict the set of relevant labels for a given input; the examples used to train such a model carry several labels each. It differs from multi-class classification in terms of the output label space: labels are not assumed to be mutually exclusive and multiple labels may be associated with a single training example [31], [32]. Historically, multi-label classification has primarily been applied to text categorisation [33] and medical diagnosis [34]. More recently, it has become increasingly popular for a variety of problem types [35], such as image and video annotation [36], [37], genomics [38], [39], [40] and sentiment classification [41], [42].

There are two prevailing categories of algorithms in multi-label learning, namely algorithm adaptation and problem transformation [34], [35]. Algorithm adaptation methods extend specific learning models to handle multi-labelled data. Problem transformation algorithms, on the other hand, transform the multi-label learning task into either several binary classification problems or one multi-class classification problem. Another category is multi-label ensemble methods [31], where models are defined as meta-algorithms built on top of common multi-label learners. Many other methods have been proposed in the literature, some of which exploit graphical models to capture the label dependencies and conduct structured classification, including those using Bayesian networks [43], [44], [45] and conditional random fields [46].
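To make the problem-transformation category concrete, the sketch below illustrates binary relevance, the simplest such transformation, which trains one independent binary classifier per label. This is only an illustrative example under assumed choices (scikit-learn, logistic regression); it is not the method proposed in this paper, which learns the labels jointly with a CNN.

```python
# Binary relevance: transform the multi-label task into L independent binary tasks.
# Illustrative sketch only; the classifier choice (logistic regression) is an assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression

def binary_relevance_fit(X, Y):
    """Fit one binary classifier per column of the (N x L) label indicator matrix Y."""
    return [LogisticRegression(max_iter=1000).fit(X, Y[:, l]) for l in range(Y.shape[1])]

def binary_relevance_predict(models, X):
    """Stack the per-label predictions back into an (N x L) indicator matrix."""
    return np.stack([m.predict(X) for m in models], axis=1)

# Toy usage: 100 samples, 20 features, 8 labels (the size of this paper's label space).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y = (rng.random(size=(100, 8)) > 0.7).astype(int)
Y_hat = binary_relevance_predict(binary_relevance_fit(X, Y), X)
```

Binary relevance ignores label dependencies by construction, which is precisely the limitation the joint multi-label approach in this paper aims to overcome.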

Work on CAD for mammography [47], [48], [49] has been done since the early nineties. However, most of the proposed methods were developed on private data sets [50], [51], [52] that are not always shared, and with algorithms that are difficult to compare [49]. Moreover, the proposed methods for the early diagnosis of breast cancer relied on only one of the indicators in the mammogram to diagnose the patient's condition, whether by identifying differences in the shapes and patterns of the findings (i.e. masses, calcifications, etc.) or by assessing the breast density. Based on one of these indicators, separate CAD systems are then developed to identify the disease. A drawback of such methods is that they completely ignore the interdependencies among the multiple labels; we therefore felt motivated to improve upon the state of the art and propose the joint learning of the tasks using multi-label image classification.

Fig. 1 demonstrates that some labels may be related to other labels. For instance, we notice a connection between the findings and the BI-RADS density class IV in most datasets, which is visible in the bottom left corner of the co-occurrence matrices. We can also see that, in all datasets, the masses and calcifications found in BI-RADS I or II density tissue are most likely to be benign. The matrices also indicate that most of the malignant cases are strongly linked to the finding of a mass. Hence, modelling the rich semantic information contained in a mammography image, and the possible label dependencies that may exist, is essential for understanding the image as a whole. It is therefore a practical and important problem to design a framework that can accurately assign multiple labels to a suspected region.
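The label co-occurrence statistics visualized in Fig. 1 can be computed directly from the binary label matrix of a dataset; the minimal numpy sketch below shows one way to do so (the figure may use a normalized or per-dataset variant, so this is only an assumption about its exact form).

```python
# Count how often each pair of labels appears together across the dataset.
# Minimal sketch; the paper's Fig. 1 may use a normalized or per-dataset variant.
import numpy as np

labels = ["BI-RADS I", "BI-RADS II", "BI-RADS III", "BI-RADS IV",
          "Mass", "Calcification", "Benign", "Malignant"]

def label_cooccurrence(Y):
    """Y is an (N, L) binary indicator matrix; returns an (L, L) co-occurrence count matrix."""
    Y = np.asarray(Y, dtype=np.int64)
    return Y.T @ Y  # entry (i, j) counts images carrying both label i and label j

# Toy usage with three annotated images.
Y = np.array([
    [0, 1, 0, 0, 1, 0, 1, 0],   # BI-RADS II, mass, benign
    [0, 0, 0, 1, 1, 0, 0, 1],   # BI-RADS IV, mass, malignant
    [1, 0, 0, 0, 0, 1, 1, 0],   # BI-RADS I, calcification, benign
])
C = label_cooccurrence(Y)
print(C[labels.index("Mass"), labels.index("Malignant")])  # -> 1
```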

On a separate note, transfer learning [53], [54] is an attractive technique when dealing with little data, which is the case in the medical domain. Our previous work [21] demonstrated that, using transfer learning from natural images with fine-tuning, we could efficiently learn from mammography image datasets and achieve better results than when learning from scratch. Some of the limitations of that work were due to the texture of some of the images: when examining them, we noticed that the texture of some of the benign and malignant mass lesion images was similar, which resulted in a misclassification of the suspected region. Most of the misclassified mass lesion images were also labelled as highly dense. Indeed, research has shown that cancer is more difficult to detect in mammograms of women with radiographically dense breasts [55]. Breasts are composed of lobules, ducts, fatty and fibrous connective tissue. A breast is dense when it contains a lot of glandular tissue and not much fat. On mammograms, dense breast tissue looks white, while breast masses or tumours also look white; the dense tissue can therefore hide potential findings. On the other hand, fatty tissue looks almost black, and on a black background it is clearly easier to identify a tumour that looks white (see Fig. 2). Mammograms can therefore be less accurate in women with dense breasts.

Inspired by the success of Convolutional Neural Networks (CNNs) in single-label mammography classification [17], [18], [21], we seek to build an end-to-end deep learning framework for multi-label breast lesion classification. We want to take advantage of the very expressive CNN architecture [56] to build an automatic multi-labelling framework able to assist the radiologist in giving a full report and more accurate diagnoses to patients. This is a follow-up and improvement of our previous work [21], which extends the image classification from a single-label to a multi-label problem. In this work, we compare the performance of the CNN under different initialization and optimization procedures. When fine-tuning, we propose to use the new strategy we previously presented as a preliminary proposal in [57]; this work goes into much more depth, giving detailed explanations and extensive experiments that underline the superiority of the proposed approach. The method is a new training procedure for fine-tuning when using transfer learning. The idea behind it is that, during fine-tuning, we do not want all the weights to change in the same manner; we want some of the layers to be more or less receptive to change, depending on their nature. Accordingly, the proposed fine-tuning strategy optimizes the model using SGD with momentum and an exponentially decaying learning rate to customize all the pre-trained weights and make them better suited to our type of data. The per-layer decaying learning rate controls the rate at which weights change in each part of the network, i.e. the change becomes small to non-existent as we go backwards in the network towards the first layers. The results obtained using this approach show robustness and efficiency when predicting labels for new breast lesions, whether trained on a small or a slightly larger dataset. We also adopt a final label decision mechanism adapted to this task, and we evaluate the proposed approach on four benchmark datasets using several evaluation criteria.
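As a rough illustration of this per-layer scheme, the PyTorch sketch below assigns each convolutional block of a pre-trained VGG-style network its own learning rate, decaying exponentially towards the first layers, and optimizes with SGD momentum and a multi-label sigmoid loss. The framework, base learning rate and decay factor are assumptions made for the example, not the exact settings used in the paper.

```python
# Per-layer exponentially decaying learning rates for fine-tuning a pre-trained CNN.
# Illustrative PyTorch sketch only: base_lr, decay and the use of VGG16 blocks are
# assumptions made for the example, not the paper's exact hyper-parameters.
import torch
import torchvision

model = torchvision.models.vgg16(weights="IMAGENET1K_V1")
model.classifier[6] = torch.nn.Linear(4096, 8)  # 8 labels, sigmoid applied in the loss

# Split the convolutional part into its 5 blocks (separated by pooling layers).
blocks = [model.features[0:5], model.features[5:10], model.features[10:17],
          model.features[17:24], model.features[24:31]]

base_lr, decay = 1e-3, 0.1
param_groups = []
# The earliest block gets base_lr * decay**5 while the classifier keeps the full base_lr,
# so weight updates shrink towards the first layers, as described in the text.
for depth, block in enumerate(blocks, start=1):
    param_groups.append({"params": block.parameters(),
                         "lr": base_lr * decay ** (len(blocks) + 1 - depth)})
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})

optimizer = torch.optim.SGD(param_groups, momentum=0.9)
criterion = torch.nn.BCEWithLogitsLoss()  # multi-label loss: one sigmoid per label
```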

The remainder of this paper is organized as follows: Section 2 formulates the problem and describes the methodology proposed to solve the issue at hand. In Section 3 we provide details of the experiments conducted to evaluate the proposed approach, and we give the results along with a discussion analyzing them in Section 4. Finally, Section 5 concludes the paper.


Problem formulation

Let $D=\{(x_i, y_i)\}_{i=1}^{N}$ be our dataset. The task is a multi-label classification problem, where the input is a Region-Of-Interest (ROI) image $x \in \mathbb{R}^{d}$ with $d = r \times r$, and the output is a set of labels $y \in \mathcal{Y} = \{0,1\}^{L}$.

Each ROI instance is associated with multiple class labels from $\mathcal{L} = \{1, \ldots, L\}$, in this case $L = 8$, and the predefined set of class labels is {1: BI-RADS I, 2: BI-RADS II, 3: BI-RADS III, 4: BI-RADS IV, 5: Mass, 6: Calcification, 7: Benign, 8: Malignant}.

For an image $x$, the target is the label set $y = [y_1, \ldots, y_L] \in \mathcal{Y}$.
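As a concrete example of this target representation, the short sketch below (an illustrative encoding, not code from the paper) builds the multi-hot vector $y$ for an ROI annotated as a malignant mass in a BI-RADS IV breast.

```python
# Build the multi-hot target vector y in {0,1}^L for one annotated ROI.
# Illustrative sketch of the encoding defined above (L = 8 labels).
import numpy as np

LABELS = ["BI-RADS I", "BI-RADS II", "BI-RADS III", "BI-RADS IV",
          "Mass", "Calcification", "Benign", "Malignant"]

def encode(annotations):
    """Map a list of label names for one ROI to its binary target vector."""
    y = np.zeros(len(LABELS), dtype=np.float32)
    for name in annotations:
        y[LABELS.index(name)] = 1.0
    return y

# A malignant mass found in an extremely dense (BI-RADS IV) breast:
print(encode(["BI-RADS IV", "Mass", "Malignant"]))
# -> [0. 0. 0. 1. 1. 0. 0. 1.]
```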

Data

We used four publicly available datasets, CBIS-DDSM [76], BCDR [77], INBreast [78] and MIAS [79], and adopted the same pre-processing steps described in [21] for cropping, normalizing and augmenting the images. First, we cropped fixed-size regions of interest using the ground truth provided with the datasets and resized them to 224 × 224. Then, we applied global contrast normalization, where every image was normalized by subtracting the mean and dividing by the standard deviation of its elements.
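The crop/resize/normalize pipeline just described can be sketched as follows; the 224 × 224 size and the per-image global contrast normalization follow the text, while the bounding-box format and the Pillow/numpy helpers are assumptions made for illustration.

```python
# Pre-processing sketch: crop the ROI from the ground-truth bounding box, resize to
# 224 x 224 and apply per-image global contrast normalization, as described in the text.
# Library choices (Pillow / numpy) and the bounding-box format are assumptions.
import numpy as np
from PIL import Image

def preprocess_roi(mammogram_path, bbox, size=224):
    """bbox = (left, upper, right, lower) ground-truth coordinates of the ROI."""
    image = Image.open(mammogram_path).convert("L")      # grey-scale mammogram
    roi = image.crop(bbox).resize((size, size), Image.BILINEAR)
    x = np.asarray(roi, dtype=np.float32)
    # Global contrast normalization: zero mean, unit standard deviation per image.
    x = (x - x.mean()) / (x.std() + 1e-8)
    return x
```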

Results and discussion

Table 1 compares the performance of the four models on the CBIS-DDSM [76], BCDR [77], INBreast [78] and MIAS [79] datasets. We report results with respect to the metrics introduced above. The results indicate that the proposed fine-tuning strategy achieves a substantial improvement over existing methods, in terms of all the metrics used and across all the datasets.

When comparing the models’ performance on the different datasets, we can see that the under-achieving model is either VGG-Sc or

Conclusions

In this paper, we propose a computer-aided diagnosis system that makes use of all the annotations in a mammography image, which are normally neglected in the diagnosis-building process of other proposed CADs, in order to build a CAD system capable of providing a full diagnosis.

The CAD is a multi-label image classification system that aims to capture spontaneous label correlation relationships while taking advantage of an end-to-end image representation learning architecture. The

Acknowledgment

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

The BCDR database used in this work was a courtesy of MA Guevara Lopez and coauthors, Breast Cancer Digital Repository Consortium.

The INbreast database used in this work was a courtesy of the Breast Research Group, INESC Porto, Portugal.

Declarations of interest

None.


References (88)

  • G. Madjarov et al.

    An extensive experimental comparison of methods for multi-label learning

    Pattern Recognit.

    (2012)
  • M.R. Boutell et al.

    Learning multi-label scene classification

    Pattern Recognit.

    (2004)
  • E.A. Tanaka et al.

    A multi-label approach using binary relevance and decision trees applied to functional genomics

    J. Biomed. Inform.

    (2015)
  • S.M. Liu et al.

    A multi-label classification based approach for sentiment classification

    Expert Syst. Appl.

    (2015)
  • C. Bielza et al.

    Multi-dimensional classification with Bayesian networks

    Int. J. Approx. Reason.

    (2011)
  • S.M. Astley et al.

    Computer-aided detection in mammography

    Clin. Radiol.

    (2004)
  • R.M. Nishikawa

    Current status and future directions of computer-aided diagnosis in mammography

    Comput. Med. Imaging Graph.

    (2007)
  • B. Zheng et al.

    Computer-aided detection: the effect of training databases on detection of subtle breast masses

    Acad. Radiol.

    (2010)
  • T. Kooi et al.

    Large scale deep learning for computer aided detection of mammographic lesions

    Med. Image Anal.

    (2017)
  • N. Qian

    On the momentum term in gradient descent learning algorithms

    Neural Netw.

    (1999)
  • I.C. Moreira et al.

    INbreast: toward a full-field digital mammographic database

    Acad. Radiol.

    (2012)
  • I. Triguero et al.

    Labelling strategies for hierarchical multi-label classification techniques

    Pattern Recognit.

    (2016)
  • R.L. Siegel et al.

    Cancer statistics, 2018

    CA Cancer J. Clin.

    (2018)
  • S.H. Heywang-Köbrunner et al.

    Advantages and disadvantages of mammography screening

    Breast Care

    (2011)
  • J.S. Whang et al.

    The causes of medical malpractice suits against radiologists in the United States

    Radiology

    (2013)
  • E.A. Sickles

    Periodic mammographic follow-up of probably benign lesions: results in 3184 consecutive cases

    Radiology

    (1991)
  • N.T. Brewer et al.

    Systematic review: the long-term effects of false-positive mammograms

    Ann. Intern. Med.

    (2007)
  • J.A. Baker et al.

    Breast imaging reporting and data system standardized mammography lexicon: observer variability in lesion description

    Am. J. Roentgenol.

    (1996)
  • C.J. D'Orsi

    The American College of Radiology mammography lexicon: an initial attempt to standardize terminology

    Am. J. Roentgenol.

    (1996)
  • K.H. Allison et al.

    Trends in breast tissue sampling and pathology diagnoses among women undergoing mammography in the U.S.: a report from the breast cancer surveillance consortium

    Cancer

    (2015)
  • W.A. Berg et al.

    Breast imaging reporting and data system

    Am. J. Roentgenol.

    (2000)
  • American College of Radiology

    ACR BI-RADS®-Mammography: Breast Imaging Reporting and Data System

    (2003)
  • B.L. Sprague et al.

    Prevalence of mammographically dense breasts in the United States

    J. Natl. Cancer Inst.

    (2014)
  • G. Torres-Mejía et al.

    Mammographic features and subsequent risk of breast cancer: a comparison of qualitative and quantitative evaluations in the Guernsey prospective studies

    Cancer Epidemiol. Biomark. Prev. Publ. Am. Assoc. Cancer Res. Cosponsored Am. Soc. Prev. Oncol.

    (2005)
  • C.M. Vachon et al.

    Mammographic density, breast cancer risk and risk prediction

    Breast Cancer Res. BCR.

    (2007)
  • A.N. Karahaliou et al.

    Breast cancer diagnosis: analyzing texture of tissue surrounding microcalcifications

    IEEE Trans. Inf. Technol. Biomed.

    (2008)
  • I.I. Andreadis et al.

    A CADx scheme for mammography empowered with topological information from clustered microcalcifications' atlases

    IEEE J. Biomed. Health Inform.

    (2015)
  • J. Wang et al.

    Discrimination of breast cancer with microcalcifications on mammography by deep learning

    Sci. Rep.

    (2016)
  • M. Jiang et al.

    Computer-aided diagnosis of mammographic masses using vocabulary tree-based image retrieval

  • R.R. Winkel et al.

    Mammographic density and structural features can individually and jointly contribute to breast cancer risk assessment in mammography screening: a case–control study

    BMC Cancer

    (2016)
  • K. Bovis et al.

    Classification of mammographic breast density using a combined classifier paradigm

  • A. Gastounioti et al.

    Beyond breast density: a review on the advancing role of parenchymal texture analysis in breast cancer risk assessment

    Breast Cancer Res.

    (2016)
  • M.-L. Zhang et al.

    A review on multi-label learning algorithms

    IEEE Trans. Knowl. Data Eng.

    (2014)

    Hiba Chougrad received her Master of Engineering degree in Computer Science from the National School of Applied Sciences of Safi, Morocco in 2013. Currently, she is a Ph.D. candidate in the department of mathematics and computer science at Chouaib Doukkali University. Her current research interests include computer vision, machine learning, deep learning & AI for healthcare and medical research.

    Hamid Zouaki received his Ph.D. In Mathematics from the institute of Applied Mathematics of Grenoble, France, 1991. He is currently Professor at the department of Mathematics of the University Chouaib Doukkali, El Jadida (Morocco). His research interests are representation tools for shape description, numerical optimization, image analysis, CBIR and machine learning.

    Omar Alehyane received his Ph.D. in Mathematics from the University Paul Sabatier of Toulouse, France, 1997. He is currently Professor at the department of Mathematics of the University Chouaib Doukkali, El Jadida (Morocco). His research interests are complex geometry, geodesic distances and hyperbolic geometry.
