Explainable skin lesion diagnosis using taxonomies
Introduction
Skin cancer is one of the most common types of cancer, and one of the few whose incidence rates have been steadily increasing [1]. It is therefore crucial to improve both diagnostic accuracy and the rate of early diagnosis. Two lines of work are being pursued to address this health problem: (i) investment in newer and better imaging techniques, such as confocal microscopy and spectral imaging; and (ii) development of computer-aided diagnosis systems (CADS) for the automatic analysis of dermoscopy images. The latter in particular has grown impressively in recent years [2], [3], mainly due to the public release of increasingly large data sets [4]. Another driving factor was the increase in computational power, thanks to more powerful graphics processing units (GPUs), which accelerated the development of methods based on convolutional neural networks (CNNs). These networks achieve (near) human-expert diagnostic performance [5], [6], and are trained in an end-to-end fashion, eliminating the need for hand-crafted features [7].
The features learned by CNN models are optimal in the sense that they are optimized for classification performance. However, they are not easy to interpret, especially by non-experts, and the user is left with little information to understand the output of a CNN. In safety-critical medical applications, such as the one addressed in this paper, it is crucial for CADS to provide explainable outputs to physicians. Otherwise, an incorrect diagnosis may be rendered, incurring high costs for both the patient and the practitioner. Our work aims to address this issue through the design of an explainable CADS.
Various approaches have been proposed by the machine learning community to improve the explainability of a CNN, most of them focused on inspecting the features learned by the model. Two popular strategies are class activation maps (CAMs [8] or Grad-CAMs [9]), which highlight the image regions that contribute the most to an output, and attention modules [10], which are trained to guide the CNN towards the most discriminative features. It is also possible to inspect each filter learned by the CNN [11], [12]. Most visualization methods are applied only at inference time, after the network is fully trained. On the other hand, there are methods that try to simultaneously improve the explainability of the CNN and its performance. In this case, the network is trained to jointly perform a set of related tasks. These multi-task networks learn better features that capture common and discriminative properties [13], [14].
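To make the CAM idea above concrete, the following sketch builds a class activation map as a class-weighted sum of the final convolutional feature maps, rectified and rescaled to [0, 1]. Plain Python lists stand in for real CNN tensors, and both the feature maps and the classifier weights are illustrative values, not taken from any trained model.

```python
def class_activation_map(feature_maps, class_weights):
    """Combine C feature maps (each H x W) into one heat map for a class.

    feature_maps: list of C maps, each a list of H rows of W floats
    class_weights: C weights from the classifier layer for the target class
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels, clipped at zero (ReLU).
    cam = [[max(0.0, sum(wc * fm[i][j] for wc, fm in zip(class_weights, feature_maps)))
            for j in range(w)] for i in range(h)]
    # Normalize to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam) or 1.0
    return [[v / peak for v in row] for row in cam]

# Toy example: two 2x2 feature maps; the class weights favor the first map.
maps = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
cam = class_activation_map(maps, [1.0, -0.5])
# cam is [[0.5, 0.0], [0.0, 1.0]]
```

In practice the normalized map is upsampled to the input resolution and overlaid on the image as a heat map, which is what makes the highlighted regions inspectable by the user.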
In this work we propose to combine multi-task CNNs with visualization methods to develop an explainable CADS for skin cancer diagnosis. Towards this goal we take into account a property of skin lesions that remains relatively unexplored in the literature: their inherent hierarchical structure. Lesions are progressively organized by dermatologists into various classes, according to their origin (melanocytic or non-melanocytic) and degree of malignancy (malignant or benign), until a differential diagnosis is reached (see Fig. 1). To determine these sequential classes, dermatologists screen the lesions for the presence of localized dermoscopic criteria [15]. Various dermoscopic criteria, such as streaks or blood vessels, are highly correlated with the origin of the lesion (e.g., streaks with melanocytic lesions and blood vessels with non-melanocytic ones), while a more detailed assessment of these structures allows dermatologists to perform a differential diagnosis based on the following medical facts: (i) irregular streaks are a sign of melanoma, but regular ones are a hallmark of Reed and Spitz nevi; (ii) arborizing vessels are associated with basal cell carcinomas, while hairpin vessels are more common in seborrheic/benign keratosis.
Expert dermatologists are able to achieve better diagnoses by understanding the aforementioned similarities and differences between the various lesions. Thus, it is expected that a CADS would also benefit from this knowledge. In this work, we develop a deep learning based CADS that makes hierarchical decisions about the lesion (multi-task) at the following levels: origin (melanocytic/non-melanocytic), degree of malignancy (benign/malignant), and differential diagnosis (e.g., melanoma, basal cell carcinoma, benign keratosis), where each decision is conditioned on the previous one. To mimic the localized analysis and improve the explainability of the model, we take advantage of attention modules. Attention guides the model towards the most discriminative regions and features of the lesion at each decision level.
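The hierarchical decision process described above can be sketched as a greedy top-down traversal of the taxonomy in Fig. 1, with each level conditioned on the one before it. The class names, probabilities, and the greedy rule itself are illustrative assumptions for this sketch, not the paper's exact inference procedure.

```python
# Toy taxonomy following Fig. 1: origin -> malignancy -> differential diagnosis.
TAXONOMY = {
    "melanocytic":     {"malignant": ["melanoma"],
                        "benign":    ["nevus"]},
    "non-melanocytic": {"malignant": ["basal cell carcinoma"],
                        "benign":    ["benign keratosis"]},
}

def hierarchical_diagnosis(p_origin, p_malignancy, p_diagnosis):
    """Greedy top-down decision: each level is chosen conditioned on the
    previous one, mirroring how a dermatologist narrows the differential.

    p_origin: dict origin -> probability
    p_malignancy: dict origin -> dict malignancy -> probability
    p_diagnosis: dict (origin, malignancy) -> dict diagnosis -> probability
    """
    origin = max(p_origin, key=p_origin.get)
    malignancy = max(p_malignancy[origin], key=p_malignancy[origin].get)
    candidates = p_diagnosis[(origin, malignancy)]
    # Only diagnoses allowed under this taxonomy branch are considered.
    allowed = TAXONOMY[origin][malignancy]
    diagnosis = max((d for d in candidates if d in allowed), key=candidates.get)
    return origin, malignancy, diagnosis

path = hierarchical_diagnosis(
    {"melanocytic": 0.8, "non-melanocytic": 0.2},
    {"melanocytic": {"malignant": 0.7, "benign": 0.3},
     "non-melanocytic": {"malignant": 0.5, "benign": 0.5}},
    {("melanocytic", "malignant"): {"melanoma": 0.9, "nevus": 0.1}},
)
# path is ("melanocytic", "malignant", "melanoma")
```

Restricting each level's candidates to the taxonomy branch selected at the previous level is what enforces consistency between origin, malignancy, and the differential diagnosis.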
Our work demonstrates the advantages of combining a multi-task CNN with attention modules. First, we show that an explainable hierarchical model can be trained efficiently, without the need to add external data, even with a small training set (2000 images), and that it generalizes well to new images. The model achieves competitive diagnostic results on public data sets, especially when compared with more complex methods based on ensembles of CNNs. Second, the visualization of the attention modules allows an easy interpretation of correct and incorrect diagnoses, increasing the safety of the model. Finally, the importance of the attention module is further supported by our robustness experiments, where we visualize the impact of various image transformations. We believe that our work is a relevant contribution towards the design of more efficient, robust, and safe deep learning models.
Section snippets
Related work
In recent years, the field of dermoscopy image analysis has been profoundly changed by the adoption of deep learning methods. CNNs have been shown to achieve accuracies very similar to those of dermatologists in the diagnosis of multiple types of skin lesions [5], [6], whereas prior works focused mainly on the differentiation between melanoma and nevi. These studies demonstrate the ability of CNNs to learn discriminative lesion representations.
The process by which a convolutional neural
Proposed system
This work proposes a new CADS for skin lesions with the following properties: (i) it mimics the hierarchical decisions made by dermatologists (recall Fig. 1), thus medical knowledge is incorporated in the design of the network; and (ii) it is explainable, since it provides visual information regarding the most relevant regions and features in each step of the diagnosis.
Hierarchical classification may be seen as the problem of finding a set of sequential class labels that better
Hierarchical diagnosis model
The proposed hierarchical diagnosis model comprises three main blocks, as shown in Fig. 2: (i) an image encoder, which extracts image features; (ii) an image decoder, which performs the hierarchical classification; and (iii) an attention module, which guides the model towards the most discriminative features and regions according to the previous output of the LSTM. The following subsections are organized according to these blocks.
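The attention module can be sketched as soft attention over spatial locations: each location's feature vector is scored against a query vector (standing in for the previous LSTM output), the scores are softmax-normalized into an attention map, and a context vector is produced for the decoder. The dot-product scoring and the toy features below are assumptions made for brevity, not the module's actual parameterization.

```python
import math

def spatial_attention(features, query):
    """Soft attention over N spatial locations.

    features: list of N feature vectors (each of length D)
    query: length-D vector (e.g., the previous decoder state)
    Returns (attention weights, context vector).
    """
    # Score each location by its similarity to the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    # Softmax with max-subtraction for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the location features.
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Toy example: 3 locations, 2-D features; the query matches location 2 best.
feats = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
weights, context = spatial_attention(feats, [1.0, 1.0])
```

The weights themselves are what gets visualized: reshaped to the spatial grid of the feature map, they indicate which lesion regions drove each decision level.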
Data set and experiments
We developed our model using the ISIC 2017 and 2018 dermoscopy data sets [4], [40]. The first set comprises 2750 images divided into training (2000), validation (150), and test (600) sets. These images contain examples of the following classes of lesions: melanocytic (melanoma and nevi) and non-melanocytic (seborrheic keratosis). The second set is larger and more complex, containing 11,527 examples of the following lesions: melanocytic (melanoma and nevi) and non-melanocytic (basal cell
Ablation studies on ISIC 2017
In this section, we report the results of the ablation studies described in Section 5.1, as follows. First, we show the results of the analysis of the loss function for all image encoders, using only CI1 (3) as the class inference method, since the performance of CI2 (4) was similar. We then evaluate the performance after the incorporation of the channel attention module, using all of the image encoders. We also use this scenario to
Conclusions
This paper proposes a diagnostic model for dermoscopy images that: (i) uses a multi-task network to perform a hierarchical diagnosis of skin lesions; and (ii) provides visual information to explain the diagnosis. By leveraging these two factors, we achieved competitive results on two state-of-the-art dermoscopy data sets (ISIC 2017 and 2018), without the need to augment the training data with external or artificially generated data and without using CNN ensembles. The experimental results show
Acknowledgments
This work was supported by the FCT project and multi-year funding [CEECIND/00326/2017], [PTDC/EEIPRO/0426/2014], and LARSyS - FCT Plurianual funding 2020–2023.
The Titan Xp used in this project was donated by the NVIDIA Corporation.
References (47)
A survey on deep learning in medical image analysis, Med. Image Anal. (2017)
Recent advances in convolutional neural networks, Pattern Recognit. (2018)
Curriculum learning of visual attribute clusters for multi-task classification, Pattern Recognit. (2018)
Learning multi-layer coarse-to-fine representations for large-scale image classification, Pattern Recognit. (2019)
Tree-CNN: a hierarchical deep convolutional neural network for incremental learning, Neural Netw. (2020)
A hierarchical and regional deep learning architecture for image description generation, Pattern Recognit. Lett. (2019)
A survey on automatic image caption generation, Neurocomputing (2018)
Skin lesion classification with ensembles of deep convolutional neural networks, J. Biomed. Inform. (2018)
Fusing fine-tuned deep features for skin lesion classification, Comput. Med. Imaging Graph. (2019)
Cancer statistics, 2019, CA Cancer J. Clin. (2019)
Dermoscopy image analysis: overview and future directions, IEEE J. Biomed. Health Inform.
A survey of feature extraction in dermoscopy image analysis of skin cancer, IEEE J. Biomed. Health Inform.
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Sci. Data
Dermatologist-level classification of skin cancer with deep neural networks, Nature
Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis, Br. J. Dermatol.
Learning deep features for discriminative localization, Computer Vision and Pattern Recognition
Grad-CAM: visual explanations from deep networks via gradient-based localization, International Conference on Computer Vision
Interpreting deep visual representations via network dissection, IEEE Trans. Pattern Anal. Mach. Intell.
Understanding trained CNNs by indexing neuron selectivity, Pattern Recognit. Lett.
Interactive Atlas of Dermoscopy
Classification for dermoscopy images using convolutional neural networks based on region average pooling, IEEE Access
Attention residual learning for skin lesion classification, IEEE Trans. Med. Imaging
Visualizing convolutional neural networks to improve decision support for skin lesion classification, MLCN 2018, DLF 2018, and iMIMIC 2018
Catarina Barata received B.Sc. and M.Sc. degrees in Biomedical Engineering, and Ph.D. degree in Electrical and Computer Engineering from Instituto Superior Técnico, University of Lisbon, in 2009, 2011, and 2017 respectively. Currently, she is an Invited Assistant Professor at Instituto Superior Técnico, and a Research assistant at Institute for Systems and Robotics (ISR). Her research interests include the development of interpretable classification methods with applications in medical image analysis and surveillance.
M. Emre Celebi received the B.Sc. degree in computer engineering from the Middle East Technical University, Ankara, Turkey, in 2002, and the M.Sc. and Ph.D. degrees in computer science and engineering from The University of Texas at Arlington, Arlington, TX, USA, in 2003 and 2006, respectively. He is currently Professor and Chair of the Department of Computer Science, University of Central Arkansas, Arkansas, USA. He has pursued research in the field of image processing and analysis.
Jorge S. Marques received the E.E., Ph.D., and Aggregation degrees from the Technical University of Lisbon, Lisbon, Portugal, in 1981, 1990, and 2002, respectively. He is currently a Full Professor with the Electrical and Computer Engineering Department, Instituto Superior Técnico, Lisbon, and a Researcher at the Institute for Systems and Robotics. His research interests include the areas of image processing and pattern recognition. Dr. Marques was Co-chairman of the IAPR Conference IbPRIA 2005, and President of the Portuguese Association for Pattern Recognition (2001–2003).