Explainable skin lesion diagnosis using taxonomies
Introduction
Skin cancer is one of the most common types of cancer, and one of the few whose incidence rates have been steadily increasing [1]. It is therefore crucial to improve both diagnostic accuracy and the rate of early diagnosis. Two lines of work are being pursued to address this health problem: (i) investment in newer and better imaging techniques, such as confocal microscopy and spectral imaging; and (ii) development of computer-aided diagnosis systems (CADS) for the automatic analysis of dermoscopy images. The latter in particular has grown impressively in recent years [2], [3], mainly due to the public release of increasingly large data sets [4]. Another driving factor was the increase in computational power, thanks to more powerful graphics processing units (GPUs), which accelerated the development of methods based on convolutional neural networks (CNNs). These networks achieve (near) human-expert diagnostic performance [5], [6], and are trained in an end-to-end fashion, eliminating the need for hand-crafted features [7].
The features learned by CNN models are optimal in the sense that they are optimized for classification performance. However, they are not easy to interpret, especially by non-experts, and the user is left with little information to understand the output of a CNN. In safety-critical medical applications, such as the one addressed in this paper, it is crucial for CADS to provide explainable outputs to physicians. Otherwise, an incorrect diagnosis may be rendered, incurring high costs for both the patient and the practitioner. Our work aims to address this issue through the design of an explainable CADS.
Various approaches have been proposed by the machine learning community to improve the explainability of a CNN, most of them focused on inspecting the features learned by the model. Two popular strategies are class activation maps (CAMs [8] or Grad-CAMs [9]), which highlight the image regions that contribute the most to an output, and attention modules [10], which are trained to guide the CNN towards the most discriminative features. It is also possible to inspect each filter learned by the CNN [11], [12]. Most visualization methods are applied only at inference time, after the network is fully trained. On the other hand, there are methods that try to simultaneously improve the explainability of the CNN and its performance. In this case, the network is trained to jointly perform a set of related tasks. These multi-task networks learn better features that capture common and discriminative properties [13], [14].
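To make the CAM idea above concrete, the following sketch builds a class activation map as a class-weighted sum of the final convolutional feature maps, rectified and rescaled to [0, 1]. Plain Python lists stand in for real CNN tensors, and both the feature maps and the classifier weights are illustrative values, not taken from any trained model.

```python
def class_activation_map(feature_maps, class_weights):
    """Combine C feature maps (each H x W) into one heat map for a class.

    feature_maps: list of C maps, each a list of H rows of W floats
    class_weights: C weights from the classifier layer for the target class
    """
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels, clipped at zero (ReLU).
    cam = [[max(0.0, sum(wc * fm[i][j] for wc, fm in zip(class_weights, feature_maps)))
            for j in range(w)] for i in range(h)]
    # Normalize to [0, 1] so the map can be overlaid on the image.
    peak = max(max(row) for row in cam) or 1.0
    return [[v / peak for v in row] for row in cam]

# Toy example: two 2x2 feature maps; the class weights favor the first map.
maps = [[[1.0, 0.0], [0.0, 2.0]],
        [[0.0, 1.0], [1.0, 0.0]]]
cam = class_activation_map(maps, [1.0, -0.5])
# cam is [[0.5, 0.0], [0.0, 1.0]]
```

In practice the normalized map is upsampled to the input resolution and overlaid on the image as a heat map, which is what makes the highlighted regions inspectable by the user.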
In this work we propose to combine multi-task CNNs with visualization methods to develop an explainable CADS for skin cancer diagnosis. Towards this goal we take into account a property of skin lesions that remains relatively unexplored in the literature: their inherent hierarchical structure. Lesions are progressively organized by dermatologists into various classes, according to their origin (melanocytic or non-melanocytic) and degree of malignancy (malignant or benign), until a differential diagnosis is reached (see Fig. 1). To determine these sequential classes, dermatologists screen the lesions for the presence of localized dermoscopic criteria [15]. Various dermoscopic criteria, such as streaks or blood vessels, are highly correlated with the origin of the lesion (e.g., streaks with melanocytic lesions and blood vessels with non-melanocytic ones), while a more detailed assessment of these structures allows dermatologists to perform a differential diagnosis based on the following medical facts: (i) irregular streaks are a sign of melanoma, but regular ones are a hallmark of Reed and Spitz nevi; (ii) arborizing vessels are associated with basal cell carcinomas, while hairpin vessels are more common in seborrheic/benign keratosis.
Expert dermatologists are able to achieve better diagnoses by understanding the aforementioned similarities and differences between the various lesions. Thus, it is expected that a CADS would also benefit from this knowledge. In this work, we develop a deep learning based CADS that makes hierarchical decisions about the lesion (multi-task) at the following levels: origin (melanocytic/non-melanocytic), degree of malignancy (benign/malignant), and differential diagnosis (e.g., melanoma, basal cell carcinoma, benign keratosis), where each decision is conditioned on the previous one. To mimic the localized analysis and improve the explainability of the model, we take advantage of attention modules. Attention guides the model towards the most discriminative regions and features of the lesion at each decision level.
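The hierarchical decision process described above can be sketched as a greedy top-down traversal of the taxonomy in Fig. 1, with each level conditioned on the one before it. The class names, probabilities, and the greedy rule itself are illustrative assumptions for this sketch, not the paper's exact inference procedure.

```python
# Toy taxonomy following Fig. 1: origin -> malignancy -> differential diagnosis.
TAXONOMY = {
    "melanocytic":     {"malignant": ["melanoma"],
                        "benign":    ["nevus"]},
    "non-melanocytic": {"malignant": ["basal cell carcinoma"],
                        "benign":    ["benign keratosis"]},
}

def hierarchical_diagnosis(p_origin, p_malignancy, p_diagnosis):
    """Greedy top-down decision: each level is chosen conditioned on the
    previous one, mirroring how a dermatologist narrows the differential.

    p_origin: dict origin -> probability
    p_malignancy: dict origin -> dict malignancy -> probability
    p_diagnosis: dict (origin, malignancy) -> dict diagnosis -> probability
    """
    origin = max(p_origin, key=p_origin.get)
    malignancy = max(p_malignancy[origin], key=p_malignancy[origin].get)
    candidates = p_diagnosis[(origin, malignancy)]
    # Only diagnoses allowed under this taxonomy branch are considered.
    allowed = TAXONOMY[origin][malignancy]
    diagnosis = max((d for d in candidates if d in allowed), key=candidates.get)
    return origin, malignancy, diagnosis

path = hierarchical_diagnosis(
    {"melanocytic": 0.8, "non-melanocytic": 0.2},
    {"melanocytic": {"malignant": 0.7, "benign": 0.3},
     "non-melanocytic": {"malignant": 0.5, "benign": 0.5}},
    {("melanocytic", "malignant"): {"melanoma": 0.9, "nevus": 0.1}},
)
# path is ("melanocytic", "malignant", "melanoma")
```

Restricting each level's candidates to the taxonomy branch selected at the previous level is what enforces consistency between origin, malignancy, and the differential diagnosis.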
Our work demonstrates the advantages of combining a multi-task CNN with attention modules. First, we show that an explainable hierarchical model can be trained efficiently, without the need to add external data, even with a small training set (2000 images), and that it generalizes well to new images. The model achieves competitive diagnostic results on public data sets, especially when compared with more complex methods based on ensembles of CNNs. Second, the visualization of the attention modules allows an easy interpretation of correct and incorrect diagnoses, increasing the safety of the model. Finally, the importance of the attention module is further supported by our robustness experiments, where we visualize the impact of various image transformations. We believe that our work is a relevant contribution towards the design of more efficient, robust, and safe deep learning models.
Section snippets
Related work
In recent years, the field of dermoscopy image analysis has been profoundly changed by the adoption of deep learning methods. CNNs have been shown to achieve accuracies very similar to those of dermatologists in the diagnosis of multiple types of skin lesions [5], [6], whereas prior works focused mainly on the differentiation between melanoma and nevi. These studies demonstrate the ability of CNNs to learn discriminative lesion representations.
The process by which a convolutional neural
Proposed system
This work proposes a new CADS for skin lesions with the following properties: (i) it mimics the hierarchical decisions made by dermatologists (recall Fig. 1), thus medical knowledge is incorporated in the design of the network; and (ii) it is explainable, since it provides visual information regarding the most relevant regions and features in each step of the diagnosis.
Hierarchical classification may be seen as the problem of finding a set of sequential class labels that better
Hierarchical diagnosis model
The proposed hierarchical diagnosis model comprises three main blocks, as shown in Fig. 2: (i) an image encoder, which extracts image features; (ii) an image decoder, which performs the hierarchical classification; and (iii) an attention module, which guides the model towards the most discriminative features and regions according to the previous output of the LSTM. The following subsections are organized according to these blocks.
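The attention module can be sketched as soft attention over spatial locations: each location's feature vector is scored against a query vector (standing in for the previous LSTM output), the scores are softmax-normalized into an attention map, and a context vector is produced for the decoder. The dot-product scoring and the toy features below are assumptions made for brevity, not the module's actual parameterization.

```python
import math

def spatial_attention(features, query):
    """Soft attention over N spatial locations.

    features: list of N feature vectors (each of length D)
    query: length-D vector (e.g., the previous decoder state)
    Returns (attention weights, context vector).
    """
    # Score each location by its similarity to the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    # Softmax with max-subtraction for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted average of the location features.
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return weights, context

# Toy example: 3 locations, 2-D features; the query matches location 2 best.
feats = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
weights, context = spatial_attention(feats, [1.0, 1.0])
```

The weights themselves are what gets visualized: reshaped to the spatial grid of the feature map, they indicate which lesion regions drove each decision level.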
Data set and experiments
We developed our model using the ISIC 2017 and 2018 dermoscopy data sets [4], [40]. The first set comprises 2750 images divided into training (2000), validation (150), and test (600) sets. These images contain examples of the following classes of lesions: melanocytic (melanoma and nevi) and non-melanocytic (seborrheic keratosis). The second set is larger and more complex, containing 11,527 examples of the following lesions: melanocytic (melanoma and nevi) and non-melanocytic (basal cell
Ablation studies on ISIC 2017
In this section, we report the results of the ablation studies described in Section 5.1, as follows. First, we show the results of the analysis of the loss function for all image encoders, using only CI1 (3) as the class inference method, since the performance of CI2 (4) was similar. We then evaluate the performance after the incorporation of the channel attention module, using all of the image encoders. We also use this scenario to
Conclusions
This paper proposes a diagnostic model for dermoscopy images that: (i) uses a multi-task network to perform a hierarchical diagnosis of skin lesions; and (ii) provides visual information to explain the diagnosis. By leveraging these two factors, we achieved competitive results on two state-of-the-art dermoscopy data sets (ISIC 2017 and 2018), without the need to augment the training data with external or artificially generated data and without using CNN ensembles. The experimental results show
Acknowledgments
This work was supported by the FCT project and multi-year funding [CEECIND/00326/2017], [PTDC/EEIPRO/0426/2014], and LARSyS - FCT Plurianual funding 2020–2023.
The Titan Xp used in this project was donated by the NVIDIA Corporation.
References (47)
A survey on deep learning in medical image analysis, Med. Image Anal. (2017)
Recent advances in convolutional neural networks, Pattern Recognit. (2018)
Curriculum learning of visual attribute clusters for multi-task classification, Pattern Recognit. (2018)
Learning multi-layer coarse-to-fine representations for large-scale image classification, Pattern Recognit. (2019)
Tree-CNN: a hierarchical deep convolutional neural network for incremental learning, Neural Netw. (2020)
A hierarchical and regional deep learning architecture for image description generation, Pattern Recognit. Lett. (2019)
A survey on automatic image caption generation, Neurocomputing (2018)
Skin lesion classification with ensembles of deep convolutional neural networks, J. Biomed. Inform. (2018)
Fusing fine-tuned deep features for skin lesion classification, Comput. Med. Imaging Graph. (2019)
Cancer statistics, 2019, CA Cancer J. Clin. (2019)
Dermoscopy image analysis: overview and future directions, IEEE J. Biomed. Health Inform.
A survey of feature extraction in dermoscopy image analysis of skin cancer, IEEE J. Biomed. Health Inform.
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions, Sci. Data
Dermatologist-level classification of skin cancer with deep neural networks, Nature
Deep-learning-based, computer-aided classifier developed with a small dataset of clinical images surpasses board-certified dermatologists in skin tumour diagnosis, Br. J. Dermatol.
Learning deep features for discriminative localization, Computer Vision and Pattern Recognition
Grad-CAM: visual explanations from deep networks via gradient-based localization, International Conference on Computer Vision
Interpreting deep visual representations via network dissection, IEEE Trans. Pattern Anal. Mach. Intell.
Understanding trained CNNs by indexing neuron selectivity, Pattern Recognit. Lett.
Interactive Atlas of Dermoscopy
Classification for dermoscopy images using convolutional neural networks based on region average pooling, IEEE Access
Attention residual learning for skin lesion classification, IEEE Trans. Med. Imaging
Visualizing convolutional neural networks to improve decision support for skin lesion classification, MLCN 2018, DLF 2018, and iMIMIC 2018
Catarina Barata received B.Sc. and M.Sc. degrees in Biomedical Engineering, and Ph.D. degree in Electrical and Computer Engineering from Instituto Superior Técnico, University of Lisbon, in 2009, 2011, and 2017 respectively. Currently, she is an Invited Assistant Professor at Instituto Superior Técnico, and a Research assistant at Institute for Systems and Robotics (ISR). Her research interests include the development of interpretable classification methods with applications in medical image analysis and surveillance.
M. Emre Celebi received the B.Sc. degree in computer engineering from the Middle East Technical University, Ankara, Turkey, in 2002, and the M.Sc. and Ph.D. degrees in computer science and engineering from The University of Texas at Arlington, Arlington, TX, USA, in 2003 and 2006, respectively. He is currently Professor and Chair of the Department of Computer Science, University of Central Arkansas, Arkansas, USA. He has pursued research in the field of image processing and analysis.
Jorge S. Marques received the E.E., Ph.D., and Aggregation degrees from the Technical University of Lisbon, Lisbon, Portugal, in 1981, 1990, and 2002, respectively. He is currently a Full Professor with the Electrical and Computer Engineering Department, Instituto Superior Técnico, Lisbon, and a Researcher at the Institute for Systems and Robotics. His research interests include the areas of image processing and pattern recognition. Dr. Marques was Co-chairman of the IAPR Conference IbPRIA 2005, and President of the Portuguese Association for Pattern Recognition (2001–2003).