Medical Image Analysis

Volume 76, February 2022, 102334

Hyper-fusion network for semi-automatic segmentation of skin lesions

https://doi.org/10.1016/j.media.2021.102334

Highlights

  • We propose to use deep learning with a few user-clicks to achieve accurate skin lesion segmentation results.

  • We propose to leverage user-inputs to optimize the learning of skin lesion characteristics.

  • We propose hyper-integration modules (HIMs) to iteratively propagate user-input features and skin lesion image features to ensure the appearance of the segmented skin lesions is spatially consistent.

  • Our method is capable of segmenting skin lesions that are known to be challenging, such as those with fuzzy boundaries, inhomogeneous textures and low contrast against the background.

  • Our method achieved consistently better segmentation results on three well-established public datasets (the ISBI 2017 and ISBI 2016 skin lesion challenge datasets and the PH2 dataset).

Abstract

Segmentation of skin lesions is an important step for imaging-based clinical decision support systems. Automatic skin lesion segmentation methods based on fully convolutional networks (FCNs) are regarded as the state-of-the-art for accuracy. However, when there are insufficient training data to cover all the variations in skin lesions (lesions from different patients may have major differences in size, shape and texture), these methods fail to segment lesions whose image characteristics are less common in the training datasets. FCN-based semi-automatic segmentation methods, which fuse user-inputs with high-level semantic image features derived from FCNs, offer an ideal complement that overcomes these limitations: they couple state-of-the-art automated FCNs with user-inputs for refinement, and are therefore able to tackle challenging skin lesions. However, there are only a limited number of FCN-based semi-automatic segmentation methods, and all of them rely on 'early-fusion', where the first few convolutional layers fuse image features and user-inputs and the fused features are then used for segmentation. In early-fusion based methods, the user-input information can be lost after the first few convolutional layers, so it provides limited guidance and constraint when segmenting challenging skin lesions with inhomogeneous textures and fuzzy boundaries. Hence, in this work, we introduce a hyper-fusion network (HFN) that fuses the extracted user-inputs and image features over multiple stages. We separately extract complementary features, which allows user-inputs to be used iteratively along all the fusion stages to refine the segmentation. We evaluated our HFN on three well-established public benchmark datasets (the ISBI Skin Lesion Challenge 2017 and 2016 datasets and the PH2 dataset), and our results show that the HFN is more accurate and generalizable than the state-of-the-art methods, in particular on challenging skin lesions.

Introduction

Melanoma (also known as malignant melanoma) has one of the most rapidly increasing incidences in the world and a considerable mortality rate if left untreated (Rigel et al., 1996). Early diagnosis is particularly important because melanoma can be cured with early excision (Celebi et al., 2007; Capdehourat et al., 2011). Skin lesion images such as dermoscopy, a non-invasive imaging technique for the in-vivo evaluation of pigmented skin lesions, are commonly acquired and play an important role in early diagnosis (Celebi et al., 2007). The identification of melanoma from skin lesion images using human vision alone can be subjective, inaccurate and poorly reproducible, even amongst experienced dermatologists (Celebi et al., 2008; Abbas et al., 2013). This is attributed to the challenges in interpreting skin lesion images, which can have diverse visual characteristics such as variations in size, 'fuzzy' shape boundaries, artifacts and hairs (Fig. 1) (Barata et al., 2015). Therefore, automated image analysis is a valuable aid for clinical decision support (CDS) systems and for the image-based diagnosis of skin lesions (Serrano and Acha 2009; Esteva et al., 2017). Skin lesion segmentation is fundamental to these CDS systems and has motivated the development of numerous segmentation methods.

Traditional fully automatic segmentation methods mainly focus on extracting pixel-level or region-level features, such as Gaussian (Wighton et al., 2011) and texture (He and Xie 2012) features, and then use various classifiers, such as a wavelet network (Sadri et al., 2012) or a Bayes classifier (Wighton et al., 2011), to separate the skin lesions from the surrounding healthy skin. However, their performance depends heavily on correctly tuning a large number of parameters and on effective pre- and post-processing techniques such as hair removal and illumination correction. Without such pre- and post-processing, these methods have difficulty segmenting lesions when there are artifacts or hair, or when the lesion reaches the boundary of the image.
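
To make this feature-plus-classifier pipeline concrete, the sketch below pairs multi-scale Gaussian pixel features with a naive Bayes classifier, loosely in the spirit of Wighton et al. (2011). The exact features and classifiers in the cited works differ; the helper names here are our own illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.naive_bayes import GaussianNB

def pixel_features(gray):
    """Per-pixel features: raw intensity plus Gaussian responses at several scales."""
    maps = [gray] + [gaussian_filter(gray, sigma=s) for s in (1, 2, 4)]
    return np.stack(maps, axis=-1).reshape(-1, len(maps))

def bayes_segment(train_gray, train_mask, test_gray):
    """Fit a Bayes classifier on labelled pixels, then label each test pixel."""
    clf = GaussianNB().fit(pixel_features(train_gray), train_mask.ravel())
    return clf.predict(pixel_features(test_gray)).reshape(test_gray.shape)
```

Because each pixel is classified independently from a handful of hand-crafted features, such a pipeline is sensitive to hairs, artifacts and illumination, which is why the pre- and post-processing steps above matter so much.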

Deep learning based fully automatic segmentation methods are regarded as the state-of-the-art in skin lesion segmentation, and most of these methods are based on fully convolutional networks (FCNs) (Shelhamer et al., 2016). The success of FCNs is primarily attributed to their use of encoders and decoders to derive an image feature representation that combines low-level appearance information with high-level semantic information. The encoders use convolutional layers and downsampling to extract high-level semantic features from images; the decoders then upsample the extracted features to produce the segmentation results. FCNs can therefore be trained in an end-to-end manner for efficient inference: images are taken as inputs and the segmentation results are output directly. Yuan et al. (2017) replaced the cross-entropy loss used in a traditional FCN with a Jaccard distance loss for skin lesion segmentation. Yu et al. (Lequan et al., 2017) increased the FCN network depth (number of layers) with a 50-layer deep residual network to segment based on deeper image features. Bi et al. (2017) proposed class-specific learning to combine (ensemble) multiple FCNs, each trained only with melanoma or only with non-melanoma images, for segmentation. More recently, Xie et al. (2020) proposed learning skin lesion segmentation and classification (melanoma vs. non-melanoma) via a mutual bootstrapping network, where the classification results were used to guide and improve the segmentation results. However, all these FCN-based methods rely on large annotated training datasets that include all the possible variations in skin lesions, including differences between patients in lesion size, shape and texture. When the training data are insufficient to cover these variations, the methods fail to segment lesions whose image characteristics are less common in the training datasets. Further, skin lesions from different datasets may have major differences in appearance, e.g., illumination and field-of-view (as shown in Fig. 1). The end result is that these methods tend to overfit to one dataset and generalize poorly to a different dataset.
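
As one concrete example, the Jaccard distance loss of Yuan et al. (2017) can be written as a differentiable "soft" intersection-over-union. Below is a minimal PyTorch sketch; the original paper's exact formulation (e.g., its smoothing constant) may differ.

```python
import torch

def jaccard_distance_loss(pred, target, eps=1e-7):
    """Soft Jaccard (IoU) distance loss.

    pred:   (N, H, W) sigmoid probabilities for the lesion class
    target: (N, H, W) binary ground-truth masks
    """
    intersection = (pred * target).sum(dim=(1, 2))
    union = pred.sum(dim=(1, 2)) + target.sum(dim=(1, 2)) - intersection
    # 1 - IoU, averaged over the batch; eps guards against empty masks
    return (1.0 - (intersection + eps) / (union + eps)).mean()

# usage with an FCN decoder's raw outputs:
# loss = jaccard_distance_loss(torch.sigmoid(logits), masks)
```

Unlike per-pixel cross-entropy, this loss directly optimizes the overlap measure used for evaluation, which helps with the class imbalance between small lesions and large background regions.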

FCN-based semi-automatic segmentation methods for medical images, which combine manual user-inputs (prior knowledge) with high-level semantic features derived from FCNs, offer an alternative approach to segmenting skin lesions. Currently, there are few such methods. Wang et al. (G. Wang et al. 2018) proposed a semi-automatic medical image segmentation method with two FCNs: the first FCN automatically segmented the input image, and the second FCN repeated the segmentation but with the fusion of the input image, the segmentation result (from the first FCN) and the user-inputs. The regions that the first FCN failed to segment were then refined by the second FCN. Lei et al. (2019) replaced the FCNs in the approach of Wang et al. (G. Wang et al. 2018) with a lightweight network architecture to segment organs-at-risk from computed tomography (CT) images. Koohbanani et al. (2020) fused user-inputs with a multi-scale FCN for microscopy images. Sakinis et al. (2019) fused user-inputs with a U-Net for organ segmentation in abdominal CT images. Wang et al. (G. Wang et al. 2018) also proposed fine-tuning on individual test images with user-inputs (including scribbles and user-defined bounding boxes) to enclose the regions of interest. Zhang et al. (2021) proposed a patch-based segmentation method in which a user-defined centroid point was used to partition the medical image into small patches, which were then segmented with a convolutional recurrent neural network (ConvRNN). For non-medical images, Majumder et al. (Majumder and Yao 2019) fused superpixel-based user-inputs with input images for natural image segmentation, where the superpixel-based user-inputs were derived by calculating the Euclidean distance from the centroid of each superpixel to the user-clicks. All these FCN-based methods focused on early-fusion, where the medical images are fused with the user-inputs (both foreground and background inputs) into a single input prior to the FCN. The reliance on a single fused input means that important user-input information can be lost after early-fusion, leaving limited prior knowledge for the FCN to exploit. In addition, the reliance on a user-defined centroid point is not always feasible: it is challenging to accurately place a centroid point for lesions with differing shapes, and the centroid point may not always lie within the lesion region. Further, fine-tuning on individual test images requires additional computational time and manual input, e.g., bounding boxes and scribbles, which is challenging to implement for a large cohort study. Hu et al. (2019) proposed a two-stream late-fusion network for natural image segmentation, where the image and the user-inputs were processed separately by two FCNs and the resultant features were fused. The late fusion of extracted features, however, tends to dismiss the correlations between the image and the user-inputs; these correlations may only be accessible at the early stages of the network. In addition, when these methods are applied to skin lesion segmentation, they usually have difficulty in accurately delineating the lesion boundary and produce inconsistent outcomes for challenging skin lesions.
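
To illustrate what early-fusion means in practice, the sketch below encodes foreground and background user-clicks as truncated Euclidean distance maps and concatenates them with the RGB image into a single multi-channel input. The cited methods vary in their exact click encoding (scribbles, bounding boxes, superpixel distances), so this is an assumption-laden illustration rather than any single method's implementation.

```python
import numpy as np

def click_distance_map(shape, clicks, truncate=255.0):
    """Truncated Euclidean-distance map from user-clicks (hypothetical helper).

    shape:  (H, W) of the image
    clicks: list of (row, col) user-click coordinates
    """
    h, w = shape
    if not clicks:
        return np.full(shape, truncate, dtype=np.float32)
    rows, cols = np.mgrid[0:h, 0:w]
    dists = [np.sqrt((rows - r) ** 2 + (cols - c) ** 2) for r, c in clicks]
    return np.minimum(np.min(dists, axis=0), truncate).astype(np.float32)

def early_fusion_input(image, fg_clicks, bg_clicks):
    """Stack an RGB image with foreground/background click maps -> 5-channel input."""
    fg = click_distance_map(image.shape[:2], fg_clicks)
    bg = click_distance_map(image.shape[:2], bg_clicks)
    return np.dstack([image.astype(np.float32), fg, bg])
```

Once stacked, the five channels pass through the same first convolutional layer, which is precisely why the click information competes with image appearance from the very first layer and can fade in deeper layers.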

Our hyper-fusion network (HFN), shown in Fig. 2b, separately extracts features from the skin lesion images and the user-inputs. Our fusion strategy provides the flexibility to learn complementary features between the lesion images and the user-inputs, and provides continuous guidance and constraint on the segmentation results. Our HFN adds the following contributions to the current knowledge: (i) separate extraction of features from skin lesion images and user-inputs, which allows continuous leveraging of user-inputs to optimize the learning of skin lesion characteristics and minimizes the loss of user-input information that occurs with early-fusion; (ii) training and predicting segmentation results over multiple fusion stages; compared to early-fusion based semi-automatic segmentation methods, multiple fusion stages have the advantage of using the user-inputs to iteratively refine the segmentation, which ensures better segmentation of skin lesion boundaries; and (iii) the introduction of hyper-integration modules (HIMs) to fuse user-input features and skin lesion image features at individual fusion stages. HIMs help guide and constrain the learning of the lesion characteristics and then propagate the intermediary segmentation results to the next stage of the decoder. The fusion at individual stages ensures that the appearance of the segmented skin lesions is spatially consistent.
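
The HIM is described here only at an architectural level; below is a minimal PyTorch sketch of one plausible stage-wise fusion block. The module name, channel arithmetic and the exact fusion operation are our assumptions for illustration, not the paper's definitive implementation.

```python
import torch
import torch.nn as nn

class HIMBlock(nn.Module):
    """Illustrative stage-wise fusion block: combines user-input features with
    image features at one decoder stage and emits an intermediary prediction."""

    def __init__(self, img_ch, user_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(img_ch + user_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.predict = nn.Conv2d(out_ch, 1, kernel_size=1)  # intermediary mask

    def forward(self, img_feat, user_feat):
        # resample user-input features to this stage's spatial resolution
        user_feat = nn.functional.interpolate(
            user_feat, size=img_feat.shape[-2:], mode="bilinear",
            align_corners=False)
        fused = self.fuse(torch.cat([img_feat, user_feat], dim=1))
        # fused features propagate to the next decoder stage;
        # the stage prediction can be deep-supervised
        return fused, self.predict(fused)
```

Repeating such a block across decoder stages gives the multi-stage, iterative use of user-inputs described above: every stage sees the user-input features again rather than only at the first convolutional layer.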

Section snippets

Materials

We used three well-established public benchmark datasets to train and test the effectiveness of our method.

  • The 2017 and 2016 ISBI Skin Lesion Challenge (denoted as ISBI 2017 (Codella et al., 2017) and ISBI 2016 (Gutman et al., 2016)) datasets are a subset of the large International Skin Imaging Collaboration (ISIC) archive, which contains skin lesion images acquired on a variety of different devices at numerous leading international clinical centers. The ISBI 2017 challenge dataset provides

Experiment setup

We performed the following experiments on the three datasets: (a) comparison of the overall performance of our method with fully automated and semi-automated segmentation methods; (b) comparison of the results from (a) using different numbers of user-inputs; (c) analysis of the performance of each component in our proposed method; (d) analysis of the segmentation results on the challenging skin lesions; and (e) analysis of the segmentation results with noisy user-inputs. For experiments using

Segmentation results on ISBI 2017, ISBI 2016 and PH2 datasets

Table 1, Table 2 and Fig. 7 show that our HFN method achieved the best overall performance across all measurements on the ISBI 2017 dataset. When compared with the recently published fully automatic methods of MB and BiDFL, our method improved the Jaccard measure by large margins of 3.3% and 2.23%, respectively (Table 1).

Table 3, Table 4 and Fig. 8 show that our HFN method outperformed all the current methods on the ISBI 2016 dataset. When compared to the current state-of-the-art method DAGAN, our method

Discussions

Our main findings are that: (i) our HFN with user-inputs consistently improved skin lesion segmentation, in particular, for the skin lesions, where the image characteristics are less common in the training datasets; (ii) compared to early-fusion methods, fusing separately extracted complementary features (user-inputs and image features) produced advantages in leveraging user-inputs that resulted in improved segmentation of challenging skin lesions; and (iii) HIMs ensured the appearance of the

Application to total-body 3-Dimensional (3D) photography

Total-body 3D photography, currently being implemented in the clinic, constructs a digital 3D avatar of the patient that can be used to view and monitor skin lesions across the body over time. When compared to current manual dermoscopy and limited-access, time-consuming 2D total body photography systems, total-body 3D photography brings new spatial and temporal capabilities: skin lesions at different sites of the body and at different times can be detected simultaneously. The Australian

Conclusions

In this paper, we proposed a method to segment skin lesions in a semi-automated manner. Our method used a deep hyper-fusion FCN to iteratively fuse separately extracted user-input features with skin lesion image features, and to continuously leverage user-inputs to guide and constrain the learning of skin lesion characteristics. By learning and inferring user-inputs derived from a few user-clicks, we achieved accurate segmentation results for skin lesions that are known to be challenging, such

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by Australian Research Council (ARC) grants (IC170100022 and DP200103748).

References (45)

  • A. Roy et al., JCLMM: a finite mixture model for clustering of circular-linear data and its application to psoriatic plaque segmentation, Pattern Recognit. (2017)

  • C. Serrano et al., Pattern analysis of dermoscopic images based on Markov random fields, Pattern Recognit. (2009)

  • C. Barata et al., Improving dermoscopy image classification using color constancy, IEEE J. Biomed. Health Inform. (2015)

  • L. Bi et al., Dermoscopic image segmentation via multi-stage fully convolutional networks, IEEE Trans. Biomed. Eng. (2017)

  • B. Bozorgtabar et al., Sparse coding based skin lesion segmentation using dynamic rule-based refinement

  • L.-C. Chen et al., Encoder-decoder with atrous separable convolution for semantic image segmentation

  • N.C. Codella et al., Skin lesion analysis toward melanoma detection

  • J. Dean et al., Large scale distributed deep networks

  • A. Esteva et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature (2017)

  • M. Goyal et al., Skin lesion segmentation in dermoscopic images with ensemble deep learning methods, IEEE Access (2019)

  • D. Gutman et al., Skin lesion analysis toward melanoma detection

  • K. He et al., Mask R-CNN
