DiSegNet: A deep dilated convolutional encoder-decoder architecture for lymph node segmentation on PET/CT images

https://doi.org/10.1016/j.compmedimag.2020.101851

Highlights

  • A multi-stage Atrous spatial pyramid pooling (MS-ASPP) sub-module that can be integrated into the SegNet architecture; its multi-scale feature learning improves the semantic accuracy of predictions and the detail of segmentation along LN boundaries.

  • A simple but effective cosine-sine (CS) loss function, used as an objective function for training different networks, that deals with the class imbalance problem in LN segmentation. The CS loss function focuses learning on the hard-to-classify (misclassified) voxels while down-weighting the well-classified voxels at the same time.

  • Four-fold cross-validation is performed on 63 PET/CT data sets. The results show that DiSegNet trained with the CS loss function reaches an average Dice similarity coefficient of 77%.

Abstract

Purpose

Automated lymph node (LN) recognition and segmentation from cross-sectional medical images is an important step in the automated diagnostic assessment of patients with cancer. Yet, it remains a difficult task owing to the low contrast between LNs and surrounding soft tissues as well as to the variation in nodal size and shape. In this paper, we present a novel LN segmentation method based on a newly designed neural network for positron emission tomography/computed tomography (PET/CT) images.

Methods

This work addresses two problems involved in the LN segmentation task. First, an efficient loss function named cosine-sine (CS) is proposed to handle the voxel class imbalance problem in the convolutional network training process. Second, a multi-stage, multi-scale Atrous (dilated) spatial pyramid pooling sub-module, named MS-ASPP, is introduced into the encoder-decoder architecture (SegNet), with the aim of exploiting multi-scale information to improve the performance of LN segmentation. The new architecture is named DiSegNet (Dilated SegNet).

Results

Four-fold cross-validation is performed on 63 PET/CT data sets. In each experiment, 10 data sets are selected randomly for testing and the other 53 for training. The results show that DiSegNet trained with the CS loss function reaches an average Dice similarity coefficient of 77%, compared to 71% for the baseline SegNet trained with cross-entropy (CE) loss.

Conclusions

The performance of the proposed DiSegNet with CS loss function suggests its potential clinical value for disease quantification.

Introduction

Lymph node (LN) segmentation aims to assign a categorical label to every voxel in an image and plays an important role in medical image analysis and disease quantification. Yet, manual detection and measurement/segmentation of LNs in images by human observers is time-consuming and error-prone (Feulner et al., 2013). Moreover, LNs are difficult to recognize and segment owing to the low contrast between LNs and surrounding soft tissues as well as to the variation in nodal size and shape.

This topic has received much attention in recent decades. Discriminative learning combined with a spatial prior probability has been used for LN detection and segmentation in chest computed tomography (CT) images (Feulner et al., 2013): a discriminative model detected LNs from their appearance combined with anatomical prior knowledge, and the graph cut method was used to segment them. In (Barbu et al., 2012a), the authors proposed a learning-based approach that used marginal space learning for LN segmentation in the axillary region. Pathological LNs were detected and segmented in (Hoogi et al., 2017) by a method employing machine learning techniques such as marginal space learning, convolutional neural networks, and active contour models for organ detection, LN detection, and LN segmentation. All of the methods above rely on hand-crafted features or prior knowledge to recognize and delineate LNs.

Recently, fully convolutional networks (FCNs) (Long et al., 2015) have been used for pixel-wise semantic segmentation tasks, driven by the development of deep learning methods and especially by the success of deep convolutional neural network (DCNN) models such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014), and ResNet (He et al., 2016). Many semantic segmentation architectures that appeared after FCNs adopt an encoder-decoder structure, such as SegNet (Badrinarayanan et al., 2017) and U-Net (Ronneberger et al., 2015). However, a problem remains in the decoder: recovering detail from the low-resolution feature maps produced by the max-pooling operation. In (Chen et al., 2018), dilated (or Atrous) convolution was proposed to overcome this problem, and Atrous spatial pyramid pooling (ASPP) with various dilation rates was introduced to robustly segment objects at multiple scales. The DeepLab architecture integrates an ASPP sub-module in order to capture objects and context at multiple scales. Yet, it only uses one ASPP sub-module, after the fifth max-pooling layer, and does not exploit ASPP at other stages after max-pooling, which may miss important features, especially from small LNs. Moreover, the order in which ASPP is employed in the architecture was not discussed; which layer or layers of a DCNN should use ASPP remains an open question.
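To make the ASPP idea concrete, the following is a minimal PyTorch sketch of a generic ASPP block: parallel 3 × 3 dilated convolutions with different rates are applied to the same feature map, and their outputs are concatenated and fused by a 1 × 1 convolution. The dilation rates and channel widths here are illustrative assumptions, not the configuration of DeepLab or of this paper.

    import torch
    import torch.nn as nn

    class ASPP(nn.Module):
        """Minimal Atrous spatial pyramid pooling block (illustrative sketch).

        Parallel 3x3 convolutions with different dilation rates see the same
        input at several effective receptive-field sizes; their outputs are
        concatenated and fused by a 1x1 convolution. Rates and channel
        counts are assumptions for illustration only.
        """
        def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
            super().__init__()
            self.branches = nn.ModuleList([
                nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, kernel_size=3,
                              padding=r, dilation=r, bias=False),
                    nn.BatchNorm2d(out_ch),
                    nn.ReLU(inplace=True),
                )
                for r in rates
            ])
            self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

        def forward(self, x):
            # padding == dilation keeps spatial size unchanged for 3x3 kernels
            feats = [branch(x) for branch in self.branches]
            return self.fuse(torch.cat(feats, dim=1))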

Although much success has been achieved in semantic segmentation, only a few works employ semantic segmentation neural networks for LN segmentation. In (Bouget et al., 2019), the authors combined U-Net and Mask R-CNN for segmentation and detection of mediastinal LNs. However, this method requires training two DCNNs, which introduces much redundant computation for feature learning, since the feature maps from U-Net could also be used in Mask R-CNN (Ren et al., 2017). In (Oda, 2018), a 3D U-Net was trained to segment LNs and other anatomical structures on contrast-enhanced chest CT volumes. Yet, it incurs high computational cost and memory consumption, especially GPU memory, owing to the 3D convolution operations.

Two issues are not addressed by the LN segmentation methods reviewed above. First, none of the proposed methods considered the imbalance in voxel classes between the LNs and the remaining part of the volume, which impedes training efficiency. In (Lin et al., 2017), the authors proposed a focal loss function for the dense object detection problem, which addresses the imbalance between object and background bounding boxes by down-weighting the loss from well-classified examples. Inspired by the focal loss function, an exponential loss function (Xu et al., 2020) was proposed for the slice classification task, which aims to classify each slice as to whether or not it contains pathological LNs on PET/CT. However, the imbalance problem is more severe in the LN segmentation task, since most voxels do not belong to LNs. Second, multi-scale information from feature maps has proven helpful for semantic segmentation of natural images (Chen et al., 2018), but it has not been used for LN segmentation.
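For reference, the cross-entropy (CE) loss and the focal loss (FL) of Lin et al. (2017) take the following standard forms, where $p_t$ denotes the predicted probability of the true class and $\alpha_t$, $\gamma \ge 0$ are balancing and focusing parameters:

    \mathrm{CE}(p_t) = -\log(p_t), \qquad
    \mathrm{FL}(p_t) = -\alpha_t \,(1 - p_t)^{\gamma}\, \log(p_t)

The modulating factor $(1 - p_t)^{\gamma}$ vanishes as $p_t \to 1$, so well-classified examples contribute little to the total loss while hard, misclassified examples dominate training; with $\gamma = 0$ and $\alpha_t = 1$, FL reduces to CE.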

This work presents three novelties:

First, considering the merit of multi-scale feature analysis by ASPP as well as training time and efficiency, a new strategy called multi-stage Atrous spatial pyramid pooling (MS-ASPP) is proposed, which uses more than one ASPP sub-module after the max-pooling layers in the encoder and thereby provides more context information to the decoder to aid LN segmentation (sketched below). We name the resulting architecture DiSegNet (Dilated SegNet): the integration of MS-ASPP into the SegNet architecture.
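As a rough illustration of the MS-ASPP idea, the sketch below inserts the ASPP block from the earlier sketch after each of several max-pooling stages of a SegNet-style (VGG16-like) encoder. Which stages receive ASPP, and the channel widths, are assumptions made here for illustration; the exact placement is a design choice discussed in the paper.

    import torch.nn as nn

    class MSASPPEncoder(nn.Module):
        """Sketch of a SegNet-style encoder with multi-stage ASPP (MS-ASPP).

        An ASPP block (as defined in the sketch above) follows more than
        one max-pooling stage, so multi-scale context is captured at
        several resolutions. The real SegNet encoder also records pooling
        indices for unpooling in the decoder; omitted here for brevity.
        """
        def __init__(self, in_ch=3, widths=(64, 128, 256)):
            super().__init__()
            stages = []
            for out_ch in widths:  # illustrative VGG16-like channel widths
                stages.append(nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.MaxPool2d(kernel_size=2),
                    ASPP(out_ch, out_ch),  # ASPP after the pooling layer
                ))
                in_ch = out_ch
            self.stages = nn.ModuleList(stages)

        def forward(self, x):
            for stage in self.stages:
                x = stage(x)
            return x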

Second, we tested the loss functions commonly used to train semantic segmentation networks, such as cross-entropy and Dice loss (Sudre et al., 2017), and found that these loss functions do not consider the imbalance of voxel classes. In (Lin et al., 2017), the authors designed a focal loss function for the imbalance problem in dense object detection. A new exponential loss function was proposed in (Xu et al., 2020) for pathological LN classification, which handles the imbalance between slices that do and do not include LNs. Inspired by the ideas of the focal loss (FL) and exponential loss (EL) functions, we propose a novel loss function named the cosine-sine (CS) loss to deal with the imbalance of voxel classes in LN segmentation. Compared to FL and EL, the proposed CS loss up-weights the loss from misclassified voxels while down-weighting the loss from well-classified voxels, which facilitates the pathological LN segmentation task (a schematic sketch follows below).
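The closed form of the CS loss is not given in this excerpt, so the following PyTorch sketch only mimics the stated behavior: a per-voxel cross-entropy term whose weight is a trigonometric function of the predicted probability p_t of the true class, chosen here (as an assumption) to be large when p_t is small and near zero when p_t approaches 1. The actual functional form used in the paper may differ.

    import math
    import torch
    import torch.nn.functional as F

    def cs_like_loss(logits, targets):
        """Schematic cosine-weighted loss (assumed form, not the paper's CS loss).

        logits:  (N, C, H, W) raw network outputs
        targets: (N, H, W) integer class labels per voxel
        """
        log_probs = F.log_softmax(logits, dim=1)
        # Probability assigned to the true class of each voxel.
        pt = log_probs.exp().gather(1, targets.unsqueeze(1)).squeeze(1)
        # cos(pt * pi/2) is ~1 for misclassified voxels (pt small) and
        # ~0 for well-classified voxels (pt near 1).
        weight = torch.cos(pt * math.pi / 2)
        ce = F.nll_loss(log_probs, targets, reduction="none")  # per-voxel CE
        return (weight * ce).mean()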

Third, to the best of our knowledge, this is the first report of pathological LN segmentation in positron emission tomography/computed tomography (PET/CT) images of the thorax based on a DCNN. We also compare our method to other published work that performs LN segmentation on CT or PET/CT volumes.

The remainder of this paper is organized as follows. The definitions of the cross-entropy and focal loss functions are first introduced. The materials are described in Section 2.1, and the novel cosine-sine (CS) loss function is proposed in Section 2.2.1. Two variants of the DiSegNet architecture that employ MS-ASPP are introduced in Section 2.3.2. Sections 3 and 4 present the experiments and discussion. Finally, conclusions are drawn.

Section snippets

Materials

This retrospective study was conducted following approval from the Institutional Review Board at the University of Pennsylvania (UPenn) along with a Health Insurance Portability and Accountability Act waiver. The data set was retrospectively selected from our health system patient image database by a board-certified radiologist (D.A.T.) and consists of 63 18F-fluorodeoxyglucose (FDG) PET/CT image data sets from 63 subjects with either Hodgkin lymphoma or diffuse large B-cell lymphoma (DLBCL).

Results

In this work, we utilize the VGG16 framework, discarding the fully connected layers, combined with the MS-ASPP module in the encoder part. The stochastic gradient descent with momentum optimizer is used; the momentum value, the initial learning rate, and the mini-batch size are 0.9, 0.001, and 4, respectively. The proposed cosine-sine (CS) loss function is employed as the objective function for training the network, and the cross-entropy loss, focal loss, and Dice loss functions are used as
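Under the stated settings (momentum 0.9, initial learning rate 0.001, mini-batch size 4), the optimizer setup might look as follows in PyTorch; the stand-in model below is a placeholder, since the paper's implementation is not reproduced in this excerpt.

    import torch
    import torch.nn as nn

    # Placeholder network; DiSegNet itself is not reproduced in this excerpt.
    model = nn.Conv2d(3, 2, kernel_size=3, padding=1)

    # SGD with momentum, matching the stated hyperparameters.
    optimizer = torch.optim.SGD(model.parameters(),
                                lr=0.001,      # initial learning rate
                                momentum=0.9)  # momentum value
    # The mini-batch size of 4 would be set on the data loader, e.g.:
    # loader = torch.utils.data.DataLoader(dataset, batch_size=4, shuffle=True)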

Discussion

In this paper, we aim at two issues that are not addressed by previous LN segmentation methods. The first is the imbalance of voxel classes, and the second is the lack of multi-scale information from the feature maps of SegNet. Keeping this in mind, we proposed a novel deep convolutional neural network (DCNN) named DiSegNet to overcome these two issues.

First, the proposed CS loss function up-weights the loss from misclassified voxels. The idea is similar to that of AdaBoost (Freund and Schapire, 1995),

Conclusion

In this work, we proposed a simple but effective cosine-sine (CS) loss function as an objective function for training different networks to deal with the class imbalance problem in LN segmentation. The CS loss function focuses learning on the hard-to-classify (misclassified) voxels while down-weighting the well-classified voxels at the same time. Our experimental results show that the proposed loss function can achieve good results for automatic abnormal LN segmentation in

CRediT authorship contribution statement

Guoping Xu: Investigation, Software, Writing - original draft. Hanqiang Cao: Methodology, Validation. Jayaram K. Udupa: Supervision, Conceptualization, Methodology, Writing - review & editing. Yubing Tong: Visualization, Writing - review & editing. Drew A. Torigian: Data curation, Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

The training of Mr. Guoping Xu in the Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, for the duration of one year was supported by the China Scholarship Council. His subsequent training was supported by research funds provided by Dr. Drew Torigian at the University of Pennsylvania.

References

  • J. Feulner et al.

    Lymph node detection and segmentation in chest CT data using discriminative learning and a spatial prior

    Med. Image Anal.

    (2013)
  • J.K. Udupa

    Body-wide hierarchical fuzzy modeling, recognition, and delineation of anatomy in medical images

    Med. Image Anal.

    (2014)
  • V. Badrinarayanan et al.

    SegNet: a deep convolutional encoder-decoder architecture for image segmentation

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2017)
  • A. Barbu et al.

    Automatic detection and segmentation of lymph nodes from CT data

    IEEE Trans. Med. Imaging

    (2012)
  • D. Bouget et al.

    Semantic segmentation and detection of mediastinal lymph nodes and anatomical structures in CT data for lung cancer staging

    Int. J. Comput. Assist. Radiol. Surg.

    (2019)
  • L.C. Chen et al.

    DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs

    IEEE Trans. Pattern Anal. Mach. Intell.

    (2018)
  • D. Eigen et al.
  • Y. Freund et al.

    A decision-theoretic generalization of on-line learning and an application to boosting

    European Conference on Computational Learning Theory

    (1995)
  • K. He et al.

    Deep residual learning for image recognition

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    (2016)
  • A. Hoogi et al.

    A fully-automated pipeline for detection and segmentation of liver lesions and pathological lymph nodes

    (2017)
  • K. He et al.

    Delving deep into rectifiers: surpassing human-level performance on ImageNet classification

    Proceedings of the IEEE International Conference on Computer Vision

    (2015)