DiSegNet: A deep dilated convolutional encoder-decoder architecture for lymph node segmentation on PET/CT images
Introduction
Lymph node (LN) segmentation aims to assign a categorical label to every voxel in an image and plays an important role in medical image analysis and disease quantification. Yet manual detection and measurement/segmentation of LNs in images by human observers is time-consuming and error-prone (Feulner et al., 2013). Moreover, LNs are difficult to recognize and segment because of the low contrast between LNs and surrounding soft tissues and the variation in nodal size and shape.
This topic has received much attention in recent decades. Discriminative learning and a spatial prior probability have been used for LN detection and segmentation in chest computed tomography (CT) images (Feulner et al., 2013). The discriminative model detected LNs from their appearance combined with anatomical prior knowledge, and the graph cut method was used to segment them. Barbu et al. (2012a) proposed a learning-based approach that used marginal space learning for LN segmentation in the axillary region. Pathological LNs were detected and segmented in (Hoogi et al., 2017); the method employed machine learning techniques such as marginal space learning, convolutional neural networks, and active contour models for organ detection, LN detection, and LN segmentation. All of the methods above rely on hand-crafted features or prior knowledge to recognize and delineate LNs.
Recently, fully convolutional networks (FCNs) (Long et al., 2015) have been used for pixel-wise semantic segmentation tasks, building on the success of deep convolutional neural network (DCNN) models such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan and Zisserman, 2014), and ResNet (He et al., 2016). Many semantic segmentation architectures that followed FCNs adopted an encoder-decoder structure, such as SegNet (Badrinarayanan et al., 2017) and U-Net (Ronneberger et al., 2015). However, the decoder still has difficulty recovering detail from the low-resolution feature maps produced by the max-pooling operations. In (Chen et al., 2018), dilated (or atrous) convolution was proposed to overcome this problem, and Atrous-spatial-pyramid-pooling (ASPP) with various dilation rates was proposed to robustly segment objects at multiple scales. The DeepLab architecture integrates an ASPP sub-module in order to capture objects and context at multiple scales. Yet it uses only one ASPP sub-module, placed after the fifth max-pooling layer, and does not exploit ASPP at the other stages that follow max-pooling, so it may miss some important features, especially for small LNs. Moreover, it does not discuss where ASPP should be placed in the architecture; which layer or layers of a DCNN should use ASPP is still an open question.
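To make the idea of dilation rates concrete, the following is a minimal 1D sketch in NumPy, not the DeepLab or DiSegNet implementation. The function names (`effective_kernel_size`, `dilated_conv1d`, `aspp_branches`), the zero padding, and the rates `(1, 2, 4)` are illustrative assumptions. It shows how a fixed-size kernel covers a wider span as the dilation rate grows, and how parallel branches with different rates yield same-length outputs that can be stacked, which is the basic pattern behind ASPP.

```python
import numpy as np


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a kernel whose taps are `dilation` samples apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated cross-correlation (no kernel flip, as in DCNNs)."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = effective_kernel_size(len(kernel), dilation)
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.empty(out_len)
    for i in range(out_len):
        # Pick every `dilation`-th sample inside the span: exactly len(kernel) taps.
        taps = signal[i:i + span:dilation]
        out[i] = np.dot(taps, kernel)
    return out


def aspp_branches(signal, kernel, rates=(1, 2, 4)):
    """Run parallel dilated branches with zero padding so all outputs match
    the input length (assumes an odd kernel size). Rates are illustrative."""
    signal = np.asarray(signal, dtype=float)
    branches = []
    for rate in rates:
        pad = (len(kernel) - 1) * rate // 2
        padded = np.pad(signal, pad)  # zero padding on both sides
        branches.append(dilated_conv1d(padded, kernel, rate))
    return np.stack(branches)  # shape: (len(rates), len(signal))


if __name__ == "__main__":
    for rate in (1, 2, 4):
        print(rate, effective_kernel_size(3, rate))
    print(aspp_branches(np.arange(8.0), [1.0, 0.0, -1.0]).shape)
```

In a real network the branch outputs would be fused (for example by summation or concatenation followed by a convolution); the fusion step is omitted here because the excerpt does not specify it.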
Although much success has been achieved in semantic segmentation, only a few works employ semantic segmentation neural networks for LN segmentation. In (Bouget et al., 2019), the authors combined U-Net and Mask R-CNN for segmentation and detection of mediastinal LNs. However, this method needs to train two DCNNs, which introduces redundant computation for feature learning because the feature maps from U-Net could also be used in Mask R-CNN (Ren et al., 2017). In (Oda, 2018), a 3D U-Net was trained to segment LNs and other anatomical structures on contrast-enhanced chest CT volumes. Yet the 3D convolution operation incurs a high computational cost and memory consumption, especially of GPU memory.
Two issues are not addressed by the LN segmentation methods reviewed above. First, none of the proposed methods considers the imbalance in voxel classes between the LNs and the remaining part of the volume, which impedes training efficiency. Lin et al. (2017) proposed a focal loss function for the dense object detection problem, which addresses the imbalance between object and background bounding boxes by down-weighting the loss from well-classified examples. Inspired by the focal loss function, an exponential loss function (Xu et al., 2020) was proposed for the slice classification task, which aims to classify each slice as to whether or not it contains pathological LNs on PET/CT. However, the problem of imbalance is more severe in the LN segmentation task because most voxels do not belong to LNs. Second, multi-scale information from feature maps is helpful for semantic segmentation in natural images (Chen et al., 2018), but it has not been used in LN segmentation.
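The down-weighting behaviour of the focal loss can be illustrated with a short NumPy sketch. It uses the standard form from Lin et al. (2017), FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The function names, the clipping constant, and the choice gamma = 2 are assumptions made for illustration, and the class-balancing weight alpha is omitted. This is not the cosine-sine loss proposed in this paper, whose definition is not given in this excerpt.

```python
import numpy as np

EPS = 1e-7  # assumed clipping constant to avoid log(0)


def cross_entropy(p_t):
    """Per-example cross-entropy given the probability of the true class."""
    p_t = np.clip(np.asarray(p_t, dtype=float), EPS, 1.0)
    return -np.log(p_t)


def focal_loss(p_t, gamma=2.0):
    """Focal loss without the alpha weight: cross-entropy scaled by
    (1 - p_t) ** gamma, so well-classified examples contribute less."""
    p_t = np.clip(np.asarray(p_t, dtype=float), EPS, 1.0)
    return -((1.0 - p_t) ** gamma) * np.log(p_t)


if __name__ == "__main__":
    # A well-classified voxel (p_t = 0.9) versus a misclassified one (p_t = 0.1).
    for p in (0.9, 0.1):
        ratio = focal_loss(p) / cross_entropy(p)
        print(f"p_t={p}: focal/cross-entropy ratio = {ratio:.2f}")
```

With gamma = 2 the scaling factor is (1 - p_t)^2, so the well-classified example keeps about 1% of its cross-entropy loss while the misclassified one keeps about 81%. When background voxels vastly outnumber LN voxels, this keeps the many easy background voxels from dominating the total loss.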
This work makes three novel contributions:
First, considering the merit of multi-scale feature analysis by ASPP as well as training time and efficiency, we propose a new strategy that uses more than one ASPP sub-module, called multi-stage Atrous-spatial-pyramid-pooling (MS-ASPP), after the max-pooling layer in the encoder part, which could provide more context information to the decoder to aid LN segmentation. We name the resulting network DiSegNet; it integrates MS-ASPP into the SegNet architecture.
Second, we tested the loss functions that are commonly used to train semantic segmentation networks, such as cross-entropy and Dice loss (Sudre et al., 2017), and found that these loss functions do not account for the imbalance of voxel classes. In (Lin et al., 2017), the authors designed a focal loss function for the imbalance problem of dense object detection. An exponential loss function was proposed in (Xu et al., 2020) for pathological LN classification, which aims to handle the imbalance between slices that do and do not include LNs. Inspired by the focal loss (FL) and exponential loss (EL) functions, we propose a novel loss function, named the cosine-sine (CS) loss function, to deal with the imbalance of voxel classes for LN segmentation in this study. Compared to FL and EL, the proposed CS loss function up-weights the loss from misclassified voxels while down-weighting the loss from well-classified voxels, which facilitates the pathological LN segmentation task.
Third, to the best of our knowledge, this is the first report of pathological LN segmentation in positron emission tomography/computed tomography (PET/CT) images of the thorax based on a DCNN. We also compare our method to other published methods that perform LN segmentation on CT or PET/CT volumes.
In this paper, the definitions of the cross-entropy loss function and focal loss function are first introduced. Then, the materials are introduced in Section 2.1 and a novel cosine-sine (CS) loss function is proposed in Section 2.2.1. Two variants of the DiSegNet architecture that employ MS-ASPP are introduced in Section 2.3.2. Sections 3 and 4 present the experiments and the discussion. Finally, the conclusion is provided.
Section snippets
Materials
This retrospective study was conducted following approval from the Institutional Review Board at the University of Pennsylvania (UPenn) along with a Health Insurance Portability and Accountability Act waiver. The data set was retrospectively selected from our health system patient image database by a board-certified radiologist (D.A.T.). It consists of 63 18F-fluorodeoxyglucose (FDG) PET/CT image data sets from 63 subjects with either Hodgkin lymphoma or diffuse large B-cell lymphoma (DLBCL).
Results
In this work, we use the VGG16 framework with its fully connected layers discarded, combined with the MS-ASPP module, as the encoder part. The stochastic gradient descent with momentum optimizer is used. The momentum value, the initial learning rate, and the mini-batch size are 0.9, 0.001, and 4, respectively. The proposed cosine-sine (CS) loss function is employed as the objective function for training the network, and the cross-entropy loss, focal loss, and Dice loss functions are used as
Discussion
In this paper, we address two issues that are not handled by previous LN segmentation methods. The first is the imbalance of voxel classes, and the second is the lack of multi-scale information in the feature maps of SegNet. With these in mind, we proposed a novel deep convolutional neural network (DCNN) named DiSegNet to overcome these two issues.
First, the proposed CS loss function can up-weight the loss from misclassified voxels. The idea is similar to that of AdaBoost (Freund and Schapire, 1995),
Conclusion
In this work, we proposed a simple but effective cosine-sine (CS) loss function as an objective function for training different networks to deal with the class imbalance problem in LN segmentation. The CS loss function focuses learning on the hard-to-classify (misclassified) voxels (examples) and down-weights the well-classified voxels (examples) at the same time. Our experimental results show that the proposed loss function can achieve good results for automatic abnormal LN segmentation in
CRediT authorship contribution statement
Guoping Xu: Investigation, Software, Writing - original draft. Hanqiang Cao: Methodology, Validation. Jayaram K. Udupa: Supervision, Conceptualization, Methodology, Writing - review & editing. Yubing Tong: Visualization, Writing - review & editing. Drew A. Torigian: Data curation, Resources, Writing - review & editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
The training of Mr. Guoping Xu in the Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, for the duration of one year was supported by the China Scholarship Council. His subsequent training was supported by research funds provided by Dr. Drew Torigian at the University of Pennsylvania.
References (26)
- et al. Lymph node detection and segmentation in chest CT data using discriminative learning and a spatial prior. Med. Image Anal. (2013)
- Body-wide hierarchical fuzzy modeling, recognition, and delineation of anatomy in medical images. Med. Image Anal. (2014)
- et al. SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. (2017)
- et al. Automatic detection and segmentation of lymph nodes from CT data. IEEE Trans. Med. Imaging (2012)
- et al. Automatic detection and segmentation of lymph nodes from CT data. IEEE Trans. Med. Imaging (2012)
- et al. Semantic segmentation and detection of mediastinal lymph nodes and anatomical structures in CT data for lung cancer staging. Int. J. Comput. Assist. Radiol. Surg. (2019)
- et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. (2018)
- et al. A desicion-theoretic generalization of on-line learning and an application to boosting. European Conference on Computational Learning Theory (1995)
- et al. Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016)
- A Fully-Automated Pipeline for Detection and Segmentation of Liver Lesions and Pathological Lymph Nodes
- Delving deep into rectifiers: surpassing human-level performance on imagenet classification. Proceedings of the IEEE International Conference on Computer Vision