A deep residual attention-based U-Net with a biplane joint method for liver segmentation from CT scans
Introduction
Liver cancer is one of the most common cancers and one of the leading causes of cancer deaths worldwide. With the increasing maturity of computed tomography (CT) technology, medical image analysis has been widely used in clinical medicine, for example in the analysis of Coronavirus Disease 2019 (COVID-19) images [1,2]. In conventional clinical diagnosis, the physician detects lesions based on the medical images of the patient. As the accuracy of CT has improved, the number of CT slices obtained from each scan has greatly increased, making the workload of the physician much heavier; additionally, the analysis and diagnosis of CT scans rely on the subjective judgement of the physician, increasing the probability of misdiagnoses or missed diagnoses. The prerequisite for liver lesion analysis is the rapid and accurate localization of the liver area in CT scans; hence, the design of a computer-aided diagnosis (CAD) system for rapid and accurate liver segmentation is of great importance in clinical applications.
Previously proposed conventional segmentation methods, such as fuzzy C-means clustering [3], region growing [4,5], deformable models [6,7], and threshold-based methods [8–12], are less human-dependent but still rely on handcrafted features and have limited feature representation capabilities. Recently, deep learning [13] methods have been applied to many fields, such as medical image processing, medical image detection [14], medical image segmentation, and image classification [15,16], among which convolutional neural network (CNN)-based methods, such as fully convolutional networks (FCNs) [17], DeepLab [18], dense convolutional networks (DenseNets) [19], residual networks (ResNets) [20], generative adversarial networks (GANs) [21], and U-shaped networks (U-Nets) [22], are the most widely used approaches. CNNs learn features in an end-to-end manner and continuously tune their parameters to complete segmentation tasks, often achieving performance beyond that of conventional segmentation methods. Medical images are typically 3D images, and the segmentation of livers and tumours via 3D networks is more likely to result in higher accuracy, but 3D networks are limited by equipment performance in clinical settings. The performance requirements of segmentation methods based on 2D networks are much lower than those of 3D networks, but 2D networks cannot utilize the 3D spatial information contained in images, limiting their potential segmentation accuracy.
To address such concerns, a novel biplane joint method is proposed in this paper. This approach incorporates the segmentation results of coronal slices into the segmentation results of transverse slices, giving the final segmentation results information from an additional dimension and preventing the loss of spatial information that occurs when segmenting with 2D networks. The developed method is combined with the proposed deep residual attention U-Net (DRAUNet) and evaluated on three datasets. The contributions of this paper can be summarized as follows.
- (1) A deep residual block (DR block) is proposed for the encoder of DRAUNet, and the middle layer features of the DR block are reused to prevent the performance degradation caused by an overly deep network.
- (2) A dual-effect attention module (DAM) is proposed to fuse the features in the encoder and decoder more efficiently to prevent the information loss caused by the max pooling and overpassing of low-resolution information.
- (3) A biplane joint method for liver segmentation is proposed to fuse the segmentation results of transverse slices and coronal slices to endow the segmentation results with more spatial information, thereby solving the problem regarding the difficulty of obtaining 3D CT spatial information with a 2D network structure.
Our paper is organized as follows. First, we briefly review previous work related to the proposed method, including liver segmentation, residual structure, attention mechanism, and multislice segmentation (Section 2). Then, we describe the proposed method in detail (Section 3). Experiment-related datasets, implementation details, and experimental results are presented in Section 4. Finally, we discuss and summarize the method and experimental results (Sections 5 and 6, respectively).
U-Net is one of the most efficient structures used in liver segmentation networks. Liu et al. [23] proposed GIU-Net, a combination of an improved U-Net with graph cutting. First, the input CT scan was segmented by using the improved U-Net to obtain a probability distribution map of the liver region. Second, the starting slice of the segmentation process was selected to construct a graph cutting energy function by using the contextual information of the liver sequence and the liver probability distribution map. Finally, the segmentation procedure was completed by minimizing the graph cutting energy function. To make full use of the output features of convolutional U-Net units, Tran et al. [24] proposed Un-Net, which uses skip connections for each output of these units. Derivative structures based on U-Net are also a popular topic in liver segmentation. Li et al. [25] proposed the bottleneck supervised U-Net (BS U-Net), adding a dense block, an inception block [26] and a dilation convolution [27] to the encoder of U-Net. Lei et al. [28] proposed a deformable encoder-decoder network (DefED-Net) to obtain contextual CT information; this network includes deformable convolution to enhance its feature representations and ladder-atrous spatial pyramid pooling (Ladder-ASPP) with a multiscale dilation rate. Li et al. [29] proposed a dual-path network, H-DenseUNet, including a 2D DenseUNet and a 3D DenseUNet, for extracting intraslice features and contextual information, respectively. Jin et al. [30] proposed a residual attention U-Net (RA-UNet) for liver tumour segmentation, replacing the convolutional blocks of the conventional U-Net with residual blocks while proposing a residual attention mechanism and utilizing it for the skip connection component, which combines low-level feature maps with high-level feature maps to extract contextual information. Chen et al. [31] proposed a hybrid attention-based densely connected U-Net (HDU-Net) for automatic liver segmentation, introducing a global average pooling (GAP) block, a hybrid attention module, and a dense block to effectively acquire the 3D contextual information of CT scans. To explore both the 2D contextual information and 3D contextual information of CT scans, Song et al. [32] proposed a full-context CNN, effectively bridging the gap between 2D and 3D contexts.
He et al. [20] proposed ResNet, which introduces residual learning to solve the problem of training difficulties in deep neural networks. ResNet works well by adding shortcut paths between its convolutional layers to pass the gradient to a more distant layer and prevent the vanishing gradient problem; however, the weight of the shallow layers cannot be trained effectively. Residual blocks frequently appear as components in various deeper networks, such as the dual-path U-ResNet proposed by Xi et al. [33], where residual blocks are used in U-Net, trained by different loss functions, and then integrated into one model. Drozdzal et al. [34] proposed FC-ResNet, combining an FCN and ResNet, by using the FCN for image preprocessing and then applying FC-ResNet to the segmentation task. In this paper, a DR block was constructed based on the idea of ResNet and used in the encoder of the network, producing good feature extraction results.
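The shortcut idea behind residual learning can be illustrated with a minimal NumPy sketch. It is not the DR block of this paper: dense layers stand in for convolutions, and the names `residual_block`, `w1`, and `w2` are assumptions made purely for illustration. The block computes y = ReLU(F(x) + x), so when the residual branch F contributes nothing, the input passes through unchanged.

```python
import numpy as np


def relu(x):
    """Element-wise rectified linear unit."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Compute ReLU(F(x) + x), where F is linear -> ReLU -> linear.

    Dense layers are used instead of convolutions to keep the sketch short;
    w1 and w2 are hypothetical weight matrices of shape (features, features).
    """
    fx = relu(x @ w1) @ w2  # residual branch F(x)
    return relu(fx + x)     # identity shortcut added before the final activation


rng = np.random.default_rng(0)
x = rng.random((4, 8))  # batch of 4 non-negative feature vectors

# With zero weights the residual branch vanishes and the block is the identity.
zero = np.zeros((8, 8))
y_identity = residual_block(x, zero, zero)

# With small random weights the block learns a perturbation of the identity.
w1 = rng.normal(scale=0.1, size=(8, 8))
w2 = rng.normal(scale=0.1, size=(8, 8))
y = residual_block(x, w1, w2)
```

Because the shortcut carries x forward untouched, gradients also have a direct path backwards through the addition, which is the property the paragraph above attributes to ResNet.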
Attention mechanisms encourage a network to focus on the most informative features. Yan et al. [35] proposed an attention-guided concatenation (AGC) module to filter features that were useful for segmentation, adaptively selecting useful contextual features from low-level features via the guidance of high-level features. Zhang et al. [36] proposed a deep attention refinement network (DARN), which introduces a semantic attention refinement (SemRef) module and a spatial attention refinement (SpaRef) module to utilize the feature relationships in different layers. Similarly, the residual attention mechanism proposed by Jin et al. [30] and the hybrid attention module proposed by Chen et al. [31] have contributed to the improvement of segmentation performance. Models that combine multiple attention mechanisms are also popular research topics, and the convolutional block attention module (CBAM) proposed by Woo et al. [37] uses squeeze and excitation (SE) [38] and spatial attention (SA) [39] sequentially to focus on recognizing target objects. The proposed DAM also explores the correlations of feature channels and spatial locations of feature maps and fuses them at different layers of the network to improve its segmentation efficiency.
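Applying channel attention and then spatial attention in sequence can be sketched as follows. This is a simplified illustration under stated assumptions, not CBAM as published and not the DAM of this paper: the channel gate is an SE-style squeeze and excitation, the spatial gate mixes the channel-wise mean and maximum with two scalar weights instead of a learned convolution, and all weights are random placeholders.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def channel_attention(feat, w_reduce, w_expand):
    """SE-style channel gate for a feature map of shape (C, H, W)."""
    squeezed = feat.mean(axis=(1, 2))              # squeeze: global average pool -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)  # excitation, reduced dimension
    gate = sigmoid(w_expand @ hidden)              # one weight in (0, 1) per channel
    return feat * gate[:, None, None]


def spatial_attention(feat, w_avg, w_max):
    """Spatial gate built from channel-wise mean and max maps.

    Assumption: scalar weights w_avg and w_max replace the learned
    convolution that a real spatial attention module would use.
    """
    avg_map = feat.mean(axis=0)                    # (H, W)
    max_map = feat.max(axis=0)                     # (H, W)
    gate = sigmoid(w_avg * avg_map + w_max * max_map)
    return feat * gate[None, :, :]


def sequential_attention(feat, w_reduce, w_expand, w_avg, w_max):
    """Channel attention first, then spatial attention."""
    refined = channel_attention(feat, w_reduce, w_expand)
    return spatial_attention(refined, w_avg, w_max)


rng = np.random.default_rng(0)
C, H, W, r = 8, 16, 16, 4                          # r is the channel reduction ratio
feat = rng.normal(size=(C, H, W))
w_reduce = rng.normal(size=(C // r, C))
w_expand = rng.normal(size=(C, C // r))
out = sequential_attention(feat, w_reduce, w_expand, 0.5, 0.5)
```

Both gates lie strictly between 0 and 1, so the output keeps the shape of the input while every activation is scaled down according to how relevant its channel and its spatial position are judged to be.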
Multislice segmentation networks, also known as 2.5D networks, have two main implementations. One is to use the joint segmentation of upper and lower adjacent slices; for example, Ben-Cohen et al. [40] combined an FCN with a GAN and changed the input size to three channels, inputting the relevant upper and lower slices into the network to utilize the 3D information contained in CT scans. The other implementation involves joint segmentation among the slices in three planes of CT scans. For example, Wang et al. [41] proposed a multi-plane Network (MPNet), in which three segmentation models were trained by using the transverse, coronal and sagittal slices of CT scans; the segmentation results of the different models were fused to obtain the final segmentation results. Both methods can preserve the 3D information of CT scans to different degrees, but the former is limited in terms of the 3D contextual information that can be obtained because the adjacent slices contain only a small amount of contextual information. The latter requires higher equipment performance and more time to train its three segmentation models, making it difficult to implement clinically; additionally, fusing too many slices in different planes will lead to a decrease in segmentation performance instead of an increase due to the differences among the segmentation results in different planes. The proposed biplane joint method uses transverse slices for network training and uses the same trained model to segment both transverse and coronal slices to incorporate as much 3D spatial information as possible into the segmentation results.
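The idea of segmenting a volume along two planes with one 2D model and then merging the results can be sketched as follows. The fusion rule is not specified in this excerpt, so averaging the two probability volumes and thresholding at 0.5 is an assumption, as are the names `segment_slice`, `segment_along_axis`, and `biplane_fuse`. The function `segment_slice` is a hypothetical stand-in for a trained 2D network, and the volume is assumed to be ordered (transverse index, coronal index, sagittal index).

```python
import numpy as np


def segment_slice(slice2d):
    """Hypothetical stand-in for a trained 2D network.

    Maps a 2D slice to per-pixel foreground probabilities of the same shape.
    A real model may need its input resized, since coronal slices generally
    have a different shape from transverse slices.
    """
    return 1.0 / (1.0 + np.exp(-(slice2d - 0.5) * 10.0))


def segment_along_axis(volume, axis):
    """Run the 2D model on every slice taken along `axis`; return a probability volume."""
    moved = np.moveaxis(volume, axis, 0)           # slicing axis first
    probs = np.stack([segment_slice(s) for s in moved], axis=0)
    return np.moveaxis(probs, 0, axis)             # restore the original layout


def biplane_fuse(volume, threshold=0.5):
    """Fuse transverse and coronal predictions (assumed rule: mean, then threshold)."""
    transverse = segment_along_axis(volume, axis=0)
    coronal = segment_along_axis(volume, axis=1)
    fused = 0.5 * (transverse + coronal)
    return (fused >= threshold).astype(np.uint8)


# Toy volume: a bright cuboid on a dark background.
volume = np.zeros((6, 8, 8))
volume[2:5, 2:6, 3:7] = 1.0
mask = biplane_fuse(volume)
```

Only one model is evaluated, twice, which reflects the cost argument made above against training a separate model per plane. With a real network the two probability volumes would differ, and the fusion step is where the coronal context would correct the transverse prediction.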
DRAUNet
This paper designs the novel DRAUNet based on the encoder-decoder architecture of U-Net, utilizing and enhancing the advantages of the U-shaped structure; Fig. 1 shows the structure of DRAUNet.
DRAUNet has 12 layers, which can be divided into three parts: an encoder, a decoder and skip connections.
- (1) Encoder: The encoder of DRAUNet consists of six layers. First, the input feature is coarsely extracted in the first layer by using a Conv3 × 3-BN-ReLU operation consisting of a 3 × 3 convolution, batch
Data description
The experiments are mainly conducted on the Liver Tumour Segmentation (LiTS), 3D Image Reconstruction for Comparison of Algorithms Database (3DIRCADb), and Segmentation of the Liver Competition 2007 (Sliver07) datasets. The LiTS dataset contains 131 abdominal CT scans with ground truths and 70 CT scans without ground truths. The 3DIRCADb dataset contains 20 CT scans with ground truths. The Sliver07 dataset contains 20 CT scans with ground truths and 10 CT scans without ground truths. The CT
Discussion
The 3D information of CT scans contains richer features than 2D information. Recently, many 2D networks have been able to achieve high segmentation accuracy in the field of liver segmentation, but the continued improvement of segmentation accuracy is limited by the dimensionality of the utilized network; hence, how to obtain 3D spatial information from CT scans is one of the problems to be solved with 2D networks. To address this problem, several multislice segmentation methods that apply 3D
Conclusion
In this paper, we propose DRAUNet for liver segmentation from CT scans, investigate an effective method for obtaining 3D spatial information from CT scans through a 2D network and propose a combination of DRAUNet and the biplane joint method. A novel DR block and a DAM are included in DRAUNet. We demonstrate the high segmentation performance of DRAUNet, as well as its robustness and generalizability, via experiments on several datasets. The proposed DR block and DAM are demonstrated by ablation
Declaration of competing interest
All authors disclosed no relevant relationships. The work described has not been submitted elsewhere for publication, in whole or in part, and all the authors listed have approved the manuscript that is enclosed.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant 61762067 and Grant 61867004) and the Natural Science Foundation of Jiangxi Province (Grant 20202BABL202029 and Grant 20202BABL202028).
References (47)
- Directional mutation and crossover boosted ant colony optimization with application to COVID-19 X-ray image segmentation. Comput. Biol. Med. (Sep. 2022)
- Gaussian barebone salp swarm algorithm with stochastic fractal search for medical image segmentation: a COVID-19 case study. Comput. Biol. Med. (Dec. 2021)
- et al. Kernelized fuzzy C-means clustering with adaptive thresholding for segmenting liver tumors. Procedia Comput. Sci. (Jan. 2016)
- et al. A ship target discrimination method based on change detection in SAR imagery. J. Electron. Inf. Technol. (2015)
- et al. The study and application of the improved region growing algorithm for liver segmentation. Optik (May 2014)
- et al. Iterative mesh transformation for 3D segmentation of livers with cancers in CT images. Comput. Med. Imag. Graph. (2015)
- et al. Liver vessel segmentation based on centerline constraint and intensity model. Biomed. Signal Process. Control (Aug. 2018)
- An efficient multilevel thresholding image segmentation method based on the slime mould algorithm with bee foraging mechanism: a real case with lupus nephritis images. Comput. Biol. Med. (Mar. 2022)
- Multilevel threshold image segmentation for COVID-19 chest radiography: a framework using horizontal and vertical multiverse optimization. Comput. Biol. Med. (Jul. 2022)
- Performance optimization of water cycle algorithm for multilevel lupus nephritis image segmentation. Biomed. Signal Process. Control (Feb. 2023)