A deep residual attention-based U-Net with a biplane joint method for liver segmentation from CT scans

https://doi.org/10.1016/j.compbiomed.2022.106421

Highlights

  • The 3D spatial information of 3D medical images is extracted via a 2D network structure.

  • A deep residual block with powerful feature extraction capability is proposed.

  • A dual-effect attention module that can efficiently fuse features in the encoder and decoder is proposed.

  • The proposed method demonstrates clear improvements in liver segmentation performance.

Abstract

Liver tumours are diseases with high morbidity and high deterioration probabilities, and accurate liver area segmentation from computed tomography (CT) scans is a prerequisite for quick tumour diagnosis. While 2D network segmentation methods can perform segmentation with lower device performance requirements, they often discard the rich 3D spatial information contained in CT scans, limiting their segmentation accuracy. Hence, a deep residual attention-based U-shaped network (DRAUNet) with a biplane joint method for liver segmentation is proposed in this paper, where the biplane joint method introduces coronal CT slices to assist the transverse slices with segmentation, incorporating more 3D spatial information into the segmentation results to improve the segmentation performance of the network. Additionally, a novel deep residual block (DR block) and dual-effect attention module (DAM) are introduced in DRAUNet, where the DR block has deeper layers and two shortcut paths. The DAM efficiently combines the correlations of feature channels and the spatial locations of feature maps. The DRAUNet with the biplane joint method is tested on three datasets, Liver Tumour Segmentation (LiTS), 3D Image Reconstruction for Comparison of Algorithms Database (3DIRCADb), and Segmentation of the Liver Competition 2007 (Sliver07), and it achieves 97.3%, 97.4%, and 96.9% Dice similarity coefficients (DSCs) for liver segmentation, respectively, outperforming most state-of-the-art networks; this strongly demonstrates the segmentation performance of DRAUNet and the ability of the biplane joint method to obtain 3D spatial information from 3D images.
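
For reference, the Dice similarity coefficient (DSC) reported above measures the overlap between a predicted liver mask and the ground-truth mask. The following minimal NumPy sketch, which is not taken from the paper, shows how the metric is typically computed:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice similarity coefficient between two binary masks:
    DSC = 2 * |pred AND gt| / (|pred| + |gt|)."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))
```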

Introduction

Liver cancer is one of the most common cancers and one of the leading causes of cancer deaths worldwide. With the increasing maturity of computed tomography (CT) technology, medical image analysis has been widely used in clinical medicine, for example in the diagnosis of Coronavirus Disease 2019 (COVID-19) [1,2]. In conventional clinical diagnosis, the physician detects lesions based on the medical images of the patient. As the accuracy of CT has improved, the number of CT slices obtained from each scan has greatly increased, making the workload of the physician much heavier; additionally, the analysis and diagnosis of CT rely on the subjective judgement of the physician, increasing the probability of misdiagnoses or missed diagnoses. The prerequisite for liver lesion analysis is the rapid and accurate localization of the liver area from CT scans; hence, the design of a computer-aided diagnosis (CAD) system for rapid and accurate liver segmentation is of great importance in clinical applications.

Previously proposed conventional segmentation methods, such as fuzzy C-means clustering [3], region growing [4,5], deformable models [6,7], and threshold-based methods [8,9,10,11,12], are less human-dependent but still rely on handcrafted features and have limited feature representation capabilities. Recently, deep learning [13] methods have been applied to many fields, such as medical image processing, medical image detection [14], medical image segmentation, and image classification [15,16], among which convolutional neural network (CNN)-based methods, such as fully convolutional networks (FCNs) [17], DeepLab [18], dense convolutional networks (DenseNets) [19], residual networks (ResNets) [20], generative adversarial networks (GANs) [21], and U-shaped networks (U-Nets) [22], are the most widely used approaches. CNNs learn features in an end-to-end manner and continuously tune their parameters to complete segmentation tasks, often achieving performance beyond that of conventional segmentation methods. Medical images are typically 3D images, and the segmentation of livers and tumours via 3D networks is more likely to result in higher accuracy, but 3D networks are limited by equipment performance in clinical settings. The performance requirements of segmentation methods based on 2D networks are much lower than those of 3D networks, but 2D networks cannot utilize the 3D spatial information contained in the images, limiting their potential segmentation accuracy.

To address these concerns, a novel biplane joint method is proposed in this paper; this approach incorporates the segmentation results of coronal slices into the segmentation results of transverse slices, endowing the final segmentation results with information from another plane and preventing the loss of spatial information during 2D network segmentation. The developed method is combined with the proposed deep residual attention U-Net (DRAUNet) and evaluated on three datasets. The contributions of this paper can be summarized as follows.

  • (1)

    A deep residual block (DR block) is proposed for the encoder of DRAUNet, and the middle layer features of the DR block are reused to prevent the performance degradation caused by an overly deep network.

  • (2)

    A dual-effect attention module (DAM) is proposed to fuse the features in the encoder and decoder more efficiently to prevent the information loss caused by the max pooling and overpassing of low-resolution information.

  • (3)

    A biplane joint method for liver segmentation is proposed to fuse the segmentation results of transverse slices and coronal slices to endow the segmentation results with more spatial information, thereby solving the problem regarding the difficulty of obtaining 3D CT spatial information with a 2D network structure.

Our paper is organized as follows. First, we briefly review previous work related to the proposed method, including liver segmentation, residual structure, attention mechanism, and multislice segmentation (Section 2). Then, we describe the proposed method in detail (Section 3). Experiment-related datasets, implementation details, and experimental results are presented in Section 4. Finally, we discuss and summarize the method and experimental results (Sections 5 and 6, respectively).

U-Net is one of the most effective structures for liver segmentation networks. Liu et al. [23] proposed GIU-Net, a combination of an improved U-Net with graph cut. First, the input CT scan was segmented by the improved U-Net to obtain a probability distribution map of the liver region. Second, the starting slice of the segmentation process was selected to construct a graph-cut energy function by using the contextual information of the liver sequence and the liver probability distribution map. Finally, the segmentation procedure was completed by minimizing the graph-cut energy function. To make full use of the output features of convolutional U-Net units, Tran et al. [24] proposed Un-Net, which uses skip connections for each output of these units. U-Net-derived structures are also a popular topic in liver segmentation: Li et al. [25] proposed the bottleneck supervised U-Net (BS U-Net), adding a dense block, an inception block [26] and dilated convolution [27] to the encoder of U-Net. Lei et al. [28] proposed a deformable encoder-decoder network (DefED-Net) to obtain contextual CT information; this network includes deformable convolution to enhance its feature representations and ladder-atrous spatial pyramid pooling (Ladder-ASPP) with a multiscale dilation rate. Li et al. [29] proposed a dual-path network, H-DenseUNet, including a 2D DenseUNet and a 3D DenseUNet for extracting intraslice features and contextual information, respectively. Jin et al. [30] proposed a residual attention U-Net (RA-UNet) for liver tumour segmentation, replacing the convolutional blocks of the conventional U-Net with residual blocks and proposing a residual attention mechanism for the skip connections, which combines low-level feature maps with high-level feature maps to extract contextual information. Chen et al. [31] proposed a hybrid attention-based densely connected U-Net (HDU-Net) for automatic liver segmentation, introducing a global average pooling (GAP) block, a hybrid attention module, and a dense block to effectively acquire the 3D contextual information of CT scans. To exploit both the 2D and 3D contextual information of CT scans, Song et al. [32] proposed a full-context CNN, effectively bridging the gap between 2D and 3D contexts.

He et al. [20] proposed ResNet, which introduces residual learning to solve the problem of training difficulties in deep neural networks. ResNet works well by adding shortcut paths between its convolutional layers to pass the gradient to more distant layers and alleviate the vanishing gradient problem; however, the weights of the shallow layers cannot be trained effectively. Residual blocks frequently appear as components of various deeper networks, such as the dual-path U-ResNet proposed by Xi et al. [33], in which residual blocks are used in U-Net, trained with different loss functions, and then integrated into one model. Drozdzal et al. [34] proposed FC-ResNet, which combines an FCN and ResNet: the FCN is used for image preprocessing, and FC-ResNet is then applied to the segmentation task. In this paper, a DR block is constructed based on the idea of ResNet and used in the encoder of the network, producing good feature extraction results.
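
The excerpt does not give the DR block's exact architecture; purely as a hedged illustration of the idea described above, a deeper residual block with two shortcut paths that reuses its middle-layer features, a minimal PyTorch sketch might look as follows. The layer counts, widths, and fusion points are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class DeepResidualBlockSketch(nn.Module):
    """Illustrative residual block with two shortcut paths (not the paper's exact DR block)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.conv2 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.conv3 = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mid = self.conv2(self.conv1(x))   # middle-layer features
        out = self.conv3(mid)
        # Two shortcut paths: one from the block input, one reusing the middle features.
        return self.relu(out + mid + x)
```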

Attention mechanisms make a network focus more on informative features. Yan et al. [35] proposed an attention-guided concatenation (AGC) module to filter features that are useful for segmentation, adaptively selecting useful contextual features from low-level features under the guidance of high-level features. Zhang et al. [36] proposed a deep attention refinement network (DARN), which introduces a semantic attention refinement (SemRef) module and a spatial attention refinement (SpaRef) module to utilize the feature relationships in different layers. Similarly, the residual attention mechanism proposed by Jin et al. [30] and the hybrid attention module proposed by Chen et al. [31] have contributed to improved segmentation performance. Models that combine multiple attention mechanisms are also popular research topics; the convolutional block attention module (CBAM) proposed by Woo et al. [37] applies squeeze-and-excitation (SE) [38] and spatial attention (SA) [39] sequentially to focus on recognizing target objects. The proposed DAM likewise explores the correlations of feature channels and the spatial locations of feature maps and fuses them at different layers of the network to improve segmentation efficiency.
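
The DAM's internal design is not detailed in this excerpt; purely as a point of reference for the sequential channel-then-spatial attention idea of CBAM [37] mentioned above, a minimal PyTorch sketch is given below. It is illustrative only and does not reproduce the proposed DAM.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttentionSketch(nn.Module):
    """CBAM-style channel attention followed by spatial attention (illustrative only)."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze-and-excitation style gating over channels.
        self.channel_mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention over pooled channel statistics.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel_mlp(x)                   # reweight feature channels
        avg_map = x.mean(dim=1, keepdim=True)         # per-pixel channel mean
        max_map = x.max(dim=1, keepdim=True).values   # per-pixel channel max
        attn = self.spatial_conv(torch.cat([avg_map, max_map], dim=1))
        return x * attn                               # reweight spatial locations
```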

Multislice segmentation networks, also known as 2.5D networks, have two main implementations. One uses the joint segmentation of adjacent upper and lower slices; for example, Ben-Cohen et al. [40] combined an FCN with a GAN and changed the input size to three channels, feeding the relevant upper and lower slices into the network to utilize the 3D information contained in CT scans. The other involves joint segmentation among the slices in three planes of CT scans. For example, Wang et al. [41] proposed a multi-plane network (MPNet), in which three segmentation models were trained on the transverse, coronal and sagittal slices of CT scans, and the segmentation results of the different models were fused to obtain the final segmentation results. Both approaches preserve the 3D information of CT scans to different degrees, but the former is limited in terms of the 3D contextual information that can be obtained because the adjacent slices contain only a small amount of contextual information. The latter requires higher equipment performance and more time to train its three segmentation models, making it difficult to implement clinically; additionally, fusing too many slices from different planes leads to a decrease rather than an increase in segmentation performance due to the differences among the segmentation results in different planes. The proposed biplane joint method uses transverse slices for network training and uses the same trained model to segment both transverse and coronal slices, incorporating as much 3D spatial information as possible into the segmentation results.
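
The exact fusion rule of the biplane joint method is not given in this excerpt; the sketch below only illustrates the general workflow described above: segment the transverse and coronal slices with the same trained 2D model, reslice the coronal predictions back into the transverse orientation, and combine the two probability volumes. The averaging fusion and the `predict_slice` callable are assumptions made for illustration.

```python
import numpy as np

def biplane_joint_sketch(volume: np.ndarray, predict_slice) -> np.ndarray:
    """Illustrative biplane fusion. `volume` is a CT volume of shape (Z, Y, X), and
    `predict_slice` maps a 2D slice to a foreground-probability map of the same shape.
    The averaging fusion rule is an assumption, not the authors' exact method."""
    # Segment every transverse (axial) slice: planes of constant Z.
    transverse = np.stack(
        [predict_slice(volume[z]) for z in range(volume.shape[0])], axis=0
    )

    # Segment every coronal slice (planes of constant Y) with the same model,
    # then stack the predictions back into (Z, Y, X) order.
    coronal = np.stack(
        [predict_slice(volume[:, y, :]) for y in range(volume.shape[1])], axis=1
    )

    # Fuse the two probability volumes and threshold to a binary liver mask.
    fused = 0.5 * (transverse + coronal)
    return (fused > 0.5).astype(np.uint8)
```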

Section snippets

DRAUNet

This paper designs the novel DRAUNet based on the encoder-decoder architecture of U-Net, utilizing and enhancing the advantages of the U-shaped structure; Fig. 1 shows the structure of DRAUNet.

DRAUNet has 12 layers, which can be divided into three parts: an encoder, a decoder and skip connections.

  • (1)

    Encoder: The encoder of DRAUNet consists of six layers. First, the input feature is coarsely extracted in the first layer by using a Conv3 × 3-BN-ReLU operation consisting of a 3 × 3 convolution, batch
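
This snippet is cut off; as a point of reference, the Conv3 × 3-BN-ReLU unit it describes (a 3 × 3 convolution followed by batch normalization and a ReLU activation) can be sketched in PyTorch as follows. The channel counts here are placeholders, not the paper's configuration.

```python
import torch.nn as nn

def conv3x3_bn_relu(in_channels: int, out_channels: int) -> nn.Sequential:
    """Conv3x3-BN-ReLU unit: 3x3 convolution, batch normalization, then ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )

# Example: a coarse first feature-extraction layer for single-channel CT slices
# (the output channel count of 64 is an assumption).
stem = conv3x3_bn_relu(1, 64)
```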

Data description

The experiments are mainly conducted on the Liver Tumour Segmentation (LiTS), 3D Image Reconstruction for Comparison of Algorithms Database (3DIRCADb), and Segmentation of the Liver Competition 2007 (Sliver07) datasets. The LiTS dataset contains 131 abdominal CT scans with ground truths and 70 CT scans without ground truths. The 3DIRCADb dataset contains 20 CT scans with ground truths. The Sliver07 dataset contains 20 CT scans with ground truths and 10 CT scans without ground truths. The CT

Discussion

The 3D information of CT scans contains richer features than 2D information. Recently, many 2D networks have been able to achieve high segmentation accuracy in the field of liver segmentation, but the continued improvement of segmentation accuracy is limited by the dimensionality of the utilized network; hence, how to obtain 3D spatial information from CT scans is one of the problems to be solved for 2D networks. To address this problem, several multislice segmentation methods that apply 3D

Conclusion

In this paper, we propose DRAUNet for liver segmentation from CT scans, investigate an effective method for obtaining 3D spatial information from CT scans through a 2D network and propose a combination of DRAUNet and the biplane joint method. A novel DR block and a DAM are included in DRAUNet. We demonstrate the high segmentation performance of DRAUNet, as well as its robustness and generalizability, via experiments on several datasets. The proposed DR block and DAM are demonstrated by ablation

Declaration of competing interest

All authors disclosed no relevant relationships. The work described has not been submitted elsewhere for publication, in whole or in part, and all the authors listed have approved the manuscript that is enclosed.

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant 61762067 and Grant 61867004) and the Natural Science Foundation of Jiangxi Province (Grant 20202BABL202029 and Grant 20202BABL202028).

References (47)

  • Z. Liu, Liver CT sequence segmentation based with improved U-Net and graph cut, Expert Syst. Appl. (2019)

  • M. Drozdzal, Learning normalized inputs for iterative estimation in medical image segmentation, Med. Image Anal. (2018)

  • A. Ben-Cohen, Cross-modality synthesis from CT to PET using FCN and GAN networks for improved automated lesion detection, Eng. Appl. Artif. Intell. (2019)

  • C. Wang, Automatic liver segmentation using multi-plane integrated fully convolutional neural networks, 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2018)

  • X. Yang, Multi-threshold image segmentation for melanoma based on Kapur's entropy using enhanced ant colony optimization, Front. Neuroinf. (2022)

  • W. Wang et al., Improved minimum spanning tree based image segmentation with guided matting, KSII Transactions on Internet and Information Systems (Jan. 2022)

  • G.E. Hinton et al., Reducing the dimensionality of data with neural networks, Science (2006)

  • B. He, Image segmentation algorithm of lung cancer based on neural network model, Expert Syst. (2022)

  • Y. Wang et al., Architecture evolution of convolutional neural network using monarch butterfly optimization, J. Ambient Intell. Hum. Comput. (Mar. 2022)

  • G.-G. Wang et al., Self-adaptive extreme learning machine, Neural Comput. Appl. (Feb. 2016)

  • J. Long et al., Fully convolutional networks for semantic segmentation

  • L.-C. Chen et al., DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell. (2017)

  • G. Huang et al., Densely connected convolutional networks