Dilated MultiResUNet: Dilated multiresidual blocks network based on U-Net for biomedical image segmentation

https://doi.org/10.1016/j.bspc.2021.102643

Highlights

  • MultiRes Block, Res Block, and Dilated Multi Block replace the standard convolution, which reduces the loss of position information caused by pooling layers by raising the resolution of feature maps while keeping the original receptive field.

  • Two special Dilated Multi-Residual U-Nets are applied to four datasets to extract different-sized features.

  • Group Normalization and Drop Block are applied to improve the generalization performance of the network.

Abstract

U-Net is a classical and highly efficient network that achieves better performance than other end-to-end networks in biomedical image segmentation with fewer training images. However, feature details are often lost in the pooling layers, and the contours of small objects cannot be fully reconstructed by the up-sampling layers of U-Net. To reduce the loss of important features, we propose Dilated MultiResUNet, a network that improves the performance of end-to-end image segmentation and builds on U-Net, Res2Net, MultiResUNet, Dilated Residual Networks, and Squeeze-and-Excitation Networks. We use the Improved Multi Block, Res Block, and Dilated Multi Block in place of the standard convolution operation. In addition, we design two Dilated MultiResUNet variants with different combinations of blocks to improve the up-sampling and concatenation operations of U-Net. We evaluate the proposed models with typical evaluation metrics on four biomedical datasets: ISBI2015, INBreast, a blood vessel dataset, and a psoriasis dataset. The experimental results show that the two Dilated MultiResUNet networks achieve superior accuracy and better generalization performance with only 59.7 % and 63.3 % of the parameters of the U-Net model, respectively.

Introduction

Deep learning excels at image processing tasks, including image classification, object detection, and image segmentation. Experiments on various datasets have shown that models such as GoogLeNet [1], AlexNet [2], VGG [3], and ResNet [4] perform consistently well. There are several reasons for the success of CNNs. First, CNNs have many convolutional layers, each of which consists of multiple convolutional kernels. The kernels scan the whole image from left to right and from top to bottom to produce feature maps. The early layers of the network capture local details of the image, while the subsequent convolutional layers have larger receptive fields and capture more complex and abstract features. Second, CNNs introduce nonlinear activation functions to enhance the expressive power of the network [5,6]. Third, various optimization techniques, such as dropout [7], batch normalization [8], L1 and L2 regularization, Momentum [9], and Adam [10], have been applied to improve feature learning and avoid overfitting.

Various CNN models were then applied to image recognition and segmentation and showed excellent performance. For example, Jonathan Long et al. [11] proposed Fully Convolutional Networks (FCN), which are regarded as the basic framework of semantic segmentation networks. FCN replaces the fully connected layers with convolutional layers and uses deconvolution to restore the low-resolution feature map to the size of the input for pixel-by-pixel classification. Vijay Badrinarayanan et al. [12] proposed SegNet, whose encoder uses pooling layers to gradually reduce the spatial dimension of the input data, while the decoder gradually recovers the details of the objects and the feature map size. Unlike FCN, which uses deconvolution to expand the feature map, SegNet uses unpooling: it remembers the positions of the maximum values during down-sampling and writes the feature map values back to those positions during up-sampling. Other positions are filled with zeros. Compared to deconvolution, this up-sampling can greatly speed up training. These two networks work well on natural scene images.

However, medical images are different from natural scene images. Usually, the structure of objects in natural scenes is simpler than that in medical images, which are characterized by shape polymorphism. In addition, medical images come in multiple types, e.g. CT, MRI, X-ray, ultrasound, and pathological sections. Therefore, to adapt to the characteristics of medical images, U-Net [13], a symmetric encoder-decoder network, was proposed to improve the segmentation performance of FCN on small medical image datasets. The decoding part of U-Net consists of up-sampling operations and convolutions. It copies feature maps from the contracting path and concatenates them to the corresponding layers in the expansive path, which reduces the loss of position information caused by pooling layers. Although it is the pioneer in biomedical image segmentation, U-Net still has room for improvement. Inspired by the Residual Network, Ariel H. Curiale et al. [14] proposed a deep residual U-Net (ResUnet) for automatic myocardial segmentation in 2017. Nabil Ibtehaz et al. [15] proposed MultiResUNet, which uses the MultiRes block to replace the standard convolution operation of U-Net and so handle multi-scale problems. Another improvement in MultiResUNet is the replacement of straightforward concatenation with the Res path, which reduces the semantic gap between the encoding and decoding parts. Md Zahangir Alom et al. [16] proposed R2U-Net, which applies the residual structure multiple times to extract image features.
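
The index-based up-sampling attributed to SegNet earlier in this introduction can be illustrated with a minimal pure-Python sketch. It is not taken from the paper or from the SegNet code; the function names `max_pool_with_indices` and `max_unpool`, the 2 × 2 window, and the list-of-lists data layout are assumptions chosen for illustration.

```python
from typing import List, Tuple

Grid = List[List[float]]
IndexGrid = List[List[Tuple[int, int]]]


def max_pool_with_indices(x: Grid, k: int = 2) -> Tuple[Grid, IndexGrid]:
    """Non-overlapping k x k max pooling that also records where each maximum was."""
    h, w = len(x), len(x[0])
    pooled: Grid = []
    indices: IndexGrid = []
    for i in range(0, h - h % k, k):
        row_vals, row_idx = [], []
        for j in range(0, w - w % k, k):
            # Scan the window and keep the largest value together with its position.
            value, pos = max(
                ((x[i + di][j + dj], (i + di, j + dj))
                 for di in range(k) for dj in range(k)),
                key=lambda t: t[0],
            )
            row_vals.append(value)
            row_idx.append(pos)
        pooled.append(row_vals)
        indices.append(row_idx)
    return pooled, indices


def max_unpool(pooled: Grid, indices: IndexGrid, out_shape: Tuple[int, int]) -> Grid:
    """Write each pooled value back to its remembered position; all other cells stay zero."""
    h, w = out_shape
    out = [[0.0] * w for _ in range(h)]
    for row_vals, row_idx in zip(pooled, indices):
        for value, (i, j) in zip(row_vals, row_idx):
            out[i][j] = value
    return out


x = [[1.0, 2.0, 0.0, 3.0],
     [5.0, 4.0, 1.0, 2.0],
     [0.0, 1.0, 5.0, 6.0],
     [2.0, 0.0, 7.0, 9.0]]
pooled, idx = max_pool_with_indices(x)
print(pooled)                           # [[5.0, 3.0], [2.0, 9.0]]
print(max_unpool(pooled, idx, (4, 4)))  # maxima restored in place, zeros elsewhere
```

Unlike deconvolution, this step has no weights to learn, which is consistent with the claim that it speeds up training.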

In this paper, by analyzing the networks above, we propose the Improved Multi Block, Res Block, and Dilated Multi Block. To reduce the gridding effect caused by dilated convolution, we use the ResAcMulti Block and ResMulti Block in place of the standard convolution, which achieves better performance with fewer parameters. We also design two different Dilated MultiResUNet networks for various datasets. The results show that our models achieve better performance with fewer parameters than U-Net. Starting from the U-Net architecture, we remove pooling layers and add dilated convolution layers to preserve the receptive field of deep features, and we use the Improved Multi Block to alleviate the gridding effect that results from dilated convolution. The Improved Multi Block serves as the basic unit of deep feature extraction, and the input layer is compressed using a 1 × 1 convolution to reduce the number of model parameters. We also add a Drop Block layer and a Group Normalization layer to improve the generalization performance of the model.
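
The trade-off between pooling and dilation can be made concrete with the standard receptive-field recurrence. The sketch below is only an illustration under assumed layer stacks; the two stacks are hypothetical and are not the configurations used in the paper. It compares a convolution, pooling, convolution sequence with a pooling-free sequence whose second convolution is dilated.

```python
from typing import Iterable, Tuple

Layer = Tuple[int, int, int]  # (kernel_size, stride, dilation)


def receptive_field(layers: Iterable[Layer]) -> Tuple[int, int]:
    """Return (receptive field, cumulative stride) along one spatial axis."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel spans (kernel - 1) * dilation + 1 positions of its input,
        # and each of those positions is `jump` pixels apart in the original image.
        rf += (kernel - 1) * dilation * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks, chosen only to illustrate the idea.
with_pooling = [(3, 1, 1), (2, 2, 1), (3, 1, 1)]  # conv 3x3, max-pool 2x2, conv 3x3
with_dilation = [(3, 1, 1), (3, 1, 2)]            # conv 3x3, conv 3x3 with dilation 2

print(receptive_field(with_pooling))   # (8, 2)
print(receptive_field(with_dilation))  # (7, 1)
```

In this toy case the dilated stack reaches a comparable receptive field (7 versus 8 pixels) while its cumulative stride stays at 1, so the feature map keeps the input resolution instead of being halved.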

The contributions of this paper can be summarized as follows.

  1. Several new convolution units are proposed that can replace the standard convolutional layer to reduce the number of parameters.

  2. The new units reduce the loss of position information caused by pooling layers by enhancing the resolution of feature maps in deep layers while preserving the receptive field of neurons.

  3. Adding Group Normalization and Drop Block to the network enables it to perform well on small datasets. The experimental results also show that our approach has better generalization performance.
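
For readers unfamiliar with Group Normalization, the following pure-Python sketch shows the core computation for a single sample. It is an assumption-laden illustration, not the authors' implementation: the function name `group_norm` and the data layout (a list of channels, each a flat list of pixel values) are hypothetical, and the learnable per-channel scale and shift are omitted.

```python
import math
from typing import List


def group_norm(x: List[List[float]], num_groups: int, eps: float = 1e-5) -> List[List[float]]:
    """Normalize one sample given as C channels, each a flat list of H*W values."""
    channels = len(x)
    if channels % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    per_group = channels // num_groups
    out: List[List[float]] = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        values = [v for channel in group for v in channel]
        # Statistics are computed over all channels and pixels of this group only.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in channel] for channel in group])
    return out


sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]  # 4 channels, 2 pixels each
normalized = group_norm(sample, num_groups=2)
print(normalized[0] + normalized[1])  # first group: zero mean, roughly unit variance
```

Because the mean and variance are taken within each sample, the result does not depend on the batch size, unlike batch normalization.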

Section snippets

Data augmentation

We evaluate our models on the ISBI2015 Cell Tracking Dataset [17], the INBreast Dataset [18], a blood vessel dataset [19], and a clinical psoriasis dataset. The ISBI2015 Cell Tracking Dataset has 30 labeled microscopic images with a resolution of 512 × 512 × 1. We use 24 images for data augmentation and training and the rest for testing. The INBreast dataset includes 58 training images and 58 test images with a resolution of 40 × 40 × 1. The dataset has also been preprocessed by ROI extraction and image scaling

Experimental results

The experiments are conducted in Google Colab on an NVIDIA Tesla T4 GPU, which has 2560 CUDA cores, 320 Turing tensor cores, and 16 GB of video memory, and supports multi-precision inference.

Discussion

To evaluate the proposed models comprehensively, we perform cross-validation on the cell edge dataset and the blood vessel dataset. Each dataset is divided into 5 parts, and each part serves as the validation set in turn. In the cell edge experiment above, 30 training images are expanded to 120 images using data augmentation, 6 of which are used for validation. In 5-fold cross-validation, however, each group of 24 images from the augmented dataset serves as the validation set in turn. We calculate the average of 5-fold

Conclusion

In this paper, we proposed an improved end-to-end segmentation network that can prevent overfitting and improve segmentation performance on biomedical images. Compared to the U-Net and ResUnet models, the proposed Dilated MultiResUNet model has better segmentation performance on the four datasets. In this study, we only use two kinds of open biomedical datasets to validate the Dilated MultiResUNet model. In our future research, we will test the proposed model on more clinical medical images

CRediT authorship contribution statement

Jingdong Yang: Supervision, Writing - review & editing. Jintu Zhu: Data curation, Writing - original draft, Software, Validation. Hailing Wang: Visualization, Investigation. Xin Yang: Conceptualization, Methodology, Software.

Acknowledgements

We greatly appreciate the helpful comments of the anonymous reviewers, which have significantly improved the style and content of this paper. This research is funded by the National Natural Science Foundation of China (81973749, 8187040043).

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (31)

  • N. Qian, On the momentum term in gradient descent learning algorithms, Neural Netw. (1999)

  • N. Ibtehaz et al., MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation, Neural Netw. (2020)

  • C. Szegedy et al., Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  • A. Krizhevsky et al., ImageNet classification with deep convolutional neural networks, Adv. Neural Inform. Proces. Syst. (2012)

  • K. Simonyan et al., Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014)

  • K. He et al., Deep residual learning for image recognition, arXiv preprint arXiv:1512.03385 (2016)

  • T. Lei et al., Medical Image Segmentation Using Deep Learning: A Survey, arXiv preprint arXiv:2009.13120 (2020)

  • A. Khan et al., A survey of the recent architectures of deep convolutional neural networks, Artif. Intell. Rev. (2020)

  • N. Srivastava et al., Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. (2014)

  • S. Ioffe et al., Batch normalization: Accelerating deep network training by reducing internal covariate shift, arXiv preprint arXiv:1502.03167 (2015)

  • D.P. Kingma et al., Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014)

  • J. Long et al., Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)

  • V. Badrinarayanan et al., SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2017)

  • O. Ronneberger et al., U-Net: convolutional networks for biomedical image segmentation

  • A.H. Curiale et al., Automatic myocardial segmentation by using a deep learning network in cardiac MRI
