Dilated MultiResUNet: Dilated multiresidual blocks network based on U-Net for biomedical image segmentation
Introduction
Deep learning excels at image processing tasks, including image classification, object detection and image segmentation. Experiments on various datasets have shown that models such as GoogLeNet [1], AlexNet [2], VGG [3] and ResNet [4] deliver stable and outstanding performance. There are several reasons for the success of CNNs. First, CNNs have many convolutional layers, each of which consists of multiple convolutional kernels. The kernels scan the whole image from left to right and from top to bottom to produce feature maps. The early layers of the network capture local details of the image, while subsequent convolutional layers have larger receptive fields and can capture more complex and abstract features. Second, CNNs introduce nonlinear activation functions to enhance the expressive ability of the network [5,6]. Third, various optimization techniques, such as dropout [7], batch normalization [8], L1 and L2 regularization, Momentum [9] and Adam [10], have been applied to improve feature learning and avoid overfitting. Consequently, various CNN models have been applied to image recognition and segmentation with excellent performance. For example, Jonathan Long et al. [11] proposed Fully Convolutional Networks (FCN), which are regarded as the basic framework of semantic segmentation networks. FCN replaces the fully connected layers with convolutional layers and uses deconvolution to restore the low-resolution feature map to the size of the input for pixel-by-pixel classification. Vijay Badrinarayanan et al. [12] proposed SegNet, whose encoder uses pooling layers to gradually reduce the spatial dimension of the input data, while the decoder gradually recovers the details of the objects and the feature map size. Instead of using deconvolution to expand the feature map as in FCN, SegNet uses unpooling: it remembers the position of each maximum value during down-sampling and, during up-sampling, places the feature map values at those recorded positions. All other positions are filled with zeros.
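The index-based unpooling described for SegNet can be illustrated with a minimal pure-Python sketch. The function names `max_pool_with_indices` and `max_unpool` are hypothetical, the window is fixed at 2 × 2, and the feature map dimensions are assumed to be divisible by the window size; this is an illustration of the idea, not the SegNet implementation.

```python
def max_pool_with_indices(fmap, size=2):
    """Max pooling that also records the position of each window maximum."""
    rows, cols = len(fmap), len(fmap[0])
    pooled, indices = [], []
    for r in range(0, rows, size):
        pooled_row, index_row = [], []
        for c in range(0, cols, size):
            # Collect (value, position) pairs for the current window.
            window = [(fmap[r + dr][c + dc], (r + dr, c + dc))
                      for dr in range(size) for dc in range(size)]
            value, position = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(pooled, indices, out_shape):
    """Place each pooled value back at its recorded position; zeros elsewhere."""
    rows, cols = out_shape
    restored = [[0] * cols for _ in range(rows)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            restored[r][c] = value
    return restored


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled, indices = max_pool_with_indices(feature_map)
restored = max_unpool(pooled, indices, (4, 4))
print(pooled)    # [[4, 5], [6, 7]]
print(restored)  # [[0, 0, 0, 0], [4, 0, 0, 5], [0, 0, 7, 0], [6, 0, 0, 0]]
```

Because unpooling only copies values to stored positions, it has no weights to learn, unlike deconvolution.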
Compared to deconvolution, this index-based up-sampling can greatly speed up training. These two networks work well on natural scene images. However, medical images differ from natural scene images. The structure of objects in natural scenes is usually simpler than that in medical images, which exhibit shape polymorphism. In addition, medical images come in many types, e.g. CT, MRI, X-ray, ultrasound and pathological sections. Therefore, to adapt to the characteristics of medical images, U-Net [13], a symmetric encoder-decoder network, was proposed to improve on the segmentation performance of FCN on small medical image datasets. The decoding part of U-Net consists of up-sampling operations and convolutions; feature maps are copied from the contracting path and concatenated to the corresponding layers in the expansive path, which reduces the loss of position information caused by pooling layers. Although U-Net is a pioneer in biomedical image segmentation, it still has room for improvement. Inspired by the Residual Network, Ariel H. Curiale et al. [14] proposed a deep residual U-Net (ResUnet) for automatic myocardial segmentation in 2017. Nabil Ibtehaz et al. [15] proposed MultiResUNet, which uses the MultiRes block to replace the common convolutional operations of U-Net in order to handle multi-scale problems. Another improvement is the replacement of the straightforward concatenation with the Res path, which reduces the semantic gap between the encoder and the decoder. Md Zahangir Alom et al. [16] proposed R2U-Net, which applies residual structures multiple times to extract image features.
In this paper, by analyzing the above networks, we propose the Improved Multi Block, Res Block and Dilated Multi Block. To reduce the gridding effect caused by dilated convolution, we use the ResAcMulti Block and ResMulti Block in place of common convolution, which achieves better performance with fewer parameters. Meanwhile, we design two different Dilated MultiResUNet networks for different datasets. The results show that our models achieve better performance with fewer parameters than U-Net. Building on U-Net, we remove pooling layers and add dilated convolution layers to preserve the receptive field of deep features. In addition, we use the Improved Multi Block to alleviate the gridding effect resulting from dilated convolution. The Improved Multi Block serves as the basic unit of deep feature extraction, and the input layer is compressed using a 1 × 1 convolution to reduce the number of model parameters. We also add a DropBlock layer and a Group Normalization layer to improve the generalization performance of the model.
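The trade-off between receptive field and gridding can be made concrete with a small one-dimensional sketch. It assumes stride-1 convolutions with an odd kernel size; the helper names and the dilation rates (2, 2, 2) and (1, 2, 5) are illustrative choices, not the rates used in the proposed blocks. Mixing rates is shown only as one generic way to close the holes, not as the design of the Improved Multi Block.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (along one axis) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def touched_offsets(kernel_size, dilations):
    """Input offsets that can influence one output position (odd kernel size assumed)."""
    half = kernel_size // 2
    offsets = {0}
    for d in dilations:
        # Each layer shifts every reachable offset by its dilated kernel taps.
        offsets = {o + tap * d for o in offsets for tap in range(-half, half + 1)}
    return sorted(offsets)


for rates in ([2, 2, 2], [1, 2, 5]):
    rf = receptive_field(3, rates)
    used = touched_offsets(3, rates)
    print(rates, "receptive field:", rf, "inputs actually used:", len(used))
# [2, 2, 2] receptive field: 13 inputs actually used: 7
# [1, 2, 5] receptive field: 17 inputs actually used: 17
```

With a constant rate of 2, only every second input inside the 13-wide window contributes to the output, which is the gridding effect; the mixed rates cover every input in their window.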
The contributions of this paper can be summarized as follows.
1. Several new convolution units are proposed that can replace common convolutional layers to reduce the number of parameters.
2. The new units reduce the loss of position information caused by pooling layers by enhancing the resolution of feature maps in deep layers while preserving the receptive field of neurons.
3. Adding Group Normalization and DropBlock to the network enables it to perform well on small datasets. In addition, the experimental results show that our approach has better generalization performance.
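As an illustration of the Group Normalization mentioned in the third contribution, the following is a minimal sketch for a single sample, assuming each channel is given as a flat list of activations. The function name `group_norm` is hypothetical and the learned scale and shift parameters are omitted. The statistics are computed per sample and per group of channels, so they do not depend on the batch size, which is generally the reason the technique is attractive when batches are small.

```python
import math


def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample: `channels` is a list of per-channel value lists."""
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    normalized = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        # Mean and variance are shared by all channels in the group.
        values = [v for channel in group for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        normalized.extend([[(v - mean) * scale for v in channel] for channel in group])
    return normalized


# Four channels with two activations each, split into two groups.
sample = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
out = group_norm(sample, num_groups=2)
```

After normalization, each group has approximately zero mean and unit variance regardless of the original scale of its channels.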
Data augmentation
We evaluate our models on the ISBI2015 Cell Tracking Dataset [17], the INBreast Dataset [18], a blood vessel dataset [19] and a clinical psoriasis dataset. The ISBI2015 Cell Tracking Dataset has 30 labeled microscopic images with a resolution of 512 × 512 × 1. We use 24 images for data augmentation and training and the rest for testing. The INBreast dataset includes 58 training images and 58 test images with a resolution of 40 × 40 × 1. The dataset has also been preprocessed by ROI extraction and image scaling
Experimental results
The experiments are conducted on Google Colab with an NVIDIA Tesla T4 GPU, which has 2560 CUDA cores, 320 Turing tensor cores and 16 GB of video memory, and supports multi-precision inference.
Discussion
To evaluate the proposed models comprehensively, we perform cross-validation on the cell edge dataset and the blood vessel dataset. Each dataset is divided into 5 parts, and each part serves as the validation set in turn. In the cell edge experiment above, 30 training images are expanded to 120 images using data augmentation, 6 of which are used for validation. In 5-fold cross-validation, however, each group of 24 images from the augmented dataset serves as the validation set in turn. We calculate the average of 5-fold
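The 5-fold protocol on the 120 augmented cell edge images can be sketched as follows. The contiguous, unshuffled split and the `evaluate` callback (a hypothetical function that trains on one index list and returns a score on the other) are assumptions made for illustration; how the images were actually assigned to folds is not stated here.

```python
def k_fold_indices(num_samples, k):
    """Split indices 0..num_samples-1 into k contiguous (train, validation) folds."""
    base, extra = divmod(num_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        validation = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, num_samples))
        folds.append((train, validation))
        start += size
    return folds


def cross_validate(num_samples, k, evaluate):
    """Average the per-fold score returned by the hypothetical `evaluate` callback."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(num_samples, k)]
    return sum(scores) / len(scores)


folds = k_fold_indices(120, 5)
fold_sizes = [(len(train), len(validation)) for train, validation in folds]
print(fold_sizes)  # [(96, 24), (96, 24), (96, 24), (96, 24), (96, 24)]
```

Each of the five folds holds out 24 images for validation and trains on the remaining 96, and every image is used for validation exactly once.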
Conclusion
In this paper, we proposed an improved end-to-end segmentation network that can prevent overfitting and improve segmentation performance on biomedical images. Compared to the U-Net and ResU-net models, the proposed Dilated MultiResUNet model has better segmentation performance on the four datasets. In this study, we only use two kinds of open biomedical datasets to validate the Dilated MultiResUNet model. In our future research, we will test the proposed model on more clinical medical images
CRediT authorship contribution statement
Jingdong Yang: Supervision, Writing - review & editing. Jintu Zhu: Data curation, Writing - original draft, Software, Validation. Hailing Wang: Visualization, Investigation. Xin Yang: Conceptualization, Methodology, Software.
Acknowledgements
We greatly appreciate the helpful comments of the anonymous reviewers, which have significantly improved the style and content of this paper. This research is funded by the National Natural Science Foundation of China (81973749, 8187040043).
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (31)
- On the momentum term in gradient descent learning algorithms. Neural Netw. (1999)
- et al. MultiResUNet: Rethinking the U-Net architecture for multimodal biomedical image segmentation. Neural Netw. (2020)
- et al. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015)
- et al. Imagenet classification with deep convolutional neural networks. Adv. Neural Inform. Proces. Syst. (2012)
- et al. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
- et al. Deep residual learning for image recognition. 2015. arXiv preprint arXiv:1512.03385 (2016)
- et al. Medical Image Segmentation Using Deep Learning: A Survey. arXiv preprint arXiv:2009.13120 (2020)
- et al. A survey of the recent architectures of deep convolutional neural networks. Artif. Intell. Rev. (2020)
- et al. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. (2014)
- et al. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)