Abstract
Deformable image registration is one of the challenging inverse problems in medical images due to inevitable geometric distortions during the imaging. Conventional convolutional neural networks (CNNs) are limited to the fixed weights, and require substantial examples for supervised training. In this paper, we employ Bayesian deep learning in order to register the medical images that are corrupted by nonlinear geometric distortions. Given a pair of reference and moving image, our objective is to register the moving image onto the reference image coordinates while maintaining the pixel distribution of moving image. To this end, we propose a novel Bayesian deep architecture that contains 5 downsampling and 4 upsampling layers with kernel size of 3 \(\times \) 3 interleaved by pooling layer with tanh and rectified linear units (ReLU) as activation functions. The architecture estimates the displacement vector field (DVF) of the moving image. Finally, it undergoes the grid resampling to warp the moving image using estimated DVF. We demonstrate that the proposed Bayesian deep architecture learns the probability distribution over the weights and produce accurate registration. Experiments are performed on the publicly available Cardiac MRI and Lungs CT scan datasets. The proposed model achieves better performance when compared to state-of-the-art approaches.
You have full access to this open access chapter, Download conference paper PDF
1 Introduction
Image registration is the process of aligning two or more images of the same scene with respect to each other where the images are captured by different sensors, at different times and/or from different viewpoints [1]. It is an essential task for automated medical image analysis since the images are subject to unpredictable geometric distortions during the acquisition. Typically, an image is considered as the reference image and images which are to be aligned with respect to the reference image are referred to as the moving images. Image registration is broadly classified as: (i) rigid or parametric registration, and (ii) non-rigid or deformable registration [1]. In rigid registration, the algorithms typically depend on a small set of parameters, i.e., few matching points between moving and reference image, from affine group of transformations. The goal of such methods is to obtain optimal parameters describing the geometric mapping between the reference and moving image by maximizing a similarity metric. On the other hand, deformable image registration algorithms aim to estimate the underlying displacement vector field (DVF) considering the distortions across all the pixel locations in the moving image [2].
Recently, deep learning based registration techniques have found performing faster and are robust when compared to the classical approaches. Studies where deep learning based methods are used for image registration include convolutional stacked autoencoder [3] which is applied to LONI and ADNI brain MRI datasets. A Multi-resolution image registration framework which learns spatial transformation at different resolution in a self supervised fully convolutional network (FCN) is proposed in [4]. In [5], spatial transformer network to estimate the transformation parameters to align the image is proposed. In [6], FCN based techniques are proposed as Deformable Image Registration network (DIRnet). Image registration is also attempted by deep generative model in [7]. More recently, Voxelmorph is proposed where the network is trained with Brain MRI dataset, while incorporating auxiliary information such as segmentation in order to ease the registration task [8]. Note that, all this conventional deep learning models require substantial amount of data examples for the supervised training and may also prone to underfitting/overfitting. Moreover, once such CNN is trained, it relies on its fixed set of weights to predict the output of the test image. However, there is high chance of underfitting and uncertainity involved in medical data due to scarcity of data examples and unpredictable geometric distortions, respectively. To overcome these issues, in this paper, we propose a novel Bayesian deep learning architecture for achieving deformable medical image registration. Unlike existing approaches, our architecture learns the posterior distribution over the weights with limited examples by the backpropagation. The network then learns the DVF by analyzing reference and moving image pair. In turn the DVF together with moving image is used to perform grid resampling in order to register it with the reference coordinates.
2 Proposed Architecture
In this section, we discuss the proposed Bayesian deep architecture to perform deformable image registration of medical data. Given a pair of reference and moving image our objective is to register the moving image onto the reference image coordinates while maintaining the pixel distribution of the moving image.
In the conventional convolutional neural network (CNN) set-up, given a set of training examples \(\mathbf D =(\mathbf x _{i},\mathbf y _{i})_{i} \) where \(\mathbf x _{i}\) and \(\mathbf y _{i}\) are the \(i^{th}\) input and output respectively, the goal is to learn weights w that shall predict the output \(\mathbf y _\mathbf{test }\) corresponding to test input \(\mathbf x _\mathbf{test }\). The training is typically done by the backpropagation, where one assume that \(\log P(\mathbf D |\mathbf w )\) is differentiable in w. Note that, at the end of training, the estimated weights w are fixed real numbers. Hence, the network has to rely on these hard (fixed) weights (numbers) to predict the output which may lead to severe error especially in medical data.
Unlike the conventional CNNs, Bayesian neural networks estimate the posterior distribution of the weights given the training data \(P(\mathbf w |D)\) by backpropagation [9]. The predictive distribution of output \(P(\mathbf y _\mathbf{test })\) of a test data input item \(\mathbf x _\mathbf{test }\) is then given by
where \({{\,\mathrm{\mathbb {E}}\,}}_{P(\mathbf w |\mathbf D )}\) is conditional expectation over posterior distribution of weights given the training data D. Thus taking the conditional expectation under the posterior distribution on weights (Eq. (1)) is equivalent to using a large number of neural networks for learning the weights. We propose one such deep Bayesian architecture Fig. 1 for achieving deformable medical image registration.
As shown in Fig. 1 a pair of reference and moving image of size \(150 \times 150\) pixels concatenated into 2-channel 2D image \((150 \times 150 \times 2)\) is then passed through downsampling and upsampling layers. It consists of 5 downsampling and 4 upsampling layers. Each downsampling layer consists of two convolutional layers with filters of size \(3\times 3\) where stride is 1 and pooling layer with a stride of 2. Whereas each upsampling layer consists of bilinear interpolation with scale size of 2 followed by two convolutional layers with filters of size \(3 \times 3\) where stride is 1. In order to incorporate non-linearity, we have used rectified linear units (ReLU) and tanh as activation functions. As shown in Fig. 1, the output of the architecture is the DVF for the moving images. It estimates the geometric displacement required by each pixel coordinate of moving image in order to get it registered with respect to the reference image. The estimated DVF along with grid resampling and bilinear interpolation of moving image gives us the final registered image as shown in Fig. 1.
2.1 Training/Test
The supervised training of proposed approach (Fig. 1) is done as follows: we use a differentiable loss function L that is a combination of structural similarity index (SSIM) [10] and mean-squared error (MSE) [11],
where \(\mathbf I _\mathbf{ref }\) is reference and \(\mathbf I _\mathbf{reg } \) is registered image, i.e., (\(\mathbf I _\mathbf{mov } \circ \varvec{\psi }\) ) where \(\varvec{\psi }\) is DVF, and
where \(\mathbf I _\mathbf{ref }(x, y)\) and \(\mathbf I _\mathbf{reg }(x, y)\) are reference and registered image coordinates, \(\mu _{I_{ref}}\) and \(\mu _{I_{reg}}\) are means and \(\sigma _{I_{ref}}^{2}\) and \(\sigma _{I_{reg}}^{2}\) are variances of reference and registered image, respectively. Here \(C_{1}\) and \(C_{2}\) are constants, m, n are total number of pixels for images. Given the differentiable loss function (2), the weights w = \([\text {w}_{1}, \text {w}_{2},....,\text {w}_{n}] \) and bias \(\mathbf b =[\text {b}_{1}, \text {b}_{2},....,\text {b}_{m}] \) where \(\text {w}_{i}\sim \mathcal {N}(\mu _{w},\sigma _{w})\), \(\text {b}_{i}\sim \mathcal {N}(\mu _{b},\sigma _{b})\) [12] of each layers are updated using backpropagation. We have employed batch normalization after every convolution layer, learning rate of 0.001, and batch size of 1 throughout training process.
3 Experiments and Results
In this section, we discuss the results obtained by applying the proposed Bayesian deep learning approach for the deformable medical image registration and compared with recent state-of-the-art approaches. In our experiment, we have used publicly available lungs computerized tomography (CT) scan data from the deformable image registration laboratory [13] as well as cardiac magnetic resonance imaging (MRI) data [14]. We implement the algorithm in Python and pytorch 1.0 on a machine with Intel core i5, Nvidia Geforce GTX 1050 4GB with 24 GB RAM.
3.1 Results on Lung CT Dataset
In this subsection, we show results of our approach on the lung CT images [13] and compare it with state-of-the-art approaches. The dataset consists of Thoracic 4DCT images as well as inspiratory and exipratory breath-hold CT image pairs. Entire dataset is arranged in 10 different cases. The images are of size \(512 \times 512 \times 128\) voxels with a dimension of each voxel \(0.97 \times 0.97 \times 2.5\) mm. In this experiment we have used 15 slices out of them, and resized it to 150 \(\times \) 150 pixels i.e, it yields \(150 \times 150 \times 15\). In order to generate moving images with different degrees of geometric distortions, we applied deformable or elastic distortion of order 0.8. We have generated 1000 distorted images for a single slice, giving a total of 15000 image pairs where the original slice (image) is considered as the reference and the corresponding distorted images are treated as its moving images. For the validation purpose, we split the dataset into training and test of 5000 and 3000 pairs, respectively. We first show the visual results of few challenging cases in Fig. 2. We now compare the reference image with estimated registered images using our approach as well as state-of-the-art approaches with different similarity metrics i.e, normalized cross correlation, structural similarity index (SSIM) [10], Hausdorff distance (HD) [15], root mean-squared error (RMSE) [11] for the test data of 3000 image pairs. We first show visual comparisons in Fig. 3. The calculated average values are as listed in the Table 1 along with results using state-of-the-art registration frameworks such as selfsupervised FCN [4], DIRnet [6], and Voxelmorph [8]. Note that, \(\epsilon ^{100}\) and \(\epsilon ^{1000}\) are the means of 100 and 1000 samples from the posterior distribution. Since we are taking the mean of samples, the proposed approach with \(\epsilon ^{100}\) and \(\epsilon ^{1000}\) have shown better results when compared to existing state-of-the-art approaches.
3.2 Results on Cardiac Dataset
In this subsection, we discuss results of our approach on the cardiac MRI images [14] and compare it with state-of-the-art approaches. The dataset consists of short axis cardiac MR images and the ground truth of their left ventricles endocardial and epicardial segmentations. The cardiac magnetic resonance images are acquired from 33 subjects. Each subject’s sequence consist of 20 time points and 8–15 slices along the long axis.
We considered slices of \(256 \times 256\) voxels of one subject and resized it to \(150 \times 150\) pixels. In order to generate moving images with different degrees of geometric distortions, we applied deformable or elastic distortion of order 0.8. We have generated 1000 distorted images for a single slice giving a total of 10000 image pairs. For the validation purpose, we split the dataset into training and test sets of size 5000 and 3000 image pairs, respectively. We first show the visual results of few challenging cases as shown in Fig. 4 and then compare the reference image with estimated registered images using our approach as well as recent state-of-the-art approaches for the test data of 3000 image pairs. We show visual comparison results in Fig. 5 while calculated average values of similarity metrics are as listed in the Table 2.
4 Conclusion
We propose Bayesian deep learning approach for deformable medical image registration. Unlike existing deep learning frameworks, the proposed approach learns posterior distribution over the weights by backpropagation. Experimental results demonstrate the efficacy of proposed architecture when compared to state-of-the-art approaches.
References
Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)
Jahne, B.: Displacement vector fields. In: Jahne, B. (ed.) Digital Image Processing, pp. 297–317. Springer, Heidelberg (1991). https://doi.org/10.1007/978-3-662-11565-7_16
Wu, G., Kim, M., Wang, Q., Munsell, B.C., Shen, D.: Scalable high-performance image registration framework by unsupervised deep feature representations learning. IEEE Trans. Biomed. Eng. 63(7), 1505–1516 (2016)
Li, H., Fan, Y.: Non-rigid image registration using self-supervised fully convolutional networks without training data. In: IEEE International Symposium on Biomedical Imaging (ISBI), pp. 1075–1078 (2018)
Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K., : Spatial transformer networks. In: Proceedings of the 28th Neural Information Processing Systems (NIPS), pp. 2017–2025. MIT Press (2015)
de Vos, B.D., et al.: End-to-end unsupervised deformable image registration with a convolutional neural network. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 204–212. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_24
Mahapatra, D., et al.,: Deformable medical image registration using generative adversarial networks. In: IEEE Conference on International Symposium on Biomedical Imaging (ISBI), pp. 1449–1453 (2018)
Balakrishnan, G., et al.,: An unsupervised learning model for deformable medical image registration. In: IEEE Conference Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, pp. 8340–8348 (2018)
Blundell, C., et al.: Weight uncertainty in neural networks. In: Proceedings of the 32nd International Conference on Machine Learning, vol. 37, pp. 1613–1622. JMLR.org, France (2015)
Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment. IEEE Trans. Image Process. 13(4), 600–612 (2004)
Wang, Z., Bovik, A.C.: Mean squared error: love it or leave it? A new look at signal fidelity measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
Rasmussen, C.E., Christopher, K.I.: Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press, Cambridge (2005)
Castillo, R., et al.: A framework for evaluation of deformable image registration spatial accuracy using large landmark point sets. Phys. Med. Biol. 54(7), 1849–1870 (2009)
Andreopoulos, A., Tsotsos, J.K.: Efficient and generalizable statistical models of shape and appearance for analysis of cardiac MRI. Med. Image Anal. 12(3), 335–357 (2008)
Huttenlocher, D.P., Klanderman, G.A., Rucklidge, W.J.: Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15(9), 850–863 (1993)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Deshpande, V.S., Bhatt, J.S. (2019). Bayesian Deep Learning for Deformable Medical Image Registration. In: Deka, B., Maji, P., Mitra, S., Bhattacharyya, D., Bora, P., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2019. Lecture Notes in Computer Science(), vol 11942. Springer, Cham. https://doi.org/10.1007/978-3-030-34872-4_5
Download citation
DOI: https://doi.org/10.1007/978-3-030-34872-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-34871-7
Online ISBN: 978-3-030-34872-4
eBook Packages: Computer ScienceComputer Science (R0)