Abstract
Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In recent years, convolutional neural networks (CNNs) have proven powerful in extracting the image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand the spatial correspondence between features, a task at which the self-attention mechanism excels. This paper develops a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block maps each feature in one volume to all features in the corresponding volume of the other modality. Our experimental results demonstrate that a CNN with the cross-modal attention block embedded outperforms an advanced CNN ten times its size. We also incorporate visualization techniques to improve the interpretability of our network. The source code of our work is available at https://github.com/DIAL-RPI/Attention-Reg.
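For concreteness, below is a minimal PyTorch sketch of what such a cross-modal attention block could look like: in the spirit of non-local blocks adapted to two modalities, each spatial feature of one volume attends to all features of the other. The class name `CrossModalAttention3D`, the channel count, and the 1×1×1 convolution projections are illustrative assumptions, not the authors' exact implementation; see the linked repository for that.

```python
import torch
import torch.nn as nn


class CrossModalAttention3D(nn.Module):
    """Minimal sketch of a cross-modal attention block (illustrative,
    not the authors' exact code).

    Each spatial location in the primary feature volume attends to
    every location in the feature volume of the other modality.
    """

    def __init__(self, channels: int = 32):
        super().__init__()
        # 1x1x1 convolutions project features into query/key/value spaces.
        self.theta = nn.Conv3d(channels, channels, kernel_size=1)  # queries (primary modality)
        self.phi = nn.Conv3d(channels, channels, kernel_size=1)    # keys (other modality)
        self.g = nn.Conv3d(channels, channels, kernel_size=1)      # values (other modality)
        self.out = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x_primary: torch.Tensor, x_cross: torch.Tensor) -> torch.Tensor:
        b, c, d, h, w = x_primary.shape
        q = self.theta(x_primary).flatten(2)   # (B, C, N), N = D*H*W
        k = self.phi(x_cross).flatten(2)       # (B, C, M)
        v = self.g(x_cross).flatten(2)         # (B, C, M)
        # Attention matrix: similarity of every primary voxel to every cross-modal voxel.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)    # (B, N, M)
        y = (attn @ v.transpose(1, 2)).transpose(1, 2)         # (B, C, N)
        y = y.reshape(b, c, d, h, w)
        # Residual connection keeps the original features alongside the attended ones.
        return self.out(y) + x_primary


# Usage example: MR and TRUS feature volumes of matching shape.
block = CrossModalAttention3D(channels=32)
mr_feat = torch.randn(1, 32, 8, 8, 8)
trus_feat = torch.randn(1, 32, 8, 8, 8)
fused = block(mr_feat, trus_feat)  # (1, 32, 8, 8, 8)
```

The softmax over the cross-modal axis realizes the mapping described above: each voxel of one volume is expressed as a weighted combination of all voxels of the other, which is the correspondence information a plain CNN struggles to capture.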
Cite this paper
Song, X., et al. (2021). Cross-Modal Attention for MRI and Ultrasound Volume Registration. In: de Bruijne, M., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2021. Lecture Notes in Computer Science, vol. 12904. Springer, Cham. https://doi.org/10.1007/978-3-030-87202-1_7