Abstract
Medical image registration is a fundamental task in computer-aided medical diagnosis. Recently, researchers have begun to apply deep learning methods based on convolutional neural networks (CNNs) to registration and have achieved remarkable results. Although CNN-based methods provide rich local information, their weak global modeling ability limits long-range information interaction and therefore restricts registration performance. The Transformer was originally designed for sequence-to-sequence prediction and now also achieves strong results in various visual tasks thanks to its global modeling capability. Compared with a CNN, the Transformer provides rich global information but lacks local information. To address this limitation, we propose a hybrid network, similar to U-Net, that combines the Transformer and CNN to extract global and local information at each level. Specifically, a CNN is first used to obtain feature maps of the image, and the Transformer is used as an encoder to extract global information. The outputs of the Transformer encoder are then connected to the upsampling path, which uses a CNN to integrate local and global information. Finally, the resolution is restored to that of the input image, and the displacement field is obtained after several convolution layers. We evaluate our method on brain MRI scans. Experimental results demonstrate that our method improves accuracy by 1% compared with state-of-the-art approaches.
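The abstract states that the network outputs a displacement field but does not say how that field is applied. In unsupervised registration the field is commonly used to resample the moving image so that it can be compared with the fixed image. The following is a minimal sketch of that resampling step, not the authors' implementation: it works on a 2D image instead of a 3D MRI volume, uses bilinear interpolation, and clamps sampling locations to the image border. The function name `warp_bilinear`, its arguments, and the toy image are all hypothetical.

```python
import math


def warp_bilinear(moving, disp_rows, disp_cols):
    """Warp a 2D image with a dense displacement field (illustrative only).

    moving:    H x W nested list of floats (the moving image)
    disp_rows: H x W displacement of each output pixel along the row axis
    disp_cols: H x W displacement of each output pixel along the column axis
    Output pixel (r, c) samples `moving` at
    (r + disp_rows[r][c], c + disp_cols[r][c]).
    """
    h, w = len(moving), len(moving[0])
    warped = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Sampling location, clamped to the image bounds (an assumption;
            # other border policies such as zero padding are equally valid).
            y = min(max(r + disp_rows[r][c], 0.0), h - 1.0)
            x = min(max(c + disp_cols[r][c], 0.0), w - 1.0)
            # The four neighbouring grid points around (y, x).
            y0, x0 = int(math.floor(y)), int(math.floor(x))
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            wy, wx = y - y0, x - x0
            # Interpolate along columns first, then along rows.
            top = (1 - wx) * moving[y0][x0] + wx * moving[y0][x1]
            bottom = (1 - wx) * moving[y1][x0] + wx * moving[y1][x1]
            warped[r][c] = (1 - wy) * top + wy * bottom
    return warped


# Toy 3 x 3 image and two hypothetical displacement fields.
moving = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0],
          [6.0, 7.0, 8.0]]
zeros = [[0.0] * 3 for _ in range(3)]
half = [[0.5] * 3 for _ in range(3)]

identity = warp_bilinear(moving, zeros, zeros)  # zero field: image unchanged
shifted = warp_bilinear(moving, zeros, half)    # sample half a pixel to the right
```

A zero field returns the image unchanged, and a constant half-pixel column displacement averages horizontally adjacent pixels. Because the interpolation is differentiable with respect to the displacements, a similarity loss computed on the warped image can in principle be used to train the network that predicts the field without ground-truth deformations.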
Availability of data and material
The data set is public and can be obtained from https://www.oasis-brains.org/. We preprocessed the data.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [grant numbers 61772226, 61862056]; the Science and Technology Development Program of Jilin Province [grant number 20210204133YY]; the Natural Science Foundation of Jilin Province [grant number 20200201159JC]; and the Key Laboratory for Symbol Computation and Knowledge Engineering of the National Education Ministry of China, Jilin University.
Ethics declarations
Conflict of Interests
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Lei Song, Guixia Liu and Mingrui Ma contributed equally to this work.
Cite this article
Song, L., Liu, G. & Ma, M. TD-Net: unsupervised medical image registration network based on Transformer and CNN. Appl Intell 52, 18201–18209 (2022). https://doi.org/10.1007/s10489-022-03472-w
DOI: https://doi.org/10.1007/s10489-022-03472-w