Abstract
Transformers have recently shown promise for medical imaging applications, leading to increasing interest in developing such models for medical image registration. Recent work on registration Transformers has focused on using cross-attention (CA) to enable a more precise understanding of the spatial correspondences between the moving and fixed images. Here, we propose a novel CA mechanism that computes windowed attention using deformable windows. In contrast to existing CA mechanisms, which incur high computational cost by computing CA either globally or locally within a fixed, expanded search window, the proposed deformable CA selectively samples a diverse set of features over a large search window while maintaining low computational complexity. The proposed model was extensively evaluated on multi-modal, mono-modal, and atlas-to-patient registration tasks, demonstrating promising performance against state-of-the-art methods and indicating its effectiveness for medical image registration. The source code for this work is available at http://bit.ly/47HcEex.
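To make the idea concrete, below is a minimal 2-D PyTorch sketch of deformable windowed cross-attention in the spirit of the abstract: offsets predicted from the moving-image features deform the sampling locations in the fixed-image features, so each query attends only to a small set of sampled keys/values drawn from a potentially large search region. The class name, offset head, and 2-D simplification are illustrative assumptions, not the authors' 3-D implementation (see the linked source code for that).

```python
# A minimal, illustrative 2-D sketch of deformable cross-attention (DCA).
# All names and design details here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableCrossAttention2D(nn.Module):
    """Queries come from the moving-image features; keys/values are sampled
    from the fixed-image features at learned (deformable) locations."""

    def __init__(self, dim, num_heads=4, num_points=9):
        super().__init__()
        self.num_heads = num_heads
        self.num_points = num_points                  # sampling locations per query
        self.scale = (dim // num_heads) ** -0.5
        self.q_proj = nn.Linear(dim, dim)
        self.kv_proj = nn.Linear(dim, 2 * dim)
        self.out_proj = nn.Linear(dim, dim)
        # Predict 2-D offsets for each sampling point from the query features.
        self.offset_head = nn.Conv2d(dim, 2 * num_points, kernel_size=3, padding=1)
        nn.init.zeros_(self.offset_head.weight)       # start from the regular grid
        nn.init.zeros_(self.offset_head.bias)

    def forward(self, mov_feat, fix_feat):
        # mov_feat, fix_feat: (B, C, H, W) feature maps of moving / fixed images.
        B, C, H, W = mov_feat.shape

        # Normalized base grid in [-1, 1]: one reference location per query.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=mov_feat.device),
            torch.linspace(-1, 1, W, device=mov_feat.device),
            indexing="ij",
        )
        base = torch.stack((xs, ys), dim=-1)          # (H, W, 2), (x, y) order

        # Offsets predicted from the moving features: (B, H, W, P, 2).
        off = self.offset_head(mov_feat)
        off = off.permute(0, 2, 3, 1).reshape(B, H, W, self.num_points, 2)
        grid = (base.unsqueeze(0).unsqueeze(3) + off).reshape(B, H, W * self.num_points, 2)

        # Sample fixed-image features at the deformed locations.
        sampled = F.grid_sample(fix_feat, grid, align_corners=True)   # (B, C, H, W*P)
        sampled = sampled.reshape(B, C, H, W, self.num_points)
        sampled = sampled.permute(0, 2, 3, 4, 1).reshape(B, H * W, self.num_points, C)

        # Attention: each query attends only to its P sampled keys/values.
        q = self.q_proj(mov_feat.permute(0, 2, 3, 1).reshape(B, H * W, C))
        k, v = self.kv_proj(sampled).chunk(2, dim=-1)
        q = q.reshape(B, H * W, self.num_heads, -1).unsqueeze(3)              # (B, N, h, 1, d)
        k = k.reshape(B, H * W, self.num_points, self.num_heads, -1).transpose(2, 3)
        v = v.reshape(B, H * W, self.num_points, self.num_heads, -1).transpose(2, 3)
        attn = (q * self.scale) @ k.transpose(-2, -1)                          # (B, N, h, 1, P)
        out = (attn.softmax(dim=-1) @ v).squeeze(3).reshape(B, H * W, C)
        return self.out_proj(out).reshape(B, H, W, C).permute(0, 3, 1, 2)
```

Because each query attends to only `num_points` sampled locations rather than a full window, the attention cost stays linear in the number of queries, which is the complexity advantage the abstract describes; the sampling positions themselves are learned, so the effective search region can be large.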
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chen, J., Liu, Y., He, Y., Du, Y. (2024). Deformable Cross-Attention Transformer for Medical Image Registration. In: Cao, X., Xu, X., Rekik, I., Cui, Z., Ouyang, X. (eds) Machine Learning in Medical Imaging. MLMI 2023. Lecture Notes in Computer Science, vol 14348. Springer, Cham. https://doi.org/10.1007/978-3-031-45673-2_12
DOI: https://doi.org/10.1007/978-3-031-45673-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45672-5
Online ISBN: 978-3-031-45673-2
eBook Packages: Computer Science, Computer Science (R0)